Canregtools

An R package for population-based cancer registries (PBCRs)

Qiong Chen 👨‍⚕️, PhD

🏥Henan Cancer Center

Monday Nov 4, 2024

What we actually do in a PBCR

A population-based cancer registry (PBCR) is a system designed to collect, store, and manage cancer cases within a population. It is crucial for monitoring cancer incidence, mortality, survival, and prevalence.

  • Data internal consistency check
  • Quality check (CI5, annual check of quality control for registry)
  • Calculation of statistical indicators
  • Data visualization
  • Reports

Which tools are available?

  • IARCcrgTools
  • JRC tools
  • CanReg5
  • SEER*Stat

Other software and languages, including R, SAS, Stata, and Python

What can canregtools do?

Canregtools is an R package developed to streamline data analysis, visualization, and reporting in cancer registration. It includes five sets of R functions that cover data reading, processing, statistical calculations, visualization, and reporting.

Data processing

  • cutage()
  • expand_age_pop()
  • classify_icd10()
  • classify_childhood()
  • tidy_*()
  • write_*()

Statistics calculation

  • ageadjust()
  • truncrate()
  • cumrate()
  • cumrisk()
  • lt()
  • expand_lifetable()

Statistics sheet

  • summary()
  • create_asr()
  • create_quality()
  • create_age_rate()
  • create_sheet()

What can canregtools do?

Canregtools is a tool designed for high-level cancer registries. It helps process data from multiple registries in batch mode, allowing users to filter data based on custom conditions and reformat or merge cancer registration data according to registry attributes.

Registry attributes

  • write_registry()
  • write_areacode()
  • write_area_type()
  • tidy_areacode()

Filter/merge object

  • cr_select()
  • cr_merge()
  • reframe_fbswicd()

Define S3 class for generic functions

We define a set of S3 classes so that each generic function can dispatch to the method that matches the kind of object it receives.

S3 class for single registry

  • canreg
  • fbswicd
  • asr
  • quality

S3 class for multiple registries

  • canregs
  • fbswicds
  • asrs
  • qualities

S3 method

create_asr, create_quality, create_age_rate, create_sheet, cr_select, cr_merge, reframe_fbswicd
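As a rough illustration of how S3 dispatch lets one generic serve both a single registry and a list of registries, here is a minimal base R sketch. The class names `registry` and `registries`, the constructors, and the generic `count_cases()` are hypothetical and exist only for this example; they are not part of canregtools.

```r
# Hypothetical constructors: wrap plain lists in S3 classes.
new_registry <- function(areacode, cases) {
  structure(list(areacode = areacode, cases = cases),
            class = c("registry", "list"))
}
new_registries <- function(...) {
  regs <- list(...)
  names(regs) <- vapply(regs, function(r) r$areacode, character(1))
  structure(regs, class = c("registries", "list"))
}

# One generic, one method per class.
count_cases <- function(x, ...) UseMethod("count_cases")
count_cases.registry <- function(x, ...) length(x$cases)
count_cases.registries <- function(x, ...) {
  # Apply the single-registry method to every element of the list.
  vapply(unclass(x), count_cases, integer(1))
}

one  <- new_registry("410102", cases = c("C15.9", "C53.8", "C34.1"))
many <- new_registries(one, new_registry("410103", cases = c("C22.0", "C50.9")))

count_cases(one)   # dispatches to count_cases.registry
count_cases(many)  # dispatches to count_cases.registries
```

The same call works on either object, which is the pattern the single-registry and multiple-registry classes above rely on.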

Internal consistency check

Internal consistency checks are a crucial step before conducting data analysis. We need to identify and address any impossible or unlikely combinations of variables to ensure the validity of the data.

Variable

  • check_topo()
  • check_morp()
  • check_icd10()
  • check_areacode()
  • check_sex()
  • check_id()

Variable combinations

  • check_sex_morp()
  • check_sex_topo()
  • check_topo_morp()
  • check_morp_beha()
  • check_morp_grad()
  • check_age_topo_morp()

Format

  • check_header()
  • check_followup()
  • ICDO3toICD10()
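To make the idea of an impossible variable combination concrete, the following base R sketch flags records whose sex conflicts with a sex-specific topography code. The helper `flag_sex_topo()` is hypothetical and is not the package's `check_sex_topo()`; the site lists are a small illustrative subset, and the sex coding (1 = male, 2 = female) is assumed from the example data in this deck.

```r
# Hypothetical helper: TRUE where sex and topography cannot occur together.
flag_sex_topo <- function(sex, topo) {
  site <- substr(topo, 1, 3)  # e.g. "C53.8" -> "C53"
  female_only <- c("C51", "C52", "C53", "C54", "C55", "C56", "C57", "C58")
  male_only   <- c("C60", "C61", "C62", "C63")
  (sex == 1 & site %in% female_only) | (sex == 2 & site %in% male_only)
}

# Toy records (made up for illustration).
cases <- data.frame(
  sex  = c(1, 2, 2, 1),
  topo = c("C15.9", "C53.8", "C61.9", "C53.0"),
  stringsAsFactors = FALSE
)
cases$conflict <- flag_sex_topo(cases$sex, cases$topo)

# Records that need review before analysis.
cases[cases$conflict, ]
```

Flagged records are returned for review rather than dropped, since the error may lie in either variable.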

How to install

Install it from the GitHub repository

# install the remotes package if it is not already installed
install.packages("remotes")
library(remotes)
install_github("gigu003/canregtools")

Install it from a local source package file

# install the remotes package if it is not already installed
install.packages("remotes")
library(remotes)
install_local("canregtools_0.2.7.tar.gz", type = "source")

Single cancer registry

Contents

  1. Single cancer registry

  2. Batch mode (deal with data from multiple cancer registries)

  3. Reframe fbsws

  4. Visualization

Raw data (call for data from NCC)

The raw data is an Excel file with three sheets named FB, SW, and POP, which store incidence data, mortality data, and population data, respectively.

Reading raw data into a ‘canreg’ object

library(canregtools)
library(dplyr)
files <- list.files("~/website/slides/outputs", full.names = TRUE)
file <- files[1]
data <- read_canreg(file)
class(data)
[1] "canreg" "list"  
names(data)
[1] "areacode" "FBcases"  "SWcases"  "POP"     

The ‘canreg’ object

A ‘canreg’ object is a list containing four elements named ‘areacode’, ‘FBcases’, ‘SWcases’, and ‘POP’. The case and population elements are read from the “FB”, “SW”, and “POP” sheets of the raw data.

names(data)
[1] "areacode" "FBcases"  "SWcases"  "POP"     
data$areacode
[1] "410102"
head(data$FBcases)
# A tibble: 6 × 20
  registr      sex   birthda    addcode trib  occu  marri inciden    topo  morp 
  <chr>        <chr> <date>     <chr>   <chr> <chr> <chr> <date>     <chr> <chr>
1 21410500166… 1     1975-04-17 410102… 01    31    2     2021-10-26 C15.9 8010 
2 21410802159… 2     1991-01-16 410102… 01    14    2     2021-10-22 C53.8 8140 
3 21410102172… 2     1962-10-15 410102… 01    00    2     2021-01-19 C53.0 8070 
4 21411202123… 2     1951-03-09 410102… 01    49    2     2021-01-30 C50.9 8000 
5 21411624137… 1     1955-07-15 410102… 01    61    2     2021-05-13 C22.0 8000 
6 23411002105… 1     1939-12-15 410102… 01    28    5     2021-12-22 C34.1 8140 
# ℹ 10 more variables: beha <chr>, grad <chr>, basi <chr>, icd10 <chr>,
#   autoicd10 <chr>, lastcontact <dttm>, status <chr>, caus <chr>,
#   deathda <date>, deadplace <chr>

The ‘SWcases’ element of the ‘canreg’ object holds the mortality data.

head(data$SWcases)
# A tibble: 6 × 19
  registr  sex   birthda    trib  occu  marri inciden    topo  morp  beha  grad 
  <chr>    <chr> <date>     <chr> <chr> <chr> <date>     <chr> <chr> <chr> <chr>
1 1741010… 2     1921-02-18 01    39    2     2014-10-21 C44.5 8010  3     9    
2 1841010… 1     1935-05-15 01    80    2     2018-07-14 C34.9 8010  3     9    
3 1841010… 2     1941-02-16 01    29    2     2015-04-02 C34.9 8000  3     9    
4 1841010… 1     1952-07-06 01    90    2     2018-04-17 C61.9 8140  3     3    
5 1841010… 1     1947-08-21 01    85    2     2017-11-13 C16.3 8140  3     9    
6 1841010… 1     1941-08-16 01    29    2     2015-12-23 C73.9 8050  3     9    
# ℹ 8 more variables: basi <chr>, icd10 <chr>, autoicd10 <chr>,
#   lastcontact <dttm>, status <chr>, caus <chr>, deathda <date>,
#   deadplace <chr>

The ‘POP’ element of the ‘canreg’ object holds the population data.

head(data$POP)
# A tibble: 6 × 4
   year   sex agegrp   rks
  <int> <int> <fct>  <int>
1  2021     1 0~      3850
2  2021     1 1~     14405
3  2021     1 5~     15176
4  2021     1 10~    16227
5  2021     1 15~    18685
6  2021     1 20~    27799

Counting ‘canreg’ data to ‘fbswicd’ data

fbsw <- count_canreg(data, cancer_type = "big")
class(fbsw)
[1] "fbswicd" "list"   
names(fbsw)
[1] "areacode" "fbswicd"  "sitemorp" "pop"     
head(fbsw$fbswicd, 2)
    year   sex agegrp cancer   fbs   sws    mv    ub   sub m8000   dco
   <int> <int> <fctr>  <int> <int> <int> <int> <int> <int> <int> <int>
1:  2021     1   0 岁     60     3     0     2     1     1     1     0
2:  2021     1   0 岁     61     3     0     2     1     1     1     0
head(fbsw$sitemorp,2)
    year   sex cancer              site               morp
   <int> <int>  <int>            <list>             <list>
1:  2021     2    115 <data.frame[3x2]>  <data.frame[9x2]>
2:  2021     2    114 <data.frame[8x2]> <data.frame[18x2]>
head(fbsw$pop,2)
    year   sex agegrp   rks
   <int> <int> <fctr> <int>
1:  2021     1   0 岁  3850
2:  2021     1 1-4 岁 14405
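The counting step can be pictured as tabulating individual records by year, sex, and age group. The base R sketch below shows that idea on made-up records; it is not how `count_canreg()` is implemented, and the age-band labels are an assumption chosen to mirror the 19 groups seen in the output above.

```r
# Toy individual records (values are made up for illustration).
records <- data.frame(
  year = 2021L,
  sex  = c(1L, 1L, 2L, 2L, 2L),
  age  = c(0, 3, 3, 47, 62)
)

# Age bands: 0, 1-4, then five-year groups up to 85+ (19 groups in total).
breaks <- c(0, 1, seq(5, 85, by = 5), Inf)
labels <- c("0", "1-4",
            paste(seq(5, 80, by = 5), seq(9, 84, by = 5), sep = "-"),
            "85+")
records$agegrp <- cut(records$age, breaks = breaks, labels = labels,
                      right = FALSE)

# One row per observed year/sex/age-group combination with its case count.
counts <- aggregate(n ~ year + sex + agegrp,
                    data = transform(records, n = 1L), FUN = sum)
counts
```

Once cases and population share the same age bands, every rate later in this deck reduces to arithmetic on these counts.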

Summarizing the ‘canreg’ data

The summary function quickly calculates summary statistics for a ‘canreg’ object.

summ <- summary(data)
class(summ)
[1] "summary" "list"   
names(summ)
 [1] "areacode"         "rks"              "fbs"              "inci"            
 [5] "sws"              "mort"             "mi"               "mv"              
 [9] "dco"              "rks_year"         "inci_vars"        "miss_r_vars_inci"
[13] "mort_vars"        "miss_r_vars_mort"
purrr::pluck(summ, "mi")
[1] 0.38
purrr::pluck(summ, "inci")
[1] 373.15
purrr::pluck(summ, "mort")
[1] 142

Calculating age-standardized rates

The create_asr function calculates the age-standardized rate, truncated rate, and cumulative rate based on the supplied standard populations. It can also estimate the variance and 95% confidence interval of the rate.

# list all available standard populations
ls_std_vars()
# A tibble: 5 × 2
  Vars    Description                           
  <chr>   <chr>                                 
1 cn64    Standard population in Chinese in 1964
2 cn82    Standard population in Chinese in 1982
3 cn2000  Standard population in Chinese in 2000
4 wld85   Segi's world standard population      
5 wld2000 World standard population in 2000     
# calculate asr using the create_asr() function
create_asr(fbsw, event = fbs, year, sex, cancer, std = c("cn2000", "wld85"))
# A tibble: 56 × 11
    year   sex cancer no_cases     cr asr_cn2000 asr_wld85 truncr_cn2000
   <int> <int>  <int>    <int>  <dbl>      <dbl>     <dbl>         <dbl>
 1  2021     1     60     1069 327.       266.      267.          436.  
 2  2021     1     61     1050 321.       262.      263.          429.  
 3  2021     1    101       18   5.50       4.68      4.84         10.4 
 4  2021     1    102        5   1.53       1.23      1.45          2.63
 5  2021     1    103       47  14.4       10.8      10.3          12.0 
 6  2021     1    104       65  19.9       15.6      16.0          21.9 
 7  2021     1    105       87  26.6       21.0      21.7          32.9 
 8  2021     1    106       84  25.7       20.7      21.7          39.4 
 9  2021     1    107       21   6.41       5.12      5.57          9.64
10  2021     1    108       28   8.55       6.82      7.14         10.1 
# ℹ 46 more rows
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>
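For readers who want to see the arithmetic behind a directly standardized rate, here is a minimal base R sketch. All inputs are made up, the four age bands and their weights are not one of the standard populations listed above, and the Poisson-based variance with a normal-approximation interval is one common choice that may differ from what `create_asr()` uses.

```r
# Toy inputs for four broad age bands (all numbers are illustrative).
cases  <- c(2, 10, 60, 120)              # new cases per age band
pop    <- c(40000, 50000, 30000, 10000)  # person-years per age band
weight <- c(0.30, 0.35, 0.25, 0.10)      # standard population shares, sum to 1

age_rate <- cases / pop * 1e5            # age-specific rates per 100,000
crude    <- sum(cases) / sum(pop) * 1e5  # crude rate per 100,000
asr      <- sum(age_rate * weight)       # directly standardized rate

# Variance under a Poisson assumption, then a normal-approximation 95% CI.
asr_var <- sum(weight^2 * cases / pop^2) * 1e5^2
asr_ci  <- asr + c(-1, 1) * qnorm(0.975) * sqrt(asr_var)

round(c(crude = crude, asr = asr, lower = asr_ci[1], upper = asr_ci[2]), 1)
```

The crude rate weights each age band by the registry's own population, while the standardized rate replaces those weights with the standard population's, which is what makes rates comparable across registries.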

Calculating age-standardized rates

The drop_total, drop_others, and add_labels functions perform further processing on the ASR data: removing the all-cancer totals, removing the other-cancers category, and adding labels.

create_asr(fbsw, event = fbs, year, sex, cancer) |> 
  drop_total() |> drop_others() |> 
  add_labels(lang = "en", label_type = "abbr")
# A tibble: 48 × 13
    year sex   cancer site             icd10 no_cases    cr asr_cn2000 asr_wld85
   <int> <fct>  <int> <fct>            <fct>    <int> <dbl>      <dbl>     <dbl>
 1  2021 Male     101 Oral Cavity & P… C00-…       18  5.50       4.68      4.84
 2  2021 Male     102 Nasopharynx      C11          5  1.53       1.23      1.45
 3  2021 Male     103 Esophagus        C15         47 14.4       10.8      10.3 
 4  2021 Male     104 Stomach          C16         65 19.9       15.6      16.0 
 5  2021 Male     105 Colorectum       C18-…       87 26.6       21.0      21.7 
 6  2021 Male     106 Liver            C22         84 25.7       20.7      21.7 
 7  2021 Male     107 Gallbladder      C23-…       21  6.41       5.12      5.57
 8  2021 Male     108 Pancreas         C25         28  8.55       6.82      7.14
 9  2021 Male     109 Larynx           C32         15  4.58       3.72      3.97
10  2021 Male     110 Lung             C33-…      230 70.2       57.1      59.1 
# ℹ 38 more rows
# ℹ 4 more variables: truncr_cn2000 <dbl>, truncr_wld85 <dbl>, cumur <dbl>,
#   prop <dbl>

Calculating quality indicators

The create_quality function calculates quality indicators from ‘canreg’ or ‘fbswicd’ data, including the number of cancer cases, crude incidence and mortality, the mortality:incidence ratio, the proportion of morphologically verified cases (MV%), the proportion of death-certificate-only cases (DCO%), UB%, etc.

# calculate quality indicators based on 'canreg' data
create_quality(data, year, sex, cancer) |> filter(!cancer == 0) |> 
  add_labels(lang = "en")
# A tibble: 56 × 16
    year sex   cancer site    icd10    rks   fbs    fbl   sws    swl    mi    mv
   <int> <fct>  <int> <fct>   <fct>  <int> <int>  <dbl> <int>  <dbl> <dbl> <dbl>
 1  2021 Male      60 All Ca… ALL   327403  1069 327.     559 171.    0.52  76.8
 2  2021 Male      61 All Ca… ALLb… 327403  1050 321.     553 169.    0.53  76.6
 3  2021 Male     101 Oral C… C00-… 327403    18   5.5     10   3.05  0.56  55.6
 4  2021 Male     102 Nasoph… C11   327403     5   1.53     4   1.22  0.8   40  
 5  2021 Male     103 Esopha… C15   327403    47  14.4     35  10.7   0.74  87.2
 6  2021 Male     104 Stomach C16   327403    65  19.8     55  16.8   0.85  73.8
 7  2021 Male     105 Conlon… C18-… 327403    87  26.6     55  16.8   0.63  77.0
 8  2021 Male     106 Liver   C22   327403    84  25.7     77  23.5   0.92  66.7
 9  2021 Male     107 Gallbl… C23-… 327403    21   6.41    16   4.89  0.76  85.7
10  2021 Male     108 Pancre… C25   327403    28   8.55    28   8.55  1     50  
# ℹ 46 more rows
# ℹ 4 more variables: dco <dbl>, ub <dbl>, sub <dbl>, m8000 <dbl>

Calculating quality indicators

# calculate quality indicators based on 'fbswicd' data
create_quality(fbsw, year, sex) |>
  add_labels(lang = "en")
# A tibble: 2 × 16
   year sex    cancer site      icd10    rks   fbs   fbl   sws   swl    mi    mv
  <int> <fct>   <dbl> <fct>     <fct>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl>
1  2021 Male       60 All Canc… ALL   327403  1069  327.   559  171.  0.52  76.8
2  2021 Female     60 All Canc… ALL   355707  1415  398.   405  114.  0.29  84.8
# ℹ 4 more variables: dco <dbl>, ub <dbl>, sub <dbl>, m8000 <dbl>
create_quality(fbsw, cancer) |>
  filter(!cancer == 0) |> 
  add_labels(lang = "en")
# A tibble: 28 × 16
    year sex   cancer site    icd10    rks   fbs    fbl   sws    swl    mi    mv
   <dbl> <fct>  <int> <fct>   <fct>  <int> <int>  <dbl> <int>  <dbl> <dbl> <dbl>
 1  9000 Total     60 All Ca… ALL   683110  2484 364.     964 141.    0.39  81.4
 2  9000 Total     61 All Ca… ALLb… 683110  2451 359.     950 139.    0.39  81.2
 3  9000 Total    101 Oral C… C00-… 683110    26   3.81    13   1.9   0.5   57.7
 4  9000 Total    102 Nasoph… C11   683110     7   1.02     5   0.73  0.71  57.1
 5  9000 Total    103 Esopha… C15   683110    66   9.66    46   6.73  0.7   86.4
 6  9000 Total    104 Stomach C16   683110   101  14.8     79  11.6   0.78  71.3
 7  9000 Total    105 Conlon… C18-… 683110   181  26.5    103  15.1   0.57  81.2
 8  9000 Total    106 Liver   C22   683110   125  18.3    100  14.6   0.8   68  
 9  9000 Total    107 Gallbl… C23-… 683110    44   6.44    31   4.54  0.7   86.4
10  9000 Total    108 Pancre… C25   683110    46   6.73    57   8.34  1.24  43.5
# ℹ 18 more rows
# ℹ 4 more variables: dco <dbl>, ub <dbl>, sub <dbl>, m8000 <dbl>
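The crude rates and the mortality:incidence ratio in the sex-specific table above can be reproduced by hand. The sketch below assumes, based on how the numbers line up, that `rks` is the population, `fbs` the incident cases, `sws` the deaths, and that `fbl` and `swl` are crude rates per 100,000; that reading is inferred from the output and is not taken from the package documentation.

```r
# Counts copied from the all-cancer rows for registry 410102 shown above.
quality <- data.frame(
  sex = c("Male", "Female"),
  rks = c(327403, 355707),  # population (assumed meaning)
  fbs = c(1069, 1415),      # incident cases (assumed meaning)
  sws = c(559, 405)         # deaths (assumed meaning)
)

quality$fbl <- quality$fbs / quality$rks * 1e5  # crude incidence per 100,000
quality$swl <- quality$sws / quality$rks * 1e5  # crude mortality per 100,000
quality$mi  <- quality$sws / quality$fbs        # mortality:incidence ratio

transform(quality, fbl = round(fbl), swl = round(swl), mi = round(mi, 2))
```

The rounded values agree with the `fbl`, `swl`, and `mi` columns printed above.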

Calculating age-specific rates

The create_age_rate function calculates age-specific rates based on ‘canreg’ or ‘fbswicd’ data.

# calculate age specific rate from 'canreg' data.
create_age_rate(data, year, sex, cancer, format = "long") |> 
  filter(!cancer == 0) |> 
  arrange(year, sex, cancer, agegrp)
# A tibble: 1,064 × 6
    year   sex cancer agegrp   cases  rate
   <int> <int>  <int> <fct>    <int> <dbl>
 1  2021     1     60 0 岁         3  77.9
 2  2021     1     60 1-4 岁       3  20.8
 3  2021     1     60 5-9 岁       5  32.9
 4  2021     1     60 10-14 岁     0   0  
 5  2021     1     60 15-19 岁     2  10.7
 6  2021     1     60 20-24 岁     5  18.0
 7  2021     1     60 25-29 岁    14  38.7
 8  2021     1     60 30-34 岁    38 130. 
 9  2021     1     60 35-39 岁    36 121. 
10  2021     1     60 40-44 岁    60 196. 
# ℹ 1,054 more rows
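An age-specific rate is the number of cases in an age group divided by that group's population, scaled here per 100,000. The base R sketch below recomputes the first six male rates from the case counts printed above and the male population shown earlier in `data$POP`; the short age-group labels are only for display.

```r
# Male all-cancer cases (from the output above) and male population
# (from head(data$POP) earlier) for the first six age groups.
agegrp <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24")
cases  <- c(3, 3, 5, 0, 2, 5)
pop    <- c(3850, 14405, 15176, 16227, 18685, 27799)

rate <- cases / pop * 1e5  # age-specific rate per 100,000

data.frame(agegrp, cases, pop, rate = round(rate, 1))
```

The rounded rates match the `rate` column in the long-format output above.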

Calculating age specific rate

# calculate age specific rate from 'fbswicd' data.
create_age_rate(fbsw, year, sex, cancer, format = "wide") |>
  filter(!cancer == 0) |> 
  add_labels(lang = "en")
# A tibble: 56 × 45
    year sex   cancer site       icd10    f0    f1    f2    f3    f4    f5    f6
   <int> <fct>  <int> <fct>      <fct> <int> <int> <int> <int> <int> <int> <int>
 1  2021 Male      60 All Cance… ALL    1069     3     3     5     0     2     5
 2  2021 Male      61 All Cance… ALLb…  1050     3     3     5     0     2     5
 3  2021 Male     101 Oral Cavi… C00-…    18     0     0     0     0     0     0
 4  2021 Male     102 Nasophary… C11       5     0     0     0     0     0     0
 5  2021 Male     103 Esophagus  C15      47     0     0     0     0     0     0
 6  2021 Male     104 Stomach    C16      65     0     0     0     0     0     0
 7  2021 Male     105 Conlon, R… C18-…    87     0     0     0     0     0     0
 8  2021 Male     106 Liver      C22      84     0     1     0     0     0     0
 9  2021 Male     107 Gallbladd… C23-…    21     0     0     0     0     0     0
10  2021 Male     108 Pancreas   C25      28     0     0     0     0     0     0
# ℹ 46 more rows
# ℹ 33 more variables: f7 <int>, f8 <int>, f9 <int>, f10 <int>, f11 <int>,
#   f12 <int>, f13 <int>, f14 <int>, f15 <int>, f16 <int>, f17 <int>,
#   f18 <int>, f19 <int>, r0 <dbl>, r1 <dbl>, r2 <dbl>, r3 <dbl>, r4 <dbl>,
#   r5 <dbl>, r6 <dbl>, r7 <dbl>, r8 <dbl>, r9 <dbl>, r10 <dbl>, r11 <dbl>,
#   r12 <dbl>, r13 <dbl>, r14 <dbl>, r15 <dbl>, r16 <dbl>, r17 <dbl>,
#   r18 <dbl>, r19 <dbl>

Batch mode (deal with data from multiple cancer registries)

Reading data from multiple raw data files

A ‘canregs’ object is a list whose elements are ‘canreg’ objects. It is created by passing several raw data files to the read_canreg function.

files <- list.files("~/website/slides/outputs", full.names = TRUE)
# read the first 10 raw files in outputs folder into 'canregs'
data <- read_canreg(files[1:10])
class(data)
[1] "canregs" "list"   
names(data)
 [1] "410102" "410103" "410104" "410105" "410123" "410185" "410224" "410302"
 [9] "410303" "410304"

Summarizing the ‘canregs’ data

The summary function quickly calculates summary statistics for a ‘canreg’ or ‘canregs’ object.

summ <- summary(data)
names(summ)
 [1] "410102" "410103" "410104" "410105" "410123" "410185" "410224" "410302"
 [9] "410303" "410304"
summ1 <- cr_select(summ, inci > 300, mi > 0.4)
names(summ1)
[1] "410302" "410303" "410304"
summ2 <- cr_select(summ, inci > 290 | mort > 150)
names(summ2)
[1] "410102" "410103" "410104" "410105" "410302" "410303" "410304"
summ3 <- cr_select(summ, index = c("410102", "410303"))
names(summ3)
[1] "410102" "410303"
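Conceptually, selecting registries by condition amounts to filtering a named list of per-registry summaries. The base R sketch below mimics that with a hypothetical helper `select_registries()` and made-up summary values; it does not reproduce the interface of `cr_select()`, which accepts bare conditions as shown above.

```r
# Toy per-registry summaries keyed by areacode (values are made up).
summaries <- list(
  "410102" = list(inci = 373, mort = 142, mi = 0.38),
  "410123" = list(inci = 250, mort = 120, mi = 0.48),
  "410302" = list(inci = 331, mort = 151, mi = 0.46)
)

# Hypothetical helper: keep the registries for which `cond` returns TRUE.
select_registries <- function(x, cond) Filter(cond, x)

# Both conditions must hold (like passing two conditions to a filter).
high <- select_registries(summaries, function(s) s$inci > 300 && s$mi > 0.4)
names(high)

# Either condition may hold.
either <- select_registries(summaries, function(s) s$inci > 290 || s$mort > 150)
names(either)
```

Because the result keeps the areacodes as names, it can be used as an index to subset other per-registry objects.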

Filtering the ‘canregs’ data

The cr_select function filters ‘canregs’, ‘fbswicds’, ‘asrs’, and ‘summaries’ objects based on the supplied conditions.

data2 <- data |> cr_select(index = names(summ2))
names(data2)
[1] "410102" "410103" "410104" "410105" "410302" "410303" "410304"
names(summ2)
[1] "410102" "410103" "410104" "410105" "410302" "410303" "410304"

Counting ‘canregs’ data

The count_canreg function counts ‘canregs’ data into ‘fbswicds’ data, a list of ‘fbswicd’ objects that can be used as input for create_asr, create_quality, create_sheet, etc.

fbsw <- count_canreg(data2)
class(fbsw)
[1] "fbswicds" "list"    
names(fbsw)
[1] "410102" "410103" "410104" "410105" "410302" "410303" "410304"

Calculating age-standardized rates

The create_asr function can also calculate age-standardized rates based on ‘fbswicds’ data.

asrs <- create_asr(fbsw, event = fbs, year, sex)
class(asrs)
[1] "asrs" "list"
head(asrs)
$`410102`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60     1069  327.       266.      267.          436.
2  2021     2     60     1415  398.       319.      299.          565.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

$`410103`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60     1190  362.       242.      237.          380.
2  2021     2     60     1469  409.       289.      272.          539.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

$`410104`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60      720  316.       244.      239.          398.
2  2021     2     60      862  341.       272.      258.          500.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

$`410105`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60     1816  392.       286.      273.          429.
2  2021     2     60     2334  461.       344.      319.          610.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

$`410302`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60      289  348.       225.      222.          345.
2  2021     2     60      272  315.       188.      182.          343.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

$`410303`
# A tibble: 2 × 11
   year   sex cancer no_cases    cr asr_cn2000 asr_wld85 truncr_cn2000
  <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>         <dbl>
1  2021     1     60      625  383.       278.      310.          280.
2  2021     2     60      596  369.       230.      222.          354.
# ℹ 3 more variables: truncr_wld85 <dbl>, cumur <dbl>, prop <dbl>

Merging age-standardized rates (asrs)

The cr_merge function merges ‘canregs’, ‘fbswicds’, ‘asrs’, and ‘qualities’ objects into ‘canreg’, ‘fbswicd’, ‘asr’, and ‘quality’ objects, respectively.

asrs2 <- asrs |> cr_merge()
head(asrs2, c(6,8))
# A tibble: 6 × 8
  areacode  year   sex cancer no_cases    cr asr_cn2000 asr_wld85
     <int> <int> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>
1   410102  2021     1     60     1069  327.       266.      267.
2   410102  2021     2     60     1415  398.       319.      299.
3   410103  2021     1     60     1190  362.       242.      237.
4   410103  2021     2     60     1469  409.       289.      272.
5   410104  2021     1     60      720  316.       244.      239.
6   410104  2021     2     60      862  341.       272.      258.
names(asrs2)
 [1] "areacode"      "year"          "sex"           "cancer"       
 [5] "no_cases"      "cr"            "asr_cn2000"    "asr_wld85"    
 [9] "truncr_cn2000" "truncr_wld85"  "cumur"         "prop"         
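The merged table above can be thought of as the per-registry tables stacked row-wise with the areacode added as a column. A minimal base R sketch of that idea follows; the crude rates are rounded from the output above, and the code is an illustration rather than the implementation of `cr_merge()`.

```r
# Per-registry results keyed by areacode (crude rates rounded from above).
results <- list(
  "410102" = data.frame(sex = 1:2, cr = c(327, 398)),
  "410103" = data.frame(sex = 1:2, cr = c(362, 409))
)

# Prepend the areacode to each table, then stack the tables row-wise.
tagged <- Map(function(code, df) cbind(areacode = code, df),
              names(results), results)
merged <- do.call(rbind, unname(tagged))
rownames(merged) <- NULL
merged
```

Keeping the areacode as an explicit column is what allows the merged table to be grouped, filtered, or plotted by registry afterwards.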

Creating quality indicators

qualities <- create_quality(fbsw, year, sex)
qualities
$`410102`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 327403  1069  327.   559  171.  0.52  76.8  1.5   1.78
2  2021     2     60 355707  1415  398.   405  114.  0.29  84.8  0.85  1.98
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

$`410103`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 328415  1190  362.   672  205.  0.56  73.9  0.67  2.18
2  2021     2     60 358941  1469  409.   408  114.  0.28  77.9  1.09  2.11
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

$`410104`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 227511   720  316.   369 162.   0.51  82.2  0.56  1.81
2  2021     2     60 253077   862  341.   252  99.6  0.29  82.2  0.35  1.28
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

$`410105`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 462738  1816  392.   792  171.  0.44  77.8  0.72  1.6 
2  2021     2     60 506456  2334  461.   553  109.  0.24  80.9  0.21  1.63
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

$`410302`
# A tibble: 2 × 14
   year   sex cancer   rks   fbs   fbl   sws   swl    mi    mv   dco    ub   sub
  <int> <int>  <dbl> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 83010   289  348.   153  184.  0.53  69.9  0     1.04  65.7
2  2021     2     60 86359   272  315.   103  119.  0.38  75.4  0.74  0.74  72.1
# ℹ 1 more variable: m8000 <dbl>

$`410303`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60 163005   625  383.   412  253.  0.66  76.8  1.12  0.96
2  2021     2     60 161724   596  369.   252  156.  0.42  84.4  1.51  2.52
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

$`410304`
# A tibble: 2 × 14
   year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco    ub
  <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1  2021     1     60  99735   427  428.   232  233.  0.54  71.4  2.81  2.81
2  2021     2     60 100963   353  350.   147  146.  0.42  79.3  2.55  1.13
# ℹ 2 more variables: sub <dbl>, m8000 <dbl>

attr(,"class")
[1] "qualities" "list"     

Merge quality indicators

qualities2 <- qualities |> cr_merge()
head(qualities2,c(6,12))
# A tibble: 6 × 12
  areacode  year   sex cancer    rks   fbs   fbl   sws   swl    mi    mv   dco
     <int> <int> <int>  <dbl>  <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1   410102  2021     1     60 327403  1069  327.   559 171.   0.52  76.8  1.5 
2   410102  2021     2     60 355707  1415  398.   405 114.   0.29  84.8  0.85
3   410103  2021     1     60 328415  1190  362.   672 205.   0.56  73.9  0.67
4   410103  2021     2     60 358941  1469  409.   408 114.   0.28  77.9  1.09
5   410104  2021     1     60 227511   720  316.   369 162.   0.51  82.2  0.56
6   410104  2021     2     60 253077   862  341.   252  99.6  0.29  82.2  0.35
names(qualities2)
 [1] "areacode" "year"     "sex"      "cancer"   "rks"      "fbs"     
 [7] "fbl"      "sws"      "swl"      "mi"       "mv"       "dco"     
[13] "ub"       "sub"      "m8000"   

Reframe fbsws

Show the attributes of the cancer registry associated with an areacode

The tidy_areacode function shows the attributes of the cancer registry associated with an areacode.

attributes <- tidy_areacode("410302")
names(attributes)
[1] "areacode"  "registry"  "province"  "city"      "area_type" "region"   

You can use the write_registry or write_area_type function to modify the attributes of the registry.

Reframe ‘fbswicds’ to ‘fbswicd’

You can reframe the ‘fbswicds’ according to the attribute name of registry like ‘area_type’, ‘registry’, ‘province’, etc.

data <- read_canreg(files[1:10])
fbsws <- count_canreg(data)
class(fbsws)
[1] "fbswicds" "list"    
names(fbsws)
 [1] "410102" "410103" "410104" "410105" "410123" "410185" "410224" "410302"
 [9] "410303" "410304"
fbsw <- cr_reframe(fbsws, "area_type")
class(fbsw)
[1] "fbswicds" "list"    

Creating ASR based on reframed fbsws

asr <- create_asr(fbsw, sex) |> cr_merge()
head(asr, c(6, 8))
# A tibble: 4 × 8
  areacode  year   sex cancer no_cases    cr asr_cn2000 asr_wld85
     <int> <dbl> <int>  <dbl>    <int> <dbl>      <dbl>     <dbl>
1   910000  9000     1     60     6136  363.       253.      248.
2   910000  9000     2     60     7301  400.       288.      270.
3   920000  9000     1     60     2676  233.       172.      168.
4   920000  9000     2     60     2817  254.       186.      176.
names(asr)
 [1] "areacode"      "year"          "sex"           "cancer"       
 [5] "no_cases"      "cr"            "asr_cn2000"    "asr_wld85"    
 [9] "truncr_cn2000" "truncr_wld85"  "cumur"         "prop"         
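One way to picture reframing is that counts and populations are summed across all registries sharing an attribute value before any rate is computed. The base R sketch below shows this on made-up numbers; the urban/rural assignment and all counts are hypothetical, and it is not the implementation of the reframing function.

```r
# Toy counts per registry; the area_type assignment is made up.
counts <- data.frame(
  areacode  = c("410102", "410103", "410123", "410224"),
  area_type = c("urban", "urban", "rural", "rural"),
  cases     = c(2484, 2659, 900, 1100),
  pop       = c(683110, 687356, 450000, 520000)
)

# Sum cases and population within each area type, then compute a pooled rate.
pooled <- aggregate(cbind(cases, pop) ~ area_type, data = counts, FUN = sum)
pooled$cr <- pooled$cases / pooled$pop * 1e5
pooled
```

Pooling the counts first, rather than averaging per-registry rates, weights each registry by its population.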

Visualization

Draw pyramid plot

library(showtext)
showtext_auto()
data <- read_canreg(files[10])
fbsw <- count_canreg(data)
draw_pyramid(fbsw, show_value = FALSE)

Draw bar chart

asr <- create_asr(fbsw, year, sex, cancer) |> drop_total() |>
  drop_others() |> add_labels(label_type = "abbr", lang = "en")
draw_barchart(asr, plot_var = cr, cate_var = site,
              side_label = c("Male", "Female"))

Draw bar chart

library(dplyr)
asr1 <- create_asr(fbsw, year, sex, cancer, event = fbs) |> mutate(type = "incidence")
asr2 <- create_asr(fbsw, year, sex, cancer, event = sws) |> mutate(type = "mortality")
asr <- bind_rows(asr1, asr2) |> drop_others() |> drop_total() |>
  add_labels(label_type = "abbr", lang = "en")
draw_barchart(asr, plot_var = cr, cate_var = site, group_var = type,
              side_label = c("Male", "Female"))

Draw line chart

agerate <- create_age_rate(fbsw, sex) |> add_labels(lang = "en")
names(agerate)
[1] "year"   "sex"    "cancer" "site"   "icd10"  "agegrp" "cases"  "rate"  
draw_line(agerate, agegrp, rate, sex)