medicalcoder: A Unified and Longitudinally Aware Framework for ICD-Based Comorbidity Assessment in R


My R package medicalcoder was updated to version 0.8.0 and released on CRAN.

At its core, medicalcoder provides a portable, dependency-light framework for ICD-based comorbidity assessment in R. It is designed for reproducible health data workflows, especially in environments where dependency management, portability, and longitudinal inference matter.

Version 0.8.0 improves stability, refines longitudinal flagging behavior, and extends Elixhauser support through the 2026 AHRQ ICD definitions.

Quick Start:

library(medicalcoder)


head(mdcr_longitudinal)

    patid       date icdv     code
1 9663901 2016-03-18   10   Z77.22
2 9663901 2016-03-24   10  IMO0002
3 9663901 2016-03-24   10 V87.7XXA
4 9663901 2016-03-25   10  J95.851
5 9663901 2016-03-30   10  IMO0002
6 9663901 2016-03-30   10    Z93.0

The example data set mdcr_longitudinal has records for three patients. All of the ICD codes are expected to be diagnostic codes, and the records include both ICD-9 and ICD-10 codes.

xtabs(  ~ patid + icdv, data = mdcr_longitudinal)

         icdv
patid      9 10
  231597  13  4
  650838  24  3
  9663901  0 16

A single call to medicalcoder::comorbidities() can flag Elixhauser comorbidities across ICD-9 and ICD-10 codes within a subject's record.

mdcr_rtn <-
  medicalcoder::comorbidities(
    data        = mdcr_longitudinal,  # object inheriting data.frame
    icd.codes   = "code", # character string name of icd codes in data
    id.vars     = c("patid"),
    icdv.var    = "icdv", # variable in data denoting ICD version
    dx          = 1L, # treat all ICD codes in data as diagnostic codes
    poa         = 1L, # treat all codes as present on admission
    primarydx   = 0L, # treat all diagnosis codes as secondary diagnoses
    method      = "elixhauser_quan2005"
  )

# To simplify the output, only show the columns that are not all zeros
Filter(f = function(x) any(x > 0), mdcr_rtn)

    patid WGHTLOSS PARA num_cmrb cmrb_flag mortality_index readmission_index
1  231597        0    0        0         0               0                 0
2  650838        1    1        2         1              14                16
3 9663901        0    1        1         1               5                 6

The same results can be achieved using other tools such as the R package comorbidity, but it requires a lot more work. In the code that follows, you need to split the data by ICD version, manually recombine results, repair severity hierarchies, and restore object attributes before scoring.

# split the data by ICD version and call comorbidity::comorbidity twice with the
# needed version-specific mapping
r9 <-
  comorbidity::comorbidity(
    x = subset(mdcr_longitudinal, icdv == 9),
    id = "patid",
    code = "code",
    map = "elixhauser_icd9_quan",
    assign0 = TRUE
  )
r10 <-
  comorbidity::comorbidity(
    x = subset(mdcr_longitudinal, icdv == 10),
    id = "patid",
    code = "code",
    map = "elixhauser_icd10_quan",
    assign0 = TRUE
  )

# combine and aggregate the results
rtn <- aggregate(. ~ patid, data = rbind(r9, r10), FUN = max)

# clean up flags by severity (might not be needed in this specific example, but
# is needed in general)
rtn[["diabunc" ]][rtn$diabc    == 1] <- 0L
rtn[["hypunc"  ]][rtn$hypc     == 1] <- 0L
rtn[["solidtum"]][rtn$metacanc == 1] <- 0L

# apply the scoring, to use comorbidity::score the object needs to have certain
# attributes which were lost when aggregating over the icd versions.  Replace
# the attributes
attributes(rtn)[c("class", "variable.labels", "map")] <-
  attributes(r9)[c("class", "variable.labels", "map")]

# the score is based on the weights by van Walraven et al. (2009)
rtn[["score"]] <- comorbidity::score(x = rtn, weights = "vw", assign0 = TRUE)

# simplify the output by only showing columns which are not all zeros
Filter(function(x) any(x > 0), rtn)

    patid para wloss score
1  231597    0     0     0
2  650838    1     1    13
3 9663901    1     0     7

The scoring approaches differ, but the underlying comorbidity flags for weight loss and paraplegia are identical. The key difference is the workflow: medicalcoder handles mixed ICD versions in a single, unified call.

Longitudinal Flagging of Comorbidities

Flagging comorbidities over a patient history should be a consideration for all researchers. I will not say which conditions must be considered longitudinally and which might not be, but I will say that chronic conditions, even when controlled, should be considered in research.

It seems simple enough with methods such as Charlson and Elixhauser from Quan et al. (2005) to flag comorbidities at the encounter level and then carry the flag forward to subsequent encounters within a patient record.
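As a minimal base R sketch of that carry-forward idea, the snippet below takes encounter-level flags and propagates them within each patient using a running maximum. The data frame enc and its columns are hypothetical and exist only for illustration; this is not how medicalcoder implements it.

```r
# Hypothetical encounter-level flags for two patients (illustrative data only)
enc <- data.frame(
  patid = c("A", "A", "A", "B", "B"),
  encid = c(1L, 2L, 3L, 1L, 2L),
  flag  = c(0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Make sure encounters are in chronological order within each patient
enc <- enc[order(enc$patid, enc$encid), ]

# Running maximum within patient: once a flag is 1 it stays 1
enc$flag_carried <- ave(enc$flag, enc$patid, FUN = cummax)

enc
```

Patient A is flagged on encounters 2 and 3 after carrying forward; patient B is never flagged.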

However, in some Elixhauser variants, simply carrying forward encounter-level flags can fail to capture a condition.

For example, the Agency for Healthcare Research and Quality (AHRQ) has an Elixhauser variant which requires consideration of whether the ICD code is reported as present-on-admission (POA). Some conditions are exempt from the POA requirement; that is, if the ICD code is in the record, then the patient has that condition:

subset(medicalcoder::get_elixhauser_poa(), poa_required == 0L)$condition

 [1] "AIDS"                "ALCOHOL"             "AUTOIMMUNE"         
 [4] "CANCER_LEUK"         "CANCER_LYMPH"        "CANCER_METS"        
 [7] "CANCER_NSITU"        "CANCER_SOLID"        "DEMENTIA"           
[10] "DEPRESS"             "DIAB_CX"             "DIAB_UNCX"          
[13] "DRUG_ABUSE"          "HTN_CX"              "HTN_UNCX"           
[16] "LUNG_CHRONIC"        "OBESE"               "PERIVASC"           
[19] "THYROID_HYPO"        "THYROID_OTH"         "ALCOHOLLIVER_MLD"   
[22] "DRUG_ABUSEPSYCHOSES" "VALVE_AUTOIMMUNE"

For other conditions, the ICD code must be flagged as POA to be counted as a comorbidity for that encounter:

subset(medicalcoder::get_elixhauser_poa(), poa_required == 1L)$condition

 [1] "ANEMDEF"            "BLDLOSS"            "CBVD"              
 [4] "COAG"               "HF"                 "LIVER_MLD"         
 [7] "LIVER_SEV"          "NEURO_MOVT"         "NEURO_OTH"         
[10] "NEURO_SEIZ"         "PARALYSIS"          "PSYCHOSES"         
[13] "PULMCIRC"           "RENLFL_MOD"         "RENLFL_SEV"        
[16] "ULCER_PEPTIC"       "VALVE"              "WGHTLOSS"          
[19] "CBVD_POA"           "CBVD_SQLA"          "CBVD_SQLAPARALYSIS"
[22] "HFHTN_CX"           "HTN_CXRENLFL_SEV"   "HFHTN_CXRENLFL_SEV"
[25] "NEURO_OTH_SEIZ"     "LIVER_MLD_NEURO"    "LIVER_MLD_PULMCIRC"

A Simple Longitudinal Example

Let’s build an example patient. On encounter 1 there are no ICD codes. On encounter 2, there will be a non-POA ICD-10 code for severe liver disease. We will have encounters 3 and 4 with no ICD codes.

apatient <-
  data.frame(
    patid = "Joe",
    encid = 1:4,
    icd10code = c("", "K72.10", "", ""),
    poa = c(NA_integer_, 0L, NA_integer_, NA_integer_),
    stringsAsFactors = FALSE
  )

Our example patient, Joe, has ICD-10 code K72.10 reported on encounter 2.

subset(medicalcoder::get_icd_codes(with.descriptions = TRUE), full_code == "K72.10")

       icdv dx full_code  code src known_start known_end assignable_start
151385   10  1    K72.10 K7210 cms        2014      2026             2014
       assignable_end                                 desc desc_start desc_end
151385           2026 Chronic hepatic failure without coma       2014     2026

If we flag comorbidities for Joe at the encounter level, we get what we expect: no severe liver disease, because that condition requires the ICD code to be POA.

medicalcoder::comorbidities(
  data = apatient,
  id.vars = c("patid", "encid"),
  icd.codes = "icd10code",
  poa.var = "poa",
  primarydx = 0L,
  method = "elixhauser_ahrq2025",
  flag.method = "current" # default
)[, c("patid", "encid", "LIVER_SEV")]

  patid encid LIVER_SEV
1   Joe     1         0
2   Joe     2         0
3   Joe     3         0
4   Joe     4         0

At the encounter level, this is correct behavior. The severe liver disease code was not present-on-admission, so it should not be treated as a comorbidity on encounter 2.

For comparison, if the code had been POA, it would be flagged on encounter 2:

medicalcoder::comorbidities(
  data = apatient,
  id.vars = c("patid", "encid"),
  icd.codes = "icd10code",
  poa = 1,
  primarydx = 0L,
  method = "elixhauser_ahrq2025",
  flag.method = "current" # default
)[, c("patid", "encid", "LIVER_SEV")]

  patid encid LIVER_SEV
1   Joe     1         0
2   Joe     2         1
3   Joe     3         0
4   Joe     4         0

With flag.method = "current", each encounter is assessed on its own. Unless the ICD code for severe liver disease is reported again and marked as POA on encounters 3 and 4, the comorbidity is never flagged later in the record, even though the condition is already part of the patient's history.

medicalcoder takes care of this for you. By using flag.method = "cumulative" we get the following:

medicalcoder::comorbidities(
  data = apatient,
  id.vars = c("patid", "encid"),
  icd.codes = "icd10code",
  poa.var = "poa",
  primarydx = 0L,
  method = "elixhauser_ahrq2025",
  flag.method = "cumulative"
)[, c("patid", "encid", "LIVER_SEV")]

  patid encid LIVER_SEV
1   Joe     1         0
2   Joe     2         0
3   Joe     3         1
4   Joe     4         1

Here we see that severe liver disease is not flagged as a comorbidity on encounter 2 because the ICD code is not POA for that encounter. The patient's underlying severe liver disease does not disappear from their history, however. By encounter 3, it is part of the patient's medical history and should be treated as a pre-existing comorbidity. Using flag.method = "cumulative" ensures that once a condition appears in the record, it is carried forward appropriately to subsequent encounters.
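To make the cumulative logic concrete, here is a minimal base R sketch that reproduces the pattern above for a POA-required condition. It assumes the rule is "flag the encounter if the code is reported and POA on that encounter, or if the code was reported on any earlier encounter". The object joe and its columns are hypothetical, and this is an illustration of the idea rather than medicalcoder's implementation.

```r
# Hypothetical encounter history mirroring the example patient
joe <- data.frame(
  encid    = 1:4,
  has_code = c(0L, 1L, 0L, 0L),   # severe liver disease code reported?
  poa      = c(NA, 0L, NA, NA)    # POA status of that code, if reported
)

# "current": the code must be reported and POA on this encounter
current <- as.integer(joe$has_code == 1L & !is.na(joe$poa) & joe$poa == 1L)

# Was the code reported on any strictly earlier encounter?
seen_before <- c(0L, head(cummax(joe$has_code), -1L))

# "cumulative": flagged now, or already part of the history
cumulative <- pmax(current, seen_before)

data.frame(encid = joe$encid, current = current, cumulative = cumulative)
```

The current column is 0 on every encounter, while the cumulative column turns on at encounter 3, matching the LIVER_SEV results shown earlier.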

Conclusion

ICD-based comorbidity assessment is foundational for risk adjustment, phenotyping, and health services research. Yet real-world data rarely arrive in clean, single-version, single-encounter form. They span ICD-9 and ICD-10 transitions, include present-on-admission requirements, and unfold over time.

medicalcoder was designed with those realities in mind.

With a single interface, it:

Handles mixed ICD versions transparently
Respects present-on-admission requirements
Enables cumulative longitudinal flagging
Maintains portability with base R (≥ 3.5.0) only

The goal is not just convenience; it is correctness and reproducibility in complex health data environments.

Install from CRAN:

install.packages("medicalcoder")

And as always, feedback and real-world use cases are welcome.