medicalcoder: A Unified and Longitudinally Aware Framework for ICD-Based Comorbidity Assessment in R
news
R
medicalcoder
pediatrics
comorbidities
Author
Peter E. DeWitt
Published
February 28, 2026
My R package medicalcoder was updated to version 0.8.0 and released on CRAN.
At its core, medicalcoder provides a portable, dependency-light framework for ICD-based comorbidity assessment in R. It is designed for reproducible health data workflows, especially in environments where dependency management, portability, and longitudinal inference matter.
Version 0.8.0 improves stability, refines longitudinal flagging behavior, and extends Elixhauser support through the 2026 AHRQ ICD definitions.
Quick Start:
```r
library(medicalcoder)
```
The example data set mdcr_longitudinal has records for 3 patients, with both ICD-9 and ICD-10 codes. All of the ICD codes are treated as diagnostic codes.
A single call to medicalcoder::comorbidities() flags Elixhauser comorbidities across ICD-9 and ICD-10 codes within each subject's record.
```r
mdcr_rtn <-
  medicalcoder::comorbidities(
    data = mdcr_longitudinal, # object inheriting from data.frame
    icd.codes = "code",       # name of the column holding the ICD codes
    id.vars = c("patid"),
    icdv.var = "icdv",        # column denoting the ICD version
    dx = 1L,                  # treat all ICD codes in data as diagnostic codes
    poa = 1L,                 # treat all codes as present on admission
    primarydx = 0L,           # treat all diagnosis codes as secondary diagnoses
    method = "elixhauser_quan2005"
  )

# To simplify the output, only show the columns that are not all zeros
Filter(f = function(x) any(x > 0), mdcr_rtn)
```
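The idea behind a version-aware call can be sketched in base R: key the code lookup on both the ICD version and the code, so that one merge handles mixed ICD-9 and ICD-10 records. This is a toy illustration, not medicalcoder's internals; the lookup table, the code strings, and the patient IDs are hypothetical placeholders, not real Elixhauser mappings.

```r
# HYPOTHETICAL lookup: placeholder codes, not real ICD-9/ICD-10 mappings
lookup <- data.frame(
  icdv      = c(9L, 10L, 9L, 10L),
  code      = c("A9", "A10", "B9", "B10"),
  condition = c("wghtloss", "wghtloss", "para", "para"),
  stringsAsFactors = FALSE
)

# mixed ICD-9 and ICD-10 records for two hypothetical patients
records <- data.frame(
  patid = c("p1", "p1", "p2", "p2"),
  icdv  = c(9L, 10L, 10L, 10L),
  code  = c("A9", "B10", "A10", "Z99"),
  stringsAsFactors = FALSE
)

# one inner join keyed on BOTH version and code; "Z99" matches nothing
hits <- merge(records, lookup, by = c("icdv", "code"))

# one row per patient, one 0/1 column per condition
patients   <- unique(records$patid)
conditions <- sort(unique(lookup$condition))
flags <- sapply(conditions, function(cond) {
  as.integer(patients %in% hits$patid[hits$condition == cond])
})
rownames(flags) <- patients
flags
```

Keying on the version as well as the code guards against the same code string carrying different meanings in ICD-9 and ICD-10.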
The same results can be achieved with other tools, such as the R package comorbidity, but it takes considerably more work. In the code that follows, you need to split the data by ICD version, manually recombine the results, repair the severity hierarchies, and restore object attributes before scoring.
```r
# split the data by ICD version and call comorbidity::comorbidity twice with the
# needed version-specific mapping
r9 <-
  comorbidity::comorbidity(
    x = subset(mdcr_longitudinal, icdv == 9),
    id = "patid",
    code = "code",
    map = "elixhauser_icd9_quan",
    assign0 = TRUE
  )
r10 <-
  comorbidity::comorbidity(
    x = subset(mdcr_longitudinal, icdv == 10),
    id = "patid",
    code = "code",
    map = "elixhauser_icd10_quan",
    assign0 = TRUE
  )

# combine and aggregate the results
rtn <- aggregate(. ~ patid, data = rbind(r9, r10), FUN = max)

# clean up flags by severity (might not be needed in this specific example, but
# is needed in general); compare to 1 so the index is logical, not positional
rtn[["diabunc" ]][rtn$diabc    == 1] <- 0L
rtn[["hypunc"  ]][rtn$hypc     == 1] <- 0L
rtn[["solidtum"]][rtn$metacanc == 1] <- 0L

# apply the scoring; comorbidity::score() needs certain attributes on the
# object, which were lost when aggregating over the ICD versions, so restore them
attributes(rtn)[c("class", "variable.labels", "map")] <-
  attributes(r9)[c("class", "variable.labels", "map")]

# the score is based on the weights by van Walraven et al. (2009)
rtn[["score"]] <- comorbidity::score(x = rtn, weights = "vw", assign0 = TRUE)

# simplify the output by only showing columns which are not all zeros
Filter(function(x) any(x > 0), rtn)
```
The scoring approaches differ, but the underlying comorbidity flags for weight loss and paraplegia are identical. The key difference is the workflow: medicalcoder handles mixed ICD versions in a single, unified call.
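The severity clean-up step in the comorbidity workflow can be factored into a small helper. The sketch below is a hypothetical function of our own (apply_hierarchy is not part of either package), and the toy data are made up; the column names follow the comorbidity Elixhauser map used in the code above. The severe flag is compared to 1 to form a logical index, because indexing by a 0/1 integer vector would select positions (row 1, repeatedly) rather than the rows that carry the flag.

```r
# HYPOTHETICAL helper: zero out the less severe flag whenever the more
# severe flag in the same row is set
apply_hierarchy <- function(df, hierarchy) {
  for (severe in names(hierarchy)) {
    mild <- hierarchy[[severe]]
    df[[mild]][df[[severe]] == 1] <- 0L  # logical index, not positional
  }
  df
}

# severe -> less severe pairs
hierarchy <- list(diabc = "diabunc", hypc = "hypunc", metacanc = "solidtum")

# made-up flags for two patients
toy <- data.frame(
  patid    = c("a", "b"),
  diabunc  = c(1L, 1L), diabc    = c(1L, 0L),
  hypunc   = c(1L, 0L), hypc     = c(0L, 0L),
  solidtum = c(1L, 1L), metacanc = c(0L, 1L)
)
res <- apply_hierarchy(toy, hierarchy)
res
```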
Longitudinal Flagging of Comorbidities
Flagging comorbidities across a patient's history is something every researcher should consider. I will not prescribe which conditions must be considered longitudinally and which need not be, but chronic conditions, even when controlled, should be accounted for in research.
With methods such as the Charlson and Elixhauser mappings of Quan et al. (2005), it seems simple enough to flag comorbidities at the encounter level and then carry each flag forward to subsequent encounters within a patient record.
However, in some Elixhauser variants, simply carrying forward encounter-level flags can fail to capture a condition.
For example, the Agency for Healthcare Research and Quality (AHRQ) has an Elixhauser variant that requires considering whether each ICD code is reported as present-on-admission (POA). Some conditions are exempt from the POA requirement: for these, the presence of the ICD code in the record is enough to flag the condition.
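At a single encounter, this rule reduces to a simple Boolean expression. The sketch below is our own minimal illustration under that assumption, not medicalcoder's internal code; flag_encounter and its arguments are hypothetical names.

```r
# HYPOTHETICAL encounter-level rule: a code counts toward a condition only if
# it is present-on-admission, unless the condition is exempt from POA
flag_encounter <- function(code_present, poa, exempt) {
  as.integer(code_present & (poa | exempt))
}

flag_encounter(code_present = TRUE,  poa = FALSE, exempt = FALSE) # 0: not POA
flag_encounter(code_present = TRUE,  poa = FALSE, exempt = TRUE)  # 1: exempt
flag_encounter(code_present = TRUE,  poa = TRUE,  exempt = FALSE) # 1: POA
flag_encounter(code_present = FALSE, poa = TRUE,  exempt = TRUE)  # 0: no code
```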
Let’s build an example patient, Joe, with four encounters. Encounter 1 has no ICD codes. Encounter 2 has a non-POA ICD-10 code for severe liver disease. Encounters 3 and 4 have no ICD codes.
If we flag comorbidities for Joe at the encounter level, we get what we expect: no severe liver disease, because severe liver disease requires the ICD code to be POA.
patid encid LIVER_SEV
1 Joe 1 0
2 Joe 2 0
3 Joe 3 0
4 Joe 4 0
At the encounter level, this is correct behavior. The severe liver disease code was not present-on-admission, so it should not be treated as a comorbidity on encounter 2.
To confirm, if the code were marked POA, severe liver disease would be flagged on encounter 2:
patid encid LIVER_SEV
1 Joe 1 0
2 Joe 2 1
3 Joe 3 0
4 Joe 4 0
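Both encounter-level results can be reproduced with a toy base-R sketch of the POA rule. The data layout, the column names, and the placeholder string "LIVER_SEV_CODE" are hypothetical stand-ins for the non-POA ICD-10 severe liver disease code in the example; this is not how medicalcoder represents the data.

```r
# HYPOTHETICAL reconstruction of Joe's record; NA means no code on that encounter
joe <- data.frame(
  patid = "Joe",
  encid = 1:4,
  code  = c(NA, "LIVER_SEV_CODE", NA, NA),
  poa   = c(NA, 0L, NA, NA),
  stringsAsFactors = FALSE
)

# encounter-level flag: the code must be present AND marked POA
# (%in% treats NA as "not POA")
encounter_flags <- function(d) {
  as.integer(!is.na(d$code) & d$poa %in% 1L)
}

joe$LIVER_SEV <- encounter_flags(joe)
joe[, c("patid", "encid", "LIVER_SEV")]      # all zeros

# the same record with the code marked POA on encounter 2
joe_poa <- transform(joe, poa = c(NA, 1L, NA, NA))
joe_poa$LIVER_SEV <- encounter_flags(joe_poa)
joe_poa[, c("patid", "encid", "LIVER_SEV")]  # flagged only on encounter 2
```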
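This is where simply carrying flags forward breaks down. The only evidence of severe liver disease is the non-POA code on encounter 2, which produces no encounter-level flag. Unless the code is reported again and marked POA on encounter 3 or 4, there is no flag to carry forward, and the comorbidity is never captured later in the record.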
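A naive carry-forward can be sketched as a cumulative maximum of the encounter-level flags within each patient. This is a hypothetical helper for illustration, and it assumes rows are sorted by encounter within patient. Applied to Joe's all-zero encounter-level flags, it has nothing to propagate.

```r
# HYPOTHETICAL naive carry-forward: once an encounter-level flag is 1,
# keep it at 1 for the rest of that patient's encounters
carry_forward <- function(flag, patid) {
  ave(flag, patid, FUN = cummax)
}

# Joe's encounter-level flags: the non-POA code on encounter 2 yields 0
carry_forward(c(0L, 0L, 0L, 0L), rep("Joe", 4))  # still all zeros

# had the code been POA on encounter 2, carrying forward would work
carry_forward(c(0L, 1L, 0L, 0L), rep("Joe", 4))  # 0 1 1 1
```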
medicalcoder takes care of this for you. By using flag.method = "cumulative" we get the following:
patid encid LIVER_SEV
1 Joe 1 0
2 Joe 2 0
3 Joe 3 1
4 Joe 4 1
Here the patient does not have severe liver disease flagged as a comorbidity on encounter 2, because the ICD code is not POA for that encounter. The underlying condition, however, does not disappear from the patient’s history: by encounter 3 it is part of the medical history and should be treated as a pre-existing comorbidity. With flag.method = "cumulative", once a condition’s code appears in the record, the condition is carried forward to subsequent encounters, even when that code did not qualify at its own encounter.
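The behavior in the cumulative table can be approximated in base R. This is our own sketch of the rule as described above, not medicalcoder's implementation; cumulative_flag is a hypothetical name, POA exemptions are left out for brevity, and rows are assumed to be sorted by encounter within patient. A condition is flagged on an encounter if it qualifies there under the POA rule, or if a code for it appeared on any earlier encounter of the same patient, POA or not.

```r
# HYPOTHETICAL cumulative flag (POA exemptions omitted)
cumulative_flag <- function(present, poa, patid) {
  # (a) qualifies at this encounter: code present and POA
  current <- as.integer(present & poa)
  # (b) code appeared on an EARLIER encounter: cumulative max shifted by one
  prior <- ave(as.integer(present), patid,
               FUN = function(x) c(0L, head(cummax(x), -1L)))
  pmax(current, prior)
}

# Joe: code present only on encounter 2, not POA
present <- c(FALSE, TRUE, FALSE, FALSE)
poa     <- c(FALSE, FALSE, FALSE, FALSE)
cumulative_flag(present, poa, rep("Joe", 4))  # 0 0 1 1
```

Shifting the cumulative maximum by one encounter is what keeps encounter 2 unflagged while still treating the condition as history from encounter 3 onward.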
Conclusion
ICD-based comorbidity assessment is foundational for risk adjustment, phenotyping, and health services research. Yet real-world data rarely arrive in clean, single-version, single-encounter form. They span ICD-9 and ICD-10 transitions, include present-on-admission requirements, and unfold over time.
medicalcoder was designed with those realities in mind.
With a single interface, it:
Handles mixed ICD versions transparently
Respects present-on-admission requirements
Enables cumulative longitudinal flagging
Maintains portability with base R (≥ 3.5.0) only
The goal is not just convenience — it is correctness and reproducibility in complex health data environments.
Install from CRAN:
```r
install.packages("medicalcoder")
```
And as always, feedback and real-world use cases are welcome.