Medications in the Electronic Medical Record
Physicians look to their Electronic Medical Record (EMR) system to house all of their patients’ information over time, including each patient’s medication history. Depending on the EMR, medications could be stored in the clinical notes, logged directly into the EMR, or both. However, this information needs to be updated at every encounter, especially when the physician is a specialist who doesn’t always have up-to-date information about a patient’s other medical visits.
This system works reasonably well for individual clinical visits, but when it comes to quality reporting and looking at medication usage at a practice or system level, it can become a processing nightmare. We talked about some of the challenges with clinical NLP in a previous post, and those challenges apply to extracting information about medications as well, since medications can be mentioned in various contexts₁:
historic references (e.g., “the patient was prescribed drug X”)
changes in dosage (e.g., “drug X will be tapered off...”)
planning (e.g., “we are going to give the patient drug X to treat problem Y”)
We won’t focus on text extraction for this post; instead, we’ll look at the case where medications are logged in the EMR for each patient encounter. Even in a ‘clean’ case like this, there is still a lot of variability, including a mix of brand and generic drug names, combination medications, and differences in dosages. As we’ll go into later in this post, this variability is exactly why NLP is useful here: fuzzy string matching lets us match drug names despite all of these differences.
Informatics Overview
Before we dive into this subject, it’s best if we take a second to familiarize ourselves with the ontologies and terminologies used for medications. The most common ontology is RxNorm₂, which is produced and maintained by the National Library of Medicine. RxNorm was created with the intention of standardizing drug names/classes for ease of communication across hospitals, pharmacies, and other physician organizations. RxNorm can also be used as a source of normalized drug names, which is sometimes known as SAB=RXNORM. These SAB=RXNORM names follow a structured pattern consisting of ingredient, strength, and dose form.
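To make the ingredient/strength/dose form pattern concrete, here is a minimal sketch that splits a normalized name into those three parts. The regular expression and the function name `split_normalized_name` are our own assumptions for illustration; the pattern only handles simple single-ingredient names and is not an official RxNorm parsing rule.

```python
import re

# Assumed pattern (not an official RxNorm rule): single-ingredient names shaped like
# "<ingredient> <number> <unit> <dose form>".
NAME_PATTERN = re.compile(
    r"^(?P<ingredient>.+?) (?P<strength>\d+(?:\.\d+)? [A-Za-z/]+) (?P<dose_form>.+)$"
)


def split_normalized_name(name):
    """Return the three parts as a dict, or None if the name does not fit the pattern."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


parts = split_normalized_name("simvastatin 10 MG Oral Tablet")
print(parts)
```

Names that don’t fit this simple shape, such as combination medications, return None here and would need their own handling.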
Another popular ontology is NDC (National Drug Code), which is used by the FDA to have unique identifiers for each drug. These drugs are then compiled into the National Drug Code Directory₃.
One last concept that may be useful to know is how medications are grouped together into classes. There are a variety of different ways that this can occur: based on pharmacological composition, based on therapeutic use, based on the mechanism of action, etc. RxNorm even has a whole navigator simply to explore which classes drugs are mapped to (RxClass), which shows the complexity behind this mapping. The proper mapping will depend on the use case, but it’s good to familiarize yourself with the differences between the various classes₄.
Using RxClass to explore different drugs and their associated classes, you can start to get a sense of the innate complexity. For example, the drug sildenafil is more popularly known by its brand name: Viagra. It’s a vasodilator that can be used for pulmonary hypertension, but also as an erectile dysfunction treatment. In fact, it was initially studied in clinical trials as a treatment for hypertension and angina, but because male participants kept reporting erections as a side effect, its developers realized that it could be a treatment for erectile dysfunction as well. This is just one example of why it’s crucial to map medications to ALL of their underlying classes and uses, especially in a clinical care context.
Trying out the RxNorm API
To learn more about this mapping, we decided to try out the mapping tools for ourselves and see which ones worked best. To do this, we took a random list of medications that varied between brand/generic and with dosage/without dosage. The thought is that this random list could be compiled by a practice that is interested in analyzing their medication usage for quality reporting. Our goal was to then map all of these medications to their corresponding drug classes, based on the names alone.
The first method we tried was the RxNorm API, which is provided directly by the National Library of Medicine₅. Since we had a large list (and wanted this method to scale), we needed a programmatic way to feed our drug names into the API and collect the responses. Luckily, someone had already faced this problem and prepared a nice Python script to accomplish this:
https://github.com/cenanypirany/rxnormpy/blob/master/rxnormapi.py
If you want to try individual queries of the API yourself, you can look at the documentation and follow their examples. For instance, going to this URL in your browser:
https://rxnav.nlm.nih.gov/REST/approximateTerm?term=zocor%2010%20mg&maxEntries=4
Yields the closest matches (up to 4) for ‘zocor 10 mg’
Here we only see two matches, both with a confidence score of 75/100. We also see the RxNorm Concept Unique Identifier (rxcui) and the RxNorm Atom Unique Identifier (rxaui) for the matches. This is important because we can then use the rxcui to retrieve the drug classes for the top match with our given string.
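The same query can be sketched in Python using only the standard library. Note the assumptions: the `.json` suffix on the endpoint, the `approximateGroup`/`candidate` response layout, and the helper names are ours, so check them against the API documentation before relying on them. The sample response uses placeholder values rather than real identifiers.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://rxnav.nlm.nih.gov/REST/approximateTerm"


def build_url(term, max_entries=4, as_json=False):
    """Build the approximateTerm URL; the '.json' suffix is an assumption."""
    suffix = ".json" if as_json else ""
    query = urlencode({"term": term, "maxEntries": max_entries}, quote_via=quote)
    return f"{BASE_URL}{suffix}?{query}"


def top_rxcui(response):
    """Return the rxcui of the first candidate, or None if there are no candidates."""
    candidates = (response.get("approximateGroup") or {}).get("candidate") or []
    return candidates[0].get("rxcui") if candidates else None


def fetch_top_rxcui(term):
    """Live network call (not executed in this sketch)."""
    with urlopen(build_url(term, as_json=True), timeout=10) as resp:
        return top_rxcui(json.load(resp))


# Placeholder response in the assumed layout; the values are not real identifiers.
sample_response = {
    "approximateGroup": {
        "candidate": [{"rxcui": "PLACEHOLDER-1", "score": "75", "rank": "1"}]
    }
}
print(build_url("zocor 10 mg"))
print(top_rxcui(sample_response))
```

Keeping the URL building and response parsing separate from the network call makes it easy to test the parsing on saved responses before looping over a large drug list.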
In order to do this, we wrote a function that will loop through our list of drug names, get the rxcui of the first match for each name, and then use the rxclass API to get the ATC (Anatomical Therapeutic Chemical) and EPC (Established Pharmacologic Class) for each drug.
https://gist.github.com/vkumaresan/52713b0b7b43d00df73946d537a59a35
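For readers who don’t want to open the gist, the overall shape of such a loop can be sketched as follows. This is not the code from the gist: `lookup_rxcui` and `lookup_classes` are hypothetical callables standing in for the approximate-match and RxClass API calls, and the fake lookup tables exist only so the sketch runs offline.

```python
def map_drugs_to_classes(drug_names, lookup_rxcui, lookup_classes):
    """Map each drug name to its classes and collect the names with no classes."""
    results, unmatched = {}, []
    for name in drug_names:
        rxcui = lookup_rxcui(name)  # identifier of the first match, or None
        classes = lookup_classes(rxcui) if rxcui else []
        if classes:
            results[name] = classes
        else:
            unmatched.append(name)
    return results, unmatched


# Hypothetical stand-ins for the two API calls, using placeholder identifiers.
fake_rxcuis = {"zocor 10 mg": "PLACEHOLDER-1", "mystery drug": "PLACEHOLDER-2"}
fake_classes = {"PLACEHOLDER-1": ["HMG-CoA Reductase Inhibitor"]}

results, unmatched = map_drugs_to_classes(
    ["zocor 10 mg", "mystery drug", "not a drug"],
    lookup_rxcui=fake_rxcuis.get,
    lookup_classes=lambda rxcui: fake_classes.get(rxcui, []),
)
print(results)
print(unmatched)
```

Tracking the unmatched names explicitly is what lets you measure the match rate over the full list.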
This seemed to work okay at first, when we looked at a subset of drugs and viewed their EPC and ATC matches:
However, when we looped through our full list of 4363 drugs, we found that the function didn’t return any classes for 876 drugs, which is obviously not ideal. Thus, we moved on to identify another method that could provide us with a higher match rate.
The FDA Comes to the Rescue
Remember that NDC directory that we mentioned earlier? In our random Googling, we stumbled upon a glorious set of Excel spreadsheets from the FDA that contain product information and drug classes submitted by drug labelers₆. This seemed way too good to be true, but when we downloaded the product file and took a look, it seemed to contain everything we needed! Not only did it have both the proprietary (brand or trade) name and nonproprietary (generic) name for each drug, but it also had ALL of the drug classes, which made things a lot simpler for us.
From here, it was easy for us to ingest this product file as a Pandas dataframe in Python, perform fuzzy string matching to match our drug names with either the proprietary name or the nonproprietary name, and then attach on the corresponding drug classes for each medication. The following Python code shows how we accomplished this:
https://gist.github.com/vkumaresan/404e33a650c96c155e83019baa192626
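As a rough illustration of the matching step, here is a minimal sketch that is not the code from the gist. To stay dependency-free it uses `difflib.SequenceMatcher` from the standard library instead of Pandas and a dedicated fuzzy matching library, so its scores will differ from a Levenshtein-based ratio. The column names follow our understanding of the NDC product file, and the two rows are made-up examples.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between 0 and 100."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(drug_name, products, threshold=80):
    """Return the best-matching product row, or None if no score reaches the threshold."""
    best_row, best_score = None, 0.0
    for row in products:
        # Compare against both the brand name and the generic name.
        score = max(
            similarity(drug_name, row["PROPRIETARYNAME"]),
            similarity(drug_name, row["NONPROPRIETARYNAME"]),
        )
        if score > best_score:
            best_row, best_score = row, score
    return best_row if best_score >= threshold else None


# Made-up rows shaped like the product file (column names are an assumption).
products = [
    {"PROPRIETARYNAME": "Zocor", "NONPROPRIETARYNAME": "simvastatin",
     "PHARM_CLASSES": "HMG-CoA Reductase Inhibitor [EPC]"},
    {"PROPRIETARYNAME": "Lasix", "NONPROPRIETARYNAME": "furosemide",
     "PHARM_CLASSES": "Loop Diuretic [EPC]"},
]

match = best_match("simvastatine", products)
print(match["PHARM_CLASSES"] if match else "no match")
```

Matching against both name columns is what lets a single list containing brand and generic names be handled in one pass.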
Note that fuzzy string matching is crucial here; it’s a simple NLP technique, but immensely useful for this particular use case. The matching threshold is an important parameter, and there’s no hard rule for how high or low to set it. We’ll save the nuances of fuzzy string matching for another post, but for now we’ll just state that the threshold applies to a similarity ratio derived from the Levenshtein distance, a metric that counts the single-character edits (insertions, deletions, and substitutions) needed to change one string into another. Thus, the higher the threshold, the fewer differences between the two strings are tolerated for them to count as a match. In practice, we suggest calibrating this number based on the sensitivity of your task; we set our threshold at 80% after trying different thresholds and manually validating the matches on a subset of our drugs.
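To make the metric concrete, here is a small sketch of the Levenshtein distance and one common way of turning it into a 0–100 similarity score. The normalization by the longer string’s length is our assumption for illustration; fuzzy matching libraries may compute their ratio differently.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free when the characters match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))
print(levenshtein_similarity("simvastatin", "simvastatine") >= 80)
```

With this normalization, an 80% threshold tolerates roughly one edit for every five characters of the longer string.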
Comparing our match rates...
Previously (RxNorm): ~80% (3487/4363)
New (NDC): ~85% (3689/4363)
...we see that there was an improvement, but we still didn’t achieve a perfect match. That being said, this new approach performs better, is relatively simpler, and doesn’t involve active API calls (which is always a plus for a scalable solution). If you are looking to implement this on your own dataset though, it’s probably worth trying both methods and seeing which one yields a higher match rate.
Clinical Use Case: Nephrology
Now that we have the mapping for most of our drugs, we decided to test out a clinical example. Let’s take the case of a nephrology practice that is interested in evaluating their CKD (Chronic Kidney Disease) patients; this is a common use case because there are specific quality reporting metrics that are required for CKD, and consolidating information across an EMR for these patients can be a convoluted process. To do this, we took a subset of our list of drug names that could be a realistic list for a single patient, shown below.
Now, mapping these drugs to drug classes is useful, but a clinician might be even more interested in getting a higher-order representation of these drugs that is related to its use in the context of their clinic. To represent this, we created our own mapping to group together classes in ‘medication types’. The hope is that with this step, we remove a layer of detail and allow for meaningful summaries of patient medications.
Programming this was similar to how we coded up the previous mapping: fuzzy-string match medications to their corresponding class and corresponding medication type (if in our defined list). We also created a column that would show the explicit mapping, just for our own validation. The results are shown below for A) the medication class, B) medication types, and C) mapping for each drug.
[Figures: A) medication class, B) medication types, C) mapping for each drug]
Note that drugs with no match are still included in the mapping for completeness, but they have no entries after the arrows. One problematic edge case was vitamins, since these are mapped to a large number of classes that are probably not relevant to a clinic. For CKD, Vitamin D is the only one that might be important, so we chose to ignore the mapping for all other vitamins.
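A minimal sketch of this grouping step might look like the following. The class-to-type table, the type names, and the name-based vitamin rule are all hypothetical choices for illustration; a real mapping would be agreed with the clinicians at the practice.

```python
# Hypothetical mapping from drug class to clinic-level medication type.
CLASS_TO_TYPE = {
    "Angiotensin Converting Enzyme Inhibitor [EPC]": "Anti-hypertensive",
    "Loop Diuretic [EPC]": "Diuretic",
}


def summarize(drug, classes):
    """Attach medication types to a drug; unmatched drugs keep empty lists."""
    name = drug.lower()
    # Assumed rule: drop classes for every vitamin except Vitamin D.
    if "vitamin" in name and "vitamin d" not in name:
        classes = []
    types = sorted({CLASS_TO_TYPE[c] for c in classes if c in CLASS_TO_TYPE})
    return {"drug": drug, "classes": list(classes), "types": types}


print(summarize("lisinopril", ["Angiotensin Converting Enzyme Inhibitor [EPC]"]))
print(summarize("unknown drug", []))
```

Returning a record for every drug, even when nothing matched, keeps the explicit mapping available for manual validation.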
Our goal with this use case was to show a realistic pathway from patient medication lists in an EMR to useful clinical insights. There are obvious design choices to be made along the way, which would realistically involve conferring with the clinicians at the practice and understanding their needs and concerns. But given the ontological and programming tools at our disposal, we are able to create a scalable solution that gives clinicians quick insights on the medications that their patients are taking.
Summary
This exercise was extremely interesting and allowed us to learn more about the various ontologies and classification systems that are used to organize medications. Of course, this is just scratching the surface: now that we can extract drug classes, we may want to consolidate these into categories that matter for physicians (e.g., anti-hypertensive, anemia control, etc.), or check the integrity of these classes to ensure that they line up with current clinical knowledge. The NDC file states that it was last updated on 09/25/2019, so changes made since then could affect the interpretation of these findings; one obvious example is that COVID vaccines are not in this file.
Medication mapping is a simple example of the current gaps that exist in healthcare data, and of how close (yet so far) we are to building a system that allows us to instantly derive insights from EMR data. By poking around, we were able to find various systems that we could repurpose for our mapping use case, but ideally there would be a simple package that does this in one line, so we’re working on developing that now!
As more engineers and data scientists flock to healthcare, and as EMR systems like Athenahealth and Epic start to create developer-friendly ecosystems, the sky's the limit for the applications that we can build to truly connect the dots and facilitate better data reporting. As most of you who work in the healthcare industry already know, the EMR was primarily built with the purpose of billing in mind, so the hope is that we can turn around and make this data useful from a clinical workflow and population health basis. This is crucial not just from a financial perspective, but it will also ultimately impact patient health, as physicians will soon be able to learn and adapt based on feedback from their care management. While in some ways this still seems like a distant dream, we hope that this medication mapping walkthrough shows that small wins are achievable.
References
1. Iglesias, Juan Eugenio et al. “Tracking medication information across medical records.” AMIA ... Annual Symposium proceedings. AMIA Symposium vol. 2009 266-70. 14 Nov. 2009
2. https://www.nlm.nih.gov/research/umls/rxnorm/overview.html
3. https://www.accessdata.fda.gov/scripts/cder/ndc/index.cfm
4. https://rxnav.nlm.nih.gov/RxClassIntro.html
5. https://rxnav.nlm.nih.gov/RxNormAPIs.html#
6. https://www.fda.gov/drugs/drug-approvals-and-databases/ndc-product-file-definitions