Atlas-based lung boundary detection module (MATLAB Code)
This software was developed as part of the following article:
Candemir S, Jaeger S, Palaniappan K, Musco JP, Singh RK, Xue Z, Karargyris A, Antani S, Thoma G, McDonald CJ. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Trans Med Imaging. 2014 Feb; 33(2):577-90. doi: 10.1109/TMI.2013.2290491. PMID: 24239990
We request that you cite the paper if this code is used for publications or products.
The software contains the following assets:
- LungSegment_module.m: performs the lung segmentation on CXRs.
- Patient_Xrays: folder containing the patient X-rays to be segmented. Please place your test X-rays in this folder. X-ray images can be in DICOM, TIFF, PNG, BMP, or JPEG format.
- Model_Xrays: folder containing example X-rays with their corresponding lung masks and their vertical and horizontal profiles. These X-rays and lung masks are used during the registration process.
- Model_h.mat and Model_v.mat: precomputed profiles for model x-rays.
- TEST.m: to test the system, please run this function.
- subfunctions: CEB_polygionCurveEvolution.m, cebShape2Spline.m, f_removeBlackBorderds.m, f_RemoveSmallSegments.m, FindSimilarCXR.m, fitSpline.m, read_Xray.m, read_X_Mask.m, RegisterModel.m, SmoothBoundary.m (please refer to the code comments for these functions).
The system was developed using MATLAB version 8.4 on a 64-bit Intel architecture running the Windows 7 operating system.
- Employed functions: mexDenseSIFT (m, mex), mexDiscreteFlow (m, mex), and SIFTflowc2f.m are taken from “SIFT Flow: Dense Correspondence across Scenes and Its Applications”. Please download the latest version of these functions from the website http://people.csail.mit.edu/celiu/SIFTflow/ and remember to cite the SIFT Flow paper.
How to run:
- Download and unpack the zip file containing the code here.
- Place your CXRs in the Patient_Xrays folder.
- Run the TEST.m function. A binary lung mask will be computed for each patient X-ray. Each mask will be the same size as the corresponding patient X-ray.
- Adjust parameters to obtain better results for your CXRs.
Contact: Stefan Jaeger
Clinical Table Search Service (formerly "lforms-service") is a web service which software programs can use for querying clinical data tables.
View the Clinical Table Search Service API
The CORE (Clinical Observations Recording and Encoding) Problem List Subset identifies important clinical concepts in SNOMED CT that occur frequently in the problem list. It facilitates the use of SNOMED CT for clinical documentation at the summary level.
CSpell, a distributable spell checker for consumer language, is designed to detect and correct various types of spelling errors in Consumer Health Questions.
Download CSpell.
The I-MAGIC (Interactive Map-Assisted Generation of ICD Codes) Algorithm utilizes the SNOMED CT to ICD-10-CM Map in a real-time, interactive manner to generate ICD-10-CM codes. This demo simulates a problem list interface in which the user enters problems with SNOMED CT terms, which are then used to derive ICD-10-CM codes using the Map.
The Map can be used in the following scenarios:
- Real-time use by the healthcare provider – In this scenario, the Map is embedded in the problem list application of the EHR used by the physician or other healthcare provider. At the end of a clinic encounter, the clinician updates the problem list, which is encoded in SNOMED CT. The Map-enabled problem list application outputs a list of ICD-10-CM codes based on algorithmic evaluation of map rules, which makes use of patient context (e.g. age, gender) and co-morbidities (other problems on the problem list) to identify the most appropriate candidate ICD-10-CM codes, in accordance with ICD-10-CM coding guidelines and conventions. If necessary, the clinician is prompted for additional information to decide between alternative codes, or to refine the output codes. The clinician confirms the suggested ICD-10-CM codes. (See the I-MAGIC algorithm and demo page)
- Retrospective coding by coding professionals – In this scenario, the Map is used within an application to suggest candidate ICD-10-CM codes to coding professionals based on a stored SNOMED CT encoded problem list. The degree of automation can vary. Textual advice can be displayed in cases where automated rule processing is not available.
- Web Interface: http://imagic.nlm.nih.gov/imagic/code/map
- I-MAGIC Implementation Guide: http://www.nlm.nih.gov/research/umls/mapping_projects/IMAGICImplementationGuide_20120614.pdf
- About the SNOMED CT to ICD-10-CM map: http://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html
Many existing electronic health record (EHR) systems contain clinical information encoded in ICD-9-CM. To facilitate migration to SNOMED CT as the primary clinical terminology for patient problems (diseases and conditions), it is desirable that the legacy ICD-9-CM data be translated to SNOMED CT. This will make it possible to compare newly collected data with historic data, and will also allow the EHR to make use of SNOMED CT to provide clinical decision support and other functions. The goal of the ICD-9-CM to SNOMED CT Map (herein referred to as “the Map”) is to facilitate the translation of legacy data and the transition to prospective use of SNOMED CT for patient problem lists. Note that this Map is not the same as, and serves different purposes from, the SNOMED CT to ICD-9-CM Map.
The most useful mappings are the one-to-one maps, in which a single SNOMED CT concept can be used to represent the full meaning of an ICD-9-CM code. This allows the automatic translation of ICD-9-CM codes into SNOMED CT codes without loss of meaning. The Map tries to identify as many one-to-one maps as possible. However, because of the differences between the two coding systems, one-to-one maps cannot be found for some ICD-9-CM codes, usually for one of two reasons. First, some ICD-9-CM codes are “catch-all” codes that encompass heterogeneous diseases or conditions (e.g. pneumonia due to other specified bacteria). These codes, commonly known as “NEC codes” (not elsewhere classified codes), will not have one-to-one maps because of their nature. Second, since SNOMED CT is more granular than ICD-9-CM in most disease areas, some ICD-9-CM diseases or conditions are further refined as more specific concepts in SNOMED CT. For such cases, it is not possible to map to a more specific SNOMED CT concept without additional information.
The Map is published in two separate files, one containing the one-to-one maps, and the other the one-to-many maps. Also included in the files are the usage frequency of the ICD-9-CM codes, and the usage frequency of the SNOMED CT concepts from the CORE Problem List Subset data. The latter information can help users to identify the more commonly used SNOMED CT targets in the one-to-many maps.
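As a rough illustration of how the two files might be used together, the sketch below translates a code automatically when a one-to-one map exists and otherwise ranks the one-to-many candidates by usage frequency for manual review. The in-memory dictionaries, the `translate` function, and every code shown are hypothetical placeholders; the actual file layout and identifiers are not described here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for the two published files.
# All codes are placeholders, not real ICD-9-CM or SNOMED CT identifiers.
ONE_TO_ONE: Dict[str, str] = {"ICD9_A": "SCT_1"}
ONE_TO_MANY: Dict[str, List[Tuple[str, int]]] = {
    # source code -> [(candidate SNOMED CT concept, usage frequency), ...]
    "ICD9_B": [("SCT_2", 12), ("SCT_3", 85), ("SCT_4", 40)],
}


def translate(icd9_code: str) -> Tuple[Optional[str], List[str]]:
    """Return (automatic target, candidates ranked for manual review)."""
    if icd9_code in ONE_TO_ONE:
        # A one-to-one map can be applied without loss of meaning.
        return ONE_TO_ONE[icd9_code], []
    # Otherwise list candidates, most frequently used target first.
    candidates = ONE_TO_MANY.get(icd9_code, [])
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return None, [concept for concept, _freq in ranked]
```

Here `translate("ICD9_A")` yields an automatic target, while `translate("ICD9_B")` yields no automatic target and a ranked candidate list, leaving the final choice to a reviewer.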
Mapping Methodology
Two lists were obtained from the Centers for Medicare & Medicaid Services (CMS), covering commonly used ICD-9-CM codes in short-stay and outpatient hospitals respectively, for the year 2009. SNOMED CT maps for the ICD-9-CM codes in the lists were derived primarily from two existing knowledge sources: the synonymy between ICD-9-CM and SNOMED CT terms in the Unified Medical Language System (UMLS), and the SNOMED CT to ICD-9-CM Cross Maps published in the International release of SNOMED CT. The choice of target SNOMED CT codes was limited to concepts in three hierarchies: Clinical finding, Situation with explicit context, and Events. One-to-one maps identified by UMLS synonymy were not manually validated. Algorithmically identified one-to-many maps involving fewer than five SNOMED CT targets were manually reviewed, with the intention of reducing them to one-to-one maps where possible. ICD-9-CM codes with no maps, or with one-to-many maps involving a large number of targets, were not manually reviewed.
LexAccess is a lexical access tool in the SPECIALIST Lexicon family. LexAccess is developed to integrate with LexBuild and provide access to the information from the SPECIALIST lexicon.
Download LexAccess
LexCheck is a software package to check and auto-correct the syntax and contents of Lexical record(s) in the LEXICON based on the technical report of "The SPECIALIST LEXICON".
Download LexCheck
The SPECIALIST lexical tools are a set of JAVA programs designed to help users manage lexical variation in biomedical text. The tools use information from the SPECIALIST lexicon and other data to generate lexical variants of words or terms appropriate for use in indexing and other NLP applications.
Try Lexical Web Tools online.
Background:
Lung region detection in chest radiographs is an important early step in a machine learning (ML) pipeline for pulmonary disease screening and diagnosis. We are providing a dataset of lung masks for 55 corresponding frontal images (contrast-enhanced) that were selected from the NLM Open-i Indiana chest X-ray dataset (https://openi.nlm.nih.gov/faq#collection). This dataset was first used in the publication below [1].
Reference:
- Xue Z, Yang F, Rajaraman S, Zamzmi G, Antani S, “Cross Dataset Analysis of Domain Shift in CXR Lung Region Detection”, Diagnostics 2023.
This page hosts a repository of P. vivax and P. falciparum images in both thin and thick blood smears from the Malaria Screener research activity.
To reduce the burden on microscopists in resource-constrained regions and improve diagnostic accuracy, researchers at the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM), have developed a mobile application called “Malaria Screener”, which runs on a standard Android smartphone attached to a conventional light microscope. The smartphone's built-in camera acquired images of slides for each microscopic field of view. The images were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand. The de-identified images and annotations are archived at NLM (IRB#12972).
The dataset includes five main parts:
- Giemsa-stained thick blood smear slides from 150 P. falciparum-infected patients were collected and photographed at Chittagong Medical College Hospital, Bangladesh. We developed the first deep learning method that can detect P. falciparum parasites in thick blood smear images and run on smartphones. It consists of two modules: an intensity-based Iterative Global Minimum Screening (IGMS) module for parasite candidate screening and a customized CNN classifier for final classification. The data was published along with the following publication:
  Yang F, Poostchi M, Yu H, Zhou Z, Silamut K, Yu J, Maude RJ, Jaeger S, Antani S. Deep Learning for Smartphone-Based Malaria Parasite Detection in Thick Blood Smears. IEEE J Biomed Health Inform. 2020 May;24(5):1427-1438. (URL: https://ieeexplore.ieee.org/document/8846750 )
- Giemsa-stained thick blood smear slides from 150 P. vivax-infected patients and 50 uninfected patients were collected and photographed at Chittagong Medical College Hospital, Bangladesh. Based on a dataset of 350 malaria patients, we proposed PlasmodiumVF-Net to diagnose a patient as uninfected, P. vivax-infected, or P. falciparum-infected. The data was published along with the following publication:
  Kassim YM, Yang F, Yu H, Maude RJ, Jaeger S. Diagnosing Malaria Patients with Plasmodium falciparum and vivax Using Deep Learning for Thick Smear Images. Diagnostics, 11(11):1994, 2021. (URL: https://www.mdpi.com/2075-4418/11/11/1994 )
- Giemsa-stained thin blood smear slides from 148 P. falciparum-infected and 45 uninfected patients were collected and photographed at Chittagong Medical College Hospital, Bangladesh. We proposed RBCNet, which consists of a U-Net first stage for cell-cluster or superpixel segmentation, followed by a Faster R-CNN second refinement stage for detecting small cell objects within the connected component clusters. The corresponding publication is:
  Kassim YM, Palaniappan K, Yang F, Poostchi M, Palaniappan N, Maude RJ, Antani S, Jaeger S. Clustering-Based Dual Deep Learning Architecture for Detecting Red Blood Cells in Malaria Diagnostic Smears. IEEE J Biomed Health Inform. 2021 May;25(5):1735-1746. (URL: https://ieeexplore.ieee.org/document/9244549 )
- Giemsa-stained thin blood smear slides from 171 P. vivax-infected patients were collected and photographed in Bangkok, Thailand. We developed a rapid and robust diagnosis system for the automated detection of P. vivax parasites using a cascaded YOLO model. This system consists of a YOLOv2 model and a classifier for hard-negative mining; see the following publication:
  Yang F, Quizon N, Silamut K, Maude RJ, Jaeger S, Antani SK. Cascading YOLO: Automated Malaria Parasite Detection for Plasmodium Vivax in Thin Blood Smears. Proc. SPIE 11314, Medical Imaging 2020: Computer-Aided Diagnosis, 113141Q (16 March 2020); (URL: https://doi.org/10.1117/12.2549701 )
- We acquired cell images from 150 P. falciparum-infected and 50 uninfected patients in Giemsa-stained thin blood smears that were collected and photographed at Chittagong Medical College Hospital, Bangladesh. The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells. As an example of how the patient-ID is encoded in the cell name, “P1” denotes the patient-ID for the cell labeled “C33P1thinF_IMG_20150619_114756a_cell_179.png”. We have also included the CSV files containing the patient-ID to cell mappings for the parasitized and uninfected classes. The CSV file for the parasitized class contains 151 patient-ID entries. The slide images for the parasitized patient-ID “C47P8thinOriginal” are read from two different microscope models (Olympus and Motif). The CSV file for the uninfected class contains 201 entries, since the normal cells from the infected patients’ slides are also in the normal cell category (151 + 50 = 201). Experiments with the data were reported in the following paper (PeerJ 6:e4568):
  Rajaraman S, Antani SK, Poostchi M, Silamut K, Hossain MA, Maude RJ, Jaeger S, Thoma GR. (2018) Pre-trained convolutional neural networks as feature extractors toward improved Malaria parasite detection in thin blood smear images. (URL: https://doi.org/10.7717/peerj.4568 )
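The patient-ID encoding in the cell image names can be read back with a short script. The sketch below is only an illustration: it assumes every parasitized cell name starts with `C<digits>P<digits>`, as in the single example given above, and the function name `patient_id_from_cell_name` is hypothetical. Names that follow a different convention (which may be the case for some files) simply return `None`, and the provided CSV files remain the authoritative mapping.

```python
import re
from typing import Optional

# Assumed pattern: "C<digits>" followed by the patient-ID "P<digits>",
# e.g. "C33P1thinF_IMG_20150619_114756a_cell_179.png" -> "P1".
_CELL_NAME = re.compile(r"^C\d+(?P<patient>P\d+)")


def patient_id_from_cell_name(filename: str) -> Optional[str]:
    """Extract the patient-ID part from a cell image name, if present."""
    match = _CELL_NAME.match(filename)
    return match.group("patient") if match else None


print(patient_id_from_cell_name("C33P1thinF_IMG_20150619_114756a_cell_179.png"))
```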
- Malaria Screener datasheet: details of datasets and download links
- Malaria Screener GitHub repository: access the source code of Malaria Screener
- Malaria Screener app: download Malaria Screener from the Google Play store
The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST lexicon. The Lexical Systems Group (LSG) would like to share this n-gram set (n = 1 to 5) with the NLP/MLP community. Please download it from the following link.
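For readers unfamiliar with n-grams, the following sketch counts all word n-grams for n = 1 to 5 in a few sentences. It is a generic illustration, not the procedure used to build the MEDLINE n-gram set: the whitespace tokenization, the lowercasing, and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def ngrams(tokens: List[str], max_n: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous word sequence of length 1..max_n."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])


def count_ngrams(sentences: Iterable[str], max_n: int = 5) -> Counter:
    """Count n-grams over sentences (assumed: lowercase, split on whitespace)."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), max_n))
    return counts


counts = count_ngrams(["chest x ray", "chest x ray image"])
print(counts.most_common(3))
```

Frequent multi-word sequences in such a count are natural candidates for multiword lexicon entries.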
URL: https://meshb.nlm.nih.gov/MeSHonDemand
Narrative clinical reports contain a rich set of clinical knowledge that could be invaluable for clinical research. However, they usually contain personal identifiers. The presence of personal identifiers makes the contents of those reports protected health information, which is subject to use restrictions and carries risks to privacy. The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of personally identifying information before they can be released to researchers and others. Our solution, NLM-Scrubber, is a HIPAA-compliant clinical text de-identification tool designed and developed at the National Library of Medicine. It is freely available.
The main purpose of the Nursing Problem List Subset of SNOMED CT is to facilitate the use of SNOMED CT as the primary coding terminology for nursing problems used in care planning, problem lists, or other summary level clinical documentation.
The RxNorm Current Prescribable Content is a subset of currently prescribable drugs found in RxNorm. We intend it to be an approximation of the prescription drugs currently marketed in the US. The subset also includes some frequently-prescribed over-the-counter drugs.
The subset includes only the active RxNorm normalized names, codes (RxCUIs), attributes, and relationships, as well as the FDA structured product label drugs and ingredients. It does not include data from any of the other 10 RxNorm data providers, such as First DataBank, Micromedex, or the Veterans Administration. We also removed suppressed and obsolete data.
The National Library of Medicine provides this subset without any licensing restrictions. You do not need to log into the UMLS Terminology Services to access the subset.
The RxNorm Prescribable API is a web service for accessing the RxNorm Current Prescribable Content from your program.
- API: https://lhncbc.nlm.nih.gov/RxNav/APIs/PrescribableAPIs.html
- Learn More About the RxNorm Prescribable Content: http://www.nlm.nih.gov/research/umls/rxnorm/docs/prescribe.html
The RxClass Browser is a web application for exploring and navigating through the class hierarchies to find the RxNorm drug members associated with each class.
The RxClass API is available for users to include RxClass data in their applications.
RxMix has been updated! RxMix is a web application that allows users to combine functions from the RxNorm, NDF-RT and RxTerms APIs to create custom applications that can be run interactively or in a batch mode.
- Function composition. The RxMix interface allows the user to build a workflow of API functions to execute. This saves the user from having to write complex programs to handle multiple function calls. See the tutorials linked below for examples of function composition.
- Batch processing. Through the user interface, RxMix allows the user to process large amounts of data through the user defined workflow. The user can provide a file containing a list of inputs, such as drug names or drug identifiers, for input to the workflow. RxMix will execute the workflow and inform the user via email when the job has completed, providing information on how to retrieve the results.
- Output in XML, JSON or Text. RxMix offers the user the choice of formatting the output in XML, JSON, or text.
- Interactive mode. RxMix allows users to interactively test and display the results of the workflow on a single input value.
Users of the RxMix interface should be familiar with the RxNorm, NDF-RT and/or the RxTerms API functions.
Note: RxMix will not work properly with Internet Explorer. Please use Firefox, Chrome, or Safari to run RxMix.
- Web interface: http://mor.nlm.nih.gov/RxMix/
- Learn More: http://rxnav.nlm.nih.gov/RxMixTutorial.html
- RxMix Tutorial Batch Example: http://rxnav.nlm.nih.gov/RxMixTutorial2.html
RxNav is a browser for several drug information sources, including RxNorm, RxTerms and NDF-RT. RxNav finds drugs in RxNorm from the names and codes in its constituent vocabularies. RxNav displays links from clinical drugs, both branded and generic, to their active ingredients, drug components and related brand names. RxNav also provides lists of NDC codes and links to package inserts in DailyMed. The RxTerms record for a given drug can be accessed through RxNav, as well as clinical information from NDF-RT, including pharmacologic classes, mechanisms of action, and physiologic effects.
- Web Interface: https://mor.nlm.nih.gov/RxNav/
RxNav-in-a-Box provides users with a locally-installable Docker composition of RxNav, RxClass, RxMix, and RESTful companion APIs, including RxNorm, Prescribable RxNorm, RxTerms, and RxClass.
Go to RxNav-in-a-Box
The RxNorm API is a web service for accessing information from the RxNorm data set.
Go to the RxNorm API
RxTerms is a drug interface terminology derived from RxNorm for prescription writing or medication history recording (e.g. in e-prescribing systems, PHRs). An API is available to provide developers with functions for retrieving RxTerms data from the most current RxTerms data set.
- API: https://lhncbc.nlm.nih.gov/RxNav/APIs/RxTermsAPIs.html
- Learn More About RxTerms: https://lhncbc.nlm.nih.gov/MOR/RxTerms/
The Route of Administration subset of SNOMED CT is a listing of the current set of terms related to the location of administration for clinical therapeutics.
SNOMED CT to ICD-10 Cross Maps (created and maintained by IHTSDO) - support epidemiological, statistical, and administrative reporting.
The map is updated and included with every International release of SNOMED CT, which can be downloaded from http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html
Mapping SNOMED CT codes to and from ICD codes
SNOMED CT is clinically based and oriented toward direct use by healthcare providers to document whatever is needed for patient care. ICD codes are oriented more toward coding professionals, who use them after patient care has been provided, for statistical data collection and billing. ICD codes lump less common diseases together in "catch-all" categories (for example, J15.8 Pneumonia due to other specified bacteria), which can result in loss of information. SNOMED CT has more "granular" (specific) clinical coverage than ICD: SNOMED CT (clinical finding) has 100,000 codes, ICD-10-CM has 68,000 codes, and ICD-9-CM has 14,000 codes.
Due to the differences in granularity, emphasis and organizing principles between SNOMED CT and ICD-10-CM, it is not always possible to have a one-to-one map between a SNOMED CT concept and an ICD-10-CM code. To address this challenge, the SNOMED CT to ICD-10-CM Map follows an approach that is consistent with the approach used by the IHTSDO and WHO. When there is a need to choose between alternative ICD-10-CM codes, each possible target code is represented as a “map rule” (the essence of “rule-based mapping”). Related map rules are grouped into a “map group”. Map rules within a map group are evaluated in a prescribed order at run-time, based on contextual information and co-morbidities. Each map group will resolve to at most one ICD-10-CM code. In the event that a SNOMED CT concept requires more than one ICD-10-CM code to fully represent its meaning, the map will consist of multiple map groups.
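The rule-based evaluation described above can be sketched as follows. This is a minimal illustration rather than the published map format or the I-MAGIC implementation: the `MapRule` and `MapGroup` classes, the context keys (`age`, `comorbidities`), and all target codes are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]  # assumed shape: patient context plus co-morbidities


@dataclass
class MapRule:
    applies: Callable[[Context], bool]  # condition on the patient context
    target: Optional[str]               # placeholder code; None = no code


@dataclass
class MapGroup:
    rules: List[MapRule]  # stored in the prescribed evaluation order

    def resolve(self, ctx: Context) -> Optional[str]:
        """Return the target of the first matching rule (at most one code)."""
        for rule in self.rules:
            if rule.applies(ctx):
                return rule.target
        return None


def resolve_map(groups: List[MapGroup], ctx: Context) -> List[str]:
    """Evaluate every map group; each contributes at most one code."""
    codes = (group.resolve(ctx) for group in groups)
    return [code for code in codes if code is not None]


# Hypothetical map for one concept: two groups, placeholder codes only.
groups = [
    MapGroup([
        MapRule(lambda c: c["age"] < 18, "CODE_CHILD"),
        MapRule(lambda c: True, "CODE_DEFAULT"),  # fallback rule
    ]),
    MapGroup([
        MapRule(lambda c: "PROBLEM_X" in c["comorbidities"], "CODE_EXTRA"),
    ]),
]
print(resolve_map(groups, {"age": 10, "comorbidities": {"PROBLEM_X"}}))
```

Because rules are tried in order and the first match wins, more specific rules are placed before the fallback rule within each group.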
We have created the SNOMED CT to ICD-10-CM Map to support semi-automated generation of ICD-10-CM codes from clinical data encoded in SNOMED CT for reimbursement and statistical purposes.
- Download: http://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html
- Latest release in September 2014 provides ICD-10-CM maps for 54,262 SNOMED CT concepts
- Third release (35,000 SNOMED CT concepts mapped to ICD-10-CM) is anticipated for June 2013.
- Second release was in July 2012 (15,000 SNOMED CT concepts mapped to ICD-10-CM).
- First release was in February 2012 (7,000 SNOMED CT concepts mapped to ICD-10-CM).
The SPECIALIST lexicon is a large syntactic lexicon of biomedical and general English, designed/developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System (NLP) which includes SemRep, MetaMap, and the Lexical Tools. It is intended to be a general English lexicon that includes many biomedical terms. Coverage includes both commonly occurring English words and biomedical vocabulary from a variety of sources.
Download the SPECIALIST Lexicon
The Sub-Term Mapping Tools (STMT) is a generic tool set that provides comprehensive sub-term related features:
- to find all sub-terms
- to find all prefix sub-terms
- to find the longest prefix sub-term
- to find all sub-term patterns
- to find all permutations of synonymous sub-term substitutions (query expansion)
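The first three features can be illustrated with a small sketch. It is not the STMT implementation: it assumes terms are split on whitespace, that a sub-term is any contiguous word sequence found in a given set of known terms, and that the function names and the sample lexicon are hypothetical.

```python
from typing import List, Optional, Set


def sub_terms(term: str, lexicon: Set[str]) -> List[str]:
    """All contiguous word sequences of `term` that appear in `lexicon`."""
    words = term.split()
    found = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            candidate = " ".join(words[start:end])
            if candidate in lexicon:
                found.append(candidate)
    return found


def prefix_sub_terms(term: str, lexicon: Set[str]) -> List[str]:
    """Sub-terms that start at the first word of `term`, shortest first."""
    words = term.split()
    prefixes = (" ".join(words[:end]) for end in range(1, len(words) + 1))
    return [prefix for prefix in prefixes if prefix in lexicon]


def longest_prefix_sub_term(term: str, lexicon: Set[str]) -> Optional[str]:
    """The longest prefix sub-term, or None if there is none."""
    prefixes = prefix_sub_terms(term, lexicon)
    return prefixes[-1] if prefixes else None


lexicon = {"lung", "lung cancer", "cancer", "cancer screening"}
print(sub_terms("lung cancer screening", lexicon))
print(longest_prefix_sub_term("lung cancer screening", lexicon))
```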
This is a set of web services (APIs) for programs to use when working with units from the Unified Code for Units of Measure (UCUM) system.
Go to the Unified Code for Units of Measure (UCUM) Validation and Conversion API
The publicly-available Visible Human Project reference data sets are complete, anatomically detailed, three-dimensional representations of normal male and female human bodies. They include transverse CT, MR, and cryosection images. The male was sectioned at one millimeter intervals, the female at one-third of a millimeter intervals. The data sets are used in education, diagnosis, treatment planning, virtual reality, and virtual surgeries.
- Description, access information, and license agreement documents: http://www.nlm.nih.gov/research/visible/getting_data.html