Development and Validation of a Natural Language Processing Algorithm to Extract Descriptors of Microbial Keratitis From the Electronic Health Record


PURPOSE: To develop and validate a natural language processing (NLP) algorithm that extracts qualitative descriptors of microbial keratitis (MK) from electronic health records.

METHODS: In this retrospective cohort study, patients with MK diagnoses from 2 academic centers were identified using electronic health records. An NLP algorithm was created to extract MK centrality, depth, and thinning. A random sample of encounters from patients with MK (400 encounters of 100 patients) was used to train the algorithm, and its output was compared with expert chart review. The algorithm was then evaluated in internal (n = 100) and external (n = 59) validation data sets against masked chart review. Outcomes were the sensitivity and specificity of the NLP algorithm for extracting qualitative MK features, compared with masked chart review performed by an ophthalmologist.
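The abstract does not describe how the algorithm recognizes each feature, so the following is only a minimal sketch of one common approach: keyword matching with a simple same-clause negation check. The patterns, the negation rule, and the function name `extract_features` are illustrative assumptions, not the study's actual rules.

```python
import re

# Hypothetical keyword patterns for the three features named in the abstract.
# The real algorithm's vocabulary is not given in the source.
FEATURE_PATTERNS = {
    "central": re.compile(r"\bcentral\b", re.IGNORECASE),
    "deep": re.compile(r"\b(?:deep|posterior stroma\w*)\b", re.IGNORECASE),
    "thin": re.compile(r"\bthin(?:ning|ned)?\b", re.IGNORECASE),
}

# Assumed negation rule: a negation word earlier in the same clause
# (no period, semicolon, or comma between it and the keyword).
NEGATION = re.compile(r"\b(?:no|not|without)\b[^.;,]*$", re.IGNORECASE)


def extract_features(note: str) -> dict:
    """Return True/False per feature for one free-text examination note."""
    found = {}
    for name, pattern in FEATURE_PATTERNS.items():
        found[name] = False
        for match in pattern.finditer(note):
            preceding = note[:match.start()]
            if not NEGATION.search(preceding):
                # At least one non-negated mention counts as present.
                found[name] = True
                break
    return found


note = "Central infiltrate, 2 mm, with deep stromal involvement. No thinning."
features = extract_features(note)
```

For the example note, the sketch flags centrality and depth as present and thinning as absent, because "No" precedes "thinning" within the same clause. A note that never mentions a feature also yields False, which is one way missing documentation can be mistaken for a negative finding.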

RESULTS: Across data sets, gold-standard chart review found centrality was documented in 64.0% to 79.3% of charts, depth in 15.0% to 20.3%, and thinning in 25.4% to 31.3%. Compared with chart review, the NLP algorithm had a sensitivity of 80.3%, 50.0%, and 66.7% for identifying central MK; 85.4%, 66.7%, and 100% for deep MK; and 100.0%, 95.2%, and 100% for thin MK, in the training, internal, and external validation samples, respectively. Specificity was 41.1%, 38.6%, and 46.2% for centrality; 100%, 83.3%, and 71.4% for depth; and 93.3%, 100%, and not applicable (n = 0) for thinning, in the training, internal, and external validation samples, respectively.
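As an illustration of how such figures are computed, the sketch below derives sensitivity and specificity from paired reference (chart review) and algorithm labels, and returns no value when a denominator is zero. That is one reading of the "not applicable (n = 0)" entry: with no reference-negative charts, specificity is undefined. The function name and the toy labels are assumptions for illustration and are not study data.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Compare algorithm output with the reference standard.

    Returns (sensitivity, specificity); either is None when its
    denominator is zero and the measure is therefore undefined.
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and not pred:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy labels (hypothetical): every chart is positive by chart review,
# so there are no negatives on which specificity could be measured.
reference = [True, True, True]
predicted = [True, True, True]
sens, spec = sensitivity_specificity(reference, predicted)
# sens is 1.0 and spec is None (not applicable)
```

Returning None rather than 0 keeps an undefined specificity distinct from a genuinely poor one.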

CONCLUSIONS: MK features are not documented consistently, which indicates a lack of standardization in recording MK examination elements. NLP shows promise but will be limited if the relevant clinical data are missing from the chart.

ePub ahead of print