HSR&D Citation Abstract

Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

Schroeck FR, Patterson OV, Alba PR, Pattison EA, Seigne JD, DuVall SL, Robertson DJ, Sirovich B, Goodney PP. Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research. Urology. 2017 Dec 1; 110:84-91.

Abstract:

OBJECTIVE: To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

METHODS: Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.

RESULTS: When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.

CONCLUSION: NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.
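The three performance measures named in METHODS (accuracy, positive predictive value, and sensitivity) can be illustrated with a minimal sketch that compares NLP output against gold-standard labels for one variable. This is not the authors' code: the function name `evaluate`, the `Metrics` container, and the toy labels below are hypothetical and chosen only for illustration, and the sketch assumes the standard definitions of each measure for a single class treated as "positive".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    ppv: float          # positive predictive value (precision)
    sensitivity: float  # recall


def evaluate(gold: Sequence[str], predicted: Sequence[str], positive: str) -> Metrics:
    """Compare predicted labels with gold-standard labels for one variable.

    `positive` is the label treated as the positive class
    (hypothetical example: "present" for carcinoma in situ).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one report is required")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    nan = float("nan")
    return Metrics(
        accuracy=correct / len(pairs),
        # PPV: of the reports the engine called positive, the fraction truly positive.
        ppv=tp / (tp + fp) if (tp + fp) else nan,
        # Sensitivity: of the truly positive reports, the fraction the engine found.
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,
    )


# Toy labels invented for illustration; they are NOT data from the study.
gold = ["present", "absent", "present", "absent", "present"]
predicted = ["present", "absent", "absent", "absent", "present"]
result = evaluate(gold, predicted, positive="present")
print(result)
```

For a multi-category variable such as depth of invasion, accuracy is computed once over all categories, while positive predictive value and sensitivity would be computed per category by calling `evaluate` with each category in turn as `positive`.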
