Privacy-Preserving Clinical Information Extraction Pipeline
Authors
Pradeep Kumar, Srimathi, Sanjeevi Vishnu, Senthil prakash*
Abstract
This paper presents a privacy-preserving clinical information extraction pipeline that derives structured medical knowledge from unstructured clinical narratives while protecting patient confidentiality. The pipeline combines Natural Language Processing (NLP) techniques, including Named Entity Recognition (NER), relation extraction, and medical concept normalization, with privacy-enhancing technologies such as differential privacy, federated learning, and de-identification modules compliant with HIPAA and GDPR. Clinical entities such as diagnoses, medications, procedures, and laboratory findings are identified and extracted without exposing personally identifiable information (PII). Experimental evaluations on benchmark clinical datasets show that the pipeline achieves competitive extraction accuracy with minimal utility loss while maintaining strong privacy guarantees. These results indicate that privacy-aware NLP systems can feasibly be deployed in real-world healthcare environments, enabling secure secondary use of clinical data in medical research, pharmacovigilance, and clinical decision support.
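The ordering the abstract describes, in which identifiers are removed before clinical entities are extracted, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the regex patterns, the toy lexicon, and the function names `deidentify`, `extract_entities`, and `run_pipeline` are hypothetical. The pipeline described above relies on learned NER models, differential privacy, and federated learning, none of which are shown here.

```python
import re

# Hypothetical PII patterns (assumption): a real de-identification module
# would cover many more identifier types than these three.
PII_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}

# Hypothetical toy lexicon (assumption) standing in for a trained NER model.
LEXICON = {
    "metformin": "MEDICATION",
    "type 2 diabetes": "DIAGNOSIS",
    "hba1c": "LAB",
}


def deidentify(text):
    """Replace every PII pattern match with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def extract_entities(text):
    """Return (surface form, label) pairs for lexicon terms found in text."""
    lowered = text.lower()
    found = []
    for term, label in LEXICON.items():
        for match in re.finditer(re.escape(term), lowered):
            found.append((text[match.start():match.end()], label))
    return found


def run_pipeline(note):
    """De-identify first, so extraction only ever sees scrubbed text."""
    clean = deidentify(note)
    return clean, extract_entities(clean)


note = ("Seen on 03/14/2023, MRN: 123456. Type 2 diabetes; "
        "started metformin. Call 555-123-4567.")
clean_note, entities = run_pipeline(note)
```

Running extraction on the scrubbed text rather than the raw note means that nothing downstream of `deidentify` handles the original identifiers, which is the property the abstract emphasizes.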
Keywords
Publication Details
Published In
Volume 1, Issue 1