Module 3: Data — The Heart of AI in Medicine
Learning Objectives:
- Understand different types of clinical data used in AI.
- Learn key concepts like data collection, preprocessing, quality control, and bias detection.
- Recognize the challenges of clinical data management.
- Explore examples relevant to real-world medical AI projects.
Section 1: What Is Data
- What Is Data in Medicine?
- Data Definition: In AI, data refers to any information the model can process, including patient records, lab results, and medical images.
- Why It Matters: Models learn patterns from data. Without data, there’s no learning.
- Types of Medical Data
Type | Example | Use Case in AI |
Structured Data | Lab tests, vitals, demographics | Predictive models (cancer risk) |
Unstructured Data | Clinical notes, imaging scans | NLP models, image analysis |
Time-Series Data | Heart rate, blood glucose levels | Monitoring and alerts |
Text Data | Physician notes, discharge summaries | Summarization, clinical decision support |
Imaging Data | Endoscopy, CT scans, pathology slides | Disease detection (e.g., cancer) |

- Real-World Clinical Example:
- Scenario: An AI system predicting pneumonia uses:
- Structured Data: Patient’s temperature, oxygen level.
- Unstructured Data: Chest X-ray images.
- Time-Series Data: Pulse oximetry from wearables.
- Clinical Analogy:
- Think of data as “patient history.” Just as doctors build knowledge by seeing many patients, AI models learn from diverse datasets.
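One way to picture how the data types above sit together for a single patient is a small record structure. This is a minimal sketch, not a standard schema: the class name `PatientRecord`, its field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatientRecord:
    """Hypothetical container mixing the data types described above."""

    # Structured data: fixed fields with known types and units
    patient_id: str
    temperature_c: float
    spo2_percent: float
    # Unstructured data: free text and references to image files
    clinical_note: str = ""
    image_paths: List[str] = field(default_factory=list)
    # Time-series data: (seconds since first reading, SpO2 %) pairs
    spo2_series: List[Tuple[int, float]] = field(default_factory=list)

    def min_spo2(self) -> float:
        """Lowest saturation in the series, or the spot reading if the series is empty."""
        readings = [value for _, value in self.spo2_series]
        return min(readings) if readings else self.spo2_percent


# Made-up example values, not real patient data
record = PatientRecord(
    patient_id="demo-001",
    temperature_c=38.6,
    spo2_percent=93.0,
    clinical_note="Productive cough for three days.",
    image_paths=["image_demo.png"],
    spo2_series=[(0, 95.0), (60, 93.0), (120, 91.0)],
)
print(record.min_spo2())  # 91.0
```

Keeping each data type in its own field makes it explicit which parts of a record need which kind of preprocessing later on.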

Section 2: Data Preprocessing
- Why Preprocess Data?
- Raw Data Problems:
- Incomplete records (e.g., missing lab values).
- Inconsistent formats (e.g., different units like mg/dL vs. mmol/L).
- Errors like typos or incorrect data entries.
- Data Preprocessing Steps:
Step | What It Means | Example |
Data Cleaning | Removing or correcting incorrect data | Correcting typos in diagnoses |
Data Standardization | Converting to standard formats | Recording all drug doses in the same unit |
Data Normalization | Scaling numeric values to a common range | Rescaling lab test values to 0 to 1 |
Data Imputation | Filling in missing data | Estimating missing BP readings |
Data Transformation | Encoding text into numbers | Converting disease labels to IDs |
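The five steps in the table can be sketched with plain Python on a toy glucose column. This is an illustrative sketch under stated assumptions, not a recommended pipeline: the function names, the `KNOWN_TYPOS` lookup, and the sample values are hypothetical, median imputation and min-max scaling are just one common choice for each step, and the conversion factor assumes a glucose molar mass of 180.16 g/mol.

```python
from statistics import median

# Assumption: glucose molar mass of 180.16 g/mol, so 1 mmol/L = 18.016 mg/dL.
MG_DL_PER_MMOL_L = 18.016

# Hypothetical lookup of known misspellings; a real project would curate its own.
KNOWN_TYPOS = {"pnuemonia": "pneumonia", "diabtes": "diabetes"}


def clean_diagnosis(text):
    """Cleaning: trim whitespace, lowercase, and correct known misspellings."""
    text = text.strip().lower()
    return KNOWN_TYPOS.get(text, text)


def glucose_to_mmol_l(value, unit):
    """Standardization: express every glucose value in mmol/L."""
    if unit == "mmol/L":
        return value
    if unit == "mg/dL":
        return value / MG_DL_PER_MMOL_L
    raise ValueError(f"unknown unit: {unit}")


def impute_median(values):
    """Imputation: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def encode_labels(labels):
    """Transformation: map each distinct text label to an integer ID."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Made-up values for illustration only
raw_glucose = [(90.08, "mg/dL"), (5.5, "mmol/L"), (None, "mg/dL"), (180.16, "mg/dL")]
standardized = [None if v is None else glucose_to_mmol_l(v, u) for v, u in raw_glucose]
imputed = impute_median(standardized)
scaled = min_max_scale(imputed)

diagnoses = [clean_diagnosis(d) for d in [" Pnuemonia", "healthy", "pneumonia "]]
label_ids, label_map = encode_labels(diagnoses)
print(label_ids, label_map)
```

Order matters in this sketch: units are standardized before imputation so that the median is computed over comparable values.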
- Example:
- Preprocessing for AI in Colonoscopy:
- Ensure all colonoscopy images have the same resolution.
- Normalize pixel values (e.g., scale them to the range 0 to 1) so intensity ranges are comparable across images.
- Clinical Analogy:
- Preprocessing is like organizing patient charts before rounds: everything needs to be readable, complete, and up to date.
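The two colonoscopy image steps above (a common resolution and normalized pixel values) can be sketched on a tiny grayscale frame stored as nested lists. This is a toy illustration only: the function names are hypothetical, nearest-neighbor sampling is just the simplest resizing method, and a real project would normally use an image-processing library instead. Scaling to [0, 1] puts intensities in a common range; it does not by itself equalize brightness between frames.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize_pixels(image, max_value=255):
    """Scale integer pixel intensities from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


def preprocess(image, size=(2, 2)):
    """Apply both steps: resize to a common resolution, then normalize."""
    out_h, out_w = size
    return normalize_pixels(resize_nearest(image, out_h, out_w))


# Made-up 4x4 frame with 8-bit intensities
frame = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [51, 51, 102, 102],
    [51, 51, 102, 102],
]
print(preprocess(frame))
```

Applying the same `size` to every frame is what gives the model inputs of identical shape.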
Section 3: Data Quality Check
- Why Data Quality Matters:
- Poor data = bad predictions.
- In healthcare, incorrect predictions can harm patients.
- Key Data Quality Dimensions:
Dimension | Definition | Example |
Completeness | No missing data | No missing lab results |
Accuracy | Correct and reliable data | Correct drug dosage recorded |
Consistency | Uniform data formatting | Consistent diagnosis codes |
Timeliness | Data available when needed | Latest vitals during monitoring |
Integrity | No corruption or tampering | Secure patient records |
- Example:
- Case: A cancer prediction AI using incomplete records may miss critical lab test results, leading to incorrect predictions.
- Clinical Analogy:
- Data quality is like ensuring lab tests are accurate before making a clinical decision.
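Three of the dimensions above (completeness, consistency, timeliness) lend themselves to simple automated checks. The sketch below is illustrative only: the function names, field names, and sample records are hypothetical, the 15-minute freshness window is an arbitrary assumption, and the code pattern is a simplified ICD-10-style shape check rather than a full validator.

```python
import re
from datetime import datetime, timedelta

# Assumption: a simplified ICD-10-style shape (letter, two digits, optional suffix).
CODE_PATTERN = re.compile(r"[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?")


def completeness(records, field_name):
    """Completeness: fraction of records where the field is present and not None."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field_name) is not None)
    return present / len(records)


def inconsistent_codes(records, field_name="diagnosis_code"):
    """Consistency: return the codes that do not match the expected format."""
    return [
        r[field_name]
        for r in records
        if r.get(field_name) is not None
        and not CODE_PATTERN.fullmatch(r[field_name])
    ]


def is_timely(recorded_at, now, max_age=timedelta(minutes=15)):
    """Timeliness: True if the reading is no older than max_age."""
    return now - recorded_at <= max_age


# Made-up records for illustration only
records = [
    {"id": 1, "hemoglobin": 13.5, "diagnosis_code": "J18.9"},
    {"id": 2, "hemoglobin": None, "diagnosis_code": "j18.9"},
    {"id": 3, "hemoglobin": 11.2, "diagnosis_code": "J18.9"},
    {"id": 4, "hemoglobin": 12.0, "diagnosis_code": "pneumonia"},
]
now = datetime(2024, 1, 1, 12, 0)

print(completeness(records, "hemoglobin"))           # 0.75
print(inconsistent_codes(records))                   # ['j18.9', 'pneumonia']
print(is_timely(datetime(2024, 1, 1, 11, 50), now))  # True
```

Checks like these only flag problems; deciding whether to correct, impute, or exclude a flagged record remains a separate decision.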
Section 4: Data Bias Detection
- Why Detect Bias?
- Medical AI must work equally well for all populations.
- Bias leads to incorrect or unfair predictions, impacting patient care.
- Common Types of Bias in Clinical Data:
Type of Bias | What It Means | Example |
Selection Bias | Unrepresentative patient samples | Data from only one hospital |
Measurement Bias | Inconsistent test methods | Different MRI machines used |
Historical Bias | Data reflects outdated practices | Use of old treatment protocols |
Confirmation Bias | Data collected or labeled to fit prior assumptions | Assuming a disease affects only one gender |
Population Bias | Missing patient groups | Few records from minority groups |
- Example:
- Case: An AI trained only on data from urban hospitals may not work well in rural areas with different healthcare practices.
- Clinical Analogy:
- Detecting data bias is like considering patient-specific factors before prescribing treatment.
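Two basic checks for the urban versus rural case above are how much of the dataset each group contributes and whether the model performs similarly in each group. The sketch below is a minimal illustration, assuming hypothetical record fields (`setting`, `label`, `prediction`) and made-up values; sensitivity is used as the per-group metric here, but other metrics could be compared the same way.

```python
from collections import defaultdict


def group_shares(records, key):
    """Share of the dataset contributed by each group (representation check)."""
    counts = defaultdict(int)
    for r in records:
        counts[r[key]] += 1
    total = len(records)
    return {group: n / total for group, n in counts.items()}


def sensitivity_by_group(records, key):
    """Per-group sensitivity: true positives divided by actual positives."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for r in records:
        if r["label"] == 1:
            actual_pos[r[key]] += 1
            if r["prediction"] == 1:
                true_pos[r[key]] += 1
    return {group: true_pos[group] / actual_pos[group] for group in actual_pos}


# Made-up evaluation records for illustration only (1 = disease present)
records = [
    {"setting": "urban", "label": 1, "prediction": 1},
    {"setting": "urban", "label": 1, "prediction": 1},
    {"setting": "urban", "label": 1, "prediction": 1},
    {"setting": "urban", "label": 1, "prediction": 0},
    {"setting": "urban", "label": 0, "prediction": 0},
    {"setting": "urban", "label": 0, "prediction": 0},
    {"setting": "rural", "label": 1, "prediction": 0},
    {"setting": "rural", "label": 1, "prediction": 1},
]

print(group_shares(records, "setting"))          # {'urban': 0.75, 'rural': 0.25}
print(sensitivity_by_group(records, "setting"))  # {'urban': 0.75, 'rural': 0.5}
```

In this toy data the rural group is both under-represented and served with lower sensitivity, which is the kind of gap a bias review would want to investigate; with so few records per group, real conclusions would need far more data.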