The AI Institute for Gastroenterology

Module 3: Data — The Heart of AI in Medicine

Learning Objectives:

  • Understand different types of clinical data used in AI.
  • Learn key concepts like data collection, preprocessing, quality control, and bias detection.
  • Recognize the challenges of clinical data management.
  • Explore examples relevant to real-world medical AI projects.
Type | Example | Use Case in AI
Structured Data | Lab tests, vitals, demographics | Predictive models (e.g., cancer risk)
Unstructured Data | Clinical notes, imaging scans | NLP models, image analysis
Time-Series Data | Heart rate, blood glucose levels | Monitoring and alerts
Text Data | Physician notes, discharge summaries | Summarization, clinical decision support
Imaging Data | Endoscopy, CT scans, pathology slides | Disease detection (e.g., cancer)

Section 1: What Is Data?

  1. What Is Data in Medicine?
    1. Why It Matters: Models learn patterns from data. Without data, there’s no learning.
    2. Data Definition: In AI, data refers to any information the model can process, including patient records, lab results, and medical images.
  2. Types of Medical Data
  3. Real-World Clinical Example:
    • Scenario: An AI system predicting pneumonia uses:
      • Structured Data: The patient’s temperature and oxygen saturation.
      • Unstructured Data: Chest X-ray images.
      • Time-Series Data: Pulse oximetry readings from wearables.
  4. Clinical Analogy:
    • Think of data as “patient history.” Just as doctors build knowledge by seeing many patients, AI models learn from diverse datasets.
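To make the three data types in the pneumonia scenario concrete, here is a minimal Python sketch of how one case might be held in memory. The class name, field names, and file path are hypothetical, chosen only for illustration; real systems usually keep these data in separate stores (an EHR database, an imaging archive, a device data stream).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatientCase:
    """Hypothetical record combining the three data types from the scenario."""
    # Structured data: one numeric value per field
    temperature_c: float
    spo2_percent: float
    # Unstructured data: path to an image file (pixels are not loaded here)
    image_path: str
    # Time-series data: (minutes since admission, SpO2 %) readings from a wearable
    pulse_ox_series: List[Tuple[int, float]] = field(default_factory=list)

    def lowest_spo2(self) -> float:
        """Lowest oxygen saturation seen in either the vitals or the time series."""
        readings = [value for _, value in self.pulse_ox_series]
        return min([self.spo2_percent] + readings)


case = PatientCase(
    temperature_c=38.9,
    spo2_percent=93.0,
    image_path="images/case_001.png",  # hypothetical file name
    pulse_ox_series=[(0, 94.0), (15, 92.0), (30, 91.0)],
)
print(case.lowest_spo2())  # prints 91.0
```

Each type needs its own handling: structured fields can feed a tabular model directly, the image needs an image model, and the time series is often summarized (as the lowest reading is here) or passed to a sequence model.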

Section 2: Data Preprocessing

  1. Why Preprocess Data?
    • Raw Data Problems:
      • Incomplete records (e.g., missing lab values).
      • Inconsistent formats (e.g., different units like mg/dL vs. mmol/L).
      • Errors like typos or incorrect data entries.
  2. Data Preprocessing Steps:
Step | What It Means | Example
Data Cleaning | Detecting and correcting or removing incorrect data | Correcting typos in diagnoses
Data Standardization | Converting values to a common format or unit | Expressing all drug doses in the same unit
Data Normalization | Scaling numeric values to a common range | Rescaling lab test values to 0 to 1
Data Imputation | Filling in missing values | Estimating missing blood pressure (BP) readings
Data Transformation | Encoding text or categories as numbers | Converting disease labels to numeric IDs
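The five steps in the table can be sketched with pandas on a tiny, made-up lab extract. Everything here is an illustrative assumption: the column names, the typo map, median imputation, and min-max scaling are one common set of choices, not the only correct ones. The factor of 18 is the usual approximation for converting glucose from mmol/L to mg/dL and does not apply to other analytes.

```python
import pandas as pd

# Approximate factor for converting glucose from mmol/L to mg/dL (glucose only)
GLUCOSE_MGDL_PER_MMOLL = 18.0
# Hypothetical map of known misspellings to corrected diagnosis labels
TYPO_FIXES = {"pnuemonia": "pneumonia"}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five preprocessing steps from the table to a copy of df."""
    df = df.copy()
    # 1. Cleaning: correct known typos in diagnosis labels
    df["diagnosis"] = df["diagnosis"].replace(TYPO_FIXES)
    # 2. Standardization: express every glucose value in mg/dL
    is_mmol = df["glucose_unit"] == "mmol/L"
    df.loc[is_mmol, "glucose"] = df.loc[is_mmol, "glucose"] * GLUCOSE_MGDL_PER_MMOLL
    df["glucose_unit"] = "mg/dL"
    # 3. Normalization: min-max scale glucose to the range [0, 1]
    g = df["glucose"]
    df["glucose_scaled"] = (g - g.min()) / (g.max() - g.min())
    # 4. Imputation: fill missing systolic BP with the median of observed values
    df["systolic_bp"] = df["systolic_bp"].fillna(df["systolic_bp"].median())
    # 5. Transformation: encode diagnosis labels as integer IDs
    df["diagnosis_id"] = df["diagnosis"].astype("category").cat.codes
    return df


raw = pd.DataFrame({
    "diagnosis": ["pneumonia", "pnuemonia", "colitis", "colitis"],
    "glucose": [110.0, 6.1, 5.5, 95.0],
    "glucose_unit": ["mg/dL", "mmol/L", "mmol/L", "mg/dL"],
    "systolic_bp": [120.0, None, 135.0, 128.0],
})
print(preprocess(raw))
```

One design point: in a real project, the median used for imputation and the minimum and maximum used for scaling should be computed on the training data only and then reused on validation and test data. Computing them on the full dataset, as this toy example does, leaks information from the test data into training.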
  3. Example:
    • Preprocessing for AI in Colonoscopy:
      • Ensure all colonoscopy images have the same resolution.
      • Normalize pixel values so that brightness and contrast are on a consistent scale across images.
  4. Clinical Analogy:
    • Preprocessing is like organizing patient charts before rounds: everything needs to be readable, complete, and up to date.
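A minimal NumPy sketch of the two colonoscopy steps, assuming frames arrive as 8-bit RGB arrays. The 256 × 256 target size is an arbitrary choice for illustration, and the nearest-neighbour resize is written out by hand only to keep the example dependency-free; real pipelines usually call a library resize with proper interpolation. Dividing by 255 only rescales values to [0, 1]; the per-image standardization that follows (subtract the mean, divide by the standard deviation) is one common way to reduce brightness differences between frames.

```python
import numpy as np

TARGET_SIZE = (256, 256)  # assumed common resolution (height, width)


def resize_nearest(image: np.ndarray, size: tuple) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array to (new_h, new_w, C)."""
    h, w = image.shape[:2]
    new_h, new_w = size
    # For each output row/column, take the source index at the scaled position (floored)
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols[None, :]]


def preprocess_frame(image: np.ndarray) -> np.ndarray:
    """Resize an 8-bit RGB frame, scale it to [0, 1], then standardize brightness."""
    resized = resize_nearest(image, TARGET_SIZE)
    scaled = resized.astype(np.float32) / 255.0  # pixel values now in [0, 1]
    # Per-image standardization: zero mean and unit variance for this frame
    return (scaled - scaled.mean()) / (scaled.std() + 1e-8)


# Two synthetic frames at different resolutions stand in for real video frames
rng = np.random.default_rng(0)
frames = [
    rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8),
]
batch = np.stack([preprocess_frame(f) for f in frames])
print(batch.shape)  # prints (2, 256, 256, 3)
```

Because every frame ends up with the same shape, the frames can be stacked into one batch array, which is the form most image models expect.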

Section 3: Data Quality Check

  1. Why Data Quality Matters:
    • Poor data = bad predictions.
    • In healthcare, incorrect predictions can harm patients.
  2. Key Data Quality Dimensions:
Dimension | Definition | Example
Completeness | No missing data | No missing lab results
Accuracy | Correct and reliable data | Correct drug dosage recorded
Consistency | Uniform data formatting | Consistent diagnosis codes
Timeliness | Data available when needed | Latest vitals during monitoring
Integrity | No corruption or tampering | Secure patient records
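Several of these dimensions can be checked automatically. The sketch below, using pandas and Python's standard hashlib module, runs one simple check per dimension on a small invented dataset. The plausibility ranges, the simplified diagnosis-code pattern (modelled loosely on dotted ICD-10 codes such as "J18.9"), and the 24-hour freshness window are assumptions for illustration; real thresholds should be set with clinicians. Accuracy cannot be fully verified from the data alone, so the range check is only a proxy that catches obvious entry or unit errors.

```python
import hashlib

import pandas as pd

# Assumed plausibility ranges for illustration; real limits should come from clinicians
PLAUSIBLE_RANGES = {"hemoglobin_g_dl": (3.0, 25.0), "heart_rate_bpm": (20, 250)}
# Simplified dotted ICD-10-style code, e.g. "J18.9" (assumption, not a full validator)
CODE_PATTERN = r"^[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?$"


def quality_report(df: pd.DataFrame, now: pd.Timestamp, max_age_hours: float = 24.0) -> dict:
    """One simple automated check per data quality dimension."""
    report = {}
    # Completeness: fraction of missing values in each column
    report["missing_fraction"] = df.isna().mean().round(3).to_dict()
    # Accuracy (proxy only): non-missing values outside a plausible range
    report["out_of_range"] = {
        col: int((~df[col].dropna().between(lo, hi)).sum())
        for col, (lo, hi) in PLAUSIBLE_RANGES.items()
    }
    # Consistency: diagnosis codes that do not follow the expected format
    report["bad_codes"] = int((~df["diagnosis_code"].str.match(CODE_PATTERN, na=False)).sum())
    # Timeliness: records older than the allowed age at time `now`
    age_hours = (now - df["recorded_at"]).dt.total_seconds() / 3600
    report["stale_records"] = int((age_hours > max_age_hours).sum())
    return report


def fingerprint(df: pd.DataFrame) -> str:
    """Integrity: SHA-256 hash of the table; any altered value changes the hash."""
    return hashlib.sha256(df.to_csv(index=False).encode("utf-8")).hexdigest()


records = pd.DataFrame({
    "hemoglobin_g_dl": [13.5, None, 142.0],  # 142.0 looks like a g/L value entered as g/dL
    "heart_rate_bpm": [72, 88, 65],
    "diagnosis_code": ["J18.9", "j18.9", "K50.90"],  # lowercase code is inconsistent
    "recorded_at": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 09:00", "2024-04-28 10:00"]),
})
print(quality_report(records, now=pd.Timestamp("2024-05-01 12:00")))
```

Storing the fingerprint when data are exported and recomputing it before training is a simple way to notice that a file changed in between; it detects alteration but does not by itself secure the records.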
  3. Example:
    • Case: A cancer prediction AI trained on incomplete records may miss critical lab test results, leading to incorrect predictions.
  4. Clinical Analogy:
    • Data quality is like ensuring lab tests are accurate before making a clinical decision.

Section 4: Data Bias Detection

  1. Why Detect Bias?
    • Medical AI must work equally well for all populations.
    • Bias leads to incorrect or unfair predictions, impacting patient care.
  2. Common Types of Bias in Clinical Data:
Type of Bias | What It Means | Example
Selection Bias | Unrepresentative patient samples | Data from only one hospital
Measurement Bias | Inconsistent measurement methods | Different MRI machines used across sites
Historical Bias | Learning outdated practices | Use of old treatment protocols
Confirmation Bias | Data or labels shaped by developers' prior assumptions | Assuming a disease occurs in only one gender
Population Bias | Missing or underrepresented patient groups | Few records from minority groups
  3. Example:
    • Case: An AI trained only on data from urban hospitals may not work well in rural areas with different healthcare practices.
  4. Clinical Analogy:
    • Detecting data bias is like considering patient-specific factors before prescribing treatment.
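A basic way to look for selection and population bias is to report model performance separately for each subgroup, such as urban and rural sites in the example above, and flag large gaps. This plain-Python sketch computes per-group sample size, sensitivity, and specificity from binary labels and predictions. The data are invented toy values, not results, and the 0.1 gap threshold is an arbitrary assumption.

```python
from collections import defaultdict


def subgroup_report(groups, y_true, y_pred):
    """Per-subgroup sample size, sensitivity and specificity for binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, truth, pred in zip(groups, y_true, y_pred):
        if truth:
            counts[group]["tp" if pred else "fn"] += 1
        else:
            counts[group]["fp" if pred else "tn"] += 1
    report = {}
    for group, c in counts.items():
        positives, negatives = c["tp"] + c["fn"], c["tn"] + c["fp"]
        report[group] = {
            "n": positives + negatives,
            "sensitivity": c["tp"] / positives if positives else float("nan"),
            "specificity": c["tn"] / negatives if negatives else float("nan"),
        }
    return report


def flag_gaps(report, metric="sensitivity", max_gap=0.1):
    """Groups whose metric trails the best group by more than max_gap (assumed threshold)."""
    best = max(r[metric] for r in report.values())
    return [g for g, r in report.items() if best - r[metric] > max_gap]


# Invented toy data: 1 = disease present / predicted, 0 = absent
groups = ["urban"] * 6 + ["rural"] * 4
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

report = subgroup_report(groups, y_true, y_pred)
print(report)
print(flag_gaps(report))  # prints ['rural']
```

The per-group counts also expose under-representation directly. With real data, small subgroups need confidence intervals before a performance gap is treated as evidence of bias.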