The AI Institute for Gastroenterology

Module 3: Data — The Heart of AI in Medicine

Learning Objectives:

  • Understand different types of clinical data used in AI.
  • Learn key concepts like data collection, preprocessing, quality control, and bias detection.
  • Recognize the challenges of clinical data management.
  • Explore examples relevant to real-world medical AI projects.

Section 1: What Is Data?

  1. What Is Data in Medicine?
    • Data Definition: In AI, data refers to any information the model can process, including patient records, lab results, and medical images.
    • Why It Matters: Models learn patterns from data. Without data, there’s no learning.
  2. Types of Medical Data

Type              | Example                               | Use Case in AI
Structured Data   | Lab tests, vitals, demographics       | Predictive models (cancer risk)
Unstructured Data | Clinical notes, imaging scans         | NLP models, image analysis
Time-Series Data  | Heart rate, blood glucose levels      | Monitoring and alerts
Text Data         | Physician notes, discharge summaries  | Summarization, clinical decision support
Imaging Data      | Endoscopy, CT scans, pathology slides | Disease detection (e.g., cancer)
  3. Real-World Clinical Example:
    • Scenario: An AI system predicting pneumonia uses:
      • Structured Data: Patient’s temperature, oxygen level.
      • Unstructured Data: Chest X-ray images.
      • Time-Series Data: Pulse oximetry from wearables.
  4. Clinical Analogy:
    • Think of data as “patient history.” Just as doctors build knowledge by seeing many patients, AI models learn from diverse datasets.
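
To make the data types above concrete, here is a minimal Python sketch of how one patient's structured, unstructured, and time-series data might be held together in a single record. The class name, field names, sample values, and file path are all hypothetical and chosen only for illustration; real clinical systems use far richer schemas.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    """One patient's data, grouped by data type (illustrative only)."""
    patient_id: str
    # Structured data: fixed fields with known units.
    temperature_c: Optional[float] = None
    oxygen_saturation_pct: Optional[float] = None
    # Unstructured data: free text and references to image files.
    clinical_note: str = ""
    image_paths: List[str] = field(default_factory=list)
    # Time-series data: (seconds since start, SpO2 %) pairs from a wearable.
    pulse_oximetry: List[Tuple[int, float]] = field(default_factory=list)


def lowest_spo2(record: PatientRecord) -> Optional[float]:
    """Return the lowest SpO2 reading in the time series, or None if empty."""
    if not record.pulse_oximetry:
        return None
    return min(value for _, value in record.pulse_oximetry)


# Made-up example values, not real patient data.
record = PatientRecord(
    patient_id="demo-001",
    temperature_c=38.6,
    oxygen_saturation_pct=93.0,
    clinical_note="Productive cough for three days.",
    image_paths=["images/demo-001_study1.png"],  # hypothetical path
    pulse_oximetry=[(0, 95.0), (60, 93.0), (120, 91.0)],
)
print(lowest_spo2(record))
```

Keeping the three kinds of data in separate fields reflects that each usually needs its own preprocessing before a model can use it.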

Section 2: Data Preprocessing

  1. Why Preprocess Data?
    • Raw Data Problems:
      • Incomplete records (e.g., missing lab values).
      • Inconsistent formats (e.g., different units like mg/dL vs. mmol/L).
      • Errors like typos or incorrect data entries.
  2. Data Preprocessing Steps:
Step                 | What It Means                   | Example
Data Cleaning        | Removing incorrect data         | Correcting typos in diagnoses
Data Standardization | Converting to standard formats  | Standardizing drug doses
Data Normalization   | Scaling numeric values          | Rescaling lab test values
Data Imputation      | Filling in missing data         | Estimating missing BP readings
Data Transformation  | Encoding text into numbers      | Converting disease labels to IDs
  3. Example:
    • Preprocessing for AI in Colonoscopy:
      • Ensure all colonoscopy images have the same resolution.
      • Normalize pixel values to standardize brightness.
  4. Clinical Analogy:
    • Preprocessing is like organizing patient charts before rounds: everything needs to be readable, complete, and up to date.
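
The preprocessing steps in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the function names and sample records are made up, median imputation and min-max scaling are just one common choice for each step, and the glucose conversion factor (about 18 mg/dL per mmol/L) applies to glucose only.

```python
from statistics import median

# Glucose only: 1 mmol/L is about 18 mg/dL (molar mass roughly 180.16 g/mol).
MGDL_PER_MMOLL_GLUCOSE = 18.016


def standardize_glucose(value, unit):
    """Standardization: convert a glucose value to mg/dL."""
    if value is None:
        return None
    if unit == "mg/dL":
        return value
    if unit == "mmol/L":
        return value * MGDL_PER_MMOLL_GLUCOSE
    raise ValueError(f"Unknown unit: {unit}")


def impute_median(values):
    """Imputation: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_labels(labels):
    """Transformation: map each distinct text label to an integer ID."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Made-up raw records: (glucose value, unit, diagnosis label).
raw = [
    (90.0, "mg/dL", "colitis "),
    (6.0, "mmol/L", "Colitis"),
    (None, "mg/dL", "polyp"),
    (130.0, "mg/dL", "Polyp"),
]

# Cleaning: trim whitespace and unify capitalization in the labels.
labels = [label.strip().lower() for _, _, label in raw]
glucose = [standardize_glucose(v, u) for v, u, _ in raw]
glucose = impute_median(glucose)
scaled = min_max_normalize(glucose)
label_ids, mapping = encode_labels(labels)

print(glucose)
print(scaled)
print(label_ids, mapping)
```

The same min-max idea applies to images: dividing 8-bit pixel values by 255 puts every image on a common 0 to 1 scale.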

Section 3: Data Quality Check

  1. Why Data Quality Matters:
    • Poor data = bad predictions.
    • In healthcare, incorrect predictions can harm patients.
  2. Key Data Quality Dimensions:
Dimension    | Definition                 | Example
Completeness | No missing data            | No missing lab results
Accuracy     | Correct and reliable data  | Correct drug dosage recorded
Consistency  | Uniform data formatting    | Consistent diagnosis codes
Timeliness   | Data available when needed | Latest vitals during monitoring
Integrity    | No corruption or tampering | Secure patient records
  3. Example:
    • Case: A cancer prediction AI using incomplete records may miss critical lab test results, leading to incorrect predictions.
  4. Clinical Analogy:
    • Data quality is like ensuring lab tests are accurate before making a clinical decision.
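
Some of the quality dimensions above can be checked automatically. The sketch below is a hedged illustration covering completeness, a rough accuracy proxy (plausible value range), and consistency (code format). The field names, the hemoglobin range, and the simplified ICD-10-style pattern are all assumptions made for the example, not clinical or coding references.

```python
import re

REQUIRED_FIELDS = ["patient_id", "hemoglobin_g_dl", "diagnosis_code"]
# Assumed plausible range for illustration, not a clinical reference.
HEMOGLOBIN_RANGE = (3.0, 25.0)
# Simplified ICD-10-style pattern (assumption): letter, two digits, optional decimals.
CODE_PATTERN = re.compile(r"[A-Z]\d{2}(\.\d{1,4})?")


def completeness(records, fields):
    """Completeness: fraction of records with a non-missing value, per field."""
    total = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total
        for f in fields
    }


def out_of_range(records, field_name, low, high):
    """Accuracy proxy: IDs whose value is present but outside [low, high]."""
    return [
        r["patient_id"]
        for r in records
        if r.get(field_name) is not None and not (low <= r[field_name] <= high)
    ]


def inconsistent_codes(records):
    """Consistency: IDs whose diagnosis code is present but badly formatted."""
    return [
        r["patient_id"]
        for r in records
        if r.get("diagnosis_code") and not CODE_PATTERN.fullmatch(r["diagnosis_code"])
    ]


# Made-up records: one missing lab value, one likely unit error, one lowercase code.
records = [
    {"patient_id": "p1", "hemoglobin_g_dl": 13.5, "diagnosis_code": "K50.9"},
    {"patient_id": "p2", "hemoglobin_g_dl": None, "diagnosis_code": "k50.9"},
    {"patient_id": "p3", "hemoglobin_g_dl": 135.0, "diagnosis_code": "K63.5"},
    {"patient_id": "p4", "hemoglobin_g_dl": 11.2, "diagnosis_code": ""},
]

report = completeness(records, REQUIRED_FIELDS)
suspect_values = out_of_range(records, "hemoglobin_g_dl", *HEMOGLOBIN_RANGE)
bad_codes = inconsistent_codes(records)

print(report)
print(suspect_values)
print(bad_codes)
```

Checks like these flag records for human review; they do not decide on their own whether a value is clinically wrong.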

Section 4: Data Bias Detection

  1. Why Detect Bias?
    • Medical AI must work equally well for all populations.
    • Bias leads to incorrect or unfair predictions, impacting patient care.
  2. Common Types of Bias in Clinical Data:

Type of Bias      | What It Means                     | Example
Selection Bias    | Unrepresentative patient samples  | Data from only one hospital
Measurement Bias  | Inconsistent test methods         | Different MRI machines used
Historical Bias   | Learning outdated practices       | Use of old treatment protocols
Confirmation Bias | Model favors specific assumptions | Model assuming gender-specific diseases
Population Bias   | Missing patient groups            | Few records from minority groups

  3. Example:
    • Case: An AI trained only on data from urban hospitals may not work well in rural areas with different healthcare practices.
  4. Clinical Analogy:
    • Detecting data bias is like considering patient-specific factors before prescribing treatment.
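
One simple way to look for the selection and population bias described above is to compare how well each group is represented in the dataset and how accurate the model is for each group. The sketch below assumes made-up group labels, outcomes, and predictions, and the 25% representation threshold is an arbitrary illustrative choice, not a recommended standard.

```python
from collections import Counter, defaultdict


def group_shares(groups):
    """Share of the dataset that each group contributes."""
    counts = Counter(groups)
    total = len(groups)
    return {g: n / total for g, n in counts.items()}


def accuracy_by_group(groups, y_true, y_pred):
    """Fraction of correct predictions within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def flag_underrepresented(shares, min_share=0.25):
    """Groups whose share falls below an (assumed) minimum threshold."""
    return sorted(g for g, s in shares.items() if s < min_share)


# Made-up data: 8 urban and 2 rural patients with true and predicted labels.
groups = ["urban"] * 8 + ["rural"] * 2
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1, 0, 0]

shares = group_shares(groups)
accuracy = accuracy_by_group(groups, y_true, y_pred)
flagged = flag_underrepresented(shares)

print(shares)
print(accuracy)
print(flagged)
```

A large accuracy gap between groups, or a group that is barely present in the data, is a prompt to collect more representative data or to validate the model separately for that group.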