Module 2: Basics of Machine Learning

Machine learning (ML) is a subset of artificial intelligence (AI) that enables machines to learn patterns and make predictions from data without being explicitly programmed. It is a powerful tool used across various fields, including healthcare, to solve complex problems. This module explains the key terms in plain language.

Driving Forces Behind ML

Three major advancements have propelled ML:

Algorithms: Innovative techniques like neural networks and decision trees.
Data: Growth in data availability due to the digital revolution.
Computation: Enhanced hardware, especially GPUs, supports computationally intensive tasks.

1. Key Concepts in ML

Definition and Purpose

Machine learning is the process where computers analyze data, identify patterns, and make predictions or decisions based on insights.
Unlike traditional programming, ML algorithms derive rules from data rather than being explicitly coded.
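
To make the contrast concrete, here is a minimal sketch in Python. The function names, the hand-picked cutoff of 100.0, and the measurements are all hypothetical and chosen only for illustration. The first function is a rule written by a programmer; the second derives its cutoff from labeled examples.

```python
def rule_based_flag(value):
    # Traditional programming: the cutoff is chosen and typed in by a person.
    return value > 100.0


def learn_threshold(values, labels):
    """Pick the cutoff that misclassifies the fewest labeled examples."""
    best_threshold, best_errors = None, len(values) + 1
    for candidate in sorted(set(values)):
        # Count examples where "value > candidate" disagrees with the label.
        errors = sum((v > candidate) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical measurements with known outcomes (True = flagged).
values = [62.0, 70.0, 85.0, 90.0, 130.0, 145.0, 160.0]
labels = [False, False, False, False, True, True, True]

learned_threshold = learn_threshold(values, labels)
print(learned_threshold)          # cutoff derived from the data
print(120.0 > learned_threshold)  # prediction for a new, unseen value
```

With these example values the learned cutoff differs from the hand-coded one, because it comes from the data rather than from the programmer.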

How It Works

Learning from Data: Machines are trained using large datasets that represent the problem they are solving.
Prediction: Once trained, the machine predicts outcomes for new, unseen data.
Improvement: Through repeated feedback, the machine adjusts to improve its performance.

This diagram illustrates the progression from human judgment to rules-based systems and finally to machine learning (ML) as tasks become increasingly complex.

Human Judgment: At the core, humans rely on experience and intuition to make decisions, which works well for simple, low-complexity tasks.
Rules-Based Systems: As complexity grows, predefined rules encoded in software handle structured data. These systems are limited by their inability to adapt to new, unforeseen situations.
Machine Learning: For tasks with high data complexity (e.g., large datasets) and high rules complexity (e.g., intricate relationships in data), ML is usually the better fit. ML learns patterns directly from data and can adapt as new data arrives, which static rules cannot do.

This hierarchical view highlights ML’s unique capability to handle the nuanced challenges often encountered in fields like gastroenterology, where vast, complex data can inform diagnoses and decision-making.

2. Data: The Foundation of ML

Types of Data

Structured Data: Organized data in tables (e.g., patient demographics, lab results).
Unstructured Data: Data like images, videos, and text (e.g., endoscopy images).
Semi-structured Data: A mix of structured and unstructured data, such as JSON files.
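
As a rough illustration, the three data types might look like this in Python. Every field name and value below is made up for the example and does not come from a real dataset.

```python
import json

# Structured: fixed columns, one value per column (like one row of a lab table).
structured_row = {"patient_id": 101, "age": 54, "hemoglobin_g_dl": 13.2}

# Semi-structured: JSON with nested fields whose length can vary per record.
raw_json = '{"patient_id": 101, "visits": [{"date": "2024-01-05", "tags": ["reflux"]}]}'
record = json.loads(raw_json)

# Unstructured: free text (or image bytes) with no predefined fields.
clinical_note = "Patient reports intermittent heartburn after meals."

visit_count = len(record["visits"])      # nested list inside the JSON record
word_count = len(clinical_note.split())  # text must be processed before use
print(structured_row["age"], visit_count, word_count)
```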

Role of Data

Data serves as the machine’s “experience.”
Quality and volume of data significantly impact the accuracy of ML models.

Later modules cover the data setup process in detail. As this section's title suggests, data is the foundation of ML, so this stage deserves careful attention. If the data is not prepared properly, the resulting algorithm may be biased.

3. Algorithms: The Heart of ML

Types of ML Algorithms

Supervised Learning: The machine learns from labeled data where outcomes are known.
- Example: Predicting patient outcomes based on medical history.
Unsupervised Learning: The machine identifies patterns in data without pre-existing labels.
- Example: Clustering patients based on symptom similarity.
Reinforcement Learning: The machine learns by trial and error to maximize rewards.
- Example: Navigating surgical tools in simulations.
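
The difference between the first two types can be sketched in a few lines of Python. This is a simplified illustration, not a production method: the nearest-neighbor predictor and the two-group clustering routine are stand-ins chosen for brevity, and the scores and labels are hypothetical. Reinforcement learning is omitted because it needs a simulated environment.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    # Supervised: the labels in train_y are used to answer for a new value.
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def two_means(values, iterations=10):
    # Unsupervised: no labels; values are grouped around two moving centers.
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        group0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        group1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if group0:
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return c0, c1


# Hypothetical symptom scores.
labeled_scores = [1.0, 1.2, 5.0, 5.2]
labels = ["low", "low", "high", "high"]
print(nearest_neighbor_predict(labeled_scores, labels, 4.6))

unlabeled_scores = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(two_means(unlabeled_scores))  # two cluster centers found without labels
```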

Common Algorithms

Linear Regression: Identifies linear relationships between variables.
Decision Trees: Split data into branches based on decision rules.
Neural Networks: Layers of simple connected units, loosely inspired by the brain's neurons, that can identify complex patterns.
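
As one concrete case, linear regression with a single input can be fitted with the standard least-squares formulas. The sketch below uses plain Python, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # prediction for x = 5: 11.0
```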

4. Training and Testing: The Learning Process

Steps of Model Development

Training:
- The model learns from a training dataset.
- Adjusts its parameters (weights and biases) to minimize errors.
Validation:
- Checks how well the model generalizes to data it was not trained on, using a separate validation dataset; the results guide model tuning.
Testing:
- Evaluates the model’s accuracy and performance on unseen data.
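
The three datasets are usually carved out of one collection of records. Below is a minimal sketch of such a split; the 70/15/15 proportions, the function name, and the fixed seed are assumptions made for the example, not values prescribed by this module.

```python
import random


def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the records, then cut them into train, validation, and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out
    return train, validation, test


records = list(range(20))  # stand-in for 20 patient records
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))
```

Each record lands in exactly one part, so the test set stays unseen during training and validation.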

Metrics for Evaluation

Commonly used metrics depend on the task. For classification (predicting a category), accuracy, precision, and recall are typical; for regression (predicting a number), mean squared error (MSE) is typical.
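
For reference, these metrics can be computed by hand in a few lines. The sketch below uses the standard definitions; the labels and predictions are hypothetical, with 1 meaning the positive class.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def mean_squared_error(actual, predicted):
    """Average of squared differences; used when the output is a number."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))  # (0.625, 0.6, 0.75)
print(mean_squared_error([2.0, 4.0, 6.0], [2.5, 3.5, 7.0]))  # 0.5
```

In this example precision (0.6) is lower than recall (0.75): the model finds most of the true positives but also raises two false alarms.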

You can read about the importance of different metrics to a specific question in this paper.

5. Key ML Concepts

Feedback Loops

ML models operate on a cycle: Predict → Measure → Adjust.
Algorithms fine-tune predictions by analyzing errors and making corrections.

Gradient Descent

A mathematical technique used to optimize the model by finding the best set of parameters.
It repeatedly adjusts the weights in the direction opposite to the gradient of the loss function, so each step reduces the error a little, provided the step size (the learning rate) is small enough.
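
A minimal sketch of gradient descent, fitting a line y = w * x + b with mean squared error as the loss. The learning rate, the number of steps, and the data are assumptions chosen so that this small example converges. Each pass through the loop is one round of the Predict, Measure, Adjust cycle.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Predict with the current parameters.
        predictions = [w * x + b for x in xs]
        # Measure the errors of those predictions.
        errors = [p - y for p, y in zip(predictions, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Adjust: move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # close to 2.0 and 1.0
```

If the learning rate is set too high, the steps overshoot and the error grows instead of shrinking, which is why it is normally tuned.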

Activation Functions

Functions like ReLU (Rectified Linear Unit) introduce non-linearity, enabling models to handle complex data relationships.
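
ReLU itself is simple: it passes positive values through unchanged and replaces negative values with zero. The sketch below shows it on its own and inside a single artificial neuron; the `neuron` helper, its weights, and its bias are hypothetical values for illustration.

```python
def relu(x):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)


def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(total)


print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25))  # weighted sum is -1.25, so 0.0

# Non-linearity: applying ReLU to a sum differs from summing the ReLU outputs.
print(relu(-2.0 + 3.0), relu(-2.0) + relu(3.0))  # 1.0 3.0
```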
