Homework 5 - Deep Learning for Medical Image Classification

Warning: this assignment is not yet released. Check back on March 18, 2026.

This assignment is due on Wednesday, March 25, 2026 before 11:59PM.

Get Started:

Accept the assignment on GitHub Classroom — You’ll get your own private repository with starter code and data
Clone your repo and complete the exercises in hw5_dl.py
Commit regularly as you work (this is part of your grade!)
Push your completed work to GitHub before the deadline

GPU Access: This assignment involves training neural networks. While the models are small enough to train on CPU, GPU access will make training significantly faster. See the Resources section for GPU options.

Learning Objectives

By completing this assignment, you will:

Build and train a CNN for medical image classification using PyTorch
Apply transfer learning from pretrained models
Implement data augmentation for medical images
Evaluate model performance with appropriate metrics
Visualize what neural networks learn (filters, activations, Grad-CAM)

Background

Deep learning has revolutionized medical image analysis. CNNs can now detect diabetic retinopathy, classify skin lesions, and identify lung nodules with expert-level performance. But building these models requires understanding both the deep learning fundamentals and the unique challenges of medical imaging.

In this assignment, you’ll classify chest X-rays using the MedMNIST dataset — a standardized collection designed for educational purposes. You’ll start with a simple CNN, then apply transfer learning, and finally interpret what your model has learned.

The Dataset

We’ll use PneumoniaMNIST from the MedMNIST collection:

Task: Binary classification (Normal vs Pneumonia)
Images: 5,856 chest X-ray images (28×28 grayscale)
Split: 4,708 train / 524 val / 624 test
Source: Derived from the Chest X-Ray Images (Pneumonia) dataset

The images are intentionally small (28×28) to enable fast training without GPUs while still teaching core concepts.

Class	Label	Count (Train)
Normal	0	~1,214
Pneumonia	1	~3,494

Note: The dataset is imbalanced (~75% pneumonia). Keep this in mind for evaluation!

Instructions

Part 1: Data Loading & Exploration (15 points)

1.1 Load the Dataset (5 pts)

Use the medmnist package to load PneumoniaMNIST
Create PyTorch DataLoaders for train, validation, and test sets
Use appropriate batch sizes (e.g., 64 or 128)

1.2 Visualize the Data (5 pts)

Display a grid of sample images from each class
Show the class distribution (bar plot)
Discuss: How might class imbalance affect training?

1.3 Data Augmentation (5 pts)

Implement data augmentation for the training set:
- Random horizontal flip
- Random rotation (±10 degrees)
- Random brightness/contrast adjustment
Show examples of augmented images
Why is augmentation important for medical imaging?

Part 2: Build a Simple CNN (25 points)

2.1 Define the Architecture (10 pts)

Build a CNN from scratch with:

2-3 convolutional layers with ReLU activation
Max pooling after each conv layer
1-2 fully connected layers
Appropriate output layer for binary classification

Document your architecture choices.

2.2 Training Loop (10 pts)

Implement a training loop that:

Uses Binary Cross Entropy loss (or CrossEntropyLoss)
Uses Adam optimizer
Tracks training and validation loss/accuracy per epoch
Implements early stopping based on validation loss

2.3 Training Curves (5 pts)

Plot training vs validation loss over epochs
Plot training vs validation accuracy over epochs
Is your model overfitting? How can you tell?

Part 3: Transfer Learning (25 points)

3.1 Load Pretrained Model (8 pts)

Use a pretrained model (e.g., ResNet18) and adapt it for our task:

Load pretrained weights (ImageNet)
Modify the first conv layer to accept 1-channel input (grayscale)
Replace the final layer for binary classification
Discuss: Why might ImageNet pretrained weights help for medical images?

3.2 Fine-Tuning Strategy (8 pts)

Implement fine-tuning:

First: Freeze all layers except the final classifier, train for a few epochs
Then: Unfreeze and train the full model with a lower learning rate
Compare this approach to training from scratch

3.3 Compare Models (9 pts)

Create a comparison table:

Model	Test Accuracy	Test AUC	Precision	Recall	F1
Simple CNN
ResNet18 (from scratch)
ResNet18 (pretrained)

Which model performs best? Why?

Part 4: Model Evaluation (20 points)

4.1 Confusion Matrix & Metrics (8 pts)

For your best model:

Display the confusion matrix on the test set
Calculate and report: Accuracy, Precision, Recall, F1, AUC
Given the clinical context (pneumonia detection), which metric matters most?

4.2 ROC and PR Curves (6 pts)

Plot the ROC curve with AUC
Plot the Precision-Recall curve with AP
Mark the operating point for different threshold choices

4.3 Error Analysis (6 pts)

Display examples of:
- True Positives (correctly identified pneumonia)
- True Negatives (correctly identified normal)
- False Positives (normal misclassified as pneumonia)
- False Negatives (pneumonia misclassified as normal)
Can you identify patterns in the errors?

Part 5: Model Interpretability (15 points)

5.1 Visualize Filters (5 pts)

Display the learned filters from the first convolutional layer
What patterns do these filters detect?

5.2 Grad-CAM Visualization (10 pts)

Implement Grad-CAM to visualize what the model focuses on:

Select 4-6 test images (mix of correct and incorrect predictions)
Generate Grad-CAM heatmaps
Overlay heatmaps on original images
Discuss: Is the model looking at clinically relevant regions?

Reflection Questions

Answer these in code comments:

Clinical Deployment: If this model were deployed for pneumonia screening, what threshold would you choose? What are the consequences of false positives vs false negatives?
Limitations: What are three limitations of using MedMNIST (28×28 images) compared to real clinical chest X-rays?
Next Steps: What would you do differently if you had access to full-resolution chest X-rays and a GPU cluster?

Submission via GitHub

Complete your work in hw5_dl.py
Save your figures to the outputs/ directory
Commit your changes with meaningful messages
Push to GitHub before the deadline

Deliverables

Your repository should contain:

hw5_dl.py — Completed code with comments
outputs/ — Generated figures (PNG files)
Clear commit history showing your progress

Grading Rubric

Component	Points
Part 1: Data Loading & Exploration	15
1.1 Load dataset	5
1.2 Visualize data	5
1.3 Data augmentation	5
Part 2: Simple CNN	25
2.1 Define architecture	10
2.2 Training loop	10
2.3 Training curves	5
Part 3: Transfer Learning	25
3.1 Load pretrained model	8
3.2 Fine-tuning strategy	8
3.3 Compare models	9
Part 4: Model Evaluation	20
4.1 Confusion matrix & metrics	8
4.2 ROC and PR curves	6
4.3 Error analysis	6
Part 5: Interpretability	15
5.1 Visualize filters	5
5.2 Grad-CAM	10
Subtotal	100
Git Workflow
Multiple meaningful commits	-5 if missing
Clear commit messages	-5 if missing

Resources

GPU Access Options

Google Colab (free): Limited GPU time, good for experimentation
Penn HPC (if available): Check with your department
Local GPU: If you have one

Documentation

Tips

Start simple: Get the basic CNN working before transfer learning
Monitor for overfitting: Use validation loss, not just accuracy
Class imbalance: Consider class weights or oversampling
Grad-CAM can be tricky: Use existing implementations (e.g., pytorch-grad-cam)
Commit often: After each part works
CPU is fine: MedMNIST is designed to be trainable without GPUs