Warning: this assignment is not yet released. Check back on April 1, 2026.
This assignment is due on Wednesday, April 15, 2026 before 11:59PM.

Get Started:

  1. Accept the assignment on GitHub Classroom — You’ll get your own private repository with starter code
  2. Clone your repo and complete the exercises in hw6_nlp.py
  3. Commit regularly as you work (this is part of your grade!)
  4. Push your completed work to GitHub before the deadline

LLM Options: Part 3 requires access to a large language model. You have several options: run Llama 3 locally with Ollama (free), use the course's shared Azure credits for GPT-4, or pay for the OpenAI API. See the Resources section for setup instructions.


Learning Objectives

By completing this assignment, you will:


Background

Clinical notes contain rich information that’s often inaccessible to traditional analysis. A radiology report might say “no evidence of acute cardiopulmonary process” — a human understands this means the lungs and heart look fine, but extracting that meaning computationally is challenging.

This assignment covers the NLP pipeline from traditional techniques to modern LLMs:


The Data

We’ll use synthetic clinical notes designed to mimic real discharge summaries without containing actual patient information. The notes include:


Instructions

Part 1: Text Preprocessing & Traditional NLP (20 points)

1.1 Text Preprocessing (8 pts)

Clinical text is messy. Implement preprocessing steps:

Compare your preprocessed output to the raw text.
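A minimal sketch of one possible preprocessing pipeline, using only the standard library and a placeholder stopword list (swap in nltk or spaCy if you prefer):

import re

def preprocess(text):
    # Lowercase and collapse runs of whitespace
    text = re.sub(r"\s+", " ", text.lower()).strip()
    # Simple word tokenizer: alphanumeric tokens, keeping internal hyphens/apostrophes
    tokens = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text)
    # Tiny illustrative stopword list (replace with a real one; be careful not to drop negations)
    stopwords = {"the", "a", "an", "of", "and", "with", "to", "is", "was"}
    return [t for t in tokens if t not in stopwords]

raw_note = "Pt denies CP.  No evidence of acute cardiopulmonary process."
print(raw_note)
print(preprocess(raw_note))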

1.2 TF-IDF Analysis (7 pts)

Using a collection of clinical notes:
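A possible starting point, sketched with scikit-learn's TfidfVectorizer on placeholder notes (substitute the provided note collection):

from sklearn.feature_extraction.text import TfidfVectorizer

notes = [
    "Patient presents with chest pain and shortness of breath.",
    "No evidence of acute cardiopulmonary process.",
    "Discharged home on metformin for type 2 diabetes mellitus.",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(notes)

# Highest-weighted terms for the first note
terms = vectorizer.get_feature_names_out()
weights = X[0].toarray().ravel()
top = weights.argsort()[::-1][:5]
print([(terms[i], round(weights[i], 3)) for i in top])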

1.3 Text Classification Baseline (5 pts)

Build a simple text classifier using TF-IDF + Logistic Regression:
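A minimal sketch of that baseline with scikit-learn; the hard-coded notes and labels below are placeholders for the assignment data, and you should use a proper train/test split when reporting metrics:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data: replace with the note texts and labels from the assignment
texts = ["chest pain radiating to left arm", "productive cough and wheezing",
         "substernal chest pressure on exertion", "fever with shortness of breath"]
labels = ["cardiac", "respiratory", "cardiac", "respiratory"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["sharp chest pain at rest"]))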


Part 2: Medical Entity Extraction (25 points)

2.1 Setup scispaCy (5 pts)

Install and configure scispaCy with a clinical model:

pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz

Load the model and process a sample clinical note.
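For example, a minimal load-and-run sketch (the note text here is a made-up placeholder):

import spacy

nlp = spacy.load("en_core_sci_md")
doc = nlp("67-year-old male with hypertension and type 2 diabetes, admitted for chest pain. Started on aspirin and atorvastatin.")

# Inspect the sentences and entities the model found
print([sent.text for sent in doc.sents])
print([(ent.text, ent.label_) for ent in doc.ents])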

2.2 Named Entity Recognition (10 pts)

Extract medical entities from clinical notes:

Visualize the extracted entities (displacy or custom visualization).
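One option for the visualization, a sketch using spaCy's displacy to write an HTML file into outputs/ (assumes the model from 2.1 is installed):

import os
import spacy
from spacy import displacy

nlp = spacy.load("en_core_sci_md")
doc = nlp("Patient started on lisinopril for hypertension; denies chest pain or dyspnea.")

# style="ent" highlights entity spans; jupyter=False returns the HTML as a string
html = displacy.render(doc, style="ent", jupyter=False)
os.makedirs("outputs", exist_ok=True)
with open("outputs/entities.html", "w", encoding="utf-8") as f:
    f.write(html)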

2.3 UMLS Linking (10 pts)

Link extracted entities to UMLS concepts:
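A sketch of the linking step based on scispaCy's EntityLinker; double-check the pipe name and config against the scispaCy docs for the version you installed (the linker knowledge base is a large download on first run):

import spacy
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" factory

nlp = spacy.load("en_core_sci_md")
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})
linker = nlp.get_pipe("scispacy_linker")

doc = nlp("History of myocardial infarction and type 2 diabetes mellitus.")
for ent in doc.ents:
    # Each entity gets candidate CUIs ranked by similarity score; take the top one
    for cui, score in ent._.kb_ents[:1]:
        concept = linker.kb.cui_to_entity[cui]
        print(ent.text, "->", cui, concept.canonical_name, round(score, 2))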


Part 3: LLM Prompting for Clinical Tasks (30 points)

3.1 Zero-Shot Extraction (10 pts)

Use an LLM (GPT-4 or similar) to extract structured information:

prompt = """
Extract the following from this clinical note:
- Chief Complaint
- Medications (list)
- Allergies
- Primary Diagnosis

Clinical Note:
{note_text}

Return as JSON.
"""

Compare LLM extraction to scispaCy results. Which is more accurate?
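A sketch of one way to run this prompt, reusing the query_ollama helper from the Resources section (any LLM client works) and assuming note_text holds a clinical note:

import json

response_text = query_ollama(prompt.format(note_text=note_text))

# LLMs do not always return valid JSON, so parse defensively
try:
    extracted = json.loads(response_text)
except json.JSONDecodeError:
    extracted = None
    print("Model output was not valid JSON:\n", response_text)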

3.2 Clinical Summarization (10 pts)

Prompt the LLM to generate:

Evaluate the summaries for:
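As a starting point for the generation step, one possible summarization prompt, again using the query_ollama helper and a note_text placeholder (adjust the audience and length to the summary types required above):

summary_prompt = """
Summarize the discharge note below in 3-4 sentences for a primary care physician.
Use only information stated in the note; do not add or infer anything.

Clinical Note:
{note_text}
"""

summary = query_ollama(summary_prompt.format(note_text=note_text))
print(summary)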

3.3 Few-Shot Learning (10 pts)

Compare zero-shot vs few-shot prompting:

Document your prompt engineering process.
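For the few-shot condition, a sketch of how the prompt might change; the worked example inside it is invented and should be replaced with a real annotated note:

few_shot_prompt = """
Extract Chief Complaint, Medications, Allergies, and Primary Diagnosis as JSON.

Example:
Clinical Note: 58-year-old female with chest pain. Medications: metoprolol. Allergies: none. Diagnosis: unstable angina.
Output: {{"chief_complaint": "chest pain", "medications": ["metoprolol"], "allergies": [], "primary_diagnosis": "unstable angina"}}

Now extract from this note:
Clinical Note:
{note_text}
Output:
"""

# Double braces ({{ }}) keep the literal JSON braces when str.format fills in {note_text}
few_shot_output = query_ollama(few_shot_prompt.format(note_text=note_text))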


Part 4: LLM Evaluation & Safety (25 points)

4.1 Hallucination Detection (10 pts)

LLMs can “hallucinate” — generate plausible but incorrect information. This is especially dangerous in clinical settings.

Design tests to detect hallucinations:

Report your hallucination rate.
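One simple metric, sketched as a function that compares LLM-extracted medications against a manually verified gold list (both lists are placeholders you build yourself):

def hallucination_rate(llm_items, gold_items):
    # Fraction of LLM-extracted items that never appear in the ground-truth note
    llm = {x.strip().lower() for x in llm_items}
    gold = {x.strip().lower() for x in gold_items}
    if not llm:
        return 0.0
    return len(llm - gold) / len(llm)

# Placeholder example: "warfarin" was never in the note, so 1 of 3 items is hallucinated
print(hallucination_rate(["aspirin", "lisinopril", "warfarin"], ["aspirin", "lisinopril"]))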

4.2 Comparison Evaluation (8 pts)

For a set of clinical notes, compare:

Method   | Accuracy | Completeness | Time | Cost
scispaCy |          |              |      | Free
GPT-3.5  |          |              |      | ~$0.002/note
GPT-4    |          |              |      | ~$0.03/note

When would you use each approach?

4.3 Clinical Safety Analysis (7 pts)

Analyze the failure modes of LLM-based extraction:


Reflection Questions

Answer these in code comments:

  1. Deployment: Would you trust an LLM to extract medication lists for a clinical decision support system? What safeguards would you require?

  2. Cost-Benefit: Given the cost and accuracy tradeoffs, when does using GPT-4 make sense vs. traditional NLP?

  3. Future Directions: How might clinical NLP change in the next 5 years? What will still require human review?


Submission via GitHub

  1. Complete your work in hw6_nlp.py
  2. Save your outputs to the outputs/ directory
  3. Commit your changes with meaningful messages
  4. Push to GitHub before the deadline

Important: Do NOT commit your API keys! Use environment variables.
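For example, read the key from the environment at runtime (OPENAI_API_KEY is just the conventional variable name; use whatever your provider expects):

import os

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running this script.")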

Deliverables

Your repository should contain:


Grading Rubric

Component | Points
Part 1: Traditional NLP | 20
  1.1 Text preprocessing | 8
  1.2 TF-IDF analysis | 7
  1.3 Text classification baseline | 5
Part 2: Medical Entity Extraction | 25
  2.1 Setup scispaCy | 5
  2.2 Named entity recognition | 10
  2.3 UMLS linking | 10
Part 3: LLM Prompting | 30
  3.1 Zero-shot extraction | 10
  3.2 Clinical summarization | 10
  3.3 Few-shot learning | 10
Part 4: Evaluation & Safety | 25
  4.1 Hallucination detection | 10
  4.2 Comparison evaluation | 8
  4.3 Clinical safety analysis | 7
Subtotal | 100
Git Workflow |
  Multiple meaningful commits | -5 if missing
  Clear commit messages | -5 if missing

Resources

LLM Access Options

Option 1: Ollama + Llama 3 (Recommended — Free)

Run models locally on your laptop. Works great for this assignment.

# Install Ollama (macOS)
brew install ollama

# Or download from https://ollama.ai

# Pull Llama 3 (8B model, ~4GB)
ollama pull llama3

# Run it
ollama run llama3

In Python:

import requests

def query_ollama(prompt, model="llama3"):
    # Send a single non-streaming generation request to the local Ollama server
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    response.raise_for_status()  # surface connection/model errors early
    return response.json()["response"]

Option 2: Course Azure Credits

Limited credits available for GPT-4 access. Request via Canvas assignment. Use sparingly — these are shared resources.

Option 3: OpenAI API

If you prefer a commercial API, create an account and API key at platform.openai.com; usage is billed per token (see the cost estimates in Part 4.2).

Documentation

Papers


Tips