Get Started: Complete your work in hw6_nlp.py.
LLM Options: Part 3 requires access to a large language model. You have several options; see the Resources section for setup instructions.
By completing this assignment, you will:
Clinical notes contain rich information that’s often inaccessible to traditional analysis. A radiology report might say “no evidence of acute cardiopulmonary process” — a human understands this means the lungs and heart look fine, but extracting that meaning computationally is challenging.
This assignment covers the NLP pipeline from traditional techniques to modern LLMs:
We’ll use synthetic clinical notes designed to mimic real discharge summaries without containing actual patient information. The notes include:
1.1 Text Preprocessing (8 pts)
Clinical text is messy. Implement preprocessing steps:
Compare your preprocessed output to the raw text.
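The exact steps are up to you; here is a minimal sketch, assuming lowercasing, whitespace normalization, simple tokenization, and stopword removal:

```python
# Minimal preprocessing sketch (assumed steps: lowercase, collapse whitespace,
# tokenize on word characters, drop English stopwords). Adapt to your pipeline.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def preprocess(text):
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()   # collapse newlines and extra spaces
    tokens = re.findall(r"[a-z0-9]+", text)    # crude word tokenizer
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]

raw_note = "Chief Complaint: Shortness of breath.\nNo evidence of acute cardiopulmonary process."
print(preprocess(raw_note))
```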
1.2 TF-IDF Analysis (7 pts)
Using a collection of clinical notes:
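A hedged sketch of the TF-IDF computation with scikit-learn; the three toy notes below stand in for the provided collection:

```python
# Fit TF-IDF over a toy corpus and print each note's highest-weighted terms.
from sklearn.feature_extraction.text import TfidfVectorizer

notes = [
    "shortness of breath chest pain troponin elevated",
    "no evidence of acute cardiopulmonary process",
    "type 2 diabetes metformin continued at discharge",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(notes)        # shape: (n_notes, n_terms)
terms = vectorizer.get_feature_names_out()

for i, row in enumerate(tfidf.toarray()):
    top = row.argsort()[::-1][:3]              # indices of the 3 largest weights
    print(f"note {i}:", [(terms[j], round(row[j], 3)) for j in top])
```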
1.3 Text Classification Baseline (5 pts)
Build a simple text classifier using TF-IDF + Logistic Regression:
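A minimal baseline sketch; the toy notes and labels are placeholders for the real dataset and whatever label the assignment asks you to predict:

```python
# TF-IDF features piped into Logistic Regression. Swap in the provided notes,
# real labels, and a proper train/test split before reporting metrics.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

toy_notes = [
    "chest pain troponin elevated ekg changes",
    "shortness of breath bilateral infiltrates on imaging",
    "metformin continued hemoglobin a1c elevated",
    "insulin sliding scale with glucose monitoring",
]
toy_labels = ["cardiac", "pulmonary", "endocrine", "endocrine"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(toy_notes, toy_labels)
print(clf.predict(["patient reports chest pain on exertion"]))
```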
2.1 Setup scispaCy (5 pts)
Install and configure scispaCy with a clinical model:
pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
Load the model and process a sample clinical note.
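For instance, loading the model installed above and running it on a short sample note:

```python
# en_core_sci_md is the scispaCy model installed from the URL above.
import spacy

nlp = spacy.load("en_core_sci_md")
note = ("Patient is a 65-year-old male with type 2 diabetes mellitus, "
        "prescribed metformin 500 mg twice daily.")
doc = nlp(note)

print("sentences:", [sent.text for sent in doc.sents])
print("entities:", [(ent.text, ent.label_) for ent in doc.ents])
```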
2.2 Named Entity Recognition (10 pts)
Extract medical entities from clinical notes:
Visualize the extracted entities (displacy or custom visualization).
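A sketch of entity extraction plus a displaCy rendering saved for your outputs/ directory; the `notes` list here is a hypothetical stand-in for the provided notes:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_sci_md")
notes = ["Patient started on lisinopril 10 mg daily for hypertension and CKD stage 3."]

doc = nlp(notes[0])
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

# page=True makes render() return standalone HTML you can open in a browser
# (assumes the outputs/ directory already exists).
html = displacy.render(doc, style="ent", page=True)
with open("outputs/entities.html", "w") as f:
    f.write(html)
```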
2.3 UMLS Linking (10 pts)
Link extracted entities to UMLS concepts:
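A sketch using scispaCy's UMLS EntityLinker (note that the linker downloads a sizable knowledge-base index the first time it runs):

```python
import spacy
from scispacy.linking import EntityLinker  # import registers the "scispacy_linker" factory

nlp = spacy.load("en_core_sci_md")
nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True, "linker_name": "umls"})
linker = nlp.get_pipe("scispacy_linker")

doc = nlp("Patient with atrial fibrillation started on warfarin.")
for ent in doc.ents:
    for cui, score in ent._.kb_ents[:1]:   # top-ranked UMLS candidate
        concept = linker.kb.cui_to_entity[cui]
        print(f"{ent.text} -> {cui} ({concept.canonical_name}), score {score:.3f}")
```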
3.1 Zero-Shot Extraction (10 pts)
Use an LLM (GPT-4 or similar) to extract structured information:
prompt = """
Extract the following from this clinical note:
- Chief Complaint
- Medications (list)
- Allergies
- Primary Diagnosis
Clinical Note:
{note_text}
Return as JSON.
"""
Compare LLM extraction to scispaCy results. Which is more accurate?
3.2 Clinical Summarization (10 pts)
Prompt the LLM to generate:
Evaluate the summaries for:
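A hedged example of one summarization prompt; the specific summary types and evaluation criteria follow the assignment bullets above, and `note_text` and `query_ollama` are reused from the earlier sketches:

```python
summary_prompt = """
You are a clinical documentation assistant. Summarize the following discharge
note in 2-3 sentences for the patient's primary care physician. Use only
information that appears in the note.

Note:
{note_text}
"""
print(query_ollama(summary_prompt.format(note_text=note_text)))
```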
3.3 Few-Shot Learning (10 pts)
Compare zero-shot vs few-shot prompting:
Document your prompt engineering process.
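A sketch of a few-shot variant of the extraction prompt; the worked example note is invented for illustration, and `note_text` and `query_ollama` come from the earlier sketches:

```python
few_shot_prompt = """
Extract Chief Complaint, Medications, Allergies, and Primary Diagnosis as JSON.

Example note: "CC: cough x 3 days. Meds: albuterol inhaler. NKDA. Dx: acute bronchitis."
Example output: {{"chief_complaint": "cough", "medications": ["albuterol inhaler"],
                  "allergies": [], "primary_diagnosis": "acute bronchitis"}}

Now extract from this note:
{note_text}
""".format(note_text=note_text)

print(query_ollama(few_shot_prompt))
```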
4.1 Hallucination Detection (10 pts)
LLMs can “hallucinate” — generate plausible but incorrect information. This is especially dangerous in clinical settings.
Design tests to detect hallucinations:
Report your hallucination rate.
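One possible test, assuming the Part 3.1 extraction prompt and the `query_ollama` helper: feed the model a note that contains no allergy information and flag any allergies it returns. Running this over many such probe notes gives an estimate of the hallucination rate.

```python
import json

def extract(note):
    # Reuse the Part 3.1 prompt as a template with {note_text} as the placeholder.
    raw = query_ollama(prompt.format(note_text=note))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None

# This note deliberately omits allergies, so any allergy in the output is hallucinated.
# (The "Allergies" key name depends on how the model formats its JSON.)
no_allergy_note = ("Chief Complaint: ankle pain after a fall. "
                   "Medications: ibuprofen 400 mg PRN. Primary Diagnosis: lateral ankle sprain.")
result = extract(no_allergy_note)
if result and result.get("Allergies"):
    print("Hallucination: model reported allergies absent from the note:", result["Allergies"])
```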
4.2 Comparison Evaluation (8 pts)
For a set of clinical notes, compare:
| Method | Accuracy | Completeness | Time | Cost |
|---|---|---|---|---|
| scispaCy | | | | Free |
| GPT-3.5 | | | | ~$0.002/note |
| GPT-4 | | | | ~$0.03/note |
When would you use each approach?
4.3 Clinical Safety Analysis (7 pts)
Analyze the failure modes of LLM-based extraction:
Answer these in code comments:
Deployment: Would you trust an LLM to extract medication lists for a clinical decision support system? What safeguards would you require?
Cost-Benefit: Given the cost and accuracy tradeoffs, when does using GPT-4 make sense vs. traditional NLP?
Future Directions: How might clinical NLP change in the next 5 years? What will still require human review?
Submit your completed hw6_nlp.py and your outputs/ directory. Important: Do NOT commit your API keys! Use environment variables.
Your repository should contain:
hw6_nlp.py — Completed code with comments
outputs/ — Generated outputs and visualizations
| Component | Points |
|---|---|
| Part 1: Traditional NLP | 20 |
| 1.1 Text preprocessing | 8 |
| 1.2 TF-IDF analysis | 7 |
| 1.3 Text classification baseline | 5 |
| Part 2: Medical Entity Extraction | 25 |
| 2.1 Setup scispaCy | 5 |
| 2.2 Named entity recognition | 10 |
| 2.3 UMLS linking | 10 |
| Part 3: LLM Prompting | 30 |
| 3.1 Zero-shot extraction | 10 |
| 3.2 Clinical summarization | 10 |
| 3.3 Few-shot learning | 10 |
| Part 4: Evaluation & Safety | 25 |
| 4.1 Hallucination detection | 10 |
| 4.2 Comparison evaluation | 8 |
| 4.3 Clinical safety analysis | 7 |
| Subtotal | 100 |
| Git Workflow | |
| Multiple meaningful commits | -5 if missing |
| Clear commit messages | -5 if missing |
Option 1: Ollama + Llama 3 (Recommended — Free)
Run models locally on your laptop. Works great for this assignment.
# Install Ollama (macOS)
brew install ollama
# Or download from https://ollama.ai
# Pull Llama 3 (8B model, ~4GB)
ollama pull llama3
# Run it
ollama run llama3
In Python:
import requests

def query_ollama(prompt, model="llama3"):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    return response.json()["response"]
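A quick smoke test, assuming the Ollama server and the llama3 model are already running locally:

```python
print(query_ollama("In one sentence, what is a discharge summary?"))
```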
Option 2: Course Azure Credits
Limited credits available for GPT-4 access. Request via Canvas assignment. Use sparingly — these are shared resources.
Option 3: OpenAI API
If you prefer a commercial API, create a key at platform.openai.com and load it from an environment variable rather than committing it.
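If you go this route, here is a minimal sketch with the official openai Python client, which reads OPENAI_API_KEY from the environment so no key ends up in your repository:

```python
# pip install openai; export OPENAI_API_KEY=... before running.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```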