Get Started:
`hw4_consultant.py` (code) and `memos/` (written deliverables)

This assignment is different. You are not following a recipe. You are playing the role of a consultant — you decide what to explore, which metrics matter, and what to recommend. There are traps in the data that you need to catch. Read carefully, think critically, and write clearly.
By completing this assignment, you will:
You are an AI consultant hired by Mercy Community Hospital, a mid-sized hospital serving a diverse patient population. Their Chief Medical Officer (CMO) wants to know: can we use machine learning to screen for diabetes in our primary care clinics?
The hospital has given you:
- `data/mercy_hospital_patients.csv` — ~1,200 de-identified patients from their primary care clinics
- `data/colleague_predictions.csv` — Another consultant already built a model. The CMO wants your opinion on whether it's ready to deploy.
- `data/colleague_feature_importance.csv` — What their model relies on

Your job is to audit the data, build your own model, evaluate the colleague's work, and make a deployment recommendation. The CMO is not technical — your written memos need to be clear and honest about what ML can and cannot do here.
`data/mercy_hospital_patients.csv` contains ~1,200 patients with the following features:
| Feature | Description |
|---|---|
| `patient_id` | Unique identifier |
| `age` | Age in years |
| `sex` | Patient sex (M/F) |
| `bmi` | Body mass index |
| `systolic_bp` | Systolic blood pressure (mmHg) |
| `diastolic_bp` | Diastolic blood pressure (mmHg) |
| `fasting_glucose` | Fasting plasma glucose (mg/dL) |
| `total_cholesterol` | Total cholesterol (mg/dL) |
| `hdl_cholesterol` | HDL cholesterol (mg/dL) |
| `smoking_status` | Current / Former / Never |
| `physical_activity` | Low / Moderate / High |
| `num_office_visits` | Office visits in last 12 months |
| `family_history_diabetes` | Yes / No |
| `insurance_type` | Private / Medicare / Medicaid / Uninsured |
| `hba1c` | Hemoglobin A1c (%) |
| `metformin_prescribed` | Whether metformin is currently prescribed (Yes/No) |
| `diabetes_diagnosis` | Target variable (1 = diabetes, 0 = no diabetes) |
Not all of these features should be used as predictors. Part of your job is to figure out which ones are appropriate. If you train a model on all columns blindly, you will get a very wrong answer.
Your repository should contain:
| File | Contents |
|---|---|
| `hw4_consultant.py` | All code (EDA, modeling, evaluation, peer review) |
| `outputs/` | Generated figures (PNG files) |
| `memos/phase1_data_audit.md` | Phase 1 written memo |
| `memos/phase3_threshold.md` | Phase 3 written memo |
| `memos/phase4_peer_review.md` | Phase 4 written memo |
| `memos/phase5_deployment.md` | Phase 5 deployment recommendation |
Before you touch a model, you need to understand the data. A responsible consultant doesn’t start training until they’ve audited what they’re working with.
Explore the dataset. At minimum:
You decide what to plot and what to compute. There is no checklist.
Write a memo in memos/phase1_data_audit.md (minimum 300 words) documenting your data quality findings before any modeling. Address at least three concerns you found in the data.
Your memo will be graded on what you catch:
| Finding | Points |
|---|---|
| Identify the HbA1c leakage problem and explain why it’s leakage | 4 |
| Identify the metformin leakage problem and explain why it’s leakage | 3 |
| Identify informative missingness or another valid data quality concern | 3 |
| Clear explanation of why each issue matters for modeling | 2 |
Hint: Think about the causal relationship between each feature and the diagnosis. Which features could you actually measure before you know the outcome?
Now build your own model(s) using a cleaned version of the data — with leakage features removed and missingness handled appropriately.
This is open-ended. You choose which metrics to compute and which plots to generate. We’re looking for evidence that you:
In a comment block or markdown cell in your code file, briefly explain (3-5 sentences): Why did you choose these specific evaluation metrics? What makes them appropriate for a diabetes screening task at a community hospital?
The CMO tells you:
> "We can't have more than 1 in 10 patients flagged as high-risk turning out to be healthy — our follow-up clinic is already overwhelmed."
Write a memo in memos/phase3_threshold.md addressing:
1. **Translation (3 pts):** What does the CMO's constraint mean in ML terms? Which metric does "1 in 10 flagged patients turning out to be healthy" correspond to? State the constraint precisely.
2. **Recommendation (4 pts):** What threshold do you recommend? What sensitivity (recall) do you achieve at that threshold? Show your work.
3. **Clinical Tradeoff (5 pts):** At your recommended threshold, how many diabetic patients will your model miss? Is this tradeoff clinically acceptable? What happens to the patients your model misses — are they harmed, or will they be caught through other means? Write as if you're explaining this to the CMO.
Your colleague already built a diabetes prediction model. The CMO is excited because it has an AUC of 0.93. Your job is to determine whether it’s actually ready for deployment.
You have:
- `data/colleague_predictions.csv` — Contains `patient_id`, `true_label`, `predicted_probability`, `age`, `sex`
- `data/colleague_feature_importance.csv` — The features their model relies on most

Write a memo in `memos/phase4_peer_review.md` (minimum 400 words). This is a professional peer review — be specific, cite numbers, and explain clinical implications.
| Finding | Points |
|---|---|
| Identify the calibration problem and explain what it means for patients | 5 |
| Identify the subgroup performance failure and explain who it harms | 5 |
| Final recommendation (deploy / fix / reject) with clear reasoning | 5 |
Hint: A model can have a great AUC and still be dangerous. Check whether predicted probabilities match observed rates, and whether the model works equally well for all patient groups.
Write a one-page memo in memos/phase5_deployment.md to the CMO recommending whether Mercy Community Hospital should deploy your model (not the colleague’s) for diabetes screening.
This memo should synthesize everything you’ve learned across the assignment. Address:
| Component | Points |
|---|---|
| Expected performance and honest limitations of your model | 5 |
| Fairness considerations — which patient groups need monitoring? What disparities did you observe or suspect? | 5 |
| What would need to be true before going live? (validation plan, monitoring strategy, failure modes) | 5 |
This is not a summary of Phase 2 results. We’re looking for evidence that you’ve thought about what it means to put a model into clinical practice — the gap between “works on test data” and “safe to use on patients.”
- `hw4_consultant.py`
- `memos/` directory (Markdown format)
- `outputs/` directory

| Component | Code | Written | Total |
|---|---|---|---|
| Phase 1: Data Audit | 8 | 12 | 20 |
| EDA and exploration | 8 | | |
| Audit memo (leakage, missingness, quality) | | 12 | |
| Phase 2: Model Development | 20 | 5 | 25 |
| Two classifiers on cleaned data | 20 | | |
| Metric justification | | 5 | |
| Phase 3: Clinical Constraints | 8 | 12 | 20 |
| Threshold analysis code and visualization | 8 | | |
| Threshold memo (translation, recommendation, tradeoff) | | 12 | |
| Phase 4: Peer Review | 5 | 15 | 20 |
| Reproduce AUC, calibration plot, subgroup analysis | 5 | | |
| Peer review memo | | 15 | |
| Phase 5: Deployment Recommendation | — | 15 | 15 |
| Deployment memo | | 15 | |
| Subtotal | 41 | 59 | 100 |
| Git Workflow | | | |
| Multiple meaningful commits | -5 if missing | | |
| Clear commit messages | -5 if missing | | |