From ML to Deep Learning

MPHY 6120 — Module 6 Interactive Exercises


Step 1: A Neuron IS Logistic Regression

You already know logistic regression: multiply inputs by weights, add bias, apply sigmoid. That's a neuron.

Adjust the weights:

z = (0.7 x 0.8) + (-0.5 x 0.3) + 0.1 = 0.51
y-hat = sigma(z) = 0.625

The neuron:

This diagram IS the equation above. Same math, different picture.

A neuron is just logistic regression with a different name.
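In code, a neuron really is just logistic regression. A minimal plain-Python sketch, using the worked example above (inputs 0.8 and 0.3, weights 0.7 and -0.5, bias 0.1):

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum + bias, then sigmoid. Exactly logistic regression."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# The worked example above: x = (0.8, 0.3), w = (0.7, -0.5), b = 0.1
y = neuron([0.8, 0.3], [0.7, -0.5], 0.1)
print(round(y, 3))  # -> 0.625
```

Swap `neuron` for your favorite library's logistic regression and nothing changes but the name.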

Step 2: What One Neuron Can't Do

A single neuron draws one straight line through the data. If the boundary between classes isn't straight, it fails. Add more neurons and the boundary bends.

Choose a problem:

1 neuron, 1 layer: Can only draw a single straight line.
This IS logistic regression. Same thing.


Blue = Class A (Normal)   Orange = Class B (Pneumonia)
Background color = what the network predicts at each point.

One neuron = one line. Not enough for real problems.
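You can check the "one line is not enough" claim by brute force. On an XOR-style dataset (the classic case where the boundary between classes is not straight), no single linear boundary gets all four points right. A plain-Python sketch:

```python
import itertools

# XOR-style data: classes that no single straight line can separate.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    """Accuracy of a single neuron (linear boundary w1*x + w2*y + b = 0)."""
    correct = 0
    for (x, y), label in points:
        pred = 1 if w1 * x + w2 * y + b > 0 else 0
        correct += (pred == label)
    return correct / len(points)

# Brute-force a grid of weights: the best any single line can do is 3 out of 4.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
best = max(accuracy(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(best)  # -> 0.75
```

Add a second neuron in a hidden layer and the boundary can bend; that combination does solve XOR.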

Step 3: Why Activation Functions?

Without an activation function, a neuron is just a straight line. Stack 100 straight lines and you still get... a straight line. Activations add the curves.

INPUTS: x1, x2, ...  ->  WEIGHTED SUM: z = Sum(wi*xi) + b  ->  ACTIVATION: sigma(z)  ->  OUTPUT: y-hat

The activation function sits right here -- between the weighted sum and the output

Choose an activation:

Sigmoid: sigma(z) = 1 / (1 + e^-z)
Squishes any number into (0, 1) -- like a probability
Output range: (0, 1)
When to use: Output layer for binary classification

Why it matters -- stacking layers:

Each curve below is a different neuron's output. Watch what happens when you turn activation off:

With activation: each layer can bend the function into new shapes.
A single hidden layer with enough neurons can approximate any continuous function. That's the universal approximation theorem.


Linear + Linear = Linear. Linear + Activation + Linear + Activation = can approximate almost any function.
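The "Linear + Linear = Linear" claim is easy to verify: two stacked linear layers collapse algebraically into one. A sketch with made-up weights:

```python
# Two "layers" with no activation: y = w2 * (w1 * x + b1) + b2.
# Expanding: y = (w2*w1) * x + (w2*b1 + b2) -- still one straight line.
w1, b1 = 3.0, 1.0   # layer 1 (hypothetical weights)
w2, b2 = -2.0, 0.5  # layer 2

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    return (w2 * w1) * x + (w2 * b1 + b2)  # the collapsed equivalent

# Identical everywhere: stacking linear layers buys nothing.
for x in [-2.0, 0.0, 1.5, 7.0]:
    assert two_linear_layers(x) == one_linear_layer(x)
print("linear + linear collapses to linear")
```

Insert a sigmoid between the two layers and the collapse no longer works; that nonlinearity is what lets depth add new shapes.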

Step 4: Stack Layers -> "Deep" Learning

Each layer detects patterns in the previous layer's output. Simple to Complex.

Layer 1: Detects edges, gradients, simple textures
-> "Is this pixel brighter than its neighbor?"
Later layers combine these simple features into textures, shapes, and whole structures.
"Deep" = more layers = more abstract features

In medical imaging:
pixels -> edges -> textures -> anatomy -> pathology
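Depth is also easy to count: each fully-connected layer adds (inputs x outputs) weights plus one bias per output. A sketch with hypothetical layer sizes for a flattened 28x28 image:

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output neuron."""
    return n_in * n_out + n_out

# Hypothetical 3-layer network: flattened 28x28 input -> 128 -> 64 -> 2 classes.
sizes = [784, 128, 64, 2]
total = sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # -> 108866
```

Note how the first layer dominates the count; that imbalance is part of why fully-connected layers scale badly on images, as the next step shows.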

Step 5: Images Are Special

A fully-connected layer treats every pixel independently. But images have spatial structure -- nearby pixels matter more than distant ones. Let's see why that's a problem.

Demo 1: Shuffle the pixels

A fully-connected network can't tell the difference between these two images. It just sees a list of numbers.

Original
Pixels shuffled

An FC layer treats both the same -- it doesn't know about spatial arrangement.
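The "just a list of numbers" point can be made precise: an FC unit computes a dot product over the flattened pixels, and a dot product does not care about order. Shuffle the pixels and the weights with the same permutation and the output is unchanged. A sketch:

```python
import random

# A fully-connected unit computes a dot product over a FLAT list of pixels.
# Apply the same permutation to pixels and weights and the output is identical:
# the layer has no notion of where a pixel sits in the image.
random.seed(0)
pixels  = [random.random() for _ in range(16)]   # a tiny 4x4 "image", flattened
weights = [random.random() for _ in range(16)]

perm = list(range(16))
random.shuffle(perm)

original = sum(w * p for w, p in zip(weights, pixels))
shuffled = sum(weights[i] * pixels[i] for i in perm)  # same pairs, new order
print(abs(original - shuffled) < 1e-9)  # -> True
```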

Demo 2: Neighbors matter

Highlight a pixel and see its 3x3 neighborhood. Edges, textures, anatomy -- all defined by LOCAL patterns.

Hover over the image to see a pixel's neighborhood.

Why fully-connected fails for images:

(a) Too many parameters

A 224x224 image with 128 neurons = 6.4 million weights in ONE layer. Most are wasted.

(b) Ignores spatial structure

Pixel [0,0] connects to the same neuron as pixel [223,223]. The network can't know they're far apart.

(c) No translation invariance

A tumor in the top-left activates totally different weights than the same tumor in the bottom-right.

Images have structure. Fully-connected layers throw it away.

Step 6: The Fix -- Convolution

Convolution solves all three problems from Step 5. Instead of connecting every pixel to every neuron, use a small sliding window.

Local connections -> Respects spatial structure
Each neuron only sees a 3x3 neighborhood. Nearby pixels, not distant ones.
Weight sharing -> Translation invariance
Same kernel everywhere. A tumor looks the same wherever it appears.
Fewer parameters -> Efficient learning
9 weights per kernel instead of millions. Less data needed to train.

See the difference:

Fully Connected (28x28 input, 32 neurons): 25,088 weights -- every pixel -> every neuron

Conv 3x3 (32 filters): 288 weights -- 9 per kernel, shared across the whole image

Drag the image size slider to see the fully-connected count explode while convolution stays flat.
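The slider's arithmetic is simple enough to write out. Assuming a 28x28 input with 32 output units (FC) or 32 3x3 filters (conv), a sketch:

```python
def fc_weights(side, n_units):
    """Fully connected: every pixel feeds every unit."""
    return side * side * n_units

def conv_weights(kernel, n_filters):
    """Convolution: one small kernel per filter, shared across the whole image."""
    return kernel * kernel * n_filters

print(fc_weights(28, 32))   # -> 25088
print(conv_weights(3, 32))  # -> 288

# Grow the image: FC explodes, conv stays flat (it never sees the image size).
for side in (28, 112, 224):
    print(side, fc_weights(side, 32), conv_weights(3, 32))
```

At 224x224 the FC count passes 1.6 million weights while the conv count is still 288.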

Step 7: Kernel Explorer

A convolution kernel is a small grid of weights that slides across the image. Different weights detect different patterns.

Choose a kernel:

Kernel values:

These 9 numbers ARE the "weights" the CNN learns. Different weights = different features detected.

Or build your own:

Original -> Filtered:

A CNN learns these kernels automatically from data!
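The sliding-window operation itself fits in a few lines. A plain-Python sketch of a valid 3x3 convolution (strictly speaking a cross-correlation, as in most deep learning libraries), applied with a classic vertical-edge kernel to a tiny image whose right half is bright:

```python
def conv2d(image, kernel):
    """Slide a 3x3 kernel over the image; each output is a weighted sum
    of the 3x3 neighborhood (valid padding, stride 1)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(kernel[di][dj] * image[i + di][j + dj]
                    for di in range(3) for dj in range(3))
            row.append(s)
        out.append(row)
    return out

# Vertical-edge kernel on a dark-left / bright-right image.
image = [[0, 0, 0, 1, 1, 1]] * 4
edge  = [[-1, 0, 1],
         [-1, 0, 1],
         [-1, 0, 1]]
print(conv2d(image, edge))  # nonzero only where dark meets bright
```

The output is zero over the flat regions and large exactly at the boundary column; a CNN learns kernels like `edge` from data instead of having them hand-designed.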

Step 8: Pooling -- Shrink but Remember

After convolution, we downsample with pooling. Keep the strongest activations, discard the rest. The image shrinks but the features get richer.

Input grid (random values):

After max pooling:

Max pooling: In each region, keep only the maximum value.
The dimmed values are discarded. The green values survive.

Size progression in a real CNN:

Each pooling layer halves the spatial dimensions. The features get compressed spatially but become more meaningful.

Max pooling keeps the strongest activations. With each layer, the image shrinks but the features get richer.
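The pooling demo above is a few lines of code. A sketch of 2x2 max pooling with stride 2 on a small grid:

```python
def max_pool_2x2(grid):
    """2x2 max pooling, stride 2: keep the strongest activation per region."""
    out = []
    for i in range(0, len(grid), 2):
        row = []
        for j in range(0, len(grid[0]), 2):
            row.append(max(grid[i][j],     grid[i][j + 1],
                           grid[i + 1][j], grid[i + 1][j + 1]))
        out.append(row)
    return out

grid = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 3]]
print(max_pool_2x2(grid))  # -> [[4, 2], [2, 7]]
```

Each 2x2 region survives only through its maximum; the other three values are discarded, which is exactly how the output ends up half the size in each dimension.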

Step 9: The Full CNN Pipeline

Now put it all together. Click each stage to see what it does and how the data shape changes.


pixels -> edges -> textures -> shapes -> anatomy -> diagnosis.
The network builds this hierarchy automatically.
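You can trace how the data shape changes through the pipeline with simple shape arithmetic. A sketch for a hypothetical small CNN on a 224x224 grayscale image, assuming 3x3 "same"-padded convolutions and 2x2 pooling:

```python
def conv_shape(h, w, c_out, kernel=3, pad=1):
    """'Same' 3x3 convolution: padding keeps height/width, changes channels."""
    return (h + 2 * pad - kernel + 1, w + 2 * pad - kernel + 1, c_out)

def pool_shape(h, w, c):
    """2x2 max pooling halves height and width, keeps channels."""
    return (h // 2, w // 2, c)

# Hypothetical pipeline: three conv+pool stages on a 224x224x1 image.
h, w, c = 224, 224, 1
for c_out in (32, 64, 128):
    h, w, c = conv_shape(h, w, c_out)  # conv: same size, more channels
    h, w, c = pool_shape(h, w, c)      # pool: half the size
    print((h, w, c))
# -> (112, 112, 32), (56, 56, 64), (28, 28, 128)
```

Spatial size shrinks while channel count grows: exactly the pixels -> edges -> textures -> shapes hierarchy, expressed as tensor shapes.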