The foundation of everything. Use MediaPipe Hands — it gives you 21 3D landmarks per hand at 30fps on a laptop. It's the most reliable free option available and doesn't require a GPU.
// CORE LANDMARK REFERENCE
WRIST = 0
THUMB_TIP = 4
INDEX_TIP = 8
INDEX_MCP = 5
MIDDLE_TIP = 12
RING_TIP = 16
PINKY_TIP = 20
def get_finger_pos(landmark, frame_w, frame_h):
    # Convert a normalized MediaPipe landmark to pixel coordinates
    return int(landmark.x * frame_w), int(landmark.y * frame_h)
GESTURE DETECTION SYSTEM
Don't use ML models for gesture detection. Use geometric logic on landmark positions — it's instant, reliable, and transparent to debug.
☝️
DRAW MODE
Index finger extended only
Check: INDEX_TIP.y < INDEX_MCP.y AND all other finger tips below their MCP joints. Track index tip position.
✊
ERASE
All fingers curled (fist)
Check: all 4 fingertips have y-position GREATER than their PIP joints. Bounding box of fist acts as eraser radius.
🖐️
MENU / PAUSE
Open palm — all 5 extended
Check: all 5 fingertips above their MCP knuckles. Hold for 0.4s to confirm (prevents accidental trigger).
🤏
PINCH / SELECT
Thumb + Index close together
Distance between THUMB_TIP and INDEX_TIP < threshold (~30px). Used for menu item selection and confirm actions.
✌️
MOVE MODE
Index + Middle extended
Both index and middle tips above MCP, ring and pinky curled. Use midpoint between two fingers as cursor — prevents accidental drawing.
👍
UNDO / CONFIRM
Thumbs up gesture
Thumb tip above wrist, all fingers curled. Hold for 0.5s. Assign to undo last stroke — easy to trigger intentionally.
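The pinch and thumbs-up checks above reduce to a few lines of landmark geometry. A minimal sketch, assuming MediaPipe's landmark indices and the ~30 px pinch threshold suggested above (helper names are illustrative):

import math

THUMB_TIP, INDEX_TIP, WRIST = 4, 8, 0

def is_pinch(lm, frame_w, frame_h, threshold_px=30):
    # True when thumb tip and index tip are closer than threshold_px
    dx = (lm[THUMB_TIP].x - lm[INDEX_TIP].x) * frame_w
    dy = (lm[THUMB_TIP].y - lm[INDEX_TIP].y) * frame_h
    return math.hypot(dx, dy) < threshold_px

def is_thumbs_up(lm):
    # Thumb tip above the wrist (smaller y) while the four fingertips
    # sit below their PIP joints (curled)
    fingers_curled = all(lm[tip].y > lm[pip].y
                         for tip, pip in [(8, 6), (12, 10), (16, 14), (20, 18)])
    return lm[THUMB_TIP].y < lm[WRIST].y and fingers_curled

Note the landmarks are normalized to [0, 1], so the pinch distance must be scaled by the frame size before comparing against a pixel threshold.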
// GESTURE DETECTOR CLASS SKELETON
class GestureDetector:
    def __init__(self):
        self.gesture_history = []
        self.hold_start = None
        self.HOLD_THRESHOLD = 0.4

    def fingers_up(self, lm) -> list:
        # The thumb extends sideways, so a y-comparison is unreliable for it.
        # Compare horizontal distance from the pinky side of the palm instead.
        thumb = 1 if abs(lm[4].x - lm[17].x) > abs(lm[3].x - lm[17].x) else 0
        tips = [8, 12, 16, 20]
        pips = [6, 10, 14, 18]
        return [thumb] + [1 if lm[tips[i]].y < lm[pips[i]].y else 0
                          for i in range(4)]

    def detect(self, lm) -> str:
        f = self.fingers_up(lm)
        if f == [0, 1, 0, 0, 0]: return "DRAW"
        elif f == [0, 0, 0, 0, 0]: return "ERASE"
        elif f == [1, 1, 1, 1, 1]: return "MENU"
        elif f == [0, 1, 1, 0, 0]: return "MOVE"
        return "NONE"
Critical stability tip: Apply a gesture debounce buffer — only register a gesture as "confirmed" if it appears in 5 consecutive frames. This alone eliminates 90% of false triggers.
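The consecutive-frame debounce can be sketched as a small wrapper around the detector (class name and window size are illustrative):

from collections import deque

class GestureDebouncer:
    # Only promote a gesture once it has filled N consecutive frames
    def __init__(self, frames=5):
        self.buffer = deque(maxlen=frames)
        self.confirmed = "NONE"

    def update(self, raw_gesture):
        self.buffer.append(raw_gesture)
        # Promote only when the whole window agrees on one gesture
        if len(self.buffer) == self.buffer.maxlen and len(set(self.buffer)) == 1:
            self.confirmed = raw_gesture
        return self.confirmed

A single noisy frame never flips the confirmed gesture, which is exactly the behavior the tip above asks for.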
The drawing engine is where visual quality is won or lost. Raw fingertip coordinates are jittery. You need two layers of smoothing: positional (where the cursor is) and stroke (how lines are drawn).
// EXPONENTIAL SMOOTHING (Position)
alpha = 0.5          # 0 = frozen, 1 = raw input; tune between roughly 0.3 and 0.6
sx, sy = 0, 0

def smooth_pos(raw_x, raw_y):
    global sx, sy
    sx = alpha * raw_x + (1 - alpha) * sx
    sy = alpha * raw_y + (1 - alpha) * sy
    return int(sx), int(sy)
// CATMULL-ROM SPLINE (Stroke Quality)
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_stroke(points):
    if len(points) < 4:
        return points               # a cubic fit needs at least k+1 points
    pts = np.array(points, dtype=float).T
    tck, _ = splprep(pts, s=10, k=3)
    t_new = np.linspace(0, 1, 100)
    return list(zip(*splev(t_new, tck)))
CANVAS ARCHITECTURE
Use a dual-layer canvas system. This is how real drawing apps work internally.
Layer 1
PERSISTENT CANVAS
A NumPy array (same size as webcam frame) that stores all completed strokes. Black background by default. Blended onto the camera feed at ~70% opacity for a ghosted, futuristic look.
Layer 2
LIVE STROKE BUFFER
A temporary array for the current in-progress stroke. When the user lifts their finger (gesture changes from DRAW), this buffer is composited into Layer 1 and cleared. Enables undo at the stroke level.
Layer 3
UI OVERLAY
A transparent RGBA layer for cursor, gesture feedback, menus, and HUD elements. Rendered on top of everything last. Never pollutes the drawing canvas.
Merge Step
FRAME COMPOSITOR
Each frame: flip webcam → blend Layer 1 → render live stroke → apply UI layer → display. Keep this in a single function. Order matters — never mix layers.
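The merge step above can be sketched in plain NumPy (in the real app `cv2.flip` and `cv2.addWeighted` do the same work; the function name and layer arguments here are illustrative):

import numpy as np

def compose_frame(frame, persistent, live_stroke, ui_rgba, alpha=0.7):
    # Fixed order: mirror -> blend canvas -> live stroke -> UI overlay
    out = frame[:, ::-1].copy()                   # horizontal flip (mirror view)
    mask = persistent.any(axis=2)                 # pixels holding committed strokes
    out[mask] = ((1 - alpha) * out[mask]
                 + alpha * persistent[mask]).astype(np.uint8)
    live = live_stroke.any(axis=2)                # current stroke at full opacity
    out[live] = live_stroke[live]
    a = ui_rgba[:, :, 3:4] / 255.0                # alpha-composite the RGBA UI last
    return (ui_rgba[:, :, :3] * a + out * (1 - a)).astype(np.uint8)

Keeping this in one function makes the layer order impossible to get wrong from call sites.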
// CANVAS MANAGER SKELETON
class CanvasManager:
    def __init__(self, w, h):
        self.canvas = np.zeros((h, w, 3), dtype=np.uint8)
        self.stroke = []
        self.history = []
        self.color = (0, 229, 255)
        self.thickness = 3

    def add_point(self, pt):
        self.stroke.append(pt)
        if len(self.stroke) >= 2:
            cv2.line(self.canvas, self.stroke[-2],
                     self.stroke[-1], self.color, self.thickness)

    def commit_stroke(self):
        if self.stroke:
            self.history.append(self.stroke.copy())
            self.stroke.clear()

    def erase_at(self, pt, radius=30):
        # Punch a black hole in the canvas around the fist position
        cv2.circle(self.canvas, pt, radius, (0, 0, 0), -1)

    def undo(self):
        if not self.history:
            return
        self.history.pop()
        # Rebuild from history. Note: strokes are redrawn in the current
        # color; store (points, color, thickness) per stroke to keep styles.
        self.canvas = np.zeros_like(self.canvas)
        for stroke in self.history:
            for i in range(1, len(stroke)):
                cv2.line(self.canvas, stroke[i-1], stroke[i],
                         self.color, self.thickness)

    def composite(self, frame, alpha=0.7):
        mask = self.canvas > 0
        frame[mask] = cv2.addWeighted(
            frame, 1 - alpha, self.canvas, alpha, 0)[mask]
        return frame
Air writing has one hard problem: segmentation (knowing when a letter starts and ends). Solve this first, then recognition becomes simple.
RECOMMENDED APPROACH: HOG + SVM
Skip CNNs for now. A Histogram of Oriented Gradients + SVM pipeline achieves 85–92% accuracy on air-drawn letters, trains in seconds, and runs at full speed on CPU. You can always upgrade to CNN later.
Step 1
SEGMENTATION
Use a pause detector: if the index finger hasn't moved more than 15px for 0.8 seconds, treat that as "letter done." Show a countdown ring in the UI as visual feedback. Reset the stroke buffer after recognition fires.
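The pause detector above can be sketched as a small state tracker. Thresholds mirror the ones suggested in this step; `time.monotonic` is used so wall-clock adjustments can't distort the window, and the clock is injectable purely for testing:

import math
import time

class PauseDetector:
    # Fire once when the fingertip stays within min_dist px for pause_t seconds
    def __init__(self, pause_t=0.8, min_dist=15, clock=time.monotonic):
        self.pause_t = pause_t
        self.min_dist = min_dist
        self.clock = clock
        self.anchor = None          # position where the current pause started
        self.since = None

    def update(self, x, y):
        now = self.clock()
        if (self.anchor is None or
                math.hypot(x - self.anchor[0], y - self.anchor[1]) > self.min_dist):
            self.anchor, self.since = (x, y), now   # moved: restart the window
            return False
        if now - self.since >= self.pause_t:
            self.anchor, self.since = None, None    # fire once, then reset
            return True
        return False

Call `update` with the smoothed index-tip position every frame; a `True` return is the "letter done" signal.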
Step 2
NORMALIZATION
Take the bounding box of the drawn stroke. Resize it to a fixed 64×64 canvas. Center it. This removes position and scale variance — your model doesn't care where on screen you drew, only the shape.
Step 3
FEATURE EXTRACTION
Extract HOG features from the 64×64 image. HOG captures directional strokes perfectly — it's literally built for this. Results in a ~1764-dimensional feature vector per letter.
Step 4
CLASSIFICATION
Train an SVM (rbf kernel) on your own air-drawn samples. Collect 15–20 samples per letter (A–Z + digits). Takes ~5 minutes of collection, ~2 seconds to train. Serialize with joblib.
// LETTER RECOGNIZER PIPELINE
import cv2
import numpy as np
import joblib
from skimage.feature import hog
from sklearn.svm import SVC

class LetterRecognizer:
    def __init__(self):
        self.model = None
        self.IMG_SIZE = (64, 64)
        self.pause_t = 0.8
        self.min_dist = 15

    def preprocess(self, stroke_points) -> np.ndarray:
        # Shift the stroke to the origin first: raw points are in full-frame
        # pixel coords and would clip on a fixed-size scratch canvas.
        xs = [p[0] for p in stroke_points]
        ys = [p[1] for p in stroke_points]
        pts = [(x - min(xs) + 5, y - min(ys) + 5) for x, y in stroke_points]
        tmp = np.zeros((max(ys) - min(ys) + 10, max(xs) - min(xs) + 10),
                       dtype=np.uint8)
        for i in range(1, len(pts)):
            cv2.line(tmp, pts[i-1], pts[i], 255, 3)
        x, y, w, h = cv2.boundingRect(cv2.findNonZero(tmp))
        resized = cv2.resize(tmp[y:y+h, x:x+w], self.IMG_SIZE)
        return hog(resized, orientations=9,
                   pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    def predict(self, stroke_points) -> str:
        feat = self.preprocess(stroke_points)
        return self.model.predict([feat])[0]

    def train(self, X, y):
        self.model = SVC(kernel='rbf', C=10, probability=True)
        self.model.fit(X, y)
        joblib.dump(self.model, 'models/letter_svm.pkl')
Alternative shortcut: If you don't want to train a model at all, use template matching — stroke direction sequence (up/down/left/right/diagonal) encoded as a string, matched against a dictionary. 70% accuracy, zero training. Good enough for MVP demo.
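The direction-sequence shortcut can be sketched like this: quantize each stroke segment into one of 8 compass directions, collapse repeats, and fuzzy-match against a hand-built template dictionary. The templates below are illustrative, not a full alphabet:

import math
import difflib

SECTORS = ["R", "UR", "U", "UL", "L", "DL", "D", "DR"]   # 8 compass directions

def direction_string(points, min_step=10):
    # Encode a stroke as 8-way directions, collapsing consecutive repeats
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if math.hypot(x1 - x0, y1 - y0) < min_step:
            continue                                  # ignore jitter-sized moves
        ang = math.atan2(y0 - y1, x1 - x0)            # flip y: screen y grows downward
        label = SECTORS[int(round(ang / (math.pi / 4))) % 8]
        if not out or out[-1] != label:
            out.append(label)
    return "-".join(out)

# Tiny illustrative template set; a real one would cover A-Z and digits
TEMPLATES = {"L": "D-R", "V": "DR-UR", "N": "U-DR-U"}

def match_letter(points, min_ratio=0.6):
    code = direction_string(points)
    best = max(TEMPLATES,
               key=lambda k: difflib.SequenceMatcher(None, code, TEMPLATES[k]).ratio())
    ratio = difflib.SequenceMatcher(None, code, TEMPLATES[best]).ratio()
    return best if ratio >= min_ratio else "?"

Fuzzy matching via `difflib` tolerates an extra or missing direction, which raw string equality would not.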
The best visual trick in this whole project: user draws a rough shape, system snaps it to a perfect geometric form. This alone makes it feel like a real product. Use OpenCV contour analysis — no ML needed.
// SHAPE DETECTION PIPELINE
class ShapeDetector:
    def detect_and_snap(self, canvas, stroke_pts) -> str:
        tmp = np.zeros_like(canvas[:, :, 0])
        for i in range(1, len(stroke_pts)):
            cv2.line(tmp, stroke_pts[i-1], stroke_pts[i], 255, 3)
        contours, _ = cv2.findContours(
            tmp, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return "NONE"
        c = max(contours, key=cv2.contourArea)
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.04 * peri, True)
        verts = len(approx)
        if verts == 2:
            return self.snap_line(canvas, approx)
        elif verts == 3:
            return self.snap_triangle(canvas, approx)
        elif verts == 4:
            return self.snap_rect(canvas, approx)
        elif verts > 8:
            return self.snap_circle(canvas, c)
        return "NONE"    # 5-8 vertices: ambiguous, leave the freehand stroke

    def snap_circle(self, canvas, contour):
        (x, y), r = cv2.minEnclosingCircle(contour)
        cv2.circle(canvas, (int(x), int(y)), int(r),
                   (0, 229, 255), 2)
        return "CIRCLE"

    def snap_rect(self, canvas, approx):
        x, y, w, h = cv2.boundingRect(approx)
        cv2.rectangle(canvas, (x, y), (x + w, y + h),
                      (0, 229, 255), 2)
        return "RECT"
After snapping, erase the freehand stroke and replace it with the clean geometric version. Add a brief flash animation on the snap moment — this is the "wow" effect. The transition from rough → clean happens in one frame and looks like magic.
| Shape     | Detection Method            | Vertex Count | Snap Function                 |
|-----------|-----------------------------|--------------|-------------------------------|
| LINE      | approxPolyDP                | 2 vertices   | cv2.line (fitted endpoints)   |
| TRIANGLE  | approxPolyDP                | 3 vertices   | cv2.polylines (3 perfect pts) |
| RECTANGLE | approxPolyDP + aspect ratio | 4 vertices   | cv2.rectangle (bounding)      |
| SQUARE    | 4 vertices + AR ≈ 1.0       | 4 vertices   | cv2.rectangle (equal sides)   |
| CIRCLE    | >8 vertices + circularity   | 8+ vertices  | cv2.minEnclosingCircle        |
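The circularity test referenced in the table compares a contour to a perfect circle: circularity = 4πA / P², which equals 1.0 for a circle and drops for elongated or angular shapes. A sketch (the 0.8 threshold is illustrative; in the pipeline above, area and perimeter come from `cv2.contourArea` and `cv2.arcLength`):

import math

def circularity(area, perimeter):
    # 4*pi*A / P^2: 1.0 for a perfect circle, lower for anything else
    if perimeter == 0:
        return 0.0
    return 4 * math.pi * area / (perimeter ** 2)

def looks_like_circle(area, perimeter, vertex_count, min_circ=0.8):
    # Combine the vertex-count heuristic from the table with circularity
    return vertex_count > 8 and circularity(area, perimeter) >= min_circ

A square scores π/4 ≈ 0.785, so the 0.8 cutoff already separates it from a circle.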
The UI is what separates a student project from a product demo. Every visual element must feel intentional. Build this in OpenCV using transparent overlay compositing with RGBA blending.
Element 1
CURSOR DESIGN
Don't draw a dot. Draw a crosshair ring: outer circle (thin, dim) + inner dot + 4 tick marks. Color changes by gesture: cyan = draw, red = erase, amber = menu. Add a trailing ghost that fades over 8 frames (store last N positions, draw with decreasing opacity).
Element 2
GESTURE FEEDBACK HUD
Bottom-left corner: show current gesture name in a monospace font with a colored status LED. Add a hold-progress arc that fills around the LED when a hold-gesture is being confirmed. Disappears 1s after gesture changes.
Element 3
FLOATING TOOL MENU
Triggered by open palm. Renders as a radial arc of 5–6 tool icons centered on the hand position. Use circles with icons drawn via cv2 primitives. Pinch on a tool to select. Menu fades out after 2s or on selection.
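Laying out the radial arc reduces to placing icon centers on a circle around the palm point; the icons themselves are then drawn with cv2 primitives at those positions. A minimal sketch (the arc span and radius are illustrative):

import math

def radial_layout(cx, cy, n_items, radius=90, start_deg=200, end_deg=340):
    # Return (x, y) centers for n_items icons on an arc around (cx, cy)
    positions = []
    for i in range(n_items):
        frac = i / (n_items - 1) if n_items > 1 else 0.5
        ang = math.radians(start_deg + frac * (end_deg - start_deg))
        positions.append((int(cx + radius * math.cos(ang)),
                          int(cy + radius * math.sin(ang))))
    return positions

With the default 200°-340° span the icons fan out above the hand, which keeps them clear of the fingers doing the pinch-select.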
Element 4
TEXT RECOGNITION POPUP
When a letter is recognized, show it in a large glowing overlay that fades in quickly and slides to the text output strip at the top of screen. Add a subtle flash effect on the moment of recognition.
Element 5
SHAPE SNAP FLASH
On shape snap: one white flash frame, then the clean shape fades in with a glow bloom effect. Render the shape label (CIRCLE, RECT) in small monospace text near the shape center for 1.5 seconds.
Element 6
HEADER HUD STRIP
Top bar: AIRBOARD PRO logo left, current mode center (DRAW / ERASE / SELECT), FPS counter right. Text output strip shows recognized letters building up in real time. Height: 40px, semi-transparent dark background.
// OVERLAY RENDERER — TRANSPARENT BLEND
def draw_overlay(frame, overlay_rgba):
    alpha = overlay_rgba[:, :, 3] / 255.0
    alpha = np.stack([alpha] * 3, axis=-1)
    frame[:] = (overlay_rgba[:, :, :3] * alpha
                + frame * (1 - alpha)).astype(np.uint8)
    return frame

def draw_cursor(overlay, x, y, gesture, trail=None):
    # Avoid a mutable default arg: a shared list would leak state across calls
    trail = trail or []
    color_map = {"DRAW": (0, 229, 255), "ERASE": (255, 60, 60),
                 "MENU": (255, 171, 0), "MOVE": (180, 180, 255)}
    col = color_map.get(gesture, (200, 200, 200))
    for i, pt in enumerate(trail[-8:]):
        a = int(30 * (i / 8))
        cv2.circle(overlay, pt, 3, (*col, a), -1)
    cv2.circle(overlay, (x, y), 18, (*col, 180), 1)
    cv2.circle(overlay, (x, y), 3, (*col, 255), -1)
    for dx, dy in [(0, -24), (0, 24), (-24, 0), (24, 0)]:
        cv2.line(overlay, (x, y), (x + dx // 2, y + dy // 2), (*col, 120), 1)
// FILE STRUCTURE
airboard_pro/
├── core/
│ ├── hand_tracker.py
│ ├── gesture_detector.py
│ ├── canvas_manager.py
│ └── stroke_smoother.py
├── recognition/
│ ├── shape_detector.py
│ ├── letter_recognizer.py
│ └── data_collector.py
├── ui/
│ ├── renderer.py
│ ├── menu.py
│ └── hud.py
├── models/
│ └── letter_svm.pkl
├── assets/
│ └── sounds/
├── config.py
├── app.py
└── requirements.txt
// DATA FLOW (each frame)
STEP 01
CAPTURE
Webcam frame → flip horizontal → BGR copy for display
STEP 02
TRACK
MediaPipe → 21 landmarks → convert to pixel coords → get index tip
STEP 03
GESTURE
Landmark geometry → gesture string → debounce buffer → confirmed gesture
STEP 04
ACTION
Route gesture → CanvasManager / ShapeDetector / LetterRecognizer
STEP 05
RENDER
Composite canvas → apply UI overlays → render cursor → show FPS
STEP 06
DISPLAY
cv2.imshow → cv2.waitKey(1) → loop
// app.py — MAIN LOOP SKELETON
def main():
    cam = cv2.VideoCapture(0)
    tracker = HandTracker()
    gesture = GestureDetector()
    canvas = CanvasManager(W, H)
    shapes = ShapeDetector()
    letters = LetterRecognizer()
    renderer = Renderer(W, H)
    prev_gesture = None
    text_output = ""
    g = "NONE"                      # avoid NameError on frames with no hand
    while True:
        ret, frame = cam.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)
        landmarks = tracker.process(frame)
        if landmarks:
            g = gesture.detect(landmarks)
            tip = get_finger_pos(landmarks[8], W, H)
            if g == "DRAW": canvas.add_point(tip)
            elif g == "ERASE": canvas.erase_at(tip, radius=30)
            elif g == "MENU": renderer.show_menu(tip)
            if prev_gesture == "DRAW" and g != "DRAW":
                canvas.commit_stroke()
                # In the full app, route the committed stroke to either
                # shapes.detect_and_snap or letters.predict based on mode.
                result = letters.predict(canvas.history[-1])
                text_output += result
            prev_gesture = g
        out = canvas.composite(frame.copy())
        out = renderer.render_ui(out, gesture=g, text=text_output)
        cv2.imshow("AIRBOARD PRO", out)
        if cv2.waitKey(1) == 27:
            break
    cam.release()
    cv2.destroyAllWindows()
GHOST DRAWING
High Priority
System draws when you're just moving your hand (not intending to draw). Most common frustration in demos.
Fix: Add a "draw intent zone" — only draw when index tip is in the lower 70% of frame. Add a 3-frame gesture confirmation buffer before drawing starts. Show a "ready" indicator before draw mode activates.
TRACKING LOSS
MediaPipe briefly loses tracking when the hand moves fast or reaches the frame edge. Causes stroke breaks or phantom lines to (0,0).
Fix: Store last_valid_position. If landmarks not found for <5 frames, use last position (freeze cursor). If >5 frames: commit current stroke and wait. Never connect a fresh detection to the last known point without distance check.
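The fix above can be sketched as a small guard object (the frame budget and jump threshold are illustrative values):

import math

class TrackingGuard:
    # Freeze the cursor through short dropouts; commit the stroke on long ones
    def __init__(self, max_gap_frames=5, max_jump_px=150):
        self.max_gap = max_gap_frames
        self.max_jump = max_jump_px
        self.last_pos = None
        self.gap = 0

    def update(self, pos):
        # pos is (x, y) or None when no hand was detected.
        # Returns (cursor_pos_or_None, commit_stroke_flag).
        if pos is None:
            self.gap += 1
            if self.gap > self.max_gap:
                self.last_pos = None
                return None, True          # long dropout: commit and wait
            return self.last_pos, False    # short dropout: freeze the cursor
        commit = False
        if self.last_pos is not None and self.gap > 0:
            # Fresh detection after a gap: never connect across a big jump
            if math.hypot(pos[0] - self.last_pos[0],
                          pos[1] - self.last_pos[1]) > self.max_jump:
                commit = True
        self.gap = 0
        self.last_pos = pos
        return pos, commit

Feed it the raw detection result every frame and route the commit flag to `CanvasManager.commit_stroke`.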
GESTURE FLICKER
Transitioning between gestures creates ambiguous frames where the wrong gesture fires briefly, causing the eraser to trigger while drawing, and similar glitches.
Fix: Debounce all gestures with a 5-frame voting buffer. Only change active gesture if 4/5 recent frames agree. Add a "gesture lock" period of 0.3s after any gesture change where the system doesn't accept new gesture changes.
LETTER CONFUSION
Similar letters (I/L, O/C, U/V) get misclassified. HOG can't always distinguish slight shape variations.
Fix: Collect more samples for confused pairs. Add a confidence threshold — only accept predictions above 75%. For low-confidence predictions, show top 3 options in UI and let user pinch-select the correct one.
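Confidence gating works because the SVM above is trained with probability=True, which enables scikit-learn's predict_proba. A sketch with an illustrative threshold and a stand-in model interface:

def predict_with_confidence(model, features, threshold=0.75, top_k=3):
    # Return (letter, None) when confident, else (None, top-k candidates)
    probs = model.predict_proba([features])[0]
    ranked = sorted(zip(model.classes_, probs),
                    key=lambda cp: cp[1], reverse=True)
    best_class, best_p = ranked[0]
    if best_p >= threshold:
        return best_class, None
    # Low confidence: surface the runners-up for pinch-selection in the UI
    return None, [(c, float(p)) for c, p in ranked[:top_k]]

The UI then shows the candidate list only on the low-confidence path, so confident predictions stay frictionless.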
LIGHTING SENSITIVITY
Low-Med
MediaPipe hand detection degrades in low light or with backlighting, causing jitter and frequent tracking loss.
Fix: Apply CLAHE (histogram equalization) to the webcam input frame before passing to MediaPipe. Add an on-screen lighting warning if average frame brightness < 80. Suggest user turn on overhead light in startup message.
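Both halves of the fix are short. The brightness check is plain NumPy; CLAHE proper comes from cv2.createCLAHE, applied to the L channel of LAB so local contrast improves without the blown-out highlights global equalization can produce (the clip limit and tile size below are OpenCV's common defaults, used here as illustrative values):

import numpy as np

def too_dark(frame_bgr, min_brightness=80):
    # Mean-brightness check driving the on-screen lighting warning
    return float(frame_bgr.mean()) < min_brightness

def enhance_for_tracking(frame_bgr, clip_limit=2.0, tile=(8, 8)):
    # CLAHE on the luma channel so MediaPipe sees a better-lit hand
    import cv2  # deferred so the brightness check works without OpenCV
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    enhanced = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)

Run `enhance_for_tracking` on the frame copy that goes to MediaPipe, not the display copy, so the on-screen feed keeps its natural colors.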
SEGMENTATION TIMING
The pause-based letter segmentation fires too early (cuts a stroke mid-letter) or too late (merges two letters).
Fix: Make the pause threshold configurable (default 0.8s). Add a visual countdown ring around the cursor during the pause window so the user knows when recognition will fire. Optionally allow a manual trigger, e.g. a wink/blink, as an alternative segmentation signal (note this requires adding face tracking alongside the hand model).
MVP CORE
// TARGET: 3–5 DAYS
Hand tracking with MediaPipe (1 hand)
3 gestures: Draw (index), Erase (fist), Clear (open palm)
Real-time drawing on canvas overlay
Basic exponential position smoothing
Eraser that clears a radius around fist
Save canvas as PNG (press S key)
Simple shape snap (circle + rectangle)
Gesture name HUD (bottom corner)
Futuristic cursor design (ring + crosshair)
ADVANCED V2
// TARGET: DAY 6–10
Air writing A–Z + digits with HOG+SVM model
Radial floating tool menu (open palm trigger)
Color palette selector via gesture
Stroke undo history (thumbs up)
Full shape library (triangle, line, circle, rect)
Pinch-to-select tool from menu
Finger trail animation (motion blur effect)
Shape snap flash animation
Text output strip with recognized letters
Sound feedback on gesture confirm (optional)
Build Rule: The MVP must be a working demo by Day 4. Everything after Day 4 is polish and features. A working MVP shown well always beats an unfinished advanced version.
1
DAY 1 — FOUNDATION
HAND TRACKER + GESTURE ENGINE
Set up project structure. Install all deps. Get MediaPipe working with webcam. Extract landmark positions. Build GestureDetector with 3 gestures. Log gesture name to console. Test debounce buffer. End of day goal: gesture name prints stably without false fires.
2
DAY 2 — DRAWING ENGINE
CANVAS MANAGER + SMOOTHING
Build CanvasManager. Implement add_point, commit_stroke, undo, erase_at. Add exponential position smoothing. Test drawing quality. Implement canvas composite over webcam. Add minimum distance filter to stroke points. End of day goal: can draw smooth lines on screen with index finger.
3
DAY 3 — UI SYSTEM
CURSOR + HUD + OVERLAYS
Build Renderer class. Implement futuristic cursor (ring + ticks + trail). Add gesture HUD in bottom corner. Add header strip with mode + FPS. Implement color-coded cursor by gesture. Tune all visual parameters. End of day goal: project looks like a product, not a script. Record first demo clip.
4
DAY 4 — SHAPE RECOGNITION
CONTOUR DETECTION + SNAP EFFECT
Build ShapeDetector. Implement detect_and_snap for circle, rect, line, triangle. Wire into main loop on stroke commit. Add snap flash animation. Add shape label display. Test with different drawing styles. End of day goal: rough shapes snap to clean geometry reliably. MVP is feature-complete.
5
DAY 5 — DATA COLLECTION
LETTER TRAINING DATA + SVM
Build data_collector.py tool (press key to label + save stroke). Collect 20 samples each for A–Z. Preprocess with HOG pipeline. Train SVM. Test accuracy. Collect more data for confused pairs. End of day goal: model achieves >80% on your own handwriting. Save to models/letter_svm.pkl.
6
DAY 6 — AIR WRITING INTEGRATION
SEGMENTATION + TEXT OUTPUT
Wire LetterRecognizer into main loop. Implement pause-based segmentation. Add countdown ring visual during pause window. Build text output strip in HUD. Test full writing flow. Add confidence threshold gating. End of day goal: can write words in air and see them appear on screen.
7–10
DAYS 7–10 — POLISH + ADVANCED FEATURES
RADIAL MENU + UNDO + COLOR PALETTE
Add radial floating menu (open palm). Implement pinch-to-select. Add undo gesture. Build color picker tool. Add stroke width selector. Optimize performance (target 28+ FPS). Record final demo video with all features. Clean up code, add comments. Write README. End goal: portfolio-ready project with 2-minute demo video.
// Required Dependencies
opencv-python>=4.8
mediapipe>=0.10
numpy>=1.24
scikit-learn>=1.3
scikit-image>=0.21
scipy>=1.11
joblib>=1.3
pygame>=2.5
// Quick Start Command
pip install -r requirements.txt
python recognition/data_collector.py
python app.py
// Portfolio Demo Tips
Record in 1080p. Use a dark background behind you. Wear a plain-color sleeve (helps hand tracking contrast). Show all 3 moments: smooth drawing → shape snap → letter recognition. Add music. Keep demo under 90 seconds. Upload to YouTube + GitHub.