In Progress
Inkwell
React · TypeScript · Vite · WebGPU · ONNX Runtime · Hugging Face · perfect-freehand · KaTeX
Overview
Inkwell is a browser-native handwritten formula recognizer. You sketch a formula on a single ink sheet, pick a model, and hit Recognize. The cropped image is sent to a vision-language model (VLM) running entirely on WebGPU, and you get back rendered LaTeX plus the raw source. The load-bearing constraint is local-first: no backend, no API keys, and no data leaves the device. The interesting work is making modern VLMs actually run inside the browser: ONNX export, weight caching, WebGPU feature detection, and a clean adapter boundary so the rest of the app does not care which model is loaded.
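The WebGPU feature check could look roughly like the sketch below. It uses the standard `navigator.gpu.requestAdapter()` entry point, which resolves to `null` when no suitable GPU is available, and probes the optional `shader-f16` feature. The `detectWebGPU` name, the result shape, and the small structural types (used so the sketch compiles without `@webgpu/types`) are assumptions for illustration, not Inkwell's actual code.

```typescript
// Minimal structural stand-ins for the WebGPU types, so the sketch compiles
// without @webgpu/types (an assumption made for self-containment).
interface GPUAdapterLike {
  features: { has(name: string): boolean };
}
interface GPULike {
  requestAdapter(): Promise<GPUAdapterLike | null>;
}

// Hypothetical result shape consumed by the UI.
type WebGPUSupport =
  | { supported: true; shaderF16: boolean }
  | { supported: false; reason: string };

// Probe for WebGPU. `gpu` defaults to navigator.gpu in the browser; taking it
// as a parameter keeps the function testable outside a browser.
async function detectWebGPU(
  gpu: GPULike | undefined = (globalThis as unknown as { navigator?: { gpu?: GPULike } })
    .navigator?.gpu,
): Promise<WebGPUSupport> {
  if (!gpu) {
    return { supported: false, reason: "WebGPU is not available in this browser" };
  }
  try {
    // requestAdapter() resolves to null when no suitable GPU is found.
    const adapter = await gpu.requestAdapter();
    if (!adapter) {
      return { supported: false, reason: "No suitable GPU adapter found" };
    }
    // shader-f16 is optional; fp16 model variants typically rely on it.
    return { supported: true, shaderF16: adapter.features.has("shader-f16") };
  } catch (err) {
    return { supported: false, reason: `Adapter request failed: ${String(err)}` };
  }
}
```

Running a check like this before any weights are fetched lets the UI show an unsupported-device state immediately, rather than failing partway through a multi-hundred-MB download.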
What I built
- In-browser inference of vision-language models (LiquidAI LFM2.5-VL 450M, FastVLM 0.5B) via ONNX Runtime Web on WebGPU
- Runtime-swappable VLM adapter: pick the model from a dropdown, weights stream in with a live progress bar, and the rest of the pipeline stays unchanged
- Status-aware loading pipeline (idle → checking → loading → ready/error) so users see real feedback while a multi-hundred-MB model downloads and warms up
- Ink sheet built on perfect-freehand with pencil and eraser tools, adjustable stroke width, an ink color picker, and undo/clear
- Crop + rasterize step that computes a bounding box around the inked region and rasterizes only that area before inference, keeping latency reasonable on small models
- Validator between VLM output and the UI: raw model output is parsed and sanity-checked before LaTeX is rendered, so errors surface inline instead of producing garbage
- KaTeX render of the recognized formula plus a side panel showing the raw LaTeX source and a thumbnail of the exact crop the model saw
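The crop step can be sketched as a pure bounding-box computation over the stroke points. This assumes strokes are kept as arrays of `[x, y]` or `[x, y, pressure]` points (the input shape perfect-freehand accepts); `inkBounds`, the padding rule, and the default margin are hypothetical choices, not taken from the project.

```typescript
// Assumed stroke representation: perfect-freehand input points.
type Point = [number, number] | [number, number, number];
type Stroke = Point[];

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Bounding box of all ink, padded by half the stroke width plus a margin,
// snapped to whole pixels and clamped to the sheet. Null means nothing to crop.
function inkBounds(
  strokes: Stroke[],
  strokeWidth: number,
  sheetWidth: number,
  sheetHeight: number,
  margin = 16, // hypothetical default
): Rect | null {
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const stroke of strokes) {
    for (const [x, y] of stroke) {
      minX = Math.min(minX, x);
      minY = Math.min(minY, y);
      maxX = Math.max(maxX, x);
      maxY = Math.max(maxY, y);
    }
  }
  if (minX === Infinity) return null; // empty sheet
  const pad = strokeWidth / 2 + margin;
  const left = Math.max(0, Math.floor(minX - pad));
  const top = Math.max(0, Math.floor(minY - pad));
  const right = Math.min(sheetWidth, Math.ceil(maxX + pad));
  const bottom = Math.min(sheetHeight, Math.ceil(maxY + pad));
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

In the browser, the resulting rectangle can feed the nine-argument form of `CanvasRenderingContext2D.drawImage` (`ctx.drawImage(sheet, r.x, r.y, r.width, r.height, 0, 0, r.width, r.height)`) to copy just the inked region onto a smaller canvas before its pixels go to the model.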
What I learned
- Running modern ONNX models in the browser: quantization tradeoffs, weight caching, WebGPU feature detection, and how to surface unsupported-device states gracefully
- Designing a clean adapter boundary so the rest of the app does not care which VLM is loaded, only that it returns structured LaTeX
- Driving a long-running async load with real status (progress, error, ready) instead of a single spinner, and how much that changes the perceived quality of a heavy ML feature
- Why showing the model what you sent it (the crop preview) and what it returned (raw LaTeX) is non-negotiable when the model is small and sometimes wrong
- Building against a moving target: this is a third-party model running in a browser runtime, so wrapping it behind a stable interface paid for itself the first time I swapped LFM2.5 for FastVLM
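The idle → checking → loading → ready/error flow maps naturally onto a reducer over a discriminated union. This is a sketch under assumed event names (`select`, `supported`, `progress`, `loaded`, `failed`) and a hypothetical model ID; it is not taken from the Inkwell source.

```typescript
type LoadStatus =
  | { kind: "idle" }
  | { kind: "checking"; modelId: string }
  | { kind: "loading"; modelId: string; loadedBytes: number; totalBytes: number | null }
  | { kind: "ready"; modelId: string }
  | { kind: "error"; message: string };

type LoadEvent =
  | { type: "select"; modelId: string } // user picked a model in the dropdown
  | { type: "supported" } // WebGPU check passed
  | { type: "progress"; loadedBytes: number; totalBytes: number | null }
  | { type: "loaded" } // weights fetched and session warmed up
  | { type: "failed"; message: string };

// Pure transition function; events that make no sense in the current state are ignored.
function loadReducer(state: LoadStatus, event: LoadEvent): LoadStatus {
  switch (event.type) {
    case "select":
      // Ignore new selections while a check or load is already in flight.
      if (state.kind === "checking" || state.kind === "loading") return state;
      return { kind: "checking", modelId: event.modelId };
    case "supported":
      return state.kind === "checking"
        ? { kind: "loading", modelId: state.modelId, loadedBytes: 0, totalBytes: null }
        : state;
    case "progress":
      return state.kind === "loading"
        ? { ...state, loadedBytes: event.loadedBytes, totalBytes: event.totalBytes }
        : state;
    case "loaded":
      return state.kind === "loading" ? { kind: "ready", modelId: state.modelId } : state;
    case "failed":
      return state.kind === "checking" || state.kind === "loading"
        ? { kind: "error", message: event.message }
        : state;
  }
}

// Fraction for the progress bar, or null when the total size is unknown.
function progressFraction(status: LoadStatus): number | null {
  if (status.kind !== "loading" || !status.totalBytes) return null;
  return Math.min(1, status.loadedBytes / status.totalBytes);
}
```

In React this would plug into `useReducer(loadReducer, { kind: "idle" })`, with the download callback dispatching `progress` events. A `null` fraction can render as an indeterminate bar when the response does not report a total size.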
Architecture
```mermaid
flowchart LR
    User(["User"])
    subgraph App["Inkwell · React + Vite (browser only)"]
        Sheet["Sheet Surface\n(perfect-freehand canvas)"]
        Crop["Crop + Rasterize"]
        subgraph VLM["VLM Adapter · WebGPU"]
            LFM["LFM2.5-VL 450M\n(ONNX Runtime)"]
            Fast["FastVLM 0.5B\n(ONNX Runtime)"]
        end
        Validate["Validator"]
        Render["KaTeX + LaTeX Source"]
    end
    User -->|"draw"| Sheet
    Sheet -->|"recognize"| Crop
    Crop --> VLM
    VLM -->|"raw LaTeX"| Validate
    Validate --> Render
    Render --> User
```
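A minimal sketch of the adapter boundary and the validator in this diagram, assuming a hypothetical `VlmAdapter` interface that each model wrapper implements and a `RasterImage` type that the browser's `ImageData` satisfies structurally. `validateLatex`, its specific checks (stripping a code fence and one layer of math delimiters, then balancing braces), and the prompt text are illustrative assumptions, not the project's actual rules.

```typescript
// Hypothetical pixel container; the browser's ImageData satisfies it structurally.
interface RasterImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Hypothetical adapter contract each model wrapper (LFM2.5-VL, FastVLM) implements.
interface VlmAdapter {
  readonly id: string;
  load(onProgress: (loadedBytes: number, totalBytes: number | null) => void): Promise<void>;
  generate(image: RasterImage, prompt: string): Promise<string>;
}

type Validation = { ok: true; latex: string } | { ok: false; error: string };

// Illustrative prompt; real wording would likely be tuned per model.
const PROMPT = "Transcribe the handwritten formula as LaTeX. Output only the LaTeX.";

// Normalize raw model output and reject obviously malformed LaTeX.
function validateLatex(raw: string): Validation {
  // Strip a Markdown code fence the model may wrap around its answer.
  let s = raw.trim().replace(/^`{3}(?:latex|tex)?\s*/i, "").replace(/\s*`{3}$/, "");
  // Strip one layer of $$...$$, $...$, \[...\] or \(...\) delimiters.
  const m = s.match(/^\$\$([^$]*)\$\$$|^\$([^$]*)\$$|^\\\[([\s\S]*)\\\]$|^\\\(([\s\S]*)\\\)$/);
  if (m) s = (m[1] ?? m[2] ?? m[3] ?? m[4] ?? "").trim();
  if (s.length === 0) return { ok: false, error: "Model returned no LaTeX" };
  // Braces must balance; a backslash escapes the next character (\{, \}, \\).
  let depth = 0;
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (ch === "\\") {
      i++;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth < 0) {
      return { ok: false, error: "Unbalanced braces" };
    }
  }
  if (depth !== 0) return { ok: false, error: "Unbalanced braces" };
  return { ok: true, latex: s };
}

// Runs any adapter, turning inference exceptions into a Validation the UI can show inline.
async function recognize(adapter: VlmAdapter, crop: RasterImage): Promise<Validation> {
  try {
    return validateLatex(await adapter.generate(crop, PROMPT));
  } catch (err) {
    return { ok: false, error: `Inference failed: ${String(err)}` };
  }
}
```

Because `recognize` depends only on the interface, swapping LFM2.5-VL for FastVLM means supplying a different adapter object; the validator and rendering code stay untouched.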