In Progress

Inkwell

React · TypeScript · Vite · WebGPU · ONNX Runtime · Hugging Face · perfect-freehand · KaTeX

Overview

Inkwell is a browser-native handwritten formula recognizer. You sketch a formula on a single ink sheet, pick a model, and hit recognize. The cropped image is sent to a vision-language model (VLM) running entirely on WebGPU, and you get back rendered LaTeX plus the raw source. The load-bearing constraint is local-first: no backend, no API keys, and no data leaves the device. The interesting work is making modern VLMs actually run inside the browser: ONNX export, weight caching, WebGPU feature detection, and a clean adapter boundary so the rest of the app does not care which model is loaded.
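
WebGPU feature detection can be sketched roughly as follows. This is a minimal illustration, not Inkwell's actual code: `detectWebGpu`, `NavigatorLike`, and `WebGpuSupport` are hypothetical names, and the only real API assumed is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists.

```typescript
// Minimal structural types so the sketch compiles without WebGPU type packages.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical result type: either supported, or unsupported with a reason
// that the UI can show instead of failing silently.
type WebGpuSupport =
  | { supported: true }
  | { supported: false; reason: string };

// In the browser, pass the global `navigator` (cast if your DOM typings lack `gpu`).
async function detectWebGpu(nav: NavigatorLike): Promise<WebGpuSupport> {
  if (!nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    if (adapter === null || adapter === undefined) {
      return { supported: false, reason: "no suitable GPU adapter found" };
    }
    return { supported: true };
  } catch (err) {
    return { supported: false, reason: `requestAdapter failed: ${String(err)}` };
  }
}
```

Returning a reason string rather than a bare boolean is one way to drive an unsupported-device message in the UI.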

What I built

In-browser inference of vision-language models (LiquidAI LFM2.5-VL 450M, FastVLM 0.5B) via ONNX Runtime Web on WebGPU
Runtime-swappable VLM adapter: pick the model from a dropdown, weights stream in with a live progress bar, and the rest of the pipeline is unchanged
Status-aware loading pipeline (idle → checking → loading → ready/error) so users see real feedback while a multi-hundred-MB model downloads and warms up
Ink sheet built on perfect-freehand with pencil/eraser tools, adjustable stroke width, ink color picker, plus undo and clear
Crop + rasterize step that crops to the bounding box of the inked region before inference, keeping latency reasonable on small models
Validator between VLM output and the UI: raw model output is parsed and sanity-checked before LaTeX is rendered, so errors surface inline instead of producing garbage
KaTeX render of the recognized formula plus a side panel showing the raw LaTeX source and a thumbnail of the exact crop the model saw
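
The crop step above boils down to finding the bounding box of everything drawn. Here is a small sketch of that computation, assuming strokes are stored as arrays of points in sheet coordinates. `Point`, `Stroke`, `Box`, `inkBounds`, and the default padding of 16 pixels are all illustrative assumptions, not the project's real names or values.

```typescript
interface Point { x: number; y: number; }
type Stroke = Point[];
interface Box { x: number; y: number; width: number; height: number; }

// Returns the bounding box of all stroke points, grown by `padding` and
// clamped to the sheet. Returns null when nothing has been drawn.
function inkBounds(
  strokes: Stroke[],
  sheetWidth: number,
  sheetHeight: number,
  padding = 16, // assumed margin; should at least cover half the stroke width
): Box | null {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const stroke of strokes) {
    for (const p of stroke) {
      if (p.x < minX) minX = p.x;
      if (p.y < minY) minY = p.y;
      if (p.x > maxX) maxX = p.x;
      if (p.y > maxY) maxY = p.y;
    }
  }
  if (minX === Infinity) return null; // no points at all

  const left = Math.max(0, Math.floor(minX - padding));
  const top = Math.max(0, Math.floor(minY - padding));
  const right = Math.min(sheetWidth, Math.ceil(maxX + padding));
  const bottom = Math.min(sheetHeight, Math.ceil(maxY + padding));
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

The resulting box could then be passed to a canvas `drawImage` call to rasterize only that region, so the model never sees the empty parts of the sheet.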

What I learned

Running modern ONNX models in the browser: quantization tradeoffs, weight caching, WebGPU feature detection, and how to surface unsupported-device states gracefully
Designing a clean adapter boundary so the rest of the app does not care which VLM is loaded, only that it returns structured LaTeX
Driving a long-running async load with real status (progress, error, ready) instead of a single spinner, and how much that changes the perceived quality of a heavy ML feature
Why showing the model what you sent it (the crop preview) and what it returned (raw LaTeX) is non-negotiable when the model is small and sometimes wrong
Building against a moving target: this is a third-party model running in a browser runtime, so wrapping it behind a stable interface paid for itself the first time I swapped LFM2.5 for FastVLM
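
To make the adapter boundary and the status-driven load concrete, here is one way the two could fit together. It is a sketch under assumptions: `VlmAdapter`, `LoadStatus`, `CropImage`, `loadModel`, and `fakeAdapter` are hypothetical names, and the real adapters would wrap ONNX Runtime Web rather than the stub shown here. The states mirror the idle → checking → loading → ready/error pipeline described earlier.

```typescript
// Hypothetical image payload handed to the model (the rasterized crop).
interface CropImage { width: number; height: number; pixels: Uint8ClampedArray; }

// One variant per pipeline state; "loading" carries progress in [0, 1].
type LoadStatus =
  | { state: "idle" }
  | { state: "checking" }
  | { state: "loading"; progress: number }
  | { state: "ready" }
  | { state: "error"; message: string };

// The only surface the rest of the app sees, whichever model is behind it.
interface VlmAdapter {
  readonly id: string;
  isSupported(): Promise<boolean>;
  load(onProgress: (fraction: number) => void): Promise<void>;
  recognize(image: CropImage): Promise<string>; // raw LaTeX
}

// Drives an adapter through checking -> loading -> ready, reporting each step.
async function loadModel(
  adapter: VlmAdapter,
  onStatus: (status: LoadStatus) => void,
): Promise<boolean> {
  onStatus({ state: "checking" });
  try {
    if (!(await adapter.isSupported())) {
      onStatus({ state: "error", message: "WebGPU is not available on this device" });
      return false;
    }
    onStatus({ state: "loading", progress: 0 });
    await adapter.load((fraction) =>
      onStatus({ state: "loading", progress: Math.min(1, Math.max(0, fraction)) }),
    );
    onStatus({ state: "ready" });
    return true;
  } catch (err) {
    onStatus({ state: "error", message: err instanceof Error ? err.message : String(err) });
    return false;
  }
}

// Stub adapter for illustration; a real one would download and warm up weights.
const fakeAdapter: VlmAdapter = {
  id: "fake",
  isSupported: async () => true,
  load: async (onProgress) => { onProgress(0.5); onProgress(1); },
  recognize: async () => "x^2",
};
```

In a React app, `onStatus` would typically be a state setter, so the progress bar and error message render straight from the current `LoadStatus` value. Swapping models then means passing a different `VlmAdapter` to `loadModel` and nothing else.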

Architecture

flowchart LR
    User(["User"])

    subgraph App["Inkwell · React + Vite (browser only)"]
        Sheet["Sheet Surface\n(perfect-freehand canvas)"]
        Crop["Crop + Rasterize"]

        subgraph VLM["VLM Adapter · WebGPU"]
            LFM["LFM2.5-VL 450M\n(ONNX Runtime)"]
            Fast["FastVLM 0.5B\n(ONNX Runtime)"]
        end

        Validate["Validator"]
        Render["KaTeX + LaTeX Source"]
    end

    User -->|"draw"| Sheet
    Sheet -->|"recognize"| Crop
    Crop --> VLM
    VLM -->|"raw LaTeX"| Validate
    Validate --> Render
    Render --> User
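
The Validator node in the diagram sits between the raw model output and the renderer. The sketch below shows the kind of sanity check it might perform. The specific checks (trimming, stripping `$` delimiters, brace balancing) and the names `validateLatex` and `Validation` are assumptions for illustration, not a description of Inkwell's actual validator.

```typescript
type Validation =
  | { ok: true; latex: string }
  | { ok: false; error: string };

// Cleans up raw model output and rejects obviously malformed LaTeX.
function validateLatex(raw: string): Validation {
  // Assumed cleanup: drop surrounding whitespace and $...$ or $$...$$ wrappers.
  // (A formula that legitimately ends in an escaped \$ would need extra care.)
  const latex = raw.trim().replace(/^\$\$?/, "").replace(/\$\$?$/, "").trim();
  if (latex.length === 0) {
    return { ok: false, error: "model returned no LaTeX" };
  }

  // Check that unescaped braces are balanced.
  let depth = 0;
  for (let i = 0; i < latex.length; i++) {
    const ch = latex[i];
    if (ch === "\\") {
      i++; // skip the character after a backslash, e.g. \{ or \\
      continue;
    }
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth < 0) return { ok: false, error: "unmatched closing brace" };
    }
  }
  if (depth !== 0) {
    return { ok: false, error: "unclosed opening brace" };
  }
  return { ok: true, latex };
}
```

A check like this only catches structural problems. Rendering with KaTeX's `throwOnError` option enabled and catching the resulting parse error is a natural second line of defense before anything reaches the screen.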