This page is a work in progress.

How it works

FilDOS indexes your files in the background using sentence-transformer models running entirely on-device via WASM. No data ever leaves your machine. The indexing pipeline has four stages:
  1. Extract — text is pulled from plain text, code, Markdown, CSV, JSON, and similar files. Binary files and files over the size cap are skipped.
  2. Chunk — long documents are split into ~512-token windows with overlap so context isn’t lost at boundaries.
  3. Embed — each chunk is passed through the active model to produce a vector.
  4. Store — vectors are stored as Float32 BLOBs in SQLite and searched with brute-force cosine similarity.
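The chunk, store, and search stages can be sketched as follows. This is a minimal illustration under stated assumptions, not FilDOS source. Tokens are whatever the model's tokenizer produces (any array here). The 64-token overlap is an assumed value, since the page only says "with overlap". The helper names `chunkTokens`, `toBlob`, `fromBlob`, and `searchTopK` are hypothetical. The BLOB helpers only convert between a `Float32Array` and raw bytes, and the actual SQLite read/write calls are left out.

```typescript
// Hypothetical sketch of the Chunk, Store, and search stages; not FilDOS source.

/** Split tokens into windows of `size` that overlap by `overlap` tokens (64 is an assumed default). */
function chunkTokens<T>(tokens: readonly T[], size = 512, overlap = 64): T[][] {
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be in [0, size)");
  }
  const step = size - overlap;
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window already reaches the end
  }
  return chunks;
}

/** View a vector's bytes for storage as a BLOB (platform byte order). */
function toBlob(vec: Float32Array): Uint8Array {
  return new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength);
}

/** Rebuild a vector from BLOB bytes; copying first guarantees a 4-byte-aligned buffer. */
function fromBlob(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) throw new RangeError("BLOB length is not a multiple of 4");
  const copy = new Uint8Array(blob); // fresh buffer with byteOffset 0
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

/** Cosine similarity of two vectors of equal length. */
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new RangeError("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface StoredChunk {
  id: string;
  vec: Float32Array;
}

/** Brute force: score every stored chunk against the query and keep the best k. */
function searchTopK(query: Float32Array, rows: readonly StoredChunk[], k = 10) {
  return rows
    .map((row) => ({ id: row.id, score: cosine(query, row.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Brute force keeps the storage simple (one BLOB column, no index structure) at the cost of scoring every chunk on each query, which is linear in the number of stored chunks.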

Choosing a model

Open Settings → AI to pick an embedding model and download it. Models are cached in userData/models and loaded only once per session.
Model             Dimensions   Best for
MiniLM L6 v2      384          Fast, general text
BGE Small v1.5    384          Strong retrieval quality
GTE Small         384          Multilingual queries
CLIP ViT-B/32     512          Text and images in one space
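"Loaded only once per session" can be sketched as a per-session cache keyed by model. This assumes a design rather than describing FilDOS's code. The model ids, the `ModelCache` class, and the injected `load` function are hypothetical; `load` stands in for whatever WASM runtime reads the files from `userData/models`. Only the dimensions come from the table. The cache stores the load promise, not the finished model, so two searches that start while a model is still loading share a single load.

```typescript
// Hypothetical model ids; dimensions are taken from the model table.
type ModelId = "minilm-l6-v2" | "bge-small-v1.5" | "gte-small" | "clip-vit-b32";

const MODEL_DIMS: Record<ModelId, number> = {
  "minilm-l6-v2": 384,
  "bge-small-v1.5": 384,
  "gte-small": 384,
  "clip-vit-b32": 512,
};

type Embedder = (text: string) => Promise<Float32Array>;
// Hypothetical loader signature: reads model files from cacheDir and returns an embedder.
type Loader = (id: ModelId, cacheDir: string) => Promise<Embedder>;

/** Loads each model at most once per session by caching the load promise. */
class ModelCache {
  private readonly loads = new Map<ModelId, Promise<Embedder>>();
  private readonly load: Loader;
  private readonly cacheDir: string;

  constructor(load: Loader, cacheDir: string) {
    this.load = load;
    this.cacheDir = cacheDir;
  }

  get(id: ModelId): Promise<Embedder> {
    const cached = this.loads.get(id);
    if (cached) return cached;

    const pending = this.load(id, this.cacheDir).then((embed): Embedder => async (text) => {
      const vec = await embed(text);
      // Vectors from different models are not comparable, so check the expected dimension.
      if (vec.length !== MODEL_DIMS[id]) {
        throw new RangeError(`${id} returned ${vec.length} dims, expected ${MODEL_DIMS[id]}`);
      }
      return vec;
    });
    // Forget a failed load so a later call can retry it.
    pending.catch(() => this.loads.delete(id));
    this.loads.set(id, pending);
    return pending;
  }
}
```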

Indexing

The indexer runs in the background and crawls your home directory by default. It skips:
  • Dotfiles and hidden directories
  • node_modules, build outputs, caches
  • Files you’ve explicitly excluded via the context menu (Exclude from AI index)
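The skip rules above, together with the binary and size-cap filter from the Extract step, can be sketched as two predicates. Several details are assumptions, since the page does not specify them: the directory names standing in for "build outputs, caches", the NUL-byte test for binary files, and the 1 MiB default cap. Exclusions are matched against the path and each of its ancestors, so excluding a folder also excludes its contents; this is also an assumed behavior. Paths are relative to the crawl root and use `/` separators. `shouldSkipPath` and `isIndexableContent` are hypothetical names.

```typescript
// Hypothetical stand-ins for "build outputs, caches"; FilDOS's real list may differ.
const SKIPPED_DIRS = new Set(["node_modules", "dist", "build", "out", "target", "__pycache__"]);

/** True if a path (relative to the crawl root, "/"-separated) should not be indexed. */
function shouldSkipPath(relPath: string, isDir: boolean, excluded: ReadonlySet<string>): boolean {
  const parts = relPath.split("/").filter((p) => p.length > 0);
  // Dotfiles and hidden directories: any path component starting with "."
  if (parts.some((p) => p.startsWith("."))) return true;
  // Build/cache directory names only count for directory components, not file names
  const dirs = isDir ? parts : parts.slice(0, -1);
  if (dirs.some((p) => SKIPPED_DIRS.has(p))) return true;
  // User exclusions: the path itself or any ancestor (assumed behavior)
  for (let i = 1; i <= parts.length; i++) {
    if (excluded.has(parts.slice(0, i).join("/"))) return true;
  }
  return false;
}

/** Extract-step filter: reject files over the cap or whose first 8 KiB contain a NUL byte. */
function isIndexableContent(sizeBytes: number, head: Uint8Array, maxBytes = 1024 * 1024): boolean {
  if (sizeBytes > maxBytes) return false;
  return !head.subarray(0, 8192).includes(0);
}
```

Checking directories before descending into them means a skipped `node_modules` tree is never walked at all, which matters more for crawl time than the per-file checks.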
Progress is visible in Settings → Indexing. You can pause, resume, or clear the index at any time.

Type a natural-language query in the search bar and press Enter. FilDOS runs a hybrid search, fusing keyword matching with semantic vector retrieval: keyword matching gives precise hits on filenames and exact terms, while semantic retrieval finds files whose content is related in meaning to the query.
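The page does not say how the keyword and vector results are fused. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of FilDOS. Each result list adds 1 / (k + rank) to every file it contains, so a file ranked well by both retrievers rises to the top. The constant k = 60 is the value conventionally used with RRF. The function name and file paths are hypothetical.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

/**
 * Reciprocal rank fusion over best-first ranked lists of file ids.
 * Each list adds 1 / (k + rank) to every id it contains, with rank starting at 1.
 */
function reciprocalRankFusion(rankings: readonly (readonly string[])[], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Usage with hypothetical results: keyword hits on filenames, semantic hits on content.
const keywordHits = ["notes/todo.md", "src/app.ts"];
const semanticHits = ["docs/plan.md", "notes/todo.md"];
const fused = reciprocalRankFusion([keywordHits, semanticHits]);
console.log(fused.map((hit) => hit.id)); // order: notes/todo.md, docs/plan.md, src/app.ts
```

RRF only uses ranks, so it needs no calibration between keyword scores and cosine similarities, which live on different scales.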