This page is a work in progress.
How it works
FilDOS indexes your files in the background using sentence-transformer models running entirely on-device via WASM. No data ever leaves your machine. The pipeline:
- Extract: text is pulled from plain text, code, Markdown, CSV, JSON, and similar files. Binary files and files over the size cap are skipped.
- Chunk: long documents are split into ~512-token windows with overlap so context isn’t lost at boundaries.
- Embed: each chunk is passed through the active model to produce a vector.
- Store: vectors are stored as Float32 BLOBs in SQLite and searched with brute-force cosine similarity.
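To make the Chunk and Store steps concrete, here is a minimal sketch rather than FilDOS's actual code. It assumes whitespace-separated words stand in for model tokens, an in-memory array stands in for the SQLite table, and the overlap size is a parameter because this page does not state it. All function and type names (`chunkTokens`, `vectorToBlob`, `blobToVector`, `cosineSimilarity`, `searchVectors`, `StoredChunk`) are hypothetical.

```typescript
// Split a token list into fixed-size windows that share `overlap` tokens.
function chunkTokens(tokens: string[], size: number, overlap: number): string[][] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[][] = [];
  const step = size - overlap;
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Pack a vector into raw bytes suitable for a BLOB column.
function vectorToBlob(vec: Float32Array): Uint8Array {
  return new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength).slice();
}

// Unpack BLOB bytes back into a Float32 vector.
function blobToVector(blob: Uint8Array): Float32Array {
  const copy = blob.slice(); // fresh buffer, so the data is 4-byte aligned
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical row shape: one stored chunk and its embedding bytes.
interface StoredChunk {
  id: string;
  blob: Uint8Array;
}

// Brute force: score every stored vector against the query and keep the top k.
function searchVectors(query: Float32Array, rows: StoredChunk[], k: number) {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(query, blobToVector(row.blob)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Brute-force search compares the query against every stored vector, so its cost grows linearly with the number of chunks. In exchange it needs no separate index structure to build or maintain.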
Choosing a model
Open Settings → AI to pick an embedding model and download it. Models are cached in userData/models and loaded only once per session.
| Model | Dim | Best for |
|---|---|---|
| MiniLM L6 v2 | 384 | Fast, general text |
| BGE Small v1.5 | 384 | Strong retrieval quality |
| GTE Small | 384 | Multilingual queries |
| CLIP ViT-B/32 | 512 | Text and images in one space |
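One way to get "loaded only once per session" behaviour is to cache the pending load per model. The sketch below is illustrative only: the model keys, the `Embedder` shape, and `loadFromDisk` are hypothetical stand-ins, and the dimensions are copied from the table above.

```typescript
// Dimensions from the table above; the keys are illustrative identifiers.
const MODEL_DIMS: Record<string, number> = {
  "minilm-l6-v2": 384,
  "bge-small-v1.5": 384,
  "gte-small": 384,
  "clip-vit-b32": 512,
};

// Hypothetical handle for a loaded model.
type Embedder = { name: string; dim: number };

const loadedModels = new Map<string, Promise<Embedder>>();
let loadCount = 0; // counts real loads, for illustration only

// Hypothetical stand-in for reading model weights from the cache directory.
async function loadFromDisk(name: string): Promise<Embedder> {
  const dim = MODEL_DIMS[name];
  if (dim === undefined) throw new Error(`unknown model: ${name}`);
  loadCount += 1;
  return { name, dim };
}

// Repeated calls return the same promise, so each model loads at most once.
function getEmbedder(name: string): Promise<Embedder> {
  let pending = loadedModels.get(name);
  if (!pending) {
    pending = loadFromDisk(name);
    loadedModels.set(name, pending);
  }
  return pending;
}
```

Caching the promise rather than the resolved model means two callers that ask for the same model at the same time share a single load.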
Indexing
The indexer runs in the background and crawls your home directory by default. It skips:
- Dotfiles and hidden directories
- node_modules, build outputs, caches
- Files you’ve explicitly excluded via the context menu (Exclude from AI index)
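The skip rules above could be expressed as a single predicate. This is a rough sketch under stated assumptions: paths are relative to the crawl root and use forward slashes, and the directory names in `SKIPPED_DIRS` other than `node_modules` are guesses, since this page does not list the exact build-output and cache folders. `shouldSkipPath` is a hypothetical name.

```typescript
// Illustrative names only; apart from node_modules, the real list is not given here.
const SKIPPED_DIRS = new Set(["node_modules", "dist", "build", "__pycache__"]);

// Returns true when a path should be left out of the index.
function shouldSkipPath(relPath: string, excluded: Set<string>): boolean {
  // Explicit exclusions cover the entry itself and anything beneath it.
  const isExcluded = Array.from(excluded).some(
    (entry) => relPath === entry || relPath.startsWith(entry + "/")
  );
  if (isExcluded) return true;

  // Any dot-prefixed or skipped segment hides everything under it.
  const segments = relPath.split("/").filter((s) => s.length > 0);
  return segments.some((s) => s.startsWith(".") || SKIPPED_DIRS.has(s));
}
```

For example, `shouldSkipPath("projects/app/node_modules/x/index.js", new Set())` returns `true`, while `shouldSkipPath("notes/todo.md", new Set())` returns `false`.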
Search
Type a natural-language query in the search bar and press ↵. FilDOS runs a
hybrid search — keyword matching fused with semantic vector retrieval — so you
get precise results for filenames and fuzzy results for content.
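This page does not say how the keyword and semantic results are fused. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption rather than as FilDOS's method. The function name `fuseRankings`, the sample file names, and the constant `k = 60` (a conventional default for RRF) are all illustrative.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// with rank starting at 1. Items ranked well in several lists rise to the top.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result lists from the two retrievers.
const keywordHits = ["report.txt", "notes.md", "todo.md"];
const semanticHits = ["notes.md", "budget.csv", "report.txt"];

const fused = fuseRankings([keywordHits, semanticHits]);
// fused is ["notes.md", "report.txt", "budget.csv", "todo.md"]
```

RRF uses only rank positions, so keyword scores and cosine similarities never need to be put on a common scale.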