
Nathanim Tadele Builds a Local AI File Brain

Local AI Developer Tools

Exploring Nathanim Tadele's local AI file brain and the viral-post lessons behind semantic search, privacy, and tooling.

Tags: LinkedIn content, viral posts, content strategy, local AI, semantic search, Ollama, RAG, developer tools, social media marketing

Nathanim Tadele recently shared something that caught my attention because it describes a problem almost every developer and knowledge worker quietly suffers from:

ctrl+f is a lie on a modern dev laptop.
i built a local ai that actually finds my stuff.

He followed it with the painfully familiar scene: you are sure you saved a PDF, but your search tool returns a pile of final_final_v3 lookalikes. Nathanim said he hit that wall enough times that he snapped and built his own local "file brain".

That idea is more than a neat side project. It is a practical blueprint for turning the messiest part of digital work (your downloads folder, your meeting notes, your scattered docs) into something you can actually query like a memory system.

The real issue: search tools do not match how you remember

Most built-in file search is optimized for:

  • Filenames
  • Basic metadata (date modified, file type)
  • Keyword matches in indexed text (sometimes)

But humans rarely remember an exact filename. We remember meaning:

  • "That doc where we compared Q3 revenue across regions"
  • "The contract clause about termination"
  • "My notes from the meeting where we decided the architecture"

Nathanim's point that ctrl+f is a lie is not about the shortcut itself. It is about the promise behind it: that your system can retrieve what you need when you need it. On a modern laptop with years of PDFs, random exports, screenshots, and notes, that promise breaks.

Nathanim's local AI file brain, explained simply

Nathanim described a tool that:

  • Scans local folders (PDF, DOCX, Markdown, TXT)
  • Extracts real content, not just filenames
  • Generates embeddings locally with Ollama using nomic-embed-text
  • Searches by meaning, so you type what you remember

The magic is the semantic layer. Instead of asking, "Does this file contain the exact words I typed?" you ask, "Which file is closest in meaning to what I typed?"

So a half-remembered phrase about quarterly revenue can surface the correct document even if the file is named final_v3_updated_REAL.pdf.
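"Closest in meaning" is usually measured as cosine similarity between embedding vectors. A minimal sketch of that comparison (the vectors here are toy values; real embeddings have hundreds of dimensions):

```typescript
// Cosine similarity between two embedding vectors.
// 1.0 means same direction (same meaning), 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Ranking every indexed chunk by this score against the query's embedding is what turns "what I typed" into "what I meant".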

Under the hood: the architecture is refreshingly practical

What I like about Nathanim's build is that it is not overcomplicated. It is a local-first indexing pipeline with two modes of retrieval: semantic search for meaning, plus keyword search as a backup.

Step 1: Ingest and extract text

The first hurdle in local search is not AI. It is extraction.

  • PDFs can be text-based or scanned images.
  • DOCX files are structured containers.
  • Notes might be Markdown with frontmatter.

If you only index filenames, you recreate the same failure mode. Nathanim explicitly called out that his tool extracts content, not just names, which is the right foundation.
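As one concrete example of "extract content, not just names": Markdown notes often carry YAML frontmatter that should not pollute the index. A minimal sketch of normalizing one note (the `ExtractedDoc` shape is illustrative, not Nathanim's actual schema; PDF and DOCX would need format-specific extractors):

```typescript
// Illustrative document shape for the index.
interface ExtractedDoc {
  path: string;
  text: string;
}

// Normalize a Markdown note: drop leading YAML frontmatter (fenced by
// "---" lines), then collapse whitespace so the index sees clean text.
function extractMarkdown(path: string, raw: string): ExtractedDoc {
  let body = raw;
  if (raw.startsWith("---")) {
    const end = raw.indexOf("\n---", 3);
    if (end !== -1) body = raw.slice(end + 4);
  }
  const text = body.replace(/\s+/g, " ").trim();
  return { path, text };
}
```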

Step 2: Create embeddings locally with Ollama

Embeddings turn chunks of text into vectors so you can compare "semantic distance".

Nathanim uses Ollama to generate embeddings on-device with nomic-embed-text. That matters because it removes friction:

  • No external API calls
  • No key management
  • No surprise costs
  • No latency spikes when your internet is slow

If your goal is to build a dependable second brain, reliability beats novelty.
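Calling Ollama for embeddings is a single local HTTP request. A sketch assuming Ollama is running at its default port with the model pulled (`ollama pull nomic-embed-text`); the helper function name is my own:

```typescript
// Ollama's local embeddings endpoint (default port).
const OLLAMA_URL = "http://localhost:11434/api/embeddings";

// Pure helper: build the JSON body Ollama's embeddings API expects.
function buildEmbedRequest(model: string, text: string): string {
  return JSON.stringify({ model, prompt: text });
}

// Requires Node 18+ for the global fetch.
async function embedLocally(text: string): Promise<number[]> {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildEmbedRequest("nomic-embed-text", text),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // a float vector: 768 dimensions for nomic-embed-text
}
```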

Step 3: Store vectors in LanceDB

Vector storage is where many prototypes get stuck. You need a database that can:

  • Insert lots of vectors
  • Persist them on disk
  • Perform fast similarity search

LanceDB is a solid choice for local vector search because it is built for this workload and plays well with modern developer tooling.
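LanceDB handles the persistence and scale; the core operation it accelerates is nearest-neighbor search over vectors. To see what that operation actually is, here is a brute-force in-memory version (fine for one folder, which is exactly why a real vector DB matters once the index grows):

```typescript
// One indexed chunk: which file it came from plus its embedding.
interface IndexedChunk {
  file: string;
  vector: number[];
}

// Brute-force top-k nearest neighbors by cosine similarity — the same
// query a vector DB answers, minus on-disk persistence and fast indexes.
function topK(query: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (a: number[]) => Math.sqrt(dot(a, a));
  const score = (v: number[]) => dot(query, v) / (norm(query) * norm(v));
  return [...chunks]
    .sort((a, b) => score(b.vector) - score(a.vector))
    .slice(0, k);
}
```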

Step 4: Add SQLite FTS5 for keyword fallback

Nathanim also mentioned SQLite FTS5 as a keyword backup.

This is a quietly great decision. Semantic search is powerful, but it is not perfect for:

  • Exact strings (invoice numbers, IDs)
  • Code symbols
  • Proper nouns

Hybrid retrieval (semantic + keyword) is often the difference between a demo and a tool you actually trust.
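The post does not say how Nathanim merges the two result lists; one standard technique is Reciprocal Rank Fusion, which rewards files that rank well in either list. A minimal sketch:

```typescript
// Reciprocal Rank Fusion: merge a semantic ranking and a keyword ranking
// into one list. Each input is an array of file paths, best match first.
// k=60 is the conventional damping constant from the RRF literature.
function fuseRankings(semantic: string[], keyword: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [semantic, keyword]) {
    list.forEach((file, rank) => {
      scores.set(file, (scores.get(file) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([file]) => file);
}
```

A file that appears in both lists ("b" below) beats a file that tops only one, which is the behavior you want from a hybrid index.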

Step 5: Wrap it in a TypeScript CLI

A CLI sounds small, but it is the right interface for an evolving tool:

  • Easy to run against one folder
  • Easy to schedule
  • Easy to integrate with scripts

Nathanim noted he used TypeScript plus Commander.js. That is a builder-friendly stack that favors iteration.
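Commander.js gives you subcommands, help text, and flags for free. To show the same two-command shape without any dependency, here is a sketch using Node's built-in `parseArgs` (the command names and output strings are illustrative, not Nathanim's actual CLI):

```typescript
import { parseArgs } from "node:util";

// The index/search loop as a tiny CLI. Returns a string so it is easy
// to test; a real CLI would kick off indexing or print search hits.
function runCli(argv: string[]): string {
  const { positionals, values } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: { limit: { type: "string", default: "5" } },
  });
  const [command, target] = positionals;
  switch (command) {
    case "index":
      return `indexing folder: ${target}`;
    case "search":
      return `searching for "${target}" (top ${values.limit})`;
    default:
      return "usage: filebrain <index|search> <target> [--limit n]";
  }
}
```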

Why local-first matters more than ever

Nathanim emphasized the part that matters most:

nothing leaves my machine.

That single constraint changes everything.

  1. Privacy becomes a feature, not a policy
    You do not have to interpret "we may use your data" fine print. Your files stay yours.

  2. You can index sensitive folders
    Contracts, medical docs, taxes, HR notes, client research, internal designs. Local-first is what makes this feasible.

  3. Your tool works offline
    Search should not depend on Wi-Fi.

  4. Performance is predictable
    Once indexed, retrieval is fast, and you are not paying per query.

Build this for one folder first (a practical path)

Nathanim suggested starting small: pick one directory and index it locally.

If you are tempted to build your own, here is a realistic plan that matches the spirit of his project.

1) Choose a single folder with clear value

Good candidates:

  • docs/ for proposals and specs
  • contracts/
  • tax/
  • meeting notes for one project

2) Extract and normalize text

Keep it boring:

  • Convert each file into plain text
  • Store basic metadata: path, modified time, type

3) Chunk the text

Do not embed entire books in one vector. Chunk by paragraphs or fixed sizes so retrieval can point you to the right section.
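A paragraph-based chunker can be very small. A sketch (the 1000-character cap is an illustrative default, not a recommendation from the post):

```typescript
// Split extracted text into paragraph chunks, packing consecutive
// paragraphs together until a size cap so each embedding covers a
// meaningful span rather than a single line.
function chunkText(text: string, maxChars = 1000): string[] {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += p + "\n\n";
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Store each chunk with its file path and offset so a search hit can point at the right section, not just the right file.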

4) Embed locally

Use Ollama embeddings so your pipeline stays self-contained.

5) Store in two indexes

  • Vector index for semantic similarity
  • FTS index for exact keyword search

6) Add a minimal loop that feels good

The simplest UX is:

  • Index command: point at a folder
  • Search command: return top matches with file path and a snippet

If it returns the right thing twice in a row, you will keep using it.

What semantic search over personal files actually feels like

Once you have it, you stop thinking in filenames. You start thinking in memories.

You can ask:

  • "Where did we discuss the tradeoff between SQLite and Postgres?"
  • "Find my notes about the onboarding pain points"
  • "Show the doc with the scope change and timeline impact"

This is why Nathanim said semantic search feels like cheating. It is not that it is magical. It is that it matches the way your brain stores context.

Where Nathanim can take it next: Q&A and lightweight RAG

Nathanim mentioned two logical next steps: natural language Q&A over files and better auto-organization.

Q&A is where retrieval augmented generation (RAG) fits naturally:

  1. Retrieve the most relevant chunks from your local index
  2. Feed only those chunks into a local LLM for an answer
  3. Cite sources back to the original files

If done carefully, this becomes a private research assistant for your own documents.
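The "feed only those chunks" step is just prompt assembly. A sketch of the grounding prompt (chunk shape and wording are illustrative; the output would go to a local LLM via Ollama):

```typescript
// One retrieved chunk: source file plus the matching text.
interface RetrievedChunk {
  file: string;
  text: string;
}

// Stitch retrieved chunks into a grounded prompt with numbered sources,
// so the model can cite [n] back to the original files.
function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.file})\n${c.text}`)
    .join("\n\n");
  return (
    `Answer using ONLY the context below. Cite sources as [n].\n\n` +
    `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`
  );
}
```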

Auto-organization is also compelling, but I agree with the implied priority: semantic search first. Classification and folder cleanup are easier once retrieval is dependable.

Why this became a strong piece of LinkedIn content

Even with modest engagement numbers, the post has the ingredients of a viral post:

  • A sharp hook: ctrl+f as a broken promise
  • A universal pain point: messy file graves
  • A concrete build: tools, stack, and example
  • A clear stance: local-first, no cloud
  • A simple call to action: try it on one folder

This is a reminder for anyone thinking about content strategy: specificity wins. Nathanim did not say, "I built an AI app." He showed the exact workflow and why it matters.

Try one question on your own machine

If your laptop is also a graveyard of PDFs and notes, Nathanim's advice is the right starting point: pick one folder and index it.

Then ask yourself: what is the one query you wish you could run against your own files?

That question is the real product spec.

This blog post expands on a viral LinkedIn post by Nathanim Tadele, full-stack software engineer, second-brain builder, and ALX SE alumnus.