Ian Macomber's Blueprint for a Vendor-Proof Data Stack
A deeper take on Ian Macomber's viral LinkedIn idea: build an LLM-ready data stack, iterate fast, and avoid BI lock-in today.
Ian Macomber (Data at Ramp) recently shared something that caught my attention: "The real question for any data leader building a stack today: will vertical data tools become 'the Cursor of data', or will the Cursor of data... just be Cursor [sub in Claude Code, Codex, etc.]".
That framing lands because it cuts through the usual tooling debate. Instead of arguing whether a specialized, vertical data product will win, Ian is asking a more durable question: if the interface to data work becomes AI-first, do we even need a new class of vertical tools, or do we need foundations that let AI work well across whatever tools we pick?
I want to expand on Ian's point, because it doubles as a practical strategy for building a modern data stack that survives vendor churn.
The real shift: from SQL-first to prose-first
Ian wrote, "We are increasingly solving problems by writing prose (not SQL), and chaining simple tools together (search tables, run query) with more advanced models." That is the center of gravity.
If you accept that premise, then a lot of decisions change:
- Your "front door" to analytics may become a chat-like interface.
- The most valuable asset is not the dashboard; it is the consistently defined metrics and transformations behind it.
- The stack that wins is the one that is easiest for both humans and models to understand.
Key insight: if models are going to read, reason, and act, your data stack has to be legible, not just correct.
This is why the right question is not "Which BI tool is best?" It is "What should remain stable when BI tools and AI copilots change every 12 months?"
Ian's three optimizations for a greenfield stack
Ian said that if he were building from scratch, he would optimize for three things:
- "LLM-readable source of truth based on hosted markdown files in git + dbt"
- "Fast iteration"
- "Owning your own semantic layer / not handcuffing yourself to one BI vendor"
Let me unpack each one with some concrete implications.
1) LLM-readable source of truth: markdown in git plus dbt
The phrase "LLM-readable" is doing a lot of work. Most organizations technically have documentation, but it is scattered: wiki pages nobody trusts, dashboard descriptions nobody reads, and tribal knowledge locked in Slack.
A markdown-based source of truth in git is different because it is:
- Versioned: you can see who changed a metric definition and why.
- Reviewable: documentation changes can ride the same pull request workflow as transformation code.
- Portable: it is not trapped inside a BI vendor's UI.
- Parsable: models can ingest markdown and relate it to dbt models, tests, and lineage.
Where dbt fits is equally important. dbt already encourages a contract-like approach to transformations: named models, tests, exposures, and documentation. When your transformations and your explanations live side by side, you get a system that supports both humans (during onboarding, incident response, metric debates) and AI agents (during discovery, query planning, and debugging).
Practical example: imagine a folder in your repo like /metrics with markdown files that define core terms:
- what "Active Customer" means
- how revenue is recognized
- what tables are authoritative
- what edge cases exist (refunds, migrations, partial data)
Then you link those definitions directly to the dbt models that implement them. A model can now answer "What is the revenue definition?" and also trace it to the transformation logic.
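In dbt terms, one way to wire this up is with doc blocks: the markdown definition lives in a file under /metrics (wrapped in `{% docs %}` / `{% enddocs %}` tags), and the model's YAML references it. A minimal sketch, with illustrative model and doc names:

```yaml
# models/marts/schema.yml -- illustrative names; the doc block
# 'active_customer' would live in a markdown file under /metrics,
# wrapped in {% docs active_customer %} ... {% enddocs %}.
version: 2

models:
  - name: fct_active_customers
    description: "{{ doc('active_customer') }}"  # pulls the markdown definition into dbt docs
    columns:
      - name: customer_id
        description: Unique customer identifier
        tests:
          - unique
          - not_null
```

Because the description resolves through `doc()`, the prose definition and the transformation logic ship in the same repo and the same pull request.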
2) Fast iteration: shorten the idea-to-insight loop
Fast iteration is not just developer velocity. It is organizational learning speed.
A stack that supports fast iteration usually has:
- Clear environments (dev, staging, prod) with easy promotion.
- CI that runs dbt tests and basic data quality checks.
- A simple path to add new sources and models without weeks of ceremony.
- Lightweight governance that does not block every change.
In an AI-assisted workflow, iteration speed matters even more because:
- People will ask more questions (because it is easier to ask).
- Models will propose more experiments (because exploration is cheap).
- The bottleneck becomes validation and definition alignment, not query writing.
So "fast iteration" should include fast agreement on meaning. That is why Ian pairs it with a semantic layer you own. Otherwise you iterate quickly into inconsistency.
If you speed up questions without stabilizing definitions, you scale confusion.
3) Own your semantic layer: avoid BI tool handcuffs
Ian's line here is blunt and correct: "Only put your semantics inside a BI tool if you're confident you'll live there 5+ years (...you shouldn't be)."
Why is this such a big deal?
Because the semantic layer is where your business logic becomes reusable:
- metric definitions (MRR, churn, conversion)
- dimensions (region, segment, plan)
- time logic (fiscal calendars, cohorts)
- security rules (who can see what)
If that logic is embedded inside one BI tool, you are paying an invisible tax:
- Migrations become rewrites.
- Consistency across tools becomes manual.
- AI assistants have to reverse engineer meaning from charts and dashboard configs.
Owning your semantic layer does not mean you must build everything yourself. It means you choose a representation of metrics that is portable and referenceable: dbt metrics, a dedicated semantic layer product, or even carefully managed views and YAML definitions, as long as the definitions are not locked in a single visualization layer.
A simple litmus test: if you removed your BI vendor tomorrow, would you still know exactly how your KPIs are defined and computed? If the answer is no, you do not own your semantics.
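What a portable metric definition looks like depends on your tooling, but the shape matters more than the syntax. A sketch using a dbt-style metrics spec (the exact schema varies by dbt version; all names here are illustrative):

```yaml
# metrics.yml -- a sketch of a BI-agnostic metric definition in a
# dbt-style spec. Field names follow older dbt metrics syntax and
# may differ in current versions; model and column names are invented.
version: 2

metrics:
  - name: monthly_recurring_revenue
    label: MRR
    model: ref('fct_subscriptions')
    calculation_method: sum
    expression: mrr_amount
    timestamp: invoice_month
    time_grains: [month, quarter, year]
    dimensions: [region, plan, segment]
    filters:
      - field: is_active
        operator: '='
        value: 'true'
```

The point is not this particular spec: it is that the definition is plain text, versioned in git, and readable by any BI tool or AI assistant you point at it.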
What vertical data tools are really selling
Ian also made a nuanced point: "The company-specific value of data vendors is distribution: beautiful data apps, frictionless embedding, native primitives, SCIM provisioning, user context. Choose the tool that fits what you need right now."
That is a helpful way to separate durable value from transient value.
Durable (you should own it):
- canonical definitions
- transformation logic
- documentation
- lineage you can export
Transient but valuable (you can rent it):
- UI polish and workflow
- embedding and distribution to end users
- enterprise admin features like SCIM
- opinionated "native" experiences
This matters because vendors will keep competing on experience. The best embedded analytics today may not be the best next year. AI interfaces may compress entire UI categories. But the distribution layer is still useful; you just do not want it to be the only place your definitions live.
A practical blueprint for a vendor-proof, AI-ready stack
If I translate Ian's guidance into an actionable checklist, it looks like this:
Step 1: Put definitions where models and humans can read them
- Store metric definitions and data contracts in markdown and YAML.
- Keep them in git, with code review.
- Link docs to dbt models and exposures.
Step 2: Make dbt the spine of transformation and trust
- Standardize naming and model layers (staging, intermediate, marts).
- Require tests for core models (uniqueness, not null, accepted values).
- Use docs generation and enforce freshness on critical sources.
Step 3: Choose a semantic layer you can move
- Prefer definitions outside the BI tool.
- Ensure you can export metric logic and reuse it across BI and AI assistants.
- Treat semantics like product code: versioned, reviewed, observable.
Step 4: Rent distribution where it helps, without surrendering ownership
- Pick tools for embedding, app-like experiences, and admin needs.
- Integrate user context and permissions at the edge.
- Keep the source of truth upstream so switching tools is a change, not a rewrite.
Step 5: Prepare for agentic workflows
If "Cursor-like" experiences become the default, you want your stack to support safe tool chaining:
- table search and schema discovery
- governed query execution
- explanation of metric provenance
- logging and review of AI-generated queries and outputs
The point is not to bet on one assistant (Cursor, Claude Code, Codex, or the next one). The point is to make your data system legible and controllable so any assistant can be useful.
Closing thought: build what outlasts the UI
Ian's post is ultimately about durability. Tools will change. Interfaces will change. But a stack that has readable definitions, fast iteration, and portable semantics gives you leverage.
Build the parts you want to keep for 5 years. Rent the parts you might replace next year.
This is how you get the upside of modern vendors (speed and distribution) without waking up in two years realizing you have to rebuild your BI tool to escape it.
This blog post expands on a viral LinkedIn post by Ian Macomber, Data at Ramp. View the original LinkedIn post.