Case study — Spak.ai

The problem

Confident answers, zero accountability.

A general chatbot will answer a tax question instantly — and just as instantly invent a deduction limit or a bracket threshold that's plausible but wrong. For tax, "plausible but wrong" isn't a UX wrinkle; it's a filing error. The trust gap isn't fluency. It's that you can't see where the answer came from.

The product question wasn't "can it answer?" It was "can the user verify the answer without trusting us?"

Who it's for

Three users, one shared need.

Individual filers

Want a straight answer to a real situation — a Roth conversion, an RSU vest, a side business — without parsing a 40-page publication.

Advisors & preparers

Need speed, but can't act on an unsourced claim. A citation they can click is the difference between useful and unusable.

California taxpayers

Federal-only tools quietly mislead. State conformity differences are exactly where generic answers break — and where grounding pays off.

The unifying need: all three will forgive "let me check," but none will forgive a wrong number stated with confidence. That's why Spak is built around sourcing, not fluency.

What got prioritized

Grounding first, everything else second.

Spak over-invests in the one thing that creates trust, and keeps everything else at "good enough" until the core promise is proven:

1 · Retrieval quality

Hybrid dense + sparse search with cross-encoder reranking, so the answer cites the right passage — not a passage that merely sounds related.
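
One way the hybrid step could be sketched is shown below. It assumes reciprocal rank fusion (RRF) to merge the dense and sparse rankings; the case study does not say which fusion method Spak uses, so treat RRF as a stand-in. The passage IDs, the `rerank_score` callable (a placeholder for a cross-encoder score), and the constant `k=60` are illustrative assumptions.

```python
from typing import Callable, Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of passage IDs into one fused ranking."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); appearing in both lists adds up.
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def hybrid_search(
    dense: List[str],
    sparse: List[str],
    rerank_score: Callable[[str], float],
    top_n: int = 3,
    candidates: int = 10,
) -> List[str]:
    """Fuse dense and sparse results, then reorder the shortlist with a reranker."""
    shortlist = reciprocal_rank_fusion([dense, sparse])[:candidates]
    return sorted(shortlist, key=rerank_score, reverse=True)[:top_n]


if __name__ == "__main__":
    dense_hits = ["p3", "p1", "p2"]   # hypothetical embedding-search order
    sparse_hits = ["p1", "p4", "p3"]  # hypothetical keyword-search order
    # Hypothetical cross-encoder scores for (question, passage) pairs.
    fake_scores = {"p1": 0.2, "p2": 0.1, "p3": 0.9, "p4": 0.5}
    print(hybrid_search(dense_hits, sparse_hits, fake_scores.__getitem__, top_n=2))
```

Fusion only builds a shortlist. The reranker has the final say, which is what lets a passage that matches the question win over one that merely shares its vocabulary.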

2 · Deterministic math

Every number comes from real code behind a tool boundary, never from the model. A bracket calculation is reproducible and auditable.
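
A minimal sketch of what a deterministic bracket calculation might look like follows. The bracket thresholds and rates are made up for illustration and are not real federal or California figures; the function name and data layout are also assumptions, not Spak's actual calculator.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional, Tuple

# (upper bound of bracket, marginal rate); None means "no upper bound".
# HYPOTHETICAL figures for illustration only, not real tax brackets.
Bracket = Tuple[Optional[Decimal], Decimal]
EXAMPLE_BRACKETS: List[Bracket] = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def progressive_tax(taxable_income: Decimal, brackets: List[Bracket]) -> Decimal:
    """Apply marginal rates bracket by bracket and round to cents."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in brackets:
        if taxable_income <= lower:
            break  # income does not reach this bracket
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 10,000 at 10% + 30,000 at 20% + 10,000 at 30% = 10,000.00
    print(progressive_tax(Decimal("50000"), EXAMPLE_BRACKETS))
```

Using `Decimal` instead of floats keeps the result identical on every run and every machine, which is the property that makes the number auditable.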

3 · Visible citations

Each answer carries clickable sources. The proof of trust is in the UI, not buried in a system prompt.
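
As a rough illustration, the answer payload could be shaped so that an answer without sources cannot be constructed at all. The class names, fields, and example URL below are hypothetical and are not taken from Spak's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    title: str
    url: str
    quote: str  # the passage the claim rests on


@dataclass(frozen=True)
class GroundedAnswer:
    text: str
    citations: Tuple[Citation, ...]

    def __post_init__(self) -> None:
        # Refuse to build an answer that gives the user nothing to click.
        if not self.citations:
            raise ValueError("answer has no citations; decline instead of guessing")


def render_plain(answer: GroundedAnswer) -> str:
    """Render the answer followed by a numbered source list."""
    lines = [answer.text, "", "Sources:"]
    for index, citation in enumerate(answer.citations, start=1):
        lines.append(f"[{index}] {citation.title} <{citation.url}>")
    return "\n".join(lines)


if __name__ == "__main__":
    example = GroundedAnswer(
        text="Example answer text.",
        citations=(
            Citation(
                title="Example publication, section 2",  # hypothetical source
                url="https://example.com/publication#section-2",
                quote="Example quoted passage.",
            ),
        ),
    )
    print(render_plain(example))
```

Enforcing the rule in the data type, rather than in a prompt, means the UI can never receive an unsourced answer by accident.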

4 · California parity

State rules are treated as first-class, not as an afterthought: the place where generic tools fail is the place to win.

Scope decisions

What Spak doesn't do — yet.

A good product is defined as much by what it leaves out as by what it includes. Each of these is a real, defensible feature, deliberately deferred so the core promise ships sharp rather than the whole product shipping blurry.

Conversational memory

Tempting, but it multiplies the surface area for a wrong answer to compound across turns. Single-shot, well-sourced answers had to be right first.

A login & saved returns

Real product value, zero bearing on whether the grounding works. Pure scope creep against the core hypothesis.

Every tax topic

Depth over breadth. A narrow set of topics answered impeccably beats a broad set answered vaguely — and proves the approach.

A live public model on day one

An unguarded answer endpoint is a cost-and-abuse magnet. The demo ships grounded and safe; the live path waits for rate-limiting and caps.
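
The kind of guard the live path is waiting for could be sketched as follows: a per-user sliding-window rate limit combined with a per-user daily cap. The class name, the limits, and the in-memory storage are assumptions for illustration; a real deployment would more likely keep these counters in a shared store and expire old entries.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class RequestGuard:
    """Per-user sliding-window rate limit plus a per-user daily cap (in memory)."""

    def __init__(
        self,
        per_minute: int,
        per_day: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock  # injectable so the guard can be tested without sleeping
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)
        self._daily: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        window = self._recent[user_id]
        # Drop timestamps that are 60 seconds old or older.
        while window and now - window[0] >= 60:
            window.popleft()
        day = int(now // 86400)  # day number since the epoch (UTC)
        over_rate = len(window) >= self.per_minute
        over_cap = self._daily[(user_id, day)] >= self.per_day
        if over_rate or over_cap:
            return False
        window.append(now)
        self._daily[(user_id, day)] += 1
        return True


if __name__ == "__main__":
    guard = RequestGuard(per_minute=2, per_day=3)
    print([guard.allow("demo-user") for _ in range(3)])  # third call exceeds the rate
```

Rejected requests are not counted against either limit here, so a client that backs off is not penalized for the attempts that were refused.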

Roadmap

Now, next, later.

Now — shipped

Grounded retrieval with reranking
Deterministic federal + CA calculator
Clickable citations on every answer
Streaming React interface

Next

Live answers behind a guarded endpoint
Per-user rate limits + daily caps
Expanded topic + publication coverage
Answer feedback loop for retrieval tuning

Later

Saved questions & return context
Document upload (read a real 1099)
Multi-state beyond California
Advisor workspace

The principle

Never claim more than it can prove.

Every decision in Spak traces back to one rule: the product never claims more than a user can verify. Tying the core promise — traceable answers — to something visible and clickable keeps every choice honest. If a feature doesn't make answers more verifiable, it waits. In a domain where a confident wrong number is a real cost, that discipline is the product.

See it in action.

Try the grounded demo, or read the architecture behind it.


Building a tax assistant people can actually trust.