Product case study

Building a tax assistant people can actually trust.

Spak rebuilds a 2018 intent-classification tax chatbot as a modern retrieval-grounded system. The hard part was never the model — it was making every answer verifiable. This is the product story: the problem, how Spak solves it, and where it's headed.

Built for: Filers, advisors, CA taxpayers
Coverage: U.S. federal + California tax
Stack: RAG · MCP · React/Vite
Status: Feature-complete

The problem

Confident answers, zero accountability.

A general chatbot will answer a tax question instantly — and just as instantly invent a deduction limit or a bracket threshold that's plausible but wrong. For tax, "plausible but wrong" isn't a UX wrinkle; it's a filing error. The trust gap isn't fluency. It's that you can't see where the answer came from.

The product question wasn't "can it answer?" It was "can the user verify the answer without trusting us?"

Who it's for

Three users, one shared need.

Individual filers

Want a straight answer to a real situation — a Roth conversion, an RSU vest, a side business — without parsing a 40-page publication.

Advisors & preparers

Need speed, but can't act on an unsourced claim. A citation they can click is the difference between useful and unusable.

California taxpayers

Federal-only tools quietly mislead. State conformity differences, the points where California law departs from federal rules, are exactly where generic answers break and where grounding pays off.

The unifying need: all three will forgive "let me check," but none will forgive a wrong number stated with confidence. That's why Spak is built around sourcing, not fluency.

What got prioritized

Grounding first, everything else second.

Spak over-invests in the one thing that creates trust, and keeps everything else at "good enough" until the core promise is proven:

1 · Retrieval quality

Hybrid dense + sparse search with cross-encoder reranking, so the answer cites the right passage — not a passage that merely sounds related.
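
A minimal sketch of how the two result lists might be combined before reranking. It assumes reciprocal rank fusion (RRF) as the merge step, which is one common choice and not necessarily what Spak uses. The names `reciprocal_rank_fusion`, `hybrid_search`, and `rerank_score` are hypothetical; `rerank_score` stands in for a cross-encoder that scores a (query, passage) pair.

```python
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of passage IDs into a single ranking.

    Each passage earns 1 / (k + rank) from every list it appears in,
    so passages ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def hybrid_search(
    query: str,
    dense_ranking: list[str],
    sparse_ranking: list[str],
    passages: dict[str, str],
    rerank_score: Callable[[str, str], float],
    candidates: int = 10,
    top_n: int = 3,
) -> list[str]:
    """Fuse dense and sparse results, then rerank the shortlist.

    `rerank_score(query, passage_text)` is a hypothetical stand-in for a
    cross-encoder; any callable that returns a relevance score works here.
    """
    shortlist = reciprocal_rank_fusion([dense_ranking, sparse_ranking])[:candidates]
    reranked = sorted(
        shortlist,
        key=lambda pid: rerank_score(query, passages[pid]),
        reverse=True,
    )
    return reranked[:top_n]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer used only for this example: counts shared words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


if __name__ == "__main__":
    docs = {
        "a": "standard deduction amount",
        "b": "roth conversion rules",
        "c": "estimated payments schedule",
        "d": "rsu vesting income",
    }
    print(hybrid_search("roth conversion", ["a", "b", "c"], ["b", "d", "a"], docs, word_overlap))
```

The fusion step only decides which passages reach the reranker; the reranker decides which passage the answer finally cites.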

2 · Deterministic math

Every number comes from real code behind a tool boundary, never from the model. A bracket calculation is reproducible and auditable.
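
As an illustration of what "real code" can mean here, the sketch below computes a progressive bracket tax with exact decimal arithmetic. The bracket thresholds and rates are invented for the example and are not real federal or California figures; `EXAMPLE_BRACKETS` and `bracket_tax` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# HYPOTHETICAL brackets for illustration only, not real tax figures.
# Each entry is (lower bound of the bracket, marginal rate above that bound).
EXAMPLE_BRACKETS = [
    (Decimal("0"), Decimal("0.10")),
    (Decimal("10000"), Decimal("0.20")),
    (Decimal("50000"), Decimal("0.30")),
]


def bracket_tax(taxable_income: Decimal, brackets=EXAMPLE_BRACKETS) -> Decimal:
    """Return the progressive tax owed, rounded to the cent."""
    if taxable_income < 0:
        raise ValueError("taxable_income must be non-negative")
    tax = Decimal("0")
    for i, (lower, rate) in enumerate(brackets):
        if taxable_income <= lower:
            break  # income never reaches this bracket
        # The bracket ends where the next one begins; the last one is open-ended.
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else None
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 10,000 at 10% + 40,000 at 20% + 10,000 at 30%
    print(bracket_tax(Decimal("60000")))
```

Because the function is pure and uses `Decimal`, the same input always yields the same output, which is what makes the figure auditable.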

3 · Visible citations

Each answer carries clickable sources. The proof of trust is in the UI, not buried in a system prompt.
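
One way to enforce that rule in code is to make citations part of the answer type and decline to render an answer that has none. The sketch below is an assumption about the shape of the data, not Spak's actual schema; `Citation`, `Answer`, `render_plain`, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    title: str    # human-readable name of the source
    url: str      # link the user can click to verify the claim
    passage: str  # the retrieved text the answer relies on


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple[Citation, ...] = ()


NO_SOURCE_MESSAGE = "I couldn't find a source for that, so I won't guess."


def render_plain(answer: Answer) -> str:
    """Render an answer with numbered sources, or decline if it has none."""
    if not answer.citations:
        return NO_SOURCE_MESSAGE
    lines = [answer.text, "", "Sources:"]
    for n, citation in enumerate(answer.citations, start=1):
        lines.append(f"[{n}] {citation.title} <{citation.url}>")
    return "\n".join(lines)


if __name__ == "__main__":
    cited = Answer(
        text="Example answer text.",
        citations=(Citation("Example publication", "https://example.com/pub", "Example passage."),),
    )
    print(render_plain(cited))
    print(render_plain(Answer(text="Unsourced claim.")))
```

Declining in the unsourced case matches the product's stance that "let me check" is forgivable and a confident wrong number is not.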

4 · California parity

State rules treated as first-class, not an afterthought — the place generic tools fail is the place to win.

Scope decisions

What Spak doesn't do — yet.

A good product is defined as much by what it leaves out as by what it includes. Each of these is a real, defensible feature, deliberately deferred so the core promise ships sharp instead of the whole thing shipping blurry.

Conversational memory
Tempting, but it multiplies the surface area for a wrong answer to compound across turns. Single-shot, well-sourced answers had to be right first.
A login & saved returns
Real product value, zero bearing on whether the grounding works. Pure scope creep against the core hypothesis.
Every tax topic
Depth over breadth. A narrow set of topics answered impeccably beats a broad set answered vaguely — and proves the approach.
A live public model on day one
An unguarded answer endpoint is a cost-and-abuse magnet. The demo ships grounded and safe; the live path waits for rate-limiting and caps.
Roadmap

Now, next, later.

Now — shipped
  • Grounded retrieval with reranking
  • Deterministic federal + CA calculator
  • Clickable citations on every answer
  • Streaming React interface
Later
  • Saved questions & return context
  • Document upload (read a real 1099)
  • Multi-state beyond California
  • Advisor workspace
The principle

Never claim more than it can prove.

Every decision in Spak traces back to one rule: the product never claims more than a user can verify. Tying the core promise — traceable answers — to something visible and clickable keeps every choice honest. If a feature doesn't make answers more verifiable, it waits. In a domain where a confident wrong number is a real cost, that discipline is the product.
