Case study · 2025–2026

Phos4

Make SQL feel like thinking, not typing.

A browser-based SQL IDE shipped with Mozart Data. I led three workstreams over the engagement: AI transformation for the team and the product, a frontend rebuild on a modern foundation with a deep theming engine, and the brand and web design Phos4 launched under.

AI transformationFrontend rebuildBrand & web designLive site

Phos4 with the first theme palette applied: warm red accents on dark.

Phos4 with the second theme palette applied: neutral red accents on dark.

Context

How an AI training engagement widened into product, frontend, and brand.

Mozart Data was building Phos4 with an 11-person engineering team. They brought me in as an AI transformation consultant. The starting brief was to install agentic development as a discipline on the team.

Within the first few weeks the scope had to widen. The existing frontend wasn't built for what shipped on it: the SQL editor, the agent surfaces, the theming, the chart layer. The marketing site and brand were placeholder. The velocity needed to ship a product of this size was several multiples ahead of where the team was.

The engagement landed on three concurrent workstreams. I led all three.

AI transformation

Pair sessions, prompt patterns, review workflows, quality gates.

Mozart's engineers had AI tools available but no shared practice. Nobody on the team had used agentic development with discipline: no prompt patterns, no review conventions for AI-generated code, no quality gates, no shared mental model for when to drive the agent and when to take over. The work hadn't been compounding.

I embedded with the team. The work was concrete: pair sessions on real production tickets; a prompt architecture standardised across the team; review workflows that distinguish AI-authored from human-authored code; quality gates for AI-touched changes on critical paths; and shared templates, harnesses, and examples.

Team velocity

Commits per week

Weekly engineering output across an eleven-engineer team. The CEO learned the agentic practice alongside the team and moved from typical-engineer commit volume to roughly thirty percent of all commits.

Weekly commit history, Jul 2025 through Apr 2026. Full-rate reference: 924 commits/week.

Team velocity

Weekly commits climbed from about fifty to nearly a thousand.

The CEO learned the practice alongside the team. He moved from typical-engineer commit volume to producing about thirty percent of all commits, and held that lead by a wide margin. The chart above is the modeled view of that transition. A model projects 924 commits per full-rate week from sustained per-engineer throughput across the team. Weekly lines changed moved from 5K–10K to 30K–80K. The practice is still in place after I left.

Frontend rebuild

Rebuilt the frontend: React 19, Vite, 285 components.

A SQL editor with dialect-aware intelligence, an agent surface, a chart engine, a schema browser, a profiler. All running concurrently, all sharing state, all driven by the same theming system. The existing codebase wasn't built for it.

I rebuilt the foundation. React 19 and TypeScript over Vite. 285 components organised across 33 feature categories. 66 reusable UI primitives built on Radix and shadcn/ui with Tailwind v4: 13 button variants and 5 input variants, every primitive documented in Storybook. Nine React Context providers for cross-cutting concerns (auth, file management, warehouse metadata, focus mode, user preferences). Ten Redux slices for domain logic (editor state, query execution, AG Grid configuration, formatting, profiler, schema browser, AI assistant). TanStack Query for fetching. AG Grid Enterprise for tabular data. Vitest for unit tests, Playwright for end-to-end, Storybook for visual regression.

Components

285

UI Primitives

Palettes

The theming engine

Ten palettes flowing through every layer.

Engineers tune their environment all day, and the product had to absorb that without losing state, switching builds, or looking inconsistent across panels.

Phos4 ships ten palettes across light and dark, driven by a dynamic CSS-variable system and cached in localStorage so a switch is instant on the next reload. Every layer reads from the same tokens (chart primitives, editor decorations, Monaco themes, surface chrome), so a palette change flows through the whole product without anything looking grafted on. The viewer at the top of this page is the proof.

In the product

Four surfaces: the agents, the memory, the data, the editor.

Four surfaces run on the foundation: the routed agent system, the memory layer, the data surfaces around the editor, and the editor itself.

In-product agents

Three agents: write, review, pressure-test.

Alongside the coaching, I designed and built Phos4's first agentic AI surfaces inside the product. The architecture is a routed multi-agent system: one agent writes SQL, another reviews it, a third pressure-tests the data.

Backlog · 20

INV-412Investigate null_pct spike in signup.events

PIV-118Pivot ARR by cohort quarter

COR-089Correlate retention × onboarding completion

SCH-077Retention_cohort column on users

ANO-223Flag anomalous refund surge (Q3)

OPT-056Optimize quarterly_report.sql

AUD-145Audit PII access on prod mirror

SEG-091Segment power users by tenure

INV-318Investigate latency regression on /search

PIV-224Pivot churn by acquisition channel

COR-156Correlate NPS with ticket volume

SCH-104Add cohort_activity_daily rollup

OPT-087Index scan → seq scan on events_log

AUD-210Audit row-level policies on reporting

INV-499Investigate duplicate event_ids after replay

PIV-301Pivot feature adoption by plan tier

COR-402Correlate session length × repeat_purchase

SCH-142Drop deprecated user_metadata_v1

OPT-033Rewrite dashboard_view to avoid full scan

AUD-055Verify PII encryption on backups

Agents working · 8

CoddPIV-118Pivot ARR by cohort quarter

expecting ~981 rows post-aggregate

TukeyINV-412Investigate null_pct spike in signup.events

lining up deploy log against event pattern

KimballCOR-089Correlate retention × onboarding completion

splitting by self-serve vs sales-led

KnuthSCH-077Retention_cohort column on users

drafting rollback alongside migration

HopperANO-223Flag anomalous refund surge (Q3)

lining up deploy log against event pattern

ShannonOPT-056Optimize quarterly_report.sql

shortlisting window-function vs indexed scan

WickhamAUD-145Audit PII access on prod mirror

pulling access logs · last 30 days

BayesSEG-091Segment power users by tenure

splitting by self-serve vs sales-led

Ready for you · 5

INV-240Refactor cohort_activity to incremental model

new

SCH-019Add retention_cohort column to users

approved

PIV-088Pivot signups by campaign × week

new

OPT-044Rewrite fact_sessions for partition pruning

new

COR-107Correlate D30 retention × onboarding steps

comment

Three agents working a single query: write, review, pressure-test.

Under the hood: multiple LLM providers through OpenRouter. Streaming insights over server-sent events. A tool layer that dispatches to specialised handlers for dbt commands, git operations, schema lookups, and query analysis. Model selection so users can choose their provider.

Watch the routed system on a single SQL file. The agent surfaces the problem, drafts a reviewed fix, shows the diff, and benchmarks the output against the baseline. The engineer judges the work.

WITH scored AS (

SELECT

  g.week,

  t.abbr                       AS team,

  t.division,

  g.points_scored              AS pts,

  AVG(g.points_scored) OVER (

    PARTITION BY t.abbr

    ORDER BY g.week

    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

  )                            AS rolling_3,

  AVG(g.points_scored) OVER (

    PARTITION BY t.abbr

  )                            AS season_avg

FROM games g

JOIN teams t USING (team_id)

WHERE g.season = 2025

)

SELECT team, division, week, pts,

     ROUND(rolling_3, 1)      AS last3_avg,

     ROUND(rolling_3 - season_avg, 1) AS vs_season

FROM   scored

WHERE  week >= 5

ORDER  BY vs_season DESC

LIMIT  25;

Running nightly scan…

Ask phos4 to refactor a model…

Claude Thanos 4.8

An agent turning a flagged SQL problem into a reviewed fix, a diff, and a benchmark.

Memory and context

A working memory the agent earns through the session.

Conversation history carries context chips so users can pin specific tables, schemas, or prior queries to the conversation. Auto-analysis surfaces insight about query results inline. Findings accumulate across the session. The ones that hold up reaffirm, the ones that don't get invalidated, and what's left is a working memory of what the engineer has actually learned about the data.

Memory graph · 100 lobe · 380 totalentity tour

entity

Acme Inc.

reaffirmed

CRM · synced 2h ago2023-04-14confidence high

Enterprise customer since 2023. VP Data: Maya Chen. 120-seat team, pays quarterly, renewed twice.

Maya ChenContract #8241Acme data team

Reaffirmed 4× · last accessed 2h ago

Linked from onboarding intake

The memory graph as it accumulates, reaffirms, and invalidates findings across a session.

Data surfaces

Around the editor: pivot, profile, browse.

The editor is one panel; the rest of the tool is a working analytics surface. AG Grid Enterprise drives the result table. A custom column profiler reads distribution, density, and drift on every field. The schema browser, the pivot configuration, and the profiler share state with the editor through Redux slices, so a query, its results, and a pivoted view of those results are reading from the same ground truth.

Pivot settings

Rows02

Aateam.divisionasc

Aateam.nameasc

Columns01

123game.weekasc

Values01

123points_scoredsum

Filters01

123season=2025

123game.week≥4

Add filter

synced

Team

Total

Avg

AFC East3 teams

BUFBills

—

197

MIADolphins

—

162

NEPatriots

—

124

AFC North3 teams

BALRavens

—

201

CINBengals

—

154

PITSteelers

—

152

AFC West2 teams

KCChiefs

—

183

LACChargers

—

174

NFC East2 teams

PHIEagles

—

192

DALCowboys

—

174

NFC North2 teams

DETLions

—

199

GBPackers

—

160

NFC West2 teams

SF49ers

—

205

LARRams

—

162

League Avg

199

Column profile

nfl_combine · 100 rowspulled from BAL · W6

player_name

100 uniq

overall_pick

1 → 224

drafted_by

28 teams

position

WR · 21

wingspan_in

μ 77.1

composite

left-skew

athletic_rank

uniform

bench_reps

mode 14

forty_yard

μ 4.58

vertical_in

μ 34.6

is_senior

65% T

shuttle_run

μ 4.32

The pivot surface, settings panel, and column profiler: three layers around the editor sharing one state model.

The editor

Monaco that comprehends SQL.

Phos4's Monaco integration goes well past highlighting. 18 themes, semantic token mapping, CSS-variable integration. Dialect-aware behaviour for Snowflake, BigQuery, Databricks, and Redshift: CTE-aware code folding that follows the query's logical structure, a join profiler with predicate analysis and join-type detection, dialect-specific function autocomplete, Jinja templating for dbt workflows, and a baseline diff for comparing query versions. Focus Mode dims inactive panels; Zen Mode strips everything but the query and the result.

phos4 / queries / weekly_trends.sql

WITH scored AS (

SELECT

  g.week,

  t.abbr                       AS team,

  t.division,

  g.points_scored              AS pts,

  AVG(g.points_scored) OVER (

    PARTITION BY t.abbr

    ORDER BY g.week

    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

  )                            AS rolling_3,

  AVG(g.points_scored) OVER (

    PARTITION BY t.abbr

  )                            AS season_avg

FROM games g

JOIN teams t USING (team_id)

WHERE g.season = 2025

)

SELECT team, division, week, pts,

     ROUND(rolling_3, 1)      AS last3_avg,

     ROUND(rolling_3 - season_avg, 1) AS vs_season

FROM   scored

WHERE  week >= 5

ORDER  BY vs_season DESC

LIMIT  25;

connected · prod-readonlypostgres 16

SQLUTF-8LFLn 20, Col 12

The editor itself: a four-tab Monaco surface with dialect-aware highlighting, CTE folding, and inline join profiling.

The team practice and the in-product agents were built side by side. Same scoping rules, same review conventions, same line between agent judgment and human judgment. That's why the team practice didn't unwind the day I left, and why the in-product AI rides the same review patterns the team uses.

Credit where it's due. The VP of Frontend Engineering built the DuckDB WASM integration that handles client-side SQL processing with Apache Arrow and Parquet, eliminating server round-trips for query execution. Other engineers built backend services, the profiler pipeline, and the warehouse connection layer. I owned the design, the frontend architecture, the theming engine, and the AI integration layer. The team built on top of it.

Brand and web design

An Edison-bulb visual language for engineers close to the metal.

Phos4 had no mark, no typographic system, and no voice. The marketing site at phos4.ai was placeholder.

I designed the brand from scratch. Mark, palette, typography, motion. The visual language reaches back to the Edison-bulb and CRT-phosphor era of computing, when builders were tinkerers and the path from intent to result ran through equipment they could open up and adjust by hand. That era is the right reference for SQL because the work hasn't actually changed shape. Engineers still run real queries against real warehouses with real consequences, and the agentic layer compounds the parts that should compound without removing the engineer from the metal.

The marketing site at phos4.ai runs on the same tokens as the product, so an engineer arriving from a peer recommendation lands on the same surface they'll work inside all day. The hero on phos4.ai has a quiet trick: drag the cursor across the surface and the screen smudges along your trail like a finger across a CRT, exposing the code that's been there the whole time.

phos4.ai in motion.

The web presence does its own work: positioning Phos4 against legacy SQL IDEs on one side and generic AI-wrapper products on the other; demoing the editor in motion; surfacing the agent's behaviour without making it the headline; converting the engineer who landed there from a peer recommendation.

Outcome

What shipped.

Phos4 launched as a production SQL IDE at phos4.ai. The CEO is now the team's top contributor by a wide margin, the team is sustaining the new velocity, and a model projects a 924-commit full-rate week. The frontend ships 285 components on a foundation that carries the product. The brand and the marketing site put Phos4 in front of the engineers it's for.

Top contributor share

30%

Projected commits/wk

924

Components shipped

285

Colophon

Every figure here is a live React component. Each is scoped so it can't leak into the page, and the same component renders inside chat cards through a shared registry. Ask the agent to show you one.

Praise

Honestly, I don't think we'd have shipped on time without David. We had a broken front end, a team that hadn't really used AI in production, and no brand to put on the door. He worked all three at the same time, and we made it to our public beta.
Leon TchikindasCEO, Phos4
LT