top of page

AI Designer·Solo·2026

An AI chat app
you can see through.

Memory you can toggle. Influence you can witness. A knowledge base you can walk through. Local-first, zero cloud, zero telemetry.

~1 week

Solo build

4

Design surfaces

~300+

Chunks tested

~900ms

Firing pulse

0

Servers, 0 telemetry

WeChat Screenshot_20260422120446.png

Role

Solo designer, researcher, engineer — product strategy through frontend, backend, and ML integration.

Type

Self-initiated product build. Shipped as a cross-platform desktop app.

Stack

Electron · UMAP-JS · k-means · text-embedding-3-small · Anthropic / OpenAI APIs

Platforms

macOS · Windows · Linux — single-file Electron with zero build step for dev.

01 · Problem

Today's AI chat
is opaque by default.

Single-turn Q&A works great. But using AI as a long-term thinking partner breaks on three opacities — none of which are about model capability.

01 / MEMORY

Can't see what it knows.

Memory lists surface after the fact, flat, uncategorized. No situational toggles. "Use this tone for work, not personal" has no affordance.

02 / INFLUENCE

Can't see what mattered.

Users don't know which memories or documents shaped a reply. Output feels like luck. Users over-trust or under-trust — neither is good UX.

03 / SHAPE

Can't see what's in there.

Uploaded files become a name in a sidebar. A 300-page PDF and a one-line note look the same. The knowledge base is a black bag.

02 · Iteration

The four moves weren't a plan.
They were what survived.

Instead of defining everything upfront, I approached this as an iterative design process.
I built a minimal version, tested it in real use, identified failure points, and refined one problem at a time. After multiple iterations, four key design decisions emerged as consistently effective.
The following sections document that evolution.

ITER 01

WHERE I STARTED

Starting with a stateless conversation model

What it was: A plain chat box. Type a question, get a reply. No files, no memory, nothing saved between sessions.

Why that wasn't enough

Every time I opened it, it had forgotten everything. If I told it Monday "I'm a designer working on a music app," on Tuesday it had no idea who I was. Great for one-off questions. Useless as a thinking partner.

ITER 02

+ FILE READING

Grounding AI responses with user data

What I added: Drag in a PDF, a résumé, meeting notes — the AI would actually read them and use them when answering. I also added a little "Sources" list under each reply so you could see where an answer came from.
 

Why that wasn't enough

The sources were broken. I'd ask "what's my identity?" and it would cite a random SEO doc. The same file got listed three or four times for the same answer — like filler. The "Sources" label looked like evidence but was really just noise. I couldn't trust my own app.
 

ITER 03

+ MEMORY CARDS

Introducing persistent user context (memory)

What I added: Little notes the AI always sees — organized by type. Who I am. Projects I'm working on. Preferences. Things I don't want it to do. Flip a card off when it doesn't apply to the current conversation.
 

Why that wasn't enough

The cards lived inside a separate pop-up window. Out of sight, out of mind. I'd write "keep replies under 3 sentences" and then have no way of knowing whether the AI was actually using that card on any given reply, or just ignoring it. The memory existed — but I couldn't see it working.
 

ITER 04

+ VISUAL FEEDBACK

Making AI reasoning partially visible

What I added: When the AI replied, the cards it had actually used would flash amber for about a second, like a little heartbeat. I also built a visual map of everything in the knowledge base — dots grouped into topic-clusters you could zoom and pan around.
 

Why that wasn't enough

Both of these lived inside pop-up windows. To see which card flashed, I had to click the Memory button, leave the chat, watch the flash, then come back to read the reply. The information was there — the cost of looking at it was too high. So I just stopped looking.
 

ITER 05

WHERE I AM NOW

Integrating context and reasoning into the conversation layer

What I added: Small coloured tags directly under every AI reply, showing which memory cards it just used. Hover one to preview the content. Click it to edit the memory on the spot. For sources: only show a file if it's actually relevant to the question, and never the same file twice.

What changed

I can now glance at a reply and know why the AI said what it said — without ever leaving the chat. Sources became real evidence again instead of noise. The four design moves on the next page (Structure, Surface, Navigate, Retain) are the ones that earned their place after surviving all of this.
 

03 · Design framework

Structure → Surface → Navigate → Retain

MOVE 01

Structure

Turn hidden memory into a deck of typed, toggleable cards. User curates, model respects.

MOVE 02

Surface

Animate the invisible. A 900ms amber pulse shows which memories fired — without stealing attention from the answer.

MOVE 03

Navigate

Project embeddings to 2D. Make latent space a place users can pan, zoom, and query — not a lookup table.

MOVE 04

Retain

Store everything locally in one deletable folder. Trust through architecture, not policy copy.

MOVE 01 · STRUCTURE

Memory Cards
— a deck, not a textarea.

Six typed categories. identity · project · preference · taboo · style · general. Chosen after testing free-form tags (sprawled to 40+ in a week) and three broad buckets (too coarse — preferences and taboos behave differently at prompt-injection time). Six was the smallest set the model actually respected.

A parse-paste flow. User pastes a messy brain-dump. One LLM call turns it into cards. User reviews, edits, toggles. 30-minute setup → 30 seconds.

Screeen.gif

MOVE 02 · SURFACE

The firing pulse
— explaining AI influence.

After every reply, the app embeds the response and cosine-scores it against every enabled card. Top cards pulse amber for ~900ms. No numbers, no citation popup — just a quiet witnessable cue: these memories shaped this answer.

Killed the citation list. An earlier version listed "memories used" under each reply. Accurate but ugly — and it trained users to read the footnote instead of the answer. Ambient pulse > explicit citation when the user's primary task is reading.

Screeen.gif

MOVE 03 · NAVIGATE

The 2D semantic map
— latent space, walkable.

Every chunk: embedded (1,536-D), projected via UMAP to 2D, k-means clustered, LLM-labeled. Every dot is a chunk. Every color is a topic. RAG becomes navigation, not retrieval.

Interactions: cursor-anchored wheel zoom, click-drag pan with 5px click/drag threshold, rich hover tooltip with match-% bar, query mode that highlights top-K with connecting lines, fullscreen, keyboard shortcuts (+/−/0/F). Map first. Text second.

WeChat Screenshot_20260422181743.png

MOVE 04 · RETAIN

Local-first architecture
— trust via locality.

All state — every conversation, memory card, KB vector, Gmail credential, API key — lives in one 2.4 MB JSON file on the user's machine. No backend. No account. No sync. No telemetry. The only network traffic is the explicit API call the user configured.

Delete the file = complete reset. That clarity is the feature. The best trust signal is a file the user can delete.

C:\ Users \ …\ AppData \ Roaming \
local-rag-visual-map-system

└── config.json // 2.4 MB — everything you own { conversations: [...], memoryCards: [...], kb: { docs, embeddings }, kbMap: { points, clusters }, anthropicKey: "***", gmailAppPwd: "***" }

→ Delete the file to fully reset.

WeChat Screenshot_20260422231839.png

Animation is an explainability primitive.
Latent space rewards spatial UI.
Trust is a folder the user can delete.

Three things I learned

04 · Tradeoffs

Key design trade-offs that shaped the system

Every design decision eliminates alternatives. These are the trade-offs that materially shaped the experience.

Decision

Chose

Ruled out

Projection (Mental Model, Stability)

UMAP — maintains spatial consistency across re-embedding, preserving users’ mental map as the system evolves.

t-SNE — repositions points on each run, breaking spatial memory.
PCA — oversimplifies semantic relationships, reducing interpretability.

Memory schema (Cognitive Load, Control)

Six fixed categories — reduces cognitive overhead and forces clear decisions about what the AI should retain (e.g. preference vs. constraint).

Free-form tags — quickly fragmented into dozens of labels, reducing clarity.
Three broad buckets — too coarse to support meaningful control.

Explainability UX (Transparency, Flow)

Ambient firing pulse — surfaces system activity subtly without interrupting reading or decision-making.

Citation lists — shifted attention away from answers toward validation mechanics, increasing cognitive friction.

Data architecture (Trust, Privacy)

Local-first — ensures full user control over data, minimizes trust barriers, and reduces perceived risk.

Cloud sync — introduces account dependency, platform lock-in, and additional trust overhead (TOS, storage concerns).

Aesthetic direction (Emotion, Approachability)

Warm beige — frames the product as a thinking space rather than a technical tool, lowering intimidation and increasing approachability.

Dark “tech” UI — reinforces distance and signals a system the user doesn’t fully control.

05 · In motion

30 seconds of the thing
actually working.

Upload PDFs → chunk & embed → build vector map → query “Dallas economy” → watch top-8 relevant chunks light up while the rest fade out.

06 · In motion

A shipped desktop app, four named moves, one working trust model.

4moves

Structure, Surface, Navigate, Retain — each mapped to a shipped surface.

1,536→2D

First consumer-grade UMAP knowledge-base view I'm aware of.

~20s

From 12-PDF upload to a clustered, labeled, queryable semantic map.

100%

Local data. Zero servers. Zero telemetry. Zero account required.

07 · Methodology

How I decided
what to actually build.

Built with Claude as a coding agent via the Agent SDK — standard for 2026. The interesting part isn't the stack; it's the filters every feature had to pass before it got shipped, and the way iteration decisions got made when "working code" stopped being the bottleneck.

STRATEGY — WHAT I OPTIMIZED FOR

Three design filters.
Every feature had to pass all three.

  • Proximity over proof. Evidence signals had to live where the eye already was. Citation lists at the bottom of the chat failed this test. Inline chips directly under each reply passed. Modals failed by default, no matter how right the content was.

  • Structure over freedom. Flat memory lists scaled poorly with use. Six typed categories (IDENTITY, PREFERENCE, TABOO, STYLE, PROJECT, GENERAL) gave users a grammar to think in — the taxonomy was the prompt. Constraint was the feature.

  • Architecture over promise. "Your data is safe" is a policy line. "Delete this one file to fully reset" is a primitive. Trust had to be demonstrable by deletion, not by documentation.

  • Animation over explanation. A 900ms amber pulse communicated "this memory fired" in less attention than any citation UI I drafted. Motion as a primitive, not decoration.

METHODOLOGY — HOW I DECIDED

teration rules I operated under
when execution was no longer the constraint.

  • Friction was the signal. Every feature got a 72-hour test drive in my own workflow. If I avoided opening the app after building it, the feature failed — regardless of whether it technically worked.

  • Kill before adding. Each iteration required deleting something. Inline chips only landed after I removed the firing modal. The fix was never "more UI"; it was relocating the UI that already existed.

  • Reframe, don't polish. When retrieval cited the same source 4× and chip labels read GENERAL, I didn't iterate the component — I re-derived the shape of the data layer. Small symptoms, deep cause.

  • Decisions, not deliverables. In an agent-assisted loop, shipping is cheap and judgment becomes the bottleneck. I structured days around the next taste call (what should a disabled chip look like, should dedupe happen before or after ranking), not around building tickets.

STACK

Electron 30 · vanilla JS · umap-js for 2D projection · Claude / GPT pluggable · text-embedding-3-small · one local config.json as the entire store. No framework, no backend, no telemetry.

AGENT LOOP

Claude driving an Electron repo through the Agent SDK. Custom skills for doc generation, RAG scaffolding, and plugin packaging. My role: spec, taste, direction. Agent's role: code, glue, debug.

DURATION

~1 week of evening work · 5 iterations
Equivalent to a ~4-week build cycle, accelerated through an agent-assisted workflow.

The real shift

When a working build lands in an afternoon, the bottleneck stops being can we build it and becomes should we. That's a design bar going up, not down. The tempo of judgment is the new craft — and the methodology above is what I use to keep taste faster than code.

08 · Reflection

Three things I learned
designing for AI.

01 / ANIMATION

Animation is an explainability primitive.

Started writing citation UIs. Ended with a 900ms amber pulse. The pulse communicates more per millisecond of attention than a paragraph of explanation — because it respects that the user is trying to read the answer, not the meta-answer.

02 / SPATIAL

Latent space rewards spatial UI.

Every time I showed the map, the first reaction was spatial — "that lump over there," never "that topic." Giving a semantic space a location turns abstract AI behavior into something a body can reason about.

03 / TRUST

The best trust signal is a folder the user can delete.

No privacy policy did what a single line in the README did: "Delete the folder to fully reset." Concrete beats legal. Designing for AI trust is, at its best, designing for the moment the user decides to leave.
 

bottom of page