Knowledge Hub, Part 2: Giving it a voice (and a web address)

Part 1 ended with a working Telegram bot: words, quizzes, conversations, and daily recaps pushed into a chat.

Then the limits became obvious. A language tool should let you speak, and Telegram was the wrong surface for the WebSocket/WebRTC loop behind real-time voice. I also wanted to test a timeline-first interface in plain JavaScript. So the POC grew a web version and a voice mode. The useful part: neither required a rewrite.

The thing that made this cheap: the engine already stood alone

The learning engine already stood apart from Telegram. It delivers items, tracks seen → learned → mastered, generates and caches content, and routes quizzes by status. Telegram was only one shell around it.

That made the web app a second shell, not a second product. Words, quizzes, and explanations call the same plugin, cache, and model factory from Part 1. The expensive part was already done.
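To make "second shell, not a second product" concrete, here is a minimal sketch in Python. It is an illustration, not the project's code: the `Engine` class, the quiz-type names in `QUIZ_BY_STATUS`, and both shell functions are hypothetical. The only parts taken from the post are the seen → learned → mastered progression and the idea that quizzes are routed by status.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SEEN = "seen"
    LEARNED = "learned"
    MASTERED = "mastered"


# Hypothetical routing table: which quiz type each status gets.
QUIZ_BY_STATUS = {
    Status.SEEN: "multiple_choice",
    Status.LEARNED: "fill_in_the_blank",
    Status.MASTERED: "free_recall",
}

# Progression order: seen -> learned -> mastered (mastered is terminal).
_NEXT = {Status.SEEN: Status.LEARNED, Status.LEARNED: Status.MASTERED}


@dataclass
class Engine:
    """Surface-agnostic core: no Telegram or HTTP types appear here."""

    status: dict = field(default_factory=dict)  # (user_id, item) -> Status

    def deliver(self, user_id: str, item: str) -> Status:
        # The first delivery marks the item as seen; later ones keep its status.
        return self.status.setdefault((user_id, item), Status.SEEN)

    def record_correct(self, user_id: str, item: str) -> Status:
        current = self.status.get((user_id, item), Status.SEEN)
        new = _NEXT.get(current, Status.MASTERED)
        self.status[(user_id, item)] = new
        return new

    def next_quiz(self, user_id: str, item: str) -> str:
        return QUIZ_BY_STATUS[self.status.get((user_id, item), Status.SEEN)]


def telegram_shell(engine: Engine, user_id: str, item: str) -> str:
    # Shell 1: formats the engine's answer as a chat message.
    return f"[telegram] quiz={engine.next_quiz(user_id, item)} item={item}"


def web_shell(engine: Engine, user_id: str, item: str) -> dict:
    # Shell 2: formats the same answer as a JSON-like payload.
    return {"surface": "web", "quiz": engine.next_quiz(user_id, item), "item": item}
```

Both shells ask the same engine the same question and differ only in how they present the answer, which is why adding a surface does not touch the learning logic.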

Rule one: do not break the bot that works

The Telegram bot is stable. The web app is experimental. Those should not share fate.

The split is one flag: KNOWLEDGE_HUB_APP_ROLE=web. It boots a dedicated web runtime with the background scheduler off. Same codebase, same database, isolated processes. I can iterate on the web app without risking the webhook that serves Telegram.
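A rough sketch of how a role flag like this can gate what a process starts. Only the variable name `KNOWLEDGE_HUB_APP_ROLE`, the value `web`, and "scheduler off in the web runtime" come from the post. The default role name `bot`, the function names, and the config keys are assumptions made for illustration.

```python
import os


def resolve_role(env=None) -> str:
    """Read the runtime role from the environment (defaults are hypothetical)."""
    env = os.environ if env is None else env
    role = env.get("KNOWLEDGE_HUB_APP_ROLE", "bot").strip().lower()
    if role not in {"bot", "web"}:
        raise ValueError(f"unknown KNOWLEDGE_HUB_APP_ROLE: {role!r}")
    return role


def runtime_config(role: str) -> dict:
    """Decide which components this process is allowed to start."""
    if role == "web":
        # Web runtime: serve the browser app, never start the background scheduler.
        return {"serve_web": True, "telegram_webhook": False, "scheduler": False}
    # Assumed default: the stable Telegram runtime owns the webhook and the scheduler.
    return {"serve_web": False, "telegram_webhook": True, "scheduler": True}
```

Because the role is resolved once at startup, a crash or a bad deploy in the web process cannot stop the scheduler or the webhook running in the other process.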

The web app: low-tech on purpose

The browser front end is plain ES modules: auth.js, quiz.js, timeline.js, voice.js, words.js. No framework, no bundler, no build step.

That is deliberate. The goal was to test the timeline concept quickly, with JavaScript the browser can run directly. The timeline is the main navigation: what you learned, what happened today, and what comes next. Auth is also web-native: email code, signed-cookie sessions, separate from Telegram but tied back to the same user model.
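For readers unfamiliar with signed-cookie sessions, a minimal sketch of the idea using only the Python standard library. This is a generic HMAC-signed cookie, not the project's implementation: the payload fields (`user_id`, `exp`), the cookie layout, and the function names are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def sign_session(payload: dict, secret: bytes) -> str:
    """Serialize the payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_session(cookie: str, secret: bytes, now=None):
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        body, sig = cookie.rsplit(".", 1)
    except ValueError:
        return None  # no separator: not a cookie we issued
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, done on bytes so odd input cannot raise.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body.encode()))
    now = time.time() if now is None else now
    if payload.get("exp", 0) < now:
        return None  # expired, or no expiry set at all
    return payload
```

The server keeps no session table in this scheme: the cookie carries the user reference, and the signature proves the server issued it. After a successful email-code check, the `user_id` placed in the payload would be the same one the Telegram side uses.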

The new sense: real-time voice

The web app can talk.

A LiveKit agent runs a Google Gemini real-time native-audio model, so you can roleplay a German scenario out loud. The spoken scenario comes from the same prompt definition used by the text conversation. One "order a coffee in a café" scenario, two modes: typed in Telegram, spoken in the browser.
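A sketch of what "one scenario, two modes" can look like: a single definition feeds both the text prompt and the voice prompt, and only a short mode-specific suffix differs. The `Scenario` fields, the prompt wording, and the function names are all hypothetical; the post only states that both modes share one prompt definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    key: str
    language: str
    setting: str
    learner_role: str
    partner_role: str


# Hypothetical definition of the café scenario mentioned above.
CAFE = Scenario(
    key="order_coffee",
    language="German",
    setting="a café",
    learner_role="customer",
    partner_role="barista",
)


def base_prompt(s: Scenario) -> str:
    # The shared part: written once, used by every mode.
    return (
        f"Roleplay in {s.language}. Setting: {s.setting}. "
        f"You play the {s.partner_role}; the learner plays the {s.learner_role}."
    )


def text_prompt(s: Scenario) -> str:
    return base_prompt(s) + " Reply in short written messages."


def voice_prompt(s: Scenario) -> str:
    return base_prompt(s) + " Speak naturally and keep your turns brief."
```

Adding a new scenario then means adding one `Scenario` value, and both the typed and the spoken version pick it up.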

The voice-token route is kept small and separate because that is the security-sensitive part.

Where the cheap-and-cached story finally runs out

Part 1 leaned hard on caching: generate once, reuse forever, default to a cheap model. Voice is different.

You cannot cache a live conversation. Real-time audio is billed per session, per minute. So voice has to be treated as a metered feature, used where speaking practice is worth the cost. Caching makes text cheap; voice is where the bill becomes a product decision.
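One way to treat voice as a metered feature is to track spend against a budget before a session starts. The sketch below is an assumption throughout: the rate, the budget, the round-up-to-whole-minutes rule, and the `VoiceMeter` class are illustrative and do not reflect any provider's actual pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class VoiceMeter:
    rate_cents_per_minute: int   # hypothetical price, in integer cents
    monthly_budget_cents: int    # hypothetical cap
    spent_cents: int = 0

    def cost(self, seconds: float) -> int:
        # Assumed billing rule: every started minute is charged in full.
        return math.ceil(seconds / 60) * self.rate_cents_per_minute

    def can_start(self, planned_minutes: int) -> bool:
        # Refuse a session whose planned length would exceed the budget.
        planned = planned_minutes * self.rate_cents_per_minute
        return self.spent_cents + planned <= self.monthly_budget_cents

    def record(self, seconds: float) -> int:
        charged = self.cost(seconds)
        self.spent_cents += charged
        return charged
```

With a rate of 10 cents per minute and a 100-cent budget, a 90-second session is charged as two minutes (20 cents), which leaves room for eight more minutes but not nine. Nothing like this is needed for cached text, where repeat requests cost nothing.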

Honest about the seams

The timeline is still a prototype. It is assembled from topic profiles, item status, and daily stats, not a real event log. A proper learning_events table, persisted web quiz and voice sessions, and review forecasting are planned, not shipped.
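To show what "assembled, not logged" means, here is a sketch that builds timeline entries from three snapshot sources at read time. The record shapes (`started`, `updated`, `reviewed`, and so on) and the function name are hypothetical; the post only names the three sources.

```python
from datetime import date


def build_timeline(topic_profiles, item_status, daily_stats):
    """Derive timeline entries from current state rather than stored events."""
    entries = []
    for p in topic_profiles:
        entries.append(
            {"day": p["started"], "kind": "topic", "text": f"Started topic {p['name']}"}
        )
    for s in item_status:
        # Only the latest status is known; earlier transitions are not recorded.
        entries.append(
            {"day": s["updated"], "kind": "item", "text": f"{s['item']} is now {s['status']}"}
        )
    for d in daily_stats:
        entries.append(
            {"day": d["day"], "kind": "stats", "text": f"{d['reviewed']} items reviewed"}
        )
    # Sort by day, then by kind so same-day entries have a stable order.
    return sorted(entries, key=lambda e: (e["day"], e["kind"]))


timeline = build_timeline(
    topic_profiles=[{"name": "Café", "started": date(2024, 1, 1)}],
    item_status=[{"item": "der Kaffee", "status": "learned", "updated": date(2024, 1, 3)}],
    daily_stats=[{"day": date(2024, 1, 2), "reviewed": 5}],
)
```

The limitation is visible in the item loop: a word that went from seen to learned to mastered shows up once, with its latest status. An event log would keep each transition as its own entry.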

That is why the web surface is allowed to be messy while Telegram stays stable.

The reusable pattern

Once your core stands alone, new surfaces are additive: share the core, vary the shell, and isolate the fast-moving surface so it cannot take down the stable one.

Keep the engine separate from the surface.

Use runtime roles so stable and experimental apps do not share fate.

Use plain JavaScript when iteration speed and durability matter more than tooling.

Reuse domain definitions across text and voice.

Know where caching stops and metering starts.

Where it leaves the project

The arc is simple: a Telegram learning loop made cheap by caching, then a web surface for real-time voice and timeline experiments. The point was never just German or Telegram. It was the bet that if the engine is right, the next subject, surface, or modality should be additive.

The project facts, status, and stack live in the Knowledge Hub lab note.