The build
Multi-agent live video room, end to end on Runway — and a debrief pipeline that keeps talking after you leave the room.
How it works
Most products that touch Runway use one endpoint. Conjure composes the catalog — Custom Avatars, Realtime Sessions, Avatar Videos, and three text-to-video models — into a single room that starts with a description and ends with a film.
Who's in the room?
Name the people. Drop a photo or paste their LinkedIn — Conjure extracts their behavior and the question they keep asking under pressure.
Claude · web_search
Three faces, sixty seconds.
A photo becomes a Runway Custom Avatar. A behavioral note becomes a personality prompt. Three avatars are forged in parallel and seated in the room.
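The parallel forge step could be sketched as below. This is a minimal illustration, not Conjure's code: `createAvatar` is a hypothetical injected function standing in for whatever call creates a Runway Custom Avatar, and `PersonaInput`, `ForgedAvatar`, `toPersonalityPrompt`, and `forgeRoom` are names invented for the example.

```typescript
// Hypothetical shapes; none of these names come from the Runway SDK.
interface PersonaInput {
  name: string;
  photoUrl: string;
  behavioralNote: string;
}

interface ForgedAvatar {
  name: string;
  avatarId: string;
  personalityPrompt: string;
}

// Injected so the sketch stays independent of any real client library.
type CreateAvatar = (photoUrl: string) => Promise<string>;

// Turn a behavioral note into a personality prompt (illustrative wording).
function toPersonalityPrompt(p: PersonaInput): string {
  return `You are ${p.name}. ${p.behavioralNote}`;
}

// Forge every persona concurrently. Promise.all preserves input order
// and rejects as soon as any single avatar creation fails.
async function forgeRoom(
  personas: PersonaInput[],
  createAvatar: CreateAvatar,
): Promise<ForgedAvatar[]> {
  return Promise.all(
    personas.map(async (p) => ({
      name: p.name,
      avatarId: await createAvatar(p.photoUrl),
      personalityPrompt: toPersonalityPrompt(p),
    })),
  );
}
```

Because the three creations run concurrently, the wall-clock cost is roughly that of the slowest avatar rather than the sum of all three.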
Runway Custom Avatars
Three avatars. One coordinator.
Three live Realtime Sessions in one browser. A server-side coordinator routes your microphone to whoever should hear you, and relays cross-character context via tool calls.
Realtime Sessions · LiveKit · Deepgram
The room keeps talking.
After the room: a narrated debrief, a single cinematic shot of the most charged moment, and a postmortem in which the personas talk about you while you’re not in the room.
Avatar Videos · Seedance 2 · Sonnet
The hard part
Three avatars in a 2×2 grid is the easy part. Making them feel like a room — taking turns, knowing what the others said, reacting to each other — is the work. Conjure’s coordinator is the difference between three chatbots in a video call and a meeting you walked into.
Mic routing for turn-taking
Your microphone is attached to one avatar at a time via WebRTC track manipulation. Only that avatar can hear you. That single input gate produces natural turn-taking instead of a chorus.
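One way to build that input gate is sketched below, assuming each avatar session exposes an audio sender with the same `replaceTrack(track | null)` shape as the browser's `RTCRtpSender`. `MicRouter` and `AudioSender` are illustrative names, and the actual Conjure coordinator may route audio differently.

```typescript
// Minimal structural stand-in for RTCRtpSender: the browser API exposes
// replaceTrack(track | null), and passing null stops sending audio.
interface AudioSender<T> {
  replaceTrack(track: T | null): Promise<void>;
}

// Routes one microphone track to exactly one avatar session at a time.
class MicRouter<T> {
  private active: string | null = null;
  private readonly mic: T;
  private readonly senders: Map<string, AudioSender<T>>;

  constructor(mic: T, senders: Map<string, AudioSender<T>>) {
    this.mic = mic;
    this.senders = senders;
  }

  get activeAvatar(): string | null {
    return this.active;
  }

  // Detach the mic from the current listener, then attach it to the next.
  async routeTo(avatarId: string): Promise<void> {
    const next = this.senders.get(avatarId);
    if (!next) throw new Error(`unknown avatar: ${avatarId}`);
    if (this.active === avatarId) return; // already listening

    if (this.active !== null) {
      const current = this.senders.get(this.active);
      if (current) await current.replaceTrack(null);
    }
    await next.replaceTrack(this.mic);
    this.active = avatarId;
  }
}
```

In a browser, `T` would be `MediaStreamTrack` and each sender would come from that session's peer connection. Detaching before attaching means no two avatars ever hear the same words.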
Cross-character awareness via tool calls
Each avatar’s personality includes an instruction to call check_room_state before responding. The server returns what the other avatars have said since this one last spoke — plus a behavioral nudge — so each character reacts to the room, not just to you.
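A server-side handler for that tool call might look like the sketch below. Only the tool name `check_room_state` comes from the description above; the `RoomCoordinator` class, the reply shape, and the nudge wording are assumptions made for illustration.

```typescript
interface Utterance {
  speaker: string;
  text: string;
  seq: number; // monotonically increasing order of speech
}

interface RoomStateReply {
  sinceYouLastSpoke: Utterance[];
  nudge: string;
}

class RoomCoordinator {
  private log: Utterance[] = [];
  private lastSpokeSeq = new Map<string, number>();
  private nextSeq = 1;

  // Called whenever an avatar finishes a line.
  recordUtterance(speaker: string, text: string): void {
    const seq = this.nextSeq++;
    this.log.push({ speaker, text, seq });
    this.lastSpokeSeq.set(speaker, seq);
  }

  // Handler for the check_room_state tool call: returns what the other
  // avatars have said since this one last spoke, plus a behavioral nudge.
  checkRoomState(avatarId: string): RoomStateReply {
    const since = this.lastSpokeSeq.get(avatarId) ?? 0;
    const missed = this.log.filter(
      (u) => u.seq > since && u.speaker !== avatarId,
    );
    const last = missed[missed.length - 1];
    const nudge = last
      ? `React to ${last.speaker} before answering.`
      : "Nobody else has spoken; answer the user directly.";
    return { sinceYouLastSpoke: missed, nudge };
  }
}
```

An avatar that has never spoken receives the whole log, which gives a late joiner the full context of the room.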
Live video, not text
Three concurrent WebRTC sessions render in your browser at once. No Zoom bot, no screen recording — the room is the page.
After the room
One pipeline, three different jobs. Each uses a different Runway primitive — and that’s deliberate.
Debrief. The lead persona reads back the room in their own voice: what landed, what to sharpen, what to bring next time.
Cinema. One Claude pass distills the conversation into a cinematographer's shot description. Seedance 2 on Runway renders it as an eight-second shot with ambient sound.
Postmortem. A 60–90 second debrief between the three personas, recorded as you watch. By design, they disagree about you on at least one specific moment.
The stack
Runway
Anthropic
Voice & video
Application
The constraints
Naming what was hard is more credible than pretending nothing was. Each of these is a real Runway / Anthropic API constraint we built around — not a bug list.
Runway Realtime Sessions hard-cap at 5 minutes. Conjure frames it as a feature: focused conversation, not open-ended chat. Every demo is built to land under 90 seconds of conversation.
Tier 2 of the Runway API caps you at three concurrent video sessions. Three is the right number for a room: enough to feel populated, not so many that the user gets lost. The constraint shaped the product.
Runway disables both webcam input and tool calling when a session uses a custom voice. Conjure depends on both, so voice cloning is deferred to v0.2, until either the API permits the combination or we architect around it.
To give each persona a distinct in-character voice, the debrief and postmortem pipelines synthesize speech via Deepgram Aura 2 and pass the audio to Runway's avatar-video endpoint. Native per-persona Runway voices land when the constraint above is resolved.
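The two-stage flow described above can be sketched with both vendor calls injected as functions. `synthesizeSpeech` and `renderAvatarVideo` are hypothetical wrappers, not real Deepgram or Runway signatures, and `ScriptLine`, `Clip`, and `renderSpokenClips` are names chosen for this example.

```typescript
interface ScriptLine {
  persona: string;
  avatarId: string;
  voice: string; // per-persona voice identifier (assumed)
  text: string;
}

interface Clip {
  persona: string;
  videoUrl: string;
}

// Hypothetical wrappers around the text-to-speech and avatar-video calls.
type SynthesizeSpeech = (text: string, voice: string) => Promise<Uint8Array>;
type RenderAvatarVideo = (avatarId: string, audio: Uint8Array) => Promise<string>;

// For each scripted line: synthesize the persona's audio first, then hand
// that audio to the avatar-video step. Lines run sequentially so the
// resulting clips stay in script order.
async function renderSpokenClips(
  script: ScriptLine[],
  synthesizeSpeech: SynthesizeSpeech,
  renderAvatarVideo: RenderAvatarVideo,
): Promise<Clip[]> {
  const clips: Clip[] = [];
  for (const line of script) {
    const audio = await synthesizeSpeech(line.text, line.voice);
    const videoUrl = await renderAvatarVideo(line.avatarId, audio);
    clips.push({ persona: line.persona, videoUrl });
  }
  return clips;
}
```

Keeping the voice step separate from the video step is what would let a native per-persona voice replace the synthesis call later without touching the rest of the pipeline.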
Avatars can hear only the user, not each other. Cross-character context flows through a server-side coordinator that returns a summary via the check_room_state tool call. The realistic per-turn budget is 1.0–1.5 seconds, masked by deliberate visual 'thinking' beats on the listening tile.
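Masking that delay can be as simple as flagging the tile while the coordinator call is in flight. The sketch below assumes a hypothetical `Tile` object with a `thinking` flag that the UI renders; `withThinkingBeat` is an illustrative helper, not part of any SDK.

```typescript
// Hypothetical UI handle: the page renders a "thinking" beat while true.
interface Tile {
  thinking: boolean;
}

// Show the thinking beat for as long as the coordinator call is pending,
// and report how long the call took so it can be checked against budget.
async function withThinkingBeat<T>(
  tile: Tile,
  work: () => Promise<T>,
): Promise<{ result: T; elapsedMs: number }> {
  const started = Date.now();
  tile.thinking = true;
  try {
    const result = await work();
    return { result, elapsedMs: Date.now() - started };
  } finally {
    // Always clear the beat, even if the coordinator call fails.
    tile.thinking = false;
  }
}
```

The returned `elapsedMs` makes it easy to log turns that exceed the 1.0–1.5 second budget.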
The Circle stores personas in localStorage for the current browser. Tier 3 — persistent, refining personas across rooms — lands in v1.0; v0.1 keeps the scope tight.
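A browser-scoped persona library along these lines might look like the following sketch. The storage key, the `Persona` shape, and the function names are assumptions; the only real API relied on is the `getItem`/`setItem` pair that `localStorage` provides.

```typescript
// Subset of the browser Storage interface that localStorage satisfies.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Persona {
  name: string;
  behavioralNote: string;
}

const STORAGE_KEY = "conjure.personas"; // hypothetical key

// Read the saved personas, tolerating a missing or corrupted entry.
function loadPersonas(store: KeyValueStore): Persona[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as Persona[]) : [];
  } catch {
    return []; // corrupted entry: start with an empty library
  }
}

// Insert or replace a persona by name, then persist the whole list.
function savePersona(store: KeyValueStore, persona: Persona): Persona[] {
  const others = loadPersonas(store).filter((p) => p.name !== persona.name);
  const next = [...others, persona];
  store.setItem(STORAGE_KEY, JSON.stringify(next));
  return next;
}
```

In the browser the call would be `savePersona(window.localStorage, persona)`. Passing the store in as a parameter keeps the logic testable and leaves room to swap in a server-backed store later.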
The road
Multi-agent live room, three artifacts post-session (debrief, cinema, postmortem), persona library scoped to localStorage.
Once Runway resolves the voice + webcam + tool-calling constraint, custom voices unlock. Real-photo persona uploads arrive with an explicit consent flow.
The room knows about the meeting on your calendar tomorrow. iOS app for last-minute prep on the way to the room.
Tier 3 fidelity: the people you regularly walk into rooms with are saved, refined, and improve across sessions. The product compounds.