Integrating a real, keyed greenscreen actor into an AI-generated Jefferson Memorial background whose camera move matches the live plate — what was tried, what won, what died, what runs next.
The pipeline works end-to-end. Keying is broadcast-clean, the locked-shot composite is proven, and the camera problem is solved two ways: pushed to Seedance's interpretive ceiling for look, and hard-locked by construction with local depth-conditioned generation for accuracy. The central law learned: Seedance interprets a camera; it never hard-locks one.
| Stage | State | Evidence |
|---|---|---|
| Keying (CorridorKey) | solved | All 3 plates keyed clean; true-alpha ProRes 4444 delivered (jib) |
| Locked-shot composite | proven | Statue-reframe comp v1 — §08 |
| Camera match — look-first | model ceiling | Seedance start+end keyframe clamp — §05 |
| Camera match — accuracy-first | proven | LTX depth render-to-real, trajectory locked by construction — §06 |
| Camera-true BG plates — all 3 shots | solved | Previz-steered generation (MegaSaM → Blender → keyframes → Seedance) — §07 |
| Dolly / jib moving comps | interim | Dolly comp over the previz-steered BG done; jib/locked comps + grade match open — §08 |
| Relight / harmonization | pending | Build-our-own stack scaffolded, not installed — §10 |
Five workflows were tested (or designed) this cycle. W1 is the production path, and W5 is now the production BG generator feeding it; everything else either feeds them or was eliminated along the way. In the workflow diagram, arrows show data direction; every generated background ultimately meets the real keyed subject in W1's composite.
| Strategy | Camera accuracy | Look | Use when |
|---|---|---|---|
| A · Seedance + keyframes | endpoint-clamped, mid-path drifts | 10/10 | Hero look, forgiving moves |
| B · Seedance empty-BG preset | interpretive (preset vocabulary) | 10/10 | Key+composite shots, art-directed plates |
| C · LTX depth render-to-real | exact, by construction | photoreal, below Seedance | Accuracy-critical camera, local, free |
One performer, three camera behaviors: a locked-off medium, a push-in dolly, and a crane-down jib. Together they cover the camera-matching problem space from trivial to hard.
CorridorKey on the 5090 (BiRefNet hint → neural key + despill → sharp hybrid composite) outputs the subject on flat gray-148 for clean matte extraction. Naive chroma keying failed outright: the warm, dark-olive "green" backing had r≈g, which left almost no green dominance to key on. A distance-ramp matte off the gray, plus interior hole-fill for near-gray dress folds, yields the delivered true-alpha ProRes 4444.
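The following minimal NumPy sketch shows two things: why r≈g defeats a difference key, and how a distance-ramp matte off gray-148 with interior hole-fill can be built. Several details are illustrative assumptions, not CorridorKey's actual internals:

- the green-dominance formula
- the `lo`/`hi` ramp thresholds
- the 0.5 solidity cut
- the border flood fill

```python
from collections import deque

import numpy as np

GRAY = np.array([148.0, 148.0, 148.0])  # flat gray-148 backing from the keyer output


def green_dominance(rgb):
    """Classic difference key: how far green exceeds the larger of red and blue."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb[..., 1] - np.maximum(rgb[..., 0], rgb[..., 2])


def distance_ramp_alpha(rgb, lo=12.0, hi=40.0):
    """Alpha ramps 0 -> 1 as RGB distance from gray-148 goes lo -> hi (assumed thresholds)."""
    dist = np.linalg.norm(np.asarray(rgb, dtype=np.float64) - GRAY, axis=-1)
    return np.clip((dist - lo) / (hi - lo), 0.0, 1.0)


def fill_interior_holes(alpha, solid=0.5):
    """Set alpha to 1 for background-looking pixels the frame border cannot reach.

    Near-gray dress folds read as 'background' to the ramp. Any such region that a
    4-connected flood fill from the border never touches is enclosed by the subject,
    so it is made opaque.
    """
    h, w = alpha.shape
    bg = alpha < solid
    reached = np.zeros_like(bg)
    queue = deque()
    border = [(y, x) for y in range(h) for x in (0, w - 1)]
    border += [(y, x) for x in range(w) for y in (0, h - 1)]
    for y, x in border:
        if bg[y, x] and not reached[y, x]:
            reached[y, x] = True
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and bg[ny, nx] and not reached[ny, nx]:
                reached[ny, nx] = True
                queue.append((ny, nx))
    out = alpha.copy()
    out[bg & ~reached] = 1.0  # enclosed "holes" become solid subject
    return out
```

Take a hypothetical olive sample of (90, 95, 60). There green beats red by only 5 levels, while a true chroma green such as (40, 200, 60) scores 140. That gap is the reason to key off a flat gray instead.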
Delivered: 03-keyed/jeff_jib_key_ALPHA.mov (ProRes 4444, 4K, 246 MB, local).

Keyer validation suite: two representative cases shown source-beside-result (keys composited on a gradient specifically to expose edge artifacts), plus one deliberate edge case.
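The gradient backdrop in the validation suite is a straight-alpha "over" composite onto a ramp. The sketch below assumes float RGB in [0, 1] and straight (not premultiplied) alpha. The helper names and fringe thresholds are illustrative.

```python
import numpy as np


def gradient_backdrop(h, w):
    """Horizontal black-to-white ramp, shape (h, w, 3), values in [0, 1]."""
    ramp = np.linspace(0.0, 1.0, w)
    return np.broadcast_to(ramp[None, :, None], (h, w, 3)).copy()


def over(fg, alpha, bg):
    """Straight-alpha over: out = fg * a + bg * (1 - a)."""
    a = alpha[..., None]
    return fg * a + bg * (1.0 - a)


def fringe_mask(alpha, lo=0.02, hi=0.98):
    """Semi-transparent pixels, where spill and edge artifacts live (assumed thresholds)."""
    return (alpha > lo) & (alpha < hi)
```

A dark fringe shows against the bright end of the ramp, and a light halo shows against the dark end. One backdrop therefore exposes both failure modes.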
The law from four early iterations still holds (Seedance interprets a camera; it never hard-locks one), but the ceiling moved. What closed the gap was feeding the solved-camera previz as a video_references motion track, rather than describing the move or clamping endpoints alone. That recipe graduated into W5 (§07). Current evidence, all against the MegaSaM-solved cameras: routeB_seedance*.mp4, station_kf/_compare/.

The other side of the deciding axis is structure as conditioning. Two engines were tested: one died on look, and one is the proven accuracy path. (Naming note: the repo's research docs use "Route A" for Uni3C reference-copy, which is untested to date; the Wan test below was run as a hard-lock engine trial.)


Seedance's pushIn preset renders an optical zoom (uniform magnification, no parallax). Its superDollyIn preset, and the LTX depth path by construction, produce true forward translation: foreground columns slide past the frame edges at a different rate than the back wall. Always inspect start/mid/end frames for parallax before accepting a "dolly."

The winning recipe was proven end-to-end on all three shots on Jul 4. The real camera is tracked, not described. MegaSaM (DROID-SLAM + monocular depth, on the 5090) solves every frame of each greenscreen plate (position, rotation, and focal); Blender's tracker failed outright on the flat green cloth. The solves import into one Blender scene as keyframed cameras under registration empties (focal-derived endpoints → look/travel alignment → de-roll). They are staged against the art-directed rotunda blockout, with a Gaussian-smoothed twin baked for shaky solves. They are then rendered as clay previz: stills for keyframe generation, and full-length move videos as the motion reference. Canonical write-up: WORKFLOW-BG-GENERATION.md.
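The NumPy sketch below covers two checks from the paragraphs above:

- **Gaussian smoothing** of a solved camera's per-frame positions. This is one way a smoothed twin could be produced; the actual Blender bake may differ.
- **A zoom-vs-dolly test.** It compares how much tracked foreground points and back-wall points magnify between the start and end frames.

The dolly test assumes a push roughly along the optical axis, with no pan, and tracked points away from frame center. `sigma`, `tol`, and all function names are hypothetical.

```python
import numpy as np


def gaussian_smooth_path(positions, sigma=3.0):
    """Smooth an (N, 3) per-frame camera position track with a Gaussian of `sigma` frames.

    Ends are padded by repeating the first/last sample, which keeps the path anchored
    near the solved endpoints. Rotations are NOT handled: smooth unwrapped angles or
    quaternions separately to avoid 360-degree wraparound jumps.
    """
    positions = np.asarray(positions, dtype=np.float64)
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(positions, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(positions.shape[1])],
        axis=1,
    )


def radial_scale(start_pts, end_pts, center):
    """Median ratio of distance-from-center (end / start) for the same tracked 2D points."""
    s = np.linalg.norm(np.asarray(start_pts, dtype=np.float64) - center, axis=1)
    e = np.linalg.norm(np.asarray(end_pts, dtype=np.float64) - center, axis=1)
    return float(np.median(e / s))


def classify_push(fg_start, fg_end, bg_start, bg_end, center, tol=0.05):
    """'zoom' if foreground and back wall magnify equally (no parallax), else 'dolly'."""
    ratio = radial_scale(fg_start, fg_end, center) / radial_scale(bg_start, bg_end, center)
    return "zoom" if abs(ratio - 1.0) < tol else "dolly"
```

Consider a camera moving 1 unit toward a column 5 units away and a wall 20 units away. The column magnifies by 5/4 = 1.25, but the wall only by 20/19 ≈ 1.05. An optical zoom scales both by the same factor, so the ratio stays at 1.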
1 · Tracking → Blender. Solved cameras ghosted over their plates — the registration check that earned the "spot on" sign-off before anything was generated:
Dolly: a Gaussian-smoothed twin (DollyCamSmooth) is baked alongside the raw solve.

2 · Previz move renders. Staged scene, subject marks hidden, one clay move video per shot. These are fed to the video model verbatim as video_references:
Locked: smoothed twin (LockedCamSmooth).

3 · Previz stills → generated keyframes. Each clay previz frame (left) sits beside the photoreal keyframe generated from it (right). Two models, two jobs:

- GPT-Image-2 (Higgsfield, 2K/high) marries the previz composition with a REAL photograph's materials. It is the only model tested that swaps the clay placeholder for the real Jefferson statue in one pass.
- Nano Banana (imagen-nano-banana-2) derives every other view from that master frame, keeping all shots in one coherent generated rotunda:









4 · Elements per shot. Exactly what went into each Seedance 2.0 run (Higgsfield, 720p/5s; job IDs from prior generations work directly as media references):
| Shot | start_image | end_image | video_references | Result |
|---|---|---|---|---|
| Dolly | GPT-Image-2 start | NB end | Dolly previz move (smoothed) | plate adopted (operator's own run of the recipe) |
| Jib | NB angle-match v2d | — none — | Jib previz move (raw) | previz alone held the path; end frame unnecessary |
| Locked | GPT-Image-2 start | — none — | Locked previz move (smoothed) | carries the plate's tripod micro-motion |
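
Keeping the per-shot elements as data, rather than in run notes, lets the recipe's invariant be checked before a generation is spent. The sketch below is a hypothetical manifest holding the table's three rows; the field names are ours, not Higgsfield's API. The check encodes what all three rows share: a start frame and a previz motion track, with the end frame optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShotRun:
    """One Seedance 2.0 run from the elements table (hypothetical field names)."""
    shot: str
    start_image: str
    video_reference: str
    end_image: Optional[str] = None
    resolution: str = "720p"
    duration_s: int = 5


def check_run(run: ShotRun) -> List[str]:
    """List recipe violations: a start frame and a previz motion track are both required."""
    problems = []
    if not run.start_image:
        problems.append(f"{run.shot}: missing start_image")
    if not run.video_reference:
        problems.append(f"{run.shot}: missing previz video_reference")
    return problems


RUNS = [
    ShotRun("dolly", "GPT-Image-2 start", "dolly previz move (smoothed)", end_image="NB end"),
    ShotRun("jib", "NB angle-match v2d", "jib previz move (raw)"),
    ShotRun("locked", "GPT-Image-2 start", "locked previz move (smoothed)"),
]
```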
5 · The prompts. Verbatim production prompts — the bolded clauses are load-bearing (removing any one reproduced a documented failure):
Generate a photorealistic image of the Thomas Jefferson Memorial interior. The FIRST reference image is a gray clay 3D previz frame — it defines the EXACT composition to reproduce: a LOW-ANGLE camera looking slightly up, the bronze statue on its pedestal at frame-right against the coffered dome, columns placed exactly as shown, floor plane low in frame. Match this composition precisely — do not recenter or reframe. The SECOND reference image is a real photograph of the actual Jefferson Memorial interior — use it as the source of truth for everything visual: the real white Georgia marble and its veining, the real engraved inscription text panels with laurel wreaths on the walls, the real dark weathered bronze Jefferson statue (replace the clay placeholder figure with the real statue's actual sculpted form), the real coffered dome detail, the polished floor reflections, and its soft natural daylight. Empty interior, no people.
IMAGE 1 is a real photograph of the Thomas Jefferson Memorial interior — the MASTER SCENE: this exact bronze statue, white marble, inscription panels, soft even neutral daylight. IMAGE 2 is a gray clay 3D previz frame that defines the CAMERA for a second photo of the exact same scene — and its framing is COMPLETELY DIFFERENT from image 1: the camera is down near the FLOOR, tilted strongly UPWARD. Reproduce IMAGE 2's framing exactly: the coffered dome ceiling fills the entire TOP HALF of the frame, the statue on its pedestal stands at the RIGHT THIRD of frame seen from below against the dome, columns lean inward with strong upward perspective convergence, and only a small strip of floor shows at the very bottom. Do NOT reuse image 1's eye-level framing. Every material and lighting property still comes from IMAGE 1 unchanged: same neutral white-balanced daylight (not warm, not golden, not moody), same marble, same dark weathered bronze, same grade — two photos minutes apart in the same session. Empty interior, no people. Photorealistic.
Slow cinematic crane shot inside a neoclassical marble rotunda. The image reference is the exact opening frame: a low camera position looking up, coffered dome filling the top of frame, memorial sculpture at the right third. The video reference is a gray 3D architectural previsualization showing the exact camera path to follow: a smooth motorized crane move — the camera starts low and rises steadily while tilting down, moving closer, settling on a tighter framing near the sculpture's stone base. Constant speed, no handheld sway, no walking rhythm, no speed ramps. Follow the previsualization's framing trajectory exactly. Keep the marble architecture, engraved wall text and soft neutral daylight from the opening frame consistent for the whole shot. Empty interior, documentary style.
A static tripod shot inside a neoclassical marble rotunda. The image reference is the exact opening frame — hold this exact composition for the whole shot. The video reference is a gray 3D architectural previsualization of the EXACT camera behavior to reproduce: a locked-off tripod camera with only the faintest natural micro-movement — follow it exactly. NO push, NO drift, NO pan, NO tilt, NO zoom, NO handheld sway beyond what the previsualization shows. The interior is empty and still; soft neutral daylight through the colonnade. Keep the marble architecture, engraved wall text, sculpture and lighting from the opening frame perfectly consistent from first frame to last. Documentary style, photorealistic.
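Dropping a load-bearing clause reproduced a documented failure, so a cheap guard is to lint prompts for required clauses before submitting. The sketch below assumes that approach. The bolding did not survive in this copy, so the clauses listed for the locked shot are illustrative picks, not the documented bolded set.

```python
import re
from typing import List


def _normalize(text: str) -> str:
    """Collapse whitespace runs so re-wrapping a prompt does not trip the check."""
    return re.sub(r"\s+", " ", text).strip()


def missing_clauses(prompt: str, required: List[str]) -> List[str]:
    """Return each required clause that no longer appears verbatim (case-sensitive)."""
    body = _normalize(prompt)
    return [clause for clause in required if _normalize(clause) not in body]


# Hypothetical required set, picked from the locked-shot prompt for illustration.
LOCKED_REQUIRED = [
    "hold this exact composition for the whole shot",
    "NO push, NO drift, NO pan, NO tilt, NO zoom",
    "follow it exactly",
]
```

The match is case-sensitive on purpose, so the all-caps emphasis ("NO push") is guarded too.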
6 · Generated plates + previz verification. Every generation ships with a 50% previz-ghost overlay — the acceptance test that the model rode the solved camera:
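The 50% ghost is a plain opacity blend of the clay previz over the generated frame. The NumPy sketch below assumes float grayscale or RGB frames of matching size. The edge-overlap score is a hypothetical numeric companion to the visual check, not part of the documented acceptance test. Clay and marble textures differ, so read the score relatively (shot against shot), not as an absolute pass mark.

```python
import numpy as np


def ghost_overlay(generated, previz, opacity=0.5):
    """Blend the previz over the generated frame at `opacity` (0.5 = the 50% ghost)."""
    return (1.0 - opacity) * generated + opacity * previz


def edge_map(gray, thresh=0.1):
    """Binary edges from finite-difference gradient magnitude (assumed threshold)."""
    gy, gx = np.gradient(np.asarray(gray, dtype=np.float64))
    return np.hypot(gx, gy) > thresh


def edge_iou(gen_gray, previz_gray, thresh=0.1):
    """Intersection-over-union of the two edge maps; 1.0 means identical edges."""
    a, b = edge_map(gen_gray, thresh), edge_map(previz_gray, thresh)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0
```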
Real keyed subject over generated backgrounds. The locked shot is proven; the dolly comps exist in two BG flavors; the jib comp is next (its empty Seedance BG already exists — §09).
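One way to lay the true-alpha ProRes 4444 key over a generated BG is ffmpeg's `overlay` filter, which honors the key's alpha channel. The sketch below only builds the argv and makes several assumptions:

- File names are placeholders.
- The key is scaled to the BG raster, since the Seedance runs here are 720p and the key is 4K.
- The output codec (prores_ks profile 3, ProRes 422 HQ) is assumed to suit review comps.

Frame-rate mismatches between plate and BG are not handled.

```python
import shlex


def comp_command(bg_path, key_path, out_path, size="1280:720"):
    """Build (not run) an ffmpeg argv that composites a keyed ProRes 4444 over a BG plate."""
    filter_graph = (
        f"[1:v]scale={size}[fg];"                      # fit the 4K key to the BG raster
        "[0:v][fg]overlay=shortest=1:format=auto[v]"   # alpha-aware over; stop at shorter input
    )
    return [
        "ffmpeg", "-y",
        "-i", bg_path,     # input 0: generated background plate
        "-i", key_path,    # input 1: keyed subject with true alpha
        "-filter_complex", filter_graph,
        "-map", "[v]",
        "-c:v", "prores_ks", "-profile:v", "3",  # ProRes 422 HQ (assumed review codec)
        out_path,
    ]


if __name__ == "__main__":
    print(shlex.join(comp_command("bg.mp4", "key_ALPHA.mov", "comp.mov")))
```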




Feeding Seedance the raw greenscreen video, a reference BG image, and the instruction "replace the background, keep her exactly" produced a repainted stranger on a random stage. This was tested on two platforms (Magnific and Higgsfield manual runs), and the same law held both times.
Decision locked after a Beeble/SwitchLight build-vs-buy analysis: build our own. The stack is IC-Light v1 FBC + DiffusionLight (single-frame HDRI lock, the anti-shimmer trick for our lighting-static BGs) + Marigold, on the 5090. The open stack reaches ≈ 80–90% of SwitchLight on stills; the Beeble API ($0.10/gen) is kept only as the reference bar. Scaffolded at relight/; the ~15 GB install has not been run yet.
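The single-frame HDRI lock amounts to estimating the environment once from the background and reusing it for every subject frame. The sketch below uses hypothetical callables: `estimate_env` stands in for DiffusionLight and `relight` for IC-Light FBC. Neither is those projects' real API.

```python
from typing import Callable, List, Sequence, Any


def relight_sequence(
    subject_frames: Sequence[Any],
    bg_frames: Sequence[Any],
    estimate_env: Callable[[Any], Any],
    relight: Callable[[Any, Any], Any],
    lock_frame: int = 0,
) -> List[Any]:
    """Relight every subject frame against ONE environment estimated from bg_frames[lock_frame]."""
    env = estimate_env(bg_frames[lock_frame])  # single HDRI estimate for the whole shot
    return [relight(subject, env) for subject in subject_frames]
```

Per-frame estimation would let estimator noise vary the lighting from frame to frame, which reads as shimmer. With a lighting-static BG, one estimate stays valid for the whole shot, so locking it costs nothing.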
| Model / method | Camera control | Preserves real footage? | Look | Output sizes | Where it fits |
|---|---|---|---|---|---|
| Seedance 2.0 (keyframes) | endpoint clamp; presets; interprets | no — repaints | 10/10 | 480p·720p·1080p·4K | Pipeline base for BG look |
| Seedance 2.0 (video ref) | copies plate move well | no — repaints subject | 10/10 | 480p·720p·1080p·4K | BG-only gen; Obscura candidate |
| LTX-2 depth IC-LoRA (local) | exact — structure conditioned | n/a (generates BG) | photoreal, sub-Seedance | VRAM-bound · ~720p on 5090 | Accuracy-critical BG |
| Wan2.2 Fun Control (local) | exact — depth conditioned | n/a | 3/10 local | 480p·720p | Rejected; possible guide layer |
| Uni3C (Wan2.2 cloud) | geometry-aware ref copy | no — re-renders | 6–7 cloud | host-dependent | Repo "Route A" — untested fusion ingredient |
| Kling 3.0 / Omni | endpoint clamp — tightest of the 4-way | no — regenerates | high | 720p·1080p·4K (video-ref ≤1080p) | Shootout run Jul 3 — clamp honored; Omni video-ref re-anchored the move onto the statue |
| Runway Gen-4.5 | start frame only — no end clamp | no — regenerates | high | 720p native (+4K upscale) | Shootout run Jul 3 — drifted off brief without an end keyframe |
| Veo 3.1 | endpoint clamp honored | no — regenerates | high | 720p·1080p·4K | Shootout run Jul 3 — viable clamp alternate |
| LTX-2.3 Obscura Remova | n/a (V2V removal) | yes — reconstructs behind occluder | 1080p cap | 1080p max | Planned: strip AI person from Seedance ref output |
| CorridorKey (5090) | n/a | yes — it IS the footage | broadcast-clean | matches source (4K tested) | The subject track |