R&D Report · Interim

AI Camera Match.

Integrating a real, keyed greenscreen actor into an AI-generated Jefferson Memorial background whose camera move matches the live plate — what was tried, what won, what died, what runs next.

Report · 2026-07-04 v0.10.0 · interim
4 stages solved · 3 interim / pending · 3 dead ends closed · 8 research notes · 62 evidence proxies · 30 MB
01

Executive Summary

The pipeline works end-to-end. Keying is broadcast-clean, the locked-shot composite is proven, and the camera problem is solved two ways: pushed to Seedance's interpretive ceiling for look, and hard-locked by construction with local depth-conditioned generation for accuracy. The central law learned:

Every reference-video model we tested repaints the subject (Seedance 2.0, confirmed independently on two platforms). "Keep the person, replace the background" is only achievable by keying the real subject and compositing, so the pipeline splits: real subject (key) + generated background (camera-matched), married in the comp. Seedance is the current BG recommendation; a SOTA shootout (Kling, Runway, Veo) is planned before locking it.
Stage | State | Evidence
Keying (CorridorKey) | solved | All 3 plates keyed clean; true-alpha ProRes 4444 delivered (jib)
Locked-shot composite | proven | Statue-reframe comp v1 (§08)
Camera match: look-first | model ceiling | Seedance start+end keyframe clamp (§05)
Camera match: accuracy-first | proven | LTX depth render-to-real, trajectory locked by construction (§06)
Camera-true BG plates (all 3 shots) | solved | Previz-steered generation: MegaSaM → Blender → keyframes → Seedance (§07)
Dolly / jib moving comps | interim | Dolly comp over the previz-steered BG done; jib/locked comps + grade match open (§08)
Relight / harmonization | pending | Build-our-own stack scaffolded, not installed (§10)
02

The Pipeline

Five workflows were tested (or designed) this cycle. W1 is the production path and W5 is now the production BG generator feeding it — everything else feeds them or was eliminated getting here. Arrows show data direction; every generated background ultimately meets the real keyed subject in W1's composite.

W1

Key + Composite — the production path

adopted
GREENSCREEN PLATE: 4K source · 3 shots
CORRIDORKEY → ALPHA: neural key · ProRes 4444 · 5090
CAMERA SOLVE: 2D track · push-ramp 1.82×
SEEDANCE BG GEN: A · clamp / B · empty preset
COMPOSITE: real subject over gen BG
RELIGHT: pending
DELIVERY: EXR · AE
Edge label: camera scale
W2

Render-to-Real — camera hard-lock

proven
GREENSCREEN PLATE: dolly · 143f
CAMERA SOLVE: planar → push-ramp 1.82×
BLENDER PREVIZ: clay guide · measured ramp
LTX DEPTH IC-LoRA: exact camera · photoreal · 220s
▸ COMPOSITE: into W1
WAN2.2 FUN CONTROL: camera exact · look 3/10 ✗
W3

Video-Ref Regenerate

dead end
RAW GREENSCREEN VIDEO: reference · real crane move
REFERENCE BG IMAGE: Jefferson plate
PROMPT: replace BG · keep subject
SEEDANCE VIDEO-REF: magnific + Higgsfield runs
REPAINTED SUBJECT ✗: identity lost · staging random
EMPTY BG · REAL CRANE MOVE: usable jib BG plate
▸ COMPOSITE: into W1
W4

Obscura Hybrid

planned
SEEDANCE VIDEO-REF · 4K: accurate move · repainted person
OBSCURA REMOVA: LTX-2.3 V2V removal · 1080p
MASK-BACK INTO 4K: reveal sliver only
COMPOSITE REAL SUBJECT: planned · not yet run
W5

Previz-Steered Generation

production BG path
GREENSCREEN PLATE: 3 shots · 4K
MEGASAM SOLVE: AI track · all frames + focal · 5090
BLENDER SCENE: register · stage · smooth twin
PREVIZ RENDERS: stills + move video · marks hidden
GPT-IMAGE-2 START: previz comp + REAL photo materials
NB ANGLE-MATCH: master scene → previz camera angle
SEEDANCE 2.0: start_image + previz video_references
▸ COMPOSITE: into W1
OVERLAY VERIFY: 50% previz ghost over gen, every shot
Edge labels: stills + move · start frame · end (optional) · motion ref
Legend: solved / adopted · works with caveats · pending / planned · dead end
Strategy | Camera accuracy | Look | Use when
A · Seedance + keyframes | endpoint-clamped, mid-path drifts | 10/10 | Hero look, forgiving moves
B · Seedance empty-BG preset | interpretive (preset vocabulary) | 10/10 | Key+composite shots, art-directed plates
C · LTX depth render-to-real | exact, by construction | photoreal, below Seedance | Accuracy-critical camera, local, free
03

The Three Test Shots

One performer, three camera behaviors: a locked-off medium, a push-in dolly, and a crane-down jib. Together they cover the camera-matching problem space from trivial to hard.

LOCKED · 151f · 2D track confirms static (worst drift 1.56% ≈ handheld micro-motion)
DOLLY · 143f push-in · planar solve (1 marker) → feature-spread ramp measured 1.82×
JIB · 151f crane high→low · planar; move read from footage, no 3D solve
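
The 1.82× figure for the dolly is described as a feature-spread ramp. A minimal sketch of one way such a ratio could be measured, assuming "spread" means the mean distance of tracked 2D features from their centroid. The function names and sample coordinates are illustrative and do not come from the project's tracker.

```python
import math

def feature_spread(points):
    """Mean distance of tracked 2D features from their centroid, in pixels."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)

def push_ratio(first_frame_points, last_frame_points):
    """Apparent magnification between two frames (assumed definition)."""
    return feature_spread(last_frame_points) / feature_spread(first_frame_points)

# Hypothetical tracks: four features on the first frame, then the same
# features magnified 1.82x and shifted (the shift must not affect the ratio).
start = [(-100.0, -50.0), (100.0, -50.0), (100.0, 50.0), (-100.0, 50.0)]
end = [(x * 1.82 + 12.0, y * 1.82 - 7.0) for x, y in start]
ratio = push_ratio(start, end)
```

Measuring around the centroid keeps the ratio insensitive to any pan or reframing that happens alongside the push.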
04

Keying

solved

CorridorKey on the 5090 (BiRefNet hint → neural key + despill → sharp hybrid composite) outputs the subject on flat gray-148 for clean matte extraction. Naive chroma keying failed outright — the warm dark-olive "green" had r≈g. A distance-ramp matte off the gray, plus interior hole-fill for near-gray dress folds, yields the delivered true-alpha ProRes 4444.
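
The distance-ramp matte and interior hole-fill can be sketched as follows. This is an illustration under stated assumptions, not the delivered implementation: the `lo`/`hi` ramp bounds and the 0.5 hole threshold are hypothetical, and the hole-fill simply forces enclosed near-gray pixels to opaque.

```python
GRAY = 148  # flat backing value of the keyer output

def ramp_alpha(pixel, lo=8.0, hi=40.0):
    """Alpha from colour distance to the gray backing.

    0.0 at distance <= lo, 1.0 at distance >= hi, linear in between.
    lo and hi are hypothetical tuning values.
    """
    r, g, b = pixel
    dist = ((r - GRAY) ** 2 + (g - GRAY) ** 2 + (b - GRAY) ** 2) ** 0.5
    return min(1.0, max(0.0, (dist - lo) / (hi - lo)))

def matte(image):
    return [[ramp_alpha(px) for px in row] for row in image]

def fill_interior_holes(alpha, threshold=0.5):
    """Set transparent pixels that are not connected to the border to 1.0."""
    h, w = len(alpha), len(alpha[0])
    outside = [[False] * w for _ in range(h)]
    # Seed a flood fill with every transparent pixel on the image border.
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and alpha[y][x] < threshold]
    while stack:
        y, x = stack.pop()
        if outside[y][x]:
            continue
        outside[y][x] = True
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not outside[ny][nx] \
                    and alpha[ny][nx] < threshold:
                stack.append((ny, nx))
    return [[1.0 if alpha[y][x] < threshold and not outside[y][x] else alpha[y][x]
             for x in range(w)] for y in range(h)]

# Toy 5x5 frame: gray backing, a dark subject ring, one near-gray "dress fold".
BG, FG, FOLD = (148, 148, 148), (20, 20, 20), (150, 149, 147)
image = [[BG] * 5,
         [BG, FG, FG, FG, BG],
         [BG, FG, FOLD, FG, BG],
         [BG, FG, FG, FG, BG],
         [BG] * 5]
raw = matte(image)
filled = fill_interior_holes(raw)
```

The toy frame shows why the second pass is needed: the fold pixel sits within the ramp's transparent band, so only its enclosure by the subject distinguishes it from backing.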

Locked key · gray-148 output
Dolly key · re-keyed after byte-verified transfer fix (stale-footage bug caught & killed)
Jib key · full crane range; alpha master: 03-keyed/jeff_jib_key_ALPHA.mov (ProRes 4444, 4K, 246 MB — local)

Keyer validation suite — two representative cases shown source-beside-result (keys composited on a gradient specifically to expose edge artifacts), plus one deliberate edge case.

Original — full-body dancer, 4K greenscreen source
keyed · Sharp-hybrid composite: the keyer's signature pass (neural matte + high-freq detail recovery)
Original — fine blonde strands against a dark suit, greenscreen source
keyed · Fine strands keyed; low-contrast hair edges hold
before / after · Green plate → keyed: split-screen validation, dancer at 4K
edge case · Flying dark hair, mid-motion: a deliberate stress test well beyond normal production conditions; strands survive
Pipeline bug worth remembering: the original key run silently keyed stale footage — scp to a fixed filename dropped without error. Fix: dd-over-ssh push with byte-count gate before every key. Upgrade path researched: MatAnyone 2 (−26% MAD) as the single highest-leverage keyer swap.
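
A byte-count gate of the kind described above could look like the sketch below. The host name, paths and helper names are hypothetical, and the remote size is passed in as a plain integer so the gate itself can be exercised without a network. In practice it would come from running the `wc -c` command over ssh.

```python
import os
import shlex
import tempfile

def push_command(local_path, host, remote_path):
    """Shell pipeline that streams the file through dd on both ends."""
    remote = "dd of=" + shlex.quote(remote_path)
    return "dd if={} | ssh {} {}".format(
        shlex.quote(local_path), shlex.quote(host), shlex.quote(remote))

def remote_size_command(host, remote_path):
    """Argument list that prints the remote file's byte count."""
    return ["ssh", host, "wc -c < " + shlex.quote(remote_path)]

def byte_count_gate(local_path, remote_bytes):
    """Raise unless the remote copy has exactly as many bytes as the source."""
    local_bytes = os.path.getsize(local_path)
    if local_bytes != remote_bytes:
        raise RuntimeError(
            "transfer mismatch: local {} B, remote {} B; refusing to key".format(
                local_bytes, remote_bytes))
    return local_bytes

# Demonstration with a throwaway 1 KiB file standing in for a plate.
with tempfile.TemporaryDirectory() as tmp:
    plate = os.path.join(tmp, "plate.mov")
    with open(plate, "wb") as fh:
        fh.write(b"\x00" * 1024)
    passed = byte_count_gate(plate, 1024)
    try:
        byte_count_gate(plate, 512)   # simulated truncated transfer
        blocked = False
    except RuntimeError:
        blocked = True
```

A size check catches dropped or truncated transfers. It would not catch a stale file that happens to have the same length, so a checksum comparison is a stricter variant of the same gate.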
05

Camera Match — Seedance Track

model ceiling

The law from four early iterations still holds — Seedance interprets a camera, it never hard-locks one — but the ceiling moved. Feeding the solved-camera previz as a video_references motion track (rather than describing the move or clamping endpoints alone) is what closed the gap; that recipe graduated into W5 (§07). Current evidence, all against the MegaSaM-solved cameras:

Dolly move-match quad — plate · previz · generated candidates side-by-side against the solved camera
Jib move-match quad — same audit for the crane move
engine A/B · Seedance vs Gemini Omni: identical inputs (keyframes + previz motion + mechanical-dolly language); Omni holds staged composition truer, Seedance pushes harder and runs the full 5s (Omni clamps ~4s)
closed · Final dolly vs previz: the adopted generation against the solved-camera previz
Lineage (evidence retained in repo): v1 generic-still reference re-composed the shot entirely → v2 legible previz landed composition but inverted the depth direction → v4 start+end keyframes pinned the endpoints with mid-path drift → previz-as-motion-reference (W5) finally carried the full trajectory. Masters: routeB_seedance*.mp4, station_kf/_compare/.
06

Camera Match — Hard-Lock Track

The other side of the deciding axis: structure as conditioning. Two engines tested — one died on look, one is the proven accuracy path. (Naming note: the repo's research docs use "Route A" for Uni3C reference-copy — untested to date; the Wan test below was run as a hard-lock engine trial.)

dead end · Wan2.2 Fun Control (5090, depth): camera locked by construction but look = 3/10, unusable for finals. Kept only as a possible guide layer in an untested fusion.
proven · LTX-2 depth IC-LoRA render-to-real (5090): Blender dolly previz in, photoreal rotunda out, trajectory identical frame-for-frame. 220s local, free. Statue blobby (crude proxy + 512×288); levers identified. Reverses the Jun-27 LTX demotion only under depth conditioning; free generation stays demoted. See research note 2026-07-02.
Structural guide · Blender clay rotunda, 143f measured 1.82× push, real parallax baked in (start / mid / end)
Photorealized · same frames after depth-conditioned LTX (start / mid / end): camera is the model's input, not its guess
Zoom vs dolly — the parallax test. Seedance's pushIn preset renders an optical zoom (uniform magnification, no parallax). Its superDollyIn preset — and the LTX depth path by construction — produce true forward translation: foreground columns slide past frame edges at a different rate than the back wall. Always inspect start/mid/end frames for parallax before accepting a "dolly."
zoom · Seedance pushIn: magnification only; flagged in review, replaced
dolly · Seedance superDollyIn: genuine travel, camera passes between the foreground columns
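
The parallax inspection can also be expressed numerically. The sketch below assumes a pinhole camera with the principal point at the frame centre and a push straight along the optical axis. Under those assumptions a zoom magnifies every tracked point by the same factor, while a dolly magnifies near points more than far ones. The 5% tolerance, the function names and the synthetic scene are all illustrative.

```python
import math

def radial_magnifications(start_pts, end_pts, center):
    """Per-point growth of the distance from the frame centre."""
    cx, cy = center
    return [math.hypot(ex - cx, ey - cy) / math.hypot(sx - cx, sy - cy)
            for (sx, sy), (ex, ey) in zip(start_pts, end_pts)]

def classify_push(start_pts, end_pts, center, tolerance=0.05):
    """Label a push as 'zoom' (uniform magnification) or 'dolly' (parallax)."""
    mags = radial_magnifications(start_pts, end_pts, center)
    spread = max(mags) / min(mags) - 1.0
    return ("dolly" if spread > tolerance else "zoom"), spread

def project(point, focal, cam_z=0.0):
    """Pinhole projection of a 3D point for a camera at depth cam_z."""
    x, y, z = point
    return (focal * x / (z - cam_z), focal * y / (z - cam_z))

# Synthetic scene: a near column, a mid feature and a far wall point.
CENTER = (0.0, 0.0)
scene = [(1.0, 0.5, 4.0), (-2.0, 1.0, 8.0), (1.0, 0.5, 12.0)]
start = [project(p, focal=1000.0) for p in scene]
zoomed = [project(p, focal=1820.0) for p in scene]              # lens change only
dollied = [project(p, focal=1000.0, cam_z=2.0) for p in scene]  # camera travels

zoom_label, zoom_spread = classify_push(start, zoomed, CENTER)
dolly_label, dolly_spread = classify_push(start, dollied, CENTER)
```

Tracked points should include both foreground and background features. A set drawn from a single depth plane looks like a zoom under either move.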
07

Previz-Steered Generation — the Production BG Path

solved · all 3 shots

The winning recipe, proven end-to-end on all three shots on Jul 4. The real camera is tracked, not described: MegaSaM (DROID-SLAM + monocular depth, on the 5090) solves every frame of each greenscreen plate — position, rotation, and focal — where Blender's tracker failed outright on the flat green cloth. The solves import into one Blender scene as keyframed cameras under registration empties (focal-derived endpoints → look/travel alignment → de-roll), get staged against the art-directed rotunda blockout (with a Gaussian-smoothed twin baked for shaky solves), and render as clay previz — stills for keyframe generation, full-length move videos as the motion reference. Canonical write-up: WORKFLOW-BG-GENERATION.md.

1 · Tracking → Blender. Solved cameras ghosted over their plates — the registration check that earned the "spot on" sign-off before anything was generated:

Jib solve · 151f / 31mm — MegaSaM camera over the plate; crane rise-and-tilt captured frame-for-frame
Dolly solve · 143f / 34.2mm — raw solve carries real handheld jitter; a σ5 Gaussian world-space twin (DollyCamSmooth) is baked alongside
Locked solve · 151f / 27.3mm — "static" still solves to real tripod micro-motion, kept deliberately
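
The smoothed twin can be illustrated with a small sketch of Gaussian smoothing applied to a solved camera path. It assumes σ5 means a standard deviation of 5 frames, covers world-space positions only (rotations would need their own handling), and renormalises the kernel at the clip ends. None of this is taken from the project's Blender scripts.

```python
import math

def gaussian_smooth(values, sigma=5.0, radius=None):
    """Smooth a 1D sequence with a truncated, renormalised Gaussian kernel."""
    if radius is None:
        radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    out = []
    for i in range(n):
        acc = total = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:          # drop taps that fall outside the clip
                acc += w * values[j]
                total += w
        out.append(acc / total)
    return out

def smooth_path(positions, sigma=5.0):
    """Smooth (x, y, z) camera positions axis by axis."""
    xs, ys, zs = zip(*positions)
    return list(zip(gaussian_smooth(xs, sigma),
                    gaussian_smooth(ys, sigma),
                    gaussian_smooth(zs, sigma)))

# Hypothetical 143-frame push: steady travel in z, frame-to-frame jitter in x.
raw = [(0.01 * (-1) ** f, 1.6, 0.05 * f) for f in range(143)]
smooth = smooth_path(raw, sigma=5.0)
```

The intended travel is preserved away from the clip ends while the jitter is attenuated. Near the first and last frames the one-sided kernel pulls the path slightly inward, which is worth checking against the plate.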

2 · Previz move renders. Staged scene, subject marks hidden, one clay move video per shot — these are fed to the video model verbatim as video_references:

Dolly previz · 143f · from the smoothed twin
Jib previz · 151f · raw solve (crane already clean)
Locked previz · 151f · σ5-smoothed twin (LockedCamSmooth)

3 · Previz stills → generated keyframes. Each clay previz frame (left) beside the photoreal keyframe generated from it (right). Two models, two jobs: GPT-Image-2 (Higgsfield, 2K/high) marries the previz composition with a REAL photograph's materials — the only model tested that swaps the clay placeholder for the real Jefferson statue in one pass. Nano Banana (imagen-nano-banana-2) derives every other view from that master frame, keeping all shots in one coherent generated rotunda:

Dolly start · previz: solved camera, staged scene, frame 1
master · Dolly start · GPT-Image-2: previz still = composition, real photo = marble, inscriptions, statue
Dolly end · previz: same camera at frame 143 of the push-in
Dolly end · NB: "photographer walked forward" edit of the master vs the previz end still
Jib start · previz: low camera, dome overhead, statue right third
Jib start · NB angle-match (v2d, picked): master scene forced to the previz camera; needed hard framing language (see prompts)
Jib end · previz: crane settled tight on the pedestal base. No end keyframe was generated for the jib; the previz motion reference alone held the path (elements table below)
Locked start · previz: static framing; the tripod micro-motion lives in the move video
Locked start · GPT-Image-2: eye-level static framing from the locked previz still

4 · Elements per shot. Exactly what went into each Seedance 2.0 run (Higgsfield, 720p/5s; job IDs from prior generations work directly as media references):

Shot | start_image | end_image | video_references | Result
Dolly | GPT-Image-2 start | NB end | Dolly previz move (smoothed) | plate adopted (operator's own run of the recipe)
Jib | NB angle-match v2d | none | Jib previz move (raw) | previz alone held the path; end frame unnecessary
Locked | GPT-Image-2 start | none | Locked previz move (smoothed) | carries the plate's tripod micro-motion
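
For bookkeeping, the per-shot elements above can be held in a small record type. The sketch below is a hypothetical structure, not the Higgsfield or Seedance API. Only the field names `start_image`, `end_image` and `video_references` come from the table, and the string values are descriptive labels rather than real media IDs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ShotElements:
    shot: str
    start_image: str
    end_image: Optional[str]      # optional: two of the three shots ran without it
    video_references: str

RUNS = [
    ShotElements("dolly", "GPT-Image-2 start", "NB end",
                 "dolly previz move (smoothed)"),
    ShotElements("jib", "NB angle-match v2d", None,
                 "jib previz move (raw)"),
    ShotElements("locked", "GPT-Image-2 start", None,
                 "locked previz move (smoothed)"),
]

def validate(run):
    """Require a start frame and a previz motion reference for every run."""
    missing = [name for name in ("start_image", "video_references")
               if not getattr(run, name)]
    if missing:
        raise ValueError("{}: missing {}".format(run.shot, ", ".join(missing)))
    return True

all_valid = all(validate(run) for run in RUNS)
no_end_frame = [run.shot for run in RUNS if run.end_image is None]
```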

5 · The prompts. Verbatim production prompts — the bolded clauses are load-bearing (removing any one reproduced a documented failure):

GPT-Image-2 · start frame (two-role reference prompt)
Generate a photorealistic image of the Thomas Jefferson Memorial interior. The FIRST reference image is a gray clay 3D previz frame — it defines the EXACT composition to reproduce: a LOW-ANGLE camera looking slightly up, the bronze statue on its pedestal at frame-right against the coffered dome, columns placed exactly as shown, floor plane low in frame. Match this composition precisely — do not recenter or reframe. The SECOND reference image is a real photograph of the actual Jefferson Memorial interior — use it as the source of truth for everything visual: the real white Georgia marble and its veining, the real engraved inscription text panels with laurel wreaths on the walls, the real dark weathered bronze Jefferson statue (replace the clay placeholder figure with the real statue's actual sculpted form), the real coffered dome detail, the polished floor reflections, and its soft natural daylight. Empty interior, no people.
Nano Banana · angle-match (new camera angle of the master scene — jib start v2d)
IMAGE 1 is a real photograph of the Thomas Jefferson Memorial interior — the MASTER SCENE: this exact bronze statue, white marble, inscription panels, soft even neutral daylight. IMAGE 2 is a gray clay 3D previz frame that defines the CAMERA for a second photo of the exact same scene — and its framing is COMPLETELY DIFFERENT from image 1: the camera is down near the FLOOR, tilted strongly UPWARD. Reproduce IMAGE 2's framing exactly: the coffered dome ceiling fills the entire TOP HALF of the frame, the statue on its pedestal stands at the RIGHT THIRD of frame seen from below against the dome, columns lean inward with strong upward perspective convergence, and only a small strip of floor shows at the very bottom. Do NOT reuse image 1's eye-level framing. Every material and lighting property still comes from IMAGE 1 unchanged: same neutral white-balanced daylight (not warm, not golden, not moody), same marble, same dark weathered bronze, same grade — two photos minutes apart in the same session. Empty interior, no people. Photorealistic.
Seedance 2.0 · moving shot (jib — same skeleton for the dolly with "mechanical dolly on rails")
Slow cinematic crane shot inside a neoclassical marble rotunda. The image reference is the exact opening frame: a low camera position looking up, coffered dome filling the top of frame, memorial sculpture at the right third. The video reference is a gray 3D architectural previsualization showing the exact camera path to follow: a smooth motorized crane move — the camera starts low and rises steadily while tilting down, moving closer, settling on a tighter framing near the sculpture's stone base. Constant speed, no handheld sway, no walking rhythm, no speed ramps. Follow the previsualization's framing trajectory exactly. Keep the marble architecture, engraved wall text and soft neutral daylight from the opening frame consistent for the whole shot. Empty interior, documentary style.
Seedance 2.0 · locked shot (previz still supplies the micro-motion)
A static tripod shot inside a neoclassical marble rotunda. The image reference is the exact opening frame — hold this exact composition for the whole shot. The video reference is a gray 3D architectural previsualization of the EXACT camera behavior to reproduce: a locked-off tripod camera with only the faintest natural micro-movement — follow it exactly. NO push, NO drift, NO pan, NO tilt, NO zoom, NO handheld sway beyond what the previsualization shows. The interior is empty and still; soft neutral daylight through the colonnade. Keep the marble architecture, engraved wall text, sculpture and lighting from the opening frame perfectly consistent from first frame to last. Documentary style, photorealistic.
Prompt failure modes captured: without the mechanical-move clause the dolly rendered as human POV walking (footstep bob); without the lighting lock NB invented a warm backlit mood; without the hard low-angle framing block NB stayed at the master's eye level; "Jefferson Memorial" naming plus certain reference images trips Higgsfield's moderation — the flagger is image-side and non-deterministic (playbook: resubmit verbatim once → soften wording → launder the image with a 2% crop + JPEG re-encode + grain).

6 · Generated plates + previz verification. Every generation ships with a 50% previz-ghost overlay — the acceptance test that the model rode the solved camera:

adopted · Dolly BG plate: Seedance, full 3-reference recipe
adopted · Jib BG plate: start + previz motion only
adopted · Locked BG plate: smoothed-previz micro-motion
Dolly overlay — previz ghost tracks the push-in
Jib overlay — clay pedestal glued to the bronze through the rise
Locked overlay — first/last-frame diff 2.8/255: genuinely locked
interim · First composite on a W5 plate: keyed subject over the dolly BG (tape-cross marker removed via animated matte). Open: FG↔BG grade match, since she carries the warm greenscreen light against the cool daylight plate.
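
The two checks used in the verification step, the 50% previz ghost and the first/last-frame difference, can be sketched on grayscale frames as below. Frames are plain nested lists of 0-255 values to keep the example self-contained. Reading the "2.8/255" figure as a mean absolute difference is an assumption, and the 3.0 acceptance threshold is hypothetical.

```python
def ghost_overlay(generated, previz, opacity=0.5):
    """Blend the previz over the generated frame at the given opacity."""
    return [[round((1.0 - opacity) * g + opacity * p)
             for g, p in zip(g_row, p_row)]
            for g_row, p_row in zip(generated, previz)]

def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel absolute difference on a 0-255 scale."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(frame_a, frame_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def is_locked(first_frame, last_frame, max_diff=3.0):
    """Accept a 'static' shot if first and last frames barely differ."""
    return mean_abs_diff(first_frame, last_frame) <= max_diff

# Tiny hypothetical frames.
overlay = ghost_overlay([[100, 0]], [[200, 50]])
first = [[10, 20], [30, 40]]
last = [[12, 20], [30, 44]]
diff = mean_abs_diff(first, last)
locked = is_locked(first, last)
```

A mean difference hides localised drift, so a per-region or maximum difference would be a stricter variant of the same test.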
08

Composites

interim

Real keyed subject over generated backgrounds. The locked shot is proven; the dolly comps exist in two BG flavors; the jib comp is next (its empty Seedance BG already exists — §09).

proven · Locked comp v1, statue-reframe: subject at native scale, Jefferson plate reframed behind her at human scale
interim · Dolly comp, static BG + ramp zoom: BG is a still given the measured 1.82× push; no BG parallax
interim · Dolly comp, Seedance BG: moving generated BG behind untouched subject; BG move amount > subject move (matching open)
Locked BG plate (Nano Banana, option A, chosen)
drifted · Locked-shot Seedance BG: "static" generation drifts; needs 2D stabilize or regen before the locked comp upgrades to a moving-grade BG (open item)
Dolly wide eye-level plate: statue right, inscription wall legible (nailed first pass)
ok · Jib end low-hero plate (low-angle)
failed · Jib start high-angle: Nano Banana snapped to eye-level and invented a person; stills won't hold composition (known law). Redo or derive from Blender.
09

The Regenerate Dead End — and Its Useful Half

Feeding Seedance the raw greenscreen video + a reference BG image + "replace the background, keep her exactly" produced a repainted stranger on a random stage. Tested on two platforms (magnific + Higgsfield manual runs) — same law both times.

dead end · Video-ref BG-replace (dolly): output unrelated to the plate; subject repainted
confirms law · Higgsfield manual jib run 1: she's in the rotunda with the crane move, but the model repainted her face/wardrobe. Second platform, same law.
usable · Higgsfield manual jib run 2: empty rotunda with the real crane move, a ready-made jib BG plate for the key+composite path
Planned test (not yet run): LTX-2.3 Obscura Remova — generate Seedance video-ref at 4K (accurate camera, repainted person), remove the AI person with Obscura, mask the 1080p reconstruction back into the native-4K frame (the real keyed subject re-covers almost all of it), composite the real performer. Best look + camera accuracy if Obscura's temporal reconstruction holds.
10

Relight

pending

Decision locked after a Beeble/SwitchLight build-vs-buy analysis: build our own. IC-Light v1 FBC + DiffusionLight (single-frame HDRI lock — the anti-shimmer trick for our lighting-static BGs) + Marigold, on the 5090. Open stack ≈ 80–90% of SwitchLight on stills; Beeble API ($0.10/gen) kept as the reference bar only. Scaffolded at relight/, ~15 GB install not yet run.

11

Model Capability Map

Model / method | Camera control | Preserves real footage? | Look | Output sizes | Where it fits
Seedance 2.0 (keyframes) | endpoint clamp; presets; interprets | no, repaints | 10/10 | 480p·720p·1080p·4K | Pipeline base for BG look
Seedance 2.0 (video ref) | copies plate move well | no, repaints subject | 10/10 | 480p·720p·1080p·4K | BG-only gen; Obscura candidate
LTX-2 depth IC-LoRA (local) | exact, structure conditioned | n/a (generates BG) | photoreal, sub-Seedance | VRAM-bound · ~720p on 5090 | Accuracy-critical BG
Wan2.2 Fun Control (local) | exact, depth conditioned | n/a | 3/10 local | 480p·720p | Rejected; possible guide layer
Uni3C (Wan2.2 cloud) | geometry-aware ref copy | no, re-renders | 6–7 cloud | host-dependent | Repo "Route A": untested fusion ingredient
Kling 3.0 / Omni | endpoint clamp, tightest of the 4-way | no, regenerates | high | 720p·1080p·4K (video-ref ≤1080p) | Shootout run Jul 3: clamp honored; Omni video-ref re-anchored the move onto the statue
Runway Gen-4.5 | start frame only, no end clamp | no, regenerates | high | 720p native (+4K upscale) | Shootout run Jul 3: drifted off brief without an end keyframe
Veo 3.1 | endpoint clamp honored | no, regenerates | high | 720p·1080p·4K | Shootout run Jul 3: viable clamp alternate
LTX-2.3 Obscura Remova | n/a (V2V removal) | yes, reconstructs behind occluder | 1080p cap | 1080p max | Planned: strip AI person from Seedance ref output
CorridorKey (5090) | n/a | yes, it IS the footage | broadcast-clean | matches source (4K tested) | The subject track
Testing standard: all comparative tests run at 720p — cheapest tier every candidate model shares, so results compare like-for-like and iterate fast. Output sizes above are what each model offers for finals; masters get regenerated or upscaled at delivery res once a route wins.