R&D Report · Interim

AI Camera Match.

Integrating a real, keyed greenscreen actor into an AI-generated Jefferson Memorial background whose camera move matches the live plate — what was tried, what won, what died, what runs next.

Report · 2026-07-04 v0.16.0 · interim
01

Summary

Traditional VFX (real plates, a NeRF or Gaussian-splat reconstruction, full CG, or an LED volume) is still the best route, and nothing here beats it. This is R&D into a faster, lower-cost alternative, and the finding is that the alternative has crossed into potentially usable territory.

What we discovered. The question: can an AI-generated background carry the same camera move as a real greenscreen plate, well enough to drop a keyed performer in? The look is there: fully generated shots can look great. Control is the hard part. A generated background can get very close to a real plate's move, but an exact match only happens when the subject is regenerated in the same shot too; keep the real keyed performer and the camera lands close, not exact. Control comes from concrete inputs (a previz carrying the tracked camera data and a start frame that blocks out the subject), not from wordier prompts.

Recommendation: two methods, chosen by how exact the camera must be. Method 1 · Previz-Steered: track the plate, generate the BG from it, key and composite the real subject over it (closest to exact). Method 2 · Plate-Direct: drive generation from the plate itself with person-lock (faster, close enough to composite over). Seedance 2.0 is the recommended engine; Gemini Omni is a strong alternate capped at 720p; Kling, Veo and Runway were rejected, and the local Wan and LTX models were ruled out for finals (LTX survives only as a camera-exact guide layer). See the table below.

02

Video Generation Models

Every video generation model tested. Movement is the deciding column, because matching the original plate's camera move is the goal (only depth-conditioned local generation is truly exact; Seedance and Omni get very close); quality and output limits follow. Only the rows that list 4K under Max output can deliver 4K.

Model | Movement | Quality | Max output | Verdict
Seedance 2.0 | very close to the plate move (not exact) via previz/plate refs · true parallax | best tested | 4K · 15s | RECOMMEND
Gemini Omni | very close to the plate move (not exact) · best physics | strong | 720p cap · ~4s cap | ALTERNATE
Kling 3.0 | tightest keyframes | middling (not bad, below Seedance/Omni) | 4K · 15s | REJECTED (quality)
Kling Omni (video-ref) | re-anchors composition | - | 1080p | REJECTED
Veo 3.1 | keyframes ok | strong | 4K · 8s | REJECTED
Runway Gen-4.5 | drifts (start-only) | - | 4K · 10s | REJECTED
LTX-2 depth IC-LoRA | exact (depth input) | weak detail; ok behind heavy bokeh only | 4K · local (VRAM-bound) | GUIDE LAYER ONLY
Wan2.2 Fun Control | exact (depth) | unusable | 720p local | DEAD END
Generation is stochastic — budget several rolls. A usable take typically arrives within 2–4 generations of the same recipe (bad camera reads, drifted scenes, or moderation false-positives eat the rest); single-shot success is the exception, not the plan.
Reading the table: cyan = good, violet = usable with caveats, red = ruled out. All comparative tests ran at 720p — the tier every model shares; note Omni pairs its 720p cap with heavy compression (~0.9 MB for a 4s clip).
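The roll budget noted above can be turned into rough arithmetic. A minimal sketch, assuming every roll succeeds independently with the same probability; the report gives no per-roll rate, so the probabilities used here are illustrative assumptions, and the function names are hypothetical:

```python
import math

def expected_rolls(p_success):
    """Average number of rolls until the first usable take (geometric distribution)."""
    return 1.0 / p_success

def rolls_needed(p_success, confidence=0.9):
    """Smallest n such that at least one usable take appears with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))

# Assumed per-roll success rates; 0.5 and 0.25 correspond to averages of 2 and 4 rolls.
for p in (0.5, 0.25):
    print(p, expected_rolls(p), rolls_needed(p))
```

Read this way, an average of 2 to 4 rolls implies a noticeably larger budget if a usable take must be near-certain within one session.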
03

The Pipeline

Everything converges on two output methods: Method 1 (previz-steered — closest to frame-exact) generates BG plates that meet the real keyed subject in the W1 composite, and Method 2 (plate-direct) generates the subject in-scene from the plate itself. The remaining graphs are the supporting and historical workflows. Arrows show data direction.

M1

Method 1 · Previz-Steered Generation

closest to frame-exact · production
GREENSCREEN PLATE (3 shots · 4K)
MEGASAM CAMERA TRACK (AI camera solve · all frames + focal)
BLENDER PREVIZ (register · stage · stills + move)
GPT-IMAGE-2 START (previz comp + REAL photo materials)
NANO BANANA ANGLE-MATCH (end / new angle from the master)
PREVIZ MOTION (the clay move · video_references)
SEEDANCE 2.0 (start_image + end_image + video_references)
▸ COMPOSITE (into W1)
Edge labels: start · end · motion
M2

Method 2 · Plate-Direct Generation

fast · composite-workable
GREENSCREEN PLATE (motion + performance reference)
Nano Banana START FRAME (plate frame 1 + GPT master scene)
PERSON-LOCK PROMPT ("keep her 100% identical")
SEEDANCE 2.0 (start_image + video_references)
SUBJECT-IN-SCENE VIDEO (move close, not path-exact · she is preserved)
OBSCURA: PERSON REMOVAL (selective · removes her, keeps scene intact)
▸ COMPOSITE (real subject back in · into W1)
W1

Key + Composite — the production path

GREENSCREEN PLATE (4K source · 3 shots)
CORRIDORKEY → ALPHA (neural key · ProRes 4444 · 5090)
CAMERA SOLVE (2D track · push-ramp 1.82×)
SEEDANCE BG GEN (A · clamp; B · empty preset)
COMPOSITE (real subject over gen BG)
FINAL (composited shot)
Edge label: camera scale
W4

Selective Person Removal

SEEDANCE VIDEO-REF · 4K (accurate move · repainted person)
OBSCURA REMOVAL (LTX-2.3 V2V removal · 1080p)
MASK-BACK INTO 4K (reveal sliver only)
COMPOSITE REAL SUBJECT
Edge label: LTX-2.3 V2V removal path
Strategy | Camera accuracy | Look | Use when
A · Seedance + keyframes | pretty close, not pixel-perfect | 10/10 | Hero look, forgiving moves
B · Seedance empty-BG preset | pretty close, not pixel-perfect | 10/10 | Key+composite shots, art-directed plates
C · LTX depth render-to-real | exact, by construction | weak; holds up only with heavy bokeh / shallow DOF, not for detail match | Local, free; camera-exact guide layers, not finals
04

Test Plates

One performer, three camera behaviors: a locked-off medium, a push-in dolly, and a crane-down jib. Together they cover the camera-matching problem space from trivial to hard.

LOCKED · 151f · 2D track confirms static (worst drift 1.56% ≈ handheld micro-motion)
DOLLY · 143f push-in · planar solve (1 marker) → feature-spread ramp measured 1.82×
JIB · 151f crane high→low · planar; move read from footage, no 3D solve
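The 1.82× dolly figure is described as a feature-spread ramp. A minimal sketch of one way such a ratio could be measured, assuming tracked 2D feature positions exist for the first and last frame; the function names and sample points are hypothetical, not the tracker's actual output:

```python
import math

def feature_spread(points):
    """RMS distance of tracked 2D points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def push_ramp(first_frame_points, last_frame_points):
    """Scale factor of a push-in: spread on the last frame over spread on the first."""
    return feature_spread(last_frame_points) / feature_spread(first_frame_points)

# Hypothetical features that end 1.82x farther from their centroid than they started.
start = [(-100.0, -50.0), (100.0, -50.0), (100.0, 50.0), (-100.0, 50.0)]
end = [(x * 1.82, y * 1.82) for x, y in start]
print(round(push_ramp(start, end), 2))
```

Measuring spread about the centroid keeps the ratio insensitive to any pan or reframing between the two frames, which is why it suits a plate with too few markers for a 3D solve.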
05

Keying

CorridorKey on the 5090 (BiRefNet hint → neural key + despill → sharp hybrid composite) outputs the subject on flat gray-148 for clean matte extraction. Naive chroma keying failed outright — the warm dark-olive "green" had r≈g. A distance-ramp matte off the gray, plus interior hole-fill for near-gray dress folds, yields the final true-alpha ProRes 4444.
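Why the naive key failed and the gray ramp works can be shown per pixel. A minimal sketch, assuming 8-bit RGB tuples; the ramp limits and the sample olive pixel are illustrative assumptions, not production values, and the interior hole-fill step is not covered:

```python
GRAY = (148, 148, 148)  # flat backing value written by the keyer

def naive_green_score(pixel):
    """Classic chroma-key cue: how far green exceeds the larger of red and blue."""
    r, g, b = pixel
    return g - max(r, b)

def ramp_alpha(pixel, lo=6.0, hi=40.0):
    """Alpha from colour distance to the gray backing: 0 near gray, 1 far from it.
    lo and hi are hypothetical ramp limits."""
    dist = sum((c - k) ** 2 for c, k in zip(pixel, GRAY)) ** 0.5
    return min(1.0, max(0.0, (dist - lo) / (hi - lo)))

# Assumed sample of a warm dark-olive screen pixel: red and green nearly equal.
print(naive_green_score((70, 72, 30)))   # almost no green dominance to key on
print(ramp_alpha((148, 148, 148)), ramp_alpha((20, 20, 20)))
```

Because the ramp is driven by distance from a known flat gray, it does not depend on the screen colour at all; the cost is that near-gray subject pixels need the separate hole-fill the report describes.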

Locked key · gray-148 output
Dolly key · re-keyed after byte-verified transfer fix (stale-footage bug caught & killed)
Jib key · full crane range; alpha master: 03-keyed/jeff_jib_key_ALPHA.mov (ProRes 4444, 4K, 246 MB — local)
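The stale-footage bug on the dolly key was closed by byte-verifying the transfer. A minimal sketch of such a check, assuming a SHA-256 comparison of source and destination files; the report does not say which hash or tool was used, and the function names are hypothetical:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ProRes masters do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_is_verified(source, destination):
    """True only when both files have the same size and the same SHA-256 digest."""
    src, dst = Path(source), Path(destination)
    if src.stat().st_size != dst.stat().st_size:
        return False
    return sha256_of(src) == sha256_of(dst)
```

Running the check before every re-key means a stale copy with the right filename can no longer pass for the current plate.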

Keyer validation suite — original greenscreen source and keyed result side-by-side in each clip (keys composited on a gradient specifically to expose edge artifacts).

keyed · Full-body dancer · 4K: original above, sharp-hybrid key below (neural matte + high-freq detail recovery)
keyed · Fine blonde strands + dark suit: original above, keyed below; low-contrast hair edges hold
keyed · Flying dark hair, mid-motion: original above, keyed below; strands survive
06

Camera Match — Seedance Track

model ceiling

The law from four early iterations still holds — Seedance interprets a camera, it never hard-locks one — but the ceiling moved. Feeding the solved-camera previz as a video_references motion track (rather than describing the move or clamping endpoints alone) is what closed the gap; that recipe graduated into W5. Current evidence, all against the MegaSaM-solved cameras:

Dolly move-match quad — previz (top-left) vs three generated candidates: top-right rejected, bottom two good. Expect this spread — a usable take typically needs several generations
Jib move-match quad — previz vs three candidates: top-right rejected (move mismatch), bottom-left good with minor image hallucinations, bottom-right good
closed · Final dolly vs previz: the adopted generation against the solved-camera previz
Engine verdict: Seedance 2.0 and Gemini Omni produced the best generations of everything tested. Omni's out-of-the-box physics are the standout: it matches the reference motion's weight and momentum with no special prompting. Its output, however, is capped at 720p and roughly 4s, while Seedance ran the full 5s used in these tests and scales to higher resolutions. Current posture: Seedance primary, Omni the physics benchmark.
Method 1

Previz-Steered Generation

closest to frame-exact · all 3 shots · production default

The first of the two methods: the real camera is tracked and re-authored, so the generated plate comes closest to a frame-exact match with the solve. Proven end-to-end on all three shots on Jul 4. The camera is tracked, not described: MegaSaM (DROID-SLAM + monocular depth, on the 5090) solves every frame of each greenscreen plate — position, rotation, and focal — where Blender's tracker failed outright on the flat green cloth. The solves import into one Blender scene as keyframed cameras under registration empties (focal-derived endpoints → look/travel alignment → de-roll), get staged against the art-directed rotunda blockout (with a Gaussian-smoothed twin baked for shaky solves), and render as clay previz — stills for keyframe generation, full-length move videos as the motion reference. Canonical write-up: WORKFLOW-BG-GENERATION.md.

1 · Tracking → Blender. Solved cameras ghosted over their plates — the registration check that earned the "spot on" sign-off before anything was generated:

Jib solve · 151f / 31mm — MegaSaM camera over the plate; crane rise-and-tilt captured frame-for-frame
Dolly solve · 143f / 34.2mm — raw solve carries real handheld jitter; a σ5 Gaussian world-space twin (DollyCamSmooth) is baked alongside
Locked solve · 151f / 27.3mm — "static" still solves to real tripod micro-motion, kept deliberately
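The smoothed twins are described as σ5 Gaussian passes over the solved camera in world space. A minimal sketch of that kind of filter on per-frame positions, assuming σ is measured in frames and that edges are handled by clamping; both are assumptions, rotation smoothing is not covered, and the function names are hypothetical rather than the Blender bake actually used:

```python
import math

def gaussian_kernel(sigma):
    """Normalised Gaussian weights out to three sigma on each side."""
    radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_track(positions, sigma=5.0):
    """Gaussian-smooth a per-frame list of (x, y, z) world positions.
    Frames past either end are clamped to the first or last sample."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(positions) - 1
    smoothed = []
    for frame in range(len(positions)):
        acc = [0.0, 0.0, 0.0]
        for offset, weight in enumerate(kernel):
            src = min(last, max(0, frame + offset - radius))
            for axis in range(3):
                acc[axis] += weight * positions[src][axis]
        smoothed.append(tuple(acc))
    return smoothed

# Hypothetical 143-frame push-in along z with alternating per-frame jitter.
track = [(0.0, 0.0, 0.01 * f + (0.002 if f % 2 else -0.002)) for f in range(143)]
smooth = smooth_track(track, sigma=5.0)
```

Keeping the raw solve next to the smoothed twin lets each shot choose: the jib used the raw path, while the dolly and locked previz used the smoothed one.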

2 · Previz move renders. Staged scene, subject marks hidden, one clay move video per shot — these are fed to the video model verbatim as video_references:

Dolly previz · 143f · from the smoothed twin
Jib previz · 151f · raw solve (crane already clean)
Locked previz · 151f · σ5-smoothed twin (LockedCamSmooth)

3 · Previz stills → generated keyframes. Clay previz frame (left) beside the photoreal keyframe generated from it (right). GPT-Image-2 marries the previz composition with a REAL photograph's materials — the only model tested that swaps the clay placeholder for the real Jefferson statue in one pass. Nano Banana derives every other view from that master frame, keeping all shots in one coherent generated rotunda:

master · Dolly start: previz | GPT-Image-2 (previz = composition, real photo = materials)
Dolly end: previz | Nano Banana ("photographer walked forward" edit of the master)
Jib start: previz | Nano Banana angle-match (master scene forced to the previz camera; hard framing language)
Locked start: previz | GPT-Image-2 (eye-level static framing)

4 · Elements per shot. Exactly what went into each Seedance 2.0 run (Higgsfield, 720p/5s; job IDs from prior generations work directly as media references):

Shot | start_image | end_image | video_references | Result
Dolly | GPT-Image-2 start | Nano Banana end | Dolly previz move (smoothed) | plate adopted (operator's own run of the recipe)
Jib | Nano Banana angle-match v2d | none | Jib previz move (raw) | previz alone held the path; end frame unnecessary
Locked | GPT-Image-2 start | none | Locked previz move (smoothed) | carries the plate's tripod micro-motion
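The per-shot elements above map naturally onto a small data structure. A minimal sketch, assuming a plain dictionary as the request shape: the names start_image, end_image and video_references come from the report, while the ShotRecipe class, the to_request helper and the resolution and duration_s keys are hypothetical and are not the Higgsfield or Seedance API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ShotRecipe:
    shot: str
    start_image: str
    end_image: Optional[str]  # None when the previz move alone holds the path
    video_reference: str

RECIPES = [
    ShotRecipe("dolly", "GPT-Image-2 start", "Nano Banana end", "dolly previz move (smoothed)"),
    ShotRecipe("jib", "Nano Banana angle-match v2d", None, "jib previz move (raw)"),
    ShotRecipe("locked", "GPT-Image-2 start", None, "locked previz move (smoothed)"),
]

def to_request(recipe, resolution="720p", duration_s=5):
    """Build a dict of generation inputs, leaving out end_image when the shot has none."""
    request = {
        "start_image": recipe.start_image,
        "video_references": [recipe.video_reference],
        "resolution": resolution,
        "duration_s": duration_s,
    }
    if recipe.end_image is not None:
        request["end_image"] = recipe.end_image
    return request

for recipe in RECIPES:
    print(recipe.shot, sorted(to_request(recipe)))
```

Omitting the key entirely, instead of sending an empty value, mirrors the table: only the dolly clamps both endpoints.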

5 · The prompts. Verbatim production prompts, image rail and video rail — the bolded clauses are load-bearing (removing any one reproduced a documented failure).

IMAGE — keyframes

GPT-Image-2 · start frame (two-role reference prompt)
Generate a photorealistic image of the Thomas Jefferson Memorial interior. The FIRST reference image is a gray clay 3D previz frame — it defines the EXACT composition to reproduce: a LOW-ANGLE camera looking slightly up, the bronze statue on its pedestal at frame-right against the coffered dome, columns placed exactly as shown, floor plane low in frame. Match this composition precisely — do not recenter or reframe. The SECOND reference image is a real photograph of the actual Jefferson Memorial interior — use it as the source of truth for everything visual: the real white Georgia marble and its veining, the real engraved inscription text panels with laurel wreaths on the walls, the real dark weathered bronze Jefferson statue (replace the clay placeholder figure with the real statue's actual sculpted form), the real coffered dome detail, the polished floor reflections, and its soft natural daylight. Empty interior, no people.
Nano Banana · angle-match (new camera angle of the master scene — jib start v2d)
IMAGE 1 is a real photograph of the Thomas Jefferson Memorial interior — the MASTER SCENE: this exact bronze statue, white marble, inscription panels, soft even neutral daylight. IMAGE 2 is a gray clay 3D previz frame that defines the CAMERA for a second photo of the exact same scene — and its framing is COMPLETELY DIFFERENT from image 1: the camera is down near the FLOOR, tilted strongly UPWARD. Reproduce IMAGE 2's framing exactly: the coffered dome ceiling fills the entire TOP HALF of the frame, the statue on its pedestal stands at the RIGHT THIRD of frame seen from below against the dome, columns lean inward with strong upward perspective convergence, and only a small strip of floor shows at the very bottom. Do NOT reuse image 1's eye-level framing. Every material and lighting property still comes from IMAGE 1 unchanged: same neutral white-balanced daylight (not warm, not golden, not moody), same marble, same dark weathered bronze, same grade — two photos minutes apart in the same session. Empty interior, no people. Photorealistic.

VIDEO — generation

Seedance 2.0 · moving shot (jib — same skeleton for the dolly with "mechanical dolly on rails")
Slow cinematic crane shot inside a neoclassical marble rotunda. The image reference is the exact opening frame: a low camera position looking up, coffered dome filling the top of frame, memorial sculpture at the right third. The video reference is a gray 3D architectural previsualization showing the exact camera path to follow: a smooth motorized crane move — the camera starts low and rises steadily while tilting down, moving closer, settling on a tighter framing near the sculpture's stone base. Constant speed, no handheld sway, no walking rhythm, no speed ramps. Follow the previsualization's framing trajectory exactly. Keep the marble architecture, engraved wall text and soft neutral daylight from the opening frame consistent for the whole shot. Empty interior, documentary style.
Seedance 2.0 · locked shot (previz still supplies the micro-motion)
A static tripod shot inside a neoclassical marble rotunda. The image reference is the exact opening frame — hold this exact composition for the whole shot. The video reference is a gray 3D architectural previsualization of the EXACT camera behavior to reproduce: a locked-off tripod camera with only the faintest natural micro-movement — follow it exactly. NO push, NO drift, NO pan, NO tilt, NO zoom, NO handheld sway beyond what the previsualization shows. The interior is empty and still; soft neutral daylight through the colonnade. Keep the marble architecture, engraved wall text, sculpture and lighting from the opening frame perfectly consistent from first frame to last. Documentary style, photorealistic.

6 · Generated plates + previz verification. Every generation ships with a 50% previz-ghost overlay — the acceptance test that the model rode the solved camera:

adopted · Dolly BG plate · Seedance, full 3-reference recipe
adopted · Jib BG plate · start + previz motion only
adopted · Locked BG plate · smoothed-previz micro-motion
Dolly overlay — previz ghost tracks the push-in
Jib overlay — clay pedestal glued to the bronze through the rise
Locked overlay — first/last-frame diff 2.8/255: genuinely locked
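Both acceptance checks above reduce to simple per-pixel arithmetic. A minimal sketch, assuming grayscale frames stored as rows of 0-255 values and assuming the 2.8/255 figure is a mean absolute difference (the report does not say whether it is a mean or a maximum); the function names are hypothetical:

```python
def ghost_overlay(frame, previz, opacity=0.5):
    """Blend a previz frame over a generated frame at the given opacity."""
    return [
        [round((1.0 - opacity) * f + opacity * p) for f, p in zip(frame_row, previz_row)]
        for frame_row, previz_row in zip(frame, previz)
    ]

def mean_abs_diff(first, last):
    """Mean absolute per-pixel difference between two frames, on the 0-255 scale."""
    total = 0
    count = 0
    for row_a, row_b in zip(first, last):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

# Tiny hypothetical frames, one row each.
print(ghost_overlay([[0, 200]], [[100, 100]]))
print(mean_abs_diff([[10, 20], [30, 40]], [[12, 20], [30, 46]]))
```

The overlay is a visual check for the moving shots, while the first/last difference gives the locked shot a single number to compare against a threshold.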

7 · Composite. Method 1's endpoint — the real keyed subject over each previz-steered plate. All three assembled; the one visible seam is grade, not key (she carries the warm greenscreen light against the plate's cooler daylight — a curves pass closes it).

Dolly composite — keyed subject over the dolly plate
Jib composite — true-alpha ProRes key over the jib plate
Locked composite — keyed subject over the locked plate
Method 2

Plate-Direct Generation

fast · movement-close · composite-workable

The second of the two methods: no solve, no Blender; the original plate itself is the reference. A Nano Banana start frame places her in the master scene at the plate's framing, then the plate drives motion and performance, person-locked. The camera match is close but interpretive, and close enough that compositing over it works. (This began as a dead end: the early "replace the background" test repainted a stranger. That verdict was prompt-shaped and reference-shaped, not architectural.)

The reversed recipe, run on all three shots — each with its own Nano Banana start frame (plate frame 1 + GPT master) and its own original plate as motion/performance reference:

dolly · Dolly: real push-in, she's preserved, scene is ours
jib · Jib: real crane rise from the low-angle Nano Banana start
locked · Locked: tripod micro-motion from the plate
verification · Jib direct-gen vs the original plate (50% ghost): she rides her own plate silhouette through the crane move. Caveat: this track is NOT an exact camera match to the original plate; the model reproduces the move closely but interpretively. In practice the match is close enough that compositing the keyed subject over it works.
hybrid · Dolly · previz camera reference: Method 1's previz move added as a second video reference alongside the plate. The previz supplies the 3D parallax the flat green plate can't; her performance still comes from the plate.
showcase · Gemini Omni person-locked (jib): the person-lock reversal on the second engine. She rides the real crane move in the referenced rotunda, identity held, with Omni's natural-physics motion.
showcase · Gemini Omni cine-gen (jib): person-locked with the Nano-Banana-relit cinematic reference. The lighting mood transfers into motion while she and the crane move stay true (the "models copy LIGHT, not GRADE" law in action).
useful half · Empty-BG variant (jib): the same video-ref mechanism with no person requested, giving a ready-made BG plate for the key+composite path.
track A/B · The two methods, head-to-head (jib). A: person-locked direct generation. B: the removal loop, in which the direct-gen video is regenerated empty (generation-as-removal, the 720p Obscura stand-in) and the real keyed subject is then composited over it. Both ride the same generated camera; B keeps her true face.
What reversed it: (1) person-locked language — "background replacement ONLY, keep her 100% IDENTICAL"; (2) a controlled start frame built from the previz pipeline's GPT master scene instead of a generic reference image; (3) the plate itself as the motion reference. Residual caveat from the jib A/B: identity still softens in extreme close-ups — key+composite remains the identity-safe default.