R&D Report

AI Camera Match.

Integrating a real, keyed greenscreen actor into an AI-generated Jefferson Memorial background whose camera move matches the live plate — what was tried, what won, what died, what runs next.

R&D Report v1.0.0

Summary

Traditional VFX (real plates, a NeRF or Gaussian-splat reconstruction, full CG, or an LED volume) is still the best route, and none of this beats it. This is R&D into a faster, lower-cost alternative, and the finding is that the alternative has crossed into potentially usable territory.

What we discovered. The question: can an AI-generated background carry the same camera move as a real greenscreen plate, well enough to drop a keyed performer in? The look is there: fully generated shots can look great. Control is the hard part. You can get a generated background very close to a real plate's move, but an exact match only happens if the subject is regenerated in the same shot too; keep the real keyed performer and the camera lands close, not exact. Control comes from concrete inputs (a previz carrying the tracked camera data and a start frame that blocks out the subject), not from wordier prompts.

Recommendation: two methods, chosen by how exact the camera must be. Method 1 · Previz-Steered — track the plate, generate the BG from it, key and composite the real subject over it (closest to exact). Method 2 · Plate-Direct — drive generation from the plate itself with person-lock (faster, close enough to composite over). Seedance 2.0 is the recommended engine; Gemini Omni is a strong alternate capped at 720p; Kling, Veo, Runway and local Wan/LTX were ruled out — see the table below.

Video Generation Models

Every video generation model tested. Movement is the deciding column: matching the original plate's camera move is the goal (only depth-conditioned local gen is truly exact; Seedance/Omni get very close). Quality and output limits follow. Rows that list 4K under Max output are the candidates for 4K delivery.

Model	Movement	Quality	Max output	Verdict
Seedance 2.0	very close to the plate move (not exact) via previz/plate refs · true parallax	best tested	4K · 15s	RECOMMEND
Gemini Omni	very close to the plate move (not exact) · best physics	strong	720p · 10s	ALTERNATE
Kling 3.0	tightest keyframes	middling — not bad, below Seedance/Omni	4K · 15s	REJECTED — quality
Kling Omni (video-ref)	re-anchors composition	—	1080p	REJECTED
Veo 3.1	keyframes ok	strong	4K · 8s	REJECTED
Runway Gen-4.5	drifts (start-only)	—	4K · 10s	REJECTED
LTX-2 depth IC-LoRA	exact (depth input)	weak detail — ok behind heavy bokeh only	4K · local (VRAM-bound)	GUIDE LAYER ONLY
Wan2.2 Fun Control	exact (depth)	unusable	720p local	DEAD END

Generation is stochastic — budget several rolls. A usable take typically arrives within 2–4 generations of the same recipe (bad camera reads, drifted scenes, or moderation false-positives eat the rest); single-shot success is the exception, not the plan.
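As a rough budgeting aid, the roll count can be treated as a geometric trial. This is a sketch under a loud assumption: each generation is independent with a fixed per-roll chance of a usable take, and the "2 to 4 generations" figure is read as an implied per-roll rate of roughly 0.25 to 0.5. Neither the independence nor those rates were measured in this report, and the function names are illustrative.

```python
import math


def p_at_least_one(p_usable: float, rolls: int) -> float:
    """Probability that at least one of `rolls` independent generations is usable."""
    if not 0.0 < p_usable < 1.0:
        raise ValueError("p_usable must be strictly between 0 and 1")
    return 1.0 - (1.0 - p_usable) ** rolls


def rolls_for_confidence(p_usable: float, confidence: float) -> int:
    """Smallest roll count whose chance of a usable take reaches `confidence`."""
    if not 0.0 < p_usable < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_usable))


# ASSUMED per-roll rates, implied by "a usable take within 2 to 4 generations".
for assumed_rate in (0.5, 0.25):
    print(assumed_rate, rolls_for_confidence(assumed_rate, 0.90))
```

Under these assumptions the budget for a 90% chance of a usable take roughly doubles as the per-roll rate halves, which is the argument for planning rolls up front rather than hoping for a single-shot success.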

Reading the table: cyan = good, violet = usable with caveats, red = ruled out. All comparative tests ran at 720p — the tier every model shares; note Omni pairs its 720p cap with heavy compression (~0.9 MB for a 4s clip).
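For scale, the compression figure quoted above converts to an average bitrate with one line of arithmetic. The sketch assumes MB means 10^6 bytes and ignores any audio track or container overhead; the function name is illustrative.

```python
def average_bitrate_mbps(size_mb: float, duration_s: float) -> float:
    """Average bitrate in megabits per second for a clip of the given size."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return size_mb * 8.0 / duration_s


# The quoted Omni clip: about 0.9 MB over 4 seconds.
omni_mbps = average_bitrate_mbps(0.9, 4.0)
print(f"{omni_mbps:.1f} Mbit/s")
```

That is roughly 1.8 Mbit/s, which is on the low side for 720p footage and worth keeping in mind when judging fine detail in Omni output.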

The Pipeline

Everything converges on two output methods: Method 1 (previz-steered — closest to frame-exact) generates BG plates that meet the real keyed subject in the W1 composite, and Method 2 (plate-direct) generates the subject in-scene from the plate itself. The remaining graphs are the supporting and historical workflows. Arrows show data direction.

Method 1 · Previz-Steered Generation

closest to frame-exact · production

Method 2 · Plate-Direct Generation

fast · composite-workable

Key + Composite — the production path

Selective Person Removal

Test Plates

One performer, three camera behaviors: a locked-off medium, a push-in dolly, and a crane-down jib. Together they cover the camera-matching problem space from trivial to hard.

LOCKED · 151f · 2D track confirms static (worst drift 1.56% ≈ handheld micro-motion)

DOLLY · 143f push-in · planar solve (1 marker) → feature-spread ramp measured 1.82×

JIB · 151f crane high→low · planar; move read from footage, no 3D solve
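The two measurements quoted in the captions above (worst drift as a percentage, and a feature-spread ramp) can be sketched as follows. The report does not define either metric, so both definitions here are assumptions: drift is taken as the largest displacement of a tracked 2D point from its first-frame position, expressed as a percentage of frame width, and spread is taken as the RMS distance of tracked features from their centroid.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def worst_drift_percent(track: List[Point], frame_width: float) -> float:
    """Largest displacement from the first-frame position, as % of frame width (ASSUMED metric)."""
    x0, y0 = track[0]
    worst = max(math.hypot(x - x0, y - y0) for x, y in track)
    return 100.0 * worst / frame_width


def feature_spread(points: List[Point]) -> float:
    """RMS distance of the tracked features from their centroid (ASSUMED metric)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


def spread_ramp(first_frame: List[Point], last_frame: List[Point]) -> float:
    """Ratio of last-frame to first-frame spread; above 1 means the features moved apart."""
    return feature_spread(last_frame) / feature_spread(first_frame)
```

On a push-in, features move apart as the camera approaches, so a ramp above 1 is consistent with a dolly-in, while a drift near zero is consistent with a locked-off shot.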

Keying

CorridorKey on the 5090 (BiRefNet hint → neural key + despill → sharp hybrid composite) outputs the subject on flat gray-148 for clean matte extraction. Naive chroma keying failed outright — the warm dark-olive "green" had r≈g. A distance-ramp matte off the gray, plus interior hole-fill for near-gray dress folds, yields the final true-alpha ProRes 4444.
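The distance-ramp step can be illustrated per pixel. This is a minimal sketch, not the production matte: it assumes Euclidean distance in 8-bit RGB from the gray-148 backing, and the `lo` and `hi` thresholds are hypothetical values chosen for illustration. The interior hole-fill for near-gray dress folds is not shown.

```python
import math
from typing import Tuple

GRAY = (148, 148, 148)  # flat backing value quoted in the report


def ramp_alpha(pixel: Tuple[int, int, int], lo: float = 4.0, hi: float = 24.0) -> float:
    """Alpha from distance to the gray backing: 0 at or below `lo`, 1 at or above `hi`.

    `lo` and `hi` are HYPOTHETICAL thresholds, not values from the report.
    """
    distance = math.dist(pixel, GRAY)
    t = (distance - lo) / (hi - lo)
    return min(1.0, max(0.0, t))


print(ramp_alpha((148, 148, 148)))  # backing pixel, fully transparent
print(ramp_alpha((30, 20, 25)))     # dark subject pixel, fully opaque
```

A ramp like this also shows why hole-fill is needed: any subject pixel that happens to sit near gray-148 falls inside the `lo` band and would punch a hole in the matte without a second pass.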

Locked key · gray-148 output

Dolly key · re-keyed after byte-verified transfer fix (stale-footage bug caught & killed)

Jib key · full crane range; alpha master: 03-keyed/jeff_jib_key_ALPHA.mov (ProRes 4444, 4K, 246 MB — local)

Keyer validation suite — original greenscreen source and keyed result side-by-side in each clip (keys composited on a gradient specifically to expose edge artifacts).

keyed · Full-body dancer · 4K: original above, sharp-hybrid key below (neural matte + high-freq detail recovery)

keyed · Fine blonde strands + dark suit: original above, keyed below; low-contrast hair edges hold

keyed · Flying dark hair, mid-motion: original above, keyed below; strands survive

Camera Match — Seedance Track

model ceiling

The law from four early iterations still holds — Seedance interprets a camera, it never hard-locks one — but the ceiling moved. Feeding the solved-camera previz as a video_references motion track (rather than describing the move or clamping endpoints alone) is what closed the gap; that recipe graduated into W5. Current evidence, all against the MegaSaM-solved cameras:

Dolly move-match quad — previz (top-left) vs three generated candidates: top-right rejected, bottom two good. Expect this spread — a usable take typically needs several generations

Jib move-match quad — previz vs three candidates: top-right rejected (move mismatch), bottom-left good with minor image hallucinations, bottom-right good

closed · Final dolly vs previz: the adopted generation against the solved-camera previz

Engine verdict: Seedance 2.0 and Gemini Omni produced the best generations of everything tested. Omni's physics are the standout out-of-the-box — it matches the reference motion's weight and momentum with no special prompting — but its output is capped at 720p; Seedance runs the full 5s and scales to higher resolutions. Current posture: Seedance primary, Omni the physics benchmark.

Method 1

Previz-Steered Generation

closest to frame-exact · all 3 shots · production default

The first of the two methods: the real camera is tracked and re-authored, so the generated plate comes closest to a frame-exact match with the solve.

The camera is tracked, not described: MegaSaM (DROID-SLAM + monocular depth, on the 5090) solves every frame of each greenscreen plate (position, rotation, and focal) where Blender's tracker failed outright on the flat green cloth. MegaSaM's AI solve was only necessary because these test plates were rough greenscreen with no tracking markers; a properly tracked shoot would solve with a standard tracker. The solves import into one Blender scene as keyframed cameras under registration empties (focal-derived endpoints → look/travel alignment → de-roll), get staged against the art-directed rotunda blockout (with a Gaussian-smoothed twin baked for shaky solves), and render as clay previz: stills for keyframe generation, full-length move videos as the motion reference.

1 · Tracking → Blender. Solved cameras ghosted over their plates — the registration check that earned the "spot on" sign-off before anything was generated:

Jib solve · 151f / 31mm — MegaSaM camera over the plate; crane rise-and-tilt captured frame-for-frame

Dolly solve · 143f / 34.2mm — raw solve carries real handheld jitter; a σ5 Gaussian world-space twin (DollyCamSmooth) is baked alongside

Locked solve · 151f / 27.3mm — "static" still solves to real tripod micro-motion, kept deliberately
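To compare the three solved focal lengths as fields of view, the standard pinhole relation can be applied. The sensor width is an assumption: the report does not state it, so the sketch uses 36 mm, which is Blender's default camera sensor width. If the solves were imported with a different sensor size, the angles change accordingly.

```python
import math


def horizontal_fov_deg(focal_mm: float, sensor_width_mm: float = 36.0) -> float:
    """Horizontal field of view of a pinhole camera; sensor width is an ASSUMED 36 mm."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


# Focal lengths as quoted in the solve captions.
SOLVED_FOCALS_MM = {"jib": 31.0, "dolly": 34.2, "locked": 27.3}
fovs = {shot: horizontal_fov_deg(f) for shot, f in SOLVED_FOCALS_MM.items()}

for shot, fov in fovs.items():
    print(f"{shot}: {fov:.1f} degrees")
```

Under that assumption the locked shot is the widest of the three and the dolly the tightest, which matters when checking that a generated start frame shows a plausible amount of the rotunda.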

2 · Previz move renders. Staged scene, subject marks hidden, one clay move video per shot — these are fed to the video model verbatim as video_references:

Dolly previz · 143f · from the smoothed twin

Jib previz · 151f · raw solve (crane already clean)

Locked previz · 151f · σ5-smoothed twin (LockedCamSmooth)
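The smoothed twins can be pictured as a Gaussian filter run over the solved camera positions. This is a sketch of the idea only, not the Blender bake: it assumes σ5 means a standard deviation of 5 frames, smooths position only (rotation would need separate handling, for example on quaternions), truncates the kernel at 3σ, and clamps at the clip ends. Function names are illustrative.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights, truncated at 3 sigma on each side."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_track(positions: List[Vec3], sigma: float = 5.0) -> List[Vec3]:
    """Gaussian-smooth per-frame camera positions; ends are handled by clamping."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(positions) - 1
    smoothed = []
    for i in range(len(positions)):
        acc = [0.0, 0.0, 0.0]
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # repeat the edge frame past the ends
            for axis in range(3):
                acc[axis] += w * positions[j][axis]
        smoothed.append((acc[0], acc[1], acc[2]))
    return smoothed
```

Frame-to-frame jitter is strongly attenuated at this width while the slow push-in survives, which matches the stated reason for baking a smoothed twin alongside a shaky raw solve.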

3 · Previz stills → generated keyframes. Clay previz frame (left) beside the photoreal keyframe generated from it (right). GPT-Image-2 marries the previz composition with a REAL photograph's materials — the only model tested that swaps the clay placeholder for the real Jefferson statue in one pass. Nano Banana derives every other view from that master frame, keeping all shots in one coherent generated rotunda:

master · Dolly start: previz | GPT-Image-2 (previz = composition, real photo = materials)

Dolly end — previz | Nano Banana ("photographer walked forward" edit of the master)

Jib start — previz | Nano Banana angle-match (master scene forced to the previz camera; hard framing language)

Locked start — previz | GPT-Image-2 (eye-level static framing)

4 · Elements per shot. Exactly what went into each Seedance 2.0 run (Higgsfield, 720p/5s; job IDs from prior generations work directly as media references):

Shot	start_image	end_image	video_references	Result
Dolly	GPT-Image-2 start	Nano Banana end	Dolly previz move (smoothed)	plate adopted (operator's own run of the recipe)
Jib	Nano Banana angle-match v2d	— none —	Jib previz move (raw)	previz alone held the path; end frame unnecessary
Locked	GPT-Image-2 start	— none —	Locked previz move (smoothed)	carries the plate's tripod micro-motion
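The table above can be captured as data so each run is reproducible from a single record. This is a hypothetical structure, not the Higgsfield or Seedance API: only the three field names `start_image`, `end_image` and `video_references` come from the table, the string values are the table's own labels rather than real file paths or job IDs, and the validation rule simply mirrors the pattern seen across the three shots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShotRecipe:
    """One generation recipe; HYPOTHETICAL container, not an API payload."""
    shot: str
    start_image: str
    video_references: List[str]
    end_image: Optional[str] = None  # jib and locked ran without an end frame

    def validate(self) -> None:
        # Mirrors the table: every shot had a start frame and a previz motion reference.
        if not self.start_image:
            raise ValueError(f"{self.shot}: start_image is missing")
        if not self.video_references:
            raise ValueError(f"{self.shot}: no video reference supplied")


RECIPES = [
    ShotRecipe("Dolly", "GPT-Image-2 start", ["Dolly previz move (smoothed)"],
               end_image="Nano Banana end"),
    ShotRecipe("Jib", "Nano Banana angle-match v2d", ["Jib previz move (raw)"]),
    ShotRecipe("Locked", "GPT-Image-2 start", ["Locked previz move (smoothed)"]),
]

for recipe in RECIPES:
    recipe.validate()
```

Keeping `end_image` optional reflects the jib result, where the previz motion reference alone held the path.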

5 · The prompts. Verbatim production prompts, image rail and video rail — the bolded clauses are load-bearing (removing any one reproduced a documented failure).

IMAGE — keyframes

GPT-Image-2 · start frame (two-role reference prompt)

Generate a photorealistic image of the Thomas Jefferson Memorial interior. The FIRST reference image is a gray clay 3D previz frame — it defines the EXACT composition to reproduce: a LOW-ANGLE camera looking slightly up, the bronze statue on its pedestal at frame-right against the coffered dome, columns placed exactly as shown, floor plane low in frame. Match this composition precisely — do not recenter or reframe. The SECOND reference image is a real photograph of the actual Jefferson Memorial interior — use it as the source of truth for everything visual: the real white Georgia marble and its veining, the real engraved inscription text panels with laurel wreaths on the walls, the real dark weathered bronze Jefferson statue (replace the clay placeholder figure with the real statue's actual sculpted form), the real coffered dome detail, the polished floor reflections, and its soft natural daylight. Empty interior, no people.

Nano Banana · angle-match (new camera angle of the master scene — jib start v2d)

IMAGE 1 is a real photograph of the Thomas Jefferson Memorial interior — the MASTER SCENE: this exact bronze statue, white marble, inscription panels, soft even neutral daylight. IMAGE 2 is a gray clay 3D previz frame that defines the CAMERA for a second photo of the exact same scene — and its framing is COMPLETELY DIFFERENT from image 1: the camera is down near the FLOOR, tilted strongly UPWARD. Reproduce IMAGE 2's framing exactly: the coffered dome ceiling fills the entire TOP HALF of the frame, the statue on its pedestal stands at the RIGHT THIRD of frame seen from below against the dome, columns lean inward with strong upward perspective convergence, and only a small strip of floor shows at the very bottom. Do NOT reuse image 1's eye-level framing. Every material and lighting property still comes from IMAGE 1 unchanged: same neutral white-balanced daylight (not warm, not golden, not moody), same marble, same dark weathered bronze, same grade — two photos minutes apart in the same session. Empty interior, no people. Photorealistic.

VIDEO — generation

Seedance 2.0 · moving shot (jib — same skeleton for the dolly with "mechanical dolly on rails")

Slow cinematic crane shot inside a neoclassical marble rotunda. The image reference is the exact opening frame: a low camera position looking up, coffered dome filling the top of frame, memorial sculpture at the right third. The video reference is a gray 3D architectural previsualization showing the exact camera path to follow: a smooth motorized crane move — the camera starts low and rises steadily while tilting down, moving closer, settling on a tighter framing near the sculpture's stone base. Constant speed, no handheld sway, no walking rhythm, no speed ramps. Follow the previsualization's framing trajectory exactly. Keep the marble architecture, engraved wall text and soft neutral daylight from the opening frame consistent for the whole shot. Empty interior, documentary style.

Seedance 2.0 · locked shot (previz still supplies the micro-motion)

A static tripod shot inside a neoclassical marble rotunda. The image reference is the exact opening frame — hold this exact composition for the whole shot. The video reference is a gray 3D architectural previsualization of the EXACT camera behavior to reproduce: a locked-off tripod camera with only the faintest natural micro-movement — follow it exactly. NO push, NO drift, NO pan, NO tilt, NO zoom, NO handheld sway beyond what the previsualization shows. The interior is empty and still; soft neutral daylight through the colonnade. Keep the marble architecture, engraved wall text, sculpture and lighting from the opening frame perfectly consistent from first frame to last. Documentary style, photorealistic.

6 · Generated plates + previz verification. Every generation ships with a 50% previz-ghost overlay — the acceptance test that the model rode the solved camera:

adopted · Dolly BG plate · Seedance, full 3-reference recipe

adopted · Jib BG plate · start + previz motion only

adopted · Locked BG plate · smoothed-previz micro-motion

Dolly overlay — previz ghost tracks the push-in

Jib overlay — clay pedestal glued to the bronze through the rise

Locked overlay — first/last-frame diff 2.8/255: genuinely locked
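Both checks in this step reduce to simple per-pixel arithmetic. The sketch below works on flat lists of 8-bit values for one frame. It assumes the ghost is a plain 50/50 linear blend and that the "2.8/255" figure is a mean absolute difference between the first and last frame; the report states neither explicitly, and the function names are illustrative.

```python
from typing import List, Sequence


def ghost_overlay(generated: Sequence[float], previz: Sequence[float],
                  opacity: float = 0.5) -> List[float]:
    """Blend the previz over the generated frame at the given opacity (0.5 = 50% ghost)."""
    if len(generated) != len(previz):
        raise ValueError("frames must have the same number of samples")
    return [(1.0 - opacity) * g + opacity * p for g, p in zip(generated, previz)]


def mean_abs_diff(first: Sequence[float], last: Sequence[float]) -> float:
    """Mean absolute per-sample difference between two frames, on the 0-255 scale (ASSUMED metric)."""
    if len(first) != len(last):
        raise ValueError("frames must have the same number of samples")
    return sum(abs(a - b) for a, b in zip(first, last)) / len(first)


print(ghost_overlay([0, 255], [255, 255]))
print(mean_abs_diff([10, 20, 30], [12, 20, 27]))
```

A low first/last difference only shows that the endpoints agree; the ghost overlay is still needed to confirm the frames in between followed the previz.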

7 · Composite. Method 1's endpoint — the real keyed subject over each previz-steered plate. All three assembled; the one visible seam is grade, not key (she carries the warm greenscreen light against the plate's cooler daylight — a curves pass closes it).

Dolly composite — keyed subject over the dolly plate

Jib composite — true-alpha ProRes key over the jib plate

Locked composite — keyed subject over the locked plate
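The composite itself is the standard "over" operation. This minimal per-pixel sketch assumes straight (non-premultiplied) alpha; whether the ProRes 4444 masters are straight or premultiplied is not stated in the report, and a premultiplied source would drop the multiplication on the foreground term.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def over(fg: RGB, alpha: float, bg: RGB) -> RGB:
    """Straight-alpha 'over': foreground weighted by alpha, background by the remainder."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    r, g, b = (alpha * f + (1.0 - alpha) * k for f, k in zip(fg, bg))
    return (r, g, b)


subject = (200.0, 100.0, 50.0)   # keyed subject pixel
plate = (0.0, 0.0, 0.0)          # generated background pixel
print(over(subject, 0.5, plate))  # a half-covered edge pixel
```

Because the grade seam described above is a colour difference rather than a matte problem, it would be corrected on the foreground before this step, not by changing the alpha.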

Method 2

Plate-Direct Generation

fast · movement-close · composite-workable

The second of the two methods: no solve, no Blender; the original plate itself is the reference. A Nano Banana start frame places her in the master scene at the plate's framing, then the plate drives motion and performance, person-locked. The camera match is close but interpretive, and close enough that compositing over it works. (This began as a dead end: the early "replace the background" test repainted a stranger. That verdict was prompt-shaped and reference-shaped, not architectural.)

The reversed recipe, run on all three shots — each with its own Nano Banana start frame (plate frame 1 + GPT master) and its own original plate as motion/performance reference:

Dolly: real push-in, she's preserved, scene is ours

Jib: real crane rise from the low-angle Nano Banana start

Locked: tripod micro-motion from the plate

verification · Jib direct-gen vs the original plate (50% ghost): she rides her own plate silhouette through the crane move. Caveat: this track is NOT an exact camera match to the original plate; the model reproduces the move closely but interpretively. In practice the match is close enough that compositing the keyed subject over it works

hybrid · Dolly · previz camera reference: Method 1's previz move added as a second video reference alongside the plate. The previz supplies the 3D parallax the flat green plate can't; her performance still comes from the plate

showcase · Gemini Omni person-locked (jib): the person-lock reversal on the second engine. She rides the real crane move in the referenced rotunda, identity held, Omni's natural-physics motion

showcase · Gemini Omni cine-gen (jib): person-locked with the Nano-Banana-relit cinematic reference. The lighting mood transfers into motion while she and the crane move stay true (the "models copy LIGHT, not GRADE" law in action)

useful half · Empty-BG variant (jib): the same video-ref mechanism with no person requested, giving a ready-made BG plate for the key+composite path

track A/B · The two methods, head-to-head (jib). A: person-locked direct generation. B: the removal loop, in which the direct-gen video is regenerated empty (generation-as-removal, the 720p Obscura stand-in) and the real keyed subject is then composited over it. Both ride the same generated camera; B keeps her true face