Skip to content

wanshuiyin/ARIS-Movie-Director

Repository files navigation

ARIS — Auto Research in Sleep

ARIS-Movie-Director

🌐 中文 README · English

Hand a fuzzy story to your agent, wake up to a cross-model-audited movie 🎬 — no forgotten facts, no frame signing off on itself.
🎞️ Image-based today, video next.
🤖 Agentic by design — planning · gating · cross-model review on the agent; rendering on the diffusion model.

📚 Jump to▶ Watch the movie · ⚡ Quick Start · 🔄 Workflows · 📝 Make your own · 🧩 Layout · 💬 Community · 📖 Cite · 🤝 Contributing

Join Community · Cite · CI · ARIS Stars · arXiv · HF Daily #1 · PaperWeekly · awesome-agent-skills

This is an agentic, long-horizon visual generation task: hand a fuzzy story to an agent and produce a whole image-based movie (the reference run is a 19-scene / 24-frame story), not a single image. The concrete job is fuzzy story → authored comic.json → audited panels → single-file viewer.

The hard part is faithfulness over time. Generated visual stories can look coherent while quietly changing the facts — a chart rounds a number, a label mutates, a character's face drifts — and the run still ships, because the same system that drew the frame is the one saying it looks fine. Across a long horizon, two failure modes dominate:

  • 🧠 Long-range forgetting — over many frames, identity, established facts, and earlier decisions drift.
  • 🗣️ Linear, self-approved streaming — each frame is committed by the model that drew it, so mistakes compound unchecked.

🔬 Method at a glance. Read the figure left-to-right — the /movie-pipeline agent workflow runs the full loop (author a source of truth → bake → cross-model gate): (1) comic-author turns fuzzy intent into an authored comic.json + locked refs, (2) comic-director runs the per-panel audited spiral — the multi-agent debate (CC ‖ Gemini ‖ Codex blind-read → deterministic diff), logging every attempt / decision to the research-wiki(3) the pipeline assembles accepted panels into the released viewer. The bottom-left failure is the whole rule: a beautiful but wrong literal still fails (+6.2 expected vs +6.25 observed).

ARIS-Movie-Director — method overview

Figure 1 — from story intent to a verified movie, end-to-end (the loop described above). Full caption ↓

Figure 1 — stages, gates, and provenance (click to expand)

(1) Authored source of truth — asset library · outline · storyboard compile into comic.json (content_svg · expected_literals · identity_ref). (2) The audited spiral (per panel) — a content-SVG blueprint is baked by image_gen, then a 3-reviewer cross-model panel_gate (CC narrative ‖ Gemini + Codex visual blind-transcribe → a deterministic token-diff · single-vote veto) returns a deterministic verdict: KEEP, or RETRY (≤4/panel) re-baked with the failed attempt's repair note; every attempt/review/decision/failure is logged to the research-wiki. (3) Assembly + release — a cast-aware page_assembly_gate (repair drift → re-bake, ≤6/run) ships PNG panels + a single-file HTML viewer. The punchline (bottom-left): a beautiful panel with a wrong number does not pass+6.2 expected vs +6.25 observed fails the token-diff.

This figure was itself produced by the same loop it depicts: a labeled blueprint conditioned gpt-image-2 (driven by Codex GPT-5.5 xhigh), then 4 generation rounds were ratified by the method-figure panel — Gemini + Codex blind-transcribe + a deterministic content_diff, then a Claude structural sign-off — until clean. The exact prompt sequence that baked this image (all 4 rounds + the cross-model critiques) is published verbatim as a reference: skills/method-figure/examples/method_figure/PROMPTS.md.

ARIS-Movie-Director treats every frame as an auditable artifact: author a deterministic comic.json first (lock the expected_literals + identity refs before any pixels), let a generative model bake the look, then require independent cross-model blind-transcription + a deterministic token-diff before a panel is kept. Looks right ≠ passes — a beautiful frame with a wrong literal is rejected. It answers the two failure modes with two ideas from the ARIS series:

  • 🧠→ a research-wiki — persistent, inspectable memory (locked refs · expected_literals · every decision & failure as a node) that anchors late frames to early truth.
  • 🗣️→ multi-agent debate — independent cross-model reviewers blind-read every frame and a deterministic diff decides KEEP / RETRY, so no frame signs off on itself. Every attempt / decision lands in that wiki trace.

This first release is image-based — the movie is told in baked still frames you flip through. Video-based generation is what comes next; this is just the beginning.

Watch the image-based movie in your browser — flip through all 19 scenes of the cross-model-audited reference run.

Watch the movie — cover

audit pagemulti-panel pagethe fix beat

A few frames from the reference movie — including the story's own integrity beat: a run that reported +6.2 improvement but really moved +1.4 (that's the plot, distinct from the figure's bake-time +6.2/+6.25 token-diff). Watch all 19 scenes →

⚡ What the audit gate actually catches — the problem → mechanism table
The problem What ARIS-Movie-Director does about it
A panel can look right while changing a number, label, or code token. comic.json locks the expected_literals; independent visual reviewers blind-transcribe what's actually in the pixels; an exact token-diff rejects any wrong or missing literal.
The model that baked an image can wave its own output through. The bake never self-attests — independent visual models (a different family from the generator) read it blind, and a deterministic diff, not a model's opinion, decides KEEP / RETRY.
A frame can be baked with nothing to check it against. Phase 1 authors content_svg · identity_ref · ART_BIBLE · expected_literals before pixels; a baked figure-panel with no gateable literals fails closed.
Character & style drift accumulate across a long sequence. Locked identity refs + asset review (准×3) + a style bible + a cast-aware assembly gate check consistency while allowing intended scene/cast variation (absence ≠ drift).
Retry loops go opaque or endless. Per-panel attempts and assembly repairs are bounded; each failure carries a repair note; a non-convergent panel is flagged for a human, never silently shipped.
A demo hides what was tried and thrown away. Every attempt / review / decision / failure-mode is written to the research-wiki — the reference run ships a 198-node trace you can read.

⚡ Quick Start

# 1 · get the repo + Python deps  (everything but jsonschema is stdlib)
git clone https://github.com/wanshuiyin/ARIS-Movie-Director.git && cd ARIS-Movie-Director
python3 -m pip install -r requirements.txt

# 2 · external tools for the bake/review stages — install + authenticate, then verify:
#     codex CLI  ·  gemini CLI (uses auto-gemini-3)  ·  headless Chrome / Chromium
python3 cli/preflight.py

There is no bundled installer — the skills/ are followed by your coding agent pointed at this repo; the deterministic CLIs run in-repo.

Run the two workflows — in a coding agent that has these skills (/… is a slash-command agent workflow, not a shell binary):

> /movie-pipeline "a short film about an autonomous research run"   # Workflow 1 — fuzzy idea → audited movie + viewer (pauses at intent + outline)
> /method-figure  path/to/method_figure_brief.json                 # Workflow 2 — a brief → audited Figure-1 (render + verify → Claude sign-off)

See it first — zero setup, no API: ▶ open the hosted movie at https://wanshuiyin.github.io/ARIS-Movie-Director/comic/ (all 19 scenes / 24 frames). To rebuild it locally, see Workflow 1 → See the reference movie below.

Full mapfuzzy story → /movie-pipeline → comic.json + audited panels + outputs/index.html · method_figure_brief.json → /method-figure → figure.png + blueprint + trace


🔄 Workflows

Two cross-model-audited workflows. Each is one slash-command agent workflow (the ARIS /research-pipeline paradigm — an agent runs it, pausing at the human gates), with a deterministic CLI core you can also run standalone (the part CI tests).

🎬 Workflow 1 · Movie pipeline — /movie-pipeline (fuzzy idea → audited movie)

Hand your agent a fuzzy story, approve two story gates, wake up to a baked, cross-model-audited movie + a clickable viewer:

> /movie-pipeline "a short film about an autonomous research run"

It chains comic-author (Phase 1 — author the source of truth) → the zero-credit p0_proof gate → comic-director (Phase 2/3 — the audited spiral → viewer); the orchestrator is movie-pipeline. It's an agent workflow, not a shell binary — it needs a coding-agent runtime and pauses at intent + outline for your approval. (A step that doesn't fire is safe — each layer consumes the prior LOCKED node, so a skipped step fails closed at the next gate, never shipping a wrong frame.)

  • 🧭 Intent — fuzzy idea → intent_spec · stop for your approval
  • 🎨 Style lock — the ART_BIBLE.md + locked style_anchors (warm-lab / dark-cyber / starfield)
  • 🧱 Outline — 3-lens debate → synthesis → outline_spec · stop for your approval
  • 🎞️ Storyboard — pages · panels · the MOTIF STATE TABLE · consolidated asset_requests
  • 🧑‍🎨 Assets — a single-source library, reviewed to locked (准×3 same-round unanimity)
  • 📐 Blueprints — a deterministic content_svg per panel (no baked bubbles)
  • 🧾 Prompts — exact bake prompts + verbatim expected_literals (搬运工原則)
  • Compile — schema-valid comic.json; the zero-credit p0_proof gate runs BEFORE any image credit
  • 🔥 Spiral bake — render → agent mcp__codex__codex image_gen → 3-reviewer panel_gate → keep / retry-same-panel → assembly_gate (re-bakes only NAMED drifting panels on cross-page drift) → viewer

📐 Flow — the skill chain (trace it top-to-bottom):

/movie-pipeline "fuzzy idea"            one slash-command · agent-run, NOT a shell binary
   │
   ▼   comic-author drives these Phase-1 skills IN ORDER (they are not separate slash-commands):
   comic-intent-parser
     → ⟨HUMAN APPROVE — intent⟩
     → comic-style-bible-lock
     → comic-outline-creator
     → ⟨HUMAN APPROVE — outline⟩
     → comic-storyboard-creator
     → comic-asset-ref-generator → comic-asset-review-loop          (准×3 unanimity → assets LOCKED)
     → comic-blueprint-author → comic-panel-prompt-builder → comic-json-compiler
   ├──────────────── Phase 1 · comic-author — author the source of truth ────────────────┤
   │
   │   → comic.json + locked assets + the author wiki trace
   ▼
   comic-cross-layer-gate --gate p0_proof     ├─ P0 · ZERO-CREDIT proof — must pass before any image credit ─┤
   │
   ▼   comic-director — the audited spiral     (run_comic.py  |  packages/core/spiral_engine.js)
   per panel:  content_svg → codex image_gen → panel_gate
                  reviewers: CC narrative ‖ Gemini visual ‖ Codex visual
                  → blind transcriptions → deterministic token-diff vs expected_literals
               verdict ─ KEEP → page pool
                       ├ RETRY ≤4   (re-bake the SAME panel + repair note)
                       └ assembly_gate repair ≤6 — re-bake ONLY the NAMED drifting panels when drift is localized (seed-anchored; never wipes prior good panels)
   page assembly_gate → project accepted panels to comic.json → build_comic.py → outputs/index.html
   ├──────────────── Phase 2/3 · comic-director — audited spiral + viewer ────────────────┤

   📚 research-wiki — reads the locked source nodes before each layer; writes every attempt / review /
                      decision / failure after each gate & bake (the inspectable audit trace).
   🔄 human-in-loop — intent + outline are HARD stops; a failed p0 / panel / assembly gate stops or
                      escalates; a non-convergent panel is flagged for you, never silently shipped.

The panel_gate (Phase 2/3, per panel): the bake is read by 3 independent reviewers — CC narrative (does it land the beat?) ‖ Gemini visual ‖ Codex visual (a second, different-family eye) — who blind-transcribe the pixels; a deterministic token-diff of observed_literals vs the authored expected_literals decides KEEP / RETRY; content_corruption is a single-vote veto, both visual reviewers must score, and no model self-acquits. Every attempt / review×3 / decision / failure is written to the research-wiki.

Phase 2/3 standalone (once comic.json exists). Only the zero-credit --dry-run below is genuinely agent-free — that's the CI-tested slice:

python3 skills/comic-director/scripts/run_comic.py --project examples/<name> --page <PAGE> --panels S01,S02 --dry-run    # zero credit, NO agent: prints bake prompts (the CI-tested slice)
python3 skills/comic-director/scripts/run_comic.py --project examples/<name> --page <PAGE> --panels S01,S02 --finalize   # REAL bake — REQUIRES the agent sidecar SOP (see below)

run_comic.py is a subprocess port of packages/core/spiral_engine.js's movie branch; it starts from an existing comic.jsonit cannot start from a fuzzy idea. The deterministic CLI core (validation, blueprint render, the token-diff gates) is what CI exercises, but the --finalize bake is NOT agent-free: each panel bake is fulfilled by the agent-sidecar SOP — the comic-director skill agent watches for the core's *.bakereq.json, calls mcp__codex__codex (with config: {include_image_gen_tool: true, model_reasoning_effort: xhigh}) to fire the native image tool, and writes back *.bakestatus.json. So a real movie bake needs a coding-agent runtime; only --dry-run runs standalone. Throttling: a rate-limited bake stops cleanly with fresh_run_required — after cooldown launch a fresh run for the remaining panels, do not resume cached state. Caps: ≤4 attempts/panel · ≤6 rollbacks/run · no concurrent bakes (docs/spiral-runtime.md).

Authoring template: copy the Phase-1 author-node shapes from examples/comic_min_author/ when adding a new project. The flow diagram above is the canonical skill chain — the two human gates and the fail-closed gates are shown there; prereqs (codex + gemini + headless Chrome) are in Quick Start.

▶️ See the reference movie — what Workflow 1 produces (zero setup, no API)

python3 cli/validate_wiki.py examples/comic_m3_audit             # verify the shipped trace → PASS (198 nodes, 26 edges)
python3 packages/viewer/build_comic.py examples/comic_m3_audit  # (re)build the single-file viewer from comic.json + panels
open  examples/comic_m3_audit/outputs/index.html

…or just open the hosted one: https://wanshuiyin.github.io/ARIS-Movie-Director/comic/ — all 19 scenes / 24 frames of the cross-model-audited reference run.

🖼️ Workflow 2 · Method figure — /method-figure (a brief → Figure-1)

One slash-command, /method-figure — give it a method_figure_brief.json (the same brief paper-plan emits after its claims_matrix) and it runs the whole audited spiral to a signed-off figure (Step-0 compile → render condition → gpt-image-2 bake → Gemini + Codex blind panel + content_diff → retry until clean → Claude structural sign-off):

> /method-figure path/to/method_figure_brief.json

The deterministic core — Step-0 compile, blueprint validation, the condition render, and the content-diff gate — is driven by one command, run_spiral.py. Its zero-credit slices (--p0-only / --dry-run) run with no agent runtime; a real gpt-image-2 bake, however, REQUIRES the agent sidecar SOP (the method-figure skill agent services the core's *.bakereq.json by calling mcp__codex__codex with config: {include_image_gen_tool: true, model_reasoning_effort: xhigh}):

# OUR example brief bakes ARIS's own Figure 1 — swap in your own method_figure_brief.json
python3 skills/method-figure/scripts/run_spiral.py \
    skills/method-figure/examples/method_figure/method_figure_brief.json \
    --out-dir figures/method_figure/demo
#  → figures/method_figure/demo/figure.png   (the PANEL-CLEAN candidate, awaiting your structural sign-off)
#     (+ blueprint.json + traceability.json + trace.jsonl of every round)
#  the REAL bake above needs the agent sidecar (mcp__codex__codex servicing .bakereq.json) — NOT agent-free.
#  first run? add --p0-only — that slice IS agent-free (zero image credits: validate + compile + render + lint).
#  NOTE: --dry-run (also agent-free) still writes these files.
📐 flow — /method-figure (brief → audited render → sign-off)
(upstream — only if you have a paper; NOT part of /method-figure):
   paper_to_brief.md  →  method_figure_brief.json      agent-authored; claims / numbers copied VERBATIM
        │
        ▼
/method-figure  <method_figure_brief.json | blueprint.json>        the skill = pure render + verify
   run_spiral.py — the deterministic core (starts from a brief):
     compile_brief.py (Step-0) → blueprint.json + traceability.json     every node traces to a brief field — else FAIL-CLOSED
       → validate_blueprint.py → render_condition.py                    labeled condition SVG → PNG
       → agent mcp__codex__codex (workspace-write, config{xhigh}) → gpt-image-2 bake    fail-closed verifier, not a sandbox setting
       → Gemini blind-transcribe ‖ Codex blind-transcribe → content_diff.py     deterministic vetoes
       → RETRY ≤4 (re-assert the locked labels) → Claude STRUCTURAL sign-off
   → figure.png   (+ blueprint.json + traceability.json + trace.jsonl of every round)
   ├──────────────── method-figure · audited render / verify spiral ────────────────┤

Step-0 is deterministic inside the skill — you never hand-write a blueprint or place coordinates; the identity sheet resolves from the brief's identity_refs[].path (no separate --identity); every node is traceability-checked back to a brief field (un-traceable → fail-closed). No model self-acquits — the bake is ratified by Gemini + Codex blind reads + a deterministic content_diff, then a separate Claude structural sign-off. Needs the codex + gemini CLIs + headless Chrome (python3 cli/preflight.py).

Only have a paper, no brief yet? Author the brief FIRST — point your coding agent (e.g. the ARIS main project) at your paper via paper_to_brief.md (claims/numbers verbatim) — then /method-figure it. Power-user: already have a hand-tuned blueprint? run_spiral.py blueprint.json --identity sheet.png --out-dir … --from-blueprint runs the legacy path. The worked 4-round convergence (the exact prompts) is in PROMPTS.md.

🛡️ Why the two gates differ (both correct, by design)

The movie panel_gate is a 3-reviewer panel (CC narrative ‖ Gemini visual ‖ Codex visual) — a story panel must land its beat AND be visually on-model. The method-figure panel is Gemini + Codex blind-transcribe + the deterministic content_diff, with Claude the post-pass structural sign-off — a figure has no "beat", so the bar is exact labels + clean layout, not narrative. Both obey the one rule:

The gate is the point: a beautiful panel/figure with a wrong number does not pass. Faithfulness is a token-diff over reviewers who are never shown the expected values.

🧭 Positioning — ARIS lineage; an audit layer, not a generation/orchestration claim

ARIS-Movie-Director is the multimodal vertical of the ARIS series: ARIS taught a research agent to work while you sleep, and its research-wiki + multi-agent-debate ideas are exactly what make a generated story auditable — the same loop (author a source of truth → let a model bake the look → ratify with a cross-model panel), now producing character-consistent visual stories instead of papers.

The framework knows nothing about any particular story — a project plugs in via examples/<name>/movie.project.json + a comic.json IR. The reference example builds a 19-scene / 24-frame pixel-art movie about an autonomous research run, and ships its real, inspectable generation trace as proof the multi-agent loop actually ran.

Related work — where we fit. The field already generates and orchestrates visual stories well, and we build on that, not against it: JoyAI-Echo (JD) is strong at long-form text→video with synced audio and memory-based character consistency; FireRed-OpenStoryline (Xiaohongshu) is a strong conversational video-editing agent — NL planning, tool orchestration, human-in-the-loop, reusable style skills; NEWTON shows planner-plus-verifier loops for physically-grounded video. ARIS-Movie-Director is complementary — it doesn't claim better generation or broader orchestration; it adds the audit layer around a generated visual story: blind cross-model reads → deterministic literal-diffs → bounded retry → an inspectable trace.


🧩 Layout

  • packages/ — the framework runtime (core spiral, gates, conditioning, viewer)
  • protocols/ — the cross-model review / governance contracts (framework-owned)
  • schemas/ — versioned IR + wiki schemas (comic.schema.json, node_schema.json, edge_schema.json)
  • cli/validate_wiki.py (the stdlib release gate for a project's wiki)
  • docs/comic-json.md (the authored-input spec), architecture.md (SSOT), spiral-runtime.md, GENERATION_RETRO.md
  • examples/comic_m3_audit/ — the reference movie: comic.json IR, gen/ blueprint scripts, panels/ baked art, wiki/ the 198-node generation trace, outputs/ the built viewer

💬 Community

New workflows, gates, and example projects welcome. ARIS-Movie-Director is part of the ARIS series — PRs that add an audited workflow, a gate with teeth, an example project, or a domain adaptation are welcome (start at CONTRIBUTING.md).

Join the WeChat group (shared with the ARIS main project) for discussion on Claude Code + AI-driven multimodal generation:

WeChat Group QR Code

📖 Citation

ARIS-Movie-Director is the multimodal vertical of the ARIS series. If you use it in your research, please cite the ARIS paper:

@article{yang2026aris,
  title={ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration},
  author={Yang, Ruofeng and Li, Yongcan and Li, Shuai},
  journal={arXiv preprint arXiv:2605.03042},
  year={2026}
}

📄 License

MIT — see LICENSE.

The example artwork is AI-generated — this is a disclosure, not a second license; reuse should still follow your image-model provider's terms.

About

Agentic, long-horizon visual generation: a fuzzy story → a cross-model-audited image-based movie. Brings ARIS's research-wiki + multi-agent debate to multimodal generation (intelligence lives in the agent; the diffusion model just renders). Image-based today, video next.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors