How Pitchstage works

A step-by-step guide · For the machine-readable API spec see the headless API reference.

Pitchstage turns your product into a launch kit. Point it at your app — paste a URL or drop a handful of screenshots — add one paragraph about what you shipped, and an AI director studies it, writes an editable outline in your voice, binds camera moves to every line, and renders a narrated demo video, a LinkedIn carousel, and ready-to-post launch copy. Use it in the app (no setup) or drive it from an agent, a terminal, or CI.

1. The mental model

project  →  outline  →  scenes  →  artifacts

Project — your launch: the product input (URL or screenshots) + a goal.
Outline — the director’s draft script: a hook, beats, and a call-to-action. Fully editable — it’s the single source the artifacts render from.
Scenes — each screenshot becomes a scene with narration + a camera move.
Artifacts — what ships: a demo video (16:9 / 9:16 / 1:1), a carousel, and copy.

Everything downstream comes from the outline, so editing the outline (or a scene) re-shapes the video and carousel together.
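The chain above can be pictured as nested data. The sketch below is illustrative only: the class names, field names, and the `script_lines` helper are assumptions chosen to mirror the prose, not Pitchstage's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    screenshot: str                 # one screenshot becomes one scene
    narration: str
    camera_move: str = "kenburns"   # assumed default; see the motion table


@dataclass
class Outline:
    hook: str
    beats: list[str]
    call_to_action: str


@dataclass
class Project:
    product_input: str              # a URL or a screenshot folder
    goal: str
    outline: Outline
    scenes: list[Scene] = field(default_factory=list)


def script_lines(project: Project) -> list[str]:
    """Flatten the outline into the ordered lines the artifacts draw on."""
    o = project.outline
    return [o.hook, *o.beats, o.call_to_action]
```

Because every artifact would read from `script_lines`, editing one beat changes the video and the carousel in the same pass.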

2. Make your first launch (in the app)

No install, no API key. Roughly two minutes of input, then a render.

New project → pick a launch type. Feature drop, major release, funding milestone, Product Hunt launch, changelog, or custom — this sets the narrative shape.
Add your product. Paste your app URL and the director drives it like a person, capturing each screen, or drop 3–20 screenshots (PNG/JPG/HEIC, ≤10 MB each). Behind a login? See “My app is behind a login” in the Q&A below.
Describe what you shipped — one paragraph. This steers the outline; you can edit or re-draft it.
Review the outline. The director drafts the hook, the beats (one per feature), and the CTA. Edit any line — this is your script.
Brand + voice (optional). A brand kit (colors, font) and a voice profile (paste a few of your posts; it learns your tone) make the output look and sound like you. There’s always a sensible default, so you can skip this.
Pick your artifacts and per-artifact options (aspect ratio, captions, music, hook and CTA cards, smart-zoom, transitions).
Refine scenes (optional). Per scene you can tune narration, duration, importance, the camera move, an on-screen pull-quote, and the layout.
Render. Watch progress live (~30–90s for a real flow); download the video, carousel, and copy when it’s done.

3. What the director can do

The director isn’t a template filler — it makes editorial decisions and reconciles incoherent input (and tells you what it changed). Its toolkit:

Narrative templates

feature_drop · major_release · funding_milestone · product_hunt_launch · changelog · custom — each sets a different pace, opening, and CTA.

Adaptive openings

The director picks how to open for your audience: end_state_first (lead with the outcome), cold_open_problem (lead with the pain), straight_to_product (jump in — good for developer audiences), or brief_intro.

Motion (per scene)

Motion	When the director uses it
`kenburns`	gentle eased zoom — the default for a calm screen
`punch_in`	hard cut to tighter framing — clicks / state changes
`spotlight`	pushes into an AI-detected focus region
`cursor`	follows a cursor movement through the UI
`callouts`	annotations / highlights on the screen
`slow_pan`	eased pan across a wide screen or long list
`static_hold`	no motion — for text-heavy screens
`none`	a plain still

Also in the toolkit

Hero emphasis — your one differentiating feature gets extra screen-time, words, and motion.
Smart zoom — pushes into the key element as the narration reaches it.
Transitions — hard cuts or crossfades.
Kinetic text — a ≤6-word pull-quote on screen, separate from the spoken narration.
Portrait layout (9:16) — a phone-frame split or a blurred backdrop.
Music — none, a library track, or your own upload, auto-ducked under narration.
Captions — karaoke (word-by-word), plain, sparse, or off.
Your voice + brand — narration and copy in your tone; your colors and font on every frame.

4. What you can configure (the plan)

Every render is described by one resolved plan. It’s tiered: stick to intent for the 80% case; reach into director and scenes for fine control. The app edits these for you; agents send them as JSON.

Tier 1 — intent (the 80%)

Field	Meaning
`artifactType`	demo_video_landscape · linkedin_carousel · launch_copy
`goal`	what this launch should achieve (free text)
`audience`	founders · developers · enterprise · consumer · existing_users · waitlist
`tone`	warm · direct · playful · technical · urgent
`aspect`	16:9 · 9:16 · 1:1
`targetLengthS`	desired length in seconds (10–600, or auto)
`captionStyle`	karaoke · plain · sparse · off
`brandKitId` / `voiceProfileId`	use a saved brand kit / voice
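An agent can catch typos in Tier 1 values before spending a request. The following is a minimal client-side sketch: the allowed values are copied from the table above, while the function name `check_intent`, the assumption that every field is optional, and the assumption that `auto` is sent as the literal string `"auto"` are not confirmed by the source.

```python
ALLOWED = {
    "artifactType": {"demo_video_landscape", "linkedin_carousel", "launch_copy"},
    "audience": {"founders", "developers", "enterprise", "consumer",
                 "existing_users", "waitlist"},
    "tone": {"warm", "direct", "playful", "technical", "urgent"},
    "aspect": {"16:9", "9:16", "1:1"},
    "captionStyle": {"karaoke", "plain", "sparse", "off"},
}


def check_intent(intent: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in intent and intent[key] not in allowed:
            problems.append(f"{key}: {intent[key]!r} is not one of {sorted(allowed)}")

    length = intent.get("targetLengthS", "auto")
    if length != "auto":
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(length, (int, float)) and not isinstance(length, bool)
        if not is_number or not 10 <= length <= 600:
            problems.append("targetLengthS: expected a number from 10 to 600, or 'auto'")
    return problems
```

For example, `check_intent({"tone": "sarcastic", "targetLengthS": 5})` reports two problems, one per field.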

Tier 2 — director (stable)

Field	Meaning
`outline`	hook, problem, solution, beats[], callToAction, close
`hook` / `cta`	enable + text for the opening/closing cards
`music`	source (none/library/upload), track, gain, ducking
`smartZoom`	push into the key element as narration reaches it
`transitions`	none · crossfade
`portraitLayout`	split · blur (9:16 only)
`narrativeTemplate`	one of the six launch types
`openingStyle`	the adaptive opening (director-decided)
`heroBeatIdx`	which beat gets the hero treatment
`brand` / `voice`	resolved colors/font + voice attributes & style rules

Tier 3 — scenes (advanced, no stability promise)

Per-scene overrides: durationS, narrationText, emotion, importance, kineticText, emphasisPhrase, motionPlan, layout (full_bleed / device_frame / split / collage / slide), and more.
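To make the three tiers concrete, here is a sketch of a plan an agent might assemble before serialising it to JSON. The field names come from the tables above; the grouping under `intent`, `director`, and `scenes` keys, the value shapes (for example `hook` as an object with `enabled` and `text`), and every sample value are assumptions for illustration. The headless API reference holds the real contract.

```python
import json

plan = {
    # Tier 1: intent (the 80% case)
    "intent": {
        "artifactType": "demo_video_landscape",
        "goal": "30s launch teaser for developers",
        "audience": "developers",
        "tone": "direct",
        "aspect": "16:9",
        "targetLengthS": 30,
        "captionStyle": "karaoke",
    },
    # Tier 2: director (value shapes are assumed)
    "director": {
        "narrativeTemplate": "feature_drop",
        "hook": {"enabled": True, "text": "Inbox zero, automatically"},
        "music": {"source": "none"},
        "smartZoom": True,
        "transitions": "crossfade",
        "heroBeatIdx": 0,
    },
    # Tier 3: per-scene overrides (advanced, no stability promise)
    "scenes": [
        {"durationS": 4, "kineticText": "Triage in one click", "layout": "full_bleed"},
    ],
}

payload = json.dumps(plan, indent=2)
print(payload)
```

Leaving out the `director` and `scenes` sections would correspond to the intent-only case the text recommends for most launches.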

The director reconciles — it never silently obeys

Send something incoherent and it fixes it, returning a warnings[] note so you know:

Warning	What it means
`TARGET_LENGTH_CLAMPED`	your length was raised to the floor for the scene count
`LAYOUT_IGNORED_LANDSCAPE`	a portrait layout was ignored on a non-9:16 aspect
`MUSIC_DROPPED_NO_URL`	an upload track with no file → music dropped to none
`HERO_BEAT_CLAMPED`	the hero beat index was clamped into range
`KINETIC_TRUNCATED`	a pull-quote over 6 words was trimmed
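The reconciliation rules in the table can be sketched as a small pure function. This is not the director's implementation: the flat plan shape, the `url` key on `music`, and the `beats` list are assumptions. `TARGET_LENGTH_CLAMPED` is left out because the source does not say how the floor is derived from the scene count.

```python
def reconcile(plan: dict) -> tuple[dict, list[str]]:
    """Return a repaired copy of the plan plus the warning codes raised."""
    fixed = dict(plan)
    warnings = []

    # Portrait layouts only apply to a 9:16 aspect.
    if fixed.get("portraitLayout") and fixed.get("aspect") != "9:16":
        fixed.pop("portraitLayout")
        warnings.append("LAYOUT_IGNORED_LANDSCAPE")

    # An upload track needs a file; otherwise fall back to no music.
    music = fixed.get("music") or {}
    if music.get("source") == "upload" and not music.get("url"):
        fixed["music"] = {"source": "none"}
        warnings.append("MUSIC_DROPPED_NO_URL")

    # Clamp the hero beat index into the range of available beats.
    beats = fixed.get("beats", [])
    idx = fixed.get("heroBeatIdx")
    if idx is not None and beats:
        clamped = max(0, min(idx, len(beats) - 1))
        if clamped != idx:
            fixed["heroBeatIdx"] = clamped
            warnings.append("HERO_BEAT_CLAMPED")

    # Trim pull-quotes to at most six words.
    words = (fixed.get("kineticText") or "").split()
    if len(words) > 6:
        fixed["kineticText"] = " ".join(words[:6])
        warnings.append("KINETIC_TRUNCATED")

    return fixed, warnings
```

Returning the warnings alongside the repaired plan mirrors the behaviour described above: the caller always learns what was changed.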

5. Drive it with agents, the CLI, or the API

The same engine is fully headless — point an AI agent, a terminal, or CI at it. It’s a two-phase loop: a cheap plan (resolved config + preview frames + a fit-score, no render cost) that you iterate on, then one billable render.

First: mint an API key

In the app go to Settings → API keys → Create key. Choose scopes — plan (read-only, free) and/or render (billable) — and a monthly spend cap. The raw key pk_live_… is shown once. Defaults per key: 60 requests/min and a $5/mo spend cap.

A. From an AI agent (MCP — Claude, Cursor)

Mint a render-scope key (above).
Add the Pitchstage MCP server to your client config and paste your key:

{
  "mcpServers": {
    "pitchstage": {
      "command": "npx",
      "args": ["-y", "@pitchstage/mcp"],
      "env": { "PITCHSTAGE_API_KEY": "pk_live_…" }
    }
  }
}

Restart the client. The agent now has five tools:

Tool	What it does
`pitchstage_create_project`	create a project from a goal → presigned upload URLs
`pitchstage_plan`	run the director without rendering → resolved plan + fit-score + preview frames (as inline images the agent can SEE)
`pitchstage_render`	commit a plan → fork a variant + enqueue the render
`pitchstage_get_render`	inspect a render: status, outputs, transcript, poster, plan
`pitchstage_list_variants`	list a project’s variants

A natural agent loop: create_project → upload screenshots → plan → look at the preview frames + fit-score, tweak the plan, plan again → render → get_render. Because plan returns the frames as images, the agent can judge and improve the result before spending a render. (Optional base URL override: PITCHSTAGE_API_URL, default https://pitchstage.ai/api.)
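The plan-then-render loop described above can be sketched independently of any client library. Everything here is hypothetical: `plan_fn` stands in for a call to `pitchstage_plan`, `tweak_fn` stands in for the agent's own judgement, and the `fitScore` field name, its "higher is better" direction, and the 0.8 threshold are assumptions rather than documented behaviour.

```python
from typing import Callable


def plan_until_good(
    plan_fn: Callable[[dict], dict],
    tweak_fn: Callable[[dict, dict], dict],
    config: dict,
    min_fit: float = 0.8,
    max_rounds: int = 5,
) -> tuple[dict, float]:
    """Iterate on the plan phase and return the best config seen with its score."""
    best_config, best_score = config, float("-inf")
    for _ in range(max_rounds):
        result = plan_fn(config)            # no render cost
        score = result["fitScore"]
        if score > best_score:
            best_config, best_score = config, score
        if score >= min_fit:
            break                           # good enough to render
        config = tweak_fn(config, result)   # adjust and plan again
    return best_config, best_score
```

Only the returned config would then be passed to the single billable render call, which keeps the number of paid renders at one per launch.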

B. From the terminal / CI (CLI)

npm i -g pitchstage          # or: pnpm add -g pitchstage
export PITCHSTAGE_API_KEY=pk_live_…

Point it at a folder of screenshots. Each subfolder is a feature (one outline beat); loose images become an “Overview”:

screenshots/
  Inbox/      a.png  b.png    →  feature "Inbox"
  Settings/   c.jpg           →  feature "Settings"
  cover.png                   →  "Overview"
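
The folder convention above can be mirrored in a few lines, which is handy for checking a screenshots directory before handing it to the CLI. This is a sketch of the described grouping rule, not the CLI's own code: the function name `group_features` and the accepted suffix list (including `.jpeg`) are assumptions.

```python
from pathlib import Path

# Assumed suffixes, based on the PNG/JPG/HEIC formats the app accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".heic"}


def group_features(root: str) -> dict[str, list[str]]:
    """Map each subfolder to a feature; loose images go under 'Overview'."""
    features: dict[str, list[str]] = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            images = sorted(
                p.name for p in entry.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
            if images:
                features[entry.name] = images
        elif entry.suffix.lower() in IMAGE_SUFFIXES:
            features.setdefault("Overview", []).append(entry.name)
    return features
```

Run against the example tree, this yields one entry each for "Inbox", "Settings", and "Overview", matching the one-beat-per-feature rule.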

Command	What it does
`pitchstage plan <dir> --goal "…"`	ingest the folder, run a plan, print the resolved plan + fit-score
`pitchstage render <dir> --goal "…"`	plan then commit a render; polls to completion
`pitchstage render … --watch`	re-plan on file changes — keep a demo in sync in CI
`pitchstage get <renderId>`	print a render’s machine-readable result
`pitchstage keys`	show the configured key + base URL

pitchstage render ./screenshots --goal "30s launch teaser for developers"

C. Straight HTTP (the API)

Prefer raw requests? Every endpoint, the resolved-plan contract, error codes, and limits are in the headless API reference (the machine-readable spec is at /api/v1/openapi.json).

Q&A

Do I have to write a script?

No. The director drafts the narration and outline from your screenshots + paragraph. You edit what you want and leave the rest.

My app is behind a login — how do I capture it?

The easiest route is to drop screenshots of the signed-in screens (you’re already looking at them). To have the director drive your live app, use the in-app “Log in here” hosted browser: there is nothing to install, Google/SSO and 2FA work, and your session is used once and never stored. A one-click browser extension is coming to the Chrome Web Store.

Will it sound like me?

Set up a voice profile (paste a few of your posts) and the narration + copy are written in your tone. Without one, a clean default voice is used.

How long does a render take?

Usually ~30–90 seconds for a real flow. You can leave the page and come back; agents/CLI poll for you.

Is planning free?

Planning never incurs a render cost. The plan phase is read-only: it runs the director and returns preview frames + a fit-score without rendering. You only pay when you render, which keeps iteration (especially for agents) inexpensive.

Can an AI agent run the whole thing end to end?

Yes — that’s what the MCP server is for. The agent can see the preview frames from plan, judge the result, refine the plan, and only then render.

How do I keep a demo video in sync with my product?

Run pitchstage render ./screenshots --goal "…" --watch in CI — it re-plans when the screenshots change.

Can I tweak one part without redoing everything?

Yes — regenerate is scoped (whole / a single scene / audio / video / caption), so you can refresh just the music or one scene’s narration without a full re-render.

What formats do I get?

A narrated demo video in 16:9, 9:16, or 1:1; a LinkedIn carousel; and ready-to-post launch copy (LinkedIn, X, email).

Will it post something wrong or off-brand under my name?

Nothing publishes automatically — you review and download. The director reconciles incoherent settings and tells you what it changed (see the warnings above), and everything renders in your brand + voice.

What are the limits / costs for the API?

Per key: 60 requests/min and a $5/mo spend cap by default (you set the cap when you create the key). A plan-scope key can’t trigger a billable render. Rate-limited responses include a Retry-After header.
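
A client that hits the rate limit can use the Retry-After header to decide how long to wait. The helper below is a minimal sketch using only the Python standard library. The source does not say which form Pitchstage sends, so it handles both forms HTTP allows (a number of seconds or an HTTP date); the function name and the one-second fallback are assumptions.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def retry_delay_seconds(
    retry_after: Optional[str],
    now: Optional[datetime] = None,
    fallback: float = 1.0,
) -> float:
    """Turn a Retry-After header value into a non-negative wait in seconds."""
    if not retry_after:
        return fallback
    value = retry_after.strip()
    if value.isdigit():
        return float(value)                     # delta-seconds form
    try:
        when = email.utils.parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return fallback                         # unparseable: use the fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would sleep for the returned number of seconds before retrying the request, ideally with a cap on the total number of retries.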

Is my uploaded data / login session stored?

Screenshots and outputs live in your workspace. A login session created to capture signed-in screens is used once by the worker and never stored.

Still stuck? Open the API reference or head to your dashboard and start a project.