If you’ve ever wanted to take control of Google Veo’s powerful video generation but felt boxed in by vague prompts, you’re not alone. Luckily, there’s a hack going around the creative corners of the internet that lets you fine-tune every single element of your video—using a clean JSON format.
Before we dive deep into crafting cinematic prompts with JSON, here’s a tip for devs building anything around video generation tools, APIs, or creative workflows: Apidog Docs is perfect for documenting and testing your API endpoints in one clean interface.
In this guide, we’ll break down what this JSON hack looks like, why it’s blowing up, and how you can use it to replicate cinematic aesthetics, lens types, wardrobe styles, ambient sound, and even tone of voice. Whether you’re building a fashion short film or an anime-inspired clip, this method gives you the building blocks.
What’s the Deal with the Veo JSON Hack?
Instead of feeding Veo 3 a vague block of text and hoping it gets it right, this JSON-format approach gives you something better: structure and control.
It’s like giving the AI a shot list and creative brief in one — and suddenly, your output starts to feel like it had a human director.
Here’s why this works:
Why JSON Makes Sense for Veo Prompts:
- Cleaner input: Each section of your idea (camera, subject, audio, lighting, etc.) is broken down clearly.
- Modular editing: Want to change the mood or location? Just tweak one section—no need to rewrite the whole thing.
- Cinematic control: You can define:
  - Lens type and film grain
  - Camera movement (e.g., Steadicam, handheld)
  - Ambient sound and vocal tone
  - Lighting style and time of day
  - Specific wardrobe and styling cues
- No surprises: Want no subtitles or overlays? Just say it outright in the `visual_rules` section.
What This Means for Creators:
- You're not guessing what Veo “might” generate anymore.
- You’re guiding the visuals like a director using a script.
- You can replicate or remix your style across scenes or projects.
So instead of hoping for great results, you’re engineering them—one field at a time.
Full JSON Example Breakdown
Let’s break down this example JSON block that generated a stylish Tokyo street-style morning scene:
```json
{
  "shot": {
    "composition": "Medium tracking shot, 50mm lens, shot on RED V-Raptor 8K with Netflix-approved HDR setup, shallow depth of field",
    "camera_motion": "smooth Steadicam walk-along, slight handheld bounce for naturalistic rhythm",
    "frame_rate": "24fps",
    "film_grain": "clean digital with film-emulated LUT for warmth and vibrancy"
  },
  "subject": {
    "description": "A young woman with a petite frame and soft porcelain complexion. She has oversized, almond-shaped eyes with long lashes, subtle pink-tinted cheeks, and a heart-shaped face. Her inky-black bob is slightly tousled and clipped to one side with a small red strawberry hairpin. Her style blends playful retro and modern Tokyo streetwear: she wears a crocheted ivory halter top with scalloped edges, high-waisted denim shorts with a wide brown belt and a red enamel star buckle, and a loose red gingham blouse draped off one shoulder. Her accessories include glossy cherry lip tint, a beaded bracelet stack, and soft shimmer eyeshadow.",
    "wardrobe": "Crocheted ivory halter with scalloped trim, fitted high-waisted denim shorts, wide tan belt with red enamel star buckle, oversized red gingham blouse slipped off one shoulder, strawberry hairpin in side-parted bob, and translucent plastic bead bracelets in pink and cream tones."
  },
  "scene": {
    "location": "a quiet urban street bathed in early morning sunlight",
    "time_of_day": "early morning",
    "environment": "empty sidewalks, golden sunlight reflecting off puddles and windows, occasional birds fluttering by, street slightly wet from overnight rain"
  },
  "visual_details": {
    "action": "she walks rhythmically down the sidewalk, swinging her hips slightly with the beat, one hand gesturing playfully, the other adjusting her shirt sleeve as she sings",
    "props": "morning mist, traffic light turning green in the distance, reflective puddles, subtle sun flare"
  },
  "cinematography": {
    "lighting": "natural golden-hour lighting with soft HDR bounce, gentle lens flare through morning haze",
    "tone": "playful, stylish, vibrant",
    "notes": "STRICTLY NO on-screen subtitles, lyrics, captions, or text overlays. Final render must be clean visual-only."
  },
  "audio": {
    "ambient": "city birds chirping, distant traffic hum, her boots tapping pavement",
    "voice": {
      "tone": "light, teasing, and melodic",
      "style": "pop-rap delivery in Japanese with flirtatious rhythm, confident breath control, playful pacing and bounce"
    },
    "lyrics": "ラーメンはもういらない、キャビアだけでいいの。 ファイナンスのおかげで、私、星みたいに輝いてる。"
  },
  "color_palette": "sun-warmed pastels with vibrant reds and denim blues, soft contrast with warm film LUT",
  "dialogue": {
    "character": "Woman (singing in Japanese)",
    "line": "ラーメンはもういらない、キャビアだけでいいの。 ファイナンスのおかげで、私、星みたいに輝いてる。",
    "subtitles": false
  },
  "visual_rules": {
    "prohibited_elements": [
      "subtitles",
      "captions",
      "karaoke-style lyrics",
      "text overlays",
      "lower thirds",
      "any written language appearing on screen"
    ]
  }
}
```
Rather than paste the entire block again, here’s what this structured prompt includes:
Shot
- Composition type (medium tracking shot, 50mm lens)
- Motion style (Steadicam, with a touch of handheld)
- Frame rate and LUT film grain
In short, you get full cinematographer-level control here.
Subject & Wardrobe
The description is highly detailed—down to accessories like strawberry hairpins and cherry lip gloss. The character is described in visual, tactile language that helps the AI model generate vivid results.
Scene & Environment
- Time of day: Early morning
- Atmosphere: Golden light, empty street, wet pavement
- It even includes birds and puddle reflections.
Visual Details & Props
- Physical actions like walking, singing, adjusting clothes
- Elements like sun flares and mist
- Props (traffic light in distance, puddles, etc.)
Lighting & Tone
Golden hour with HDR bounce and soft lens flares. Think soft, dreamy, but vibrant. It also sets the mood: “playful, stylish, vibrant.”
Audio & Lyrics
- Ambient audio: birds, distant cars, shoes tapping
- Voice tone: melodic, teasing, playful
- Lyrics in Japanese: flashy, finance-themed
No subtitles, no captions—this is a strict “visual-only” policy.
Why This Method Works
AI video generators like Veo thrive on structure. While most prompt-based tools respond to loose storytelling instructions, JSON gives your request:
- Clarity: No confusion about what goes where
- Control: Set every scene element like a director
- Reproducibility: You can tweak one part at a time
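That reproducibility is easy to see in code. Here's a minimal Python sketch, using a pared-down subset of the article's fields (the replacement values are just illustrative), showing how one field can change while everything else stays untouched:

```python
import json

# A pared-down version of the article's prompt (subset of fields).
prompt = {
    "scene": {"time_of_day": "early morning"},
    "cinematography": {
        "lighting": "natural golden-hour lighting",
        "tone": "playful, stylish, vibrant",
    },
}

# Tweak only the mood; the tone and any other fields stay untouched.
prompt["scene"]["time_of_day"] = "dusk"
prompt["cinematography"]["lighting"] = "cool blue-hour light with soft neon reflections"

# ensure_ascii=False keeps any non-ASCII text (like Japanese lyrics) readable.
print(json.dumps(prompt, ensure_ascii=False, indent=2))
```

Because each concern lives in its own key, you can diff two versions of a prompt and know exactly which creative choice changed between renders.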
Customize It for Your Own Videos
Want to use this format for your own project? Here’s a simple way to do it:
You can plug in your own style references, film gear, mood, and tone. The more specific, the better.
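One way to make that repeatable is a small builder function. This is purely a sketch in Python; the `build_veo_prompt` helper and its defaults are my own invention, not a Veo API. It just assembles the same field names the example above uses:

```python
import json

def build_veo_prompt(subject, location, time_of_day, lens="50mm lens",
                     tone="cinematic", no_text=True):
    """Assemble a Veo-style JSON prompt from a few high-level choices.

    The field names mirror the example in this article; the helper
    itself is illustrative, not an official Veo interface.
    """
    prompt = {
        "shot": {"composition": f"Medium shot, {lens}", "frame_rate": "24fps"},
        "subject": {"description": subject},
        "scene": {"location": location, "time_of_day": time_of_day},
        "cinematography": {"tone": tone},
    }
    if no_text:
        # Mirror the article's "visual-only" policy by default.
        prompt["visual_rules"] = {
            "prohibited_elements": ["subtitles", "captions", "text overlays"]
        }
    return json.dumps(prompt, ensure_ascii=False, indent=2)

print(build_veo_prompt(
    subject="a cyclist in a yellow raincoat",
    location="a rain-slicked canal street",
    time_of_day="dusk",
))
```

Swap the arguments and you get a fresh, fully structured prompt without rewriting the boilerplate each time.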
Tips to Nail the Perfect Veo JSON Prompt
- Stick to film language: Use words like “lens,” “frame rate,” “cinematic motion,” “bokeh,” etc.
- Describe subject like you’re painting: Facial structure, clothing texture, accessories
- Set tone with lighting and audio: Warm/cold, sharp/soft, ambient/clean
- Use verbs: Have your character walk, spin, sing, adjust, etc.
- Declare prohibited elements: as the example JSON does in `visual_rules`; leave them out and stray on-screen text can sneak into your render.
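If you want those tips to stick, you can run a quick sanity check before submitting a prompt. The `lint_prompt` function below is a sketch of my own, not an official validator; its checks simply mirror the tips above:

```python
def lint_prompt(prompt: dict) -> list[str]:
    """Return warnings for common Veo-prompt omissions.

    Illustrative only: the checks correspond to the tips in this
    article, not to any documented Veo validation rules.
    """
    warnings = []
    # Tip: stick to film language, e.g. a named lens.
    if "shot" not in prompt or "lens" not in prompt.get("shot", {}).get("composition", ""):
        warnings.append("No lens specified; add film language like '50mm lens'.")
    # Tip: declare prohibited elements explicitly.
    if "visual_rules" not in prompt:
        warnings.append("No visual_rules; on-screen text may sneak in.")
    # Tip: use verbs so your character actually does something.
    if not prompt.get("visual_details", {}).get("action"):
        warnings.append("No action verbs; give your character something to do.")
    return warnings

issues = lint_prompt({"shot": {"composition": "wide shot"}})
for issue in issues:
    print("-", issue)
```

Running it on a sparse prompt like the one above surfaces all three omissions at once, which is much faster than discovering them one render at a time.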
Before You Try It...
This method isn't "official," but it’s shockingly effective. Don’t be afraid to experiment. Start small—change the lighting, add props, or switch the scene—and compare the results. That’s where the magic happens.
If Google ever decides to expose a formal JSON interface, you’ll already be ahead of the game.
Why This Matters for Creators and Developers
Generative video tools like Veo 3 aren’t just about clicking “generate” and hoping for the best anymore. They’re evolving into precision instruments—and this JSON approach proves it. For creators, that means you don’t need to settle for generic outputs. With a structured format, you can dial in exactly what you want, from lens type to lighting mood, all the way to wardrobe details and ambient audio.
For developers, this opens up exciting possibilities:
- You could build custom prompt templates for different aesthetics.
- Automate prompt generation based on mood boards or UI inputs.
- Even integrate with APIs to create video production pipelines.
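As one concrete sketch of that second idea, a mood-board picker could map directly onto cinematography fields. The preset names and values here are invented for illustration:

```python
import json

# Hypothetical mapping from a UI mood picker to cinematography fields.
MOOD_PRESETS = {
    "dreamy": {"lighting": "soft golden-hour haze", "tone": "wistful, warm"},
    "noir":   {"lighting": "hard key light, deep shadows", "tone": "tense, moody"},
}

def prompt_from_mood(mood: str, subject: str) -> str:
    """Generate a Veo-style prompt from a mood keyword and a subject."""
    preset = MOOD_PRESETS[mood]
    prompt = {
        "subject": {"description": subject},
        "cinematography": preset,
    }
    return json.dumps(prompt, ensure_ascii=False, indent=2)

print(prompt_from_mood("noir", "a detective under a flickering streetlamp"))
```

From here it's a short step to wiring the output into a render queue or batch pipeline, since every prompt is just serializable data.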
It's like turning generative video into a programmable medium—and that’s a big deal. It means your creative vision doesn’t get lost in vague prompts. Instead, it’s translated clearly, line by line, into a stunning visual output.
This isn’t just a hack. It’s a new workflow. One that’s structured, repeatable, and tailored to your vision.
Final Thoughts
This JSON-style hack shows that cinematic video generation is entering its prompt-engineering era. With the right structure, you can make Veo 3 do things that feel hand-directed.
Whether you’re making moody cityscapes or fun music video snippets, the format is flexible enough to match your vision.
Let your JSON tell the story—and let your tools bring it to life.
Top comments (11)
Great guide! Never thought of this method. Good work Emmanuel!
You are welcome Gary. It's quite a hidden gem.
Impressive work! I've been watching so many Google Veo videos in various styles, and now I know how they cooked!
Good job!
Glad you find it helpful. That's awesome.😎
Nice one Emmanuel
I'll definitely try it out
This is extremely impressive, honestly. I've wasted too much time on scattered prompts that never quite deliver - having this kind of control with JSON feels like finally getting to actually direct instead of just hoping for the best
So hey... am I the only one who didn't actually see a JSON example? Like, it's mentioned, a lot, and never actually shown. Yet none of the other commenters noticed this, so maybe it's just me.
Hey Raymond, thanks for pointing that out. I might have mistakenly edited that part out. It's now added back.
Cool tips! Do you have a prompt or n8n workflow that can automate this process?
Yes! 🔥 And definitely keep an eye out for my next post — I’ll be sharing more on that soon.
Cool! I'm doing something similar with Sora using a DSL based on Lisp S-expressions mixed with JSON (why? vibe coding xD). Messing with functions and parameter rules; currently settled on PascalCase but about to move to NLP for the parameters.
```
SceneID("scene_id", Duration=5s) {
  Environment(Room=..., Style=..., Time=..., Lighting(...))
  Characters(Main=..., BodyState=..., Gesture=..., EyeContact=...)
  Props(...)
  Camera(Angle=..., Motion=..., Framing=...)
  Emotion(Outer=..., Inner=...)
  FocalPoint("...")
  Rhythm(Beat=..., Cut=...)
  SceneArc(Action=..., Outcome=...)
  Echo("Previous scene_id")
}
```