Veo3: how to use prompts in JSON to create more consistent videos with AI

Aug 25, 2025

Nando

CEO | FOUNDER

If you've ever tried Veo3, you know that Google's tool is becoming one of the most powerful options for AI video generation. But to truly extract its potential, loose text prompts are not enough; the secret lies in JSON structures, which clearly organize every detail of the narrative, camera, style, and effects.

This technique gives you more creative control, makes adjustments easier, helps keep characters consistent, and lets you work on complex projects in stages. In this article, I will show how to structure prompts in JSON for Veo3, with best practices, examples, and a free tool that will simplify your creative process.

What makes Veo3 special?

Veo3 goes beyond simple "text in motion." It integrates image, sound, and narrative in a single creative flow, bringing features that lend more realism and consistency to the result.

The model allows for video creation from short text descriptions that indicate elements like characters and scenes. It can generate videos with audio, such as dialogues and ambient sounds. Among the main differentiators are:

  • High-quality video generation from text.

  • Inclusion of speech and dialogue directly in the result.

  • Creation of sound effects and music that are coherent with the scene.

For more complex videos, it gives you control over camera movements and angles, and lets you edit and extend scenes to reveal more of the action or to transition to the next shot.

Why write prompts in JSON?

Writing prompts in JSON for Veo3 is not just a technical matter; it's a way to organize your idea clearly, almost like a digital film script.

Instead of throwing out rushed descriptions, the JSON format organizes information into clear blocks: title, style, sequences, camera, lighting, and sound. This helps Veo3 interpret the prompt with much greater precision.

  • Structural clarity: each detail of the scene in its own key.

  • Creative control: breaking the video into stages (each one a stage entry in the JSON) helps keep the narrative cohesive.

  • Character and scene consistency: essential in longer videos.

  • Scalability: prompts are easy to edit and reuse in other projects.

In the end, JSON works as a map of instructions: you describe what happens, how the camera should capture it, what mood the scene should convey, and what sounds accompany it. The more structured the prompt, the better the chances that Veo3 delivers a consistent, professional result that matches your creative vision.

The 4 best practices for prompts in Veo3

1. Use precise language

Avoid vague or ambiguous terms. The model tends to take every word literally.

  • Vague: "Woman cooking dinner"

  • Precise: "A young woman, 20 years old, cooking dinner in a modern kitchen"

2. Structure in layers (like a sandwich)

Start with the core of the action and add technique, style, and audio.

  • Core: “A dog running”

  • Layers: “Golden retriever running in a field at sunset, medium shot, warm light, sound of birds in the background”.

3. Specify time and rhythm

Words like gradual, fast, sudden, rhythmic help control the flow.

  • “A flower blooming slowly in time-lapse”.

4. Mix technique and creativity

Combine camera terms with emotional descriptions.

  • “Close-up shot of wrinkled hands slowly opening a photo album, warm yellow light, nostalgic music in the background”.
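The layering idea from practice 2 can be sketched in a few lines of Python. This is only an illustration: the helper name layered_prompt is hypothetical, and all it does is join a core action with technique, style, and audio layers into one prompt string.

```python
def layered_prompt(core, *layers):
    """Join a core action with extra layers (technique, style, audio)."""
    parts = [core.strip()]
    # Skip empty layers so optional details can be left blank.
    parts.extend(layer.strip() for layer in layers if layer.strip())
    return ", ".join(parts)


prompt = layered_prompt(
    "Golden retriever running in a field at sunset",  # core action
    "medium shot",                                     # camera technique
    "warm light",                                      # lighting and style
    "sound of birds in the background",                # audio
)
print(prompt)
```

Starting from the core and adding one layer at a time makes it easy to see which detail changed the result when you regenerate the video.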

Common mistakes to avoid

  • Being too vague:
    Avoid: “A person in a room”
    Prefer: “A chef preparing pasta in a busy kitchen”.

  • Overloading the scene with elements:
    Avoid: “A dragon singing opera while robots dance in an electric storm”.
    Prefer: “A dragon flying among clouds at sunset, wings beating in a steady rhythm”.

  • Forgetting about audio:
    Avoid: “A waterfall”
    Prefer: “Waterfall crashing on a rocky cliff, with the intense sound of water echoing”.

Structure of a JSON prompt in Veo3

When we think about prompts for Veo3, it's common to start by describing a scene in flowing text. This works in simple situations, but for more complex productions, this format quickly becomes limited. That's where JSON comes in as a powerful solution.

JSON functions like a technical film script, organized in blocks that help the model better understand each instruction. Instead of mixing camera, sound, and atmosphere in a single sentence, you separate everything into specific keys, as if you were writing a production map. This clarity reduces ambiguities and increases the level of creative control you have over the result.

Essential components (and why they matter):

  • title: provides creative context and facilitates versioning.

  • style: defines the visual language, such as “cinematic”, “documentary”, or “surrealist”.

  • sequence: the heart of the prompt, where each stage represents a micro-scene.

  • description: describes the action and the central visual elements.

  • camera: defines movement and framing (dolly, close-up, wide shot, orbit).

  • lighting: creates the atmosphere (contrast, time of day, neon, soft light).

  • sound_effects / dialogue / music: compose the sound landscape.

  • effects, color_palette, mood, style_reference: refine aesthetic and narrative rhythm.
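To make the components above concrete, here is a minimal sketch in Python that assembles them into a dictionary and serializes it to JSON. The key names follow the list above, while the scene content is made up for illustration. The sketch assumes the resulting JSON text is pasted into Veo3 as the prompt, so treat the layout as a working convention rather than an official schema.

```python
import json

# Global keys describe the whole video; each entry in "sequence" is one stage.
prompt = {
    "title": "Golden retriever at sunset",
    "style": "cinematic",
    "color_palette": "warm golds and soft greens",
    "mood": "joyful and calm",
    "sequence": [
        {
            "stage": 1,
            "description": "A golden retriever runs across an open field at sunset",
            "camera": "wide shot, slow dolly following the dog",
            "lighting": "warm, low sun",
            "sound_effects": "birds in the background, paws on grass",
        },
        {
            "stage": 2,
            "description": "The dog gradually slows down and looks toward the camera",
            "camera": "close-up",
            "lighting": "warm, low sun",
            "music": "soft acoustic guitar",
        },
    ],
}

# Serialize to text so it can be pasted as the prompt.
prompt_text = json.dumps(prompt, indent=2, ensure_ascii=False)
print(prompt_text)
```

Keeping the global keys (title, style, color_palette, mood) outside the sequence means you can change a single stage without touching the overall look of the video.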

Quick best practices:

✅ One goal per stage (clear action + camera movement).
✅ Explicit rhythm (slow/fast/gradual/sudden) for temporal control.
✅ Consistency (character, scenery, and light recurring between stages).
✅ Audio from the start (effects, environments, dialogue, or musical indication).
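Part of this checklist can be turned into a quick self-review before you generate anything. The sketch below is a hypothetical helper, not part of Veo3 or of any official tool: review_stage takes one stage as a Python dictionary and returns warnings for a missing action, missing camera, missing audio, or missing rhythm word. The rhythm check is a crude substring match, so read its output as a hint only.

```python
AUDIO_KEYS = ("sound_effects", "dialogue", "music")
RHYTHM_WORDS = ("slow", "fast", "gradual", "sudden")


def review_stage(stage):
    """Return a list of warnings for one stage dictionary."""
    warnings = []
    # One goal per stage: a clear action plus a camera instruction.
    if not stage.get("description"):
        warnings.append("missing description")
    if not stage.get("camera"):
        warnings.append("missing camera")
    # Audio from the start: at least one audio key should be filled in.
    if not any(stage.get(key) for key in AUDIO_KEYS):
        warnings.append("no audio")
    # Explicit rhythm: look for a rhythm word anywhere in the stage text.
    text = " ".join(str(value) for value in stage.values()).lower()
    if not any(word in text for word in RHYTHM_WORDS):
        warnings.append("no rhythm word")
    return warnings


good = {
    "description": "A flower blooming slowly in time-lapse",
    "camera": "close-up",
    "sound_effects": "soft wind",
}
bad = {"description": "A waterfall"}

print(review_stage(good))
print(review_stage(bad))
```

Consistency between stages (the third item) is not checked here, since it depends on comparing character, scenery, and light descriptions across the whole sequence.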

Example 1: Fluid metal ball transforming into an SUV

In this video, we follow a metallic sphere that pulses, dissolves into particles, and reassembles into an SUV within a dark studio. Each stage of the JSON defines the transformation: from the initial sphere to the emergence of individual pieces, until the moment the complete vehicle appears. The result is a clip that mixes technology and visual poetry, worthy of a futuristic automotive teaser.

💡 Access the JSON prompt in the video description on YouTube.

Example 2: Rooftop chase scene

Here the challenge was to create a cinematic one-take: an agent pursues a target running over rooftops in a European city. JSON was used to ensure the fluidity of the camera movement, maintaining tension in each transition.

💡 Access the JSON prompt in the video description on YouTube.

Example 3: SUV in multiple takes

This video was structured to be sophisticated, minimalist, and impactful. First, we see the SUV in a wide shot with dramatic lighting; then, quick cuts reveal details such as headlights, wheels, brakes, and door handles, until returning to the wide shot for the closing. JSON guided each micro-cut, with precision in camera movements and light effects. 

💡 Access the JSON prompt in the video description on YouTube.
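For readers who want to experiment before opening the original prompt, the multi-take structure described above can be approximated in Python. This is a hypothetical reconstruction based only on the description in this article, not the prompt used in the video: it builds an opening wide shot, one quick cut per detail, and a closing wide shot, reusing the same lighting so the stages stay consistent.

```python
import json

details = ["headlights", "wheels", "brakes", "door handles"]

# Shared settings keep framing and lighting consistent between stages.
wide_shot = {
    "description": "SUV in a wide shot",
    "camera": "wide shot",
    "lighting": "dramatic lighting",
}

sequence = [dict(wide_shot)]  # opening wide shot
for detail in details:
    sequence.append({
        "description": f"Quick cut to the SUV's {detail}",
        "camera": "close-up",
        "lighting": wide_shot["lighting"],
    })
# Return to the wide shot for the closing.
sequence.append(
    dict(wide_shot, description="Return to the wide shot of the SUV for the closing")
)

# Number the stages in order once the list is complete.
for number, stage in enumerate(sequence, start=1):
    stage["stage"] = number

prompt = {
    "title": "SUV in multiple takes",
    "style": "sophisticated, minimalist",
    "sequence": sequence,
}
prompt_text = json.dumps(prompt, indent=2)
print(prompt_text)
```

Generating the detail cuts from a list makes it easy to add or remove a shot without renumbering the stages by hand.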

Additional examples created by Human

In addition to the above tests, we explored other ideas that showcase the narrative potential of Veo3 with JSON. 

  • A portrait with an atmosphere charged with sound and light

  • Architectural metamorphosis, where space gradually emerges

  • Dynamic study of body and movement, with cut lighting

These videos reinforce how Veo3 interprets complex descriptions and how JSON can control rhythm, camera, and consistency in every detail.

Free tool: JSON Prompt Assistant for Veo3

Want real control of your scene with AI? Then you need to use prompts in JSON. In Veo3, this format allows you to describe camera angles, action, lighting, style, audio... everything with the precision of a director.

Each line of the prompt is a detail of the video. This is creative direction with intelligence. With that in mind, we've created the Human's JSON Prompt Assistant: a visual production partner that transforms any idea, no matter how loose or abstract, into cinematic prompts ready for Veo3.

👉 Click here to access the assistant

How does it work?

You describe your idea, and it translates it into a detailed visual language, structured like a real set. The result is a technical script ready to generate videos with high-budget aesthetics and precision.

The assistant takes into account:

  • Visual narrative: scene by scene, with defined rhythm and emotion.

  • Camera direction: impactful angles, movements, and framings.

  • Lighting and atmosphere: lights, shadows, and tones that reinforce style.

  • Visual effects and transitions: details that elevate aesthetics.

  • Soundscape: sounds and tracks that bring the video to life.

  • Color palette and mood: unified visual identity.

  • Style references: brands, movies, and trends that inspire.

📖 How to use the Human JSON Assistant to create your prompts

1. One take with several camera or acting changes

In the assistant, type something like:

“Create a prompt for a one take where the camera starts low, close to a car wheel, then rotates showing the driver’s hand on the steering wheel and finally rises to the driver’s face preparing to accelerate.”

2. Multiple scenes in the same prompt

In the assistant, type something like:

“Create a prompt with four different scenes from different angles of a person walking down the street. The camera starts at the feet, cuts to a close-up of the hands, cuts to a wide shot, and finally cuts to a close-up of the face.”

3. Transformation effect

In the assistant, type something like:

“A metal ball begins to spin in the middle of a dark studio with subtle cinematic lighting. It disintegrates into particles that form a modern SUV, dramatically lit.”

Now that you know the potential of your new partner and how to use it, all that's left is to put it on screen. Access here.

Conclusion

Creating Veo3 videos with JSON prompts is like directing a film set: you define every detail of narrative, camera, sound, and style. This structure is what supports consistency in complex transformations, multi-scene commercials, or continuous action sequences.

In practice, JSON not only makes your instructions clearer for the AI but also gives the creator a degree of creative direction that is rare in automated tools. With it, you can achieve results that resemble high-budget productions, created quickly and accessibly.

If you want to continue exploring this universe, I also recommend checking out Human's favorite tools for creating images, videos, and upscaling with AI.

Get the latest news from the world of AI and the Market

Every Thursday at 10 AM, in your email inbox.
