
Today, we are learning how to seamlessly merge a 2D Saturday-morning cartoon character into a photorealistic, 3D real-world environment using AI.
👇 PROMPTS USED IN THIS VIDEO 👇
1️⃣ Nano Banana 2 Prompts (Image Generation)
Real-World Image:
Street-level photograph looking down a narrow New York City street, tall buildings rising on both sides creating a dramatic urban canyon. The perspective draws the eye straight down the center of the road. On the right side of the frame, the distinctive Art Deco limestone base of the Empire State Building is clearly visible — its grand entrance with the iconic aluminum and chrome canopy, vertical ribbed facade, and setback tower rising above. The left side features classic Manhattan mixed-use buildings with fire escapes, awnings, and ground-floor storefronts. Late afternoon golden hour light cuts between the buildings, casting long shadows across the asphalt. Taxis, pedestrians, and street vendors add life to the scene. Cinematic composition, photorealistic, shot on 35mm lens, shallow depth of field.
Insert 2D Cartoon:
A single massive 2D cartoon gorilla crossing the street from left to right, mid-stride in the center of the road between the yellow taxi cabs. The gorilla is illustrated in a flat 2D cartoon style with bold black outlines, exaggerated proportions, and a playful expression. It towers over the cars, taking up most of the street width. One giant foot is planted on the asphalt, the other mid-step. Its long arms swing naturally as it casually strolls across like a pedestrian. The gorilla is stylized, not realistic — think Saturday morning cartoon energy with solid color fills and minimal shading. It interacts with the scene naturally, casting a shadow on the street beneath it.
2️⃣ Kling 3.0 Prompt (Image-to-Video - 5s)
Camera remains completely static and locked in place at street level, maintaining the exact original framing throughout. The massive 2D cartoon gorilla turns toward the brick building on the left, reaches up and grabs the facade. It begins climbing — fingers gripping window ledges and fire escapes, pulling itself higher and higher up the building. As it climbs, its body moves upward and eventually exits the top of the frame entirely, disappearing from view. The street below reacts subtly — pedestrians look up, a taxi inches forward. The city scene remains still and grounded, golden hour light unchanged. The camera never moves, never pans, never tilts. 2D cartoon gorilla style contrasts with photorealistic city. 4K.
Let me know in the comments what impossible creations you end up making with this technique! Don't forget to like and subscribe for more AI video tutorials.
If you want to swipe through award-winning creative work, check out Goldwork.app.
Note: best experienced via your phone.

