Efficiently Rendering Flash in a 3D Scene

The original source of this post is at the IMVU engineering blog.

Last time, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed. This time, I'll discuss efficient rendering into a 3D scene.

Rendering Flash as a 3D Overlay (The Naive Way)

At first blush, rendering Flash on top of a 3D scene sounds easy. Every frame:

Create a DIB section the size of your 3D viewport
Render Flash into the DIB section with IViewObject::Draw
Copy the DIB section into an IDirect3DTexture9
Render the texture on top of the scene

Ta da! But your frame rate dropped to 2 frames per second? Ouch. It turns out this implementation is horribly slow. There are a few reasons.

First, asking the Adobe Flash Player to render into a DIB isn't a cheap operation. In our measurements, drawing even a simple SWF takes on the order of 10 milliseconds. Since most UI doesn't animate every frame, we should be able to cache the captured framebuffer.

Second, main memory and graphics memory are on different components in your computer. You want to avoid wasting time and bus traffic by unnecessarily copying data from the CPU to the GPU every frame. If only the lower-right corner of a SWF changes, we should limit our memory copies to that region.

Third, modern GPUs are fast, but not everyone has them. Let's say you have a giant mostly-empty SWF and want to render it on top of your 3D scene. On slower GPUs, it would be ideal if you could limit your texture draws to the regions of the screen that are non-transparent.
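To make the second point concrete, here is a minimal sketch of copying only a changed rectangle between two 32-bit pixel buffers with different row pitches. The `Rect` type and `copyDirtyRect` function are hypothetical names for illustration, not IMVU's code. In a real Direct3D 9 path the destination would come from something like `IDirect3DTexture9::LockRect`; the sketch assumes `dst` points at the start of the whole destination surface and that both buffers use 4 bytes per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical rectangle type: half-open, [left, right) x [top, bottom), in pixels.
struct Rect {
    int left, top, right, bottom;
};

// Copies only the pixels inside `dirty` from a source buffer to a destination
// buffer. Pitches are in bytes and may differ, as they often do between a
// DIB section and a locked texture. Assumes 4 bytes per pixel in both.
void copyDirtyRect(const std::uint8_t* src, std::size_t srcPitch,
                   std::uint8_t* dst, std::size_t dstPitch,
                   const Rect& dirty) {
    assert(dirty.left >= 0 && dirty.top >= 0);
    assert(dirty.left <= dirty.right && dirty.top <= dirty.bottom);

    const std::size_t bytesPerPixel = 4;
    const std::size_t xOffset = static_cast<std::size_t>(dirty.left) * bytesPerPixel;
    const std::size_t rowBytes =
        static_cast<std::size_t>(dirty.right - dirty.left) * bytesPerPixel;

    // Copy one row at a time, because the rows are not contiguous in memory.
    for (int y = dirty.top; y < dirty.bottom; ++y) {
        const std::size_t row = static_cast<std::size_t>(y);
        std::memcpy(dst + row * dstPitch + xOffset,
                    src + row * srcPitch + xOffset,
                    rowBytes);
    }
}
```

For a small dirty rectangle on a large overlay, this touches only a fraction of the bytes a full-surface copy would.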

Rendering Flash as a 3D Overlay (The Fast Way)

Disclaimer: I can't take credit for these algorithms. They were jointly developed over years by many smart engineers at IMVU.

First, let's reduce an embedded Flash player to its principles:

Flash exposes an IShockwaveFlash interface through which you can load and play movies.
Flash maintains its own frame buffer. You can read these pixels with IViewObject::Draw.
When a SWF updates regions of the frame buffer, it notifies you through IOleInPlaceSiteWindowless::InvalidateRect.

In addition, we'd like the Flash overlay system to fit within these performance constraints:

Each SWF is rendered over the entire window. For example, implementing a ball that bounces around the screen or a draggable UI component should not require any special IMVU APIs.
If a SWF is not animating, we do not copy its pixels to the GPU every frame.
We do not render the overlay in transparent regions. That is, if no Flash content is visible, rendering is free.
Memory consumption for the overlay (ignoring memory used by individual SWFs) is O(framebuffer), not O(framebuffer * SWFs). That is, loading three SWFs should not require allocation of three screen-sized textures.
If Flash notifies us of multiple changed regions in a frame, we call IViewObject::Draw only once.
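The last constraint implies that invalidation notifications are accumulated rather than acted on immediately. A minimal sketch of such an accumulator follows. `DirtyRegion`, `invalidate`, and `take` are hypothetical names, and tracking a single bounding box is a simplifying assumption; a real implementation could just as well keep a list of rectangles.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type: half-open, [left, right) x [top, bottom), in pixels.
struct Rect {
    int left, top, right, bottom;
    bool empty() const { return left >= right || top >= bottom; }
};

// Accumulates invalidated rectangles over one frame as a single bounding box.
class DirtyRegion {
public:
    // Intended to be called from the InvalidateRect notification handler.
    void invalidate(const Rect& r) {
        if (r.empty()) return;                       // ignore degenerate rects
        if (bounds_.empty()) { bounds_ = r; return; }
        bounds_.left   = std::min(bounds_.left, r.left);
        bounds_.top    = std::min(bounds_.top, r.top);
        bounds_.right  = std::max(bounds_.right, r.right);
        bounds_.bottom = std::max(bounds_.bottom, r.bottom);
    }

    bool empty() const { return bounds_.empty(); }

    // Returns the accumulated bounds and resets the region for the next frame.
    Rect take() {
        Rect result = bounds_;
        bounds_ = Rect{0, 0, 0, 0};
        return result;
    }

private:
    Rect bounds_{0, 0, 0, 0};
};
```

At render time, the frame loop would check `empty()` and, if there is work to do, call `take()` once and redraw once, no matter how many notifications arrived during the frame.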

Without further ado, let's look at the fast algorithm:

Flash notifies us of visual changes via IOleInPlaceSiteWindowless::InvalidateRect. We take any updated rectangles and add them to a per-frame dirty region. When it's time to render a frame, there are four possibilities:

The dirty region is empty and the opaque region is empty. This case is basically free, because nothing need be drawn.
The dirty region is empty and the opaque region is nonempty. In this case, we just need to render our cached textures for the opaque regions of the screen. This case is the most common. Since a video memory blit is fast, there's not much we could do to further speed it up.
The dirty region is nonempty. We must IViewObject::Draw into our Overlay DIB, with one tricky bit. Since we're only storing one overlay texture, we need to render each loaded Flash overlay SWF into the DIB, not just the one that changed. Imagine an animating SWF underneath another translucent SWF. The top SWF must be composited with the bottom SWF's updates. After rendering each SWF, we scan the updated DIB for a minimalish opaque region. Why not just render the dirty region? Imagine a SWF with a bouncing ball. If we naively rendered every dirty rectangle, eventually we'd be rendering the entire screen. Scanning for minimal opaque regions enables recalculation of what's actually visible.
The dirty region is nonempty, but the updated pixels are all transparent. If this occurs, we no longer need to render anything at all until Flash content reappears.
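The four cases above can be sketched roughly as follows. This is an illustration under stated assumptions, not IMVU's implementation: `scanOpaqueBounds`, `classifyFrame`, and `FrameAction` are hypothetical names, pixels are assumed to be 32-bit with alpha in the top 8 bits, and the scan computes one bounding box over the whole buffer rather than a tighter set of regions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rectangle type: half-open, [left, right) x [top, bottom), in pixels.
struct Rect {
    int left, top, right, bottom;
    bool empty() const { return left >= right || top >= bottom; }
};

// Returns the bounding box of all pixels with nonzero alpha, or an empty
// rectangle if every pixel is fully transparent.
Rect scanOpaqueBounds(const std::vector<std::uint32_t>& pixels,
                      int width, int height) {
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    Rect b{width, height, 0, 0};  // starts inverted, so it reads as empty
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if ((pixels[i] >> 24) != 0) {  // assumption: alpha is the top byte
                b.left   = std::min(b.left, x);
                b.top    = std::min(b.top, y);
                b.right  = std::max(b.right, x + 1);
                b.bottom = std::max(b.bottom, y + 1);
            }
        }
    }
    return b.empty() ? Rect{0, 0, 0, 0} : b;
}

enum class FrameAction {
    Skip,            // nothing dirty, nothing visible: no work
    DrawCached,      // nothing dirty: draw the cached texture as-is
    RedrawAndDraw,   // dirty: redraw SWFs, upload, then draw
    RedrawNowEmpty   // dirty, but the result is fully transparent
};

// `opaque` is the opaque region at decision time: the cached one when nothing
// is dirty, or the freshly scanned one after redrawing when something is.
FrameAction classifyFrame(bool dirty, const Rect& opaque) {
    if (!dirty) {
        return opaque.empty() ? FrameAction::Skip : FrameAction::DrawCached;
    }
    return opaque.empty() ? FrameAction::RedrawNowEmpty
                          : FrameAction::RedrawAndDraw;
}
```

Rescanning after each redraw is what keeps the drawn area from growing without bound in the bouncing-ball example: the opaque bounds follow the ball instead of accumulating every rectangle it has ever touched.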

This algorithm has proven efficient. It supports multiple overlapping SWFs while minimizing memory consumption and CPU/GPU draw calls per frame. Until recently, we used Flash for several of our UI components, giving us a standard toolchain and a great deal of flexibility. Flash was the bridge that took us from the dark ages of C++ UI code to UI on which we could actually iterate.