Update on Sorting Translucent Triangles

In my last article on the subject, I searched for an algorithm to sort translucent triangle meshes that ran in less than quadratic time. Sadly, on large organic meshes, the proposed algorithm didn’t perform very well. In particular, when triangles in one continuous patch of geometry sort differently, the individual triangles of the mesh become visible, which is visually worse than not sorting at all.

Thus, I gave up on that approach and tried some others. If the algorithm has to be quadratic, then we have some options:

Topological Sort

First, build a digraph that represents whether a triangle should be rendered before another.
A triangle A should be rendered before triangle B if A is in front of B and B is facing A but A is not facing B.

Once the graph is built, cycles must be broken, ideally using a heuristic that minimizes visual errors. (If a set of triangles form a cycle, there is no ordering that has no errors.) Finally, once all cycles are broken, the graph can be topologically sorted using a depth-first traversal.

This algorithm is O(n^2) in CPU and O(n^2) in memory.

Topologically sorting works well on geometric meshes with a clear triangle ordering. But on organic meshes, there are a great number of cycles, and the heuristic cycle breaking produces visual artifacts where individual triangles become visible.

Coverage Sort

For all pairs of triangles (A, B), if A is in front of B, calculate the form factor of A onto B. Form factor between triangles is fairly hard to calculate, so I tried a simple approximation using area of covering triangle, distance between triangle centers, and incident angles. Then, using a comparison sort, order the triangles such that triangles covered the most are first in the mesh.

This algorithm is O(n^2) in CPU and O(n) in memory.

I am not sure whether my heuristic was that good, but this also produced somewhat weak results on large organic meshes.

Sort by Distance and Angle to Center

This is a cheapo algorithm that works pretty well on convex shapes. Sort all triangles by the dot product of the triangle’s normal vector and the vector from the mesh’s center to the triangle’s center.

It’s pretty easy to come up with cases where this algorithm will sort incorrectly, but for most convex meshes it does well. More importantly, errors occur across entire slabs, which looks better than errors showing individual triangles.

Jon Watte’s “Coverage Peeling” Algorithm

Jon’s original algorithm works by building a graph of which triangles are in front of others, just like the topological sort above. It also accumulates a per-triangle-pair coverage metric, and accumulates them into a per-triangle coverage sum.

Then it peels triangles off, one at a time, by finding the triangle with the lowest coverage value. After peeling a triangle off, it reduces coverage values by the corresponding amount.

This algorithm works well, even in the presence of triangle cycles, but it has a fairly high constant factor. However, it runs out of memory on large meshes where the topological sort doesn’t.

I still plan to see if I can get an implementation to complete on a 60,000 triangle mesh in a few seconds and under a few GB of memory.


Algorithm Time Complexity Memory Constant Factors Errors
Topological Sort O(n^2) O(n^2) Low Per-triangle
Coverage Sort O(n^2) O(n) Low Per-triangle
Distance and Angle Sort O(n lg n) O(n) Low Consistent regions of the mesh
Coverage Peeling O(n^2) O(n^2) High Per-triangle, but fewer than the other algorithms

View-Independent Translucent Triangle Sorting

Given a large, potentially self-intersecting, translucent, two-sided triangle mesh, we’d like to render it with as few visual artifacts as possible. The issue is that OpenGL and Direct3D will render triangles in exactly the order they’re given, and since alpha blending is not commutative, alpha-blended triangles must be rendered back to front.

My colleague, Jon Watte, pointed me to an article he’d written that describes an O(n^2) algorithm that splits all double-sided triangles into two, which are then sorted based on a coverage factor: how many other triangles “shadow” this one. Triangles that are farther back are sorted lower in the list than triangles in front of them. This algorithm is view-independent.

My understanding is that Torque Engine and Unreal Engine implement similar algorithms.

The idea works well in practice, but O(n^2) is too slow for large meshes, especially approaching the 100,000 triangle mark.

I set out to see if we could get similar results from an algorithm with subquadratic or even linear-ish running time.

The basic idea is, if we can define a strict weak ordering on triangles, then we can run a standard sort algorithm across the triangles and have a perfectly-ordered, view-independent, translucent mesh.

What does it mean to order two translucent triangles in a mesh?

bool compare(Triangle A, Triangle B) {
    a_behind_b = all points in A behind B
    b_behind_a = all points in B behind A
    if a_behind_b and b_behind_a: return false
    if !(a_behind_b or b_behind_a): return false
    if a_behind_b: return true
    if b_behind_a: return false
    // some kind of weighted area comparison

If two triangles are facing away from each other — that is, each is behind the other — then the order in which they’re rendered does not matter.

A = B.

If two triangles face each other, their order also does not matter.

Again, here, A = B.

If two triangles point the same direction, but triangle A is in front of triangle B, then A should always be rendered before B.

A < B.

If two triangles intersect, without splitting, there is no correct rendering order. We’d like to sort them by each’s coverage of the other, which can be calculated by each triangle’s plane with the other triangle.

In this case, A < B because B covers A more than A covers B.

So far we have a strict weak ordering. However, even ignoring the possibility of intersecting triangles, there’s a problem. Strict weak ordering implies and requires transitivity. (Indeed, that’s how O(n lg n) comparison sorting algorithms work.) However, consider the following three triangles, A, B, and C.

A > B
B > C
C > A

Without splitting, there is no ordering that can render this mesh correctly from all viewpoints.

Rats, this means our triangle comparison does not implement a strict weak ordering. What if we passed it to std::sort or std::stable_sort anyway? Undefined behavior, according to my reading of the C++ 2003 spec. But let’s say we implemented a merge sort without undefined behavior if given a non-strict-weak ordering. What’s the worst ordering that could happen?

Given that comparison sorts rely on transitivity, consider our failure case from before, this time with a few extra “easy” triangles (labeled X and Y) thrown in.

The sort may compare A>B and B>C, assuming “A>C”. Then, worst case, given any triangle X where X>A, “X>C” which could be incorrect. In addition, given any triangle Y where C>Y, “A>Y” which could also be incorrect. The circled Xs and Ys in the diagram above indicate incorrectly-ordered triangles if the sort assumes “A>C”. That is, even the smallest cycle could result in several triangles being sorted incorrectly. A possible sort order would be: YYYCBAXXX, even though, ideally, one of the Ys > A and and two of the Xs < C.

A body of research named consensus clustering or rank aggregation explores approximating the optimal ordering given conflicting constraints. The goal is to minimize conflicting orderings. An optimal solution is NP-complete, but I propose the following stochastic solution, where k and j are tunable constants:

for k iterations:
    shuffle triangle list
    merge sort
    pick j random pairs of triangles, count number of pairs where (second < first)
    keep ordering that minimizes incorrect orderings

This algorithm runs in O(k*(n+j)) — better than quadratic, which was the stated goal. Also, given sufficient iterations, stochastic algorithms tend to avoid worst-case scenarios, which is good enough here.

Now, if we could find cycles in subquadratic time, we could break them by selectively splitting triangles or by removing cycles entirely from the mesh and merging them separately, once the strict weak ordering on non-cycles is determined. Cycle detection is O(E+V) in general, but total ordering implies E=V^2.

Future work: Could we use the fact that some triangles can be rendered in any order relative to other triangles to optimize vertex cache hits?

Thanks to a bunch of people who explored this problem with me.

Extracting Color and Transparency from Flash

The original source of this post is at the IMVU engineering blog. Subscribe now!

For clarity, I slightly oversimplified my previous discussion on efficiently rendering Flash in a 3D scene. The sticky bit is extracting transparency information from the Flash framebuffer so we can composite the overlay into the scene.

Flash does not give you direct access to its framebuffer. It does, with IViewObject::Draw, allow you to composite the Flash framebuffer onto a DIB section of your choice.

Remembering your Porter-Duff, composition of source over dest is:

Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)

If the source color is premultiplied, you get:

Color = SourceColor + DestColor * (1 - SourceAlpha)

Assuming we want premultiplied color and alpha from Flash for efficient rendering in the 3D scene, applying the above requires solving for FlashAlpha and FlashColor:

RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor = FlashColor * FlashAlpha + DestColor - DestColor * FlashAlpha

RenderedColor - DestColor = FlashColor * FlashAlpha - DestColor * FlashAlpha

RenderedColor - DestColor = FlashAlpha * (FlashColor - DestColor)

FlashAlpha = (RenderedColor - DestColor) / (FlashColor - DestColor)

If FlashColor and DestColor are equal, then FlashAlpha is undefined. Intuitively, this makes sense. If you render a translucent black SWF on a black background, you can’t know the transparency data because all of the pixels are still black. This doesn’t matter, as I’ll show in a moment.

FlashColor is trivial:

RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor - DestColor * (1 - FlashAlpha) = FlashColor * FlashAlpha

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor is undefined if FlashAlpha is 0. Transparency has no color.

What do these equations give us? We know RenderedColor, since it’s the result of calling IViewObject::Draw. We have control over DestColor, since we configure the DIB Flash is drawn atop. What happens if we set DestColor to black (0)?

FlashColor = (RenderedColorOnBlack) / FlashAlpha

What happens if we set it to white (1)?

FlashColor = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

Now we’re getting somewhere! Since FlashColor and FlashAlpha are constant, we can define a relationship between FlashAlpha and RenderedColorOnBlack and RenderedColorOnWhite:

(RenderedColorOnBlack) / FlashAlpha = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

RenderedColorOnBlack = RenderedColorOnWhite - 1 + FlashAlpha

FlashAlpha = RenderedColorOnBlack - RenderedColorOnWhite + 1

FlashAlpha = RenderedColorOnWhite - RenderedColorOnBlack

So all we have to do is render the SWF on a white background and a black background and subtract the two to extract the alpha channel.

Now what about color? Just plug the calculated FlashAlpha into the following when rendering on black.

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor = RenderedColorOnBlack / FlashAlpha

Since we want premultiplied alpha:

FlashColor = RenderedColorOnBlack

Now that we know FlashColor and FlashAlpha for the overlay, we can copy it into a texture and render the scene!