<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chad Austin &#187; imvu</title>
	<atom:link href="http://chadaustin.me/tag/imvu/feed/" rel="self" type="application/rss+xml" />
	<link>http://chadaustin.me</link>
	<description></description>
	<lastBuildDate>Mon, 07 Nov 2011 21:24:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>High Order Bits (or: mostly correct but in the right direction)</title>
		<link>http://chadaustin.me/2011/10/high-order-bits-or-mostly-correct-but-in-the-right-direction/</link>
		<comments>http://chadaustin.me/2011/10/high-order-bits-or-mostly-correct-but-in-the-right-direction/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 09:17:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1672</guid>
		<description><![CDATA[First, the moral: Unit tests are good. But reliable design is better.

Even if you have to deal with short-term pain. Even if you haven’t figured out all of the edge cases.

Let me back up. I love automated tests. I’ve been test-driving code at IMVU since I started. We buy new engineers a copy of Test-Driven [...]]]></description>
			<content:encoded><![CDATA[<p>First, the moral: Unit tests are good. But reliable design is better.</p>

<p>Even if you have to deal with short-term pain. Even if you haven’t figured out all of the edge cases.</p>

<p>Let me back up. I love automated tests. I’ve been test-driving code at IMVU since I started. We buy new engineers a copy of Test-Driven Development: By Example. Whenever there is a bug, we write tests to make sure it never happens again.</p>

<p>After years of working this way, seeing projects succeed and fail, I’d like to refine my perspective.  Let me share a story.</p>

<p>IMVU was originally a bolt-on addition to AOL Instant Messenger.  Two IMVU clients communicated with each other by manipulating AOL IM’s UI and scanning the window for new text messages, much like a screen reader would.  This architecture propagated some implications through our entire codebase:</p>

<p>1) The messaging layer was inherently unreliable.  AOL IM chat windows could be manipulated by the user or other programs.  Thus, our chat protocol was built around eventual consistency.</p>

<p>2) We could not depend on an authoritative source of truth.  Since text-over-IM is peer-to-peer, no client has a true view into where all of the avatars are sitting or who currently owns the room.</p>

<p>Thus, in 2008, long after we’d dropped support for integration with third-party IM clients and replaced it with an authoritative state database, we continued to have severe state consistency bugs.  The client’s architecture still behaved as if the chat protocol were unreliable and state were peer-to-peer rather than authoritative.</p>

<p>To address these bugs, we wrote copious test coverage. These were deep tests: start up a local Apache and MySQL instance, connect a couple of ClientApp Python processes to them, have one invite the other to chat, and assert that their scene graphs were consistent.  We have dozens of these tests, for all kinds of edge cases.  And we thought we’d fixed the bugs for good&#8230;</p>

<p>But the bugs returned.  These tests are still running and passing, mind you, but differences in timing and sequencing result in the same state consistency issues we saw in 2008.  It’s clear that test coverage is not sufficient to prevent these types of bugs.</p>

<p>So what’s the other ingredient in reliable software?  I argue that, in agile software development, correct-by-design systems are underemphasized.</p>

<p>Doesn’t Test Driven Development guide me to build correct-by-design systems?</p>

<p>TDD prescribes a “red, green, refactor” rhythm, where you write a failing test, do the simplest work to make it pass, and then refactor the code so it’s high quality.  TDD helps you reach the “I haven’t seen it fail” stage, by verifying that yes, your code can pass these tests.  But just because you’ve written some tests doesn’t mean your code will always work.</p>

<p>So there’s another level of reliability: “I have considered the ways it can fail, but I can’t think of any.”  This statement is stronger, assuming you’re sufficiently imaginative.  Even still, you won’t think of everything, especially if you’re working at the edge of your human capacity (as you should be).</p>

<p>Nonetheless, thoughtfulness is better than nothing.  I recommend adding a fourth step to your TDD rhythm: “Red, green, refactor, what else could go wrong?” In that fourth step, you deeply examine the code and think of additional tests to write.</p>

<p>The strongest level of software correctness is not about finding possible failure conditions; it’s about proving that your system works in the presence of all inputs.  Correctness proofs for non-trivial algorithms are too challenging for all of the code we write, but in a critical subsystem like chat state management, the time spent on a lightweight proof will easily pay for itself.  Again, I’m not advocating that we always prove the correctness of our software, but we should at least generally be convinced of its correctness and investigate facts that indicate otherwise.  TDD by itself is not enough.</p>

<p>OK, so we can’t easily test-drive or refactor our way out of the chat system mess we got ourselves into, because it’s simply too flawed, so what can we do?  The solution is especially tricky, because in situations like this, there are always features that depend on subtleties of the poor design.  A rewrite would break those features, which is unacceptable, right?  Even if breaking those features is acceptable to the company, there are political challenges.  Imagine the look on your product owner’s face when you announce “Hey I have a new architecture that will break your feature but provide no customer benefit yet.”</p>

<p>The ancient saying “You can’t make an omelette without breaking some eggs” applies directly here.  Preserving 100% feature compatibility is less important than fixing deep flaws.</p>

<p>Why?  High-order bits are the hardest to change but, in the end, they are all that matter.  The low-order bits are easy to change, and any competent organization will fix the small things over time.</p>

<p>I can’t help but recall the original iPhone.  Everyone said “What?! No copy and paste?!”  Indeed, the iPhone couldn’t copy and paste until about two years and two major OS releases later.  Even still, the iPhone reshaped the mobile industry.  Clearly 100% feature compatibility is not a requirement for success.</p>

<p>My attitude towards unit testing has changed.  While I write, run, and love unit tests, I value correct-by-design subsystems even more.  When it comes down to it, tests are low-order bits compared to code that just doesn’t break.</p>

<p>For those curious, what are we doing about the chat system?  I’ll let <a href="http://www.slideshare.net/JonWatte/message-queuing-on-a-large-scale-imvus-stateful-realtime-message-queue">Jon’s GDC presentation</a> speak for itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2011/10/high-order-bits-or-mostly-correct-but-in-the-right-direction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tracing Leaks in Python: Find the Nearest Root</title>
		<link>http://chadaustin.me/2010/11/tracing-leaks-in-python-find-the-nearest-root/</link>
		<comments>http://chadaustin.me/2010/11/tracing-leaks-in-python-find-the-nearest-root/#comments</comments>
		<pubDate>Mon, 29 Nov 2010 08:20:37 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1602</guid>
		<description><![CDATA[Garbage Collection Doesn&#8217;t Mean You Can Ignore Memory Altogether&#8230;

This post is available on the IMVU Engineering Blog.

Garbage collection removes a great deal of burden from programming.  In fact, garbage collection is a critical language feature for all languages where abstractions such as functional closures or coroutines are common, as they frequently create reference cycles.

IMVU [...]]]></description>
			<content:encoded><![CDATA[<h2>Garbage Collection Doesn&#8217;t Mean You Can Ignore Memory Altogether&#8230;</h2>

<p>This post is available on the <a href="http://engineering.imvu.com/2010/11/29/tracing-leaks-in-python-find-the-nearest-root/">IMVU Engineering Blog</a>.</p>

<p>Garbage collection removes a great deal of burden from programming.  In fact, garbage collection is a critical language feature for all languages where abstractions such as functional closures or coroutines are common, as they frequently create reference cycles.</p>

<p>IMVU is a mix of C++ and Python.  The C++ code generally consists of small, cohesive objects with a clear ownership chain.  An Avatar SceneObject owns a ModelInstance which owns a set of Meshes which own Materials which own Textures and so on&#8230;  Since there are no cycles in this object graph, reference-counting with shared_ptr suffices.</p>

<p>The Python code, however, is full of messy object cycles.  An asynchronous operation may hold a reference to a Room, while the Room may be holding a reference to the asynchronous operation.  Often two related objects will be listening for events from the other.  While Python&#8217;s garbage collector will happily take care of cycles, it&#8217;s still possible to leak objects.</p>

<p>Imagine these scenarios:</p>

<ul>
<li>a leaked or living C++ object has a strong reference to a Python object.</li>
<li>a global cache has a reference to an instance&#8217;s bound method, which implicitly contains a reference to the instance.</li>
<li>two objects with __del__ methods participate in a cycle with each other, and Python <a href="http://docs.python.org/library/gc.html#gc.garbage">refuses to decide which should destruct first</a>.</li>
</ul>
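
<p>The bound-method scenario is easy to reproduce.  The following is a minimal sketch, not IMVU code: Room, on_event and callback_cache are hypothetical names, and a weak reference is used only to observe whether the instance is still alive.</p>

<pre>
import gc
import weakref

class Room(object):
    def on_event(self):
        return "handled"

# Hypothetical global cache of callbacks.
callback_cache = []

room = Room()
probe = weakref.ref(room)             # observes the Room without keeping it alive
callback_cache.append(room.on_event)  # the bound method refers to room via __self__

del room
gc.collect()
alive_while_cached = probe() is not None  # the cache still reaches the Room

del callback_cache[:]
gc.collect()
alive_after_clear = probe() is not None   # nothing refers to the Room any more
</pre>

<p>Under CPython, alive_while_cached ends up True and alive_after_clear ends up False: the Room stays alive for exactly as long as the cache holds its bound method.</p>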

<p>To detect these types of memory leaks, we use a LifeTimeMonitor utility:</p>

<pre>
a = SomeObject()
lm = LifeTimeMonitor(a)
del a
lm.assertDead() # succeeds

b = SomeObject()
lm = LifeTimeMonitor(b)
lm.assertDead() # raises ObjectNotDead
</pre>

<p>We use LifeTimeMonitor&#8217;s assertDead facility at key events, such as when a user closes a dialog box or 3D window.  Take 3D windows as an example.  Since they&#8217;re the root of an entire object subgraph, we would hate to inadvertently leak them.  LifeTimeMonitor&#8217;s assertDead prevents us from introducing an object leak.</p>
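
<p>The implementation of LifeTimeMonitor is not shown here.  One plausible way to build such a utility, assuming the monitored objects support weak references, is sketched below.  The class and method names match the usage above, but the body is an assumption, not IMVU&#8217;s actual code.</p>

<pre>
import gc
import weakref

class ObjectNotDead(Exception):
    pass

class LifeTimeMonitor(object):
    """Hypothetical sketch: watches an object without keeping it alive."""

    def __init__(self, obj):
        # A weak reference does not contribute to the refcount of obj.
        self._ref = weakref.ref(obj)

    def assertDead(self):
        # Give the cycle collector a chance to reclaim garbage cycles first.
        gc.collect()
        target = self._ref()
        if target is not None:
            description = repr(target)
            del target  # do not let the exception traceback pin the object
            raise ObjectNotDead(description)

class SomeObject(object):
    pass
</pre>

<p>Because the monitor holds only a weak reference, it never becomes the reason an object stays alive.</p>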

<p>It&#8217;s good to know that an object leaked, but how can you determine why it can&#8217;t be collected?</p>

<h2>Python&#8217;s Garbage Collection Algorithm</h2>

<p>Let&#8217;s go over the basics of automatic garbage collection.  In a garbage-collected system there are objects, and objects can reference each other.  Some objects are roots: they are always considered alive, and any object reachable from a root, directly or indirectly, cannot be collected.  Example roots are the stacks of live threads and the global module list.  The graph formed by objects and their references is the object graph.</p>

<p>In SpiderMonkey, Mozilla&#8217;s JavaScript engine, the root set is <a href="https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_AddRoot">explicitly-managed</a>.  SpiderMonkey&#8217;s GC traverses the object graph from the root set.  If the GC does not reach an object, that object is destroyed.  If C code creates a root object but fails to add it to the root set, it risks the GC deallocating the object while it&#8217;s still in use.</p>

<p>In Python however, the root set is implicit.  All Python objects are ref-counted, and any that can refer to other objects &#8212; and potentially participate in an object cycle &#8212; are added to a global list upon construction.  Each GC-tracked object can be queried for its referents.  Python&#8217;s root set is implicit because anyone can create a root simply by incrementing an object&#8217;s refcount.</p>

<p>Since Python&#8217;s root set is implicit, its garbage collection algorithm differs slightly from SpiderMonkey&#8217;s.  Python begins by setting GCRefs(o) to CurrentRefCount(o) for each GC-tracked PyObject o.  Then it traverses all referents r of all GC-tracked PyObjects and subtracts 1 from GCRefs(r).  After that pass, if GCRefs(o) is nonzero, o is held by a reference the collector cannot account for, and is thus a root.  Python traverses the object graph from the now-known root set and increments GCRefs(o) for every object it reaches.  Any object o still left with GCRefs(o) == 0 is unreachable and thus collectible.</p>
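
<p>The arithmetic is easier to follow on a toy object graph.  The sketch below is a simplified model, not CPython&#8217;s implementation: objects are plain names, refcounts maps each name to its total reference count, and referents maps each name to the names it refers to.  All of these names are made up for illustration.</p>

<pre>
def find_unreachable(refcounts, referents):
    """Return (roots, unreachable) for a toy object graph."""
    # Step 1: GCRefs(o) starts out as the current refcount of o.
    gc_refs = dict(refcounts)
    # Step 2: subtract one for every reference held by a tracked object.
    for owner in referents:
        for target in referents[owner]:
            gc_refs[target] -= 1
    # Step 3: a nonzero count means some untracked referrer exists: a root.
    roots = sorted(name for name in gc_refs if gc_refs[name] != 0)
    # Step 4: everything reachable from a root is alive.
    reachable = set()
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in reachable:
            continue
        reachable.add(name)
        pending.extend(referents[name])
    unreachable = sorted(name for name in gc_refs if name not in reachable)
    return roots, unreachable

# "module" has one reference from outside the tracked set (say, C code or a
# thread stack).  "a" and "b" refer only to each other: an orphaned cycle.
refcounts = {"module": 1, "room": 1, "a": 1, "b": 1}
referents = {"module": ["room"], "room": [], "a": ["b"], "b": ["a"]}
roots, unreachable = find_unreachable(refcounts, referents)
</pre>

<p>Here roots is ["module"] and unreachable is ["a", "b"]: the cycle accounts for all of its own references, so nothing outside it keeps it alive.</p>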

<h2>Finding a Path From the Nearest Root to the Leaked Object</h2>

<p>Now that we know how Python&#8217;s garbage collector works, we can ask it for its set of roots by calculating GCRefs(o) for all objects o in gc.get_objects().  Then we perform a breadth-first-search from the root set to the leaked object.  If the root set directly or indirectly refers to the leaked object, we return the path our search took.</p>
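
<p>As a rough illustration of the search itself, here is a breadth-first search over the same kind of toy graph (names standing in for objects, a dict standing in for tp_traverse).  The graph contents and function name are hypothetical; the C++ version further down does the equivalent work on real PyObjects.</p>

<pre>
from collections import deque

def path_from_nearest_root(roots, referents, leaked):
    """Return the shortest chain of names from any root to leaked, or None."""
    came_from = dict((root, None) for root in roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        if current == leaked:
            # Walk the backlinks to rebuild the path, root first.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            path.reverse()
            return path
        for target in referents.get(current, []):
            if target not in came_from:
                came_from[target] = current
                queue.append(target)
    return None

referents = {
    "module": ["cache", "window"],
    "cache": ["bound_method"],
    "bound_method": ["dialog"],
    "window": [],
    "dialog": [],
}
path = path_from_nearest_root(["module"], referents, "dialog")
</pre>

<p>The resulting path is ["module", "cache", "bound_method", "dialog"], which is the kind of chain that points straight at a forgotten cache entry.</p>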

<p>Sounds simple, but there&#8217;s a catch!  Imagine that the search function has signature:</p>

<pre>
PyObject* findPathToNearestRoot(PyObject* leakedObject);
</pre>

<p>leakedObject is a reference (incremented within Python&#8217;s function-call machinery itself) to the leaked object, making leakedObject a root!</p>

<p>To work around this, change findPathToNearestRoot so it accepts a singleton list containing a reference to the leaked object.  findPathToNearestRoot can borrow that reference and clear the list, ensuring that leakedObject has no untracked references.</p>

<p>findPathToNearestRoot will find paths to expected Python roots like thread entry points and module objects.  But, since it directly mirrors the behavior of Python&#8217;s GC, it will also find paths to leaked C references!  Obviously, it can&#8217;t directly point you to the C code that leaked the reference, but the reference path should be enough of a clue to figure it out.</p>

<h2>The Code</h2>

<pre>
template&lt;typename ArgType&gt;
void traverse(PyObject* o, int (*visit)(PyObject* visitee, ArgType* arg), ArgType* arg) {
    if (Py_TYPE(o)-&gt;tp_traverse) {
        Py_TYPE(o)-&gt;tp_traverse(o, (visitproc)visit, arg);
    }
}

typedef std::map&lt;PyObject*, int&gt; GCRefs;

static int subtractKnownReferences(PyObject* visitee, GCRefs* gcrefs) {
    if (gcrefs-&gt;count(visitee)) {
        Assert(PyObject_IS_GC(visitee));
        --(*gcrefs)[visitee];
    }
    return 0;
}

typedef int Backlink; // -1 = none

typedef std::vector&lt; std::pair&lt;Backlink, PyObject*&gt; &gt; ReferenceList;
struct Referents {
    std::set&lt;PyObject*&gt;&amp; seen;
    Backlink backlink;
    ReferenceList&amp; referenceList;
};

static int addReferents(PyObject* visitee, Referents* referents) {
    if (!referents-&gt;seen.count(visitee) &amp;&amp; PyObject_IS_GC(visitee)) {
        referents-&gt;referenceList.push_back(std::make_pair(referents-&gt;backlink, visitee));
    }
    return 0;
}

static Backlink findNextLevel(
    std::vector&lt;PyObject*&gt;&amp; chain,
    const ReferenceList&amp; roots,
    PyObject* goal,
    std::set&lt;PyObject*&gt;&amp; seen
) {
    if (roots.empty()) {
        return -1;
    }

    for (size_t i = 0; i &lt; roots.size(); ++i) {
        if (roots[i].first != -1) {
            if (goal == roots[i].second) {
                chain.push_back(goal);
                return roots[i].first;
            }
            seen.insert(roots[i].second);
        }
    }

    ReferenceList nextLevel;
    for (size_t i = 0; i &lt; roots.size(); ++i) {
        Referents referents = {seen, static_cast&lt;Backlink&gt;(i), nextLevel};
        traverse(roots[i].second, &amp;addReferents, &amp;referents);
    }

    Backlink backlink = findNextLevel(chain, nextLevel, goal, seen);
    if (backlink == -1) {
        return -1;
    }

    chain.push_back(roots[backlink].second);
    return roots[backlink].first;
}

static std::vector&lt;PyObject*&gt; findReferenceChain(
    const std::vector&lt;PyObject*&gt;&amp; roots,
    PyObject* goal
) {
    std::set&lt;PyObject*&gt; seen;
    ReferenceList unknownReferrer;
    for (size_t i = 0; i &lt; roots.size(); ++i) {
        unknownReferrer.push_back(std::make_pair&lt;Backlink&gt;(-1, roots[i]));
    }
    std::vector&lt;PyObject*&gt; rv;
    // going to return -1 no matter what: no backlink from roots
    findNextLevel(rv, unknownReferrer, goal, seen);
    return rv;
}

static object findPathToNearestRoot(const object&amp; o) {
    if (!PyList_Check(o.ptr()) || PyList_GET_SIZE(o.ptr()) != 1) {
        PyErr_SetString(PyExc_TypeError, "findPathToNearestRoot must take a list of length 1");
        throw_error_already_set();
    }

    // target = o.pop()
    object target(handle&lt;&gt;(borrowed(PyList_GET_ITEM(o.ptr(), 0))));
    if (-1 == PyList_SetSlice(o.ptr(), 0, 1, 0)) {
        throw_error_already_set();
    }

    object gc_module(handle&lt;&gt;(PyImport_ImportModule("gc")));
    object tracked_objects_list = gc_module.attr("get_objects")();
    // allocating the returned list may have run a GC, but tracked_objects_list itself won't be in the list

    std::vector&lt;PyObject*&gt; tracked_objects(len(tracked_objects_list));
    for (size_t i = 0; i &lt; tracked_objects.size(); ++i) {
        object to = tracked_objects_list[i];
        tracked_objects[i] = to.ptr();
    }
    tracked_objects_list = object();

    GCRefs gcrefs;
    
    // TODO: store allocation/gc count per generation

    for (size_t i = 0; i &lt; tracked_objects.size(); ++i) {
        gcrefs[tracked_objects[i]] = tracked_objects[i]-&gt;ob_refcnt;
    }

    for (size_t i = 0; i &lt; tracked_objects.size(); ++i) {
        traverse(tracked_objects[i], subtractKnownReferences, &amp;gcrefs);
    }

    // BFS time
    
    std::vector&lt;PyObject*&gt; roots;
    for (GCRefs::const_iterator i = gcrefs.begin(); i != gcrefs.end(); ++i) {
        if (i-&gt;second &amp;&amp; i-&gt;first != target.ptr()) { // Don't count the target as a root.
            roots.push_back(i-&gt;first);
        }
    }
    std::vector&lt;PyObject*&gt; chain = findReferenceChain(roots, target.ptr());

    // TODO: assert that allocation/gc count per generation didn't change

    list rv;
    for (size_t i = 0; i &lt; chain.size(); ++i) {
        rv.append(object(handle&lt;&gt;(borrowed(chain[i]))));
    }

    return rv;
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/11/tracing-leaks-in-python-find-the-nearest-root/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Write an Interactive, 60 Hz Desktop Application</title>
		<link>http://chadaustin.me/2010/11/how-to-write-an-interactive-60-hz-desktop-application/</link>
		<comments>http://chadaustin.me/2010/11/how-to-write-an-interactive-60-hz-desktop-application/#comments</comments>
		<pubDate>Wed, 24 Nov 2010 11:17:28 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1592</guid>
		<description><![CDATA[This post is available on the IMVU Engineering Blog.

IMVU&#8217;s client application doesn&#8217;t fit neatly into a single development paradigm:


IMVU is a Windows desktop application.  Mouse clicks, window resizes, and dialog boxes must all respond with imperceptible latency.  Running IMVU should not significantly affect laptop battery life.
IMVU is an interactive 3D game.  The [...]]]></description>
			<content:encoded><![CDATA[<p>This post is available on the <a href="http://engineering.imvu.com/2010/11/24/how-to-write-an-interactive-60-hz-desktop-application/">IMVU Engineering Blog</a>.</p>

<p>IMVU&#8217;s client application doesn&#8217;t fit neatly into a single development paradigm:</p>

<ul>
<li>IMVU is a Windows desktop application.  Mouse clicks, window resizes, and dialog boxes must all respond with imperceptible latency.  Running IMVU should not significantly affect laptop battery life.</li>
<li>IMVU is an interactive 3D game.  The 3D scene must be simulated and drawn at smooth, interactive frame rates, 60 Hz if possible.</li>
<li>IMVU is a networked application.  Sending and receiving network packets must happen quickly and the UI should never have to wait for I/O.</li>
</ul>

<p>Thus, let us clarify some specific requirements:</p>

<ul>
<li>Minimal CPU usage (and thus battery consumption) when the application is minimized or obscured.</li>
<li>Minimal CPU usage in low-complexity scenes.  Unlike most games, IMVU must never unnecessarily consume battery life while waiting in spin loops.</li>
<li>Animation must continue while modal dialog boxes and menus are visible.  You don&#8217;t have control over these modal event loops, but it looks terrible if animation pauses while menus and dialogs are visible.</li>
<li>Animation must be accurate and precise.  It looks much better if every frame takes 22 milliseconds (45 Hz) than if some frames take 30 milliseconds and some take 15 milliseconds (averaging 45 Hz).</li>
<li>Animation must degrade gracefully.  In a really complex room with a dozen avatars, IMVU can easily spend all of a core&#8217;s CPU trying to animate the scene.  In this case, the frame rate should gradually drop while the application remains responsive to mouse clicks and other input events.</li>
<li>Support for Windows XP, Vista, and 7.</li>
</ul>
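
<p>The pacing requirement comes down to simple arithmetic: two frame schedules can have nearly the same average rate and still look very different.  A small sketch (the function name is made up for illustration):</p>

<pre>
def pacing(frame_times_ms):
    """Return (average rate in Hz, worst deviation from the mean in ms)."""
    mean = sum(frame_times_ms) / float(len(frame_times_ms))
    average_hz = 1000.0 / mean
    jitter_ms = max(abs(t - mean) for t in frame_times_ms)
    return average_hz, jitter_ms

steady_hz, steady_jitter = pacing([22, 22, 22, 22])
uneven_hz, uneven_jitter = pacing([30, 15, 30, 15])
</pre>

<p>The steady schedule runs at roughly 45.5 Hz with no jitter.  The uneven one averages roughly 44.4 Hz, but every frame lands 7.5 ms away from the mean, and that unevenness is what the eye picks up.</p>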

<h2>Naive Approach #1</h2>

<p>Windows applications typically have a main loop that looks something like:</p>

<pre>
MSG msg;
while (GetMessage(&amp;msg, 0, 0, 0) &gt; 0) {
    TranslateMessage(&amp;msg);
    DispatchMessage(&amp;msg);
}
</pre>

<h3>What went wrong</h3>

<p>Using <a href="http://msdn.microsoft.com/en-us/library/ms644906(VS.85).aspx">SetTimer/WM_TIMER</a> sounds like a good idea for simulation and painting, but it&#8217;s way <a href="http://www.virtualdub.org/blog/pivot/entry.php?id=272">too imprecise</a> for interactive applications.</p>

<h2>Naive Approach #2</h2>

<p>Games typically have a main loop that looks something like the following:</p>

<pre>
while (running) {
    // process input events
    MSG msg;
    while (PeekMessage(&amp;msg, 0, 0, 0, PM_REMOVE)) {
        TranslateMessage(&amp;msg);
        DispatchMessage(&amp;msg);
    }

    if (frame_interval_has_elapsed) {
        simulate_world();
        paint();
    }
}
</pre>

<h3>What went wrong</h3>

<p>The above loop never sleeps, draining the user&#8217;s battery and burning her legs.</p>

<h2>Clever Approach #1: Standard Event Loop + timeSetEvent</h2>

<pre>
void runMainLoop() {
    MSG msg;
    while (GetMessage(&amp;msg, 0, 0, 0) &gt; 0) {
        TranslateMessage(&amp;msg);
        DispatchMessage(&amp;msg);
    }
}

void customWindowProc(...) {
    if (message == timerMessage) {
        simulate();
        // schedules paint with InvalidateRect
    }
}

void CALLBACK TimerProc(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR) {
    if (0 == InterlockedExchange(&amp;inFlight, 1)) {
        PostMessage(frameTimerWindow, timerMessage, 0, 0);
    }
}

void startFrameTimer() {
    RegisterClass(customWindowProc, ...);
    frameTimerWindow = CreateWindow(...);
    timeSetEvent(FRAME_INTERVAL, 0, &amp;TimerProc, 0, TIME_PERIODIC);
}
</pre>

<h3>What went wrong</h3>

<p>The main loop&#8217;s GetMessage call always returns messages in a priority order.  Slightly oversimplified, posted messages come first, then WM_PAINT messages, then WM_TIMER.  Since timerMessage is a normal message, it will preempt any scheduled paints.  This would be fine for us, since simulations are cheap, but the dealbreaker is that if we fail to maintain frame rate, WM_TIMER messages are entirely starved.  This violates our graceful degradation requirement.  When frame rate begins to degrade, code dependent on WM_TIMER shouldn&#8217;t stop entirely.</p>

<p>Even worse, the modal dialog loop has a freaky historic detail.  It waits for the message queue to be empty <a href="http://blogs.msdn.com/b/oldnewthing/archive/2004/03/11/87941.aspx">before displaying modal dialogs</a>.  When painting can&#8217;t keep up, modal dialogs simply don&#8217;t appear.</p>

<p>We tried a bunch of variations, setting flags when stepping or painting, but they all had critical flaws.  Some continued to starve timers and dialog boxes and some degraded by ping-ponging between 30 Hz and 15 Hz, which looked terrible.</p>

<h2>Clever Approach #2: PostThreadMessage + WM_ENTERIDLE</h2>

<p>A standard message loop didn&#8217;t seem to be getting us anywhere, so we changed our timeSetEvent callback to PostThreadMessage a custom message to the main loop, which knew how to handle it.  Messages sent via PostThreadMessage don&#8217;t go to a window, so the event loop needs to process them directly.  Since DialogBox and TrackPopupMenu modal loops won&#8217;t understand this custom message, we fell back on a different mechanism for them.</p>

<p>DialogBox and TrackPopupMenu send WM_ENTERIDLE to their owning windows.  Any window in IMVU that can host a dialog box or popup menu handles WM_ENTERIDLE by notifying a global idle handler, which can decide to schedule a new frame immediately or in N milliseconds, depending on how much time has elapsed.</p>

<h3>What went wrong</h3>

<p>So close!  In our testing under realistic workloads, timeSetEvent had horrible pauses and jitter.  Sometimes the multimedia thread would go 250 ms between notifications.  Otherwise, the custom event loop + WM_ENTERIDLE approach seemed sound.  I tried timeSetEvent with several flags, but they all had accuracy and precision problems.</p>

<h2>What Finally Worked</h2>

<p>Finally, we settled on MsgWaitForMultipleObjects with a calculated timeout.</p>

<p>Assuming the existence of a FrameTimeoutCalculator object which returns the number of milliseconds until the next frame:</p>

<pre>
int runApp() {
    FrameTimeoutCalculator ftc;

    for (;;) {
        const DWORD timeout = ftc.getTimeout();
        DWORD result = (timeout
            ? MsgWaitForMultipleObjects(0, 0, TRUE, timeout, QS_ALLEVENTS)
            : WAIT_TIMEOUT);
        if (result == WAIT_TIMEOUT) {
            simulate();
            ftc.step();
        }

        MSG msg;
        while (PeekMessage(&amp;msg, 0, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) {
                return static_cast&lt;int&gt;(msg.wParam);
            }

            TranslateMessage(&amp;msg);
            DispatchMessage(&amp;msg);
        }
    }
}
</pre>
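
<p>FrameTimeoutCalculator itself is not shown.  For illustration, here is one way such an object could behave, sketched in Python for brevity rather than in the C++ of the loop above.  The method names come from the loop; the constructor parameters, the injectable clock and the catch-up policy are assumptions.</p>

<pre>
import time

def now_ms():
    return time.monotonic() * 1000.0

class FrameTimeoutCalculator(object):
    """Hypothetical sketch: tracks when the next frame is due."""

    def __init__(self, frame_interval_ms=1000.0 / 60.0, clock=now_ms):
        self._interval = frame_interval_ms
        self._clock = clock
        self._next_frame = clock() + frame_interval_ms

    def getTimeout(self):
        # Whole milliseconds until the next frame; 0 means "simulate now".
        remaining = self._next_frame - self._clock()
        return max(0, int(remaining))

    def step(self):
        # Schedule the following frame.  If we have fallen behind, restart
        # from the current time instead of trying to catch up, so a slow
        # frame lowers the rate without causing a burst afterwards.
        self._next_frame = max(self._next_frame + self._interval, self._clock())
</pre>

<p>With this policy a scene that takes 40 ms per frame simply runs back to back at 25 Hz, while a cheap scene sleeps in the wait call for most of each 16 ms interval.</p>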

<h3>Well, what about modal dialogs?</h3>

<p>Since we rely on a custom message loop to animate 3D scenes, how do we handle standard message loops such as the modal DialogBox and TrackPopupMenu calls?  Fortunately, DialogBox and TrackPopupMenu provide us with the hook required to implement frame updates: <a href="http://msdn.microsoft.com/en-us/library/ms645422(VS.85).aspx">WM_ENTERIDLE</a>.</p>

<p>When the standard DialogBox and TrackPopupMenu modal message loops go idle, they send their parent window a WM_ENTERIDLE message.  Upon receiving WM_ENTERIDLE, the parent window determines whether it&#8217;s time to render a new frame.  If so, we animate all visible 3D windows, which will trigger a WM_PAINT, which triggers a subsequent WM_ENTERIDLE.</p>

<p>On the other hand, if it&#8217;s not time to render a new frame, we call timeSetEvent with TIME_ONESHOT to schedule a frame update in the future.</p>

<p>As we saw previously, timeSetEvent isn&#8217;t as reliable as a custom loop using MsgWaitForMultipleObjects, but if a modal dialog or popup menu is visible, the user probably isn&#8217;t paying very close attention anyway.  All that matters is that the UI remains responsive and animation continues while modal loops are open.  Code follows:</p>

<pre>
LRESULT CALLBACK ModalFrameSchedulerWndProc(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (message == idleMessage) {
        stepFrame();
    }
    return DefWindowProc(hwnd, message, wparam, lparam);
}

struct AlmostMSG {
    HWND hwnd;
    UINT message;
    WPARAM wparam;
    LPARAM lparam;
};

void CALLBACK timeForPost(UINT, UINT, DWORD_PTR user_data, DWORD_PTR, DWORD_PTR) {
    AlmostMSG* msg = reinterpret_cast&lt;AlmostMSG*&gt;(user_data);
    PostMessage(msg-&gt;hwnd, msg-&gt;message, msg-&gt;wparam, msg-&gt;lparam);
    delete msg;
}

void PostMessageIn(DWORD timeout, HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (timeout) {
        AlmostMSG* msg = new AlmostMSG;
        msg-&gt;hwnd = hwnd;
        msg-&gt;message = message;
        msg-&gt;wparam = wparam;
        msg-&gt;lparam = lparam;
        timeSetEvent(timeout, 1, timeForPost, reinterpret_cast&lt;DWORD_PTR&gt;(msg), TIME_ONESHOT | TIME_CALLBACK_FUNCTION);
    } else {
        PostMessage(hwnd, message, wparam, lparam);
    }
}

class ModalFrameScheduler : public IFrameListener {
public:
    ModalFrameScheduler() { stepping = false; }

    // Call when WM_ENTERIDLE is received.
    void onIdle() {
        if (!frameListenerWindow) {
            idleMessage = RegisterWindowMessageW(L"IMVU_ScheduleFrame");
            Assert(idleMessage);

            // Use the wide-character variants to match the L"" strings and CreateWindowW.
            WNDCLASSW wc;
            ZeroMemory(&amp;wc, sizeof(wc));
            wc.hInstance = GetModuleHandle(0);
            wc.lpfnWndProc = ModalFrameSchedulerWndProc;
            wc.lpszClassName = L"IMVUModalFrameScheduler";
            RegisterClassW(&amp;wc);

            frameListenerWindow = CreateWindowW(
                L"IMVUModalFrameScheduler",
                L"IMVUModalFrameScheduler",
                0, 0, 0, 0, 0, 0, 0,
                GetModuleHandle(0), 0);
            Assert(frameListenerWindow);
        }

        if (!stepping) {
            const unsigned timeout = ftc.getTimeout();
            stepping = true;
            PostMessageIn(timeout, frameListenerWindow, idleMessage, 0, 0);
            ftc.step();
        }
    }
    void step() { stepping = false; }

private:
    bool stepping;
    FrameTimeoutCalculator ftc;
};
</pre>

<h2>How has it worked out?</h2>

<p>A custom message loop and WM_ENTERIDLE together meet all of the goals we laid out:</p>

<ul>
<li>No unnecessary polling, and thus increased battery life and performance.</li>
<li>When possible, the 3D windows animate at 60 Hz.</li>
<li>Even degradation.  If painting a frame takes 40 ms, the frame rate will drop from 60 Hz to 25 Hz, not from 60 Hz to 15 Hz, as some of the implementations did.</li>
<li>Animations continue to play, even while modal dialogs and popup menus are visible.</li>
<li>This code runs well on XP, Vista, and Windows 7.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/11/how-to-write-an-interactive-60-hz-desktop-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>At Least Interview (or: How I Ended Up at IMVU)</title>
		<link>http://chadaustin.me/2010/08/at-least-interview-or-how-i-ended-up-at-imvu/</link>
		<comments>http://chadaustin.me/2010/08/at-least-interview-or-how-i-ended-up-at-imvu/#comments</comments>
		<pubDate>Tue, 17 Aug 2010 08:51:43 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[entrepreneurship]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1588</guid>
		<description><![CDATA[Recent conversations have pointed out my career philosophy isn&#8217;t as obvious as I thought.  Thus, I&#8217;d like to share the story of how I joined IMVU and what it means to me and to those I interview.

Why Won&#8217;t You Interview?

You wouldn&#8217;t believe the number of times I&#8217;ve tried and failed to get somebody to [...]]]></description>
			<content:encoded><![CDATA[<p>Recent conversations have pointed out my career philosophy isn&#8217;t as obvious as I thought.  Thus, I&#8217;d like to share the story of how I joined IMVU and what it means to me and to those I interview.</p>

<h2>Why Won&#8217;t You Interview?</h2>

<p>You wouldn&#8217;t believe the number of times I&#8217;ve tried and failed to get somebody to take a weekend and fly out to IMVU for an interview.  I don&#8217;t understand: we&#8217;ll happily pay you and your significant other to spend a vacation in San Francisco for the small price of a day&#8217;s interview.</p>

<p>There are three possible outcomes:</p>

<ol>
<li>We make you an offer and you accept.</li>
<li>We make you an offer and you decline.</li>
<li>We don&#8217;t make you an offer.</li>
</ol>

<p>What&#8217;s the worst that could happen?  Maybe you&#8217;ll be forced to actually decide whether IMVU is the right home for you.  Maybe IMVU won&#8217;t be a fit and you&#8217;ll feel a little worse for it.</p>

<p>Either way you&#8217;ll have a better sense of yourself and maybe you&#8217;ll have stumbled upon a more fulfilling life.  Plus you&#8217;ll have a free vacation!</p>

<h2>I Could Have Joined IMVU 9 Months Earlier</h2>

<p>I was halfway through my graduate degree at Iowa State, implementing a <a href="http://chadaustin.me/hci_portfolio/">functional GPU language</a>.  I figured I was headed towards a job working on concurrent languages at Microsoft Research or something.  Indeed, that would have been fine!  I&#8217;m still glad concurrent programming languages aren&#8217;t a solved problem &#8211; I can still fantasize about someday contributing to the field.</p>

<p>On July 2nd, 2004 (my birthday!), a guy named Eric Ries e-mailed me out of the blue: &#8220;Are you the same Chad Austin from the boost and cal3d mailing lists?  Interested in some contract work?&#8221;  He was working on some wack AOL Instant Messenger add-on that used BitTorrent as its installer and had a hideous website, so I wasn&#8217;t terribly interested.  He persisted, and by GDC 2005, he had convinced me to come interview.</p>

<p>Once I met the founding team, I came to a few conclusions:</p>

<ol>
<li>IMVU&#8217;s founders were <em>smart</em>.  I&#8217;d be silly not to work with them.</li>
<li>Coming from graduate school, I didn&#8217;t expect much of a salary, so I could take a bunch of stock in exchange.</li>
<li>If IMVU succeeded, win!</li>
<li>If IMVU failed, at least I&#8217;d learn a lot.</li>
</ol>

<p>I wasn&#8217;t super excited about the product at first, but IMVU&#8217;s founders convinced me to give them a shot, and it was definitely the right decision.</p>

<h2>How I &#8220;Sell&#8221; to Candidates</h2>

<p>When I interview candidates, I truly believe that IMVU is a great opportunity.  If the candidate is hesitant about committing to such a huge life change, I understand.  Moving across the country and taking a new job is a gigantic personal decision, and I can&#8217;t make that decision for them.</p>

<p>I never aggressively push IMVU, but I do my best to provide the data necessary to make the right decision.  &#8220;I&#8217;ve been here a while.  What information do you need to know whether IMVU is right for you?&#8221;  I like to believe honesty is as effective as aggressive salesmanship.  :)</p>

<h2>What This Means</h2>

<p>I heartily endorse the <a href="http://techcrunch.com/2009/08/05/other-companies-should-have-to-read-this-internal-netflix-presentation/">philosophy espoused by Netflix</a>: periodically reconsider your place in the world.  I&#8217;d be a hypocrite if I said otherwise.</p>

<p>That said, I think our culture overvalues salary.  Money is but one (uncorrelated?) component of our motivation.  Since humans are notoriously bad at predicting what makes us happy, it&#8217;s critical that we weigh facets such as personal freedom, our colleagues, social context, future opportunities, and how our work fits into our personal narratives.</p>

<p>We once tried to hire a frighteningly smart man away from Google.  He interviewed but declined our generous offer, saying that his entire social life was tied into Google.  In hindsight, the sacrifice we asked of him is clear, and I respect his decision.</p>

<p>In short, stay open-minded, but consciously consider what makes you happy.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/08/at-least-interview-or-how-i-ended-up-at-imvu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting Color and Transparency from Flash</title>
		<link>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/</link>
		<comments>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 19:29:20 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1575</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

For clarity, I slightly oversimplified my previous discussion on efficiently rendering Flash in a 3D scene.  The sticky bit is extracting transparency information from the Flash framebuffer so we can  composite the overlay into the scene.

Flash does not give [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/extracting-color-and-transparency-from-flash-2/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p>For clarity, I slightly oversimplified my previous discussion on <a href="http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/">efficiently rendering Flash in a 3D scene</a>.  The sticky bit is extracting transparency information from the Flash framebuffer so we can  composite the overlay into the scene.</p>

<p>Flash does not give you direct access to its framebuffer.  It does, with IViewObject::Draw, allow you to composite the Flash framebuffer onto a DIB section of your choice.</p>

<p>Remembering your <a href="http://keithp.com/~keithp/porterduff/p253-porter.pdf">Porter-Duff</a>, composition of source over dest is:</p>

<pre>Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)</pre>

<p>If the source color is premultiplied, you get:</p>

<pre>Color = SourceColor + DestColor * (1 - SourceAlpha)</pre>
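<p>As a rough illustration of the two forms above, here is a minimal C++ sketch for a single color channel in the range [0, 1].  The <code>Pixel</code> struct and the function names are hypothetical, chosen only for this example.</p>

```cpp
#include <cassert>
#include <cmath>

// One color channel plus alpha, both in [0, 1].
// Hypothetical type for illustration; not from the IMVU client.
struct Pixel {
    float color;
    float alpha;
};

// Source-over with straight (non-premultiplied) source color:
// Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)
float overStraight(Pixel source, float destColor) {
    return source.color * source.alpha + destColor * (1.0f - source.alpha);
}

// Source-over when source.color already holds SourceColor * SourceAlpha:
// Color = SourceColor + DestColor * (1 - SourceAlpha)
float overPremultiplied(Pixel source, float destColor) {
    return source.color + destColor * (1.0f - source.alpha);
}

// Convert a straight-alpha pixel to premultiplied form.
Pixel premultiply(Pixel p) {
    return Pixel{p.color * p.alpha, p.alpha};
}
```

<p>Both paths produce the same composite; the premultiplied form just moves one multiply out of the blend and into the moment the pixel is produced.</p>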

<p>Assuming we want premultiplied color and alpha from Flash for efficient rendering in the 3D scene, applying the above requires solving for FlashAlpha and FlashColor:</p>

<pre>
RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor = FlashColor * FlashAlpha + DestColor - DestColor * FlashAlpha

RenderedColor - DestColor = FlashColor * FlashAlpha - DestColor * FlashAlpha

RenderedColor - DestColor = FlashAlpha * (FlashColor - DestColor)

FlashAlpha = (RenderedColor - DestColor) / (FlashColor - DestColor)
</pre>

<p>If FlashColor and DestColor are equal, then FlashAlpha is undefined.  Intuitively, this makes sense.  If you render a translucent black SWF on a black background, you can&#8217;t know the transparency data because all of the pixels are still black.  This doesn&#8217;t matter, as I&#8217;ll show in a moment.</p>

<p>FlashColor is trivial:</p>

<pre>
RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor - DestColor * (1 - FlashAlpha) = FlashColor * FlashAlpha

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha
</pre>

<p>FlashColor is undefined if FlashAlpha is 0.  Transparency has no color.</p>

<p>What do these equations give us?  We know RenderedColor, since it&#8217;s the result of calling IViewObject::Draw.  We have control over DestColor, since we configure the DIB Flash is drawn atop.  What happens if we set DestColor to black (0)?</p>

<pre>FlashColor = (RenderedColorOnBlack) / FlashAlpha</pre>

<p>What happens if we set it to white (1)?</p>

<pre>FlashColor = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha</pre>

<p>Now we&#8217;re getting somewhere!  Since FlashColor and FlashAlpha are constant, we can define a relationship between FlashAlpha and RenderedColorOnBlack and RenderedColorOnWhite:</p>

<pre>
(RenderedColorOnBlack) / FlashAlpha = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

RenderedColorOnBlack = RenderedColorOnWhite - 1 + FlashAlpha

FlashAlpha = RenderedColorOnBlack - RenderedColorOnWhite + 1

FlashAlpha = 1 - (RenderedColorOnWhite - RenderedColorOnBlack)
</pre>

<p>So all we have to do is render the SWF on a white background and a black background and subtract the two: the difference is the transparency, and one minus the difference is the alpha channel.</p>

<p>Now what about color?  Just plug the calculated FlashAlpha into the following when rendering on black.</p>

<pre>
FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor = RenderedColorOnBlack / FlashAlpha
</pre>

<p>Since we want premultiplied alpha:</p>

<pre>FlashColor = RenderedColorOnBlack</pre>

<p>Now that we know FlashColor and FlashAlpha for the overlay, we can copy it into a texture and render the scene!</p>
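<p>The derivation can be sketched in C++ roughly as follows.  This is an illustration under stated assumptions, not the IMVU implementation: pixels are assumed to be packed as 0xAARRGGBB with 8 bits per channel, alpha is estimated from the green channel only, and the names <code>extractPremultiplied</code> and <code>extractOverlay</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: 0xAARRGGBB. The alpha byte of both inputs is ignored,
// because the two captured renders carry no usable alpha of their own.
std::uint32_t extractPremultiplied(std::uint32_t onBlack, std::uint32_t onWhite) {
    // Use the green channel to estimate alpha (an arbitrary choice for this sketch).
    const int black = static_cast<int>((onBlack >> 8) & 0xFF);
    const int white = static_cast<int>((onWhite >> 8) & 0xFF);
    // FlashAlpha = RenderedColorOnBlack - RenderedColorOnWhite + 1, scaled to 0..255.
    const int alpha = std::min(255, std::max(0, black - white + 255));
    // Premultiplied FlashColor = RenderedColorOnBlack.
    return (static_cast<std::uint32_t>(alpha) << 24) | (onBlack & 0x00FFFFFFu);
}

// Apply the per-pixel extraction to two equally sized captures.
std::vector<std::uint32_t> extractOverlay(const std::vector<std::uint32_t>& onBlack,
                                          const std::vector<std::uint32_t>& onWhite) {
    assert(onBlack.size() == onWhite.size());
    std::vector<std::uint32_t> result(onBlack.size());
    for (std::size_t i = 0; i < onBlack.size(); ++i) {
        result[i] = extractPremultiplied(onBlack[i], onWhite[i]);
    }
    return result;
}
```

<p>A fully opaque pixel renders identically on both backgrounds and comes out with alpha 255; a fully transparent pixel renders as pure black and pure white and comes out as 0.</p>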
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Efficiently Rendering Flash in a 3D Scene</title>
		<link>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/</link>
		<comments>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 09:12:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1561</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

Last time, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed.  This time, I&#8217;ll discuss efficient rendering into a 3D scene.

Rendering Flash as a 3D Overlay (The Naive Way)

At first blush, rendering [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/efficiently-rendering-flash-in-a-3d-scene/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p><a href="http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/">Last time</a>, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed.  This time, I&#8217;ll discuss efficient rendering into a 3D scene.</p>

<h2>Rendering Flash as a 3D Overlay (The Naive Way)</h2>

<p>At first blush, rendering Flash on top of a 3D scene sounds easy.  Every frame:</p>

<ol>
<li>Create a <a href="http://msdn.microsoft.com/en-us/library/dd183494(VS.85).aspx">DIB section</a> the size of your 3D viewport</li>
<li>Render Flash into the DIB section with <a href="http://msdn.microsoft.com/en-us/library/ms688655(VS.85).aspx">IViewObject::Draw</a></li>
<li>Copy the DIB section into an <a href="http://msdn.microsoft.com/en-us/library/bb205909(VS.85).aspx">IDirect3DTexture9</a></li>
<li>Render the texture on top of the scene</li>
</ol>

<div id="attachment_1562" class="wp-caption aligncenter" style="width: 257px"><a href="http://aegisknight.org/wp-uploads/Flash-Rendering.png"><img src="http://aegisknight.org/wp-uploads/Flash-Rendering.png" alt="" title="Naive Flash Rendering" width="247" height="524" class="size-full wp-image-1562" /></a><p class="wp-caption-text">Naive Flash Rendering</p></div>

<p>Ta da!  But your frame rate dropped to 2 frames per second?  Ouch.  It turns out this implementation is horribly slow.  There are a few reasons.</p>

<p>First, asking the Adobe Flash player to render into a DIB isn&#8217;t a cheap operation.  In our measurements, drawing even a simple SWF takes on the order of 10 milliseconds.  Since most UI doesn&#8217;t animate every frame, we should be able to cache the captured framebuffer.</p>

<p>Second, main memory and graphics memory are on different components in your computer.  You want to avoid wasting time and bus traffic by unnecessarily copying data from the CPU to the GPU every frame.  If only the lower-right corner of a SWF changes, we should limit our memory copies to that region.</p>

<p>Third, modern GPUs are fast, but not everyone has them.  Let&#8217;s say you have a giant mostly-empty SWF and want to render it on top of your 3D scene.  On slower GPUs, it would be ideal if you could limit your texture draws to the regions of the screen that are non-transparent.</p>
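<p>To make the second point concrete, here is a minimal C++ sketch of copying only a changed rectangle between two same-sized pixel buffers.  It stands in for the CPU-to-GPU upload; the <code>DirtyRect</code> type and <code>copyDirtyRect</code> function are hypothetical names, and a real client would write into a locked texture rather than a <code>std::vector</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Half-open rectangle in pixels: [left, right) x [top, bottom).
// Hypothetical type for this sketch.
struct DirtyRect {
    int left, top, right, bottom;
};

// Copy only the rows and columns covered by `dirty` from src to dst.
// Both buffers are `width` pixels wide and are assumed to contain the rectangle.
void copyDirtyRect(const std::vector<std::uint32_t>& src,
                   std::vector<std::uint32_t>& dst,
                   int width,
                   DirtyRect dirty) {
    if (dirty.right <= dirty.left || dirty.bottom <= dirty.top) {
        return;  // nothing changed, nothing to copy
    }
    const std::size_t rowBytes =
        static_cast<std::size_t>(dirty.right - dirty.left) * sizeof(std::uint32_t);
    for (int y = dirty.top; y < dirty.bottom; ++y) {
        const std::size_t offset =
            static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
            static_cast<std::size_t>(dirty.left);
        std::memcpy(dst.data() + offset, src.data() + offset, rowBytes);
    }
}
```

<p>For a small rectangle in the corner of a large overlay, this touches a tiny fraction of the bytes a full-frame copy would.</p>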

<h2>Rendering Flash as a 3D Overlay (The Fast Way)</h2>

<p>Disclaimer: I can&#8217;t take credit for these algorithms.  They were jointly developed over years by many smart engineers at IMVU.</p>

<p>First, let&#8217;s reduce an embedded Flash player to its principles:</p>

<ul>
<li>Flash exposes an IShockwaveFlash interface through which you can load and play movies.</li>
<li>Flash maintains its own frame buffer.  You can read these pixels with IViewObject::Draw.</li>
<li>When a SWF updates regions of the frame buffer, it notifies you through IOleInPlaceSiteWindowless::InvalidateRect.</li>
</ul>

<p>In addition, we&#8217;d like the Flash overlay system to fit within these performance constraints:</p>

<ul>
<li>Each SWF is rendered over the entire window.  For example, implementing a ball that bounces around the screen or a draggable UI component should not require any special IMVU APIs.</li>
<li>If a SWF is not animating, we do not copy its pixels to the GPU every frame.</li>
<li>We do not render the overlay in transparent regions.  That is, if no Flash content is visible, rendering is free.</li>
<li>Memory consumption (ignoring memory used by individual SWFs) for the overlay is O(framebuffer), not O(framebuffer * SWFs).  That is, loading three SWFs should not require allocation of three screen-sized textures.</li>
<li>If Flash notifies us of multiple changed regions per frame, only call IViewObject::Draw once.</li>
</ul>

<p>Without further ado, let&#8217;s look at the fast algorithm:</p>

<div id="attachment_1564" class="wp-caption aligncenter" style="width: 573px"><a href="http://aegisknight.org/wp-uploads/Fast-Flash-Rendering.png"><img src="http://aegisknight.org/wp-uploads/Fast-Flash-Rendering.png" alt="" title="Fast Flash Rendering" width="563" height="808" class="size-full wp-image-1564" /></a><p class="wp-caption-text">Fast Flash Rendering</p></div>

<p>Flash notifies us of visual changes via IOleInPlaceSiteWindowless::InvalidateRect.  We take any updated rectangles and add them to a per-frame dirty region.  When it&#8217;s time to render a frame, there are four possibilities:</p>

<ul>
<li>The dirty region is empty and the opaque region is empty.  This case is basically free, because nothing need be drawn.</li>

<li>The dirty region is empty and the opaque region is nonempty.  In this case, we just need to render our cached textures for the non-transparent regions of the screen.  This case is the most common.  Since a video memory blit is fast, there&#8217;s not much we could do to further speed it up.</li>

<li>The dirty region is nonempty.  We must IViewObject::Draw into our Overlay DIB, with one tricky bit.  Since we&#8217;re only storing one overlay texture, we need to render each loaded Flash overlay SWF into the DIB, not just the one that changed.  Imagine an animating SWF underneath another translucent SWF.  The top SWF must be composited with the bottom SWF&#8217;s updates.  After rendering each SWF, we scan the updated DIB for a minimalish opaque region.  Why not just render the dirty region?  Imagine a SWF with a bouncing ball.  If we naively rendered every dirty rectangle, eventually we&#8217;d be rendering the entire screen.  Scanning for minimal opaque regions enables recalculation of what&#8217;s actually visible.</li>

<li>The dirty region is nonempty, but the updated pixels are all transparent.  If this occurs, we no longer need to render anything at all until Flash content reappears.</li>
</ul>
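<p>The bookkeeping behind these cases can be sketched in C++ as follows.  This is a simplified illustration, not the IMVU code: it tracks a single bounding rectangle instead of a true region, assumes 0xAARRGGBB pixels, and every name here (<code>Rect</code>, <code>unite</code>, <code>findOpaqueBounds</code>, <code>chooseAction</code>) is hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open rectangle: [left, right) x [top, bottom).
struct Rect {
    int left, top, right, bottom;
};

bool isEmpty(const Rect& r) {
    return r.right <= r.left || r.bottom <= r.top;
}

// Grow the per-frame dirty area as InvalidateRect notifications arrive.
Rect unite(const Rect& a, const Rect& b) {
    if (isEmpty(a)) return b;
    if (isEmpty(b)) return a;
    return Rect{std::min(a.left, b.left), std::min(a.top, b.top),
                std::max(a.right, b.right), std::max(a.bottom, b.bottom)};
}

// Scan the overlay for the smallest rectangle containing every pixel
// whose alpha byte is nonzero. Returns an empty Rect if all are transparent.
Rect findOpaqueBounds(const std::vector<std::uint32_t>& pixels, int width, int height) {
    Rect bounds = {width, height, 0, 0};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t index =
                static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                static_cast<std::size_t>(x);
            if ((pixels[index] >> 24) == 0) continue;
            bounds.left = std::min(bounds.left, x);
            bounds.top = std::min(bounds.top, y);
            bounds.right = std::max(bounds.right, x + 1);
            bounds.bottom = std::max(bounds.bottom, y + 1);
        }
    }
    if (isEmpty(bounds)) return Rect{0, 0, 0, 0};
    return bounds;
}

enum class FrameAction { Skip, DrawCached, RedrawAndRescan };

// Decide what a frame needs, given the accumulated dirty area and the
// opaque area found by the last scan.
FrameAction chooseAction(const Rect& dirty, const Rect& opaque) {
    if (!isEmpty(dirty)) return FrameAction::RedrawAndRescan;
    return isEmpty(opaque) ? FrameAction::Skip : FrameAction::DrawCached;
}
```

<p>After a <code>RedrawAndRescan</code> frame, the result of <code>findOpaqueBounds</code> replaces the old opaque area; if it comes back empty, later frames fall through to <code>Skip</code> until Flash invalidates something again.</p>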

<p>This algorithm has proven efficient.  It supports multiple overlapping SWFs while minimizing memory consumption and CPU/GPU draw calls per frame.  Until recently, we used Flash for several of our UI components, giving us a standard toolchain and a great deal of flexibility.  Flash was the bridge that took us from the dark ages of C++ UI code to UI on which we could actually iterate.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Embed Flash Into Your 3D Application</title>
		<link>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/</link>
		<comments>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 08:52:16 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1550</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

[I wrote this post last year when IMVU still used Flash for a significant portion of our UI. Even though we now embed Gecko, I believe embedding Flash is still valuable.]

Writing user interfaces is hard.  Writing usable interfaces is harder. [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/how-to-embed-flash-into-your-3d-application/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p><em>[I wrote this post last year when IMVU still used Flash for a significant portion of our UI. Even though we now embed Gecko, I believe embedding Flash is still valuable.]</em></p>

<p>Writing user interfaces is hard.  Writing usable interfaces is harder.  Yet, the design of your interface <em>is your product</em>.</p>

<p>Products are living entities.  They always want to grow, adapting to their users as users adapt to them.  In that light, why build your user interface in a static technology like C++ or Java?  It won&#8217;t be perfect the first time you build it, so prepare for change.</p>

<p>IMVU employs two technologies for rapidly iterating on and refining our client UIs: Flash and Gecko/HTML.  Sure, integrating these technologies has a sizable up-front cost, but the iteration speed they provide easily pays for them.  Rapid iteration has some obvious benefits:</p>

<ol>
<li>reduces development cost</li>
<li>reduces time to market</li>
</ol>

<p>and some less-obvious benefits:</p>

<ol>
<li>better product/market fit: when you can change your UI, you will.</li>
<li>improved product quality: little details distinguish mediocre products from great products.  Make changing details cheap and your Pinto will become a Cadillac.</li>
<li>improved morale: both engineers and designers <em>love</em> watching their creations appear on the screen right before them.  It&#8217;s why so many programmers create games!</li>
</ol>

<p>I will show you how integrating Flash into a 3D application is easier than it sounds.</p>


<h2>Should I use Adobe Flash or Scaleform GFx?</h2>

<p>The two most common Flash implementations are Adobe&#8217;s ActiveX control (which has a <a href="http://www.adobe.com/products/player_census/flashplayer/version_penetration.html">97% installed base!</a>) and Scaleform GFx.</p>

<p>Adobe&#8217;s control has perfect compatibility with their tool chain (go figure!) but is closed-source and good luck getting help from Adobe.</p>

<p>Scaleform GFx is an alternate implementation of Flash designed to be embedded in 3D applications, but, last I checked, is not efficient on machines without GPUs.  (Disclaimer: this information is two years old, so I encourage you to make your own evaluation.)</p>

<p>IMVU chose to embed Adobe&#8217;s player.</p>

<h2>Deploying the Flash Runtime</h2>

<p>Assuming you&#8217;re using Adobe&#8217;s Flash player, how will you deploy their runtime?  Well, given Flash&#8217;s install base, you can get away with loading the Flash player already installed on the user&#8217;s computer.  If they don&#8217;t have Flash, just require that they install it from your download page.  Simple and easy.</p>

<p>Down the road, when Flash version incompatibilities and the last few percent of your possible market become important, you can request <a href="http://www.adobe.com/licensing/">permission from Adobe</a> to deploy the Flash player with your application.</p>

<h2>Displaying SWFs</h2>

<p>IMVU displays Flash in two contexts: traditional HWND windows and 2D overlays atop the 3D scene.</p>

<div id="attachment_1551" class="wp-caption aligncenter" style="width: 689px"><a href="http://aegisknight.org/wp-uploads/imvu_flash_window.png"><img src="http://aegisknight.org/wp-uploads/imvu_flash_window.png" alt="" title="IMVU Flash Window" width="679" height="353" class="size-full wp-image-1551" /></a><p class="wp-caption-text">IMVU Flash Window</p></div>

<div id="attachment_1568" class="wp-caption aligncenter" style="width: 485px"><a href="http://aegisknight.org/wp-uploads/imvu_flash_overlay1.png"><img src="http://aegisknight.org/wp-uploads/imvu_flash_overlay1.png" alt="" title="IMVU Flash Overlay" width="475" height="566" class="size-full wp-image-1568" /></a><p class="wp-caption-text">IMVU Flash Overlay</p></div>

<p>If you want to have something up and running in a day, buy <a href="http://www.f-in-box.com/">f_in_box</a>.  Besides its awesome name, it&#8217;s cheap, comes with source code, and the support forums are fantastic.  It&#8217;s a perfect way to bootstrap.  After a weekend of playing with f_in_box, Dusty and I had a YouTube video playing in a texture on top of our 3D scene.</p>

<p>Once you run into f_in_box&#8217;s limitations, you can use the IShockwaveFlash and IOleInPlaceObjectWindowless COM interfaces directly.  See Igor Makarav&#8217;s <a href="http://www.codeproject.com/KB/COM/flashcontrol.aspx?fid=321012">excellent tutorial</a> and CFlashWnd class.</p>

<h2>Rendering Flash as an HWND</h2>

<p>For top-level UI elements use f_in_box or CFlashWnd directly.  They&#8217;re perfectly suited for this.  Seriously, it&#8217;s just a few lines of code.  Look at their samples and go.</p>

<h2>Rendering Flash as a 3D Overlay</h2>

<p>Rendering Flash to a 3D window gets a bit tricky&#8230;  Wait for Part 2 of this post!</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IMVU Crash Reporting: Stalls and Deadlocks</title>
		<link>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/</link>
		<comments>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 06:52:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[crashes]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1457</guid>
		<description><![CDATA[By mid-2006, we&#8217;d primarily focused on access violations and unhandled exceptions, the explosive application failures.  After extensive effort, we got our client&#8217;s crash rate down to 2% or so, where 2% of all sessions ended in a crash.*  Still the customers cried &#8220;Fix the crashes!&#8221;

It turns out that when a customer says &#8220;crash&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>By mid-2006, we&#8217;d primarily focused on access violations and unhandled exceptions, the explosive application failures.  After extensive effort, we got our client&#8217;s crash rate down to 2% or so, where 2% of all sessions ended in a crash.<a href="#footnote_session_length">*</a>  Still the customers cried &#8220;Fix the crashes!&#8221;</p>

<p>It turns out that when a customer says &#8220;crash&#8221; they mean &#8220;it stopped doing what I wanted&#8221;, but engineers hear &#8220;the program threw an exception or caused an access violation&#8221;.  Thus, to the customer, crash can mean:</p>

<ul>
<li>the application was unresponsive for a period of time</li>
<li>the UI failed to load, making the client unusable</li>
<li>the application has been disconnected from the server</li>
</ul>

<p>In short, any time the customer cannot make progress and it&#8217;s not (perceived to be) their fault, the application has crashed.</p>

<p>OK, we&#8217;ve got our work cut out for us&#8230;  Let&#8217;s start by considering deadlocks and stalls.</p>

<p>First, some terminology: in computer science, a <a href="http://en.wikipedia.org/wiki/Deadlock">deadlock</a> is a situation where two threads or processes are waiting for each other, so neither makes progress.  That definition is a bit academic for our purposes.  Let&#8217;s redefine deadlock as any situation where the program becomes unresponsive for an unreasonable length of time.  This definition includes <a href="http://en.wikipedia.org/wiki/Livelock">livelock</a>, slow operations without progress indication, and network (or disk!) I/O that blocks the program from responding to input.</p>

<p>It actually doesn&#8217;t matter whether the program will <i>eventually</i> respond to input.  People get impatient quickly.  You&#8217;ve only got a few seconds to respond to the customer&#8217;s commands.</p>

<h2>Detecting Deadlocks in C++</h2>

<p>The embedded programming world has a &#8220;<a href="http://en.wikipedia.org/wiki/Watchdog_timer">watchdog timer</a>&#8221; concept.  Your program is responsible for periodically pinging the watchdog, and if for several seconds you don&#8217;t, the watchdog restarts your program and reports debugging information.</p>

<p>Implementing this in C++ is straightforward:</p>

<ul>
<li>Start a watchdog thread that wakes up every few seconds to check that the program is still responding to events.</li>
<li>Add a heartbeat to your main event loop that frequently pings the watchdog.</li>
<li>If the watchdog timer detects the program is unresponsive, record stack traces and log files, then report the failure.</li>
</ul>
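<p>A minimal, portable C++11 sketch of those three steps might look like the following.  It is an illustration only: the <code>Watchdog</code> class is hypothetical, and the <code>onStall</code> callback stands in for whatever records stack traces and log files in a real client.</p>

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical watchdog: a background thread wakes up every pollInterval
// and calls onStall if ping() has not been called within `timeout`.
class Watchdog {
public:
    using Clock = std::chrono::steady_clock;

    Watchdog(std::chrono::milliseconds timeout,
             std::chrono::milliseconds pollInterval,
             std::function<void()> onStall)
        : timeout_(timeout),
          pollInterval_(pollInterval),
          onStall_(std::move(onStall)),
          lastPing_(Clock::now().time_since_epoch().count()),
          running_(true),
          thread_([this] { run(); }) {}

    ~Watchdog() {
        running_ = false;
        thread_.join();
    }

    // The heartbeat: call this from the main event loop.
    void ping() { lastPing_ = Clock::now().time_since_epoch().count(); }

private:
    void run() {
        while (running_) {
            std::this_thread::sleep_for(pollInterval_);
            if (!running_) break;
            const Clock::duration sincePing =
                Clock::now().time_since_epoch() - Clock::duration(lastPing_.load());
            if (sincePing > timeout_) {
                onStall_();  // called again on every poll while the stall lasts
            }
        }
    }

    // thread_ is declared last so every other member is ready before it starts.
    std::chrono::milliseconds timeout_;
    std::chrono::milliseconds pollInterval_;
    std::function<void()> onStall_;
    std::atomic<Clock::rep> lastPing_;
    std::atomic<bool> running_;
    std::thread thread_;
};
```

<p>The main loop would construct one <code>Watchdog</code> at startup and call <code>ping()</code> once per iteration.  The callback runs on the watchdog thread, so anything it touches needs to be safe to use from there.</p>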

<p>IMVU&#8217;s <a href="http://aegisknight.org/2009/04/imvus-callstack-api-now-open-source/">CallStack API</a> allows us to grab the C++ call stack of an arbitrary thread, so, if the main thread is unresponsive, we report its current stack every couple of seconds.  This is often all that&#8217;s needed to find and fix the deadlock.</p>

<h2>Detecting Deadlocks in Python</h2>

<p>In Python, we can take the same approach as above:</p>

<ol>
<li>Start a watchdog thread.</li>
<li>Ping the Python watchdog thread in your main loop.</li>
<li>If the watchdog detects that you&#8217;re unresponsive, record the main thread&#8217;s Python stack (this time with <a href="http://docs.python.org/library/sys.html#sys._current_frames">sys._current_frames</a>) and report it.</li>
</ol>

<p>Python&#8217;s <a href="http://docs.python.org/dev/glossary.html#term-global-interpreter-lock">global interpreter lock</a> (GIL) can throw a wrench in this plan.  If one thread enters an infinite loop while keeping the GIL held (say, in a native extension), the watchdog thread will never wake and so cannot report a deadlock.  In practice, this isn&#8217;t a problem, because the C++ deadlock detector will notice and report a deadlock.  Plus, most common deadlocks are caused by calls that release the GIL: <code>threading.Lock.acquire</code>, <code>socket.recv</code>, <code>file.read</code>, and so on.</p>

<p>It might help to think of the Python deadlock detector as a fallback that, if successful, adds richer information to your deadlock reports.  If it fails, whatever.  The C++ deadlock detector is probably enough to diagnose and fix the problem.</p>

<h2>What did we learn?</h2>

<p>It turned out the IMVU client had several bugs where we blocked the main thread on the network, sometimes for up to 30 seconds.  By that point, most users just clicked the close box [X] and terminated the process.  Oops.</p>

<p>In addition, the deadlock detectors pointed out places where we were doing too much work in between message pumps.  For example, loading some assets into the 3D scene might nominally take 200ms.  On a computer with 256 MB of RAM, though, the system might start thrashing and loading the same assets would take 5s and report as a &#8220;deadlock&#8221;.  The solution was to reduce the program&#8217;s working set and bite off smaller chunks of work in between pumps.</p>

<p>I don&#8217;t recall seeing many &#8220;computer science&#8221; deadlocks, but these watchdogs were invaluable in tracking down important failure conditions in the IMVU client.</p>

<p>Next time, we&#8217;ll improve the accuracy of our crash metrics and answer the question &#8220;How do you know your metrics are valid?&#8221;</p>

<hr />

<p><a id="footnote_session_length">*</a> Median session length is a more useful reliability metric.  It&#8217;s possible to fix crashes and see no change in your percentage of failed sessions, if fixing crashes simply causes sessions to become longer.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fast Builds: Incremental Linking and Embedded SxS Manifests</title>
		<link>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/</link>
		<comments>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 02:31:54 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1414</guid>
		<description><![CDATA[
As I&#8217;ve said before, fast builds are crucial for efficient development.  But for those of us who use C++ regularly, link times are killer.  It&#8217;s not uncommon to spend minutes linking your compiled objects into a single binary.  Incremental linking helps a great deal, but, as you&#8217;ll see, incremental linking has become [...]]]></description>
			<content:encoded><![CDATA[<p>
As I&#8217;ve said before, fast builds are crucial for efficient development.  But for those of us who use C++ regularly, link times are killer.  It&#8217;s not uncommon to spend minutes linking your compiled objects into a single binary.  Incremental linking helps a great deal, but, as you&#8217;ll see, incremental linking has become a lot harder in the last few versions of Visual Studio&#8230;
</p>

<p><a href="http://en.wikipedia.org/wiki/Linker">Linking</a> an EXE or DLL is a very expensive operation &#8212; it&#8217;s roughly O(N) where N is the amount of code being linked.  Worse, several optimizing linkers defer code generation to link time, exacerbating the problem!  When you&#8217;re trying to practice TDD, even a couple seconds in your red-green-refactor iteration loop is brutal.  And it&#8217;s not uncommon for large projects to spend minutes linking.</p>

<p>Luckily, Visual C++ supports an <a href="http://msdn.microsoft.com/en-us/library/4khtbfyf(VS.80).aspx">/INCREMENTAL</a> flag, instructing relinks to modify the DLL or EXE in-place, reducing link time to O(changed code) rather than O(all code).  In the olden days of Visual C++ 6, all you had to do was enable /INCREMENTAL, and bam, fast builds.</p>

<p>These days, it&#8217;s not so simple.  Let&#8217;s take an excursion into how modern Windows finds DLL dependencies&#8230;</p>

<h2>Side-by-Side (SxS) Manifests</h2>

<p>Let&#8217;s say you&#8217;re writing a DLL <code>foo.dll</code> that depends on the CRT by using, say, <code>printf</code> or <code>std::string</code>.  When you link <code>foo.dll</code>, the linker will also produce <code>foo.dll.manifest</code>.  Windows XP and Vista use .manifest files to load the correct CRT version.  (This prevents DLL hell: two programs can depend on different versions of the same DLL.)</p>

<p>Since remembering to carry around .manifest files is annoying and error-prone, Microsoft and others recommend that you embed them into your EXE or DLL as a resource:</p>

<pre>
mt.exe -manifest foo.dll.manifest -outputresource:foo.dll;2
</pre>

<p>Convenient, but it modifies the DLL in place, breaking incremental links!  This is a <a href="http://markmail.org/message/f4g2qi2kf5wu7n5t">known problem</a>, and the <a href="http://blogs.msdn.com/zakramer/archive/2006/05/22/603558.aspx">&#8220;solutions&#8221;</a> others suggest are INSANE.  My favorite is the <a href="http://blogs.msdn.com/nikolad/articles/425359.aspx">300-line makefile</a> with a note from the author &#8220;[If this does not work], please let me know ASAP. I will try fixing it for you.&#8221;  Why doesn&#8217;t Visual Studio just provide an /EMBEDMANIFESTRESOURCE flag that would automatically solve the problem?!</p>

<p>I just want incremental linking and embedded manifests.  Is that so much to ask?  I tried a bunch of approaches.  Most didn&#8217;t work.  I&#8217;ll show them, and then give my current (working) approach.  If you don&#8217;t care about the sordid journey, <a href="#solution">skip to the end</a>.</p>

<h2>What Didn&#8217;t Work</h2>

<ul>
<li><em>Not embedding manifests at all.</em></li>
</ul>

<p>What went wrong: I could never figure out the rules whereby manifest dependencies are discovered.  If python.exe depends on the release CRT and your module DLL depends on the debug CRT, and they live in different directories (??), loading the module DLL would fail.  Gave up.</p>

<ul>
<li><em>Linking a temporary file (foo.pre.dll), making a copy (foo.pre.dll -> foo.dll), and embedding foo.pre.dll.manifest into foo.dll with mt.exe.</em></li>
</ul>

<p>What went wrong: As far as I can tell, mt.exe is a terrible piece of code.  In procmon I&#8217;ve watched it close file handles it didn&#8217;t open, causing permissions violations down the line.  (?!)  Sometimes it silently corrupts your EXEs and DLLs too.  This may be a known weakness in <a href="http://msdn.microsoft.com/en-us/library/ms648049(VS.85).aspx">UpdateResource</a>.  Yay!  (Thanks to <a href="http://www.luminance.org/">Kevin Gadd</a>; he was instrumental in diagnosing these bugs.)  mt.exe may or <a href="http://www.wintellect.com/CS/blogs/jrobbins/archive/2009/01/24/the-case-of-the-corrupt-pe-binaries.aspx">may not</a> be fixed in recent Visual Studios.  Either way, I&#8217;m convinced mt.exe has caused us several intermittent build failures in the past.  Avoiding it is a good thing.</p>

<ul>
<li><em>Linking to a temporary file (foo.pre.dll), generating a resource script (foo.pre.rc) from (foo.pre.dll.manifest), compiling said resource script (foo.pre.res), and including the compiled resource into the final link (foo.dll).</em></li>
</ul>

<p>What went wrong: This approach is reliable but slow.  Linking each DLL and EXE twice, even if both links are incremental, is often slower than just doing a full link to begin with.</p>

<ul>
<li><em>Linking foo.dll with foo.dll.manifest (via a resource script, as above) if it exists.  If foo.dll.manifest changed as a result of the link, relink.</em></li>
</ul>

<p>I didn&#8217;t actually try this one because non-DAG builds scare me.  I like the simplicity and reliability of the &#8220;inputs -> command -> outputs&#8221; build model.  It&#8217;s weird if foo.dll.manifest is an input and an output of the link.  Yes, technically, that&#8217;s how incremental linking works at all, but the non-DAG machinery is hidden in link.exe.  From SCons&#8217;s perspective, it&#8217;s still a DAG.</p>

<h2><a id="solution">Finally, a working solution:</a></h2>

<p>For every build configuration {debug,release} and dependency {CRT,MFC,&#8230;}, link a tiny program whose only job is to make the linker generate that dependency&#8217;s manifest.  Reference the manifest from a resource script, compile the script (.rc -> .res), and link the compiled manifest resource into your other DLLs and EXEs.</p>
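
<p>Outside of SCons, the manifest-to-resource step might look roughly like the sketch below.  The helper names and the <code>basename</code> parameter are hypothetical, and it assumes <code>rc.exe</code> is on the PATH; the resource IDs (1 for EXEs, 2 for DLLs) and the resource type 24 (RT_MANIFEST) are the same ones used in the SCons code at the end of this post.</p>

<pre>
RT_MANIFEST = 24  # resource type used for manifests

def manifest_rc_line(manifest_path, is_dll):
    # DLLs use ISOLATIONAWARE_MANIFEST_RESOURCE_ID (2),
    # EXEs use CREATEPROCESS_MANIFEST_RESOURCE_ID (1).
    resource_id = 2 if is_dll else 1
    return '%d %d "%s"' % (resource_id, RT_MANIFEST, manifest_path)

def manifest_build_steps(manifest_path, is_dll, basename):
    # Returns the .rc path, its one-line contents, the .res path, and
    # the rc.exe command that compiles the former into the latter.
    rc_path = basename + '.rc'
    res_path = basename + '.res'
    rc_text = manifest_rc_line(manifest_path, is_dll)
    compile_command = ['rc.exe', '/fo', res_path, rc_path]
    return rc_path, rc_text, res_path, compile_command
</pre>

<p>The resulting .res file is then passed to the linker alongside the object files of every DLL or EXE that needs that manifest.</p>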

<p>This approach has several advantages:</p>

<ul>
<li>These pre-generated manifest resources are created once and reused in future builds, with no impact to build time.</li>
<li>The build is a DAG.</li>
<li>We avoid letting mt.exe wreak havoc on our build by sidestepping it entirely.</li>
</ul>

<p>I can think of one disadvantage: you need to know up front which SxS DLLs you depend on.  For most programs, the CRT is the only one.  Then again, understanding your dependencies isn&#8217;t a bad thing.  ;)</p>

<p>After several evenings of investigation, we&#8217;re back to the same link times we had with Visual C++ 6!  Yay!</p>

<hr />

<h2>The Code</h2>

<p>If you care, here&#8217;s our SCons implementation of embedded manifests:</p>

<pre>
# manifest_resource(env, is_dll) returns a manifest resource suitable for inclusion into
# the sources list of a Program or SharedLibrary.
manifest_resources = {}
def manifest_resource(env, is_dll):
    if is_dll:
        resource_type = 2 #define ISOLATIONAWARE_MANIFEST_RESOURCE_ID 2
    else:
        resource_type = 1 #define CREATEPROCESS_MANIFEST_RESOURCE_ID  1

    is_debug = env['DEBUG'] # could use a 'build_config' key if we had more than debug/release
    del env

    def build_manifest_resource():
        if is_debug:
            env = baseEnv.Clone(tools=[Debug])
        else:
            env = baseEnv.Clone(tools=[Release])
        env['LINKFLAGS'].remove('/MANIFEST:NO')

        if is_dll:
            linker = env.SharedLibrary
            target_name = 'crt_manifest.dll'
            source = env.File('#/MSVC/crt_manifest_dll.cpp')
        else:
            linker = env.Program
            target_name = 'crt_manifest.exe'
            source = env.File('#/MSVC/crt_manifest_exe.cpp')

        env['OUTPUT_PATH'] = '#/${BUILDDIR}/${IMVU_BUILDDIR_NAME}/%s' % (target_name,)

        obj = env.SharedObject('${OUTPUT_PATH}.obj', source)
        result = linker([env.File('${OUTPUT_PATH}'), '${OUTPUT_PATH}.manifest'], obj)
        manifest = result[1]

        def genrc(env, target, source):
            [target] = target
            [source] = source
            # 24 = RT_MANIFEST
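            # Use open() in a with-block so the .rc file is flushed and closed
            # before rc.exe reads it.
            with open(target.abspath, 'w') as rc_file:
                rc_file.write('%d 24 "%s"' % (resource_type, source.abspath))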

        rc = env.Command('${OUTPUT_PATH}.rc', manifest, genrc)
        res = env.RES('${OUTPUT_PATH}.res', rc)
        env.Depends(res, manifest)
        return res
    
    key = (is_debug, resource_type)
    try:
        return manifest_resources[key]
    except KeyError:
        res = build_manifest_resource()

        manifest_resources[key] = res
        return res
</pre>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fast Builds: Unintrusive Precompiled Headers (PCH)</title>
		<link>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/</link>
		<comments>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/#comments</comments>
		<pubDate>Wed, 20 May 2009 20:24:59 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1399</guid>
		<description><![CDATA[
Fast builds are critical to the C++ programmer&#8217;s productivity and happiness.  One common technique for reducing build times is precompiled headers (PCH).  There&#8217;s plenty of literature out there; I won&#8217;t describe PCH in detail here.


But one thing that&#8217;s always bothered me about PCH is that it affects your code.  #pragma hdrstop and [...]]]></description>
			<content:encoded><![CDATA[<p>
<a href="http://gamesfromwithin.com/?p=100">Fast builds</a> are critical to the C++ programmer&#8217;s productivity and happiness.  One common technique for reducing build times is precompiled headers (PCH).  There&#8217;s <a href="http://gamesfromwithin.com/?p=39">plenty of literature</a> out there; I won&#8217;t describe PCH in detail here.
</p>

<p>But one thing that&#8217;s always bothered me about PCH is that it affects your code.  <code>#pragma hdrstop</code> and <code>#include "StdAfx.h"</code> everywhere.  Gross.</p>

<p>I&#8217;m a strong believer in clean code without boilerplate, so can&#8217;t we do better?  Ideally we could make a simple tweak to the build system and see build times magically improve.  <a href="http://ennos.home.pages.de/">Enno</a> enticed me with mentions of his fast builds, so I took a look&#8230;</p>

<p>Using PCH in Visual C++ requires a header (call it Precompiled.h) that includes all of the expensive dependencies:</p>

<pre>
#include &lt;vector&gt;
#include &lt;map&gt;
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;boost/python.hpp&gt;
#include &lt;windows.h&gt;
#include &lt;mmsystem.h&gt;
</pre>

<p>Additionally, we need a source file (let&#8217;s get creative and call it Precompiled.cpp), which is empty except for <code>#include "Precompiled.h"</code>.</p>

<p>Compile Precompiled.cpp with <code><a href="http://msdn.microsoft.com/en-us/library/7zc28563.aspx">/Yc</a>Precompiled.h</code> to generate Precompiled.pch, the actual precompiled header.  Then, use the precompiled header on the rest of your files with <code><a href="http://msdn.microsoft.com/en-us/library/z0atkd6c.aspx">/Yu</a>Precompiled.h</code>.  (The header name follows the flag directly, with no space.)</p>

<p>OK, here&#8217;s the step that prevented me from using PCH for so long: every single source file in your project must <code>#include "Precompiled.h"</code> on its first line.</p>

<p>That&#8217;s ridiculous!  I don&#8217;t want to touch every file!</p>

<p>It turns out our savior is the <a href="http://msdn.microsoft.com/en-us/library/8c5ztk84.aspx">/FI</a> option.  From the documentation:</p>

<blockquote>
<p>This option has the same effect as specifying the file with double quotation marks in an #include directive on the first line of every source file specified on the command line [...]</p>
</blockquote>

<p>Exactly what we want!</p>
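
<p>Put together, the compiler flags might be assembled like this.  This is only a sketch: <code>pch_compile_flags</code> is a hypothetical helper, and <code>/Fp</code> (which names the .pch file) is a real compiler option that the steps above don&#8217;t mention but that is handy when the PCH lives in a build directory.</p>

<pre>
def pch_compile_flags(header, pch_file, creating):
    # creating=True:  flags for Precompiled.cpp, which produces the .pch (/Yc).
    # creating=False: flags for every other source file, which consume it (/Yu).
    mode = '/Yc' if creating else '/Yu'
    return [
        mode + header,     # create or use the precompiled header
        '/Fp' + pch_file,  # where the .pch file lives
        '/FI' + header,    # force-include, so no source file needs editing
    ]

print(pch_compile_flags('Precompiled.h', 'Precompiled.pch', creating=False))
</pre>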

<p>But wait, doesn&#8217;t that mean every .cpp in our project will have access to every symbol included by the PCH?  Yes.  :(  It&#8217;s worth the build speedup.</p>

<p>However, explicit physical dependencies are important, and the only way to prevent important things from breaking is by blocking commits if they fail.  Since enabling and disabling PCH does not require any code changes, it&#8217;s easy enough to add a &#8220;disable PCH&#8221; option to your build system and run it on your continuous integration server:</p>

<a href="http://aegisknight.org/wp-uploads/compile_without_pch.png"><img src="http://aegisknight.org/wp-uploads/compile_without_pch-138x300.png" alt="Compile without PCH" title="Compile without PCH" width="138" height="300" class="aligncenter size-medium wp-image-1404" /></a>

<p>If somebody uses <code>std::string</code> but forgets to <code>#include &lt;string&gt;</code>, the build will fail and block commits.</p>

<p>In the end, here&#8217;s the bit of SCons magic that lets me quickly drop PCH into a project:</p>

<pre>
def enable_pch(env, source_file, header):
    if PCH_ENABLED:
        PCH, PCH_OBJ = env.PCH(source_file)
        env['PCH'] = PCH
        env['PCHSTOP'] = header
        env.Append(CPPFLAGS=['/FI' + header])
        return [PCH_OBJ]
    else:
        return [source_file]
</pre>
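
<p>The snippet above relies on a <code>PCH_ENABLED</code> global.  One way it could be set, sketched here as an assumption rather than our actual build code, is from a <code>pch=0</code> style command-line option; in an SConstruct the mapping passed in would be SCons&#8217;s <code>ARGUMENTS</code> dictionary.  The option name <code>pch</code> and the helper <code>pch_enabled</code> are hypothetical.</p>

<pre>
def pch_enabled(arguments):
    # arguments: mapping of key=value options from the build command line.
    # PCH stays on unless the user explicitly turns it off.
    value = str(arguments.get('pch', '1')).lower()
    return value not in ('0', 'no', 'false', 'off')

# In an SConstruct this might read: PCH_ENABLED = pch_enabled(ARGUMENTS)
PCH_ENABLED = pch_enabled({'pch': '0'})
</pre>

<p>The continuous integration server would then run its extra build with <code>pch=0</code>, while developers get the fast default.</p>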

<p>Now you can benefit from fast builds with minimal effort and no change to your existing code!</p>

]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

