<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chad Austin &#187; c++</title>
	<atom:link href="http://chadaustin.me/tag/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://chadaustin.me</link>
	<description></description>
	<lastBuildDate>Tue, 17 Aug 2010 08:51:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Extracting Color and Transparency from Flash</title>
		<link>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/</link>
		<comments>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 19:29:20 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1575</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

For clarity, I slightly oversimplified my previous discussion on efficiently rendering Flash in a 3D scene.  The sticky bit is extracting transparency information from the Flash framebuffer so we can  composite the overlay into the scene.

Flash does not give [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/extracting-color-and-transparency-from-flash-2/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p>For clarity, I slightly oversimplified my previous discussion on <a href="http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/">efficiently rendering Flash in a 3D scene</a>.  The sticky bit is extracting transparency information from the Flash framebuffer so we can  composite the overlay into the scene.</p>

<p>Flash does not give you direct access to its framebuffer.  It does, however, let you composite the Flash framebuffer onto a DIB section of your choice with IViewObject::Draw.</p>

<p>Remembering your <a href="http://keithp.com/~keithp/porterduff/p253-porter.pdf">Porter-Duff</a>, composition of source over dest is:</p>

<pre>Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)</pre>

<p>If the source color is premultiplied, you get:</p>

<pre>Color = SourceColor + DestColor * (1 - SourceAlpha)</pre>
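<p>To make the two forms concrete, here is a minimal C++ sketch of source-over for a single color channel.  It assumes every value is a double normalized to [0, 1].  The function names are hypothetical and are not part of any Flash or Direct3D API.</p>

```cpp
#include <cassert>

// Porter-Duff "source over destination" for one color channel.
// Colors and alphas are normalized to [0, 1].

// Straight (non-premultiplied) source:
//   Color = SourceColor * SourceAlpha + DestColor * (1 - SourceAlpha)
inline double overStraight(double srcColor, double srcAlpha, double destColor) {
    assert(srcAlpha >= 0.0 && srcAlpha <= 1.0);
    return srcColor * srcAlpha + destColor * (1.0 - srcAlpha);
}

// Premultiplied source (srcPremultiplied == SourceColor * SourceAlpha):
//   Color = SourceColor + DestColor * (1 - SourceAlpha)
inline double overPremultiplied(double srcPremultiplied, double srcAlpha, double destColor) {
    assert(srcAlpha >= 0.0 && srcAlpha <= 1.0);
    return srcPremultiplied + destColor * (1.0 - srcAlpha);
}
```

<p>Both calls give the same result when the premultiplied argument equals srcColor * srcAlpha; the premultiplied form simply moves that multiply out of the per-composite work.</p>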

<p>Assuming we want premultiplied color and alpha from Flash for efficient rendering in the 3D scene, applying the above requires solving for FlashAlpha and FlashColor:</p>

<pre>
RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor = FlashColor * FlashAlpha + DestColor - DestColor * FlashAlpha

RenderedColor - DestColor = FlashColor * FlashAlpha - DestColor * FlashAlpha

RenderedColor - DestColor = FlashAlpha * (FlashColor - DestColor)

FlashAlpha = (RenderedColor - DestColor) / (FlashColor - DestColor)
</pre>

<p>If FlashColor and DestColor are equal, then FlashAlpha is undefined.  Intuitively, this makes sense.  If you render a translucent black SWF on a black background, you can&#8217;t know the transparency data because all of the pixels are still black.  This doesn&#8217;t matter, as I&#8217;ll show in a moment.</p>

<p>FlashColor is trivial:</p>

<pre>
RenderedColor = FlashColor * FlashAlpha + DestColor * (1 - FlashAlpha)

RenderedColor - DestColor * (1 - FlashAlpha) = FlashColor * FlashAlpha

FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha
</pre>

<p>FlashColor is undefined if FlashAlpha is 0.  Transparency has no color.</p>

<p>What do these equations give us?  We know RenderedColor, since it&#8217;s the result of calling IViewObject::Draw.  We have control over DestColor, since we configure the DIB Flash is drawn atop.  What happens if we set DestColor to black (0)?</p>

<pre>FlashColor = (RenderedColorOnBlack) / FlashAlpha</pre>

<p>What happens if we set it to white (1)?</p>

<pre>FlashColor = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha</pre>

<p>Now we&#8217;re getting somewhere!  Since FlashColor and FlashAlpha are the same in both renders, we can set the two expressions for FlashColor equal and derive a relationship between FlashAlpha, RenderedColorOnBlack, and RenderedColorOnWhite:</p>

<pre>
(RenderedColorOnBlack) / FlashAlpha = (RenderedColorOnWhite - (1 - FlashAlpha)) / FlashAlpha

RenderedColorOnBlack = RenderedColorOnWhite - 1 + FlashAlpha

FlashAlpha = RenderedColorOnBlack - RenderedColorOnWhite + 1

FlashAlpha = 1 - (RenderedColorOnWhite - RenderedColorOnBlack)
</pre>

<p>So all we have to do is render the SWF on a white background and on a black background and subtract the two: the difference is the transparency (1 - FlashAlpha), so one minus the difference is the alpha channel.</p>

<p>Now what about color?  Just plug the calculated FlashAlpha into the following when rendering on black.</p>

<pre>
FlashColor = (RenderedColor - DestColor * (1 - FlashAlpha)) / FlashAlpha

FlashColor = RenderedColorOnBlack / FlashAlpha
</pre>

<p>Since we want premultiplied color, multiply both sides by FlashAlpha:</p>

<pre>FlashColor * FlashAlpha = RenderedColorOnBlack</pre>

<p>Now that we know FlashColor and FlashAlpha for the overlay, we can copy it into a texture and render the scene!</p>
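<p>The whole extraction fits in one pass over the two renders.  Below is a minimal C++ sketch.  It assumes both renders are tightly packed 32-bit BGRA buffers of the same size, with the fourth byte ignored on input.  It also assumes that averaging the alpha estimates from the three color channels is an acceptable way to absorb 8-bit rounding.  The function name is hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recover premultiplied BGRA pixels from two renders of the same SWF frame:
// one drawn over an opaque black DIB, one over an opaque white DIB.
std::vector<std::uint8_t> extractPremultipliedBGRA(
        const std::vector<std::uint8_t>& onBlack,
        const std::vector<std::uint8_t>& onWhite) {
    assert(onBlack.size() == onWhite.size() && onBlack.size() % 4 == 0);
    std::vector<std::uint8_t> out(onBlack.size());
    for (std::size_t i = 0; i < onBlack.size(); i += 4) {
        // FlashAlpha = 1 - (RenderedColorOnWhite - RenderedColorOnBlack),
        // in 0..255 units, estimated per channel and averaged over B, G, R.
        int diffSum = 0;
        for (int c = 0; c < 3; ++c) {
            int diff = int(onWhite[i + c]) - int(onBlack[i + c]);
            diffSum += diff < 0 ? 0 : diff;  // clamp rounding noise
        }
        int alpha = 255 - (diffSum + 1) / 3;  // rounded mean difference
        // The render over black already is the premultiplied color;
        // clamp so that no channel exceeds alpha.
        for (int c = 0; c < 3; ++c) {
            int color = onBlack[i + c];
            out[i + c] = std::uint8_t(color > alpha ? alpha : color);
        }
        out[i + 3] = std::uint8_t(alpha);
    }
    return out;
}
```

<p>An opaque pixel renders identically on both backgrounds and comes out with alpha 255; a fully transparent pixel renders black on black and white on white and comes out as all zeros.</p>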
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/extracting-color-and-transparency-from-flash/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Efficiently Rendering Flash in a 3D Scene</title>
		<link>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/</link>
		<comments>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 09:12:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1561</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

Last time, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed.  This time, I&#8217;ll discuss efficient rendering into a 3D scene.

Rendering Flash as a 3D Overlay (The Naive Way)

At first blush, rendering [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/efficiently-rendering-flash-in-a-3d-scene/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p><a href="http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/">Last time</a>, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed.  This time, I&#8217;ll discuss efficient rendering into a 3D scene.</p>

<h2>Rendering Flash as a 3D Overlay (The Naive Way)</h2>

<p>At first blush, rendering Flash on top of a 3D scene sounds easy.  Every frame:</p>

<ol>
<li>Create a <a href="http://msdn.microsoft.com/en-us/library/dd183494(VS.85).aspx">DIB section</a> the size of your 3D viewport</li>
<li>Render Flash into the DIB section with <a href="http://msdn.microsoft.com/en-us/library/ms688655(VS.85).aspx">IViewObject::Draw</a></li>
<li>Copy the DIB section into an <a href="http://msdn.microsoft.com/en-us/library/bb205909(VS.85).aspx">IDirect3DTexture9</a></li>
<li>Render the texture on the top of the scene</li>
</ol>

<div id="attachment_1562" class="wp-caption aligncenter" style="width: 257px"><a href="http://aegisknight.org/wp-uploads/Flash-Rendering.png"><img src="http://aegisknight.org/wp-uploads/Flash-Rendering.png" alt="" title="Naive Flash Rendering" width="247" height="524" class="size-full wp-image-1562" /></a><p class="wp-caption-text">Naive Flash Rendering</p></div>

<p>Ta da!  But your frame rate dropped to 2 frames per second?  Ouch.  It turns out this implementation is horribly slow.  There are three reasons.</p>

<p>First, asking the Adobe Flash player to render into a DIB isn&#8217;t a cheap operation.  In our measurements, drawing even a simple SWF takes on the order of 10 milliseconds.  Since most UI doesn&#8217;t animate every frame, we should be able to cache the captured framebuffer.</p>

<p>Second, main memory and graphics memory are on different components in your computer.  You want to avoid wasting time and bus traffic by unnecessarily copying data from the CPU to the GPU every frame.  If only the lower-right corner of a SWF changes, we should limit our memory copies to that region.</p>
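<p>Limiting the copy to the changed region is a per-row copy between two buffers that may have different row strides.  A minimal C++ sketch follows, assuming 32-bit pixels and a hypothetical Rect type that mirrors the Win32 RECT convention (right and bottom exclusive).  In practice the destination pointer and pitch would come from locking the texture, for example the pBits and Pitch fields filled in by IDirect3DTexture9::LockRect.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical rectangle type mirroring Win32 RECT: right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

// Copy only the pixels inside `dirty` from a source image (e.g. the DIB
// section) to a destination image (e.g. a locked texture). Both images use
// 4 bytes per pixel; each pitch is a row stride in bytes and may be larger
// than width * 4 because of row padding.
void copyDirtyRect(const std::uint8_t* src, std::size_t srcPitch,
                   std::uint8_t* dst, std::size_t dstPitch,
                   const Rect& dirty) {
    assert(0 <= dirty.left && dirty.left <= dirty.right);
    assert(0 <= dirty.top && dirty.top <= dirty.bottom);
    const std::size_t x0 = std::size_t(dirty.left) * 4;
    const std::size_t rowBytes = std::size_t(dirty.right - dirty.left) * 4;
    for (int y = dirty.top; y < dirty.bottom; ++y) {
        // One contiguous run per row; pixels outside the rect are never touched.
        std::memcpy(dst + std::size_t(y) * dstPitch + x0,
                    src + std::size_t(y) * srcPitch + x0,
                    rowBytes);
    }
}
```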

<p>Third, modern GPUs are fast, but not everyone has them.  Let&#8217;s say you have a giant, mostly empty SWF and want to render it on top of your 3D scene.  On slower GPUs, it would be ideal to limit your texture draws to the regions of the screen that are non-transparent.</p>

<h2>Rendering Flash as a 3D Overlay (The Fast Way)</h2>

<p>Disclaimer: I can&#8217;t take credit for these algorithms.  They were jointly developed over years by many smart engineers at IMVU.</p>

<p>First, let&#8217;s reduce an embedded Flash player to its principles:</p>

<ul>
<li>Flash exposes an IShockwaveFlash interface through which you can load and play movies.</li>
<li>Flash maintains its own frame buffer.  You can read these pixels with IViewObject::Draw.</li>
<li>When a SWF updates regions of the frame buffer, it notifies you through IOleInPlaceSiteWindowless::InvalidateRect.</li>
</ul>

<p>In addition, we&#8217;d like the Flash overlay system to fit within these performance constraints:</p>

<ul>
<li>Each SWF is rendered over the entire window.  For example, implementing a ball that bounces around the screen or a draggable UI component should not require any special IMVU APIs.</li>
<li>If a SWF is not animating, we do not copy its pixels to the GPU every frame.</li>
<li>We do not render the overlay in transparent regions.  That is, if no Flash content is visible, rendering is free.</li>
<li>Memory consumption (ignoring memory used by individual SWFs) for the overlay usage is O(framebuffer), not O(framebuffer * SWFs).  That is, loading three SWFs should not require allocation of three screen-sized textures.</li>
<li>If Flash reports multiple changed regions in one frame, call IViewObject::Draw only once.</li>
</ul>
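<p>The last constraint amounts to coalescing notifications.  One minimal way to do it, sketched in C++ below, is to fold every rectangle Flash reports during a frame into a single bounding rectangle and draw that once.  The class is hypothetical.  A bounding box is the simplest choice, but it also redraws the pixels between two distant changes, so a real implementation might keep a list of rectangles instead.</p>

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type mirroring Win32 RECT: right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

// Accumulates the rectangles reported through InvalidateRect during one frame
// into a single bounding rectangle, so the renderer can issue one
// IViewObject::Draw per frame regardless of how many notifications arrived.
class DirtyRegion {
public:
    void add(const Rect& r) {
        if (r.left >= r.right || r.top >= r.bottom) return;  // ignore empty rects
        if (empty_) {
            bounds_ = r;
            empty_ = false;
            return;
        }
        bounds_.left = std::min(bounds_.left, r.left);
        bounds_.top = std::min(bounds_.top, r.top);
        bounds_.right = std::max(bounds_.right, r.right);
        bounds_.bottom = std::max(bounds_.bottom, r.bottom);
    }

    bool empty() const { return empty_; }

    // Returns the accumulated bounds and resets the region for the next frame.
    Rect take() {
        assert(!empty_);
        empty_ = true;
        return bounds_;
    }

private:
    Rect bounds_{0, 0, 0, 0};
    bool empty_ = true;
};
```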

<p>Without further ado, let&#8217;s look at the fast algorithm:</p>

<div id="attachment_1564" class="wp-caption aligncenter" style="width: 573px"><a href="http://aegisknight.org/wp-uploads/Fast-Flash-Rendering.png"><img src="http://aegisknight.org/wp-uploads/Fast-Flash-Rendering.png" alt="" title="Fast Flash Rendering" width="563" height="808" class="size-full wp-image-1564" /></a><p class="wp-caption-text">Fast Flash Rendering</p></div>

<p>Flash notifies us of visual changes via IOleInPlaceSiteWindowless::InvalidateRect.  We take any updated rectangles and add them to a per-frame dirty region.  When it&#8217;s time to render a frame, there are four possibilities:</p>

<ul>
<li>The dirty region is empty and the opaque region is empty.  This case is basically free, because nothing need be drawn.</li>

<li>The dirty region is empty and the opaque region is nonempty.  In this case, we just need to render our cached textures for the opaque (non-transparent) regions of the screen.  This case is the most common.  Since a video memory blit is fast, there&#8217;s not much we could do to further speed it up.</li>

<li>The dirty region is nonempty.  We must IViewObject::Draw into our Overlay DIB, with one tricky bit.  Since we&#8217;re only storing one overlay texture, we need to render each loaded Flash overlay SWF into the DIB, not just the one that changed.  Imagine an animating SWF underneath another translucent SWF.  The top SWF must be composited with the bottom SWF&#8217;s updates.  After rendering each SWF, we scan the updated DIB for a minimalish opaque region.  Why not just render the dirty region?  Imagine a SWF with a bouncing ball.  If we naively rendered every dirty rectangle, eventually we&#8217;d be rendering the entire screen.  Scanning for minimal opaque regions enables recalculation of what&#8217;s actually visible.</li>

<li>The dirty region is nonempty, but the updated pixels are all transparent.  If this occurs, we no longer need to render anything at all until Flash content reappears.</li>
</ul>
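<p>The &#8220;minimalish opaque region&#8221; scan can be approximated with fixed-size tiles.  The C++ sketch below is an assumption about one way to do it, not the IMVU implementation.  It keeps every tile that contains at least one pixel with nonzero alpha in a premultiplied BGRA buffer.  An empty result corresponds to the first and last cases above, where nothing needs to be drawn.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rectangle type mirroring Win32 RECT: right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

// Returns the tiles of a premultiplied BGRA image that contain at least one
// pixel with nonzero alpha. pitch is the row stride in bytes. Larger tiles
// mean fewer draws but more transparent pixels drawn; the size is a tuning
// assumption.
std::vector<Rect> findVisibleTiles(const std::uint8_t* bgra, int width, int height,
                                   std::size_t pitch, int tileSize) {
    assert(width >= 0 && height >= 0 && tileSize > 0);
    std::vector<Rect> tiles;
    for (int ty = 0; ty < height; ty += tileSize) {
        const int bottom = std::min(ty + tileSize, height);
        for (int tx = 0; tx < width; tx += tileSize) {
            const int right = std::min(tx + tileSize, width);
            bool visible = false;
            for (int y = ty; y < bottom && !visible; ++y) {
                const std::uint8_t* row = bgra + std::size_t(y) * pitch;
                for (int x = tx; x < right && !visible; ++x) {
                    visible = row[std::size_t(x) * 4 + 3] != 0;  // byte 3 is alpha
                }
            }
            if (visible) tiles.push_back(Rect{tx, ty, right, bottom});
        }
    }
    return tiles;
}
```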

<p>This algorithm has proven efficient.  It supports multiple overlapping SWFs while minimizing memory consumption and CPU/GPU draw calls per frame.  Until recently, we used Flash for several of our UI components, giving us a standard toolchain and a great deal of flexibility.  Flash was the bridge that took us from the dark ages of C++ UI code to UI on which we could actually iterate.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/efficiently-rendering-flash-in-a-3d-scene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Embed Flash Into Your 3D Application</title>
		<link>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/</link>
		<comments>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 08:52:16 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1550</guid>
		<description><![CDATA[The original source of this post is at the IMVU engineering blog.  Subscribe now!

[I wrote this post last year when IMVU still used Flash for a significant portion of our UI. Even though we now embed Gecko, I believe embedding Flash is still valuable.]

Writing user interfaces is hard.  Writing usable interfaces is harder. [...]]]></description>
			<content:encoded><![CDATA[<p>The original source of this post is at the <a href="http://engineering.imvu.com/2010/07/29/how-to-embed-flash-into-your-3d-application/">IMVU engineering blog</a>.  <a href="http://engineering.imvu.com">Subscribe now!</a></p>

<p><em>[I wrote this post last year when IMVU still used Flash for a significant portion of our UI. Even though we now embed Gecko, I believe embedding Flash is still valuable.]</em></p>

<p>Writing user interfaces is hard.  Writing usable interfaces is harder.  Yet, the design of your interface <em>is your product</em>.</p>

<p>Products are living entities.  They always want to grow, adapting to their users as users adapt to them.  In that light, why build your user interface in a static technology like C++ or Java?  It won&#8217;t be perfect the first time you build it, so prepare for change.</p>

<p>IMVU employs two technologies for rapidly iterating on and refining our client UIs: Flash and Gecko/HTML.  Sure, integrating these technologies has a sizable up-front cost, but the iteration speed they provide easily pays for them.  Rapid iteration has some obvious benefits:</p>

<ol>
<li>reduces development cost</li>
<li>reduces time to market</li>
</ol>

<p>and some less-obvious benefits:</p>

<ol>
<li>better product/market fit: when you can change your UI, you will.</li>
<li>improved product quality: little details distinguish mediocre products from great products.  Make changing details cheap and your Pinto will become a Cadillac.</li>
<li>improved morale: both engineers and designers <em>love</em> watching their creations appear on the screen right before them.  It&#8217;s why so many programmers create games!</li>
</ol>

<p>I will show you how integrating Flash into a 3D application is easier than it sounds.</p>


<h2>Should I use Adobe Flash or Scaleform GFx?</h2>

<p>The two most common Flash implementations are Adobe&#8217;s ActiveX control (which has a <a href="http://www.adobe.com/products/player_census/flashplayer/version_penetration.html">97% installed base!</a>) and Scaleform GFx.</p>

<p>Adobe&#8217;s control has perfect compatibility with their tool chain (go figure!), but it is closed-source, and good luck getting help from Adobe.</p>

<p>Scaleform GFx is an alternate implementation of Flash designed to be embedded in 3D applications, but, last I checked, is not efficient on machines without GPUs.  (Disclaimer: this information is two years old, so I encourage you to make your own evaluation.)</p>

<p>IMVU chose to embed Adobe&#8217;s player.</p>

<h2>Deploying the Flash Runtime</h2>

<p>Assuming you&#8217;re using Adobe&#8217;s Flash player, how will you deploy their runtime?  Well, given Flash&#8217;s install base, you can get away with loading the Flash player already installed on the user&#8217;s computer.  If they don&#8217;t have Flash, just require that they install it from your download page.  Simple and easy.</p>

<p>Down the road, when Flash version incompatibilities and that last 5% of your possible market become important, you can request <a href="http://www.adobe.com/licensing/">permission from Adobe</a> to deploy the Flash player with your application.</p>

<h2>Displaying SWFs</h2>

<p>IMVU displays Flash in two contexts: traditional HWND windows and 2D overlays atop the 3D scene.</p>

<div id="attachment_1551" class="wp-caption aligncenter" style="width: 689px"><a href="http://aegisknight.org/wp-uploads/imvu_flash_window.png"><img src="http://aegisknight.org/wp-uploads/imvu_flash_window.png" alt="" title="IMVU Flash Window" width="679" height="353" class="size-full wp-image-1551" /></a><p class="wp-caption-text">IMVU Flash Window</p></div>

<div id="attachment_1568" class="wp-caption aligncenter" style="width: 485px"><a href="http://aegisknight.org/wp-uploads/imvu_flash_overlay1.png"><img src="http://aegisknight.org/wp-uploads/imvu_flash_overlay1.png" alt="" title="IMVU Flash Overlay" width="475" height="566" class="size-full wp-image-1568" /></a><p class="wp-caption-text">IMVU Flash Overlay</p></div>

<p>If you want to have something up and running in a day, buy <a href="http://www.f-in-box.com/">f_in_box</a>.  Besides its awesome name, it&#8217;s cheap, comes with source code, and the support forums are fantastic.  It&#8217;s a perfect way to bootstrap.  After a weekend of playing with f_in_box, Dusty and I had a YouTube video playing in a texture on top of our 3D scene.</p>

<p>Once you run into f_in_box&#8217;s limitations, you can use the IShockwaveFlash and IOleInPlaceObjectWindowless COM interfaces directly.  See Igor Makarav&#8217;s <a href="http://www.codeproject.com/KB/COM/flashcontrol.aspx?fid=321012">excellent tutorial</a> and CFlashWnd class.</p>

<h2>Rendering Flash as an HWND</h2>

<p>For top-level UI elements, use f_in_box or CFlashWnd directly.  They&#8217;re perfectly suited for this.  Seriously, it&#8217;s just a few lines of code.  Look at their samples and go.</p>

<h2>Rendering Flash as a 3D Overlay</h2>

<p>Rendering Flash to a 3D window gets a bit tricky&#8230;  Wait for Part 2 of this post!</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/07/how-to-embed-flash-into-your-3d-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scalable Build Systems: An Analysis of Tup</title>
		<link>http://chadaustin.me/2010/06/scalable-build-systems-an-analysis-of-tup/</link>
		<comments>http://chadaustin.me/2010/06/scalable-build-systems-an-analysis-of-tup/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 10:00:17 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[ibb]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://chadaustin.me/?p=1530</guid>
		<description><![CDATA[I previously argued that any tool whose running time is proportional to the number of files in a project scales quadratically with time.  Bluem00 on Hacker News pointed me towards Tup, a scalable build system with goals similar to ibb.

Mike Shal, Tup&#8217;s author, wrote Build System Rules and Algorithms, formalizing the algorithmic deficiencies with [...]]]></description>
			<content:encoded><![CDATA[<p>I previously argued that any tool whose running time is proportional with the number of files in a project <a href="http://chadaustin.me/2010/03/your-version-control-and-build-systems-dont-scale-introducing-ibb/">scales quadratically with time</a>.  Bluem00 on <a href="http://news.ycombinator.com/item?id=1167238">Hacker News</a> pointed me towards <a href="http://gittup.org/tup/">Tup</a>, a scalable build system with goals similar to <a href="http://github.com/chadaustin/ibb">ibb</a>.</p>

<p>Mike Shal, Tup&#8217;s author, wrote <a href="http://gittup.org/tup/build_system_rules_and_algorithms.pdf">Build System Rules and Algorithms</a>, formalizing the algorithmic deficiencies with existing build systems and describing Tup&#8217;s implementation, a significant improvement over the status quo.  I would like to document my analysis of Tup and whether I think it replaces <a href="http://github.com/chadaustin/ibb">ibb</a>.</p>

<p>Before we get started, I&#8217;d like to thank Mike Shal for being receptive to my comments.  I sent him a draft of my analysis and his responses were thoughtful and complete.  With his permission, I have folded his thoughts into the discussion below.</p>

<p>Is Tup suitable as a general-purpose build system?  Will it replace SCons or Jam or Make anytime soon?  Should I continue working on ibb?</p>

<p>Remember our criteria for a scalable build system, one that enables test-driven development at arbitrary project sizes:</p>

<ol>
<li>O(1) no-op builds</li>
<li>O(changes) incremental builds</li>
<li>Accessible dependency DAG atop which a variety of tools can be built</li>
</ol>

<p>Without further ado, my thoughts on Tup follow:</p>

<h2>Syntax</h2>

<p>Tup defines its own declarative syntax, similar to Make or Jam.  At first glance, the Tup syntax looks semantically equivalent to Make.  From the <a href="http://gittup.org/tup/examples.html">examples</a>:</p>

<pre>
: hello.c |> gcc hello.c -o hello |> hello
</pre>

<p>Read the dependency graph from left to right:  hello.c is compiled by gcc into a hello executable.  Tup supports variable substitution and limited flow control.</p>

<p>Build systems are inherently declarative, but I think Tup&#8217;s syntax has two flaws:</p>

<ol>
<li>Inventing a new syntax unnecessarily slows adoption: if Tup implemented GNU Make&#8217;s syntax, it would be a huge drop-in improvement to existing build systems.</li>
<li>Even though specifying dependency graphs is naturally declarative, I think a declarative syntax is a mistake.  Build systems are a first-class component of your software and your team&#8217;s workflow.  You should be able to develop them in a well-known, high-level language such as Python or Ruby, especially since those languages come with rich libraries.  As an example, SCons gets this right: it&#8217;s trivial for me to write CPU autodetection logic for parallel builds in a build script if that makes sense.  Or I can extend SCons&#8217;s Node system to download source files from the web.</li>
</ol>

<h2>Implementation Language</h2>

<p>Tup is 15,000 lines of C.  There&#8217;s no inherent problem with C, but I do think a community-supported project is more likely to thrive in a higher-level, safer language that is faster to develop in, such as Python or Ruby.  Having worked with teams of engineers, I&#8217;ve seen that most engineers can safely work in Python with hardly any spin-up time.  I can&#8217;t say the same of C.</p>

<p>Git is an interesting case study: the core performance-sensitive data structures and algorithms are written in C, but many of its interesting features are written in Perl or sh, including git-stash, git-svn, and git-bisect.  Going further than Git, I claim Python and Ruby are efficient enough for the entirety of a scalable build system.  Worst case, the dependency graph could live in C and everything else could stay in Python.</p>

<h2>Scanning Implicit Dependencies</h2>

<p>The Tup paper mentions offhand that it&#8217;s trivial to monitor a compiler&#8217;s file accesses and thus determine its true dependencies for generating a particular set of outputs.  The existing implementation uses an LD_PRELOAD shim to monitor all file accesses attempted by, say, gcc, and treats those as canonical input files.  Clever!</p>

<p>This is a great example of lateral, scrappy thinking.  It has a couple huge advantages:</p>

<ol>
<li>No implicit dependencies (such as C++ header file includes) need be specified &#8212; if all dependencies come from the command line or a file, Tup will know them all.</li>
<li>It&#8217;s easy to implement.  Tup&#8217;s ldpreload.c is a mere 500 lines.</li>
</ol>

<p>And a few disadvantages:</p>

<ol>
<li>Any realistic build system must treat Windows as a first-class citizen.  Perhaps, on Windows, Tup could use something like <a href="http://research.microsoft.com/en-us/projects/detours/">Detours</a>. I&#8217;ll have to investigate that.</li>

<li>Intercepting system calls is reliable when the set of system calls is known and finite.  However, there&#8217;s nothing stopping the OS vendor from adding <a href="http://msdn.microsoft.com/en-us/library/aa363853(VS.85).aspx">new system calls</a> that modify files.</li>

<li>Incremental linking / external PDB files: these Visual C++ features both read and write the same file in one compile command.  SCons calls this a SideEffect: commands that share a SideEffect cannot parallelize.  A build system that does not support incremental linking or external symbols would face resistance among Visual C++ users.</li>
</ol>

<p>And some open questions:</p>

<ol>
<li>I haven&#8217;t completely thought this through, but it may be important to support user-defined dependency scanners that run before command execution, enabling tools such as graph debugging.</li>
<li>I don&#8217;t have a realistic example, but imagine a compiler whose file accesses change spuriously from run to run; say, a compiler that checks its license file only every other day.</li>
</ol>

<p>Stepping back, I think the core build system should not be responsible for dependency scanning.  By focusing on dependency graph semantics and leaving dependency scanning up to individual tools (which may or may not use LD_PRELOAD or similar techniques), a build system can generalize to uses beyond compiling software, as I mentioned in my previous blog post.</p>

<h2>Dependency Graph</h2>

<p>Tup&#8217;s dependency DAG contains two types of nodes: Commands and Files.  Files depend on Commands and Commands depend on other Files.  I prefer Tup&#8217;s design over SCons&#8217;s DAG-edges-are-commands design for two reasons:</p>

<ol>
<li>It simplifies the representation of multiple-input multiple-output commands.</li>
<li>Some commands, such as &#8220;run-test foo&#8221; or &#8220;search-regex some.*regex&#8221; depend on source files but produce no files as output. Since they fit naturally into the build DAG, commands are a first-class concept.</li>
</ol>
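<p>A small C++ sketch of such a bipartite graph follows, under the assumption that file changes arrive as a list (for example, from a file-system monitor).  Starting from the changed files, it walks file-to-consumer-to-output edges and collects every command that must rerun, visiting only the part of the graph downstream of a change.  All names here are hypothetical and are not Tup&#8217;s API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// A Command consumes input Files and produces output Files (possibly none,
// as with "run-test foo").
struct Command {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

class BuildGraph {
public:
    void addCommand(const Command& c) {
        const std::size_t id = commands_.size();
        commands_.push_back(c);
        for (const auto& in : c.inputs) consumers_[in].push_back(id);
    }

    // Names of all commands downstream of the changed files, in discovery
    // order. This is the set to rerun, not yet a valid execution order.
    std::vector<std::string> affectedBy(const std::vector<std::string>& changed) const {
        std::vector<std::string> result;
        std::set<std::size_t> seen;
        std::queue<std::string> files;
        for (const auto& f : changed) files.push(f);
        while (!files.empty()) {
            const std::string file = files.front();
            files.pop();
            auto it = consumers_.find(file);
            if (it == consumers_.end()) continue;  // nothing reads this file
            for (std::size_t id : it->second) {
                if (!seen.insert(id).second) continue;  // already collected
                result.push_back(commands_[id].name);
                for (const auto& out : commands_[id].outputs) files.push(out);
            }
        }
        return result;
    }

private:
    std::vector<Command> commands_;
    std::map<std::string, std::vector<std::size_t>> consumers_;  // file -> readers
};
```

<p>Running the affected commands still requires ordering them topologically; the walk only determines which ones are affected.</p>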

<h2>Build Reliability</h2>

<p>Tup, like SCons, places a huge emphasis on build reliability.  This is key and I couldn&#8217;t agree more.  In the half-decade I&#8217;ve used SCons, I can count the number of broken builds on one hand.  Sadly, many software developers are used to typing &#8220;make clean&#8221; or clicking &#8220;full rebuild&#8221; when something is weird.  What a huge source of waste!  Developers should trust the build system as much as their compiler, and the build system should go out of its way to help engineers specify complete and accurate dependencies.</p>

<p>Reliable builds imply:</p>

<ol>
<li>Changes are tracked by file <em>contents</em>, not timestamps.</li>
<li>The dependency graph, including implicit dependencies such as header files and build commands, is complete and accurate by default.</li>
<li>Compiler command lines are included in the DAG.  Put another way: if the command used to build a file changes, the file must be rebuilt.</li>
</ol>

<p>Tup takes a strict functional approach and formalizes build state as a set of files and their contents.  (I would argue build state also includes file metadata such as file names and timestamps, at least if the compiler uses such information.)  If the build state does not change between invocations, then no work needs to be done.</p>

<p>Tup even takes build reliability one step further than SCons:  If you rename a target file in the build script, Tup actually deletes the old built target before rebuilding the new one.  Thus, you will never have stale target executables lying around in your build tree.</p>

<p>Nonetheless, there are situations where a project may choose to sacrifice absolute reliability for significant improvements in build speed, such as incremental linking discussed above.</p>

<h2>Core vs. Community</h2>

<p>A build system is a critical component of any software team&#8217;s development process.  Since every team is different, it&#8217;s essential that a build system is flexible and extensible.  SCons, for example, correctly chose to implement build scripts in a high-level language (Python) with a declarative API for specifying nodes and edges in the dependency graph.</p>

<p>However, I think SCons did not succeed at separating its core engine from its community.  SCons tightly couples the underlying dependency graph with support for tools like Visual C++, gcc, and version control.  The frozen and documented SCons API is fairly high-level while the (interesting) internals are treated as private APIs.  It should be the opposite: a dependency graph is a narrow, stable, and general API.  By simplifying and documenting the DAG API, SCons could enable broader uses, such as unit test execution.</p>

<h2>Configuration</h2>

<p>Like Tup&#8217;s author, I agree that build autoconfiguration (such as autoconf or SCons&#8217;s Configure support) should not live in the core build system.  Autoconfiguration is simply an argument that build scripts should be specified in a general programming language and that the community should develop competing autoconfiguration systems.  If a particular autoconfiguration system succeeds in the marketplace, then, by all means, ship it with your build tool.  Either way, it shouldn&#8217;t have access to any internal APIs.  Configuration mechanisms are highly environment-sensitive and are best maintained by the community anyway.</p>

<h2>DAG post-process optimizations</h2>

<p>Another argument for defining a build tool in a general-purpose language is to allow user-defined DAG optimizations and sort orders.  I can think of two such use cases:</p>

<ol>
<li>Visual C++ greatly improves compile times when multiple C++ files are specified on one command line.  In fact, the benefit of batched builds can exceed the benefit of PCH.  A DAG optimizer would search for a set of C++ source files that produce object files in the same directory and rewrite the individual command lines into one.</li>

<li>When rapidly iterating, it would be valuable for a build system or test runner to sort jobs so that the most-recently-failed compile or test runs first.  However, when hunting test interdependencies as part of a nightly build, you may want to shuffle test runs.  On machines with many cores but slow disks, you want to schedule expensive links as soon as possible to reduce the risk that several of them execute concurrently and thrash your disk.</li>
</ol>
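<p>Both optimizations are easy to express once the build graph is plain data. The Python sketch below is hypothetical: <code>Compile</code>, <code>batch_compiles</code>, and <code>order_jobs</code> are not part of any existing tool. It assumes the jobs being reordered have no dependencies on each other. It only shows the shape of the rewrite, and a real optimizer would need to check any other per-file settings before merging commands.</p>

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Compile:
    source: str
    obj_dir: str
    flags: tuple  # one cl.exe command line carries one set of flags


def batch_compiles(compiles):
    """Merge compiles that share an output directory and flags into one cl.exe command."""
    groups = defaultdict(list)
    for c in compiles:
        groups[(c.obj_dir, c.flags)].append(c.source)
    commands = []
    for (obj_dir, flags), sources in groups.items():
        # A /Fo argument ending in a backslash names an output directory.
        commands.append(["cl", "/c", *flags, "/Fo" + obj_dir + "\\", *sorted(sources)])
    return commands


def order_jobs(jobs, mode, last_failed=None, cost=None, seed=None):
    """Reorder jobs that are already known to be independent of each other."""
    jobs = list(jobs)
    if mode == "iterate":
        # Most recently failed first; jobs that never failed keep their relative order.
        last_failed = last_failed or {}
        return sorted(jobs, key=lambda j: (j not in last_failed, -last_failed.get(j, 0)))
    if mode == "nightly":
        # Shuffle to flush out hidden test interdependencies.
        random.Random(seed).shuffle(jobs)
        return jobs
    if mode == "expensive_first":
        # Start costly jobs such as links early, so fewer of them overlap later on.
        cost = cost or {}
        return sorted(jobs, key=lambda j: -cost.get(j, 0))
    raise ValueError("unknown mode: %s" % mode)
```

<p>The <code>last_failed</code> timestamps and <code>cost</code> estimates would come from previous builds. The sketch keeps them as plain dictionaries so any policy can be swapped in.</p>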

<h2>Conclusion</h2>

<p>Tup is a significant improvement over the status quo, and I have personally confirmed its performance &#8212; it&#8217;s lightning fast and it scales to arbitrary project sizes.</p>

<p>However, without out-of-the-box Windows support, a mainstream general-purpose language, and a model for community contribution, I don&#8217;t see Tup rapidly gaining traction.  With the changes I suggest, it could certainly replace Make and perhaps change the way we iterate on software entirely.</p>

<p>Next, I intend to analyze <a href="http://code.google.com/p/prebake/">prebake</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2010/06/scalable-build-systems-an-analysis-of-tup/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>IMVU Crash Reporting: Stalls and Deadlocks</title>
		<link>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/</link>
		<comments>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 06:52:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[crashes]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1457</guid>
		<description><![CDATA[By mid-2006, we&#8217;d primarily focused on access violations and unhandled exceptions, the explosive application failures.  After extensive effort, we got our client&#8217;s crash rate down to 2% or so, where 2% of all sessions ended in a crash.*  Still the customers cried &#8220;Fix the crashes!&#8221;

It turns out that when a customer says &#8220;crash&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>By mid-2006, we&#8217;d primarily focused on access violations and unhandled exceptions, the explosive application failures.  After extensive effort, we got our client&#8217;s crash rate down to 2% or so, meaning 2% of all sessions ended in a crash.<a href="#footnote_session_length">*</a>  Still the customers cried &#8220;Fix the crashes!&#8221;</p>

<p>It turns out that when a customer says &#8220;crash&#8221; they mean &#8220;it stopped doing what I wanted&#8221;, but engineers hear &#8220;the program threw an exception or caused an access violation&#8221;.  Thus, to the customer, crash can mean:</p>

<ul>
<li>the application was unresponsive for a period of time</li>
<li>the UI failed to load, making the client unusable</li>
<li>the application has been disconnected from the server</li>
</ul>

<p>In short, any time the customer cannot make progress and it&#8217;s not (perceived to be) their fault, the application has crashed.</p>

<p>OK, we&#8217;ve got our work cut out for us&#8230;  Let&#8217;s start by considering deadlocks and stalls.</p>

<p>First, some terminology: in computer science, a <a href="http://en.wikipedia.org/wiki/Deadlock">deadlock</a> is a situation where two threads or processes are waiting for each other, so neither makes progress.  That definition is a bit academic for our purposes.  Let&#8217;s redefine deadlock as any situation where the program becomes unresponsive for an unreasonable length of time.  This definition includes <a href="http://en.wikipedia.org/wiki/Livelock">livelock</a>, slow operations without progress indication, and network (or disk!) I/O that blocks the program from responding to input.</p>

<p>It actually doesn&#8217;t matter whether the program will <i>eventually</i> respond to input.  People get impatient quickly.  You&#8217;ve only got a few seconds to respond to the customer&#8217;s commands.</p>

<h2>Detecting Deadlocks in C++</h2>

<p>The embedded programming world has a &#8220;<a href="http://en.wikipedia.org/wiki/Watchdog_timer">watchdog timer</a>&#8221; concept.  Your program is responsible for periodically pinging the watchdog, and if for several seconds you don&#8217;t, the watchdog restarts your program and reports debugging information.</p>

<p>Implementing this in C++ is straightforward:</p>

<ul>
<li>Start a watchdog thread that wakes up every few seconds to check that the program is still responding to events.</li>
<li>Add a heartbeat to your main event loop that frequently pings the watchdog.</li>
<li>If the watchdog timer detects the program is unresponsive, record stack traces and log files, then report the failure.</li>
</ul>

<p>IMVU&#8217;s <a href="http://aegisknight.org/2009/04/imvus-callstack-api-now-open-source/">CallStack API</a> allows us to grab the C++ call stack of an arbitrary thread, so, if the main thread is unresponsive, we report its current stack every couple of seconds.  This is often all that&#8217;s needed to find and fix the deadlock.</p>

<h2>Detecting Deadlocks in Python</h2>

<p>In Python, we can take the same approach as above:</p>

<ol>
<li>Start a watchdog thread.</li>
<li>Ping the Python watchdog thread in your main loop.</li>
<li>If the watchdog detects that you&#8217;re unresponsive, record the main thread&#8217;s Python stack (this time with <a href="http://docs.python.org/library/sys.html#sys._current_frames">sys._current_frames</a>) and report it.</li>
</ol>
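<p>Here is a minimal sketch of the three steps above, written for Python 3. It uses <code>threading.main_thread</code> and <code>time.monotonic</code>, which Python 2.5 lacked. The <code>Watchdog</code> class, its <code>ping</code> method, and the <code>report</code> callback are hypothetical names; only <code>sys._current_frames</code> comes from the text. It is subject to the same GIL caveat discussed below.</p>

```python
import sys
import threading
import time
import traceback


class Watchdog:
    """Reports the main thread's Python stack when the main loop stops pinging."""

    def __init__(self, timeout, report, poll_interval=0.5):
        self.timeout = timeout            # seconds without a ping before we report
        self.report = report              # callback taking the formatted stack text
        self.poll_interval = poll_interval
        self.main_ident = threading.main_thread().ident
        self.last_ping = time.monotonic()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()

    def ping(self):
        # Call once per iteration of the main event loop.
        self.last_ping = time.monotonic()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stopped.wait(self.poll_interval):
            if time.monotonic() - self.last_ping > self.timeout:
                frame = sys._current_frames().get(self.main_ident)
                if frame is not None:
                    self.report("".join(traceback.format_stack(frame)))
                # Report again only if the stall lasts another full timeout.
                self.last_ping = time.monotonic()
```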

<p>Python&#8217;s <a href="http://docs.python.org/dev/glossary.html#term-global-interpreter-lock">global interpreter lock</a> (GIL) can throw a wrench in this plan.  If one thread enters an infinite loop while keeping the GIL held (say, in a native extension), the watchdog thread will never get to run and so cannot report a deadlock.  In practice, this isn&#8217;t a problem, because the C++ deadlock detector will notice and report a deadlock.  Plus, most common deadlocks are caused by calls that release the GIL: <code>threading.Lock.acquire</code>, <code>socket.recv</code>, <code>file.read</code>, and so on.</p>

<p>It might help to think of the Python deadlock detector as a fallback that, when it works, adds richer information to your deadlock reports.  If it fails, no big deal: the C++ deadlock detector is probably enough to diagnose and fix the problem.</p>

<h2>What did we learn?</h2>

<p>It turned out the IMVU client had several bugs where we blocked the main thread on the network, sometimes for up to 30 seconds.  By that point, most users just clicked the close box [X] and terminated the process.  Oops.</p>

<p>In addition, the deadlock detectors pointed out places where we were doing too much work between message pumps.  For example, loading some assets into the 3D scene might nominally take 200ms.  On a computer with 256 MB of RAM, though, the system might start thrashing, so loading the same assets would take 5s and be reported as a &#8220;deadlock&#8221;.  The solution was to reduce the program&#8217;s working set and bite off smaller chunks of work between pumps.</p>
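<p>One way to bite off smaller chunks is to turn a long job into a generator that returns control to the main loop whenever its time slice is used up. This Python sketch assumes the work splits into independent items. <code>run_in_slices</code>, <code>main_loop</code>, and <code>pump_messages</code> are hypothetical names, not IMVU&#8217;s code. The <code>clock</code> parameter exists only so the slicing can be tested with a fake clock.</p>

```python
import time


def run_in_slices(work_items, process, budget_seconds, clock=time.monotonic):
    """Process work_items, yielding control whenever a time slice is used up.

    Each resumption gets a fresh budget, so a slice blocks the message pump for
    roughly budget_seconds plus the cost of one item.
    """
    items = iter(work_items)
    while True:
        deadline = clock() + budget_seconds
        for item in items:
            process(item)
            if clock() >= deadline:
                break          # budget spent; hand control back to the main loop
        else:
            return             # iterator exhausted: all work is done
        yield


def main_loop(pump_messages, task):
    # Hypothetical main loop: pump input between slices of background work.
    for _ in task:
        pump_messages()
```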

<p>I don&#8217;t recall seeing many &#8220;computer science&#8221; deadlocks, but these watchdogs were invaluable in tracking down important failure conditions in the IMVU client.</p>

<p>Next time, we&#8217;ll improve the accuracy of our crash metrics and answer the question &#8220;How do you know your metrics are valid?&#8221;</p>

<hr />

<p><a id="footnote_session_length">*</a> Median session length is a more useful reliability metric.  It&#8217;s possible to fix crashes and see no change in your percentage of failed sessions, if fixing crashes simply causes sessions to become longer.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/06/imvu-crash-reporting-stalls-and-deadlocks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fast Builds: Incremental Linking and Embedded SxS Manifests</title>
		<link>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/</link>
		<comments>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 02:31:54 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1414</guid>
		<description><![CDATA[
As I&#8217;ve said before, fast builds are crucial for efficient development.  But for those of us who use C++ regularly, link times are killer.  It&#8217;s not uncommon to spend minutes linking your compiled objects into a single binary.  Incremental linking helps a great deal, but, as you&#8217;ll see, incremental linking has become [...]]]></description>
			<content:encoded><![CDATA[<p>
As I&#8217;ve said before, fast builds are crucial for efficient development.  But for those of us who use C++ regularly, link times are killer.  It&#8217;s not uncommon to spend minutes linking your compiled objects into a single binary.  Incremental linking helps a great deal, but, as you&#8217;ll see, incremental linking has become a lot harder in the last few versions of Visual Studio&#8230;
</p>

<p><a href="http://en.wikipedia.org/wiki/Linker">Linking</a> an EXE or DLL is a very expensive operation &#8212; it&#8217;s roughly O(N) where N is the amount of code being linked.  Worse, several optimizing linkers defer code generation to link time, exacerbating the problem!  When you&#8217;re trying to practice TDD, even a couple seconds in your red-green-refactor iteration loop is brutal.  And it&#8217;s not uncommon for large projects to spend minutes linking.</p>

<p>Luckily, Visual C++ supports an <a href="http://msdn.microsoft.com/en-us/library/4khtbfyf(VS.80).aspx">/INCREMENTAL</a> flag, instructing relinks to modify the DLL or EXE in-place, reducing link time to O(changed code) rather than O(all code).  In the olden days of Visual C++ 6, all you had to do was enable /INCREMENTAL, and bam, fast builds.</p>

<p>These days, it&#8217;s not so simple.  Let&#8217;s take an excursion into how modern Windows finds DLL dependencies&#8230;</p>

<h2>Side-by-Side (SxS) Manifests</h2>

<p>Let&#8217;s say you&#8217;re writing a DLL <code>foo.dll</code> that depends on the CRT by using, say, <code>printf</code> or <code>std::string</code>.  When you link <code>foo.dll</code>, the linker will also produce <code>foo.dll.manifest</code>.  Windows XP and Vista use .manifest files to load the correct CRT version.  (This prevents DLL hell: two programs can depend on different versions of the same DLL.)</p>

<p>Since remembering to carry around .manifest files is annoying and error-prone, Microsoft and others recommend that you embed them into your EXE or DLL as a resource:</p>

<pre>
mt.exe -manifest foo.dll.manifest -outputresource:foo.dll;2
</pre>

<p>Convenient, but it modifies the DLL in place, breaking incremental links!  This is a <a href="http://markmail.org/message/f4g2qi2kf5wu7n5t">known problem</a>, and the <a href="http://blogs.msdn.com/zakramer/archive/2006/05/22/603558.aspx">&#8220;solutions&#8221;</a> others suggest are INSANE.  My favorite is the <a href="http://blogs.msdn.com/nikolad/articles/425359.aspx">300-line makefile</a> with a note from the author &#8220;[If this does not work], please let me know ASAP. I will try fixing it for you.&#8221;  Why doesn&#8217;t Visual Studio just provide an /EMBEDMANIFESTRESOURCE flag that would automatically solve the problem?!</p>

<p>I just want incremental linking and embedded manifests.  Is that so much to ask?  I tried a bunch of approaches.  Most didn&#8217;t work.  I&#8217;ll show them, and then give my current (working) approach.  If you don&#8217;t care about the sordid journey, <a href="#solution">skip to the end</a>.</p>

<h2>What Didn&#8217;t Work</h2>

<ul>
<li><em>Not embedding manifests at all.</em></li>
</ul>

<p>What went wrong: I could never figure out the rules whereby manifest dependencies are discovered.  If python.exe depends on the release CRT and your module DLL depends on the debug CRT, and they live in different directories (??), loading the module DLL would fail.  Gave up.</p>

<ul>
<li><em>Linking a temporary file (foo.pre.dll), making a copy (foo.pre.dll -> foo.dll), and embedding foo.pre.dll.manifest into foo.dll with mt.exe.</em></li>
</ul>

<p>What went wrong: As far as I can tell, mt.exe is a terrible piece of code.  In procmon I&#8217;ve watched it close file handles it didn&#8217;t open, causing permissions violations down the line.  (?!)  Sometimes it silently corrupts your EXEs and DLLs too.  This may be a known weakness in <a href="http://msdn.microsoft.com/en-us/library/ms648049(VS.85).aspx">UpdateResource</a>.  Yay!  (Thanks to <a href="http://www.luminance.org/">Kevin Gadd</a>; he was instrumental in diagnosing these bugs.)  mt.exe may or <a href="http://www.wintellect.com/CS/blogs/jrobbins/archive/2009/01/24/the-case-of-the-corrupt-pe-binaries.aspx">may not</a> be fixed in recent Visual Studios.  Either way, I&#8217;m convinced mt.exe has caused us several intermittent build failures in the past.  Avoiding it is a good thing.</p>

<ul>
<li><em>Linking to a temporary file (foo.pre.dll), generating a resource script (foo.pre.rc) from (foo.pre.dll.manifest), compiling said resource script (foo.pre.res), and including the compiled resource into the final link (foo.dll).</em></li>
</ul>

<p>What went wrong: This approach is reliable but slow.  Linking each DLL and EXE twice, even if both links are incremental, is often slower than just doing a full link to begin with.</p>

<ul>
<li><em>Linking foo.dll with foo.dll.manifest (via a resource script, as above) if it exists.  If foo.dll.manifest changed as a result of the link, relink.</em></li>
</ul>

<p>I didn&#8217;t actually try this one because non-DAG builds scare me.  I like the simplicity and reliability of the &#8220;inputs -> command -> outputs&#8221; build model.  It&#8217;s weird if foo.dll.manifest is an input and an output of the link.  Yes, technically, that&#8217;s how incremental linking works at all, but the non-DAG machinery is hidden in link.exe.  From SCons&#8217;s perspective, it&#8217;s still a DAG.</p>

<h2><a id="solution">Finally, a working solution:</a></h2>

<p>For every build configuration {debug,release} and dependency {CRT,MFC,&#8230;}, link a tiny program to generate that dependency&#8217;s manifest.  Wrap the manifest in a resource script (.rc), compile it (.rc -> .res), and link the compiled manifest resource into your other DLLs and EXEs.</p>

<p>This approach has several advantages:</p>

<ul>
<li>These pre-generated manifest resources are created once and reused in future builds, with no impact to build time.</li>
<li>The build is a DAG.</li>
<li>We avoid letting mt.exe wreak havoc on our build by sidestepping it entirely.</li>
</ul>

<p>I can think of one disadvantage &#8211; you need to know up front which SxS DLLs you depend on.  For most programs, the CRT is the only one.  Then again, understanding your dependencies isn&#8217;t a bad thing.  ;)</p>

<p>After several evenings of investigation, we&#8217;re back to the same link times we had with Visual C++ 6!  Yay!</p>

<hr />

<h2>The Code</h2>

<p>If you care, here&#8217;s our SCons implementation of embedded manifests:</p>

<pre>
# manifest_resource(env, is_dll) returns a manifest resource suitable for inclusion into
# the sources list of a Program or SharedLibrary.
manifest_resources = {}
def manifest_resource(env, is_dll):
    if is_dll:
        resource_type = 2 #define ISOLATIONAWARE_MANIFEST_RESOURCE_ID 2
    else:
        resource_type = 1 #define CREATEPROCESS_MANIFEST_RESOURCE_ID  1

    is_debug = env['DEBUG'] # could use a 'build_config' key if we had more than debug/release
    del env

    def build_manifest_resource():
        if is_debug:
            env = baseEnv.Clone(tools=[Debug])
        else:
            env = baseEnv.Clone(tools=[Release])
        # Our builds normally link with /MANIFEST:NO; this tiny program must produce its manifest.
        env['LINKFLAGS'].remove('/MANIFEST:NO')

        if is_dll:
            linker = env.SharedLibrary
            target_name = 'crt_manifest.dll'
            source = env.File('#/MSVC/crt_manifest_dll.cpp')
        else:
            linker = env.Program
            target_name = 'crt_manifest.exe'
            source = env.File('#/MSVC/crt_manifest_exe.cpp')

        env['OUTPUT_PATH'] = '#/${BUILDDIR}/${IMVU_BUILDDIR_NAME}/%s' % (target_name,)

        obj = env.SharedObject('${OUTPUT_PATH}.obj', source)
        result = linker([env.File('${OUTPUT_PATH}'), '${OUTPUT_PATH}.manifest'], obj)
        manifest = result[1]

        def genrc(env, target, source):
            [target] = target
            [source] = source
            # One resource line: resource ID, type 24 (RT_MANIFEST), quoted manifest path.
            f = open(target.abspath, 'w')
            try:
                f.write('%d 24 "%s"' % (resource_type, source.abspath,))
            finally:
                f.close()

        rc = env.Command('${OUTPUT_PATH}.rc', manifest, genrc)
        res = env.RES('${OUTPUT_PATH}.res', rc)
        env.Depends(res, manifest)
        return res
    
    key = (is_debug, resource_type)
    try:
        return manifest_resources[key]
    except KeyError:
        res = build_manifest_resource()

        manifest_resources[key] = res
        return res
</pre>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/05/incremental-linking-and-embedded-manifests/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fast Builds: Unintrusive Precompiled Headers (PCH)</title>
		<link>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/</link>
		<comments>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/#comments</comments>
		<pubDate>Wed, 20 May 2009 20:24:59 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1399</guid>
		<description><![CDATA[
Fast builds are critical to the C++ programmer&#8217;s productivity and happiness.  One common technique for reducing build times is precompiled headers (PCH).  There&#8217;s plenty of literature out there; I won&#8217;t describe PCH in detail here.


But one thing that&#8217;s always bothered me about PCH is that it affects your code.  #pragma hdrstop and [...]]]></description>
			<content:encoded><![CDATA[<p>
<a href="http://gamesfromwithin.com/?p=100">Fast builds</a> are critical to the C++ programmer&#8217;s productivity and happiness.  One common technique for reducing build times is precompiled headers (PCH).  There&#8217;s <a href="http://gamesfromwithin.com/?p=39">plenty of literature</a> out there; I won&#8217;t describe PCH in detail here.
</p>

<p>But one thing that&#8217;s always bothered me about PCH is that it affects your code.  <code>#pragma hdrstop</code> and <code>#include "StdAfx.h"</code> everywhere.  Gross.</p>

<p>I&#8217;m a strong believer in clean code without boilerplate, so can&#8217;t we do better?  Ideally we could make a simple tweak to the build system and see build times magically improve.  <a href="http://ennos.home.pages.de/">Enno</a> enticed me with mentions of his fast builds, so I took a look&#8230;</p>

<p>Using PCH in Visual C++ requires a header (call it Precompiled.h) that includes all of the expensive dependencies:</p>

<pre>
#include &lt;vector&gt;
#include &lt;map&gt;
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;boost/python.hpp&gt;
#include &lt;windows.h&gt;
#include &lt;mmsystem.h&gt;
</pre>

<p>Additionally, we need a source file (let&#8217;s get creative and call it Precompiled.cpp), which is empty except for <code>#include "Precompiled.h"</code>.</p>

<p>Compile Precompiled.cpp with <code><a href="http://msdn.microsoft.com/en-us/library/7zc28563.aspx">/Yc</a>Precompiled.h</code> to generate Precompiled.pch, the actual precompiled header.  Then, use the precompiled header on the rest of your files with <code><a href="http://msdn.microsoft.com/en-us/library/z0atkd6c.aspx">/Yu</a>Precompiled.h</code>.  (Note that the file name follows the option directly, with no space.)</p>

<p>OK, here&#8217;s the step that prevented me from using PCH for so long: every single source file in your project must <code>#include "Precompiled.h"</code> on its first line.</p>

<p>That&#8217;s ridiculous!  I don&#8217;t want to touch every file!</p>

<p>It turns out our savior is the <a href="http://msdn.microsoft.com/en-us/library/8c5ztk84.aspx">/FI</a> option.  From the documentation:</p>

<blockquote>
<p>This option has the same effect as specifying the file with double quotation marks in an #include directive on the first line of every source file specified on the command line [...]</p>
</blockquote>

<p>Exactly what we want!</p>
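<p>Put together, the compiler invocations look roughly like this. The Python sketch only builds argument lists. <code>pch_command_lines</code> is a hypothetical helper. Besides <code>/Yc</code>, <code>/Yu</code>, and <code>/FI</code>, it uses <code>/c</code> (compile only) and <code>/Fp</code> (name the .pch file). A real project would add its own flags.</p>

```python
def pch_command_lines(pch_source, header, sources, pch_file="Precompiled.pch", enabled=True):
    """Return cl.exe argument lists for building with an unintrusive PCH.

    pch_source (Precompiled.cpp) #includes header itself and is compiled with /Yc.
    Every other source gets /Yu plus /FI, so none of them mention the header.
    """
    if not enabled:
        # Plain build without PCH: a missing #include now fails to compile.
        return [["cl", "/c", src] for src in sources]
    create = ["cl", "/c", "/Yc" + header, "/Fp" + pch_file, pch_source]
    use = [["cl", "/c", "/Yu" + header, "/Fp" + pch_file, "/FI" + header, src]
           for src in sources]
    return [create] + use
```

<p>Because the source files never mention the header, switching <code>enabled</code> off requires no code changes.</p>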

<p>But wait, doesn&#8217;t that mean every .cpp in our project will have access to every symbol included by the PCH?  Yes.  :(  It&#8217;s worth the build speedup.</p>

<p>However, explicit physical dependencies are important, and the only way to prevent important things from breaking is by blocking commits if they fail.  Since enabling and disabling PCH does not require any code changes, it&#8217;s easy enough to add a &#8220;disable PCH&#8221; option to your build system and run it on your continuous integration server:</p>

<a href="http://aegisknight.org/wp-uploads/compile_without_pch.png"><img src="http://aegisknight.org/wp-uploads/compile_without_pch-138x300.png" alt="Compile without PCH" title="Compile without PCH" width="138" height="300" class="aligncenter size-medium wp-image-1404" /></a>

<p>If somebody uses <code>std::string</code> but forgets to <code>#include &lt;string&gt;</code>, the build will fail and block commits.</p>

<p>In the end, here&#8217;s the bit of SCons magic that lets me quickly drop PCH into a project:</p>

<pre>
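def enable_pch(env, source_file, header):
    # PCH_ENABLED is a global build option; the CI server builds with it off.
    if PCH_ENABLED:
        # Compile source_file with /Yc to produce the .pch and its object file.
        PCH, PCH_OBJ = env.PCH(source_file)
        # With PCH and PCHSTOP set, SCons adds /Yu and /Fp to this env's compiles.
        env['PCH'] = PCH
        env['PCHSTOP'] = header
        # Force-include the header so no source file has to #include it.
        env.Append(CPPFLAGS=['/FI' + header])
        # Return the PCH object so it gets linked into the binary.
        return [PCH_OBJ]
    else:
        # No PCH: source_file is compiled like any other source.
        return [source_file]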
</pre>

<p>Now you can benefit from fast builds with minimal effort and no change to your existing code!</p>

]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/05/unintrusive-precompiled-headers-pch/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Reporting Crashes in IMVU: Who threw that C++ exception?</title>
		<link>http://chadaustin.me/2009/04/who-threw-that-exception/</link>
		<comments>http://chadaustin.me/2009/04/who-threw-that-exception/#comments</comments>
		<pubDate>Mon, 20 Apr 2009 03:32:38 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[crashes]]></category>
		<category><![CDATA[imvu]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1328</guid>
		<description><![CDATA[It&#8217;s not often that I get to write about recent work.  Most of the techniques in this series were implemented at IMVU years ago.  A few weeks ago, however, a common C++ exception (tr1::bad_weak_ptr) started intermittently causing crashes in the wild.  This exception can be thrown in a variety of circumstances, so [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s not often that I get to write about recent work.  Most of the techniques in this series were implemented at IMVU years ago.  A few weeks ago, however, a common C++ exception (<code>tr1::bad_weak_ptr</code>) started intermittently causing crashes in the wild.  This exception can be thrown in a variety of circumstances, so we had no clue which code was problematic.</p>

<p>We could have modified <code>tr1::bad_weak_ptr</code> so its constructor fetched a <code>CallStack</code> and returned it from <code>tr1::bad_weak_ptr::what()</code>, but fetching a <code>CallStack</code> is not terribly cheap, especially in such a frequently-thrown-and-caught exception.  Ideally, we&#8217;d only grab a stack after we&#8217;ve determined it&#8217;s a crash (in the top-level crash handler).</p>

<p>Allow me to illustrate:</p>

<pre>
void main_function(/*arguments*/) {
    try {
        try {
            // We don't want to grab the call stack here, because
            // we'll catch the exception soon.
            this_could_fail(/*arguments*/);
        }
        catch (const std::exception&amp; e) {
            // Yup, exception is fine.  Just swallow and
            // do something else.
            fallback_algorithm(/*arguments*/);
        }
    }
    catch (const std::exception&amp; e) {
        // Oh no! fallback_algorithm() failed.
        // Grab a stack trace now.
        report_crash(CallStack::here());
    }
}
</pre>

<p>Almost!  Unfortunately, the call stack generated in the catch clause doesn&#8217;t contain <code>fallback_algorithm</code>.  It starts with <code>main_function</code>, because the stack has already been unwound by the time the catch clause runs.</p>

<p>Remember the structure of the stack:</p>

<a href="http://aegisknight.org/wp-uploads/example_stack.png"><img src="http://aegisknight.org/wp-uploads/example_stack.png" alt="Example Stack" title="Example Stack" width="455" height="444" class="size-full wp-image-1337" /></a>

<p>We can use the <code>ebp</code> register, which points to the current stack frame, to walk and record the current call stack.  <code>[ebp+4]</code> is the return address into the caller, <code>[[ebp]+4]</code> is the return address into the caller&#8217;s caller, <code>[[[ebp]]+4]</code> is the return address into the caller&#8217;s caller&#8217;s caller, and so on.</p>
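<p>Real stack walking needs native code, but the pointer chasing is easy to simulate. In this Python sketch, a dictionary stands in for stack memory. It assumes 32-bit x86 frames built with the standard <code>push ebp; mov ebp, esp</code> prologue, with no frame-pointer omission. <code>walk_frames</code> is a hypothetical helper, not part of CallStack.</p>

```python
def walk_frames(memory, ebp, max_frames=64):
    """Collect return addresses by following the chain of saved ebp values.

    memory maps stack addresses to the 32-bit values stored there.  With the
    standard prologue (push ebp; mov ebp, esp), [ebp] holds the caller's saved
    ebp and [ebp+4] holds the return address into the caller.
    """
    return_addresses = []
    while ebp in memory and ebp + 4 in memory and len(return_addresses) < max_frames:
        return_addresses.append(memory[ebp + 4])
        caller_ebp = memory[ebp]
        if caller_ebp <= ebp:
            # The stack grows down, so callers' frames sit at higher addresses.
            # Anything else means the chain has ended (or is corrupt).
            break
        ebp = caller_ebp
    return return_addresses


# Simulated stack: three frames, innermost at 0x1000, outermost saved ebp = 0.
memory = {
    0x1000: 0x1020, 0x1004: 0x00401111,
    0x1020: 0x1040, 0x1024: 0x00402222,
    0x1040: 0x0000, 0x1044: 0x00403333,
}
```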

<p>What can we do with this information?  Slava Oks at Microsoft <a href="http://blogs.msdn.com/slavao/archive/2005/01/30/363428.aspx">gives the clues we need</a>.  When you type <code>throw MyException()</code>, a temporary <code>MyException</code> object is constructed <em>at the bottom of the stack</em> and passed into the catch clause by reference or by value (as a copy deeper on the stack).</p>

<p>Before the catch clause runs, objects on the stack between the thrower and the catcher are destructed, and <code>ebp</code> is pointed at the catcher&#8217;s stack frame (so the catch clause can access parameters and local variables).</p>

<p>From within the outer catch block, here is the stack, <code>ebp</code>, and <code>esp</code>:</p>

<a href="http://aegisknight.org/wp-uploads/stack_in_catch.png"><img src="http://aegisknight.org/wp-uploads/stack_in_catch.png" alt="Stack From Catch Clause" title="Stack From Catch Clause" width="455" height="1168" class="size-full wp-image-1338" /></a>

<p>Notice that every time an exception is <em>caught</em>, the linked list of stack frames is truncated: <code>ebp</code> is reset to the stack frame of the <em>catcher</em>, destroying our link to the thrower&#8217;s stack.</p>

<p>But there&#8217;s useful information between <code>ebp</code> and <code>esp</code>!  We just need to search for it.  We can find who threw the exception with this simple algorithm:</p>

<pre>
	For every possible pointer between ebp and esp,
	find the deepest pointer p,
	where p might be a frame pointer.
	(That is, where walking p eventually leads to ebp.)
</pre>
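<p>The same kind of simulation can illustrate the search. This Python sketch is a hypothetical rendering of the pseudocode, not a port of the linked CallStack.cpp. It assumes 4-byte stack slots and a stack that grows downward, so <code>esp</code> is below <code>ebp</code> and &#8220;deepest&#8221; means the lowest address. Like any heuristic scan, it can be fooled by a stale value that happens to look like a frame chain.</p>

```python
def chain_reaches(memory, p, target, max_steps=256):
    """True if following saved-ebp links from p arrives exactly at target."""
    for _ in range(max_steps):
        if p == target:
            return True
        if p not in memory:
            return False
        nxt = memory[p]
        # Each link must move up the stack without jumping past target.
        if nxt <= p or nxt > target:
            return False
        p = nxt
    return False


def find_deepest_frame(memory, ebp, esp):
    """Return the lowest address in [esp, ebp) that could be a frame pointer
    chaining up to ebp, or ebp itself if there is none."""
    for p in range(esp, ebp, 4):  # 4-byte slots, scanning from deepest upward
        if chain_reaches(memory, p, ebp):
            return p
    return ebp


# Simulated stack inside the catch block: ebp = 0x2000, esp = 0x1F00.
# The thrower's dead frames (0x1F40 -> 0x1F80 -> 0x2000) are still in memory,
# along with some values that are not frame pointers.
memory = {
    0x1F10: 0x1234,      # too small: points down the stack
    0x1F20: 0x3000,      # overshoots ebp
    0x1F40: 0x1F80, 0x1F44: 0x00401234,
    0x1F80: 0x2000, 0x1F84: 0x00405678,
}
```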

<p>Or you can just use <a href="http://imvu.svn.sourceforge.net/viewvc/imvu/imvu_open_source/CallStack/CallStack.cpp?view=markup#l_127">our implementation</a>.</p>

<p>With this in mind, let&#8217;s rewrite our example&#8217;s error handling:</p>

<pre>
void main_function(/*arguments*/) {
    try {
        try {
            this_could_fail(/*arguments*/);
        }
        catch (const std::exception&amp; e) {
            // that's okay, just swallow and
            // do something else.
            fallback_algorithm(/*arguments*/);
        }
    }
    catch (const std::exception&amp; e) {
        // oh no! fallback_algorithm() failed.
        // grab a stack trace - including thrower!<b>
        Context ctx;
        getCurrentContext(ctx);
        ctx.ebp = findDeepestFrame(ctx.ebp, ctx.esp);
        report_crash(CallStack(ctx));</b>
    }
}
</pre>

<p>Bingo, <code>fallback_algorithm</code> appears in the stack:</p>

<pre>
main_function
<b>fallback_algorithm</b>
__CxxThrowException@8
_KiUserExceptionDispatcher@8
ExecuteHandler@20
ExecuteHandler2@20
___CxxFrameHandler
___InternalCxxFrameHandler
___CxxExceptionFilter
___CxxExceptionFilter
?_is_exception_typeof@@YAHABVtype_info@@PAU_EXCEPTION_POINTERS@@@Z
?_CallCatchBlock2@@YAPAXPAUEHRegistrationNode@@PBU_s_FuncInfo@@PAXHK@Z
</pre>

<p>Now we&#8217;ll have no problems finding the source of C++ exceptions!</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/04/who-threw-that-exception/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>IMVU&#8217;s CallStack API Now Open Source!</title>
		<link>http://chadaustin.me/2009/04/imvus-callstack-api-now-open-source/</link>
		<comments>http://chadaustin.me/2009/04/imvus-callstack-api-now-open-source/#comments</comments>
		<pubDate>Thu, 16 Apr 2009 04:07:18 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[crashes]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1318</guid>
		<description><![CDATA[I&#8217;m proud to announce that IMVU has open-sourced its C++ CallStack API!  It&#8217;s available under the MIT license at our SourceForge project.  You can view the code here.

CallStack is a simple API for recording and displaying C++ call stacks on 32-bit Windows.  To display the call stack at the current location:


printf("%s\n", CallStack::here().asString().c_str());


To [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m proud to announce that IMVU has open-sourced its C++ CallStack API!  It&#8217;s available under the MIT license at our <a href="http://sourceforge.net/projects/imvu">SourceForge project</a>.  You can view the code <a href="http://imvu.svn.sourceforge.net/viewvc/imvu/imvu_open_source/CallStack/">here</a>.</p>

<p>CallStack is a simple API for recording and displaying C++ call stacks on 32-bit Windows.  To display the call stack at the current location:</p>

<pre>
printf("%s\n", CallStack::here().asString().c_str());
</pre>

<p>To grab a CallStack from an arbitrary thread:</p>

<pre>
HANDLE other_thread_handle = ...;
CallStack other_thread(other_thread_handle);
</pre>

<p>
From a structured exception:
</p>

<pre>
Context ctx;
CallStack cs;
__try {
	// broken code
}
__except (
	ctx = *(GetExceptionInformation()-&gt;ContextRecord),
	cs.getFromContext(ctx),
	EXCEPTION_EXECUTE_HANDLER
) {
	// display cs.asString()
}
</pre>

<p>
At first, the format of <code>CallStack.asString()</code> is a bit confusing, but combined with your symbol server, it contains everything necessary to generate a symbolic call stack, including file names and line numbers.
</p>

<p>Here is an example <code>CallStack.asString()</code> result:</p>

<pre>
PYTHON25.DLL#b57f5c3ff1b64eda861d97643831ce701!000266dc
boost_python.dll#507f2f0a5fd34e65af25e728d0be9ebb1!0000d4bf
_avatarwindow.pyd#5289bbd0ff9c4ceab5198308f99ef9271!0002f76a
</pre>

<p>The lines are formatted <code>module_name#module_hash!offset</code>.  <code>module_name</code> is the name of the DLL or EXE in which the function lives.  <code>module_hash</code> is a unique hash that identifies a build of a particular module.  <code>offset</code> is the hexadecimal offset, in bytes, of the code address from the start of the module.  With this information, you can look up a function name and line number for each entry in a call stack.</p>
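<p>To make the format concrete, here is a minimal, portable C++ sketch that splits one line of <code>asString()</code> output into its three fields.  The <code>FrameEntry</code> struct and <code>parseFrameEntry</code> function are hypothetical helpers, not part of the CallStack API, and the sketch assumes the offset is always written in hexadecimal, as in the example above.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;cstdlib&gt;
#include &lt;optional&gt;
#include &lt;string&gt;

// Hypothetical helper type: one line of CallStack::asString() output.
struct FrameEntry {
    std::string moduleName;  // DLL or EXE containing the code
    std::string moduleHash;  // identifies one build of that module
    std::uint32_t offset;    // byte offset from the start of the module
};

// Splits "module_name#module_hash!offset" (offset in hex) into its fields.
// Returns std::nullopt when the line does not have that shape.
std::optional&lt;FrameEntry&gt; parseFrameEntry(const std::string&amp; line) {
    const auto hashPos = line.find('#');
    if (hashPos == std::string::npos || hashPos == 0) {
        return std::nullopt;  // no module name
    }
    const auto bangPos = line.find('!', hashPos + 1);
    if (bangPos == std::string::npos || bangPos == hashPos + 1 ||
        bangPos + 1 == line.size()) {
        return std::nullopt;  // missing hash or offset
    }
    const std::string offsetText = line.substr(bangPos + 1);
    char* end = nullptr;
    const unsigned long offset = std::strtoul(offsetText.c_str(), &amp;end, 16);
    if (*end != '\0') {
        return std::nullopt;  // offset contains non-hex characters
    }
    return FrameEntry{line.substr(0, hashPos),
                      line.substr(hashPos + 1, bangPos - hashPos - 1),
                      static_cast&lt;std::uint32_t&gt;(offset)};
}
</pre>

<p>Parsing the first line of the example output this way yields the module <code>PYTHON25.DLL</code>, its build hash, and offset <code>0x266dc</code>, which is exactly the tuple a symbol lookup needs.</p>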

<p>Fortunately, we have a tool that automates this process: <a href="http://imvu.svn.sourceforge.net/viewvc/imvu/imvu_open_source/tools/symbol_dump.py?view=markup">symbol_dump.py</a>!  Running it with the previous call stack on the clipboard produces this output:</p>

<pre>
PYTHON25.DLL#b57f5c3ff1b64eda861d97643831ce701!000266dc
	<strong>...t\python-2.5.1-src\objects\abstract.c (1860): PyObject_Call</strong>
boost_python.dll#507f2f0a5fd34e65af25e728d0be9ebb1!0000d4bf
	<strong>...0\libs\python\src\object\function.cpp ( 614): function_call</strong>
_avatarwindow.pyd#5289bbd0ff9c4ceab5198308f99ef9271!0002f76a
	<strong>...\boost\function\function_template.hpp ( 132): boost::detail::function::function_obj_invoker2&lt;boost::_bi::bind_t&lt;bool,boost::python::detail::translate_exception&lt;IMVUError,void (__cdecl*)(IMVUError const &amp;)&gt;,boost::_bi::list3&lt;boost::arg&lt;1&gt;,boost::arg&lt;2&gt;,boost::_bi::value&lt;void (__cdecl*)(IMVUError const</strong>
</pre>

<p>That last function name is pretty epic (as are most Boost or C++ function names), but notice that the call stack has accurate file names and line numbers.</p>

<p>The astute reader might ask &#8220;Don&#8217;t minidumps contain stack traces too?&#8221;  The answer is yes, but minidumps are often inconvenient.  Consider the common case:</p>

<ol>
<li>Open crash report</li>
<li>Download mini.dmp to the desktop</li>
<li>Open mini.dmp in Visual Studio</li>
<li>Press F11</li>
<li>Open the call stack debug window if it&#8217;s not open</li>
</ol>

<p>With CallStack, we can shorten that to:</p>

<ol>
<li>Open crash report</li>
<li>Copy the call stack</li>
<li>Run symbol_dump.py</li>
</ol>

<p>Also, for reasons I don&#8217;t understand, sometimes Visual Studio fails to produce an informative stack when CallStack succeeds.</p>

<p>CallStack is a handy tool for debugging crashes from the wild, and I&#8217;m happy that we were able to make it available.</p>

]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/04/imvus-callstack-api-now-open-source/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>IMVU Crash Reporting: Plugging the VC++ Runtime&#8217;s Escape Hatches</title>
		<link>http://chadaustin.me/2009/04/imvu-crash-reporting-plugging-the-vc-runtimes-escape-hatches/</link>
		<comments>http://chadaustin.me/2009/04/imvu-crash-reporting-plugging-the-vc-runtimes-escape-hatches/#comments</comments>
		<pubDate>Sun, 05 Apr 2009 23:10:41 +0000</pubDate>
		<dc:creator>Chad Austin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[crashes]]></category>
		<category><![CDATA[imvu]]></category>

		<guid isPermaLink="false">http://aegisknight.org/?p=1306</guid>
		<description><![CDATA[Modern software is built on top of piles of third-party components.  The holy grail of reusable code has manifested in the form of an open source ecosystem.  That said, in a startup, you rarely have time to audit the third-party components you use.  These third-party components might be ridiculous enough to call [...]]]></description>
			<content:encoded><![CDATA[<p>Modern software is built on top of piles of third-party components.  The holy grail of reusable code has manifested in the form of an open source ecosystem.  That said, in a startup, you rarely have time to audit the third-party components you use.  These third-party components might be ridiculous enough to call <code>abort()</code> on error.  It may sound scary, but with a fixed amount of work, you can turn these calls into reported structured exceptions.</p>

<p>Unfortunately, the Visual C++ Runtime has several code paths that abnormally terminate the process without running our crash handlers.  As a bonus, they usually include user-incomprehensible dialog boxes.  Let&#8217;s see:</p>

<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/k089yyh0(VS.80).aspx">abort()</a></li>
<li>&#8220;pure virtual&#8221; method calls</li>
<li>throwing an exception from a destructor during unwind</li>
<li>stack buffer overflow (with <a href="http://msdn.microsoft.com/en-us/library/8dbf701c(VS.80).aspx">/GS</a> enabled)</li>
<li>standard C++ library index/iterator error with <a href="http://msdn.microsoft.com/en-us/library/aa985965(VS.80).aspx">checked iterators</a> enabled</li>
</ul>

<p>Since you can never prove that you&#8217;ve covered every way a third-party component can bypass your crash reporting, I&#8217;ll just cover the ones we&#8217;ve handled:</p>

<h2>abort()</h2>

<div class="wp-caption aligncenter" style="width: 518px"><img alt="abort()" src="http://www.groovypost.com/wp-content/uploads/2009/03/image-thumb101.png" title="abort()" width="508" height="289" /><p class="wp-caption-text">Result of calling abort()</p></div>

<p>Turning <code>abort()</code> into a structured exception is pretty straightforward.  A quick read of the CRT source shows that <code>abort()</code> runs SIGABRT&#8217;s installed signal handler.  It&#8217;s easy enough to install a custom handler that raises a structured exception:</p>

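<pre>
#include &lt;signal.h&gt;   // signal, SIGABRT
#include &lt;windows.h&gt;  // RaiseException

// EXC_SIGNAL_ABORT is an application-defined structured exception code.

void __cdecl onSignalAbort(int code) {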
    // It's possible that this signal handler will get called twice
    // in a single execution of the application.  (On multiple threads,
    // for example.)  Since raise() resets the signal handler, put it back.
    signal(SIGABRT, onSignalAbort);

    RaiseException(EXC_SIGNAL_ABORT, 0, 0, 0);
}

...

// at program start:
signal(SIGABRT, onSignalAbort);
</pre>
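<p>Because <code>abort()</code> still terminates the process if the SIGABRT handler simply returns, the handler above raises a structured exception instead of returning.  To exercise just the re-installation logic, here is a portable sketch (a stand-in under stated assumptions, not IMVU&#8217;s code) that delivers SIGABRT with <code>std::raise</code>, which, unlike <code>abort()</code>, returns once the handler finishes.  <code>onSignalAbortSketch</code> and <code>deliverAbortSignals</code> are hypothetical names.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;csignal&gt;

// Counts handler invocations. Assigning to a volatile std::sig_atomic_t is
// the kind of work a signal handler is allowed to do.
volatile std::sig_atomic_t g_abortSignals = 0;

void onSignalAbortSketch(int sig) {
    // Re-install first: runtimes with System V signal semantics, including
    // the Visual C++ CRT, reset the handler to SIG_DFL before calling it.
    std::signal(sig, onSignalAbortSketch);
    g_abortSignals = g_abortSignals + 1;
}

// Installs the handler, delivers SIGABRT `times` times via std::raise
// (the same signal abort() raises), and reports how many were handled.
int deliverAbortSignals(int times) {
    g_abortSignals = 0;
    std::signal(SIGABRT, onSignalAbortSketch);
    for (int i = 0; i &lt; times; ++i) {
        std::raise(SIGABRT);
    }
    return g_abortSignals;
}
</pre>

<p>If the handler were not put back, the second delivery would hit the default action on a runtime that resets handlers, and the process would die instead of being counted.</p>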

<h2>&#8220;Pure Virtual&#8221; Method Calls</h2>

<div class="wp-caption aligncenter" style="width: 468px"><img alt="Pure virtual function call" src="http://www.microsoftvisualcruntimelibrary.com/wp-content/uploads/microsoft-visual-c-runtime-library-runtime-error-screenshot.gif" title="Pure virtual function call" width="458" height="227" /><p class="wp-caption-text">Pure virtual function call</p></div>

<p>Ever see a program fail with that useless &#8220;pure virtual function call&#8221; error message?  This happens when a base class&#8217;s constructor (directly or indirectly) calls a pure virtual method that only a derived class implements.  Since base class constructors run before derived class constructors, the object&#8217;s vtable pointer still refers to the base class&#8217;s vtable while the base constructor runs, and the compiler fills that vtable&#8217;s pure virtual slots with references to <code>_purecall</code>, a function normally defined by the CRT.  <code>_purecall()</code> aborts the process, sidestepping our crash reporting.  Code might better elucidate this situation:</p>

<pre>

struct Base;
void foo(Base* b);

struct Base {
    Base() {
        foo(this);
    }
    virtual void pure() = 0;
};
struct Derived : public Base {
    void pure() { }
};

void foo(Base* b) {
    b-&gt;pure();
}

Derived d; // boom
</pre>
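<p>You can observe the underlying rule without invoking undefined behavior by giving the base class an ordinary (non-pure) virtual method.  In this illustrative sketch, which mirrors the example above, the call made from <code>Base</code>&#8217;s constructor dispatches to <code>Base::name</code>, not <code>Derived::name</code>.  When the method is pure, that same dispatch lands on <code>_purecall</code> under MSVC.  The <code>whoAmI</code> helper and <code>nameSeenInConstructor</code> member are hypothetical.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;string&gt;

struct Base;
std::string whoAmI(const Base* b);

struct Base {
    std::string nameSeenInConstructor;
    Base() {
        // Indirect virtual call during construction, like the post's foo().
        nameSeenInConstructor = whoAmI(this);
    }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : public Base {
    std::string name() const override { return "Derived"; }
};

std::string whoAmI(const Base* b) {
    return b-&gt;name();  // dispatches through the object's current vtable
}
</pre>

<p>Constructing a <code>Derived</code> records <code>"Base"</code> in <code>nameSeenInConstructor</code>, while calling <code>name()</code> on the finished object returns <code>"Derived"</code>.</p>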

<p>The fix is simple: just define a <code>_purecall</code> that shadows the CRT implementation:</p>

<pre>
int __cdecl _purecall() {
    RaiseException(EXC_PURE_CALL, 0, 0, 0);
    return 0;
}
</pre>

<h2>Throwing an Exception from a Destructor During Unwind</h2>

<p>C++ is aggressive about making sure you don&#8217;t throw an exception while another exception is in the air.  If you do, its default behavior is to terminate your process.  From <a href="http://msdn.microsoft.com/en-us/library/6dekhbbc(VS.80).aspx">MSDN</a>: <em>If a matching handler is still not found, or if an exception occurs while unwinding, but before the handler gets control, the predefined run-time function <b>terminate</b> is called. If an exception occurs after throwing the exception, but before the unwind begins, <b>terminate</b> is called.</em></p>
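<p>The &#8220;exception in the air&#8221; state is visible from C++ itself.  In this portable C++17 sketch (the <code>UnwindProbe</code> type and both functions are hypothetical), a destructor records <code>std::uncaught_exceptions()</code>.  It sees one exception in flight when it runs during unwinding and none when it runs normally.  A destructor that threw in the first case would end up in <code>terminate()</code>.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;exception&gt;
#include &lt;stdexcept&gt;

// Records how many exceptions were in flight when its destructor ran.
struct UnwindProbe {
    int&amp; inFlight;
    explicit UnwindProbe(int&amp; out) : inFlight(out) {}
    ~UnwindProbe() { inFlight = std::uncaught_exceptions(); }
};

int exceptionsInFlightDuringUnwind() {
    int inFlight = -1;
    try {
        UnwindProbe probe(inFlight);
        throw std::runtime_error("boom");
    } catch (const std::runtime_error&amp;) {
        // probe's destructor already ran during unwinding, before this handler.
    }
    return inFlight;
}

int exceptionsInFlightNormally() {
    int inFlight = -1;
    {
        UnwindProbe probe(inFlight);
    }  // ordinary scope exit: nothing is being thrown
    return inFlight;
}
</pre>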

<p>To convert calls to <code>terminate()</code> (and <code>unexpected()</code>, for completeness) into structured exceptions, override the terminate handler with <a href="http://msdn.microsoft.com/en-us/library/ycf93beb(VS.80).aspx">set_terminate</a> (and <a href="http://msdn.microsoft.com/en-us/library/7twc8dwy(VS.80).aspx">set_unexpected</a>):</p>

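<pre>
#include &lt;eh.h&gt;       // set_terminate, set_unexpected
#include &lt;windows.h&gt;  // RaiseException

void onTerminate() {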
    RaiseException(EXC_TERMINATE, 0, 0, 0);
}
void onUnexpected() {
    RaiseException(EXC_UNEXPECTED, 0, 0, 0);
}

// at program start:
set_terminate(onTerminate);
set_unexpected(onUnexpected);
</pre>
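<p>A terminate handler must not return to its caller.  Here is a portable sketch of the same idea using <code>std::set_terminate</code> and <code>std::get_terminate</code> (C++11), with a hypothetical logging step standing in for <code>RaiseException</code>.  It logs, defers to whatever handler was installed before it, and calls <code>std::abort()</code> as a last resort so it never returns.  It omits <code>set_unexpected</code>, which was deprecated in C++11 and removed in C++17.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;exception&gt;

// Handler that was installed before ours (typically the runtime default).
std::terminate_handler g_previousTerminate = nullptr;

// Hypothetical stand-in for the post's onTerminate(). "Reporting" here is a
// message on stderr; a real client would hand off to its crash reporter.
[[noreturn]] void onTerminateSketch() {
    std::fputs("terminate() called; reporting crash\n", stderr);
    if (g_previousTerminate != nullptr) {
        g_previousTerminate();  // should not return either
    }
    std::abort();  // guarantee we never return to the caller
}

void installTerminateHandler() {
    const std::terminate_handler previous = std::set_terminate(onTerminateSketch);
    // Avoid chaining to ourselves if installation runs twice.
    if (previous != onTerminateSketch) {
        g_previousTerminate = previous;
    }
}
</pre>

<p>Keeping the previous handler means a library that installed its own terminate hook still gets a chance to run after yours.</p>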

<h2>Standard C++ Library Index/Iterator Error with Checked Iterators Enabled</h2>

<p>The IMVU client is compiled with <code><a href="http://msdn.microsoft.com/en-us/library/aa985965(VS.80).aspx">_SECURE_SCL</a></code> enabled.  Knowing exactly where failures occur is worth more to us than the very minor performance cost of validating every iterator access.</p>

<p>There are two ways to convert invalid iterator uses into reported exceptions.  The easiest is compiling with <code>_SECURE_SCL_THROWS=1</code>.  Otherwise, just install your own invalid_parameter handler with <code><a href="http://msdn.microsoft.com/en-us/library/a9yf33zb.aspx">_set_invalid_parameter_handler</a></code>.</p>

<h2>Stack Buffer Overflow (with /GS enabled)</h2>

<p>By default, Visual C++ generates code that detects and reports stack buffer overruns.  This mitigates a common class of application security holes.  Unfortunately, the stock implementation of this feature does not allow you to install your own handler, so buffer overruns never reach your crash reporter.</p>

<p>Again, we can shadow a CRT function to handle these failures.  From <code>C:\Program Files\Microsoft Visual Studio 8\VC\crt\src\gs_report.c</code>, copy the <code>__report_gsfailure</code> function into your application.  (You did install the CRT source, didn&#8217;t you?)  Instead of calling <code>UnhandledExceptionFilter</code> at the bottom of <code>__report_gsfailure</code>, call your own last-chance handler or write a minidump.</p>

<h2>Testing Crash Reporting</h2>

<p>Writing tests for the above reporting mechanisms is super fun.  Take everything you&#8217;re told <em>not</em> to do and implement it.  I recommend adding these crashes to your UI somewhere so you can directly observe what happens when they occur in your application.  Here is our crash menu:</p>

<div id="attachment_1308" class="wp-caption aligncenter" style="width: 645px"><img src="http://aegisknight.org/wp-uploads/crashmenu.png" alt="Crash Menu" title="Crash Menu" width="635" height="947" class="size-full wp-image-1308" /><p class="wp-caption-text">Crash Menu</p></div>
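<p>One way to wire up such a menu is a table of labeled trigger functions.  This is a minimal sketch in portable C++, not our actual menu code.  The <code>CrashTrigger</code> type, the functions in <code>crash_tests</code>, and the labels are hypothetical, and each trigger deliberately does one of the things listed above, so only expose them in builds meant for testing crash reporting.</p>

<pre>
#include &lt;cassert&gt;
#include &lt;cstdlib&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

namespace crash_tests {

// abort(): should arrive at the SIGABRT handler.
void callAbort() { std::abort(); }

// Pure virtual call made (indirectly) from a base class constructor.
struct PureBase;
void callPure(PureBase* b);
struct PureBase {
    PureBase() { callPure(this); }
    virtual ~PureBase() = default;
    virtual void pure() = 0;
};
struct PureDerived : PureBase {
    void pure() override {}
};
void callPure(PureBase* b) { b-&gt;pure(); }
void pureVirtualCall() { PureDerived d; }

// Throwing from a destructor while another exception unwinds the stack.
struct ThrowsInDestructor {
    ~ThrowsInDestructor() noexcept(false) {
        throw std::runtime_error("thrown from destructor");
    }
};
void throwDuringUnwind() {
    ThrowsInDestructor t;
    throw std::runtime_error("first exception");  // ~t throws: terminate()
}

}  // namespace crash_tests

// Hypothetical menu entry: a label for the UI and the function to run.
struct CrashTrigger {
    std::string label;
    void (*trigger)();
};

std::vector&lt;CrashTrigger&gt; buildCrashMenu() {
    return {
        {"abort()", crash_tests::callAbort},
        {"Pure virtual call", crash_tests::pureVirtualCall},
        {"Throw during unwind", crash_tests::throwDuringUnwind},
    };
}
</pre>

<p>A table like this keeps the UI code trivial: each menu item shows <code>label</code> and calls <code>trigger</code>, and adding a new failure mode is one line.</p>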

<p>These types of failures are rare, but when we hit them, I&#8217;m glad we implemented this reporting.</p>
]]></content:encoded>
			<wfw:commentRss>http://chadaustin.me/2009/04/imvu-crash-reporting-plugging-the-vc-runtimes-escape-hatches/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
