Since joining IMVU, I have had two people tell me "Profilers (especially VTune) suck. They don't help you optimize anything." That didn't make sense to me: how could a tool that gives such amazingly detailed data about the execution of your program be useless?

Now, these are two very smart people, so I knew there had to be some truth in what they were saying. I just had to look for it. At least a year later, after I'd dramatically improved the performance of some key subsystems in the IMVU client, I realized what they meant. They should have said: "Don't just run a profiler to find out which functions are taking the most time, and make them execute faster. Take a global approach to optimization. Prevent those functions from being called at all."

There you have it: take a global approach to optimization. But how does that work? First, let me ramble a bit about the benefits of performance.

There are two types of features:

  1. interactive
  2. not interactive (i.e. slow)

Searching on Google, opening a Word document, and firing a gun in Team Fortress 2 are all interactive.

Compressing large files, factoring large integers, and downloading HD movies are not.

We wish all features were interactive, but computers can't do everything instantly. Sometimes, however, a feature switches from non-interactive to interactive, to dramatic effect. Remember way back? Before YouTube? Downloading a video took forever, and the file probably wouldn't even play. YouTube made video consumption so fast and so easy that it changed the shape of the internet. Similarly, thanks to Google, it's faster to search the internet for something than it is to search your own hard drive in Windows XP.

Anyway, if you truly want to make something as fast as it can be, you need to think like this:

  • What's the starting state A?
  • What's the ending state B?
  • What's the minimal set of operations to get from A to B, and how do I execute them as fast as possible?

Optimizing your existing, naive code won't get you there. You'll have to build your application around these goals. There's plenty of room for out-of-the-box thinking here. Take Mac OS X's resume-from-hibernation feature:

  • The starting state: the computer is off and its memory is saved to disk.
  • The ending state: the computer is on and the user is productive.

Mac OS X takes advantage of the fact that this is not purely a technology problem. The user has to remember what they were doing and become reattached to the computer. Thus, it shows you a copy of what was last on your screen, reminding you what you were doing while the computer prepares for your actions. Opportunities for this kind of parallelism abound: why is it that operating system installers ask you questions, download packages, and install them serially? There is dramatic room for improvement there.
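To make the installer complaint concrete, here is a minimal Python sketch of overlapping the two kinds of waiting. Everything in it is made up for illustration: `download_packages`, `ask_questions`, and the `time.sleep` delays are hypothetical stand-ins for real network transfers and a real user, not any actual installer's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_packages(names, delay=0.05):
    """Pretend to download each package; the sleep stands in for network time."""
    for _ in names:
        time.sleep(delay)
    return list(names)


def ask_questions(questions, delay=0.05):
    """Pretend the user needs `delay` seconds to answer each question."""
    answers = {}
    for question in questions:
        time.sleep(delay)
        answers[question] = "default"
    return answers


def install_serially(names, questions):
    # Total time is roughly (time spent answering) + (time spent downloading).
    answers = ask_questions(questions)
    packages = download_packages(names)
    return answers, packages


def install_overlapped(names, questions):
    # Start the download in a background thread, then ask questions while
    # it runs. Total time is roughly the longer of the two activities.
    with ThreadPoolExecutor(max_workers=1) as pool:
        download = pool.submit(download_packages, names)
        answers = ask_questions(questions)
        packages = download.result()
    return answers, packages
```

Both functions return the same answers and packages; the overlapped version just stops making the user and the network wait on each other. A real installer would also have to cope with answers that change which packages are needed, which this sketch ignores.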

I don't claim that IMVU's website is the fastest website out there, but here's an example of a type of optimization that takes the whole picture into account: when you start loading a page on IMVU.com, it optimistically fetches hundreds of keys from our memcache servers, before even looking up your customer information. It's probable that you'll need many of those keys, and it's faster to get them all at once than to fetch them as you need them.
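Here's a rough Python sketch of why the batch fetch wins. It is not IMVU's code: `FakeCache` is a hypothetical in-memory stand-in for a memcache client that just counts round trips, and the two `load_page_*` functions are names I invented for the comparison.

```python
class FakeCache:
    """Stand-in for a memcache client; counts simulated network round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One round trip per key.
        self.round_trips += 1
        return self._data.get(key)

    def get_multi(self, keys):
        # One round trip no matter how many keys are requested.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def load_page_lazily(cache, needed_keys):
    """Fetch each key at the moment the page needs it."""
    return {k: cache.get(k) for k in needed_keys}


def load_page_optimistically(cache, likely_keys, needed_keys):
    """Fetch everything we will probably need up front, in one batch."""
    prefetched = cache.get_multi(likely_keys)
    page = {}
    for k in needed_keys:
        if k in prefetched:
            page[k] = prefetched[k]
        else:
            # Not in the prefetched batch: fall back to a single fetch.
            page[k] = cache.get(k)
    return page
```

With 300 keys in the cache and a page that ends up needing 100 of them, the lazy version pays for 100 round trips and the optimistic version pays for one. The optimistic version transfers some values it never uses, which is the trade: a bit of wasted bandwidth in exchange for far less waiting on the network.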

Someday, I hope to apply this global optimization approach to a software build system (a la Make, SCons, MSBuild). It's insane that we don't have a build system with all of the flexibility of SCons and instantaneous performance. Sure, the first build may need to scan for dependencies, but there's no reason that a second build couldn't reuse the information from the first build and start instantaneously. Just have a server process that watches for changes to files and updates the internal dependency graph. On large projects, I've seen SCons take minutes to figure out that it has to rebuild one file, which is simply crazy.
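As a sketch of what that server process might keep in memory, here is a small Python dependency graph that is updated as change notifications arrive. This is only an illustration under my own assumptions: the `DependencyGraph` class and its methods are hypothetical, the file watcher that would call `file_changed` is left out, and none of this reflects how SCons or any other existing build tool works.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """In-memory dependency graph that stays alive between builds."""

    def __init__(self):
        # Maps a file to the targets that directly depend on it.
        self._dependents = defaultdict(set)
        self._dirty = set()

    def add_dependency(self, target, source):
        """Record that `target` must be rebuilt whenever `source` changes."""
        self._dependents[source].add(target)

    def file_changed(self, path):
        """Called by a (hypothetical) file watcher when `path` is modified.

        Marks every direct and transitive dependent as dirty right away,
        so nothing has to be scanned when the next build is requested.
        """
        queue = deque([path])
        seen = {path}
        while queue:
            node = queue.popleft()
            for target in self._dependents.get(node, ()):
                if target not in seen:
                    seen.add(target)
                    self._dirty.add(target)
                    queue.append(target)

    def targets_to_rebuild(self):
        """Return the dirty targets (sorted by name) and reset the dirty set."""
        dirty = sorted(self._dirty)
        self._dirty.clear()
        return dirty
```

The point is that `targets_to_rebuild` does no file system work at all: the cost of figuring out what changed was paid incrementally, when each change happened. A real system would still need to order the dirty targets by dependency before building them and to handle changes to the dependency graph itself.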

When optimizing a feature, take the user into consideration, and write down the minimum set of steps between the starting state and ending state. Execute those steps as fast as you can, and run them in parallel if it helps. If technology has advanced enough, maybe you have just transformed something non-interactive into an interactive feature.