Thinking About Performance - Notes
I gave an internal presentation at Dropbox (sorry, the video can't be shared publicly) about engineering software for performance. Here are the resources that went into the presentation.
Similar Presentations
- Professor Bill Gropp
- I wrote a related article before joining Dropbox
- Rico Mariani (Microsoft)'s performance blog
- Typical C++ Bullshit - A hilarious presentation by Mike Acton (Insomniac Games) about how OOP and C++ are not a good fit for high-performance code.
- Dan Luu
The Economics of Performance
- The Economic Value of Rapid Response Time
- Latency is Everywhere and it Costs You Sales - How to Crush it - many supporting links in here
Humans
Vision
Touch
- Tactile vibration
- Human sensor for light touch (wikipedia)
- Human tactile detection thresholds: modification by inputs from specific tactile receptor classes
Reaction Times
Response Time
- System Response Time and User Satisfaction: An Experimental Study of Browser-based Applications
- Response Time and Display Rate in Human Performance with Computers
- Response Times: The 3 Important Limits
- Typing with Pleasure
Perception of Time
- Actual Performance, Perceived Performance
- Mental chronometry
- Manipulating perceived duration in progress bars
Computers
Latency
- Byuu's awesome discussion of latency in emulators
- Operation Costs in CPU Clock Cycles
- Geostationary satellite latency
- Latency and the Quest for Interactivity
- Latency numbers every programmer should know
Examples of High-Performance Code
- SpookyHash - V2 inner hash body
- Cal3D skeletal animation skinning loop - fits in registers
- CellPerformance
CPU architecture
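In the talk I intentionally left out some detail: technically, the branch predictor and the branch target predictor are different things. The former guesses whether a conditional branch will be taken; the latter guesses the address a taken branch will jump to.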
Agner Fog has amazing resources for CPU optimization, including his famous x86 instruction tables. It's helpful to scan the latencies and reciprocal throughputs of common instructions.
- Instruction tables
- Microarchitecture guide contains details on branch predictors, memory access, caching, the instruction decoder, etc.
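To make the latency versus reciprocal throughput distinction concrete, here is a minimal sketch (my own illustration, not taken from Agner Fog's material). The function names `sum_single_chain` and `sum_four_chains` are hypothetical. The first loop is one long dependency chain, so each add has to wait for the previous one and the loop tends to be bound by add latency. The second keeps four independent accumulators, which gives the CPU a chance to overlap adds and move closer to the throughput limit. Whether this helps in practice depends on the compiler and the hardware, so measure before relying on it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: every add depends on the previous result,
// so the loop forms a single serial dependency chain.
double sum_single_chain(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v) {
        acc += x;
    }
    return acc;
}

// Hypothetical helper: four independent accumulators form four
// shorter chains that an out-of-order core can overlap.
double sum_four_chains(const std::vector<double>& v) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Note that the two functions add the values in a different order, so for general floating-point input the results can differ in the last bits. That reordering is also why a compiler will not usually make this transformation on its own without relaxed floating-point flags.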
Haswell
- Architecture details
- Anandtech 1
- Anandtech 2
- Architecture diagram (and other things)
- Wikipedia entry
- realworldtech.com overview
Apple A9
Caches, Memory, and Atomics
- rygorous's excellent cache coherency primer
- rygorous's explanation of the workings and costs of atomics
- Evaluating the cost of atomic operations on modern architectures
- Atomics: Odds & Ends
- What is the true cost/performance of atomic operations?
- Nonblocking algorithms and scalable multicore programming
- CAS latency
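As a rough illustration of why the cost of atomics matters, here is a small sketch (my own, not from any of the links above). The function names `count_contended` and `count_batched` are hypothetical. In the first, every thread hits the same atomic on every iteration, so the cache line holding it keeps moving between cores. In the second, each thread accumulates into a local variable and publishes once. Both return the same total; only the amount of cross-core traffic differs.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: all threads increment one shared atomic on
// every iteration, so they contend for the same cache line.
std::uint64_t count_contended(unsigned num_threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total.load();
}

// Hypothetical helper: each thread counts privately and performs a
// single atomic add at the end.
std::uint64_t count_batched(unsigned num_threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters] {
            std::uint64_t local = 0;
            // An optimizer may fold this loop into local = iters;
            // real work would go here instead.
            for (std::uint64_t i = 0; i < iters; ++i) {
                ++local;
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total.load();
}
```

Timing the two with a few threads and a large iteration count should show the difference, though the size of the gap depends heavily on the machine.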
Memory Bandwidth
Branch Prediction
- Fast and slow if-statements: branch prediction in modern processors
- Branch prediction class notes from UWash
- Great example of improving speed by sorting the data
- Binary Search eliminates Branch Mispredictions by Paul Khuong
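Two of the ideas in the links above can be sketched in a few lines. This is my own illustration under stated assumptions, not code from the linked posts, and all function names are hypothetical. The first pair shows a data-dependent branch (cheap when the data is sorted and the branch is predictable, expensive on random data) next to a version that replaces the branch with a mask. The third function is a binary search written so the only data-dependent decision is a select, which compilers commonly turn into a conditional move, though that is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: the if is hard to predict on random data and
// easy to predict once the data is sorted.
std::int64_t sum_at_least_branchy(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x >= threshold) {
            sum += x;
        }
    }
    return sum;
}

// Hypothetical helper: same result without a data-dependent branch.
std::int64_t sum_at_least_branchless(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        // mask is all ones when x >= threshold, all zeros otherwise.
        const std::int64_t mask = -static_cast<std::int64_t>(x >= threshold);
        sum += mask & static_cast<std::int64_t>(x);
    }
    return sum;
}

// Hypothetical helper: index of the first element >= key in a sorted
// array, or n if there is none. The loop trip count depends only on n.
std::size_t lower_bound_branchless(const int* data, std::size_t n, int key) {
    const int* base = data;
    std::size_t len = n;
    while (len > 1) {
        const std::size_t half = len / 2;
        // A select rather than an if/else; often compiled to cmov.
        base = (base[half - 1] < key) ? base + half : base;
        len -= half;
    }
    if (len == 1 && *base < key) {
        ++base;
    }
    return static_cast<std::size_t>(base - data);
}
```

An optimizing compiler may already convert the branchy sum into branch-free or vectorized code, so check the generated assembly and benchmark before assuming either version is faster.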