Reporting Crashes in IMVU: C++ Call Stacks

Last time, we talked about including contextual information to help us
actually fix crashes that happen in the field. Minidumps are a great
way to easily save a snapshot of the most important parts of a running
(or crashed) process, but it’s often useful to understand the
low-level mechanics of a C++ call stack (on x86). Given some basic
principles about function calls, we will derive the implementation
of code to walk a call stack.

C++ function call stack entries are stored on the x86 stack, which
grows downward in memory. That is, pushing on the stack subtracts
from the stack pointer. The ESP register points to the
most-recently-written item on the stack; thus, push eax
is equivalent to:

sub esp, 4
mov [esp], eax

Let’s say we’re calling a function:

int __stdcall foo(int x, int y)

The __stdcall
calling convention pushes arguments onto the stack from right to left
and returns the result in the EAX register, so calling
foo(1, 2) generates this code:

push 2
push 1
call foo
; result in eax

If you aren’t familiar with assembly, I know this is a lot to absorb,
but bear with me; we’re almost there. We haven’t seen the
call instruction before. It pushes the EIP
register (which holds the return address of the called function) onto
the stack and then jumps to the target function.
If we didn’t store the instruction pointer, the called function would
not know where to return when it was done.

The final piece of information we need to construct a C++ call stack is
that functions live in memory, functions have names, and thus sections
of memory have names. If we can get access to a mapping of memory
addresses to function names (say, with the /MAP
linker option
), and we can read instruction pointers up the call
stack, we can generate a symbolic stack trace.

How do we read the instruction pointers up the call stack?
Unfortunately, just knowing the return address from the current
function is not enough. How do you know the location of the caller’s
caller? Without extra information, you don’t. Fortunately, most
functions have that information in the form of a function prologue:

push ebp
mov ebp, esp

and epilogue:

mov esp, ebp
pop ebp

These bits of code appear at the beginning and end of every function, letting you
treat the EBP register as a pointer to the current stack frame.
Function arguments are always accessed at positive offsets from EBP,
and locals at negative offsets:

; int foo(int x, int y)
; ...
[EBP+12] = y argument
[EBP+8]  = x argument
[EBP+4]  = return address (set by call instruction)
[EBP]    = previous stack frame
[EBP-4]  = local variable 1
[EBP-8]  = local variable 2
; ...

Look! For any stack frame EBP, the caller’s address is
at [EBP+4] and the previous stack frame is at [EBP].
By dereferencing EBP, we can walk
the call stack, all the way to the top!

struct stack_frame {
    stack_frame*  previous;
    unsigned long return_address;
};

std::vector<unsigned long> get_call_stack() {
    std::vector<unsigned long> call_stack;

    stack_frame* current_frame;
    __asm mov current_frame, ebp

    while (!IsBadReadPtr(current_frame, sizeof(stack_frame))) {
        call_stack.push_back(current_frame->return_address);
        current_frame = current_frame->previous;
    }
    return call_stack;
}

// Convert the array of addresses to names with the aforementioned MAP file.

Yay, now we know how to grab a stack trace from any location in the
code. This implementation is not robust, but the concepts are
correct: functions have names, functions live in memory, and we can
determine which memory addresses are on the call stack. Now that you
know how to manually grab a call stack, let Microsoft do the heavy
lifting with the StackWalk64 function.

Next time, we’ll talk about setting up your very own Microsoft Symbol Server so you can
grab accurate function names from every version of your software.

Reporting Crashes in IMVU: Call Stacks and Minidumps

So far, we’ve implemented reporting for Python exceptions that bubble
out of the main loop
, C++ exceptions that bubble into Python (and then
out of the main loop), and structured exceptions that bubble into
Python (and then out of the main loop). This is a fairly
comprehensive set of failure conditions, but there’s still a big piece
missing from our reporting.

Imagine that you implement this error reporting and have customers try
the new version of your software. You’ll soon have a collection of
crash reports, and one thing will stand out clearly. Without the
context in which crashes happened (call stacks, variable values,
perhaps log files), it’s very hard to determine their cause(s). And
without determining their cause(s), it’s very hard to fix them.

Reporting log files is easy enough. Just attach them to the error
report. You may need to deal with privacy concerns or limit the size
of the log files that get uploaded, but those are straightforward
problems to solve.

Because Python has batteries included, grabbing the call stack from a
Python exception is trivial. Just take a quick look at the traceback
module.

Structured exceptions are a little harder. The structure of a call
stack on x86 is machine- and sometimes compiler-dependent.
Fortunately, Microsoft provides an API to dump the relevant process
state to a file such that it can be opened in Visual Studio
or WinDbg,
which will let you view the stack trace and select other data. These
files are called minidumps, and they’re pretty small. Just call
MiniDumpWriteDump with the context of the exception and submit the
generated file with your crash report.
Grabbing a call stack from C++ exceptions is even harder, and maybe
not desired. If you regularly use C++ exceptions for communicating
errors from C++ to Python, it’s probably too expensive to grab a call
stack or write a minidump every single time. However, if you want to
do it anyway, here’s one way.

C++ exceptions are implemented on top of the Windows kernel’s
structured exception machinery. Using the try and
catch statements in your C++ code causes the compiler to
generate SEH code behind the scenes. However, by the time your C++
catch clauses run, the stack has already been unwound. Remember
that SEH has three passes: first it runs filter expressions until it
finds one that can handle the exception; then it unwinds the stack
(destroying any objects allocated on the stack); finally it runs the
actual exception handler. Your C++ exception handler runs as the last stage,
which means the stack has already been unwound, which means you can’t
get an accurate call stack from the exception handler. However, we
can use SEH to grab a call stack at the point where the exception was
thrown, before we handle it…

First, let’s determine the SEH exception code of C++ exceptions
(WARNING, this code is compiler-dependent):

int main() {
    DWORD code;
    __try {
        throw std::exception();
    }
    __except (code = GetExceptionCode(), EXCEPTION_EXECUTE_HANDLER) {
        printf("%X\n", code);
    }
}

Once we have that, we can write our exception-catching function like
this:

// the code printed by the experiment above (0xE06D7363 for Visual C++)
const DWORD CPP_EXCEPTION_CODE = 0xE06D7363;

void throw_cpp_exception() {
    throw std::runtime_error("hi");
}

bool writeMiniDump(const EXCEPTION_POINTERS* ep) {
    // ...
    return true;
}

void catch_seh_exception() {
    __try {
        throw_cpp_exception();
    }
    __except (
        (CPP_EXCEPTION_CODE == GetExceptionCode()) && writeMiniDump(GetExceptionInformation()),
        EXCEPTION_CONTINUE_SEARCH
    ) {
        // never runs: the filter always continues the search,
        // so the C++ catch below still sees the exception
    }
}

int main() {
    try {
        catch_seh_exception();
    }
    catch (const std::exception& e) {
        printf("%s\n", e.what());
    }
}

Now we’ve got call stacks and program state for C++, SEH, and Python
exceptions, which makes fixing reported crashes dramatically easier.

Next time I’ll go into more detail about how C++ stack traces work,
and we’ll see if we can grab them more efficiently.

Reporting Crashes in IMVU: Structured Exceptions

Previously, we discussed the implementation of automated reporting of
unhandled C++ exceptions
. However, if you’ve ever programmed in C++,
you know that C++ exceptions are not the only way your code can fail.
In fact, the most common failures probably aren’t C++ exceptions at
all. You know what I’m referring to: the dreaded access violation
(sometimes called segmentation fault).

Access Violation

How do we detect and report access violations? First, let’s talk
about what an access violation actually is.

Your processor has a mechanism for detecting loads and stores from
invalid memory addresses. When this happens, it raises an interrupt,
which Windows exposes to the program via Structured Exception Handling
(SEH). Matt Pietrek has written an excellent article on how
SEH works
, including a description of C++ exceptions implemented
on top of SEH. The gist is that there is a linked list of stack
frames that can possibly handle the exception. When an exception
occurs, that list is walked, and if an entry claims it can handle it,
it does. Otherwise, if no entry can handle the exception, the program
is halted and the familiar crash dialog box is displayed to the user.

OK, so access violations can be detected with SEH. In fact, with the
same mechanism, we can detect all other types of structured
exceptions, including division by zero and stack overflow. What does
the code look like? It’s approximately:

bool handle_exception_impl_seh(function f) {
    __try {
        // This is the previously-described C++ exception handler.
        // For various reasons, they need to be in different functions.
        // C++ exceptions are implemented in terms of SEH, so the C++
        // exception handling must be deeper in the call stack than
        // the structured exception handling.
        return handle_exception_impl_cpp(f);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        // catch all structured exceptions here
        PyErr_SetString(PyExc_RuntimeError, "Structured exception in C++ function");
        return true; // an error occurred
    }
}

Note the __try and __except keywords. This means we’re using
structured exception handling, not C++ exception handling. The filter
expression in the __except statement evaluates to
EXCEPTION_EXECUTE_HANDLER, indicating that we always want to handle
structured exceptions. From the filter expression, you can optionally
use the GetExceptionCode
and GetExceptionInformation
intrinsics to access information about the actual error.

Now, if you write some code like:

Object* o = 0;
o->method(); // oops!

The error will be converted to a Python exception, and reported
with our existing mechanism. Good enough for now! However, there are
real problems with this approach. Can you think of them?

Soon, I’ll show the full implementation of the structured
exception handler.

Reporting Crashes in IMVU: Part II: C++ Exceptions

A year ago, I explained
how the IMVU client automatically reports unexpected Python exceptions
(crashes) to us. I intended that post to be the first of a long
series that covered all of the tricks we use to detect and report
abnormal situations. Clearly, my intentions have not played out yet,
so I am going to pick up that series by describing how we catch
exceptions that occur in our C++ code. Without further ado:

Reporting C++ Exceptions

As discussed earlier, IMVU’s error handling system can handle any
Python exception that bubbles out of the client’s main loop and
automatically report the failure back to us so that we can fix it for
the next release. However, our application is a
combination of Python and C++, so what happens if our C++ code has a
bug and raises an uncaught C++ exception, such as std::bad_alloc
or std::out_of_range?

Most of our C++ code is exposed to Python via the excellent
Boost.Python library, which automatically catches C++ exceptions at
the boundary and translates them to Python exceptions. The
translation layer looks something like this:

bool handle_errors(function fn) {
    try {
        fn();
        return false; // no error
    }
    catch (const std::runtime_error& e) {
        // raise RuntimeError into Python
        PyErr_SetString(PyExc_RuntimeError, e.what());
    }
    catch (const std::bad_alloc&) {
        // raise MemoryError into Python
        PyErr_SetString(PyExc_MemoryError, "out of memory");
    }
    catch (const std::exception& e) {
        // raise Exception into Python
        PyErr_SetString(PyExc_Exception, e.what());
    }
    catch (...) {
        PyErr_SetString(PyExc_Exception, "Unknown C++ exception");
    }
    return true;
}

Thus, any C++ exception that’s thrown by the C++ function is
caught by Boost.Python and reraised as the appropriate Python
exception, which will already be handled by the previously-discussed
crash reporting system.

Let’s take another look at the client’s main loop:

def mainLoop():
    while running:
        updateAnimations()
        redrawWindows()

def main():
    try:
        mainLoop()
    except:
        # includes exception type, exception value, and python stack trace
        error_information = sys.exc_info()
        if OK == askUserForPermission():
            submitErrorReport(error_information)  # name illustrative

If the C++ functions called from updateAnimations() or redrawWindows()
raise a C++ exception, it will be caught by the Python error-handling
code and reported to us the same way Python
exceptions are.

Great! But is this a complete solution to the problem? Exercise
for the reader: what else could go wrong here? (Hint: we use Visual
Studio 2005 and there was a bug in catch (…) that Microsoft fixed in
Visual Studio 2008…)

Evaluating JavaScript in an Embedded XULRunner/Gecko Window

I intended to write something with more substance tonight, but I’m
exhausted from wrasslin’ with Gecko/XULRunner/SpiderMonkey in a
days-long marathon debugging session. None of you will understand
this entry, because its intent is to contain enough keywords and
content that others don’t have to go through the pain that I did.

If you’re embedding Gecko/XULRunner/SpiderMonkey into your
application, and you want to evaluate some JavaScript in the context
of an nsIDOMWindow or nsIWebBrowser, you’d think you’d have many
approaches. You could call JS_EvaluateScript or JS_EvaluateUCScript
directly, getting the JSContext from the nsIScriptContext and the
JSObject* global from the nsIScriptGlobalObject… However, I simply
could not get this to work: I kept running into crazy errors inside of
JS_InitArrayClass. I still don’t understand those errors.

People suggested using EvaluateString and EvaluateStringWithValue on
nsIScriptContext, but that failed in an empty window (I define empty
as not having called nsIWebNavigation::LoadURI) because it did not
have a security principal (nsIPrincipal). Eventually I learned that
you can grab the system principal from the nsIScriptSecurityManager
service and pass that directly to EvaluateStringWithValue. With a few
more minor details, this approach worked in all cases that we care
about so far!

Here is the final magic incantation:

typedef std::map<jsval, boost::python::object> ReferenceMap;

boost::python::object GeckoWindow::evalJavaScript(const std::wstring& js) {
    nsresult rv;

    nsCOMPtr<nsIPrincipal> principal;
    nsCOMPtr<nsIScriptSecurityManager> secMan = do_GetService(
        NS_SCRIPTSECURITYMANAGER_CONTRACTID, &rv);
    rv = secMan->GetSystemPrincipal(getter_AddRefs(principal));
    if (NS_FAILED(rv)) {
        throw GeckoError("Failed to get system principal");
    }

    nsCOMPtr<nsIScriptGlobalObject> sgo = do_GetInterface(webBrowser);
    nsCOMPtr<nsIScriptContext> ctx = sgo->GetContext();

    JSContext* cx = reinterpret_cast<JSContext*>(ctx->GetNativeContext());
    uint32 previous = JS_SetOptions(cx, /* ... */);

    jsval out;
    rv = ctx->EvaluateStringWithValue(
        nsString(js.c_str(), js.size()),
        /* ... */);

    JS_SetOptions(cx, previous);

    JSAutoRequest ar(cx);
    JSAutoLocalRootScope alrs(cx);

    maybeThrowPythonExceptionFromJsContext(cx);

    if (NS_SUCCEEDED(rv)) {
        ReferenceMap references;
        return buildPythonObjectFromJsval(references, cx, out);
    } else {
        throw GeckoEvalUnknownError("eval failed with no exception set");
    }
}

void GeckoWindow::maybeThrowPythonExceptionFromJsContext(JSContext* cx) {
    jsval exception;
    if (JS_GetPendingException(cx, &exception)) {
        ReferenceMap references;
        boost::python::object py_exc_value(buildPythonObjectFromJsval(
            references, cx, exception));
        throw GeckoEvalError(py_exc_value.ptr());
    }
}

boost::python::object GeckoWindow::buildPythonObjectFromJsval(
    ReferenceMap& references,
    JSContext* cx,
    const jsval v
) {
    using namespace boost::python;

    if (v == JSVAL_TRUE) {
        return object(handle<>(Py_True));
    } else if (v == JSVAL_FALSE) {
        return object(handle<>(Py_False));
    } else if (v == JSVAL_NULL) {
        return object(handle<>(Py_None));
    } else if (v == JSVAL_VOID) {
        return object(handle<>(Py_None));
    } else if (JSVAL_IS_INT(v)) {
        return object(handle<>(PyInt_FromLong(JSVAL_TO_INT(v))));
    } else if (JSVAL_IS_NUMBER(v)) {
        return object(handle<>(PyFloat_FromDouble(*JSVAL_TO_DOUBLE(v))));
    // } else if (JSVAL_IS_STRING(v)) {
    } else if (JSVAL_IS_OBJECT(v)) {
        JSObject* obj = JSVAL_TO_OBJECT(v);

        if (references.count(v)) {
            return references[v];
        }

        if (JS_IsArrayObject(cx, obj)) {
            list rv;
            references[v] = rv;
            jsuint length;
            if (JS_GetArrayLength(cx, obj, &length)) {
                jsval element;
                for (jsuint i = 0; i < length; ++i) {
                    if (JS_GetElement(cx, obj, i, &element)) {
                        rv.append(buildPythonObjectFromJsval(references, cx, element));
                    }
                }
            }
            return rv;
        } else {
            dict rv;
            references[v] = rv;

            JSObject* iterator = JS_NewPropertyIterator(cx, obj);
            if (!iterator) {
                throw GeckoEvalUnknownError("Error creating object property iterator while marshalling");
            }
            for (;;) {
                jsid propertyName;
                if (!JS_NextProperty(cx, iterator, &propertyName)) {
                    throw GeckoEvalUnknownError("Error enumerating property list of object while marshalling");
                }

                if (propertyName == JSVAL_VOID) {
                    break; // end of the property list
                }

                jsval propertyNameValue;
                jsval propertyValue;
                object k;

                if (!JS_IdToValue(cx, propertyName, &propertyNameValue)) {
                    throw GeckoEvalUnknownError("Error converting property name to jsval while marshalling");
                }
                if (JSVAL_IS_INT(propertyNameValue)) {
                    jsint propertyIndex = JSVAL_TO_INT(propertyNameValue);
                    k = long_(propertyIndex);

                    if (!JS_LookupElement(cx, obj, propertyIndex, &propertyValue)) {
                        throw GeckoEvalUnknownError("Error looking up property value by index");
                    }
                } else if (JSVAL_IS_STRING(propertyNameValue)) {
                    JSString* kjsstr = JSVAL_TO_STRING(propertyNameValue);
                    std::wstring kstr(JS_GetStringChars(kjsstr), JS_GetStringLength(kjsstr));
                    k = object(kstr);

                    if (!JS_LookupUCProperty(cx, obj, kstr.c_str(), kstr.size(), &propertyValue)) {
                        throw GeckoEvalUnknownError("Error looking up property value by name");
                    }
                } else {
                    throw GeckoEvalUnknownError("Unknown property name type while marshalling");
                }

                rv[k] = buildPythonObjectFromJsval(references, cx, propertyValue);
            }
            return rv;
        }
    } else {
        // We don't know what type it is, or we can't marshal it,
        // so convert it to a string and hope for the best...
        JSString* string = JS_ValueToString(cx, v);
        return str(std::wstring(JS_GetStringChars(string), JS_GetStringLength(string)));
    }
}
Hope that helps, and Godspeed.

The Real Benefit of Inlining Functions (or: Floating Point Calling Conventions)

My mental model for the performance benefit of inlining a function call was:

  1. code size increases
  2. the overhead of the call, including argument and return value marshalling, is eliminated
  3. the compiler knows more information, so it can generate better code

I had dramatically underestimated the value of #3, so this entry is an attempt to give a concrete example of how inlining can help.

As alluded to in my previous entry, you can’t just leave the floating point state willy-nilly across function calls. Every function should be able to make full use of the floating point register stack, which doesn’t work if somebody has left stale values on it. In general, these rules are called calling conventions. Agner Fog has excellent coverage of the topic, as usual.

Anyway, back to inlining. The specifics aren’t that important, but we had a really simple function in the IMVU client which continued to show up in the profiles. It looked something like this:

std::vector<float> array;

float function() {
    float sum = 0.0f;
    for (size_t i = 0; i < array.size(); ++i) {
        sum += array[i];
    }
    return sum;
}

This function never operated on very large lists, and it also wasn’t called very often, so why was it consistently in the profiles? A peek at the assembly showed (again, something like):

fstp dword ptr [sum] ; sum = 0.0

xor ecx, ecx ; i = 0
jmp cmp

loop:
push ecx
call array.operator[]

fadd [sum] ; return value of operator[] in ST(0)
fstp [sum] ; why the load and the store??

add ecx, 1

cmp:
call array.size()
cmp ecx, eax
jb loop ; continue if i < return value

fld [sum] ; return value

First of all, why all of the function calls? Shouldn't std::vector be inlined? But more importantly, why does the compiler keep spilling sum out to the stack? Surely it could keep the sum in a floating point register for the entire calculation.

This is when I realized: due to the calling convention requirements on function calls, the floating point stack must be empty upon entry into the function. The stack is in L1 cache, but still, that's three cycles per access, plus a bunch of pointless load and store uops.

Now, I actually know why std::vector isn't inlined. For faster bug detection, we compile and ship with bounds checking enabled on STL containers and iterators. But in this particular situation, the bounds checking isn't helpful, since we're iterating over the entire container. I rewrote the function as:

std::vector<float> array;

float function() {
    const float* p = &array[0];
    size_t count = array.size();
    float sum = 0.0f;
    while (count--) {
        sum += *p++;
    }
    return sum;
}

And the compiler generated the much more reasonable:

call array.size()
mov ecx, eax ; ecx = count

push 0
call array.operator[]
mov esi, eax ; esi = p

fldz ; ST(0) = sum

jmp cmp

loop:
fadd [esi] ; sum += *p

add esi, 4 ; p++
sub ecx, 1 ; count--

cmp:
cmp ecx, 0
jne loop

; return ST(0)

This is the real benefit of inlining. Modern compilers are awesome at making nearly-optimal use of the CPU, but only when they have enough information. Inlining functions gives them that information.

p.s. I apologize if my pseudo-assembly had mistakes. I wrote it from memory.

#IND and #QNaN with /fp:fast

The other day Timothy and I were optimizing some floating-point-intensive lighting code. Looking at the generated code, I realized we weren’t compiling with /fp:fast. Due to the wonky state of floating point on 32-bit x86, Visual C++ frequently stores temporary results of floating point calculations to the stack and then reloads them, for the sake of consistent results.

See, the problem is that the floating point registers on x86 are 80 bits wide, so if you compile “float x, y, z, w; w = (x + y) * z” as…

fld [x]  ; ST0 = x
fadd [y] ; ST0 = x + y
fmul [z] ; ST0 = (x + y) * z
fstp [w] ; w = (x + y) * z

… the temporary results are always stored in ST0 with 80 bits of precision. However, since floats only have 32 bits of precision, you can wind up with different results depending on compilers, optimization settings, register allocation, etc. We often had problems like this at VRAC. Some poor engineering student would send out a panicked e-mail at 9:00 p.m. asking why his program started producing different results in release mode than it did in debug mode.

Thus, Visual C++ takes a more cautious approach. By default, it stores float intermediates back to memory to truncate them to 32 bits of precision:

fld [x]
fadd [y]
fstp [temp] ; truncate precision
fld [temp]
fmul [z]
fstp [w]

Tiny differences in precision don’t matter in IMVU, so enabling /fp:fast saved 50-100 CPU cycles per vertex in our vertex lighting loop. However, with this option turned on, our automated tests started failing with crazy #IND and #QNAN errors!

After some investigation, we discovered that our 4×4 matrix inversion routine (which calculates several 2×2 and 3×3 determinants) was using all 8 floating point registers with /fp:fast enabled. The x87 registers are stored in a “stack”, where ST0 is the top of the stack and STi is the i’th entry. Load operations like fld, fld1, and fldz push entries on the stack. Arithmetic operations like fadd and fmul operate on the top of the stack with the value in memory, storing the result back on the stack.

But what if the x87 register stack overflows? In this case, an “indefinite” NAN is loaded instead of the value you requested, indicating that you have lost information. (The data at the bottom of the stack was lost.) Here’s an example:

fldz  ; ST0 = 0
fld1  ; ST0 = 1, ST1 = 0
fldpi ; ST0 = pi, ST1 = 1, ST2 = 0
fldz  ; (five fldz in a row) ST0-ST4 = 0, ST5 = pi, ST6 = 1, ST7 = 0
fldz  ; ST0 = IND!

Whoops, there’s a bug in your code! You shouldn’t overflow the x87 register stack, so the processor has given you IND.

Indeed, this is what happened in our matrix inversion routine. But why?

Using a debugger, we determined that the x87 stack contained one value at the start of the function. Moreover, it contained a value at the start of the test! Something was fishy. Somebody was leaving the x87 stack dirty, and we needed to find out who.

void verify_x87_stack_empty() {
    unsigned z[8];
    __asm {
        fstp dword ptr [z+0x00]
        fstp dword ptr [z+0x04]
        fstp dword ptr [z+0x08]
        fstp dword ptr [z+0x0c]
        fstp dword ptr [z+0x10]
        fstp dword ptr [z+0x14]
        fstp dword ptr [z+0x18]
        fstp dword ptr [z+0x1c]
    }

    // Verify bit patterns. 0 = 0.0
    for (unsigned i = 0; i < 8; ++i) {
        CHECK_EQUAL(z[i], 0);
    }
}

The previous function, called before and after every test, discovered the culprit: we had a test that intentionally called printf() and frexp() with NaN values, which had the side effect of leaving the floating point stack in an unpredictable state.

Adding __asm emms to the end of the test fixed our problem: thereafter, /fp:fast worked wonderfully. Case closed.

Download IMVU’s Cal3D modifications

Hi all,

IMVU uses a lot of open source software on both our client software and the web site (frontend and backend). One of the packages we use is Cal3D, a 3D skeletal animation system. We’ve made a few changes to Cal3D over the last year or two, including support for morph targets, exporter UI improvements, an improved animation scheduler, vertex colors, and others. In the past, we’ve published our changes directly back to the Cal3D project, but this wasn’t strictly in accordance with the LGPL, under which Cal3D is licensed. So now you can easily download our version of Cal3D directly from our technology page!




Rather than sprinkle your functions with precondition asserts that arguments aren’t null, use the following, self-documenting type as part of the function’s signature:

#include <assert.h>
#include <stdio.h>

template<typename T>
class NotNull {
public:
    NotNull(T object)
    : _object(object) {
        assert(object);
    }

    operator T() const {
        return _object;
    }

private:
    T _object;
};

void printNumber(NotNull<int*> arg) {
    printf("%d\n", *arg);
}

int main() {
    int i = 10;
    printNumber(&i);  // fine

    int* p = 0;
    printNumber(p);  // asserts
}

The Best Answer I Could Come Up With

Note that the only reason this requires C++ is for the operator signature syntactic sugar.

Update: Didn’t need the explicit selfRV construction.

#include <assert.h>
#include <stdio.h>

struct selfRV {
    typedef struct selfRV (*selfSignature)();
    selfRV(selfSignature ptr) : _ptr(ptr) { }
    operator selfSignature() const { return _ptr; }
    selfSignature _ptr;
};

selfRV self() {
    return self;
}

int main() {
    puts(self == self()()()()()()()
         ? "works"
         : "doesn't work");
    return 0;
}