Visualizing Python Import Dependencies

In a large Python program such as IMVU, startup time is dominated by Python module imports. Take these warm timings:

$ time python -c 'None'

real    0m0.096s
user    0m0.077s
sys     0m0.061s

$ time python -c 'import urllib2'

real    0m0.314s
user    0m0.155s
sys     0m0.186s

That's 300ms for a single basic dependency. Importing the entire IMVU client takes 1.5s warm and 20s cold on a typical user's machine.

The IMVU client's loading progress bar imports modules bottom-up; that is, leaf modules are imported before their parents. The root module is imported last.

Implementing a bottom-up import sequence requires generating a graph of dependencies between modules:

def get_dependencies(module_name):
    """\
    Takes a module name as input (e.g. 'xml.dom') and returns a set of
    (lhs, rhs) tuples where lhs and rhs are module names and lhs
    imports rhs.
    """
    
    # module_from_key is a dict from a module key, an arbitrary
    # object, to a module object.  While importing, we discover
    # dependencies before we have access to the actual module objects.
    
    # import_dependencies is a list of (lhs, rhs) tuples where lhs and
    # rhs are module keys, and module_from_key[lhs] imported
    # module_from_key[rhs].

    root_key = object()
    module_from_key = {root_key: __main__}
    import_dependencies = []
    stack = [root_key]

    def import_in_stack(key, name, globals, locals, fromlist, level):
        stack.append(key)
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            stack.pop()

    import __builtin__
    original_import = __builtin__.__import__

    def my_import(name, globals=globals(), locals=locals(), fromlist=[], level=-1):
        # fromlist is a whore.  Most of the complexity in this
        # function stems from fromlist's semantics.  See
        # http://docs.python.org/library/functions.html#__import__
        
        # If a module imports 'xml.dom', then the module depends on
        # both 'xml' and 'xml.dom' modules.
        dotted = name.split('.')
        for i in range(1, len(dotted)):
            my_import('.'.join(dotted[0:i]), globals, locals, [], level)

        module_key = object()
        parent_key = stack[-1]

        def add_dependency_from_parent(key, m):
            module_from_key[key] = m
            import_dependencies.append((parent_key, key))

        submodule = import_in_stack(module_key, name, globals, locals, ['__name__'], level)
        add_dependency_from_parent(module_key, submodule)

        for f in (fromlist or []):
            from_key = object()
            module = import_in_stack(from_key, name, globals, locals, [f], level)
            if f == '*':
                continue
            submodule = getattr(module, f)
            if isinstance(submodule, types.ModuleType):
                add_dependency_from_parent(from_key, submodule)

        return original_import(name, globals, locals, fromlist, level)

    # Import module_name with import hook enabled.
    original_modules = sys.modules.copy()
    __builtin__.__import__ = my_import
    try:
        my_import(module_name)
    finally:
        __builtin__.__import__ = original_import
        sys.modules.clear()
        sys.modules.update(original_modules)

    assert stack == [root_key], stack

    return sorted(set(
        (module_from_key[lhs].__name__, module_from_key[rhs].__name__)
        for lhs, rhs in import_dependencies
    ))

(You can view all of the code at SourceForge).

First, we install an __import__ hook that discovers import dependencies between modules, and convert those relationships into a directed graph:

Then, we merge cycles. If module A imports B, B imports C, and C imports A, then it doesn't matter which you import first. Importing A is equivalent to importing B or C. After this step, we have a DAG:

Finally, we can postorder traverse the DAG to determine a good import sequence and cost (approximated as the number of modules in the cycle) per import:

1 xml
3 xml.dom
1 copy_reg
1 types
1 copy
1 xml.dom.NodeFilter
1 xml.dom.xmlbuilder
1 xml.dom.minidom
1 __main__

Now let's look at some less-trivial examples. urllib2:

SimpleXMLRPCServer:

imvu.task:

Final notes: Other people have solved this problem with bytecode scanning, but we wanted to know the actual modules imported for an accurate progress bar. A simpler __import__ hook could have calculate the correct import sequence, but I find a visual representation of module dependencies to have additional benefits.

Imported Comments [?]

Marius Gedminas on May 22, 2009

Neat idea!

I'm curious: have you tried to postpone some of the imports to make the startup faster?

Chad Austin on May 22, 2009

Yeah, we used to postpone some of the imports, but it turns out we need most of the graph to show the login box and initial experience anyway... I'd rather just get the load over with, and have the rest of the experience nice and snappy. :)