Visualizing Python Import Dependencies
In a large Python program such as IMVU, startup time is dominated by Python module imports. Take these warm timings:
$ time python -c 'None' real 0m0.096s user 0m0.077s sys 0m0.061s $ time python -c 'import urllib2' real 0m0.314s user 0m0.155s sys 0m0.186s
That's 300ms for a single basic dependency. Importing the entire IMVU client takes 1.5s warm and 20s cold on a typical user's machine.
The IMVU client's loading progress bar imports modules bottom-up; that is, leaf modules are imported before their parents. The root module is imported last.
Implementing a bottom-up import sequence requires generating a graph of dependencies between modules:
def get_dependencies(module_name): """\ Takes a module name as input (e.g. 'xml.dom') and returns a set of (lhs, rhs) tuples where lhs and rhs are module names and lhs imports rhs. """ # module_from_key is a dict from a module key, an arbitrary # object, to a module object. While importing, we discover # dependencies before we have access to the actual module objects. # import_dependencies is a list of (lhs, rhs) tuples where lhs and # rhs are module keys, and module_from_key[lhs] imported # module_from_key[rhs]. root_key = object() module_from_key = {root_key: __main__} import_dependencies = [] stack = [root_key] def import_in_stack(key, name, globals, locals, fromlist, level): stack.append(key) try: return original_import(name, globals, locals, fromlist, level) finally: stack.pop() import __builtin__ original_import = __builtin__.__import__ def my_import(name, globals=globals(), locals=locals(), fromlist=[], level=-1): # fromlist is a whore. Most of the complexity in this # function stems from fromlist's semantics. See # http://docs.python.org/library/functions.html#__import__ # If a module imports 'xml.dom', then the module depends on # both 'xml' and 'xml.dom' modules. dotted = name.split('.') for i in range(1, len(dotted)): my_import('.'.join(dotted[0:i]), globals, locals, [], level) module_key = object() parent_key = stack[-1] def add_dependency_from_parent(key, m): module_from_key[key] = m import_dependencies.append((parent_key, key)) submodule = import_in_stack(module_key, name, globals, locals, ['__name__'], level) add_dependency_from_parent(module_key, submodule) for f in (fromlist or []): from_key = object() module = import_in_stack(from_key, name, globals, locals, [f], level) if f == '*': continue submodule = getattr(module, f) if isinstance(submodule, types.ModuleType): add_dependency_from_parent(from_key, submodule) return original_import(name, globals, locals, fromlist, level) # Import module_name with import hook enabled. original_modules = sys.modules.copy() __builtin__.__import__ = my_import try: my_import(module_name) finally: __builtin__.__import__ = original_import sys.modules.clear() sys.modules.update(original_modules) assert stack == [root_key], stack return sorted(set( (module_from_key[lhs].__name__, module_from_key[rhs].__name__) for lhs, rhs in import_dependencies ))
(You can view all of the code at SourceForge).
First, we install an __import__ hook that discovers import dependencies between modules, and convert those relationships into a directed graph:
Then, we merge cycles. If module A imports B, B imports C, and C imports A, then it doesn't matter which you import first. Importing A is equivalent to importing B or C. After this step, we have a DAG:
Finally, we can postorder traverse the DAG to determine a good import sequence and cost (approximated as the number of modules in the cycle) per import:
1 xml 3 xml.dom 1 copy_reg 1 types 1 copy 1 xml.dom.NodeFilter 1 xml.dom.xmlbuilder 1 xml.dom.minidom 1 __main__
Now let's look at some less-trivial examples. urllib2:
SimpleXMLRPCServer:
Final notes: Other people have solved this problem with bytecode scanning, but we wanted to know the actual modules imported for an accurate progress bar. A simpler __import__ hook could have calculate the correct import sequence, but I find a visual representation of module dependencies to have additional benefits.
Neat idea!
I'm curious: have you tried to postpone some of the imports to make the startup faster?
Yeah, we used to postpone some of the imports, but it turns out we need most of the graph to show the login box and initial experience anyway... I'd rather just get the load over with, and have the rest of the experience nice and snappy. :)