In a large Python program such as IMVU, startup time is dominated by Python module imports. Take these warm timings:
$ time python -c 'None' real 0m0.096s user 0m0.077s sys 0m0.061s $ time python -c 'import urllib2' real 0m0.314s user 0m0.155s sys 0m0.186s
That’s 300ms for a single basic dependency. Importing the entire IMVU client takes 1.5s warm and 20s cold on a typical user’s machine.
The IMVU client’s loading progress bar imports modules bottom-up; that is, leaf modules are imported before their parents. The root module is imported last.
Implementing a bottom-up import sequence requires generating a graph of dependencies between modules:
def get_dependencies(module_name):
"""\
Takes a module name as input (e.g. 'xml.dom') and returns a set of
(lhs, rhs) tuples where lhs and rhs are module names and lhs
imports rhs.
"""
# module_from_key is a dict from a module key, an arbitrary
# object, to a module object. While importing, we discover
# dependencies before we have access to the actual module objects.
# import_dependencies is a list of (lhs, rhs) tuples where lhs and
# rhs are module keys, and module_from_key[lhs] imported
# module_from_key[rhs].
root_key = object()
module_from_key = {root_key: __main__}
import_dependencies = []
stack = [root_key]
def import_in_stack(key, name, globals, locals, fromlist, level):
stack.append(key)
try:
return original_import(name, globals, locals, fromlist, level)
finally:
stack.pop()
import __builtin__
original_import = __builtin__.__import__
def my_import(name, globals=globals(), locals=locals(), fromlist=[], level=-1):
# fromlist is a whore. Most of the complexity in this
# function stems from fromlist's semantics. See
# http://docs.python.org/library/functions.html#__import__
# If a module imports 'xml.dom', then the module depends on
# both 'xml' and 'xml.dom' modules.
dotted = name.split('.')
for i in range(1, len(dotted)):
my_import('.'.join(dotted[0:i]), globals, locals, [], level)
module_key = object()
parent_key = stack[-1]
def add_dependency_from_parent(key, m):
module_from_key[key] = m
import_dependencies.append((parent_key, key))
submodule = import_in_stack(module_key, name, globals, locals, ['__name__'], level)
add_dependency_from_parent(module_key, submodule)
for f in (fromlist or []):
from_key = object()
module = import_in_stack(from_key, name, globals, locals, [f], level)
if f == '*':
continue
submodule = getattr(module, f)
if isinstance(submodule, types.ModuleType):
add_dependency_from_parent(from_key, submodule)
return original_import(name, globals, locals, fromlist, level)
# Import module_name with import hook enabled.
original_modules = sys.modules.copy()
__builtin__.__import__ = my_import
try:
my_import(module_name)
finally:
__builtin__.__import__ = original_import
sys.modules.clear()
sys.modules.update(original_modules)
assert stack == [root_key], stack
return sorted(set(
(module_from_key[lhs].__name__, module_from_key[rhs].__name__)
for lhs, rhs in import_dependencies
))
(You can view all of the code at SourceForge).
First, we install an __import__ hook that discovers import dependencies between modules, and convert those relationships into a directed graph:
Then, we merge cycles. If module A imports B, B imports C, and C imports A, then it doesn’t matter which you import first. Importing A is equivalent to importing B or C. After this step, we have a DAG:
Finally, we can postorder traverse the DAG to determine a good import sequence and cost (approximated as the number of modules in the cycle) per import:
1 xml 3 xml.dom 1 copy_reg 1 types 1 copy 1 xml.dom.NodeFilter 1 xml.dom.xmlbuilder 1 xml.dom.minidom 1 __main__
Now let’s look at some less-trivial examples. urllib2:
SimpleXMLRPCServer:
Final notes: Other people have solved this problem with bytecode scanning, but we wanted to know the actual modules imported for an accurate progress bar. A simpler __import__ hook could have calculate the correct import sequence, but I find a visual representation of module dependencies to have additional benefits.













Add to Google Reader
(RSS for the rest of us)