Copyright 2007-2008 Nanorex, Inc.  See LICENSE file for details.
# $Id$

This describes how to use various tools to manage import statements
and to detect and analyze dependency loops created by those imports.

Summary of steps for a routine check:

For routine, incremental checks of the import cycle graph, it's
sufficient to run the following two commands in /bin/sh, which are
elaborated on below:

  $ tools/PackageDependency.py `tools/AllPyFiles.sh` --justCycles > depend.dot

  $ tools/splitDependDot.py > anotheroutputfile

Full explanation, and steps for more complete checks:

The first step is to make sure that all the necessary import
statements are in the code, and that they point to the right
locations.  The tools involved are:

* AllPyFiles.sh

  This script finds all .py files in the source tree below the
  directory it is run from, and filters out files that we're not
  interested in.  Files known not to be included in release builds
  can be removed from consideration at this stage.  Keeping some of
  these files can be a good idea, though.  Doing so can help catch
  code that references a symbol defined in a no-longer-used file when
  the same symbol is defined with a different value elsewhere.

* FindPythonGlobals.py

  This script analyzes a list of Python files (generated by
  AllPyFiles) and emits a list of all of the global symbols defined
  in those files.

* FindExternalImports.py

  This script takes a list of Python files and emits a list of all
  packages that the set of files imports from outside of itself.  So
  if the set is [A.py, B.py], and A imports B and C, and B imports D,
  the result would be C and D.

* SymbolsInPackage.py

  This script takes the package list generated by FindExternalImports
  and emits a list of all of the symbols defined in those packages,
  in the same format as FindPythonGlobals.  When those two lists are
  concatenated, they give the locations of all globals the processed
  Python files could possibly reference.
* ResolveGlobals.py

  Reads the dataset produced by FindPythonGlobals and SymbolsInPackage
  and uses the data in one of several ways.  It can look at just the
  dataset itself and emit information about duplicate definitions of
  symbols within the dataset.  These are the symbols which could be
  confused with each other.  It can also analyze a set of import
  statements to verify that they import symbols from their actual
  definition location, and not indirectly.  Finally, it can parse the
  output of pychecker, converting its warnings about undefined globals
  into the import statements needed to resolve those warnings.

* pychecker

  An external tool to perform static checking on Python source code.
  Here we just use it to find a list of global symbols which are not
  defined within a particular source file.

First, make sure you are working on the right copy of the source.  Do
a cvs update and make sure there are no merge conflicts.

Next, create a list of global symbols defined in the source files:

  $ tools/FindPythonGlobals.py `tools/AllPyFiles.sh` > allglobalsymbols

Next, add the externally defined symbols to this list.  Note we are
writing to allglobalsymbols in append mode, via >>:

  $ tools/FindExternalImports.py `tools/AllPyFiles.sh` | tools/SymbolsInPackage.py >> allglobalsymbols

This will send to stderr a list of modules which were imported by
the files listed by AllPyFiles.sh, but which could not be imported by
SymbolsInPackage.  If that list includes files which don't actually
work, improper imports from them would show up here.  If any programs
change the import search path, that could also cause a problem here.
The result can also be affected by running this in a directory other
than the one the program will actually run in.

Next, check the symbol list for duplicates:

  $ tools/ResolveGlobals.py allglobalsymbols --duplicates > dups

Examine the resulting file.  Duplicates are not necessarily a problem,
but you may notice something which will help later.
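As a rough illustration of what a duplicates check looks for, the
sketch below groups symbol definitions by name and reports every name
defined in more than one module.  The (symbol, module) pairs and the
find_duplicates helper are made up for illustration; the real format
of allglobalsymbols and the logic inside ResolveGlobals.py may well
differ.

```python
from collections import defaultdict


def find_duplicates(definitions):
    """Return {symbol: sorted module names} for symbols defined in 2+ modules.

    `definitions` is an iterable of (symbol, module) pairs.  This input
    shape is an assumption for illustration, not the allglobalsymbols format.
    """
    modules_by_symbol = defaultdict(set)
    for symbol, module in definitions:
        modules_by_symbol[symbol].add(module)
    return {
        symbol: sorted(modules)
        for symbol, modules in modules_by_symbol.items()
        if len(modules) > 1
    }


# Hypothetical sample data: one symbol visible through two modules.
sample = [
    ("QWidget", "PyQt4.Qt"),
    ("QWidget", "PyQt4.QtGui"),
    ("QTimer", "PyQt4.QtCore"),
]
duplicates = find_duplicates(sample)
for symbol, modules in sorted(duplicates.items()):
    print(symbol, "defined in:", ", ".join(modules))
```

A symbol that shows up in such a report is one for which an import
statement cannot be chosen automatically, which is why pruning known
duplicates from the symbol list can pay off in later steps.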
If you know some symbol values are identical, or perhaps that some
should never be used, you can remove them from allglobalsymbols.  For
example, PyQt4.Qt contains a complete copy of the symbols in
PyQt4.QtGui and PyQt4.QtCore.  Removing one of these sets will later
allow ResolveGlobals to print an import statement for a missing Qt
call, rather than saying it is ambiguous.

Next, run pychecker on each file.  Note that pychecker can fail on a
file before actually processing it if it cannot import it.  The
failure may happen several levels deep in imports, and pychecker
doesn't always tell you where the actual problem was.  When this
happens, you can try loading the file directly into python:

  $ python problemfile.py

and you may get a better error message.

To do the whole set in one batch:

  $ for i in `tools/AllPyFiles.sh`; do pychecker "$i"; done > pycheckerstdout 2> pycheckerstderr

This can take a while.

Examine the output.  Wherever you find the string "NOT PROCESSED
UNABLE TO IMPORT" you should have also gotten a message printed on
stderr.  Fix these and rerun the above command.

Look for "No global (...) found" messages.  These indicate that
pychecker was unable to resolve a global symbol, so you'll need to add
an import statement for it.  Or, it could indicate a bug that needs to
be fixed.  To determine the appropriate import statement, run:

  $ pychecker problemfile.py | tools/ResolveGlobals.py allglobalsymbols

This prints "from ... import" statements for the symbols it finds.  It
prints a list of possible modules if the symbol appeared in the
duplicates list you examined above.  If the symbol is not found, it
just prints "import ".

When pychecker is happy that all global symbols are defined, you can
check that everything is imported directly from the module it is
defined in:

  $ grep '^[[:space:]]*from.*import' `tools/AllPyFiles.sh` | tools/ResolveGlobals.py allglobalsymbols --check-import > checkimportoutput

Examine the output.
Any remaining "import *" statements will be flagged, as will import
lines that end in a backslash.  Both should be eliminated, so that the
grep will find all symbol imports.

Lines in the output which say "can't check up on file..." mean that
the symbol is not in the allglobalsymbols file.  Some of these may not
be problems, for example if you removed the symbol on purpose.

Lines including the string "elsewhere imported from" indicate that a
symbol is being imported from multiple sources in different modules.
Symbols which could be imported from one of several places have all
potential sources listed.

When everything above has been fixed, the import statements should
accurately reflect the import dependencies between all modules.  At
this point, it's time to try graphing that structure.

  $ tools/PackageDependency.py `tools/AllPyFiles.sh` --justCycles > depend.dot

(Optionally add "2> packageloopcounts" to capture the package loop
counts from stderr, but this may hide error messages printed to
stderr, including some we are about to add for imports with
continuation lines or without fully qualified module names.  BTW the
loop counts are no longer very useful, so they might be removed or
made optional.  [--bruce 072126 update])

If you have the GraphViz package installed, the results can be plotted
with:

  $ dot -Tpng depend.dot > depend.png

(To color-code the nodes by the tentative package assignments
hardcoded into PackageDependency.py, generate depend.dot with the
additional option --colorPackages.)

(To split depend.dot into disjoint connected subgraphs, assuming it
was made using the --justCycles option, run

  $ tools/splitDependDot.py > anotheroutputfile

after making it.  I don't know if that file is suitable input to
GraphViz, but the individual digraphs within it should be.)
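To make the idea of an import-cycle graph more concrete, here is a
small sketch that takes a module-to-imports mapping and returns the
groups of modules that import each other in a loop.  It is not the
algorithm used by PackageDependency.py or splitDependDot.py, whose
internals are not described here; the function names and the sample
module names A through F are hypothetical, and the approach is a
plain reachability check chosen for readability rather than speed.

```python
def reachable_from(graph, start):
    """Return the set of modules reachable from `start` via one or more imports."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def import_cycles(graph):
    """Return sorted groups of modules that lie on a common import cycle.

    `graph` maps a module name to the modules it imports (an assumed
    input shape for illustration).  Modules on no cycle are left out.
    """
    reach = {module: reachable_from(graph, module) for module in graph}
    groups = []
    assigned = set()
    for module in sorted(graph):
        if module in assigned or module not in reach[module]:
            continue  # already grouped, or cannot reach itself (no cycle)
        # Keep every module that this one reaches and that reaches it back.
        group = sorted(
            other for other in reach[module]
            if module in reach.get(other, set())
        )
        groups.append(group)
        assigned.update(group)
    return groups


# Hypothetical import graph: A -> B -> C -> A is a loop, as is E <-> F.
sample_graph = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
    "E": ["F"],
    "F": ["E"],
}
cycles = import_cycles(sample_graph)
for group in cycles:
    print(" <-> ".join(group))
```

Each returned group is disjoint from the others, so in this sketch a
group plays roughly the role of one separate subgraph: D is dropped
because nothing it imports leads back to it, while the A, B, C loop
and the E, F loop are reported separately.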