My Experience Using Static Analysis With Python

June 3, 2015

The short version: I now believe that automated static analysis is just as important as automated testing, and deserves a corresponding place in the development workflow.

Background

About a month ago, there was an interesting post on the python-dev mailing list, which began like this:

tldr; type hints in python source are scary.

I don't follow core Python development that closely. I dimly remembered something introduced in Python 3 that lets you annotate function parameters and return values with arbitrary expressions (PEP 3107). Type hints, it turns out, refers to a separate, newer proposal (PEP 484) to standardize the use of those annotations, so that people can more easily build type-aware static analyzers for Python.
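
For my own reference, here's roughly what the two look like. The function is invented purely for illustration:

# PEP 3107: annotations can be arbitrary expressions. Python evaluates
# them and stores them in __annotations__, but otherwise ignores them.
def scale(data: "a NumPy array", factor: 2 + 2) -> "another array":
    return data * factor

# PEP 484: the same syntax, but the annotations are types, which is what
# lets a static analyzer reason about calls to the function.
def scale(data: float, factor: int) -> float:
    return data * factor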

To be clear, static analysis is the process of analyzing a program's source code, and drawing conclusions as to its quality or correctness, without executing the program. It's a mainstay of compiled languages like C or C++; for example, if you try to assign a string pointer to a variable that's been declared as storing an integer, the compiler will complain.
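
PEP 484 aims to make the same kind of checking possible for Python. Here's a contrived sketch; the function is made up, and the exact wording of the complaint depends on the tool:

def dataset_rank(name: str) -> int:
    # Count the components in an HDF5-style path like "/group/dataset"
    return len(name.split("/"))

def broken_caller():
    # Nothing blows up until this runs, but a type-aware checker such as
    # mypy can report the int-vs-str mismatch without running anything.
    return dataset_rank(42)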

But even without PEP 484 (or mypy, which inspired it), many lower-key static analyzers already exist for Python. They don't require any modification of source code, and promise to catch a wide range of programming and design errors. Since I'm new to static analysis, I thought it would be a good idea to give them a try.

My project is fine... right?

The code base I tried out is the current master branch of the "HDF5 for Python" (h5py) project, a widely-used library for accessing scientific data in the HDF5 file format.

Overall, I think h5py is a pretty typical open-source Python project, with the possible exception that it uses a lot of code written in Cython (~15k lines). Apart from Cython itself, I am not aware of any tools that can do static analysis on Cython code. So this experiment focused on the h5py "high-level" interface, which is about 3k lines of pure Python code. A nice, small, easily manageable piece of code that one would expect to be in pretty good shape.

We started the h5py project in 2008, so it's a mature code base, with support for Python 2.6 through 3.4. Travis CI runs the h5py test suite automatically when a pull request is opened on GitHub. If the tests don't pass, the pull request can't be merged. Basic coding standards are enforced during code review as part of the pull-request process; docstrings are required, as are appropriate comments, and unwieldy or unmaintainable constructs are not permitted.

Well, maybe

The first tool I used was pyflakes, which is advertised like this:

Pyflakes makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives.

Sounds like a good place to start. I ran it over the "high-level" subpackage in h5py and here's the result:

h5py/_hl/attrs.py:13: 'collections' imported but unused
h5py/_hl/base.py:13: 'warnings' imported but unused
h5py/_hl/dataset.py:257: local variable 'dcpl' is assigned to but never used
h5py/_hl/dataset.py:544: undefined name 'h5y'
h5py/_hl/datatype.py:15: 'phil' imported but unused
h5py/_hl/dims.py:12: 'numpy' imported but unused
h5py/_hl/dims.py:17: 'readtime_dtype' imported but unused
h5py/_hl/files.py:12: 'weakref' imported but unused
h5py/_hl/files.py:18: 'HLObject' imported but unused
h5py/_hl/files.py:20: 'h5t' imported but unused
h5py/_hl/filters.py:47: 'h5s' imported but unused
h5py/_hl/filters.py:221: local variable 'szopts' is assigned to but never used
h5py/_hl/group.py:17: 'collections' imported but unused
h5py/_hl/selections2.py:29: undefined name 'name'

Not too bad. It found two errors in the code base right away, both in oddball code paths which are not presently covered by the test suite. One was caused by a mistyped name ("h5y" instead of "h5t"), and the second was this old chestnut of a mistake:

raise ValueError("Field %s does not appear in this type." % name)

where some refactoring had taken place and "name" was no longer defined.
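
It's easy to see how this kind of thing creeps in. A hypothetical reconstruction (not the actual h5py code):

# Before refactoring: "name" is a parameter, so the message works.
def get_field(dtype, name):
    if name not in dtype.fields:
        raise ValueError("Field %s does not appear in this type." % name)

# After refactoring: the parameter is now "field_name", but the error
# path, which the tests never exercise, still refers to the old "name".
def get_field(dtype, field_name):
    if field_name not in dtype.fields:
        raise ValueError("Field %s does not appear in this type." % name)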

I don't consider the unused-import lines to be noise, either. Since module-level imports can be used anywhere in a module, working out which ones are no longer needed after refactoring means a tedious round of Ctrl+F on every suspect name. And removing unused imports makes the dependencies between modules a lot clearer to the developer.

At this point I was already thinking about adding pyflakes to our Travis testing script. It's comically simple to use, everything in the output is legitimate, and it caught two errors in code we're actually shipping. And if you're on the fence about static analysis, please at least use pyflakes. Or an IDE that has a linter built-in. Or something.
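
The most direct hookup I can think of is a test case that shells out to pyflakes and fails if it reports anything. This is only a sketch of the idea, not necessarily how the h5py build wires it up:

import subprocess
import unittest

class TestPyflakesClean(unittest.TestCase):

    def test_high_level_is_clean(self):
        # pyflakes exits with a nonzero status if it has anything to
        # complain about, so a clean tree means a passing test
        status = subprocess.call(["pyflakes", "h5py/_hl"])
        self.assertEqual(status, 0)

if __name__ == "__main__":
    unittest.main()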

As it happens, we ended up using Pylint, which found about 100 legitimate issues with the code base, ranging from missing docstrings to calls to functions with the wrong number of arguments.
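
The wrong-argument-count warnings are exactly the sort of thing a test suite misses when the offending call sits on a rarely exercised path. A contrived example of the pattern (not taken from h5py):

def make_dataset(name, shape, dtype):
    # Hypothetical helper, standing in for a real h5py call
    return (name, shape, dtype)

def rarely_used_path():
    # Pylint reports that this call is missing a value for "dtype",
    # even though the test suite never executes the line.
    return make_dataset("timestamps", (100,))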

Fixing things and making them stay fixed

These days, running an automated test suite is considered good practice, particularly before changes are merged. We maintain a test suite for the h5py project specifically so that new pull requests don't break the project. What this exercise taught me is that, despite the automation, the project was broken anyway: rotting in a hundred subtle ways, ranging from missing docstrings to duplicated code.

So as part of the cleanup, I added a "pylintrc" file to the project and a "pylint" step to our Travis build. All changes to h5py go through pull requests, including changes by the core developers. Now, in addition to verifying that no regression tests fail, we can also check that the code meets minimum standards for quality.
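
Mechanically, the new step is just one more command that has to exit cleanly before the build can pass. A sketch of the idea (the real Travis configuration differs in detail):

import subprocess
import sys

# The pylintrc file records which messages the project enforces; pylint
# exits with a nonzero status if any of them fire, which fails the build.
sys.exit(subprocess.call(["pylint", "--rcfile=pylintrc", "h5py"]))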

Final thoughts

If you're already using an automated test suite to check for regressions, you should also be running a static analyzer. Particularly in the case of pyflakes, which had zero false positives, required zero configuration, and found two real bugs, it's hard to think of a reason not to.

Getting Pylint going took a bit of time, but for the h5py project I think it was worthwhile. The ability to check for missing docstrings is valuable all by itself. And with automation checking for boring things like mistyped variable names and missing function arguments, code review can focus on squishy human things like design and maintainability.