This is a section from the open-source living textbook Better Code, Better Science, which is being released in sections on Substack. The entire book can be accessed here and the GitHub repository is here. This material is released under CC-BY-NC.
Managing technical debt
The Python package ecosystem provides a cornucopia of tools: for nearly any problem, one can find a package on PyPI or code on GitHub that solves it. Most coders never think twice about installing such a package; how could it be a bad thing? While we also love the richness of the Python package ecosystem, there are reasons to think twice before relying on an arbitrary package.
The concept of technical debt refers to work that is deferred in the short term in exchange for higher costs in the future (such as maintenance or changes). The use of an existing package counts as technical debt because there is uncertainty about how well any package will be maintained in the long term. A package that is not actively maintained can:
become dysfunctional with newer Python releases
come into conflict with newer versions of other packages, e.g. by relying upon a function in another package that is deprecated or removed
introduce security risks
fail to address bugs or errors in the code that are discovered by users
At the same time, there are very good reasons for using well-maintained packages:
Linus' law ("given enough eyeballs, all bugs are shallow") (Raymond, 1999) suggests that highly used software is less likely to retain bugs
A well-maintained package is likely to be well-tested
Using a well-maintained package can save a great deal of time compared to writing one's own implementation
We don't want to suggest that one should never use an arbitrary package from PyPI that happens to solve an important problem, but it's important to keep in mind that when we come to rely on a package, we are taking on technical debt and assuming some degree of risk. The level of concern will vary with the expected reuse of the code: if you expect to reuse the code in the future, then you should pay more attention to how well the package is maintained. To see what a well-maintained package looks like, visit the GitHub repository for the scikit-learn project. This is a long-lived project with more than 2000 contributors and a consistent history of commits over many years. Most projects will never reach this level of maturity, but we can use it as a template for what to look for in a well-maintained project:
Multiple active contributors (not just a single developer)
Automated testing with a high degree of code coverage
Testing across multiple Python versions, including recent ones
An active issues page, with developers responding to issues relatively quickly
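The checklist above can be turned into a rough, explicit review step. The following is only a minimal sketch: the `ProjectHealth` record, the `maintenance_warnings` function, and the thresholds are all hypothetical, and the values would need to be gathered by hand from a project's repository.

```python
from dataclasses import dataclass


@dataclass
class ProjectHealth:
    """Hypothetical record of maintenance signals, filled in by hand."""
    active_contributors: int            # people with recent commits
    has_automated_tests: bool           # a test suite runs automatically
    coverage_percent: float             # reported test coverage, 0-100
    tests_recent_python: bool           # tested on recent Python versions
    median_issue_response_days: float   # typical time to a first reply


def maintenance_warnings(project: ProjectHealth,
                         min_coverage: float = 80.0,
                         max_response_days: float = 14.0) -> list[str]:
    """Return one warning per failed checklist item.

    The default thresholds are arbitrary illustrations, not recommendations.
    """
    warnings = []
    if project.active_contributors < 2:
        warnings.append("fewer than two active contributors")
    if not project.has_automated_tests:
        warnings.append("no automated testing")
    elif project.coverage_percent < min_coverage:
        warnings.append(f"test coverage below {min_coverage:.0f}%")
    if not project.tests_recent_python:
        warnings.append("not tested on recent Python versions")
    if project.median_issue_response_days > max_response_days:
        warnings.append("slow responses to issues")
    return warnings


# Made-up values for two hypothetical projects
mature = ProjectHealth(40, True, 95.0, True, 2.0)
hobby = ProjectHealth(1, True, 35.0, False, 90.0)

print(maintenance_warnings(mature))
print(maintenance_warnings(hobby))
```

A non-empty list is a prompt for judgment rather than a verdict; the point is simply to make the assessment explicit before taking on the dependency.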
You may well decide that the code from a project that doesn't meet these standards is still useful enough to rely upon, but you should make that decision only after thinking through what would happen if the project were no longer maintained in the future. Considering and managing dependency risk is an essential aspect of building good software.
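A practical first step in managing dependency risk is knowing what an environment actually depends on. The sketch below uses the standard library's `importlib.metadata` module to list the installed packages and the requirements that a given package declares. The helper names `installed_packages` and `declared_requirements` are our own, chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, distributions, requires


def installed_packages() -> dict:
    """Map each installed distribution name to its version string."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return packages


def declared_requirements(name: str) -> list:
    """Return the requirement strings a package declares, or [] if none.

    A package that is not installed also yields an empty list.
    """
    try:
        return requires(name) or []
    except PackageNotFoundError:
        return []


packages = installed_packages()
print(f"{len(packages)} distributions installed")
for name in sorted(packages)[:5]:
    print(name, packages[name], declared_requirements(name))
```

Each name in this inventory is a candidate for the maintenance questions above. Packages pulled in indirectly by other packages carry the same risks as those installed on purpose.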