Have you ever copy pasted some code from one repo to another? Perhaps you were building a new web service and wanted to use some utilities which were available in the codebase of some existing web service. Then half a year later you did the same for a third service. One year later you discovered a bug in one of the copy pasted utility functions but remembered to fix the issue only in two of the three services. There's a problem here. Libraries to the rescue! This blog post is written with a focus on Python but the principles and motivation for building more libraries are basically language-neutral.
Share functionality & best practices between services
In the era of microservices, it’s not uncommon to have tens or even hundreds of different web services in a single company. There can be, for example, backends for web, different data crunching services, and pipelines for machine learning related functions. If these services are less than 5 years old, there’s a high chance they (or at least some of them) are written in Python. Python has been booming for the past couple of years, especially on the data science and analytics side.
When there are multiple services, written in the same language, there’s a high probability that they share some functionality. For example, I bet almost every company has their own set of utility functions for handling datetime. The critical point is when there’s a need for copy pasting something from one service to another for the first time. This is usually the point when one should consider moving the functionality to a library which can be used in both services. The benefits are obvious: If there’s a bug, it needs to be fixed only in a single place. If both services need some additional utility functionality regarding e.g. datetime, it can be implemented in one place. If you’re not convinced yet, read further.
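To make the datetime example concrete, here is a minimal sketch of the kind of helper module that tends to get copy pasted between services. The function names and the convention that naive datetimes are treated as UTC are assumptions made for illustration only, not something taken from a real codebase.

```python
"""Hypothetical shared helpers for datetime handling (illustrative only)."""
from datetime import datetime, timezone


def to_utc(dt: datetime) -> datetime:
    """Return a timezone-aware datetime in UTC.

    Assumption for this sketch: naive datetimes are already in UTC.
    """
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def to_iso_utc(dt: datetime) -> str:
    """Format a datetime as an ISO 8601 string in UTC."""
    return to_utc(dt).isoformat()
```

If a module like this lives in a library, a bug in the naive-datetime handling is fixed once and every service picks up the fix by upgrading the dependency.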
When some functionality of a service is moved to a library, there's no need to run the tests or other quality checks related to that code within the continuous integration (CI) of the services that depend on the library; those checks run in the library's own CI instead. The same applies to local test runs.
In my experience, fast CI is one of the most important building blocks of a successful, agile software development process. A smaller codebase usually means a smaller test suite, which in turn means faster CI. Extracting some of the functionality to libraries and listing those as dependencies of a service yields a smaller codebase. This is also closely related to a cleaner codebase, which I'll discuss next.
Have you ever seen a codebase which has more LOC for all sorts of utilities than for the actual business logic? This is not necessarily a problem for people who have been working with the codebase for a longer period of time, but it can bring significant cognitive overhead for newcomers. Good code structuring practices can of course help with this. However, I still argue there's value in moving at least some of the generic helpers to libraries instead of keeping them around in a codebase whose main purpose is to provide business logic.
In my opinion, a cleaner codebase combined with the faster CI mentioned above is already a strong enough argument for starting to build more libraries, even if there's no copy pasting going on between different services.
It's understandable that the business logic parts of the code are rarely something companies are willing to open source. However, my guess is that the world is filled with closed source projects which contain clever implementations for different generic problems. If there's no justification for keeping a generic solution closed source, there's likely some added value in open sourcing.
Some benefits of open sourcing:
- Give back to the community. How many open source libraries do your products or services depend on? How many projects have you open sourced?
- Other people/companies will likely benefit from your library. This usually means that they are also willing to report bugs, suggest features, and implement improvements. Everyone benefits.
- From the company POV, open source projects tend to attract talent. At its best, open sourcing can provide an effective recruitment channel.
Make it easy to build and release libraries
Packaging hasn’t traditionally been Python’s strong suit, which may be one of the reasons why people tend to prefer copy pasting over creating libraries for shared functionality. Ideally, building and releasing a library should not bring notable overhead compared to implementing (or copy pasting) the actual beef (i.e. the functionality).
Another level of copy pasting is to take some existing library, copy paste its boilerplate (setup.py, linter, test, and CI configs, etc.), and after that copy paste the beef from some existing service. Then you of course need to remember to change the package name in relevant places, replace the dependencies with correct ones, and so on. And finally see if it builds and installs. Luckily, there’s powerful tooling available which removes the need for this kind of copy pasting.
Cookiecutter makes it possible to define the boilerplate code and configs in a single place. Cookiecutter is a CLI utility for spinning up new projects (or actually any directory / file structures) from a project template. Note that cookiecutter is not limited to Python projects (but happens to be written in Python). In a company environment, cookiecutter is a perfect tool for bootstrapping new repos with company specific best practices (e.g. linters, auto-formatting, CI, instructions and tooling for releasing) applied. Thus, creating a new Python library from some existing functionality can be as easy as:
- run a single cookiecutter CLI command
- fill in project name and other simple details
- copy paste the existing functionality and tests for it
- push the stuff to a new repo
- let the CI do its thing
- make a release e.g. by pushing a tag
If the functionality was copy pasted e.g. from some existing service, then you can delete the original code from the service, list your new shiny library as a dependency, and fix some imports. Assuming the cookiecutter contains all the relevant pieces, the whole process can take less than 10 minutes. So, the additional initial cost of building & releasing a library compared to copy pasting code from one service to another is basically zero.
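The first two steps of the list above (running cookiecutter and filling in the project details) can also be scripted. The following is a rough sketch, not a definitive recipe: the helper names are made up for this post, and the template location and the `project_name` variable in the usage note are hypothetical, since the available variables depend entirely on the `cookiecutter.json` of the template you use. It relies on the cookiecutter CLI accepting `--no-input`, `--output-dir`, and extra context given as `key=value` arguments.

```python
import subprocess
from typing import Dict, List, Optional


def build_cookiecutter_command(
    template: str,
    context: Optional[Dict[str, str]] = None,
    output_dir: str = ".",
) -> List[str]:
    """Build the argument list for a non-interactive cookiecutter run."""
    command = ["cookiecutter", "--no-input", "--output-dir", output_dir, template]
    # Template variables are passed as key=value pairs. Which keys are
    # valid depends on the template, so the caller must know them.
    for key, value in (context or {}).items():
        command.append(f"{key}={value}")
    return command


def create_library(
    template: str,
    context: Optional[Dict[str, str]] = None,
    output_dir: str = ".",
) -> None:
    """Generate a new project. Requires the cookiecutter CLI to be installed."""
    subprocess.run(
        build_cookiecutter_command(template, context, output_dir),
        check=True,  # raise if cookiecutter exits with an error
    )
```

With a company template in place, a call such as `create_library("gh:example-org/cookiecutter-python-library", {"project_name": "acme-datetime-utils"})` would generate the new repo skeleton in the current directory, after which the remaining steps are copying in the code and tests and pushing.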
Happy coding and building libraries!