Python in TCM

Python is a remarkably useful programming language, and also a remarkably frustrating one.

One set of frustrations arises from the language itself. It is easy to write code in python. But it is also easy to write very slow, inefficient code accidentally, and it is easy to write code which almost no-one else will ever understand. And python's use of a Global Interpreter Lock makes writing threaded code unusually hard. Of course, other languages are imperfect too.
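As a small illustration of how easily slow code is written by accident, the sketch below counts the items common to two lists in two ways. The function names are invented for this example. Both functions give the same answer, but the first scans a list for every membership test, whereas the second builds a set once.

```python
import timeit


def count_common_slow(a, b):
    """Count items of a that also occur in b; each 'in' scans the whole list b."""
    return sum(1 for x in a if x in b)


def count_common_fast(a, b):
    """Same result, but membership tests against a set take constant time on average."""
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


if __name__ == "__main__":
    a = list(range(0, 4000, 2))
    b = list(range(0, 4000, 3))
    # The two versions must agree before their speeds are worth comparing.
    assert count_common_slow(a, b) == count_common_fast(a, b)
    for func in (count_common_slow, count_common_fast):
        seconds = timeit.timeit(lambda: func(a, b), number=5)
        print(f"{func.__name__}: {seconds:.4f} s for 5 calls")
```

The timings will vary from machine to machine, but the gap widens rapidly as the lists grow.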

The other set comes from the reliance of most python scripts on non-standard modules. The core python language contains a number of standard modules in its "standard library": math, random, statistics, urllib, datetime, re, zlib, os, sys, shutil and glob to name a few. But most scripts rely on modules outside of this core collection.

This means that a python script will run only in an environment which can provide all of the extra modules that it needs, and which can provide sufficiently-recent versions of those modules. Given that many python modules evolve rapidly, both gaining new features and removing old features, this can cause difficulties. Often one needs access to an old version of a module for a particular script, whereas a different script requires a newer version. And whilst sensible people try to avoid dependencies in their scripts which might cause trouble, one can find oneself collaborating with people who are not sensible.
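A script can at least fail politely when its environment is unsuitable. The following is a minimal sketch, assuming python 3.8 or later (which provides importlib.metadata). The helper names are invented here, and the version comparison is deliberately crude: it looks only at the leading numeric parts of a version string. Note that importlib.metadata wants the distribution name, which may differ from the import name (scikit-learn rather than sklearn, for instance).

```python
from importlib import metadata


def version_tuple(version):
    """Turn '1.17.4' into (1, 17, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_older(found, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_requirements(minimum_versions):
    """Return a list of problems found; the list is empty if all is well."""
    problems = []
    for name, minimum in minimum_versions.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if is_older(found, minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


if __name__ == "__main__":
    # The minimum versions here are arbitrary examples, not recommendations.
    for problem in check_requirements({"numpy": "1.17", "scipy": "1.3"}):
        print(problem)
```

This does nothing about the opposite problem, a module which has become too new and has dropped a feature the script relies on.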

TCM provides whatever python is provided by the Linux distribution it is using. To this it adds a modest number of modules as provided by that distribution. Currently the list includes ase, matplotlib, networkx, numba, numpy, pandas, scipy, h5py and sklearn. In total we currently have over 160 python3 packages installed on our Ubuntu 20.04 machines, which is almost exactly 5% of the number which Ubuntu offers!
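To see which of these modules a given python can find, something like the following sketch may help. It uses import names (so sklearn, not scikit-learn), and it only reports whether each module can be located, without importing it. The function name is invented for this example.

```python
from importlib import util

# The modules named above, by import name.
TCM_MODULES = ["ase", "matplotlib", "networkx", "numba", "numpy",
               "pandas", "scipy", "h5py", "sklearn"]


def available(module_names):
    """Map each top-level module name to True if python can locate it."""
    return {name: util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    for name, found in available(TCM_MODULES).items():
        print(f"{name:12s} {'found' if found else 'missing'}")
```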

This is sufficient to run many, many python scripts, but what are the alternatives if it is not?

Ask for an additional package to be installed

If Ubuntu supplies a suitable package, and its dependencies do not conflict with anything else we have installed (which mostly means that it does not require MPI), then ask [email protected], and it might well be installed on all machines.

Install a personal copy of a package using pip

See our pip page.

Using anaconda / miniconda

These maintain python installations that are completely independent of any already provided by the OS. They are large. An initial install of miniconda is about 300MB and 22,000 files, and of anaconda about 3GB and 160,000 files. Both are capable of growing significantly with use. So they are best installed to local /scratch disks, and certainly not to one's home directory. (Anything with a large number of small files will be slow on a remote directory, and the condas do not need to enter our backup system. Operations such as conda create can be ten times faster on a local disk, and conda remove forty times faster or more!)
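To keep an eye on how far an installation has grown, one can count its files and add up their sizes. The sketch below is one way of doing so in python, with an invented function name. It counts every file name, but adds the size of hard-linked files once only. It sums apparent file sizes rather than disk blocks, so it will not agree exactly with du.

```python
import os
import sys


def tree_usage(top):
    """Return (number_of_file_names, total_bytes) below top, without following symlinks."""
    n_files = 0
    n_bytes = 0
    seen = set()
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            n_files += 1
            inode = (st.st_dev, st.st_ino)
            if inode not in seen:
                # Several hard links to one file occupy disk space only once.
                seen.add(inode)
                n_bytes += st.st_size
    return n_files, n_bytes


if __name__ == "__main__":
    # Defaults to the current directory if no path is given.
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    files, size = tree_usage(top)
    print(f"{top}: {files} files, {size / 1e6:.1f} MB")
```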

Given that they are complete python installations, one need not choose the download which matches the version of python supplied by the OS. Here version 3.9 from the miniconda download page is used.

pc00:/scratch/spqr1$ wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
pc00:/scratch/spqr1$ bash Miniconda3-py39_4.9.2-Linux-x86_64.sh -p /scratch/${USER}/miniconda3

and accept the default for every question save the last, "Do you wish the installer to initialize Miniconda3 by running conda init?", to which one should probably answer "yes". This will update one's ~/.bashrc file to make conda available every time one logs in. One will notice this as the prompt will change to include the current conda environment.

Of course the above will install conda on a single PC only. If one wishes to copy an installation to another PC, then first check that the target has sufficient space:

pc00:/scratch/spqr1$ du -sh miniconda3
305M	miniconda3
pc00:/scratch/spqr1$ ssh pc99 df -h /scratch
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6       1.6T  1.3T  250G  84% /scratch
pc00:/scratch/spqr1$ rsync -aAH --delete miniconda3/ pc99:/scratch/${USER}/miniconda3/

The use of rsync will be familiar to most for synchronising two directories whilst minimising data transfer, and is commonly used for backing up between computers, especially laptops. Those too old to know about rsync might prefer

pc00:/scratch/spqr1$ tar -cf - miniconda3 | ssh pc99 tar -C /scratch/${USER} -xf -

and others might try

pc00:/scratch/spqr1$ scp -r miniconda3 pc99:/scratch/${USER}/

Note that neither tar nor scp will delete files which appear on the destination but not the source, whereas rsync will. Also scp will not preserve hard links, and as conda uses them extensively, scp really cannot be recommended here.
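The point about hard links can be demonstrated in a few lines. In this sketch, with invented function names, a file is given a second name with os.link and is also copied with shutil.copy2. The link shares an inode with the original and so costs no extra space. The copy is an independent file, which is the situation a copying tool that ignores hard links leaves behind.

```python
import os
import shutil
import tempfile


def same_file_on_disk(path_a, path_b):
    """True if the two names are hard links to one and the same inode."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    """Return (link shares inode, copy shares inode, link count of the original)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        linked = os.path.join(tmp, "linked")
        copied = os.path.join(tmp, "copied")
        with open(original, "w") as f:
            f.write("x" * 1000)
        os.link(original, linked)       # a second name for the same data
        shutil.copy2(original, copied)  # an independent copy of the data
        return (same_file_on_disk(original, linked),
                same_file_on_disk(original, copied),
                os.stat(original).st_nlink)


if __name__ == "__main__":
    print(demo())
```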

If one manages to keep one's miniconda environment small, and one frequently uses many different computers, and one is not worried by the performance penalty on some operations, then it might be reasonable to consider installing it to /rscratch instead, so that it is trivially available on all PCs. But there were many ifs prefixing the above.

Using venv

Python3 includes the module venv for creating virtual environments, and, at first glance, it is quite attractive. An empty virtual environment is about 8MB and 600 files, so a fraction of the size of miniconda. But it is far from ideal. It achieves this small size by simply linking to the python binary supplied by one's OS. This is fine, until the OS on one's computer is upgraded (or simply the version of python), at which point everything is likely to stop working if either of the first two parts of python's version number changes. And the small size is also achieved by installing no packages at all. If packages are requested, using pip, they are always installed afresh, and not linked to copies already existing, so the size of the virtual environment quickly grows.
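The linking can be seen directly. The sketch below, assuming a Linux system (hence the bin directory), uses the venv module from within python to create an environment without pip, and then reports where its python points. The function names are invented for this example. Requesting symlinks explicitly mirrors what the command-line tool usually does on Linux.

```python
import os
import sys
import tempfile
import venv


def make_bare_venv(env_dir):
    """Create a virtual environment without pip; return the path of its python."""
    venv.create(env_dir, with_pip=False, symlinks=True)
    return os.path.join(env_dir, "bin", "python")


def in_virtual_env():
    """True if this interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_bare_venv(os.path.join(tmp, "env"))
        print("is a symlink:", os.path.islink(python))
        print("resolves to: ", os.path.realpath(python))
    print("currently in a virtual environment:", in_virtual_env())
```

If the binary to which the link resolves is replaced by a different version of python, the environment is left pointing at something it was not built for.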

If only for the quality of the illustrations, I should mention this Guide to Python's Virtual Environments (best viewed in a private window unless one is a member of Medium). There is also a Primer at realpython.com.