Jupyter
Jupyter is a popular cell-based notebook format for keeping calculations tidy.
A jupyter session has two components:
- The server, or kernel, that is responsible for executing your code (sometimes called the back-end)
- The client, which is in essence a website that interfaces with the kernel (sometimes called the front-end)
Running Locally / Getting Started
For now we'll run both server and client on the same machine.
After following the appropriate installation instructions for your OS, you can run the kernel locally with the command
mypc$ jupyter notebook
This should print out
[I 15:39:46.096 NotebookApp] Serving notebooks from local directory: /path/to/my/directory/
[I 15:39:46.096 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 15:39:46.096 NotebookApp] http://localhost:8888/?token=6e65008a92bdea735729e44173b2967cf15513be312e6ba6
and automatically redirect you to the jupyter interface in a web browser.
Note in particular the form of the URL:
http:// | Jupyter talks over HTTP.
localhost:8888 | The local URL that the server is running at, indicating it is localhost (aka 127.0.0.1) running on port 8888. If you accidentally launch more than one kernel, it will redirect to a free port (8889, 8890, ...).
/?token=.... | A randomly generated password for this jupyter session.
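To make the anatomy of the URL concrete, here is a minimal sketch that pulls the same three pieces out of the example URL from the startup log, using plain POSIX parameter expansion. The variable names are illustrative only and are not part of jupyter.

```shell
# Example URL copied from the startup log above
url='http://localhost:8888/?token=6e65008a92bdea735729e44173b2967cf15513be312e6ba6'

scheme=${url%%://*}      # everything before "://"
rest=${url#*://}         # everything after "://"
hostport=${rest%%/*}     # up to the first "/"
host=${hostport%%:*}     # before the ":"
port=${hostport##*:}     # after the ":"
token=${url##*token=}    # everything after "token="

echo "scheme=$scheme host=$host port=$port"
echo "token=$token"
```

The port is the part that matters most later on, since it is the number that has to be forwarded when the kernel runs on another machine.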
Running on a Cluster
For heavy loads, it is desirable to split the two components across two machines: running the kernel on a beefy TCM / CSD3 machine, and sending it commands from a lightweight laptop.
Replace all instances of [crsid] with your crsid (no brackets), and similarly [cluster] with s1, s2, s3... as appropriate.
Setup
This guide assumes that you have set up your ~/.ssh/config file in a manner similar to that suggested on the ssh page, e.g.:
Host [cluster].tcm
User [crsid]
HostName [cluster]
ProxyJump pc51.tcm # change this to your pc
# Optional
ControlMaster auto
ControlPath ~/.ssh/control-[cluster]
# the wildcard matches any machine name, e.g. mypc$ ssh pc51.tcm
Host *.tcm
User [crsid]
HostName %h.phy.cam.ac.uk
# Optional
ControlMaster auto
ControlPath ~/.ssh/control-%h
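If you would rather not edit the placeholders by hand, the substitution can be scripted. The following is a rough sketch using sed; the function name fill_placeholders and the values abc123 and s1 are made up for illustration.

```shell
# fill_placeholders CRSID CLUSTER
# Reads a template on stdin and writes it to stdout with the literal
# strings [crsid] and [cluster] replaced by the two arguments.
fill_placeholders() {
    sed -e "s/\[crsid\]/$1/g" -e "s/\[cluster\]/$2/g"
}

# abc123 and s1 are hypothetical example values
printf 'Host [cluster].tcm\nUser [crsid]\nHostName [cluster]\n' |
    fill_placeholders abc123 s1
```

Review the result before appending it to ~/.ssh/config; the sketch only prints to the terminal and does not touch any file.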
Port Forwarding
To connect your computer's 8888 port to the one listened to on [cluster], it is necessary to port forward. This can be done by either running
mypc$ ssh -L 8888:localhost:8888 [cluster].tcm
or by modifying the [cluster].tcm Host in your ssh config,
Host [cluster].tcm
User [crsid]
HostName [cluster]
ProxyJump pc51.tcm # change this to your pc
# Forward 8888 on the cluster to localhost:8888
LocalForward 8888 localhost:8888
# Optional
ControlMaster auto
ControlPath ~/.ssh/control-[cluster]
and then using ssh as normal,
mypc$ ssh [cluster].tcm
If someone else is using jupyter on [cluster], then port 8888 is unlikely to be free, and jupyter will choose a different port (probably 8889). For this reason it is not always convenient to hard-code 8888 in the configuration file.
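Since the port is not always 8888, one option is to keep it in a variable and build the forwarding command from that. The sketch below only prints the command rather than running it; forward_cmd is a hypothetical helper, and -N is the standard ssh flag that holds the connection open without starting a remote shell.

```shell
# forward_cmd PORT HOST
# Prints the ssh command that forwards local PORT to the same port on HOST.
forward_cmd() {
    printf 'ssh -N -L %s:localhost:%s %s\n' "$1" "$1" "$2"
}

port=8889    # whichever port jupyter actually reported on the cluster
forward_cmd "$port" '[cluster].tcm'
```

Run the printed command in a second terminal on your own machine, and use the same port number in the URL you open in the browser.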
Running the Kernel
Once you are connected to the cluster, starting the kernel is straightforward:
cluster$ cd /scratch/[crsid] # or wherever you want the jupyter directory to be
cluster$ jupyter notebook --no-browser --port=8888
Then, in a web browser on your own machine, manually navigate to the URL that jupyter prints out.
To list running kernels:
cluster$ jupyter notebook list
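When several kernels are running, it can help to reduce that listing to just the port numbers. The sketch below assumes each server appears on its own line in the form "URL :: directory"; the sample text, tokens, and paths are invented for illustration, and list_ports is a hypothetical helper.

```shell
# list_ports
# Reads a server listing on stdin and prints one port number per line.
# Lines that do not start with an http(s) URL are ignored.
list_ports() {
    sed -n 's|^https\{0,1\}://[^:/]*:\([0-9][0-9]*\)/.*|\1|p'
}

# Invented sample in the assumed listing format
printf '%s\n' \
    'Currently running servers:' \
    'http://localhost:8888/?token=abc :: /scratch/user' \
    'http://localhost:8889/?token=def :: /scratch/user' |
    list_ports
```

On the cluster, the real listing would be piped in instead: jupyter notebook list | list_ports.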
Other Languages
Despite its name, jupyter is a language-agnostic interface. It is possible to program in a different language by installing an alternative jupyter kernel, though some caution should be exercised - only the default (IPython) kernel is maintained by the designers. Some popular (read: well maintained) kernels are:
- IPython (the default, does not require installation)
- Julia
- R
- Mathematica/Wolfram Language
- Rust
Troubleshooting
- Do not use the ControlPersist option in your ssh config file. It interferes with port forwarding.
- If more than one person is running a Jupyter kernel on the cluster at once, the jupyter command will give the warning
The port 8888 is already in use, trying another port.
In this case, simply repeat the steps of this guide, replacing 8888 with the port that jupyter reports it is using instead (e.g. 8889).