# Resource management
"Develop small, run big"
## Basic rules to follow
- if you are using a helpful microservice, deploy it as a separate pod and share it with the team, e.g. Carla or SUCCESS6G services
- do EDA (Exploratory Data Analysis), develop, debug, and troubleshoot code on a small JupyterHub pod or on your local machine
- run scripts on a larger pod
- if absolutely necessary, run the code on a standalone server
- if ultra-super-duper-absolutely necessary, develop a solution using SLURM or SLURM on Kubernetes
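Before deciding whether your pod is "small enough", it helps to know what limits it actually has. A minimal sketch using only the Python standard library; the cgroup paths below are the standard kernel interfaces, but whether your cluster exposes cgroup v1 or v2 depends on its configuration:

```python
# Read the pod's memory limit from cgroup v2, falling back to cgroup v1.
# Returns the limit in bytes, or None if no limit is set / discoverable.
from pathlib import Path

def pod_memory_limit_bytes():
    candidates = (
        "/sys/fs/cgroup/memory.max",                      # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",    # cgroup v1
    )
    for path in candidates:
        f = Path(path)
        if f.exists():
            raw = f.read_text().strip()
            # cgroup v2 reports "max" when no limit is configured
            return None if raw == "max" else int(raw)
    return None

print(pod_memory_limit_bytes())
```

Run this inside your JupyterHub pod to see the ceiling you are working under before requesting more.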
## Step-by-step
Please follow these simple rules to avoid misuse of the SAI Research Unit computational resources:

1. Spawn a small pod that is sufficient to process the data you are using
    - i.e. depending on the dataset size, make a best guess and scale up later if needed
2. Profile your code and pod resource usage
3. Spawn a larger pod if needed
4. Estimate the time and requirements, then request a standalone server by putting a note into the shared calendar in Teams - MANDATORY
    - make an ssh message - OPTIONAL (e.g. when you are chasing a deadline and can't afford any mishaps)
5. [EXTREME] - develop a solution using multiple machines with SLURM, SLURM on Kubernetes, or another solution, and consult it with the team
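Step 2 above (profiling before scaling up) can be sketched with nothing but the standard library; `process_big_data` is a hypothetical stand-in for your own workload:

```python
# Measure peak Python heap usage of a workload with tracemalloc, so you
# can make an informed pod-size request instead of guessing.
import tracemalloc

def process_big_data():
    # placeholder workload; replace with your actual processing step
    return [i * i for i in range(100_000)]

tracemalloc.start()
result = process_big_data()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak traced memory: {peak / 1024 / 1024:.1f} MiB")
```

Note that `tracemalloc` only sees allocations made through Python; native extensions (NumPy buffers, CUDA memory) need their own tooling, so treat this as a lower bound.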
## GPU inside JupyterHub pod
To get the GPU running in a pod, install a proper driver:
```shell
sudo apt-get update
sudo apt install nvidia-utils-535
```
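After installing the driver package you can verify that the pod actually sees it. A minimal sketch using only the Python standard library; it simply checks whether `nvidia-smi` (shipped with the driver utilities) is present and runs cleanly:

```python
# Check whether the NVIDIA driver is usable inside this pod by
# locating and invoking nvidia-smi.
import shutil
import subprocess

def gpu_driver_available() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0

print(gpu_driver_available())
```

If this prints `False` after installation, restart your pod or double-check that it was spawned with a GPU attached.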
For code profiling, please see code_profiling.md.