Job Resource Utilization#
Making efficient use of the HPC resources you request is important. A job rarely uses everything it asks for, so avoid requesting far more than you need. Reviewing your completed jobs helps you make better decisions about how many resources to request for future jobs.
Ways to view job efficiency after completion:

- Slurm job completion emails
- Open OnDemand Active Jobs dashboard
- Slurm seff command

Ways to view job efficiency during execution:

- top command
- nvidia-smi command
Completed Jobs#
Slurm job completion emails#
If you run jobs through Open OnDemand, or request Slurm email notifications on job completion, these messages include Job Efficiency Metrics. An email attachment provides additional details.
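Completion emails can be requested directly in a batch script with the standard Slurm mail directives. A minimal sketch (the job name, resources, mail address, and workload below are placeholders, not site-specific values):

```shell
#!/bin/bash
#SBATCH --job-name=example          # hypothetical job name
#SBATCH --time=00:30:00             # placeholder time limit
#SBATCH --mem=2G                    # placeholder memory request
#SBATCH --mail-type=END,FAIL        # email when the job ends or fails
#SBATCH --mail-user=you@example.edu # placeholder address

srun ./my_program                   # placeholder workload
```

The completion email for a job submitted this way carries the efficiency metrics described above.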
Active Jobs Dashboard#
The Open OnDemand server provides an “Active Jobs” dashboard that shows your job history. It can be accessed under the Jobs -> Active Jobs menu item. Clicking a completed job in the Job History section shows additional details about it. Completed jobs include efficiency data under the Resources and I/O tab.
seff#
You can check your completed jobs using the Slurm seff command. Completed jobs are those that no longer show as running or pending in the queue. The command can be run on any HPC node and requires your Slurm job ID.
Display job CPU and memory usage:
$ seff JOBID
[tutln01@login-p01 ~]$ seff 296794
Job ID: 296794
Cluster: pax
User/Group: /tutln01
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:22:12 core-walltime
Job Wall-clock time: 00:11:06
Memory Utilized: 1.16 MB (estimated maximum)
Memory Efficiency: 0.06% of 2.00 GB (2.00 GB/node)
Display job detailed accounting data:
$ sacct --format=partition,state,time,start,end,elapsed,MaxRss,ReqMem,MaxVMSize,nnodes,ncpus,nodelist -j JOBID
[tutln01@login-prod-01 ~]$ sacct --format=partition,state,time,start,end,elapsed,MaxRss,ReqMem,MaxVMSize,nnodes,ncpus,nodelist -j 296794
Partition State Timelimit Start End Elapsed MaxRSS ReqMem MaxVMSize NNodes NCPUS NodeList
---------- ---------- ---------- ------------------- ------------------- ---------- ---------- ---------- ---------- -------- ---------- ---------------
preempt COMPLETED 1-02:30:00 2021-03-22T22:18:55 2021-03-22T22:30:01 00:11:06 2Gn 1 2 cc1gpu001
OUT_OF_ME+ 2021-03-22T22:18:55 2021-03-22T22:30:01 00:11:06 8K 2Gn 135100K 1 2 cc1gpu001
COMPLETED 2021-03-22T22:18:56 2021-03-22T22:30:01 00:11:05 592K 2Gn 351672K 1 2 cc1gpu001
NOTE: more format options are available; see man sacct.
Hint
If you don’t know your job IDs or can’t find them, use hpctools - Tufts HPC Helper Tool on the HPC cluster to find all of your jobs since a given date.
Running Jobs#
Sometimes it is necessary to review resource utilization while a job is running. This is typically done by connecting via SSH to the node your job is running on and running additional commands there.
top#
The top command lists the processes running on the compute node along with the CPU and memory each one is using. To see only your own processes, run top -u $USER.
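top can also be run non-interactively for a quick one-off snapshot, which is convenient when checking a node over SSH. This is a sketch using standard procps top flags; other top implementations may differ slightly:

```shell
# -b: batch (non-interactive) mode
# -n 1: print one iteration and exit
# -u "$USER": show only your own processes
top -b -n 1 -u "$USER" | head -n 20
```

The first few lines show overall node load and memory; the table below them lists per-process CPU and memory usage.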
nvidia-smi#
When using GPU nodes, the nvidia-smi command is useful for spot-checking GPU compute and memory usage. You will only see the GPUs allocated to you.
[utln@pax00 ~]$ nvidia-smi
Wed Feb 18 22:52:34 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08 Driver Version: 575.57.08 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H200 On | 00000000:19:00.0 Off | 0 |
| N/A 44C P0 263W / 700W | 18193MiB / 143771MiB | 54% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H200 On | 00000000:3B:00.0 Off | 0 |
| N/A 35C P0 78W / 700W | 0MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H200 On | 00000000:4C:00.0 Off | 0 |
| N/A 43C P0 252W / 700W | 18711MiB / 143771MiB | 53% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H200 On | 00000000:5D:00.0 Off | 0 |
| N/A 45C P0 257W / 700W | 18711MiB / 143771MiB | 53% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H200 On | 00000000:9B:00.0 Off | 0 |
| N/A 40C P0 237W / 700W | 18711MiB / 143771MiB | 47% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H200 On | 00000000:BB:00.0 Off | 0 |
| N/A 34C P0 79W / 700W | 0MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H200 On | 00000000:CB:00.0 Off | 0 |
| N/A 42C P0 241W / 700W | 18711MiB / 143771MiB | 48% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H200 On | 00000000:DB:00.0 Off | 0 |
| N/A 34C P0 80W / 700W | 0MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1213801 C python 18184MiB |
| 2 N/A N/A 1213800 C python 18702MiB |
| 3 N/A N/A 1213803 C python 18702MiB |
| 4 N/A N/A 1059301 C python 18702MiB |
| 6 N/A N/A 1213802 C python 18702MiB |
+-----------------------------------------------------------------------------------------+
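For a more compact view, nvidia-smi can emit selected fields as CSV, and the report can be refreshed periodically with watch. A sketch using standard nvidia-smi query properties (this must of course be run on a node with NVIDIA GPUs allocated to you):

```shell
# One CSV line per GPU: index, name, utilization, and memory use
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
           --format=csv

# Refresh the full nvidia-smi report every 2 seconds (Ctrl-C to stop)
watch -n 2 nvidia-smi
```

Low GPU utilization or large amounts of unused GPU memory, as on GPUs 1, 5, and 7 in the output above, suggests the job could request fewer GPUs.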