High Utilization on NVIDIA GPUs

Problem:

On recent GPUs, you may notice that nvidia-smi reports high GPU utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not yet initialized. By default, the driver initializes the GPU when a process starts using it and deinitializes it when the last such process exits.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           Off  | 0000:02:00.0     Off |                    0 |
|  0%   33C    P0    67W / 250W |     55MiB / 11519MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
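To check for this state without reading the whole table, nvidia-smi can print just the relevant fields through its query interface. The helper below is a minimal sketch, not part of the original setup: the function name flag_suspect_gpus is hypothetical, and it assumes the CSV rows produced by nvidia-smi --query-gpu=index,persistence_mode,utilization.gpu --format=csv,noheader,nounits. It prints every GPU that reports non-zero utilization while persistence mode is disabled.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reads "index, persistence_mode, utilization.gpu" CSV rows on stdin and prints
# GPUs that show non-zero utilization while persistence mode is "Disabled".
flag_suspect_gpus() {
    # IFS=', ' splits "0, Disabled, 67" into idx=0, mode=Disabled, util=67.
    # The "|| [ -n "$idx" ]" keeps a final line that lacks a trailing newline.
    while IFS=', ' read -r idx mode util || [ -n "$idx" ]; do
        # Keep only the leading digits: drops a " %" unit if nounits was not
        # used, and turns a non-numeric value such as "[N/A]" into "".
        util=${util%%[!0-9]*}
        if [ "$mode" = "Disabled" ] && [ "${util:-0}" -gt 0 ]; then
            echo "GPU $idx: ${util}% utilization with persistence mode disabled"
        fi
    done
}
```

On a machine with the NVIDIA driver installed, it could be used as: nvidia-smi --query-gpu=index,persistence_mode,utilization.gpu --format=csv,noheader,nounits | flag_suspect_gpus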


Solution:

We can keep the GPU initialized all the time. This is called “persistence mode”. To enable it on GPU 0 (-i 0 selects the GPU by index, and -pm 1 turns persistence mode on):
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           On   | 0000:02:00.0     Off |                    0 |
|  0%   36C    P0    66W / 250W |     55MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, this setting is lost when the server reboots, so we need to add the command to rc.local so that it runs at startup.
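A minimal /etc/rc.local sketch, assuming a distribution that still runs rc.local at boot (on many systemd-based distributions it is only run if the file exists and is executable). Omitting -i applies -pm 1 to every GPU in the machine instead of only GPU 0. The NVIDIA page linked above also describes a persistence daemon, nvidia-persistenced, which may be the better choice on newer drivers.

```shell
#!/bin/sh
#
# /etc/rc.local (sketch): executed as root near the end of the boot sequence.
# Assumes nvidia-smi is on root's PATH at this point; use the full path if not.

# Re-enable persistence mode after every reboot. Without -i, nvidia-smi
# applies the setting to all GPUs in the system.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -pm 1 || echo "rc.local: could not enable GPU persistence mode" >&2
fi

exit 0
```

After the next reboot, running nvidia-smi should show Persistence-M as On without any manual step.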
