High Utilization in NVIDIA GPU

Problem:

On recent GPUs, nvidia-smi may report high GPU utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not yet initialized. By default, the GPU is initialized when a GPU process starts working on it and is deinitialized when that process completes.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 Off | 0000:02:00.0 Off | 0 |
| 0% 33C P0 67W / 250W | 55MiB / 11519MiB | 67% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

 

Solution:

We can keep the GPU initialized at all times. This is called “persistence mode”. To enable it:
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 On | 0000:02:00.0 Off | 0 |
| 0% 36C P0 66W / 250W | 55MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
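To check the same two fields without reading the full table, nvidia-smi can print selected values as CSV. The following is a minimal sketch, assuming your driver's nvidia-smi supports the `--query-gpu` option; the function name `gpu_status` is made up for illustration.

```shell
#!/bin/sh
# Print index, name, persistence mode and utilization for every GPU.
# gpu_status is a hypothetical helper name, not part of any NVIDIA tool.
gpu_status() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,persistence_mode,utilization.gpu \
                   --format=csv,noheader
    else
        echo "nvidia-smi not found" >&2
        return 1
    fi
}

# Do not abort the calling script when no NVIDIA driver is installed.
gpu_status || true
```

Each GPU is printed on its own line, so the output is easy to feed into grep or a monitoring script.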

However, this setting is reset when the server is rebooted, so the command has to be added to rc.local to run at every startup.
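A minimal sketch of what could go into /etc/rc.local, placed before its final `exit 0` line. The path /usr/bin/nvidia-smi and the function name `enable_persistence` are assumptions; check the real location with `which nvidia-smi`. Since rc.local already runs as root, sudo is not needed here.

```shell
# Enable persistence mode at boot (sketch for /etc/rc.local).
# Assumption: nvidia-smi lives at /usr/bin/nvidia-smi unless a path is passed.
enable_persistence() {
    smi="${1:-/usr/bin/nvidia-smi}"
    if [ -x "$smi" ]; then
        # Without -i, -pm 1 applies to every GPU in the system.
        "$smi" -pm 1 || echo "rc.local: could not enable persistence mode" >&2
    else
        echo "rc.local: $smi not found, skipping" >&2
    fi
}

enable_persistence
```

The function always returns success, so a missing driver does not stop the rest of rc.local when the script runs with `sh -e`.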

Installing CUDA 7.5 for Tesla M40 on Ubuntu 14.04.5 LTS

Install Driver

  1. Download the Tesla driver (http://www.nvidia.com/Download/index.aspx?lang=en-us)
  2. Move to runlevel 3
    $ telinit 3
  3. Stop lightdm service
    $ service lightdm stop

  4. Change file mode of the driver package
    $ chmod +x NVIDIA-Linux-x86_64-352.99.run

Continue reading

got stuck at “Wait for Plymouth Boot Screen to Quit”

If booting gets stuck at “Wait for Plymouth Boot Screen to Quit” and you never reach the login page after installing the CUDA driver, the likely cause is that the X server is trying to load the xorg.conf created by the NVIDIA driver. I ran into this on my laptop, which has Intel + NVIDIA GPUs and runs CentOS 7.

Workaround Solution: Continue reading

error: “Oh no! Something has gone wrong.”


If the above message suddenly appears on your screen after installing the CUDA driver on RedHat/CentOS/Fedora, don’t panic. It is caused by the xorg-x11-drv-nvidia-gl package, which is one of the cuda-drivers dependencies. I ran into this on my laptop, which has Intel + NVIDIA GPUs. My guess is that it happens because the Intel GPU is the primary GPU in my laptop, and RedHat/CentOS/Fedora have no official Optimus support like Windows does.

Workaround Solution: Continue reading

CUDA 7.5 and Visual Studio 2015

Sorry, I won’t tell you the solution. Instead, I will show you why you should not expect a solution to the CUDA 7.5 and Visual Studio 2015 integration problem. 😀

If you try to compile a simple kernel code with nvcc and bind it with the VS2015 C++ compiler like this:

> nvcc .\kernel.cu -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\"

then you will get this error: Continue reading

Optimus + CUDA in Fedora 20

Recent laptops mostly come with a hybrid-graphics system (two GPUs in one machine: an integrated GPU and a discrete GPU). It was originally designed to control power consumption in laptops. By default, the operating system uses the integrated GPU, which consumes less power. Only when heavy workloads (gaming, graphics rendering, GPU computing, etc.) are running does the operating system move the work to the discrete GPU.

For laptops with an NVIDIA GPU, NVIDIA Optimus Technology handles automatic switching between the integrated GPU and the discrete GPU. Unfortunately, NVIDIA's support for this technology on Linux is not as good as on Windows. Because the discrete GPU is a secondary card, installing the NVIDIA driver is not easy and may cause problems with the display manager in Linux. Continue reading