High Utilization in NVIDIA GPU

Problem:

On recent GPUs, you may notice that nvidia-smi reports high GPU utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not yet initialized. By default, the GPU is initialized when a process starts using it and deinitialized again when that process completes.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 Off | 0000:02:00.0 Off | 0 |
| 0% 33C P0 67W / 250W | 55MiB / 11519MiB | 67% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
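
On a machine with several GPUs, the same state can be spotted by filtering the CSV output of nvidia-smi. The helper below is only a sketch: the function name list_nonpersistent_gpus is made up for this example, and it assumes the input comes from the query shown in the trailing comment, which should print one line per GPU in the form "0, Disabled, 67".

```shell
#!/bin/sh
# list_nonpersistent_gpus: hypothetical helper (not part of nvidia-smi).
# Reads lines of the form "index, persistence_mode, utilization" on stdin
# and prints the GPUs whose persistence mode is reported as "Disabled".
list_nonpersistent_gpus() {
    awk -F', *' '$2 == "Disabled" {
        printf "GPU %s: persistence mode off, reported utilization %s%%\n", $1, $3
    }'
}

# Assumed usage on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,persistence_mode,utilization.gpu \
#              --format=csv,noheader,nounits | list_nonpersistent_gpus
```

The helper only reports what nvidia-smi prints. It does not check whether a process is actually using the GPU, so a non-zero utilization on a busy GPU is expected.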

 

Solution:

We can keep the GPU initialized at all times. This is called “persistence mode”. To enable it:
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 On | 0000:02:00.0 Off | 0 |
| 0% 36C P0 66W / 250W | 55MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

However, this setting is reset when the server reboots, so we need to add the command to rc.local so that it runs at every startup.
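
A minimal sketch of that rc.local change follows. The function name add_persistence_line is made up for this example, and the rc.local path is an assumption that differs between distributions (for example /etc/rc.local or /etc/rc.d/rc.local). The helper inserts the command before a trailing "exit 0" if there is one, because anything placed after that line would never run. rc.local runs as root, so no sudo is needed inside it.

```shell
#!/bin/sh
# add_persistence_line: hypothetical helper that adds the persistence-mode
# command from this post to an rc.local-style file, at most once.
add_persistence_line() {
    rc_file="$1"
    cmd='nvidia-smi -i 0 -pm 1'

    # Nothing to do if the command is already present.
    if [ -f "$rc_file" ] && grep -qF "$cmd" "$rc_file"; then
        return 0
    fi

    tmp_file="$(mktemp)" || return 1
    if [ -f "$rc_file" ] && grep -q '^exit 0' "$rc_file"; then
        # Insert the command just before the first "exit 0" line.
        awk -v cmd="$cmd" '!seen && /^exit 0/ { print cmd; seen = 1 } { print }' \
            "$rc_file" > "$tmp_file"
    else
        # No "exit 0" line: keep the existing content (or start a new
        # script) and append the command at the end.
        if [ -f "$rc_file" ]; then
            cat "$rc_file" > "$tmp_file"
        else
            printf '#!/bin/sh\n' > "$tmp_file"
        fi
        printf '%s\n' "$cmd" >> "$tmp_file"
    fi

    # Write back with cat so an existing file keeps its permissions.
    cat "$tmp_file" > "$rc_file" && rm -f "$tmp_file"
}

# Assumed usage (as root; adjust the path for your distribution):
#   add_persistence_line /etc/rc.local
```

The file still has to be executable for the system to run it at boot, and "-i 0" only covers the first GPU, as in the command above.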

got stuck at “Wait for Plymouth Boot Screen to Quit”

If you can’t get to the login screen (booting gets stuck at “Wait for Plymouth Boot Screen to Quit”) after installing the CUDA driver, it’s probably because the X server is trying to load the xorg.conf created by the NVIDIA driver. I ran into this on my laptop, which has Intel + NVIDIA GPUs and runs CentOS 7.

Workaround Solution: Continue reading

error: “Oh no! Something has gone wrong.”


If the above message suddenly appears on your screen after installing the CUDA driver on RedHat/CentOS/Fedora, don’t panic. It is caused by the xorg-x11-drv-nvidia-gl package, which is one of the cuda-drivers dependencies. I ran into this on my laptop, which has Intel + NVIDIA GPUs. I guess it’s because the Intel GPU is the primary GPU in my laptop, and RedHat/CentOS/Fedora have no official equivalent of the Optimus technology available on Windows.

Workaround Solution: Continue reading

install nvidia driver and CUDA in fedora 10

Installing the nvidia driver on fedora 10 is no different from previous versions, but you really do have to log in as root first.

The only slightly tricky part is installing an nvidia driver that supports CUDA, because the latest CUDA, version 2.1, is still in beta, the SDK is still at 2.0, and it is really meant for fedora 9. But since I’m stubborn, I tried it on fedora 10 anyway, and the result… errors everywhere, heheh… well, not really: the errors actually come from gcc being version 4.3, while the CUDA SDK still expects a gcc version below 4.3, so it fails at compile time… Fortunately, with a few small changes to the SDK files, CUDA runs on fedora 10.. hohoho

cudagl

playing with CUDA 2.0 on fedora 8: installation

Installation process (as root):

1. Download NVIDIA’s CUDA Development Tools, which consist of

  • NVIDIA CUDA Display Driver : NVIDIA-Linux-x86-177.13-pkg1.run
  • CUDA Toolkit : NVIDIA_CUDA_Toolkit_2.0beta2_Fedora8_x86.run
  • CUDA SDK : NVIDIA_CUDA_sdk_2.0beta2_linux.run

from http://www.nvidia.com/object/cuda_get.html

2. Install the NVIDIA CUDA Display Driver by following the same steps as in the previous post (steps 2-5).

3. Install the CUDA Toolkit and the CUDA SDK. For how to install a file in .run format, see Continue reading