High Utilization in NVIDIA GPU

Problem:

In recent GPUs, you may notice that the GPU reports high utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not yet initialized. By default, the GPU is initialized when a GPU process starts working on it and deinitialized when that process completes.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 Off | 0000:02:00.0 Off | 0 |
| 0% 33C P0 67W / 250W | 55MiB / 11519MiB | 67% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

 

Solution:

We can keep the GPU initialized all the time. This is called “persistence mode”. To enable it:
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99 Driver Version: 352.99 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 On | 0000:02:00.0 Off | 0 |
| 0% 36C P0 66W / 250W | 55MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

However, this setting is reset when the server reboots, so we need to set it in rc.local so that it is applied at startup.
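A minimal sketch of what the rc.local entry could look like, assuming the file lives at /etc/rc.local and is run as root at boot (this differs between distributions). The function name `enable_persistence` is made up for illustration. The `nvidia-smi` command is the same one used above.

```shell
# Fragment for /etc/rc.local (assumed path); place it before any final "exit 0".
# enable_persistence is a hypothetical helper name, not part of any NVIDIA tool.
enable_persistence() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Same command as in the post; rc.local already runs as root, so no sudo.
        nvidia-smi -i 0 -pm 1 || echo "failed to enable persistence mode" >&2
    else
        # Do not break the boot sequence if the driver tools are missing.
        echo "nvidia-smi not found, skipping persistence mode" >&2
    fi
}

enable_persistence
```

After the next reboot, the Persistence-M column in the nvidia-smi output should show On without running the command by hand.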

Disabling the nouveau Kernel Driver

Just a short note about Linux (Fedora 11 and later) so it's easy to find again. I only recently learned that, starting with kernel 2.6.33, the kernel already includes an open graphics driver that is compatible with GeForce, called “nouveau”. This is actually an advantage, because we no longer have to go hunting for a VGA driver. But for some purposes that depend directly on NVIDIA features requiring the official NVIDIA driver, such as CUDA, it becomes a problem: nouveau blocks the driver installation.

Therefore, we have to disable the nouveau kernel driver. We just need to add the following to the “kernel” line in grub.conf (/etc/grub.conf or /boot/grub/grub.conf):

rdblacklist=nouveau

The grub.conf will then look like this:

title Fedora (2.6.31.5-127.fc12.i686.PAE)
	root (hd0,2)
	kernel /vmlinuz-2.6.31.5-127.fc12.i686.PAE ro root=/dev/mapper/vg_satriahost-lv_root  LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet rdblacklist=nouveau
	initrd /initramfs-2.6.31.5-127.fc12.i686.PAE.img

After reboot, you can install the official NVIDIA driver.
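Before running the NVIDIA installer, it may be worth confirming that nouveau really stayed out of the kernel. A small sketch, assuming the usual /proc/modules listing is available. The function name `nouveau_loaded` is made up for this example.

```shell
# Hypothetical helper: succeeds if the nouveau module appears in the module list.
# The file argument defaults to /proc/modules and exists mainly to ease testing.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded; check the kernel line in grub.conf"
else
    echo "nouveau is not loaded"
fi
```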

Source: http://www.h-online.com/open/news/item/Kernel-Log-Linux-2-6-33-to-include-NVIDIA-graphics-driver-nouveau-885001.html

Upgrading the Kernel on Fedora

It all started when my Prof told me to begin focusing on my thesis, so I was forced to start trying Fedora, a Linux derived from Red Hat. Why a Red Hat derivative? According to the Prof, the cluster in the Lab runs Red Hat everywhere, so it would be easier for me to play around on the cluster later. And that just when I had started to feel comfortable with Ubuntu. Oh well, never mind, let's count it as a new lesson.. hehe.

After I installed Fedora 8, it turned out that its kernel was already outdated. Wait a moment, why didn't I use Fedora 9, the latest release? Because it turns out some important applications are not yet compatible with Fedora 9, and I really need those applications, so like it or not I'm staying on Fedora 8.

Back to the point: I had no choice but to upgrade the kernel. So how is it done? That was the first confusion.. hehe.. After digging through Google, I found several references, tried them one by one, and finally found one that is short and to the point. So I'm writing it down here, to make it easy to find if I forget someday.. hehe.. 😀

Steps for upgrading the kernel on Fedora: