High Utilization in NVIDIA GPU

Problem:

On recent GPUs, you may notice that nvidia-smi reports high GPU utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not initialized yet. By default, the GPU is initialized when a GPU process starts working on it and deinitialized when that process completes.

[code language="bash"]
sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           Off  | 0000:02:00.0     Off |                    0 |
|  0%   33C    P0    67W / 250W |     55MiB / 11519MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[/code]

 

Solution:

We can keep the GPU initialized all the time. This is called “persistence mode”. To enable it:
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

[code language="bash"]
sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           On   | 0000:02:00.0     Off |                    0 |
|  0%   36C    P0    66W / 250W |     55MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[/code]

However, this setting is reset when the server reboots. Hence, we need to set it in rc.local so that it is applied at every startup.
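As a rough sketch, the rc.local entry could look like the fragment below. The function name enable_gpu_persistence and the fallback path /usr/bin/nvidia-smi are my own assumptions, not something from the NVIDIA docs; the nvidia-smi flags are the same ones used in the command above. On systems where rc.local ends with "exit 0", the fragment has to go above that line.

```shell
#!/bin/sh
# Sketch of an /etc/rc.local fragment; place it above any final "exit 0".
# Assumption: nvidia-smi is on PATH or at /usr/bin/nvidia-smi at boot time.

enable_gpu_persistence() {
    # GPU index to configure; 0 matches the GPU in the example above.
    gpu_index=${1:-0}
    # Prefer whatever is on PATH, fall back to the assumed default location.
    nvsmi=$(command -v nvidia-smi || echo /usr/bin/nvidia-smi)
    if [ -x "$nvsmi" ]; then
        "$nvsmi" -i "$gpu_index" -pm 1
    else
        echo "nvidia-smi not found; persistence mode not set" >&2
    fi
}

enable_gpu_persistence 0
```

rc.local runs as root, so no sudo is needed here. After the next reboot, nvidia-smi should show Persistence-M as On without any manual step.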

got stuck at “Wait for Plymouth Boot Screen to Quit”

If you can’t get to the login page (booting gets stuck at “Wait for Plymouth Boot Screen to Quit”) after CUDA driver installation, it’s probably because the X server is trying to load the xorg.conf created by the NVIDIA driver. I ran into this on my laptop, which has Intel + NVIDIA GPUs and runs CentOS 7.

Workaround Solution: Read More …

error: “Oh no! Something has gone wrong.”


If the above message suddenly appears on your screen after CUDA driver installation on RedHat/CentOS/Fedora, don’t panic. This happens because of the xorg-x11-drv-nvidia-gl package, which is one of the cuda-drivers dependencies. I ran into this on my laptop, which has Intel + NVIDIA GPUs. I guess it’s because the Intel GPU is the primary GPU in my laptop, and RedHat/CentOS/Fedora have no official equivalent of the Optimus technology found in Windows.

Workaround Solution: Read More …

CUDA 7.5 and Visual Studio 2015

Sorry, I won’t tell you the solution. Instead, I will show you why you should not expect a solution to the CUDA 7.5 and Visual Studio 2015 integration problem. 😀

If you try to compile a simple kernel with nvcc, using the VS2015 C++ compiler as the host compiler like this:

> nvcc .\kernel.cu -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\"

then you will get this error: Read More …

Optimus + CUDA in Fedora 20

Most recent laptops come with a hybrid-graphics system (two GPUs in one machine: an integrated GPU and a discrete GPU). It was first designed to control power consumption in laptops. By default, the operating system uses the integrated GPU, which consumes less power. Only when heavy workloads (gaming, graphics rendering, GPU computing, etc.) are running does the operating system move the work to the discrete GPU.

For laptops with an NVIDIA GPU, NVIDIA Optimus Technology switches automatically between the integrated GPU and the discrete GPU. Unfortunately, NVIDIA's support for this technology in Linux is not as good as in Windows. Since the discrete GPU is a secondary card, installing the NVIDIA driver is not easy and may cause problems with the display manager in Linux. Read More …

“workspace in use or cannot be created” in eclipse or nvidia nsight

– Remove the .lock file in the workspace’s metadata folder.
rm {YourWorkspaceDir}/.metadata/.lock

– Find the file that contains the RECENT_WORKSPACES attribute.
cd ~/.eclipse
grep -r "RECENT_WORKSPACES" *

– Once you have found the file that contains the RECENT_WORKSPACES attribute, edit it and remove the RECENT_WORKSPACES line.
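The last two steps can also be scripted. Below is a minimal sketch, not something that ships with Eclipse or Nsight: the function name clear_recent_workspaces is made up for illustration, and it assumes the attribute sits on a single line, as it did in my settings file. It keeps a .bak copy of every file it touches.

```shell
#!/bin/sh
# Hypothetical helper (not part of Eclipse): delete every RECENT_WORKSPACES
# line under a configuration directory, keeping a .bak copy of each file.

clear_recent_workspaces() {
    config_dir=$1
    # List the files that mention the attribute, then edit them one by one.
    grep -rl "RECENT_WORKSPACES" "$config_dir" 2>/dev/null |
    while IFS= read -r file; do
        case $file in
            *.bak) continue ;;   # skip backups left by an earlier run
        esac
        sed -i.bak '/RECENT_WORKSPACES/d' "$file"
        echo "cleaned $file (backup at $file.bak)"
    done
}

# Example call, assuming the settings live under ~/.eclipse as above:
#   clear_recent_workspaces "$HOME/.eclipse"
```

Close Eclipse or Nsight before running it, otherwise the file may be rewritten when the IDE exits.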

installing cuda 5 on ubuntu 12.04

  1. Download CUDA 5 installers from https://developer.nvidia.com/cuda-downloads.
  2. Make the run file executable.
    • $ chmod +x cuda_5.*.run
  3. Switch to a text console with Ctrl-Alt-F1, log in, and run
    • $ sudo service lightdm stop
    • $ sudo ./cuda_5.*.run
    • $ sudo shutdown -r now

If you fail to get back to the login GUI, run this:

  • $ sudo apt-get purge 'nvidia*'
  • $ sudo apt-get install nvidia-current-updates-dev

errors in cuda 4.0

I’m just trying CUDA 4.0 on my MacBook. And, as usual, I’m too lazy to read the “what’s new” page or the docs. I just went straight to testing it until I got stuck on the errors.. 😀

Here, I’ll list the errors I ran into, hopefully along with their solutions.. 😛

First, I tried to compile a simple old program. It compiled fine with CUDA 3.2, but when I compiled it with CUDA 4.0, these errors came out..

cudaLK.o:1203:53: warning: null character(s) ignored
cudaLK.o:1203:101: warning: null character(s) ignored
cudaLK.o:1203:112: warning: null character(s) preserved in literal
cudaLK.o:1203:112: warning: missing terminating ' character
cudaLK.o(1): error: unrecognized token

whoaa…… this forced me to read the programming guide, again..

Finally, when I reached the nvcc section, I found a clue… It looks like NVIDIA changed the format of the nvcc command.. In earlier versions, a simple nvcc command was

nvcc -c <cudacodefile>.cu -o <cudacodefile>.o

But with CUDA 4.0, you don’t need to name the object file. When -o is omitted, nvcc names the object file after the .cu file. So, it should be simply..

nvcc -c <cudacodefile>.cu

to be continued later..

portable version of GPU Caps Viewer

GPU Caps Viewer is a tool for viewing your graphics card information, with a focus on OpenGL, OpenCL, and CUDA API-level support. It’s developed by Geeks3D and unfortunately only supports 32-bit Windows. (No Linux there 🙁)

The latest version (1.9.4 released on 2010.11.05) can be downloaded here. Read More …