How to Configure Mellanox Switch

Required Tools:

  1. Serial console cable (provided inside the box)
  2. Converter cable from serial port to USB port (Prolific)

Step-by-step guide using macOS:

  1. Download and install the driver for the converter cable: http://plugable.com/drivers/prolific
  2. Check that the driver is installed correctly
     $ kextstat | grep prolific
     159 0 0xffffff7f832fa000 0x6000 0x6000 com.prolific.driver.PL2303 (1.6.0) F6A6805D-685D-3E6D-BF81-106EBBC0A386

     $ ioreg -c IOSerialBSDClient | grep usb
     | | "IOTTYBaseName" = "usbserial"
     | | "IOCalloutDevice" = "/dev/cu.usbserial"
     | | "IODialinDevice" = "/dev/tty.usbserial"
     | | "IOTTYDevice" = "usbserial"
  3. Start the connection
     $ screen /dev/cu.usbserial
  4. Press Enter and follow the instructions in the official user guide.
     For example:
Mellanox Switch

Mellanox configuration wizard
 Do you want to use the wizard for initial configuration? y
 Step 1: Hostname? [switch-56d680] switch-10g
 Step 2: Use DHCP on mgmt0 interface? [yes] no
 Step 3: Use zeroconf on mgmt0 interface? [no] no
 Step 4: Primary IPv4 address and masklen? [0.0.0.0/0] 172.21.35.60/23
 Step 5: Default gateway? 172.21.35.254
 Step 6: Primary DNS server? 155.69.3.8,155.69.3.7
 % Value must be an IPv4 address in the format of '192.168.0.1'.
 Step 6: Primary DNS server? 155.69.3.8
 Step 7: Domain name?
 Step 8: Enable IPv6? [yes] yes
 Step 9: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface? [no] no
 Step 10: Enable DHCPv6 on mgmt0 interface? [no] no
 Step 11: Admin password (Enter to leave unchanged)?
 Step 11: Confirm admin password?
 You have entered the following information:
 1. Hostname: switch-10g
 2. Use DHCP on mgmt0 interface: no
 3. Use zeroconf on mgmt0 interface: no
 4. Primary IPv4 address and masklen: 172.21.35.60/23
 5. Default gateway: 172.21.35.254
 6. Primary DNS server: 155.69.3.8
 7. Domain name:
 8. Enable IPv6: yes
 9. Enable IPv6 autoconfig (SLAAC) on mgmt0 interface: no
 10. Enable DHCPv6 on mgmt0 interface: no
 11. Admin password (Enter to leave unchanged): (CHANGED)
 To change an answer, enter the step number to return to.
 Otherwise hit <enter> to save changes and exit.
 Choice:
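
Steps 2 and 3 of the guide can be combined into a small helper that looks up the serial device before opening the console. This is only a sketch: the function name find_usb_serial is made up for the example, the device may appear with a suffix after cu.usbserial depending on the driver, and the 115200 baud rate in the usage comment is an assumption that should be checked against the switch user guide.

```shell
# Print the first USB serial call-out device found in the given directory
# (default: /dev). Returns non-zero if no such device exists.
find_usb_serial() {
    dir="${1:-/dev}"
    for dev in "$dir"/cu.usbserial*; do
        # If the glob matched nothing, the pattern stays literal and -e fails.
        if [ -e "$dev" ]; then
            printf '%s\n' "$dev"
            return 0
        fi
    done
    return 1
}

# Usage (baud rate is an assumption; confirm it in the user guide):
#   screen "$(find_usb_serial)" 115200
```

If the function prints nothing, the driver is probably not loaded or the cable is not plugged in, and the checks in step 2 are the place to look.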

High Utilization in NVIDIA GPU

Problem:

On recent GPUs, you may notice high GPU utilization even though no process is running. According to the explanation at http://docs.nvidia.com/deploy/driver-persistence/, this happens because the kernel module is loaded but the GPU is not yet initialized. By default, the GPU is initialized when a GPU process starts working on it and deinitialized when that process completes.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:01:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           Off  | 0000:02:00.0     Off |                    0 |
|  0%   33C    P0    67W / 250W |     55MiB / 11519MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
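
On a machine with several GPUs, reading the Persistence-M column by eye gets tedious. As a small sketch, the query mode of nvidia-smi can list the state per GPU, and a filter can pick out the ones where it is off. The helper name gpus_without_persistence is an assumption made for this example.

```shell
# Read "index, persistence_mode" CSV lines on stdin and print the index
# of every GPU whose persistence mode is reported as Disabled.
gpus_without_persistence() {
    awk -F', *' '$2 == "Disabled" { print $1 }'
}

# Usage on a host with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#       | gpus_without_persistence
```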

 

Solution:

We can keep the GPU initialized at all times. This is called “persistence mode”. To enable it:
sysadmin@sap-dl:~$ sudo nvidia-smi -i 0 -pm 1
[sudo] password for sysadmin:
Enabled persistence mode for GPU 0000:02:00.0.
All done.

sysadmin@sap-dl:~$ nvidia-smi
Tue Sep 27 19:02:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           On   | 0000:02:00.0     Off |                    0 |
|  0%   36C    P0    66W / 250W |     55MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, this setting is reset when the server reboots. Hence, we need to set it in rc.local so that it is applied again at startup.
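
A fragment along these lines could go into rc.local. It is a sketch under a few assumptions: the function name enable_persistence_mode and the NVIDIA_SMI override variable are invented for the example, nvidia-smi is assumed to be on the PATH that rc.local sees at boot (otherwise set NVIDIA_SMI to its full path), and leaving out -i makes the setting apply to all GPUs rather than only GPU 0.

```shell
# Enable persistence mode on all GPUs, skipping quietly if the tool is missing
# so that a driver problem does not abort the rest of rc.local.
enable_persistence_mode() {
    # NVIDIA_SMI is a hypothetical override, useful when PATH is minimal at boot.
    smi="${NVIDIA_SMI:-nvidia-smi}"
    if command -v "$smi" >/dev/null 2>&1; then
        "$smi" -pm 1
    else
        echo "nvidia-smi not found; persistence mode not changed" >&2
    fi
}

enable_persistence_mode
```

rc.local runs as root, so no sudo is needed there. The NVIDIA page linked above also describes a persistence daemon as an alternative to setting the mode with nvidia-smi.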

Limiting CPU Usage of a Process in CentOS/RHEL 7

In HPC, we may need to protect the head node from unnecessarily heavy processes that can cause login problems for users. One solution is cpulimit. We can create a cron job that monitors all processes and sets a CPU limit for them. This is how I usually do it on CentOS/RHEL 7.x.

  1. Install the cpulimit package from the EPEL repo.

yum install cpulimit

  2. Create a script to monitor the processes. The script below is a modified version of the script in this forum. You can modify the values of the first three variables: CPU_LIMIT, BLACK_PROCESSES_LIST, and WHITE_PROCESSES_LIST.
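
As a rough illustration of the monitoring idea (a sketch, not the script referred to above), the core of such a monitor might look like this. The values of CPU_LIMIT and WHITE_PROCESSES_LIST and the helper names select_offenders and limit_offenders are assumptions made for the example, and BLACK_PROCESSES_LIST is left out for brevity.

```shell
CPU_LIMIT=20                          # max %CPU per process (hypothetical value)
WHITE_PROCESSES_LIST="sshd|systemd"   # commands never to limit (hypothetical value)

# Read "PID %CPU COMMAND" lines on stdin and print the PIDs that exceed
# CPU_LIMIT and whose command name is not on the whitelist.
select_offenders() {
    awk -v limit="$CPU_LIMIT" -v white="^(${WHITE_PROCESSES_LIST})\$" \
        '$2 + 0 > limit + 0 && $3 !~ white { print $1 }'
}

# Attach cpulimit to every offending process. The -z flag makes cpulimit
# exit once its target process is gone.
limit_offenders() {
    ps -eo pid=,pcpu=,comm= | select_offenders | while read -r pid; do
        cpulimit -p "$pid" -l "$CPU_LIMIT" -z &
    done
}

# Usage, for example from a cron job run as root:
#   limit_offenders
```

A production script would also need to skip PIDs that already have a cpulimit attached, since otherwise every cron run starts another limiter for the same process.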
