EGO issue in IBM SCF CE

Problem:

[root@pcmce-co68 ~]# egosh resource list
Cannot get host info. Not logged on.

Solution:

Log on to EGO with egosh (this only needs to be done once). The default user name and password are both Admin.
[root@pcmce-co68 ~]# egosh user logon
user account: Admin
password:
Logged on successfully
[root@pcmce-co68 ~]# egosh resource list
NAME      status  mem   swp    tmp  ut  it   pg   r1m  r15s  r15m  ls
pcmce-c*  ok      827M  1516M  69G  7%  258  3.1  0.2  3.3   0.7   1
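
The check and logon above can be wrapped in a small helper. This is only a sketch: the function names are made up for illustration, and it assumes the "Not logged on" message may appear on either stdout or stderr.

```shell
# Succeeds (exit status 0) when the text on stdin contains the
# "Not logged on" message shown in the Problem section.
ego_needs_logon() {
    grep -q 'Not logged on'
}

# Hypothetical wrapper: log on interactively if needed, then list resources.
# Assumes egosh is on PATH and that the error message matches the one above.
ego_resource_list() {
    if egosh resource list 2>&1 | ego_needs_logon; then
        egosh user logon || return 1
    fi
    egosh resource list
}
```

Calling ego_resource_list prompts for the account only when EGO reports that no one is logged on.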

Disable Serial Console Redirection in xCAT

PXE boot config:
"... console=tty0 console=ttyS0,115200 ..."

Note: serial console redirection is managed by the hardware profile.

To check the hardware profiles:
$ tabdump nodehm

#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,cmdmapping,consoleondemand,comments,disable
"__HardwareProfile_IPMI",,"ipmi",,,,,,,,,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_ipmi.xml",,,
"__HardwareProfile_IBM_Flex_System_x",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_flex_x.xml",,,
"__HardwareProfile_IBM_System_x_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_rackmount_x.xml",,,
"__HardwareProfile_IBM_iDataPlex_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_rackmount_x.xml",,,
"__HardwareProfile_IBM_NeXtScale_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_nextscale_x.xml",,,
"__Chassis_IBM_Flex_chassis",,"blade",,,,,,,,,,,,

To disable redirection, clear the serialport, serialspeed and serialflow columns of the hardware profile:
$ chdef -t group -o __HardwareProfile_IBM_Flex_System_x serialport= serialspeed= serialflow=
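
To see which profiles still have serial settings, the tabdump output can be filtered. The following is a rough sketch: serial_profiles is a made-up helper name, and it assumes the column order shown above (serialport, serialspeed and serialflow in columns 8 to 10) and that no field contains a comma.

```shell
# Reads "tabdump nodehm" CSV on stdin and prints the name of every entry
# whose serialport, serialspeed or serialflow column is not empty.
serial_profiles() {
    awk -F, '
        /^#/ || NF < 10 { next }            # skip the header and short lines
        {
            for (i = 1; i <= NF; i++)       # strip the surrounding quotes
                gsub(/"/, "", $i)
            if ($8 != "" || $9 != "" || $10 != "")
                print $1
        }
    '
}
```

Usage: tabdump nodehm | serial_profiles. After the chdef command has been run for a profile, that profile should no longer appear in the list.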

How to Configure Mellanox Switch

Required Tools:

  1. Serial console cable (provided inside the box)
  2. Converter cable from serial port to USB port (Prolific)

Step-by-step guide using macOS:

  1. Download and install the driver for the converter cable: http://plugable.com/drivers/prolific
  2. Check that the driver is installed correctly:
     $ kextstat | grep prolific
     159 0 0xffffff7f832fa000 0x6000 0x6000 com.prolific.driver.PL2303 (1.6.0) F6A6805D-685D-3E6D-BF81-106EBBC0A386

     $ ioreg -c IOSerialBSDClient | grep usb
     | | "IOTTYBaseName" = "usbserial"
     | | "IOCalloutDevice" = "/dev/cu.usbserial"
     | | "IODialinDevice" = "/dev/tty.usbserial"
     | | "IOTTYDevice" = "usbserial"
  3. Start the connection
     $ screen /dev/cu.usbserial
  4. Press Enter and follow the instructions in the official user guide.
    For example:
Mellanox Switch

Mellanox configuration wizard
 Do you want to use the wizard for initial configuration? y
 Step 1: Hostname? [switch-56d680] switch-10g
 Step 2: Use DHCP on mgmt0 interface? [yes] no
 Step 3: Use zeroconf on mgmt0 interface? [no] no
 Step 4: Primary IPv4 address and masklen? [0.0.0.0/0] 172.21.35.60/23
 Step 5: Default gateway? 172.21.35.254
 Step 6: Primary DNS server? 155.69.3.8,155.69.3.7
 % Value must be an IPv4 address in the format of '192.168.0.1'.
 Step 6: Primary DNS server? 155.69.3.8
 Step 7: Domain name?
 Step 8: Enable IPv6? [yes] yes
 Step 9: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface? [no] no
 Step 10: Enable DHCPv6 on mgmt0 interface? [no] no
 Step 11: Admin password (Enter to leave unchanged)?
 Step 11: Confirm admin password?
 You have entered the following information:
 1. Hostname: switch-10g
 2. Use DHCP on mgmt0 interface: no
 3. Use zeroconf on mgmt0 interface: no
 4. Primary IPv4 address and masklen: 172.21.35.60/23
 5. Default gateway: 172.21.35.254
 6. Primary DNS server: 155.69.3.8
 7. Domain name:
 8. Enable IPv6: yes
 9. Enable IPv6 autoconfig (SLAAC) on mgmt0 interface: no
 10. Enable DHCPv6 on mgmt0 interface: no
 11. Admin password (Enter to leave unchanged): (CHANGED)
 To change an answer, enter the step number to return to.
 Otherwise hit <enter> to save changes and exit.
 Choice:
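
As the transcript shows, Step 6 rejects a comma-separated list and accepts only a single IPv4 address. A value can be checked before typing it into the wizard with a small helper like the one below. It is a sketch only: is_ipv4 is a made-up name, and the switch may apply stricter rules than this check does.

```shell
# Succeeds (exit status 0) only if $1 is one dotted-quad IPv4 address.
is_ipv4() {
    local ip=$1 octet
    case $ip in
        # reject anything but digits and dots, and empty octets
        *[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    local IFS=.
    set -- $ip                      # split the address on dots
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}
```

For example, is_ipv4 155.69.3.8 succeeds, while is_ipv4 155.69.3.8,155.69.3.7 fails, matching the wizard's behaviour in Step 6.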

Installing CUDA 7.5 for Tesla M40 on Ubuntu 14.04.5 LTS

Install Driver

  1. Download the Tesla driver (http://www.nvidia.com/Download/index.aspx?lang=en-us)
  2. Move to runlevel 3
    $ telinit 3
  3. Stop the lightdm service
    $ service lightdm stop

  4. Make the driver package executable
    $ chmod +x NVIDIA-Linux-x86_64-352.99.run


Got Stuck at “Wait for Plymouth Boot Screen to Quit”

If you can’t reach the login screen after installing the CUDA driver (booting gets stuck at “Wait for Plymouth Boot Screen to Quit”), the likely cause is that the X server is trying to load the xorg.conf created by the NVIDIA driver. I ran into this on my laptop, which has both Intel and NVIDIA GPUs and runs CentOS 7.

Workaround Solution: Continue reading