EGO issue in IBM SCF CE

Problem:

[root@pcmce-co68 ~]# egosh resource list
Cannot get host info. Not logged on.

Solution:

Log on to the egosh shell (this is needed only once). The default user account and password are both Admin.
[root@pcmce-co68 ~]# egosh user logon
user account: Admin
password:
Logged on successfully
[root@pcmce-co68 ~]# egosh resource list
NAME status mem swp tmp ut it pg r1m r15s r15m ls
pcmce-c* ok 827M 1516M 69G 7% 258 3.1 0.2 3.3 0.7 1
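
The check above can be wrapped in a small helper that only prompts for a logon when the session is not yet authenticated. This is a minimal sketch: the function name `ego_ensure_logon` and the overridable `EGOSH` variable are illustrative assumptions, and it relies only on the "Not logged on" message shown in the transcript above.

```shell
#!/bin/sh
# Sketch: log on to EGO only when "egosh resource list" reports no session.
# EGOSH can be overridden (assumption, for testing without a cluster).
EGOSH="${EGOSH:-egosh}"

ego_ensure_logon() {
    # Capture stdout and stderr; ignore the exit status of the probe.
    out=$("$EGOSH" resource list 2>&1) || true
    case "$out" in
        *"Not logged on"*)
            # Interactive prompt for user account and password.
            "$EGOSH" user logon
            ;;
    esac
}
```

Calling `ego_ensure_logon` before other egosh commands in a script avoids the "Cannot get host info" failure without prompting again on later runs.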

Disable Serial Console Redirection in xCAT

PXE boot config:
"... console=tty0 console=ttyS0,115200 ..."

Note: serial console redirection is managed by the hardware profile.

To check the hardware profiles:
$ tabdump nodehm

#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,cmdmapping,consoleondemand,comments,disable
"__HardwareProfile_IPMI",,"ipmi",,,,,,,,,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_ipmi.xml",,,
"__HardwareProfile_IBM_Flex_System_x",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_flex_x.xml",,,
"__HardwareProfile_IBM_System_x_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_rackmount_x.xml",,,
"__HardwareProfile_IBM_iDataPlex_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_rackmount_x.xml",,,
"__HardwareProfile_IBM_NeXtScale_M4",,"ipmi",,,,,"0","115200","hard",,"/opt/pcm/etc/hwmgt/mappings/HWCmdMapping_nextscale_x.xml",,,
"__Chassis_IBM_Flex_chassis",,"blade",,,,,,,,,,,,

To disable it, clear the serialport, serialspeed and serialflow columns of the hardware profile:
$ chdef -t group -o __HardwareProfile_IBM_Flex_System_x serialport= serialspeed= serialflow=
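
To confirm which profiles still have serial redirection set, the `tabdump nodehm` output can be filtered on the serialport column (the 8th field in the header shown above). The helper below is a sketch; the function name `serial_enabled_profiles` is an illustrative assumption, and it assumes no field contains an embedded comma, as in the listing above.

```shell
#!/bin/sh
# Sketch: print nodehm entries whose serialport column is still set.
# Reads "tabdump nodehm" CSV on stdin; serialport is the 8th column.
serial_enabled_profiles() {
    awk -F',' '
        /^#/ { next }                  # skip the header line
        NF >= 8 {
            port = $8
            gsub(/"/, "", port)        # treat "" the same as an empty field
            if (port != "") {
                name = $1
                gsub(/"/, "", name)    # strip quotes around the node name
                print name
            }
        }'
}

# Usage on the management node (requires xCAT):
#   tabdump nodehm | serial_enabled_profiles
```

After the chdef command above, __HardwareProfile_IBM_Flex_System_x should no longer appear in the output.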

Installing CUDA 7.5 for Tesla M40 on Ubuntu 14.04.5 LTS

Install Driver

  1. Download the Tesla driver (http://www.nvidia.com/Download/index.aspx?lang=en-us)
  2. Move to runlevel 3
    $ telinit 3
  3. Stop lightdm service
    $ service lightdm stop

  4. Change file mode of the driver package
    $ chmod +x NVIDIA-Linux-x86_64-352.99.run


Limiting CPU Usage of A Process in CentOS/RHEL 7

On an HPC cluster, the head node may need protection from unnecessarily heavy processes that can cause login problems for users. One solution is cpulimit: a cron job can monitor all processes and apply a CPU limit to them. This is how I usually do it on CentOS/RHEL 7.x.

  1. Install the cpulimit package from the EPEL repository.

yum install cpulimit

  2. Create a script to monitor the processes. The script below is a modified version of the script in this forum. You can modify the values of the first three variables: CPU_LIMIT, BLACK_PROCESSES_LIST, and WHITE_PROCESSES_LIST.
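
As a rough illustration of the idea, and not the script referred to above, a monitor could select processes whose CPU usage exceeds CPU_LIMIT and attach cpulimit to each one. The function names, the whitelist value, and the use of CPU_LIMIT as both threshold and cap are assumptions made for this sketch; only the `-p`, `-l` and `-z` options of cpulimit are used.

```shell
#!/bin/sh
# Sketch only: a simplified monitor, not the forum-derived script.
CPU_LIMIT=50                                  # percent; threshold and cap
WHITE_PROCESSES_LIST="sshd|systemd|cpulimit"  # assumed: never throttle these

# Reads "PID %CPU COMMAND" lines on stdin and prints the PIDs that exceed
# CPU_LIMIT and whose command name is not in WHITE_PROCESSES_LIST.
select_heavy_pids() {
    awk -v limit="$CPU_LIMIT" -v white="^(${WHITE_PROCESSES_LIST})\$" '
        $2 + 0 > limit + 0 && $3 !~ white { print $1 }'
}

# Attaches one cpulimit instance to every selected PID.
throttle_heavy() {
    ps -eo pid=,pcpu=,comm= | select_heavy_pids | while read -r pid; do
        # Skip PIDs that already have a cpulimit watcher attached.
        pgrep -f "cpulimit -p $pid " >/dev/null && continue
        # -z makes cpulimit exit when the target process ends.
        cpulimit -p "$pid" -l "$CPU_LIMIT" -z &
    done
}
```

Calling `throttle_heavy` at the end of the script and running it from a cron entry (for example every minute) would give the periodic monitoring described above; the schedule is a placeholder to adjust.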
