We know there are several kinds of kill/eviction in the Oracle RAC CSS (Cluster Synchronization Services) component:
instance (member) kill -> node kill
The relationship between these two kinds of kill in CSS is:
Member kill escalation. For example, the database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this request times out, it can escalate to a node kill.
What if even the node kill hangs in some situations? How does Oracle RAC handle that? This brings us to the topic I would like to introduce today: IPMI (Intelligent Platform Management Interface).
Since Oracle Database 11gR2, IPMI is integrated with Oracle RAC. With the configuration below, you can make it work with Oracle RAC so that CSS can use it to evict a node when needed.
1) Log in as root.
2) Verify that ipmitool can communicate with the BMC through the IPMI driver by running bmc info and looking for a device ID in the output. For example:
# ipmitool bmc info
Device ID : 32
If ipmitool is not communicating with the BMC, then configure the BMC and make sure that the IPMI driver is running.
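The check in step 2 boils down to "does bmc info print a Device ID line?". Below is a minimal Python sketch of that check, assuming ipmitool is on PATH and the script runs as root; the helper names (parse_ipmitool_fields, bmc_device_id, check_local_bmc) are hypothetical and only parse the "Key : Value" layout shown above.

```python
import subprocess


def parse_ipmitool_fields(text):
    """Parse 'Key : Value' lines from ipmitool output into a dict.

    Only the first colon splits key from value, so values such as MAC
    addresses stay intact. Lines without a key (e.g. ': User : MD5')
    are skipped, and the first occurrence of a key wins.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key:
            continue
        fields.setdefault(key, value.strip())
    return fields


def bmc_device_id(text):
    """Return the BMC Device ID as an int, or None if it is missing."""
    value = parse_ipmitool_fields(text).get("Device ID", "")
    return int(value) if value.isdigit() else None


def check_local_bmc():
    # Assumption: ipmitool is installed and the IPMI driver is loaded.
    result = subprocess.run(["ipmitool", "bmc", "info"],
                            capture_output=True, text=True)
    return result.returncode == 0 and bmc_device_id(result.stdout) is not None
```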
3) Enable IPMI over LAN using the following procedure:
Determine the channel number for the channel used for IPMI over LAN. Beginning with channel 1, run the following command until you find the channel that displays LAN attributes (for example, the IP address):
# ipmitool lan print 1
IP Address Source : 0x01
IP Address : 140.87.155.89
Turn on LAN access for the channel found. For example, where the channel is 1:
# ipmitool -I bmc lan set 1 access on
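The channel search in step 3 ("beginning with channel 1, run lan print until LAN attributes show up") can be sketched as a loop. This is a hypothetical helper, not part of ipmitool: it assumes channels 1 to 11 are worth probing (an assumed upper bound) and takes the lan print runner as a parameter so it can be tried without a BMC.

```python
import subprocess


def lan_ip_address(text):
    """Return the 'IP Address' value from 'ipmitool lan print N' output, or None.

    The key must match exactly, so 'IP Address Source' is not picked up.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "IP Address":
            return value.strip()
    return None


def run_lan_print(channel):
    # Assumption: ipmitool is installed and this runs as root on the local host.
    result = subprocess.run(["ipmitool", "lan", "print", str(channel)],
                            capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def find_lan_channel(lan_print=run_lan_print, max_channel=11):
    """Probe channels 1..max_channel; return the first one showing an IP address."""
    for channel in range(1, max_channel + 1):
        if lan_ip_address(lan_print(channel)):
            return channel
    return None


def enable_lan_access_cmd(channel):
    # Same command as in the text; '-I bmc' selects the interface used there.
    return ["ipmitool", "-I", "bmc", "lan", "set", str(channel), "access", "on"]
```

Passing a fake lan_print (for example a dict lookup) is how the search logic can be exercised on a machine without IPMI hardware.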
4) Configure IP address settings for IPMI using the static IP addressing procedure:
Using static IP addressing
If the BMC shares a network connection with ILOM, then the IP address must be on the same subnet. You must set not only the IP address, but also the proper netmask and default gateway. For example, assuming the channel is 1:
# ipmitool -I bmc lan set 1 ipaddr 192.168.0.55
# ipmitool -I bmc lan set 1 netmask 255.255.255.0
# ipmitool -I bmc lan set 1 defgw ipaddr 192.168.0.1
Note that the specified address (192.168.0.55) will be associated only with the BMC, and will not respond to normal pings.
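Because the IP address, netmask and gateway have to agree, it can help to check the plan before touching the BMC. A minimal sketch using Python's standard ipaddress module; the function name static_ip_commands is hypothetical, and it only builds the three commands shown above. The demo values later in this post (10.137.17.12, mask 255.255.252.0, gateway 10.137.16.1) pass, since that /22 spans 10.137.16.0 to 10.137.19.255.

```python
import ipaddress


def static_ip_commands(channel, ipaddr, netmask, gateway):
    """Validate a static BMC address plan and return the ipmitool commands for it.

    Raises ValueError if an address is malformed or the default gateway
    is not inside the subnet defined by ipaddr/netmask.
    """
    iface = ipaddress.IPv4Interface(f"{ipaddr}/{netmask}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not in subnet {iface.network}")
    base = ["ipmitool", "-I", "bmc", "lan", "set", str(channel)]
    return [
        base + ["ipaddr", ipaddr],
        base + ["netmask", netmask],
        base + ["defgw", "ipaddr", gateway],
    ]
```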
5) Establish an administration account with a username and password, using the following procedure (assuming the channel is 1):
Set BMC to require password authentication for ADMIN access over LAN. For example:
# ipmitool -I bmc lan set 1 auth ADMIN MD5,PASSWORD
List the account slots on the BMC, and identify an unused slot (a User ID with an empty user name field). For example:
# ipmitool channel getaccess 1
. . .
User ID : 4
User Name :
Fixed Name : No
Access Available : call-in / callback
Link Authentication : disabled
IPMI Messaging : disabled
Privilege Level : NO ACCESS
. . .
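Picking the unused slot from that listing can be automated with a small parser. This is a hypothetical sketch: it reads the "User ID" / "User Name" pairs from channel getaccess output and returns the lowest ID with an empty name. It skips user ID 1 by default on the assumption that it is the BMC's reserved null user; adjust skip for your hardware.

```python
def parse_user_slots(text):
    """Parse 'ipmitool channel getaccess N' output into {user_id: user_name}."""
    slots = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. '. . .' elision lines
        key, value = key.strip(), value.strip()
        if key == "User ID":
            current = int(value)
            slots[current] = ""
        elif key == "User Name" and current is not None:
            slots[current] = value
    return slots


def first_unused_slot(text, skip=(1,)):
    """Return the lowest User ID with an empty user name, or None."""
    for user_id, name in sorted(parse_user_slots(text).items()):
        if not name and user_id not in skip:
            return user_id
    return None
```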
Assign the desired administrator user name and password and enable messaging for the identified slot. (Note that for IPMI v1.5 the user name and password can be at most 16 characters.) Also, set the privilege level for that slot when accessed over LAN (channel 1) to ADMIN (level 4). For example, where username is the administrative user name, and password is the password:
# ipmitool user set name 4 username
# ipmitool user set password 4 password
# ipmitool user enable 4
# ipmitool channel setaccess 1 4 privilege=4
# ipmitool channel setaccess 1 4 link=on
# ipmitool channel setaccess 1 4 ipmi=on
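The six commands of step 5 always follow the same pattern for a given channel and slot, so they can be generated together with the 16-character check mentioned above. A hypothetical sketch (user_setup_commands is not an existing tool); max_len=16 is the IPMI v1.5 limit quoted in the text.

```python
def user_setup_commands(channel, user_id, username, password, max_len=16):
    """Build the step-5 ipmitool commands for one user slot.

    Raises ValueError if the user name or password is empty or longer
    than max_len characters.
    """
    for label, value in (("user name", username), ("password", password)):
        if not value or len(value) > max_len:
            raise ValueError(f"{label} must be 1..{max_len} characters")
    ch, uid = str(channel), str(user_id)
    return [
        ["ipmitool", "user", "set", "name", uid, username],
        # As in the text, the password is passed as an argument, so it is
        # visible to other local users in the process list while it runs.
        ["ipmitool", "user", "set", "password", uid, password],
        ["ipmitool", "user", "enable", uid],
        ["ipmitool", "channel", "setaccess", ch, uid, "privilege=4"],  # 4 = ADMIN
        ["ipmitool", "channel", "setaccess", ch, uid, "link=on"],
        ["ipmitool", "channel", "setaccess", ch, uid, "ipmi=on"],
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) as root, in order.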
6) Verify the setup using the command lan print 1. The output should appear similar to the following. Note that the IP address, subnet mask, default gateway and Admin authentication lines reflect the settings made in the preceding configuration steps, and comments or alternative options are indicated within brackets []:
# ipmitool lan print 1
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5
: User : MD2 MD5
: Operator : MD2 MD5
: Admin : MD5 PASSWORD
: OEM : MD2 MD5
IP Address Source : DHCP Address [or Static Address]
IP Address : 192.168.0.55
Subnet Mask : 255.255.255.0
MAC Address : 00:14:22:23:fa:f9
SNMP Community String : public
IP Header : TTL=0x40 Flags=0x40 Precedence=…
Default Gateway IP : 192.168.0.1
Default Gateway MAC : 00:00:00:00:00:00
. . .
# ipmitool channel getaccess 1 4
Maximum User IDs : 10
Enabled User IDs : 2
User ID : 4
User Name : username [This is the administration user]
Fixed Name : No
Access Available : call-in / callback
Link Authentication : enabled
IPMI Messaging : enabled
Privilege Level : ADMINISTRATOR
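The channel getaccess part of this verification can be turned into a checklist. A minimal sketch, assuming the expected values are exactly the ones shown above (enabled link authentication and IPMI messaging, ADMINISTRATOR privilege, and the chosen user name); access_problems is a hypothetical helper that returns an empty list when the slot looks right.

```python
EXPECTED_ACCESS = {
    "Link Authentication": "enabled",
    "IPMI Messaging": "enabled",
    "Privilege Level": "ADMINISTRATOR",
}


def access_problems(getaccess_text, username):
    """Compare 'ipmitool channel getaccess <ch> <id>' output with the step-5 settings.

    Returns a list of human-readable mismatches; an empty list means the
    slot looks correctly configured.
    """
    fields = {}
    for line in getaccess_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    expected = {**EXPECTED_ACCESS, "User Name": username}
    return [f"{k}: expected {v!r}, got {fields.get(k)!r}"
            for k, v in expected.items() if fields.get(k) != v]
```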
Verify that the BMC is accessible and controllable from a remote node in your cluster, for example with lan print. If node2-ipmi is the network host name assigned the IP address of node2's BMC, then to verify the BMC on node2 from node1 with the administrator account username, enter the following command on node1:
$ ipmitool -H node2-ipmi -U username lan print 1
You are prompted for a password. Provide the IPMI password.
If the BMC is correctly configured, then you should see information about the BMC on the remote node. If you see an error message, such as Error: Unable to establish LAN session, then you must check the BMC configuration on the remote node.
Repeat this process for each cluster member node.
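Since the remote check has to be repeated for every member node, a loop over the BMC host names saves some typing. This is a hypothetical sketch: success is judged as a zero exit status with no "Unable to establish LAN session" message, and the run parameter defaults to subprocess.run but can be replaced for testing. Without a password option ipmitool prompts for it, as in the text; for unattended runs, ipmitool's -E option (read the password from the IPMI_PASSWORD environment variable) can be switched on, which is not part of the original procedure.

```python
import subprocess


def remote_lan_print_cmd(bmc_host, username, channel=1, use_env_password=False):
    """Build 'ipmitool -H <bmc> -U <user> [-E] lan print <channel>'."""
    cmd = ["ipmitool", "-H", bmc_host, "-U", username]
    if use_env_password:
        cmd.append("-E")  # password taken from the IPMI_PASSWORD environment variable
    return cmd + ["lan", "print", str(channel)]


def remote_bmc_ok(returncode, output):
    """Treat a zero exit status without the LAN-session error as success."""
    return returncode == 0 and "Unable to establish LAN session" not in output


def check_cluster_bmcs(bmc_hosts, username, run=subprocess.run, **cmd_options):
    """Run the remote check against every node's BMC; return {host: ok}."""
    results = {}
    for host in bmc_hosts:
        proc = run(remote_lan_print_cmd(host, username, **cmd_options),
                   stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        results[host] = remote_bmc_ok(proc.returncode, proc.stdout)
    return results
```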
Below is a demo of how I set it up in my environment.
# ipmitool bmc info
Device ID : 32
Device Revision : 1
Firmware Revision : 3.0
IPMI Version : 2.0
Manufacturer ID : 42
Manufacturer Name : Sun Microsystems
Product ID : 18177 (0x4701)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
Aux Firmware Rev Info :
0x03
0x20
0x00
0x00
# ipmitool -I bmc lan set 1 ipaddr 10.137.17.12
Setting LAN IP Address to 10.137.17.12
# ipmitool -I bmc lan set 1 netmask 255.255.252.0
Setting LAN Subnet Mask to 255.255.252.0
# ipmitool -I bmc lan set 1 defgw ipaddr 10.137.16.1
Setting LAN Default Gateway IP to 10.137.16.1
# ipmitool -I bmc lan set 1 auth ADMIN MD5,PASSWORD
# ipmitool channel getaccess 1
Get User Name (id 1) failed: Invalid data field in request
# ipmitool user set name 5 crsusr
Set User Name command failed (user 5, name crsusr): Unknown (0x5)
# ipmitool user set password 5 cdcora
# ipmitool user enable 5
# ipmitool channel setaccess 1 5 privilege=4
# ipmitool channel setaccess 1 5 link=on
# ipmitool channel setaccess 1 5 ipmi=on
# ipmitool lan print 1
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5 PASSWORD
: User : MD2 MD5 PASSWORD
: Operator : MD2 MD5 PASSWORD
: Admin : MD5 PASSWORD
: OEM :
IP Address Source : Static Address
IP Address : 10.137.17.12
Subnet Mask : 255.255.252.0
MAC Address : 00:21:28:11:bd:0f
SNMP Community String : public
IP Header : TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00
BMC ARP Control : ARP Responses Disabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl : 5.0 seconds
Default Gateway IP : 10.137.16.1
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 2,3,0
Cipher Suite Priv Max : XXXXXXXXXXXXXXX
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
# ipmitool channel getaccess 1 5
Maximum User IDs : 20
Enabled User IDs : 10
User ID : 5
User Name : crsusr
Fixed Name : No
Access Available : call-in / callback
Link Authentication : enabled
IPMI Messaging : enabled
Privilege Level : ADMINISTRATOR
# ipmitool -H 10.137.17.12 -U crsusr lan print 1
Password:
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5 PASSWORD
: User : MD2 MD5 PASSWORD
: Operator : MD2 MD5 PASSWORD
: Admin : MD5 PASSWORD
: OEM :
IP Address Source : Static Address
IP Address : 10.137.17.12
Subnet Mask : 255.255.252.0
MAC Address : 00:21:28:11:bd:0f
SNMP Community String : public
IP Header : TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00
BMC ARP Control : ARP Responses Disabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl : 5.0 seconds
Default Gateway IP : 10.137.16.1
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 2,3,0
Cipher Suite Priv Max : XXXXXXXXXXXXXXX
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
After this, use the crsctl command from the Grid Infrastructure home (/u01/app/11.2.0/grid/bin here) to set the corresponding ipmiaddr and IPMI admin user:
[Thu May 26 07:00:18][crsusr@05:~]
$ cd /u01/app/11.2.0/grid/bin
[Thu May 26 07:00:40][crsusr@05:/u01/app/11.2.0/grid/bin]
$ crsctl set css ipmiaddr 10.137.17.12
CRS-4229: The IPMI information change was successful
$
[Thu May 26 07:00:26][crsusr@05:/u01/app/11.2.0/grid/bin]
$ crsctl set css ipmiadmin crsusr
IPMI BMC password:
CRS-4229: The IPMI information change was successful
$
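Each node has its own BMC (the eviction log further down shows another node's BMC at 10.137.17.13), so ipmiaddr is set on each node with that node's own BMC address. A small hypothetical sketch that builds the two crsctl commands for one node and recognizes the CRS-4229 success message from the transcript; the grid home path is the one used in this post, and the ipmiadmin command still prompts interactively for the BMC password.

```python
import ipaddress
import os


def crsctl_ipmi_commands(grid_home, bmc_ip, admin_user):
    """Return the two crsctl commands from the text for the local node."""
    ipaddress.IPv4Address(bmc_ip)  # raises ValueError on a malformed address
    crsctl = os.path.join(grid_home, "bin", "crsctl")
    return [
        [crsctl, "set", "css", "ipmiaddr", bmc_ip],
        # Prompts for the IPMI BMC password, as shown in the transcript.
        [crsctl, "set", "css", "ipmiadmin", admin_user],
    ]


def crsctl_succeeded(output):
    # CRS-4229 is the success message printed in the transcript above.
    return "CRS-4229" in output
```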
You can then check ocssd.log to verify that it really works. Here is an example from my environment:
2011-05-26 07:00:43.703: [ CSSD][5]clssnmSendIPMIReq: clssnmAuthSendReqThread spawned successfully, for the first time - nmreq 101825530
2011-05-26 07:00:43.703: [ CSSD][70]clssscUpdateEventValue: IPMIInfo State val 0, changes 8
2011-05-26 07:00:43.777: [ CSSD][70]clssscUpdateEventValue: IPMIInfo State val 0, changes 9
2011-05-26 07:00:43.777: [ CSSD][70]clssnmnodeTest: IPMI Admin Node selected is 2 and my nodenum is 1
2011-05-26 07:00:43.777: [ CSSD][70]clssscUpdateEventValue: IPMIInfo State val 1, changes 10
2011-05-26 07:00:44.069: [ CSSD][48]clssscUpdateEventValue: IPMIInfo State val 2, changes 11
2011-05-26 07:00:44.069: [ CSSD][70]clssscWaitChangeEventValue: ev(IPMIInfo State) changed to 2 from 1
2011-05-26 07:00:44.069: [ CSSD][70]clssnmAuthSendReqThread: IPMI Cookie Validation succeeds and in success event change for nmreq 101825530
2011-05-26 07:00:44.069: [ CSSD][70]clssscUpdateEventValue: IPMIInfo State val 6, changes 12
2011-05-26 07:00:44.085: [ CSSD][48]clssscUpdateEventValue: IPMIInfo State val 7, changes 13
2011-05-26 07:00:44.085: [ CSSD][70]clssscWaitOnEventValue: after IPMIInfo State val 7, eval 7 waited 16
2011-05-26 07:00:44.086: [ CSSD][70]clssscUpdateEventValue: IPMIInfo State val 0, changes 14
2011-05-26 07:01:07.489: [ CSSD][72]clssnkipmiPing: 00001:Sent IPMI ping msg, max RT timeout=250 msec
2011-05-26 07:01:07.492: [ CSSD][72]clssnkipmiPing: 00004:IPMI pong message successfully recvd
2011-05-26 07:01:07.774: [ CSSD][72]clssnmAuthHandleReqThread: IPMI Cookie Validation succeeds for request from node 3 named dnagad08
2011-05-26 07:01:11.314: [ CSSD][74]clssnkipmiPing: 00000:Sent IPMI ping msg, max RT timeout=250 msec
2011-05-26 07:01:11.325: [ CSSD][74]clssnkipmiPing: 00012:IPMI pong message successfully recvd
2011-05-26 07:01:11.728: [ CSSD][74]clssnkipmiTalkToBMC: IPMI outbound SSN too low, discarding
2011-05-26 07:01:11.929: [ CSSD][74]clssnmAuthHandleReqThread: IPMI Cookie Validation succeeds for request from node 3 named dnagad08
$
2011-05-26 07:01:08.006: [ CSSD][25]clssnkipmiTrMsg: 06 00 ff 07 02 4e 5f 00 00 1e bf 5d 23 bf 44 1a fb 9d 09 ed 6e b6 28 e7 80 ec 01 df 7c 09 81 1c 63 20 10 3b 00 04 91
2011-05-26 07:01:08.006: [ CSSD][25]clssnkipmiTrMsgApp: 00216:4e 5f 00 00:RSP:MD5 :0010:SETSESPRIVLVL:00:
2011-05-26 07:01:08.006: [ CSSD][25]clssnkipmiDestroySes: Start
2011-05-26 07:01:08.006: [ CSSD][25]clssnkipmiTrMsg: 06 00 ff 07 02 01 00 00 00 1e bf 5d 23 41 da 08 41 05 a5 dd d2 f1 f2 48 89 39 5d 60 a9 0b 20 18 c8 81 14 3c 1e bf 5d 23 d2
2011-05-26 07:01:08.006: [ CSSD][25]clssnkipmiTrMsgApp: 00216:01 00 00 00:REQ:MD5 :0014:SESCLOSE : :
2011-05-26 07:01:08.094: [ CSSD][25]clssnkipmiTrMsg: 06 00 ff 07 02 4f 5f 00 00 1e bf 5d 23 80 dd c0 a5 9e ad 8c 7d 82 72 81 7c c5 9e bd d2 08 81 1c 63 20 14 3c 00 90
2011-05-26 07:01:08.094: [ CSSD][25]clssnkipmiTrMsgApp: 00304:4f 5f 00 00:RSP:MD5 :0014:SESCLOSE :00:
2011-05-26 07:01:08.095: [ CSSD][25]clssnkipmiTermCtx: Start
2011-05-26 07:01:08.095: [ CSSD][25]clssnkipmiValidate: Successful validate using method 2
2011-05-26 07:01:11.931: [ CSSD][25]clssnmRcfgMgrThread: initiating reconfig for modified ipmi cookie distribution
[Thu May 26 07:04:13][crsusr@dnagad08:/u01/app/11.2.0/grid/log/dnagad08/cssd]
$
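Rather than reading ocssd.log by eye, the success markers seen above can be counted. A minimal sketch: the marker strings are copied from this log excerpt only (other Grid Infrastructure versions may word them differently), and summarize_ipmi_log is a hypothetical helper that also keeps the last timestamp seen for each marker.

```python
import re

# Marker substrings taken from the ocssd.log excerpt above.
IPMI_MARKERS = {
    "cookie_validated": "IPMI Cookie Validation succeeds",
    "pong_received": "IPMI pong message successfully recvd",
    "validate_ok": "clssnkipmiValidate: Successful validate",
}

LOG_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")


def summarize_ipmi_log(lines):
    """Count IPMI health markers and remember the last timestamp for each."""
    summary = {name: {"count": 0, "last": None} for name in IPMI_MARKERS}
    for line in lines:
        ts = LOG_TS.match(line)
        for name, marker in IPMI_MARKERS.items():
            if marker in line:
                summary[name]["count"] += 1
                if ts:
                    summary[name]["last"] = ts.group(1)
    return summary
```

For example, opening /u01/app/11.2.0/grid/log/dnagad08/cssd/ocssd.log and passing the file object to summarize_ipmi_log gives a quick yes/no view of whether cookie validation and pings are happening.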
To check that IPMI works as expected, I tested it by hanging ocssd.bin, cssdagent and cssdmonitor, and found the following CSSD log entries:
2011-05-26 07:05:16.013: [ CSSD][43]clssnkipmiKillNode: Power off detected. Powering on BMC at IP address 10.137.17.13
2011-05-26 07:05:16.013: [ CSSD][43]clssnkipmiPwrOn: Start
This indicates that IPMI is working: after misscount expired, the node was evicted through IPMI.
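To confirm a test like this after the fact, a small sketch can pull the clssnkipmiKillNode entries out of the CSSD log and extract the BMC address that was acted on. The regular expressions only assume the line layout shown above; ipmi_kill_events is a hypothetical helper, not an Oracle tool.

```python
import re

KILL_RE = re.compile(
    r"^(?P<ts>\S+ \S+): \[\s*CSSD\]\[\d+\]clssnkipmiKillNode: (?P<msg>.*)$")
BMC_IP_RE = re.compile(r"BMC at IP address (\d{1,3}(?:\.\d{1,3}){3})")


def ipmi_kill_events(lines):
    """Return (timestamp, message, bmc_ip_or_None) for each clssnkipmiKillNode entry."""
    events = []
    for line in lines:
        m = KILL_RE.match(line.rstrip("\n"))
        if m:
            ip = BMC_IP_RE.search(m.group("msg"))
            events.append((m.group("ts"), m.group("msg"), ip.group(1) if ip else None))
    return events
```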
Thanks
Another year has passed. If you'd like to keep sharing the hosting costs, please pay via Alipay to seaman.ning#gmail.com ^_^