Solaris Run States Introduction

October 27th, 2008 404 Views

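A Solaris run level represents the operating state of the system. Which services and processes run at each level is determined by the scripts under the /etc/rc#.d directories. For example, in a RAC environment with SunCluster, upgrading Oracle UDLM (ORCLudlm) requires first entering single-user mode, removing the old ORCLudlm, and then installing the new version. That is when boot -s is needed.
The default run levels are listed below: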

* 0: The system is at the PROM monitor (ok>) or security monitor (>) prompt. It is safe to shut down the system when it is at this init state.
* 1, s or S: This state is known as “single-user” or “system administrator” mode. Root is the only user on the system, and only basic kernel functions are enabled. A limited number of filesystems (usually only root and /usr) are mounted. This init state is often used for sensitive functions (such as kernel or libc patches) or while troubleshooting a problem that is keeping the system from booting into multiuser mode.
* 2: Multiple users can log in. Most system services (except for NFS server and printer resource sharing) are enabled.
* 3: Normal operating state. NFS and printer sharing are enabled, where appropriate.
* 4: Usually undefined.
* 5: Power-down state. The operating system is shut down so that it is safe to remove power and, where the hardware supports it, the system is powered off.
* 6: Reboot. This state takes the system to init state 0 and then to the default init state (usually 3, but can be redefined in the /etc/inittab file).
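
As a rough illustration of how the /etc/rc#.d directories relate to the run levels above, the sketch below lists the start scripts found in the directory for a given level. It assumes the usual rc naming convention in which start scripts begin with S and kill scripts begin with K. The function name list_start_scripts and the RC_BASE variable are introduced here for illustration only; RC_BASE exists just so the function can be pointed at a directory other than /etc.

```shell
#!/bin/sh
# Sketch: list the start scripts (names beginning with S) in the rc
# directory for one run level. RC_BASE is a hypothetical override that
# defaults to /etc.
list_start_scripts() {
  level=$1
  base=${RC_BASE:-/etc}
  dir="$base/rc${level}.d"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  for f in "$dir"/S*; do
    # Skip the unexpanded pattern when no S* file exists
    if [ -e "$f" ]; then
      basename "$f"
    fi
  done
  return 0
}
```

Running who -r shows the current run level, and list_start_scripts 3 would then show the start scripts in /etc/rc3.d.
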
Read more »

Solaris rsh connection refused resolved

September 22nd, 2008 441 Views

This problem had bothered me for several months, and today I finally solved it.

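The setup is a Solaris 10 cluster with four nodes, referred to here as 1, 2, 3, and 4. ssh and rsh work between all nodes, except that rsh fails for 1-1, 2-1, 3-1, and 4-1. "Work" here means that another node can be reached without entering a password. For example, 1-2 means that running rsh 2 date on node 1 displays the current time on node 2.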
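
A check like the one described above can be repeated for every target node with a small loop. This is only a sketch: the function name check_rsh and the RSH override variable are introduced here for illustration, and it assumes that the remote shell command returns a non-zero exit status when the connection is refused or permission is denied.

```shell
#!/bin/sh
# Sketch: from the current node, try "rsh <node> date" for each node given
# as an argument and report which ones answer without a password.
# RSH is a hypothetical override; it defaults to rsh.
check_rsh() {
  rsh_cmd=${RSH:-rsh}
  failed=0
  for node in "$@"; do
    # -n redirects the remote shell's input from /dev/null
    if $rsh_cmd -n "$node" date >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
      failed=1
    fi
  done
  return $failed
}
```

Running check_rsh 1 2 3 4 on each node in turn gives the full matrix of node pairs.
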

In fact, configuring rsh from node 1 to node 2 requires a few steps, listed briefly below:
Read more »

Resolving a SunCluster ucmmd problem

March 7th, 2008 1,484 Views

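Recently I have often run into a problem where ucmm fails to start on one node of a SunCluster. The symptom is that the output of scstat -g shows ucmmd is not running, which is very frustrating.
The folks at STIT helped fix it several times, but nobody understood the root cause. This time it broke again right after being fixed, which was infuriating.

I searched Sun's website, found the usage of scswitch, read through it carefully, and then fixed the problem with two commands.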

1. First, use ucmmd to restart the ucmm process:

# /usr/cluster/lib/ucmm/ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf

2. Then use scswitch to take the related resource group offline and bring it back online. The result was OK:

# /usr/cluster/bin/scswitch -R -h xxx -g rac-framework-rg
# xxx is the node name
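
The two steps can be wrapped in a small script so that they are always run in the same order. The sketch below uses only the two commands shown above. The names run, recover_ucmm, and DRY_RUN are introduced here for illustration, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: restart ucmmd, then restart the RAC framework resource group on
# one node. DRY_RUN is a hypothetical switch; unless it is set to 0 the
# commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

recover_ucmm() {
  node=$1
  # Step 1: restart the ucmm process
  run /usr/cluster/lib/ucmm/ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf &&
  # Step 2: restart the resource group on the given node
  run /usr/cluster/bin/scswitch -R -h "$node" -g rac-framework-rg
}
```

Calling recover_ucmm xxx prints both commands for review. Setting DRY_RUN=0 as root on a cluster node would execute them.
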

The usage of scswitch is recorded here for future reference.

scswitch(1M)

scswitch - perform ownership and state change of resource groups and disk device groups in Sun Cluster configurations

SYNOPSIS

scswitch -c -h node[,…] -j resource[,…] -f flag-name
scswitch {-e| -n} [-M] -j resource[,…]
scswitch -F {-g resource-grp[,…]| -D device-group[,…]}
scswitch -m -D device-group[,…]
scswitch -Q [ -g resource-grp[,…]]
scswitch -R -h node[,…] -g resource-grp[,…]
scswitch -S -h from-node [ -K continue_evac]
scswitch {-u| -o} -g resource-grp[,…]
scswitch -z -g resource-grp[,…] -h node[,…]
scswitch -z -g resource-grp[,…]
scswitch -z
scswitch -z -D device-group[,…] -h node
scswitch -Z [-g resource-grp[,…]]
Read more »

Sharing a Clusterware Console script

March 5th, 2008 1,534 Views

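When testing Clusterware and RAC with many nodes, you have to switch between nodes constantly, which is error-prone, so I wrote this script and would like to share it.
For now it mainly provides a few simple functions. The supported platforms are Linux, Solaris, AIX, and HP, and I plan to keep extending it. You are welcome to use it and send feedback.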

[ractest@sun880-1 ~]$ more console
#!/bin/bash

# This script is used to control all of the cluster nodes from one interface.

echo "******************************************************************"
echo "                  Welcome to Cluster Console                      "
echo "                                                                  "
echo "The console is used to control the whole cluster nodes in one node"
echo "now it can support start/stop stack,check stack status, process   "
echo "priority, check node uptime and will support more in the future   "
echo "                                                                  "
echo "  Any bug or comment please report to ricky.zhu@gmail.com        "
echo "******************************************************************"

# Print the name of the node on line $1 of the olsnodes output.
# CH is not set anywhere in this excerpt.
get_nodename () {
  $CH/bin/olsnodes -n > tmp
  name=`head -n $1 tmp | tail -1 | awk '{print $1}'`
  echo "$name"
}

# Show the hostname, date, and uptime of every node in the cluster.
# RSH is not set anywhere in this excerpt.
check_uptime() {

  nl=`$CH/bin/olsnodes`
  for node in $nl
  do
    echo "node=$node"
    $RSH $node "hostname; date; /usr/bin/uptime"
  done
}

UNAME='/bin/uname'
PLATFORM=`$UNAME`

Read more »

High availability cluster

February 19th, 2008 1,926 Views

Excerpted from Wikipedia

High-availability cluster
From Wikipedia, the free encyclopedia

High-availability clusters (also known as HA Clusters or Failover Clusters) are computer clusters that are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant computers or nodes which are then used to provide service when system components fail. Normally, if a server with a particular application crashes, the application will be unavailable until someone fixes the crashed server. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as Failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate filesystems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

HA clusters are often used for critical databases, file sharing on a network, business applications, and customer services such as electronic commerce websites.

HA cluster implementations attempt to build redundancy into a cluster to eliminate single points of failure, including multiple network connections and data storage which is multiply connected via Storage area networks.

HA clusters usually use a heartbeat private network connection which is used to monitor the health and status of each node in the cluster. One subtle, but serious condition every clustering software must be able to handle is split-brain. Split-brain occurs when all of the private links go down simultaneously, but the cluster nodes are still running. If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that other nodes are still running. Having duplicate instances of services may cause data corruption on the shared storage.
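
The heartbeat decision described above can be sketched as a simple timeout check. This is a toy illustration, not how any particular cluster product implements it; the function name peer_state, its arguments, and the timeout value are all made up for the example.

```shell
#!/bin/sh
# Sketch: classify a peer node from the age of its last heartbeat.
# Arguments (all illustrative): current time, time of the last heartbeat
# received from the peer, and the timeout, all in seconds.
peer_state() {
  now=$1
  last_seen=$2
  timeout=$3
  if [ $((now - last_seen)) -le "$timeout" ]; then
    echo "up"
  else
    echo "suspected-down"
  fi
}
```

For example, peer_state 100 80 10 reports suspected-down. The check cannot tell a crashed peer from a failed private link, because both look like missing heartbeats, and that ambiguity is exactly what makes split-brain possible.
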
Node configurations
Read more »

