Troubleshooting the SunCluster ucmmd problem

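Lately I have often run into a problem where ucmm fails to start on one node of a SunCluster. The symptom is that the output of scstat -g shows "ucmmd is not running", which is very frustrating.
The folks at STIT helped fix it a few times, but nobody understood the underlying cause. This time it broke again right after being fixed, which was infuriating.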

I searched Sun's website, found the usage of scswitch, read through it carefully, and then fixed the problem with two commands.

1. First, use ucmmd to restart the ucmm process:

# /usr/cluster/lib/ucmm/ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf

2. Then use scswitch to take the related resource group offline and bring it back online. The result was OK.

# /usr/cluster/bin/scswitch -R -h xxx -g rac-framework-rg
# xxx is the node name
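
The two steps above can be wrapped in a small script so they are run the same way each time. This is only a sketch: the DRY_RUN switch and the function names run, ucmmd_down, and recover_ucmm are made up for illustration, and the string matched in the scstat -g output is simply the message quoted in this post. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the two-step recovery described in this post.
# DRY_RUN and all function names are illustrative, not part of Sun Cluster.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Reads `scstat -g` output on stdin and succeeds if it reports ucmmd as down.
ucmmd_down() {
    grep "ucmmd is not running" >/dev/null
}

recover_ucmm() {
    node=$1
    # Step 1: restart the ucmm process.
    run /usr/cluster/lib/ucmm/ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf || return 1
    # Step 2: take the framework resource group offline and back online.
    run /usr/cluster/bin/scswitch -R -h "$node" -g rac-framework-rg
}
```

A possible invocation would be `scstat -g | ucmmd_down && recover_ucmm xxx`, where xxx is the node name, with DRY_RUN=0 set once the printed commands look right.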

I am recording the usage of scswitch here for future reference.

scswitch(1M)

scswitch - perform ownership and state change of resource groups and disk device groups in Sun Cluster configurations

SYNOPSIS

scswitch -c -h node[,…] -j resource[,…] -f flag-name
scswitch {-e| -n} [-M] -j resource[,…]
scswitch -F {-g resource-grp[,…]| -D device-group[,…]}
scswitch -m -D device-group[,…]
scswitch -Q [ -g resource-grp[,…]]
scswitch -R -h node[,…] -g resource-grp[,…]
scswitch -S -h from-node [ -K continue_evac]
scswitch {-u| -o} -g resource-grp[,…]
scswitch -z -g resource-grp[,…] -h node[,…]
scswitch -z -g resource-grp[,…]
scswitch -z
scswitch -z -D device-group[,…] -h node
scswitch -Z [-g resource-grp[,…]]

DESCRIPTION

The scswitch command moves resource groups or disk device groups to new primary nodes. It also provides options for evacuating all resource groups and disk device groups from a node by moving ownership elsewhere, bringing resource groups or disk device groups offline and online, enabling or disabling resources, switching resource groups to or from an Unmanaged state, or clearing error flags on resource groups.

You can run the scswitch command from any node in a Sun Cluster configuration. If a device group is offline, you can use scswitch to bring the device group online onto any host in the node list. However, once the device group is online, a switchover to a spare node is not permitted. Only one invocation of scswitch at a time is permitted.

Do not attempt to kill an scswitch operation that is already underway.

There are ten forms of the scswitch command, each specified by a different option. See SYNOPSIS and OPTIONS.

change error flag (-c)

Clears the specified error flag-name on one or more resources on the specified nodes.
enable or disable (-e or -n)

Enables or disables the specified resources.
take offline (-F)

Takes the specified resource-grps or device-grps offline on all nodes.
set maintenance mode (-m)

Takes the specified disk device-grps offline from the cluster for maintenance. The resulting state survives reboots. If a disk device group is currently being accessed, this action fails and the specified disk device groups are not taken offline from the cluster. Disk device groups are brought back online by using the -z option. Only explicit calls to scswitch can bring a disk device group out of maintenance mode.
quiesce (-Q)

Brings the specified resource-grps to a quiescent state. This option stops these resource-grps from continuously bouncing around from one node to another in the event of the failure of a START or STOP method.
restart (-R)

Takes the specified resource-grps offline and then back online on the specified primary nodes of the resource groups. The specified nodes must be the current primaries of the resource groups.
evacuate or switch all (-S)

Attempts to switch over all resource groups and disk device groups from the specified from-node to a new set of primaries. The system attempts to select new primaries based on configured preferences for each group. All evacuated groups are not necessarily remastered by the same primary. If one or more resource groups or disk device groups cannot be evacuated from the specified from-node, the command fails, issues an error message, and exits with a nonzero exit code.
unmanage or manage (-u or -o)

Takes the specified resource-grps to (-u) the unmanaged state or takes the specified unmanaged resource-grps out of (-o) the unmanaged state.

The -o option brings the specified resource-grps under Resource Group Manager (RGM) management so that the RGM attempts to bring the resource groups online.
set primaries (-z)

Causes the orderly transfer of one or more resource-grps or disk device-grps from one primary node in a Sun Cluster configuration to another node in the configuration (or to multiple nodes for resource groups that are configured with multiple primaries). This option takes resource groups offline and brings disk device groups back online after being in maintenance mode. This option also brings all or selected resource groups online on their most-preferred node or nodes. This option does not, however, enable any resources, enable monitoring on any resources, or take any resource groups out of the unmanaged state, as the -Z option does.
bring online (-Z)

Enables all resources in the specified resource-grps, enables monitoring on all resources, manages groups, and brings the groups online on the default list of primaries.

OPTIONS

The ten forms of the scswitch command are specified by the following options:

-c

Clears the -f flag-name on the specified set of resources on the specified nodes. For the current release of Sun Cluster software, the -c option is only implemented for the Stop_failed error flag. Clearing the Stop_failed error flag places the resource into the offline state on the specified nodes.

If the Stop method fails on a resource and the Failover_mode property of the resource is set to Hard, the RGM halts or reboots the node to force the resource (and all other resources mastered by that node) offline.

If the Stop method fails on a resource and the Failover_mode property is set to a value other than Hard, the individual resource goes into the Stop_failed state and the resource group is placed into the Error_stop_failed state. A resource group in the Error_stop_failed state on any node cannot be brought online on any node, nor can it be edited (you cannot add or delete resources or change resource group properties or resource properties). You must clear the Stop_failed state by performing the procedure documented in the Sun Cluster Data Services Installation Guide for Solaris OS.
Caution:

Make sure that both the resource and its monitor are stopped on the specified node before you clear the Stop_failed flag. Clearing the Stop_failed error flag without fully killing the resource and its monitor can lead to more than one instance of the resource executing on the cluster simultaneously. If you are using shared storage, this situation can cause data corruption. If necessary, as a last resort, execute a kill(1) command on the associated processes.
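
The EXAMPLES section has no entry for the -c form, so here is a minimal sketch of clearing Stop_failed, built only from the synopsis line for -c. The DRY_RUN switch and the function name clear_stop_failed are assumptions for illustration. The function prints the command by default, which leaves time to confirm that the resource and its monitor really are stopped before running it for real.

```shell
# Sketch: clear the Stop_failed flag on a resource.
# DRY_RUN and clear_stop_failed are illustrative names, not Sun Cluster features.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = really run it

clear_stop_failed() {
    node=$1       # node (or comma-separated nodes) where the flag is set
    resource=$2   # resource (or comma-separated resources) to clear
    # Build the command from the documented synopsis:
    #   scswitch -c -h node[,...] -j resource[,...] -f flag-name
    set -- scswitch -c -h "$node" -j "$resource" -f Stop_failed
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}
```

For example, `clear_stop_failed node1 resource-1` prints the command that would be issued for resource-1 on node1.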
-e or -n

Enables (-e) or disables (-n) the specified resources.

You cannot disable a resource without also disabling all resources that depend on that resource. Conversely, you cannot enable a resource unless all of the resources on which that resource depends are also enabled. Once you have enabled a resource, it goes online or offline depending on whether its resource group is online or offline. A disabled resource is immediately brought offline from all of its current masters and remains offline regardless of the state of its resource group.
-F

Takes the specified resource-grps (-g) or device-groups (-D) offline on all nodes.

When the -F option takes a disk device group offline, the associated VxVM disk group or Solstice DiskSuite diskset is deported or released by the primary node. Before a disk device group can be taken offline, all access to its devices must be stopped and all dependent file systems must be unmounted. You must start an offline disk device group by issuing an explicit scswitch call, by accessing a device within the group, or by mounting a file system that depends on the group.
-m

Specifies the “set maintenance mode” form of the scswitch command.

The -m option takes the specified device-groups offline from the cluster for maintenance. Before a disk device group can be placed in maintenance mode, all access to its devices must be stopped and all dependent file systems must be unmounted. Disk device groups are brought back online by using the -z option.
-Q

Brings the specified resource-grps, which might be reconfigured, to a quiescent state. This form of the scswitch command does not exit until the resource-grps have reached a quiescent state in which they are no longer stopping or starting on any node.

If a Monitor_stop, Stop, Postnet_stop, Start, or Prenet_start method fails on any resource in a group while the scswitch -Q command is executing, the resource behaves as if its Failover_mode property was set to None, regardless of its actual setting. Upon failure of one of these methods, the resource moves to an error state (either Start_failed or Stop_failed) rather than initiating a failover or a rebooting of the node.

When the scswitch -Q command exits, the specified resource-grps might be online or offline. You can determine their current state by executing the scstat(1M) command.

If a node dies during execution of the scswitch -Q command, execution might be interrupted, and, as a result, the resource groups are left in a non-quiescent state. If execution is interrupted, scswitch -Q returns a nonzero exit code and writes an error message to the standard error. In this case, you can re-issue the scswitch -Q command.
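
Re-issuing scswitch -Q after an interrupted run can be sketched as a small retry loop. The retry limit, the QUIESCE_CMD variable, and the function name quiesce_with_retry are assumptions for this example. QUIESCE_CMD exists only so the loop can be exercised without a cluster.

```shell
# Sketch: re-issue `scswitch -Q` when it exits nonzero.
# QUIESCE_CMD and quiesce_with_retry are illustrative names.

QUIESCE_CMD=${QUIESCE_CMD:-scswitch}

quiesce_with_retry() {
    rgs=$1             # comma-separated resource groups, e.g. RG1,RG2
    max_tries=${2:-3}  # assumed retry limit, not a Sun Cluster setting
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # scswitch -Q blocks until the groups are quiescent or an error occurs.
        if "$QUIESCE_CMD" -Q -g "$rgs"; then
            return 0
        fi
        echo "scswitch -Q failed (attempt $try of $max_tries)" >&2
        try=$((try + 1))
    done
    return 1
}
```

After a successful return, scstat would still be needed to see whether the groups ended up online or offline.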
-R

Specifies the “restart” form of the command. The -R option moves the specified resource-grps offline and then back online on the specified primary nodes. The resource groups must already be mastered by all of the specified nodes.
-S

Specifies the “evacuate” or “switch all” form of the scswitch command.

The -S option switches all resource groups and disk device groups off the specified node. If not all groups owned by the given node can be successfully evacuated to a new set of primaries, the command exits with an error. If the primary ownership of a group cannot be changed to one of the other nodes, primary ownership for that group is retained by the original node.
-u or -o

Specifies the “change resource group state” form of the scswitch command.

The -u option takes the specified managed resource-grps to the unmanaged state. As a precondition of the -u option, all resources that belong to the indicated resource groups must first be disabled.

The -o option takes the specified unmanaged resource-grps to the managed state. Once a resource group is in the managed state, the RGM attempts to bring the resource group online.
-z

Specifies a change in mastery of a specified resource-grp or a disk device-grp.

When used with the -g and -h options, the -z option brings the specified resource-grps online on the nodes specified by the -h option and takes them offline on all other cluster nodes. If the node list specified with the -h option is the empty set, the -z option takes the resource groups specified by the -g option offline from all of their current masters. If one of the listed resource-grps is not capable of being mastered by node, an error is reported and no resource-grps are switched over. All nodes specified by the -h option must be current members of the cluster and must be potential primaries of all of the resource groups specified by the -g option. The number of nodes specified by the -h option must not exceed the setting of the Maximum_primaries property of any of the resource groups specified by the -g option.

When used with only the -g option, the -z option brings the specified resource-grps, which must already be managed, online on their most-preferred node or nodes. This form of scswitch does not bring a resource group online in violation of its strong RG_affinities, and writes a warning message if the affinities of a resource group cannot be satisfied on any node.

If you configure the RG_affinities properties of one or more resource groups, and you issue the scswitch -z -g command (with or without the -h option), additional resource groups other than those that are specified after the -g option might be switched as well. RG_affinities is described in rg_properties(5).

When used alone (scswitch -z), the -z switches all managed resource groups online on their most-preferred node or nodes.

When used with only -g or when used alone, the -z option only switches resources and groups online, unlike the -Z option. Resource groups that are unmanaged remain unmanaged, and resources that are disabled or that have monitoring disabled are left in the disabled state.

When used with the -D option, the -z option switches one or more specified device-groups to the specified node. Only one primary node name can be specified for a disk device group’s switchover. When multiple device-groups are specified, the -D option switches the device-groups in the order specified. If the -z -D operation encounters an error, the operation stops and no further switches are performed.
-Z

Enables all resources of the specified resource-grps and their monitors, moves the resource-grp into the managed state, and brings the resource-grp online on all the default primaries. When the -g option is not specified, the scswitch command attempts to bring all resource groups online.
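
The difference between -z -g and -Z -g is easy to mix up, so here is a small sketch that picks the form from a mode keyword. The function name online_cmd and the mode labels "full" and "online-only" are invented for this example. Only the printed scswitch command lines come from the synopsis.

```shell
# Sketch: choose between `scswitch -Z` and `scswitch -z` for one resource group.
# online_cmd and the mode labels are illustrative, not Sun Cluster terms.

online_cmd() {
    mode=$1
    rg=$2
    case $mode in
        full)
            # Enables resources and monitors, manages the group, brings it online.
            echo "scswitch -Z -g $rg" ;;
        online-only)
            # Group must already be managed; disabled resources stay disabled.
            echo "scswitch -z -g $rg" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}
```

For instance, `online_cmd full resource-grp-1` prints the -Z form, and `online_cmd online-only resource-grp-1` prints the -z form.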

You can combine the following options with the previous ten options as follows:

-D

Specifies the name of one or more device-groups.

This option is only legal with the -F, -m, and -z options.

You need solaris.cluster.device.admin RBAC authorization to use this command option with -F, -m, and -z (in conjunction with -h). See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
-f

Specifies the error flag-name.

This option is only legal with the -c option.

The only error flag currently supported is Stop_failed.

You need solaris.cluster.resource.admin RBAC authorization to use this command option with -c. See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
-g

Specifies the name of one or more resource-grps.

This option is only legal with the -F, -o, -Q, -R, -u, -z, and -Z options.

You need solaris.cluster.resource.admin RBAC authorization to use this command option with -F, -o, -R (in conjunction with -h), -u, -z (in conjunction with -h), or -Z. See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
-h

Specifies the names of one or more cluster nodes.

This option is only legal with the -c, -R, -S, and -z options.

When used with the -c, -R, or -z option, the -h option specifies the target server (or list of servers in the case of resource groups configured with multiple primaries).

When used with the -S option, the -h option specifies the original server. A comma-delimited list of nodes can be specified after the -h option for resource-grps or device-groups that are configured with multiple primaries. In this case, if any of the listed primaries cannot master a particular resource-grp or device-group, that resource-grp or device-group is not switched over.

You need solaris.cluster.resource.admin RBAC authorization to use this command option with -c, -R (in conjunction with -g), -S, and -z (in conjunction with -g). In addition, you need solaris.cluster.device.admin RBAC authorization to use this command option with -z (in conjunction with -D). See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
-j

Specifies the names of one or more resources.

This option is legal only with the -c, -e, and -n options.

You need solaris.cluster.resource.admin RBAC authorization to use this command option with -c, -e, or -n. See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
-K

Specifies the number of seconds to keep resource groups from switching back onto a node after that node has been successfully evacuated.

Resource groups cannot fail over or automatically switch over onto the node while that node is being evacuated, and, after evacuation is completed, for the number of seconds that you specify with this option. You can, however, initiate a switchover onto the evacuated node with the scswitch -z -g -h command before continue_evac seconds have passed. Only automatic switchovers are prevented.

This option is legal only with the -S option. You must specify an integer value between 0 and 65535. If you do not specify a value, 60 seconds is used by default.

You need solaris.cluster.resource.admin RBAC authorization to use this command option. See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.
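
The 0 to 65535 range for continue_evac can be checked before scswitch is called. The following is a sketch under that stated range. The function names valid_continue_evac and evacuate_cmd are made up, and evacuate_cmd only prints the command line rather than running it.

```shell
# Sketch: validate the -K value before building an evacuation command.
# valid_continue_evac and evacuate_cmd are illustrative names.

valid_continue_evac() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    # At most five digits (avoids overflow) and no larger than 65535.
    [ "${#1}" -le 5 ] && [ "$1" -le 65535 ]
}

evacuate_cmd() {
    node=$1
    if [ "$#" -lt 2 ]; then
        # No -K given: scswitch applies its own default of 60 seconds.
        echo "scswitch -S -h $node"
        return 0
    fi
    if ! valid_continue_evac "$2"; then
        echo "continue_evac must be an integer from 0 to 65535" >&2
        return 1
    fi
    echo "scswitch -S -h $node -K $2"
}
```

For example, `evacuate_cmd node1 120` prints the same command line as Example 5 in the EXAMPLES section.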
-M

Enables (-e) or disables (-n) monitoring for the specified resources. When you disable a resource, you need not disable monitoring on it because both the resource and its monitor are kept offline.

This option is legal only with the -e and -n options.

You need solaris.cluster.resource.admin RBAC authorization to use this command option with -e and -n. See rbac(5).

You must also be able to assume a role to which the Sun Cluster Commands rights profile has been assigned to use this command. Authorized users can issue privileged Sun Cluster commands on the command line from the pfsh(1), pfcsh(1), or pfksh(1) profile shell. A profile shell is a special kind of shell that enables you to access privileged Sun Cluster commands that are assigned to the Sun Cluster Commands rights profile. A profile shell is launched when you run su(1M) to assume a role. You can also use pfexec(1) to issue privileged Sun Cluster commands.

EXAMPLES

Example 1 Switching Over a Resource Group

The following command switches over resource-grp-2 to be mastered by node1:

node1# scswitch -z -h node1 -g resource-grp-2

Example 2 Switching Over a Managed Resource Group Without Enabling Monitoring or Resources

The following command brings resource-grp-2 online if resource-grp-2 is already managed, but does not enable any resources or enable monitoring on any resources that are currently disabled.

node1# scswitch -z -g resource-grp-2

Example 3 Switching Over a Resource Group Configured to Have Multiple Primaries

The following command switches over resource-grp-3, a resource group configured to have multiple primaries, to be mastered by node1,node2,node3:

node1# scswitch -z -h node1,node2,node3 -g resource-grp-3

Example 4 Moving All Resource Groups and Disk Device Groups Off of a Node

The following command switches over all resource groups and disk device groups from node1 to a new set of primaries:

node1# scswitch -S -h node1

Example 5 Moving All Resource Groups and Disk Device Groups Persistently Off of a Node

The following command switches over all resource groups and disk device groups from node1 to a new set of primaries. It also prevents resource groups from automatically switching back onto node1 after node1 has been successfully evacuated. Such a switch back might occur, for example, if one of the resource groups failed to start on its new master, or if strong negative affinities have been configured (with RG_affinities). Setting the -K option continue_evac to 120 prevents resource groups from switching back onto the evacuated node for two minutes.

node1# scswitch -S -h node1 -K 120

Example 6 Restarting Some Resource Groups

The following command restarts some resource groups on the specified nodes:

node1# scswitch -R -h node1,node2 -g resource-grp-1,resource-grp-2

Example 7 Disabling Some Resources

node1# scswitch -n -j resource-1,resource-2

Example 8 Enabling a Resource

node1# scswitch -e -j resource-1

Example 9 Taking Resource Groups to the Unmanaged State

node1# scswitch -u -g resource-grp-1,resource-grp-2

Example 10 Taking Resource Groups Out of the Unmanaged State

node1# scswitch -o -g resource-grp-1,resource-grp-2

Example 11 Switching Over a Device Group

The following command switches over device-group-1 to be mastered by node2:

node1# scswitch -z -h node2 -D device-group-1

Example 12 Putting a Device Group Into Maintenance Mode

The following command puts device-group-1 into maintenance mode:

node1# scswitch -m -D device-group-1

Example 13 Quiescing Resource Groups

The following command brings resource groups RG1 and RG2 to a quiescent state:

node1# scswitch -Q -g RG1,RG2

EXIT STATUS

This command blocks until requested actions are completely finished or an error occurs.

The following exit values are returned:

0

The command completed successfully.
nonzero

An error has occurred. scswitch writes an error message to standard error.

If scswitch exits nonzero with the error message cluster is reconfiguring, the requested operation might have completed successfully, despite the error status. If you doubt the result, you can execute scswitch again with the same arguments after the reconfiguration is complete.

If scswitch exits nonzero with the error message Resource group failed to start on chosen node and may fail over to other node(s), the resource group will continue to reconfigure for some time after the scswitch command exits. Additional scswitch or scrgadm(1M) operations on that resource group will fail until the resource group has reached a terminal state such as Online, Online_faulted, or Offline on all nodes.

If you invoke the scswitch command on multiple resource groups and multiple errors occur, the exit value only reflects one of the errors. To avoid this possibility, invoke scswitch on just one resource group at a time.
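
Invoking scswitch on one resource group at a time can be sketched as a loop over a comma-separated list, so that each exit status is seen separately. The SCSWITCH variable and the function name restart_each are assumptions for this example. SCSWITCH exists only so the loop can be tried without a cluster, and the -R form is used just as a representative operation.

```shell
# Sketch: restart resource groups one at a time and report each failure.
# SCSWITCH and restart_each are illustrative names.

SCSWITCH=${SCSWITCH:-scswitch}

restart_each() {
    node=$1   # must be a current primary of every listed group
    rgs=$2    # comma-separated resource groups
    status=0
    for rg in $(echo "$rgs" | tr ',' ' '); do
        # One group per invocation, so the exit value belongs to this group only.
        if ! "$SCSWITCH" -R -h "$node" -g "$rg"; then
            echo "$rg"    # print the name of each group that failed
            status=1
        fi
    done
    return $status
}
```

The function prints the names of the groups whose restart failed and returns nonzero if there were any.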

Some operations are not permitted on a resource group (and its resources) whose RG_system property is True. See rg_properties(5) for more information.
