Veritas Cluster Server

Running NetBackup as a clustered master server service group provides a level of resiliency and redundancy that is not available to a standalone NetBackup master server. Veritas Cluster Server (VCS), which is part of the Storage Foundation/InfoScale suite from Veritas, is tightly integrated with NetBackup, to the extent that the NetBackup installation process will recognise whether VCS is installed on the hosts and configure all the relevant NetBackup cluster resources accordingly.

 

Troubleshooting a VCS clustered master server can seem daunting for some administrators; hopefully the tips below will help. The following examples assume that the default resource names have been used.

 

Prior to troubleshooting, the nbu_group service group should be persistently frozen to prevent it from failing over. A persistent freeze requires the cluster configuration to be writable first:

# haconf -makerw

# hagrp -freeze nbu_group -persistent

A service group in a PARTIAL state is a sign that a resource has not started correctly or is having a problem. To confirm which resource is causing the service group to be in a partial state:

# hares -display -group nbu_group | egrep -i line

To check the status of a resource identified in the above output as causing the PARTIAL state:

# hares -display <resource_name> | more
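When a group has many resources, the display output can be long. As a minimal sketch, the helper below filters it down to the resources whose state is anything other than ONLINE. The function name not_online is hypothetical, and it assumes the four-column layout of hares -display (resource, attribute, system, value); adjust the field numbers if your VCS version prints something different.

```shell
# Read "hares -display" output on stdin and print
# "<resource> <system> <state>" for every State row that is not ONLINE.
# Assumption: rows look like "<resource> State <system> <value>".
not_online() {
    awk '$2 == "State" && $4 != "ONLINE" { print $1, $3, $4 }'
}

# Intended usage on a cluster node:
#   hares -display -group nbu_group | not_online
```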

The NBU resources can be taken offline manually one at a time and subsequently brought online manually one at a time, which can be very helpful in troubleshooting. Prior to taking resources offline, check the dependencies of the resources so that they can be taken offline in the correct order:

# hares -dep
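The dependency listing can also be turned into an explicit offline order. The sketch below assumes that hares -dep prints one dependency per row as group, parent, child, and uses the standard tsort utility to sort parents ahead of their children. The function name offline_order is hypothetical.

```shell
# Read "hares -dep" output on stdin and print the resources of one group
# in the order they should be taken offline (parents before children).
# Assumption: each data row is "<group> <parent> <child>".
offline_order() {
    awk -v grp="${1:-nbu_group}" '$1 == grp && NF >= 3 { print $2, $3 }' | tsort
}

# Intended usage on a cluster node:
#   hares -dep | offline_order nbu_group
```

Reversing the printed list gives the order for bringing the resources back online.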

If any services are still running and preventing a resource from going offline, they will be listed in engine_A.log (by default under /var/VRTSvcs/log) and can be killed manually.

The resources are taken offline in the following order:

Taking the nbu_server resource offline first stops the NetBackup services.

Taking the nbu_mount resource offline unmounts /opt/VRTSnbu.

Taking the nbu_vol resource offline stops the underlying nbu_vol volume, which is mounted as /opt/VRTSnbu.

Taking the nbudg resource offline deports the nbudg diskgroup.

# hares -offline nbu_server -sys `hostname`

# hares -offline nbu_mount -sys `hostname`

# hares -offline nbu_vol -sys `hostname`

# hares -offline nbudg -sys `hostname`

The following command will confirm that the nbudg diskgroup has been deported:

# vxdisk -o alldgs list
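In the vxdisk listing, a deported diskgroup is shown in parentheses in the GROUP column, while an imported one is shown without them. As a rough sketch, the hypothetical dg_state helper below reads that listing and reports which case applies. It assumes GROUP is the fourth column, as in the default output.

```shell
# Read "vxdisk -o alldgs list" output on stdin and report whether the
# named diskgroup is imported, deported, or not seen at all.
# Assumption: the GROUP column is field 4 and deported groups appear as "(name)".
dg_state() {
    awk -v dg="$1" '
        $4 == ("(" dg ")") { deported = 1 }
        $4 == dg           { imported = 1 }
        END {
            if (imported)      print "imported"
            else if (deported) print "deported"
            else               print "not found"
        }'
}

# Intended usage on a cluster node:
#   vxdisk -o alldgs list | dg_state nbudg
```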

It is not possible to take the nbu_nic resource offline, as NIC is a persistent resource type that VCS monitors but does not bring online or take offline.

 

Once all resources apart from nbu_nic are offline, they can be started in reverse order, ensuring that each resource comes online correctly before the next dependent resource is started. If one of the resources fails to come online, troubleshooting efforts should be focused on that resource:

# hares -online nbudg -sys `hostname`

# hares -online nbu_vol -sys `hostname`

# hares -online nbu_mount -sys `hostname`

# hares -online nbu_server -sys `hostname`
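The online sequence above can be scripted so that it stops at the first resource that fails to come online. This is only a sketch: the function names run and online_all are hypothetical, the 300 second timeout is an arbitrary choice, and it relies on hares -wait to block until the State attribute reaches ONLINE. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Bring the default NBU resources online in dependency order on one system,
# stopping at the first resource that does not reach ONLINE.
online_all() {
    sys=${1:-$(hostname)}
    for res in nbudg nbu_vol nbu_mount nbu_server; do
        run hares -online "$res" -sys "$sys" || return 1
        # Wait for the resource to report ONLINE before starting the next one.
        if ! run hares -wait "$res" State ONLINE -sys "$sys" -time 300; then
            echo "resource $res did not come online; check engine_A.log" >&2
            return 1
        fi
    done
}

# Preview the commands for the local node:
#   online_all
```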

The following configuration files have clustered paths defined in them:

/usr/openv/netbackup/bin/cluster/NBU_RSP - this file is built by the NBU install process and contains all the relevant information for the cluster, such as node names, virtual IP address, monitored services etc.

In bp.conf, ensure the following option points to the shared disk:
VXDBMS_NB_DATA = /opt/VRTSnbu/db/data
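The bp.conf check can be automated with a small helper. The sketch below is illustrative only: the function name check_vxdbms_path is hypothetical, and it assumes the option is written as a single "VXDBMS_NB_DATA = path" line and that the shared disk is mounted at /opt/VRTSnbu.

```shell
# Report whether VXDBMS_NB_DATA in the given bp.conf points below the
# shared mount point (default /opt/VRTSnbu). Returns non-zero if it does not.
check_vxdbms_path() {
    conf=$1
    shared=${2:-/opt/VRTSnbu}
    # Extract the value to the right of "VXDBMS_NB_DATA =".
    val=$(sed -n 's/^VXDBMS_NB_DATA[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
    case $val in
        "$shared"/*) echo "ok: $val" ;;
        *) echo "not on shared disk: ${val:-<unset>}"; return 1 ;;
    esac
}

# Intended usage:
#   check_vxdbms_path /usr/openv/netbackup/bp.conf
```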

 

Check that the paths in the vxdbms.conf file point to the shared disk and that the staging directory points to the staging directory on the shared disk:
/opt/VRTSnbu/db/data/vxdbms.conf

The path for server.log in the following file must also point to the shared disk:
/opt/VRTSnbu/var/global/server.conf

 

Ensure the paths within the /opt/VRTSnbu/var/global/databases.conf file also point to the shared disk:
"/opt/VRTSnbu/db/data/NBDB.db" -n NBDB
"/opt/VRTSnbu/db/data/NBAZDB.db" -n NBAZDB

It is possible to start the NetBackup database, which is normally under cluster control, outside of the cluster. To do this, the nbudg diskgroup must be imported, the shared volume started and /opt/VRTSnbu mounted. These steps can be completed by bringing the nbudg, nbu_vol and nbu_mount resources online. The nbu_server resource does not need to be started.

Prior to starting the NBU database, the cluster symlinks must be in place. These symlinks are created when the nbu_server resource is brought online and removed when it is taken offline. If the symlinks do not exist, the nbu_server resource will not start and the NBU database cannot be started outside of cluster control.

To create the symlinks manually, first confirm that they are listed in the RSP file (shown here as shared-disk target -> link location):

/opt/VRTSnbu/netbackup/db -> /usr/openv/netbackup/db
/opt/VRTSnbu/var/global -> /usr/openv/var/global
/opt/VRTSnbu/volmgr/misc/robotic_db -> /usr/openv/volmgr/misc/robotic_db
/opt/VRTSnbu/netbackup/vault/sessions -> /usr/openv/netbackup/vault/sessions

 

# ln -s /opt/VRTSnbu/netbackup/db /usr/openv/netbackup/db

# ln -s /opt/VRTSnbu/var/global /usr/openv/var/global

# ln -s /opt/VRTSnbu/volmgr/misc/robotic_db /usr/openv/volmgr/misc/robotic_db

# ln -s /opt/VRTSnbu/netbackup/vault/sessions /usr/openv/netbackup/vault/sessions
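Before starting the database, it can be worth confirming that all four symlinks exist and point at the shared disk. The following is a minimal sketch with hypothetical function names (check_link, check_cluster_links); it assumes the default /opt/VRTSnbu and /usr/openv locations and that readlink is available on the host.

```shell
# Succeed only if $2 is a symlink whose target is exactly $1.
check_link() {
    target=$1
    link=$2
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        echo "ok: $link -> $target"
    else
        echo "missing or wrong: $link"
        return 1
    fi
}

# Check the four cluster symlinks; returns non-zero if any one is wrong.
check_cluster_links() {
    shared=${1:-/opt/VRTSnbu}
    local_root=${2:-/usr/openv}
    rc=0
    for rel in netbackup/db var/global volmgr/misc/robotic_db netbackup/vault/sessions; do
        check_link "$shared/$rel" "$local_root/$rel" || rc=1
    done
    return $rc
}

# Intended usage:
#   check_cluster_links
```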

Ensure that pbx is running; otherwise the NBU database will not start.
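One way to check for the relevant processes is to compare a process listing against the expected names. The helper below is a sketch: report_procs is a hypothetical name, pbx_exchange is assumed to be the PBX process name, and the listing is expected to contain one bare command name per line, which is what ps -e -o comm= prints on Linux (other platforms may print full paths).

```shell
# Read a list of process names on stdin (one per line) and report, for each
# name given as an argument, whether it appears in that list.
report_procs() {
    list=$(cat)
    for p in "$@"; do
        if printf '%s\n' "$list" | grep -qx -- "$p"; then
            echo "$p running"
        else
            echo "$p not running"
        fi
    done
}

# Intended usage:
#   ps -e -o comm= | report_procs pbx_exchange NB_dbsrv nbemm
```

The same call can be reused later in the procedure to confirm that NB_dbsrv and nbemm have started or stopped.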

 

Start the database with the following command:

# /usr/openv/netbackup/bin/nbdbms_start_stop start

If the database still does not start, one of the configuration files may be missing from /usr/openv/var/global (/opt/VRTSnbu/var/global).

To check this, try starting the database, which will show as the NB_dbsrv process, with:

# /usr/openv/db/bin/nbdbms_start_server

This shell script checks for the existence of the conf files and throws an error if one of them does not exist.

With the database running, it is now possible to start nbemm and query the database:

# /usr/openv/netbackup/bin/nbemm

If these steps work, it has been confirmed that the NetBackup database can be started outside of cluster control. Once testing has finished, the nbemm and NB_dbsrv processes can be stopped:

# /usr/openv/netbackup/bin/nbemm -terminate

# /usr/openv/netbackup/bin/nbdbms_start_stop stop

Once the NB_dbsrv and nbemm processes have been stopped, it is recommended to take the resources offline again, ensure the symlinks have been removed, and then bring the nbu_server resource online under VCS control.