Thursday, May 5, 2011

Complete TSM Disaster Recovery

Link : http://ezaix.blogspot.com/2008/01/complete-tsm-disaster-recovery.html
These are my notes from the Mercedes Benz DR test.
To restore TSM we must have at least the DB backup and the storage pool backups. These two are essential; everything else can be recreated. It helps to have the volume history and device configuration files as well, although when TSM is restored on a different server you will usually end up recreating the device config. Best of all is to have the DR plan.

To begin with, the original environment was TSM running in a partition on a p595, with a partitioned 3584 library and 5 drives. The restoration environment was a similar partition on a p570, with a 3584 and 10 drives. The test was done at a vendor location; the vendor provided the hardware with partitions and a vanilla AIX installation.


Make sure you have the CD-ROM drivers installed, or at least have a CD-ROM properly configured on one of the partitions, from where you can NFS mount it on the TSM partition. We ended up spending half an hour searching for and installing drivers from the AIX OS CDs by FTPing the bff files from a PC to the TSM partition.
Install the TSM software. Match the version and fix level of the source server. Do not forget to run lppchk and to comment out the automatic start script in /etc/inittab before you reboot the server.
Check the sizes of the DB and log files in the DR plan and create them with dsmfmt.
Initialize the log and DB files with dsmserv format. This puts the entries for these files into the dsmserv.dsk file in the TSM installation directory.
Set your ulimits to unlimited, export LANG=en_US and run dsmserv. This starts the TSM server and gives you the TSM prompt. Set the server name here, then create the device configuration in the following order:
define library
define path for library
define drive
define path for drives
define devclass
Make sure the device name used in each drive path has the same serial number as the physical drive sitting at the element address used to define that drive. The AIX device numbers (rmt0, rmt1 etc.) are not necessarily in order. Use lscfg -vl and lsattr -El for this. To speed up the restoration, define just one drive and start; the rest can be defined once TSM is restored and clients have started restoring their files. We do not need a disk storage pool, as no client is going to back up data here and all restores will come from tape.
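The serial number matching is easy to get wrong by hand, so here is a rough Python sketch of the idea. It assumes you have already collected the serial per library element (from the library) and the serial per rmt device (from lscfg -vl) into two dictionaries. All serials, element addresses, the drive names and the server name DRSERVER are invented for illustration, and the define drive / define path syntax should be checked against the Administrator's Reference for your TSM level.

```python
# Sketch only: every serial number, element address and device name below is
# made up for illustration; none of it comes from the DR test.

def match_drives(element_serials, device_serials):
    """Map each library element address to the AIX device with the same serial.

    element_serials: {element_address: serial} as read from the library.
    device_serials:  {aix_device: serial} as read with lscfg -vl on each rmtN.
    """
    by_serial = {serial: dev for dev, serial in device_serials.items()}
    matched = {}
    for element, serial in sorted(element_serials.items()):
        if serial not in by_serial:
            raise LookupError(f"no AIX device reports serial {serial}")
        matched[element] = by_serial[serial]
    return matched


def drive_commands(library, server, matched):
    """Build define drive / define path commands, one drive per element."""
    commands = []
    for n, (element, device) in enumerate(sorted(matched.items()), start=1):
        drive = f"DRIVE{n:02d}"  # hypothetical drive naming scheme
        commands.append(f"define drive {library} {drive} element={element}")
        commands.append(
            f"define path {server} {drive} srctype=server desttype=drive "
            f"library={library} device=/dev/{device}"
        )
    return commands


# Hypothetical inventory: note that rmt0 is NOT the drive in the first element.
elements = {257: "SN0003", 258: "SN0001"}
devices = {"rmt0": "SN0001", "rmt1": "SN0003"}

matched = match_drives(elements, devices)
for line in drive_commands("3584_LIB", "DRSERVER", matched):
    print(line)
```

The point of raising an error on an unmatched serial is that a wrong pairing does not fail at define time; it only shows up later when a mount goes to the wrong drive.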
Halt the TSM server. Restore the original dsmserv.opt from the DR plan or any other location and run backup devconfig. This writes the newly created devconfig to the files specified in dsmserv.opt. Copy this devconfig somewhere safe; we will need it again, and it gets overwritten when we restore the DB.
Copy the original volume history file to the location pointed to by dsmserv.opt.
Put the tapes in the library.

With the 3584 library, TSM records the location of each volume as a line in the device config file. I don't remember seeing this with 3590s, maybe because that was an intelligent library. Get the location of the DB backup tape from the 3584 web interface and write a single line in the device config with the location of the DB backup volume. (We spent another half hour figuring this out.) The line looks like this: /* LIBRARYINVENTORY SCSI 3584_LIB 000002 1605 101*/
Here the first column marks the line as a library inventory entry, the second is the library type, the third the library name, the fourth the volume name/serial number, the fifth the element address, and the sixth and last the logical library number.
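To make the six columns concrete, here is a small Python sketch that builds and parses such a line. It only reproduces the layout of the example line above; the function names are my own, and I am assuming the fields are separated by single spaces as in that example.

```python
# Sketch: build and parse a LIBRARYINVENTORY line in the layout shown above.
# Function and field names are illustrative, not anything TSM defines.

FIELDS = ("lib_type", "lib_name", "volume", "element", "logical_lib")


def library_inventory_line(lib_type, lib_name, volume, element, logical_lib):
    """Return the comment-style line that goes into the device config file."""
    columns = ["LIBRARYINVENTORY", lib_type, lib_name, volume,
               str(element), str(logical_lib)]
    return "/* " + " ".join(columns) + "*/"


def parse_library_inventory_line(line):
    """Split a LIBRARYINVENTORY line back into its five data columns."""
    body = line.strip()
    if not (body.startswith("/*") and body.endswith("*/")):
        raise ValueError("not wrapped in /* ... */")
    columns = body[2:-2].split()
    if len(columns) != 6 or columns[0] != "LIBRARYINVENTORY":
        raise ValueError("expected LIBRARYINVENTORY plus five columns")
    return dict(zip(FIELDS, columns[1:]))


# Values taken from the example line in the notes.
line = library_inventory_line("SCSI", "3584_LIB", "000002", 1605, 101)
print(line)
print(parse_library_inventory_line(line))
```

Only the volume name and element address change from tape to tape; the library type, library name and logical library number stay the same for a given library.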
Run dsmserv restore db with the todate option. TSM checks the volhist file for the latest DB backup and then checks the device config for its location. The volume is mounted, the DB is restored and the log files are formatted again. The progress of the log format and the DB restore is interleaved on the console, so you have to look hard to follow the DB restoration progress.
Once the DB is restored, TSM dismounts the volume and halts. The device config has now been overwritten with the original TSM server's device config, so save a copy of it and then overwrite it with the one we created for this environment.
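The two-way copy of the device config is easy to do in the wrong order under pressure. A minimal Python sketch of the swap follows; the file names are placeholders and the demo runs against a temporary directory, not a real TSM installation.

```python
import shutil
import tempfile
from pathlib import Path


def swap_in_dr_devconfig(active, dr_copy, original_copy):
    """Keep the devconfig the DB restore wrote, then put the DR-site one back.

    active:        the devconfig file named in dsmserv.opt
    dr_copy:       the copy saved before running the DB restore
    original_copy: where to keep the source server's devconfig for reference
    """
    shutil.copy2(active, original_copy)  # save first, so nothing is lost
    shutil.copy2(dr_copy, active)        # then reinstate the DR-site config


# Demo with throwaway files; names and contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    active = tmp / "devconfig.out"
    dr_copy = tmp / "devconfig.drsite"
    original_copy = tmp / "devconfig.original"

    dr_copy.write_text("DR site definitions\n")
    active.write_text("source server definitions\n")  # state after the restore

    swap_in_dr_devconfig(active, dr_copy, original_copy)
    active_text = active.read_text()
    kept_text = original_copy.read_text()

print(active_text, kept_text, sep="")
```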
Start TSM by issuing dsmserv. Once you get the prompt, check in the volumes with search=yes and status=private.
Mark all primary storage pool volumes' access as destroyed.
Update all copy storage pool volumes' access to readonly.
Disable all admin schedules for storage pool backup, expiration and reclamation.
Halt the server.
Start the server normally. You should be good to restore the data from TSM now.
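The clean-up after the DB restore (check in the tapes, mark primary volumes destroyed, make copy volumes read-only, switch off the admin schedules) can be prepared ahead of time as a list of admin commands. The Python sketch below just generates the command text. The pool names and schedule names are hypothetical; take the real ones from your DR plan, and check the command syntax against the Administrator's Reference for your TSM level before using it.

```python
# Sketch: generate the post-restore admin commands as plain text.
# Pool and schedule names passed in below are made up for illustration.

def post_restore_commands(library, primary_pools, copy_pools, admin_schedules):
    """Return the admin commands for the steps after the DB restore, in order."""
    commands = [f"checkin libvolume {library} search=yes status=private"]
    for pool in primary_pools:
        # Primary pool tapes did not come to the DR site.
        commands.append(f"update volume * access=destroyed wherestgpool={pool}")
    for pool in copy_pools:
        # Restores are served from the copy pool tapes; keep them unchanged.
        commands.append(f"update volume * access=readonly wherestgpool={pool}")
    for schedule in admin_schedules:
        commands.append(
            f"update schedule {schedule} type=administrative active=no"
        )
    return commands


commands = post_restore_commands(
    library="3584_LIB",
    primary_pools=["TAPEPOOL"],                   # hypothetical
    copy_pools=["COPYPOOL"],                      # hypothetical
    admin_schedules=["BACKUP_STG", "EXPIRE_INV"]  # hypothetical
)
for command in commands:
    print(command)
```

Having the list ready before the test starts means these steps become a paste into the server console instead of typing under time pressure.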
