my personal blog about systemcenter

Archive for November, 2011

 

Backup is one thing; restore is another, especially disaster recovery. In this blog post we will cover the scenario where everything is gone except the backup tapes. This first post covers getting the domain up and running again; following posts will cover file servers, SQL and Exchange.

Data Protection Manager requires a functional domain before it can recover any data from tape, so the first step will always be to get the domain up and running again.

In this post we have a production domain, test.local, with dc10.test.local as the sole domain controller and dpm10.test.local as the Data Protection Manager server.

For restore purposes we set up dctemp10.temptest.local and dpmtemp10.temptest.local to restore the data, so a disaster recovery will always require staging of the restore data if the domain is lost.

It's always recommended to keep an additional backup of the domain controllers outside of Data Protection Manager, so that no matter what happens we can recover the domain and then start recovery. There can't be too many backups, as long as the media is kept in a big safe somewhere :)

 

image

On the test.local domain we need a complete backup of the Domain Controller

image

And we will need the configuration database for Data Protection Manager

imageimage

After creating the protection groups, create a manual recovery point to tape for both the domain controller and the Data Protection Manager configuration database

image

For testing I am using Cristalink's brilliant virtual tape library for Data Protection Manager, Firestreamer (http://www.cristalink.com/fs/), so here the backup is located on 3 virtual tapes.

image

We need a new domain to restore to until the original production environment is up and running again, so here we have created a new domain and are adding a Data Protection Manager server to it.

 

image

To simulate adding tapes to the library we use Firestreamer's import feature, Load From File

image

And then we can see the same tapes as we had before the domain controller and the Data Protection Manager server were wiped

image

image

We then need to start a detailed inventory so we can see what’s on the tapes

 

image

After the detailed inventory has completed we need to run Recatalog Imported Tape to add the content of the tapes to the Data Protection Manager configuration database

image

We then need to recover the System Protection data from the tapes we recataloged

image

The only restore option for System Protection is a network folder

 

image

And it's time to start the restore

image

Data Protection Manager == Success

image

We then need to share the folder so we can access it from the Windows Server 2008 R2 installation media

image

To restore from the recovered data we need to start the Windows Server 2008 R2 installer. DHCP needs to be enabled for networking support

image

Select a repair

image

As there are no local images to restore from select next

image

And start networking

image

Connect to the DPM Server

image

With Credentials from the new temporary domain

image

Select the Image to restore

 

image

Start the ReImage

image

And wait :)

image

For about 12 minutes on my test setup

image

After a reboot and setting a fixed IP address, the domain is up and running. With a working domain we can start to restore the Data Protection Manager server and then restore all remaining workloads.

 

One recurring task for many Data Protection Manager administrators is finding out who broke their SQL backup chain.

image

Data Protection Manager does an excellent job out of the box of notifying us when and why a job fails

image

And the Management Pack for Operations Manager provided by Microsoft also picks up that alert and tells us what to do.

image

If we want additional information, or want to catch errors before the next synchronization, we can create a simple Operations Manager rule that triggers on the SQL Server event logging

image

In Operations Manager, create an event-based alerting rule

image

Target the rule at SQL Database Engine and leave it disabled if you don't want to monitor all SQL instances

image

Select Application Log

image

Use event ID 18264 for full backups and 18265 for log backups (I still need to find the event IDs for failed backups), with MSSQLSERVER as the source

image

And an event description that does not contain DPM_SQL. For performance reasons it's recommended not to filter on the description but to find the event parameter that holds the text; since this is a low-volume event, though, performance shouldn't suffer
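The filter the rule applies can be sketched in a few lines of Python. This is only an illustration of the logic, not how Operations Manager evaluates rules internally; the AppLogEvent class and the sample descriptions are made up for the example, and the source name MSSQLSERVER assumes a default SQL instance.

```python
from dataclasses import dataclass

# Event IDs from the post: 18264 = full backup, 18265 = log backup.
BACKUP_EVENT_IDS = {18264, 18265}
EVENT_SOURCE = "MSSQLSERVER"   # assumption: default instance
DPM_MARKER = "DPM_SQL"         # text found in backups taken by DPM


@dataclass
class AppLogEvent:
    """Hypothetical stand-in for an Application log entry."""
    event_id: int
    source: str
    description: str


def is_foreign_backup(event: AppLogEvent) -> bool:
    """Return True for a SQL backup event that was not produced by DPM."""
    return (
        event.event_id in BACKUP_EVENT_IDS
        and event.source.upper() == EVENT_SOURCE
        and DPM_MARKER not in event.description
    )


if __name__ == "__main__":
    # Invented sample descriptions, for illustration only.
    manual = AppLogEvent(18264, "MSSQLSERVER", "Database backed up. Database: Sales")
    by_dpm = AppLogEvent(18264, "MSSQLSERVER", "Database backed up. Device: DPM_SQL")
    print(is_foreign_backup(manual), is_foreign_backup(by_dpm))
```

Only the manual backup would raise the alert; the one carrying the DPM_SQL text is ignored, which is exactly what the "does not contain" criterion is for.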

 

image

And if the rule isn't enabled for all SQL servers we need to create an override

image

And enable the rule for the SQL servers we want to monitor

 

image

And when we create a manual backup we get an alert right away, while the DPM alert only triggers at the next synchronization job

 

This is part 1 of 2; the next post will try to cover the steps for recovery when there are only tapes left.

Data Protection Manager requires a domain to be able to work. This means that in a disaster recovery scenario we need to be able to get Active Directory up and running without the help of Data Protection Manager. In my opinion this is a big issue that everyone needs to get up on the soapbox and yell about. Adding the option to log on with a local account would speed things up and help a lot; this would require that the site is alive and only Active Directory is dead, or that there is a second Data Protection Manager server protecting critical workloads offsite.

Disaster recovery can be triggered by a complete site failure or by a rogue admin disabling all high-privileged accounts, locking admins out of the domain.

The “workaround” is to schedule local backups with Windows Server Backup and then let Data Protection Manager back those up to tape, as we can restore that from a “clean” build. Preferably, copy the backup offsite or write it to tape directly on a server. This will be a cost issue on a lot of smaller sites, but it can't be stressed enough that we need to be able to recover Active Directory without Data Protection Manager.

This is in addition to the normal backup of domain controllers through Data Protection Manager, and would apply to every single backup vendor: always keep a separate native backup of Active Directory. Auditors will complain, but setting up a safe procedure for storing the additional backup is worth the effort.

Reference : http://technet.microsoft.com/en-us/library/cc772519(WS.10).aspx
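If you would rather script the job than click through the wizard, here is a minimal sketch in Python that builds a wbadmin command line for a backup of all critical volumes plus system state. The share \\fileserver\dcbackup is a made-up example, and the flags are the ones I would expect on Windows Server 2008 R2, so check them against wbadmin start backup /? on your own server before relying on this.

```python
import subprocess
from typing import List


def build_wbadmin_command(backup_target: str) -> List[str]:
    """Build a wbadmin command for a critical-volumes + system state backup.

    backup_target is a drive letter or UNC path; the caller chooses it.
    """
    return [
        "wbadmin", "start", "backup",
        f"-backupTarget:{backup_target}",
        "-allCritical",   # every volume needed for bare metal recovery
        "-systemState",   # include system state (Active Directory on a DC)
        "-quiet",         # no prompts, so it can run from a scheduled task
    ]


def run_backup(backup_target: str) -> int:
    """Run the backup; only meaningful on a server with Windows Server Backup."""
    return subprocess.run(build_wbadmin_command(backup_target)).returncode


if __name__ == "__main__":
    # Hypothetical share name; this only prints the command, it does not run it.
    print(" ".join(build_wbadmin_command(r"\\fileserver\dcbackup")))
```

Building the argument list separately from running it makes it easy to review the exact command before anything touches the domain controller.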

 

image

image

image

Set up a schedule

image

image

When the destination is remote, the backup is overwritten each day, so we need to keep some rotation on the destination to ensure that there is more than one generation to recover from if disaster strikes. And again, if there can be a backup to tape, that would be great
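One way to get that rotation is a small script that runs after the nightly backup, copies the fresh backup folder into a dated folder and prunes the oldest copies. The sketch below is in Python and is only an illustration: the function name, the folder layout and the number of generations are my own assumptions, and it expects the archive folder to contain nothing but the dated copies.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import List


def rotate_backup(latest: Path, archive_root: Path, keep: int, today: date) -> List[str]:
    """Copy the latest backup into a dated folder and prune old generations.

    Returns the names of the generations that remain, oldest first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    archive_root.mkdir(parents=True, exist_ok=True)

    # ISO dates (YYYY-MM-DD) sort correctly as plain text.
    target = archive_root / today.isoformat()
    if target.exists():
        shutil.rmtree(target)          # re-run on the same day replaces the copy
    shutil.copytree(latest, target)

    generations = sorted(p for p in archive_root.iterdir() if p.is_dir())
    for old in generations[:-keep]:
        shutil.rmtree(old)             # drop everything but the newest `keep`
    return [p.name for p in generations[-keep:]]
```

Called with keep=3 once a day, this leaves the three most recent generations on the share, so a bad backup on one night does not wipe out the only copy.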

image

And we now have a WindowsImageBackup we can use if disaster strikes

 

I recently ran into a task where a customer wanted simple alerting on all restores that were completed, so this is just a simple alerting rule that triggers on completed jobs

image

Data Protection Manager logs all jobs, including recovery jobs, to the DPM Alerts log on the DPM server, so here we have DPM-EM as the source and event 3112 for Completed

image

In Operations Manager I created a new management pack called Auditing and an NT Event Log alert-generating rule

image

The rule is targeted at DPM Seed, provided in the Data Protection Manager management pack from Microsoft, and the rule is disabled by default

image

We need to monitor the DPM Alerts Event Log

image

And event 3112 from DPM-EM. I still need to see whether the event changes if a recovery operation fails, but for this purpose it was “enough” to start with

image

Severity is set to Informational, as this is a permitted operation that just needs auditing

image

On the rules tab find the newly created rule

image

And create an override “For a specific object of class : DPM seed”

image

And select the DPM server you want monitored. If all servers need to be monitored without admin interaction, create the rule enabled by default instead of disabled

image

And override Enabled to True.
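Put together, the rule and its override behave roughly like the Python sketch below. It is only a model of the logic, not Operations Manager code: the LogEvent and Alert classes, the alert name and the function are invented for the illustration, while the log name, source, event ID and severity are the ones used in this post.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

DPM_ALERTS_LOG = "DPM Alerts"
DPM_SOURCE = "DPM-EM"
RECOVERY_COMPLETED_ID = 3112


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical stand-in for an event log entry."""
    log_name: str
    source: str
    event_id: int
    computer: str


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for the alert the rule raises."""
    name: str
    severity: str
    computer: str


def audit_alert_for(event: LogEvent, enabled_servers: Iterable[str]) -> Optional[Alert]:
    """Return an Informational alert for a completed recovery, else None.

    enabled_servers models the override: the rule is disabled by default
    and only fires on the DPM servers it has been enabled for.
    """
    enabled = {name.lower() for name in enabled_servers}
    if event.computer.lower() not in enabled:
        return None
    matches = (
        event.log_name == DPM_ALERTS_LOG
        and event.source == DPM_SOURCE
        and event.event_id == RECOVERY_COMPLETED_ID
    )
    if not matches:
        return None
    return Alert("DPM recovery completed", "Information", event.computer)
```

A 3112 event from dpm10.test.local raises the alert only when that server is in the enabled list; the same event from any other DPM server is ignored until an override is added for it.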

 

image

image

And on the DPM server verify that the management pack has been delivered and the updated configuration is active

image

And on the DPM server select an object to recover

image

 

Recovery Job started

image

And we can verify in the event log that it has completed.

image

And in our Operations Manager console we can see the alert. We are still missing metadata other than the destination, but that can be looked up in Data Protection Manager

image

And to make the job easier for the auditors we can create a notification subscription

image

Give the subscription a name to remember

image

Set the resolution state New as a criterion for the alert, so closing alerts won't send off a second mail

image

Select the Auditing mailbox as target for the alert

image

And try a recovery job again and verify that the alerting works

note: just a simple post about Auto Protection.

image

One of the many great features of Data Protection Manager 2012 is the ability to auto-protect newly created databases on an instance being protected. This feature was introduced in Data Protection Manager 2010 and is still alive and kicking in DPM 2012.

image

When you create the protection group and drill down to the SQL Server, you can select the databases you want to protect. This gives control but lacks the feature of automatically adding new databases when they are created, so the DPM admin has to rely on the DBA to report that a new database has been created and needs protection.

image

If you select the SQL01 root level of the SQL instance, (Auto) will show next to the SQL Server, indicating that all future databases on that instance will be added automatically to the protection group, ensuring that we can always restore when the business needs it.

One feature still missing in beta 2 is the ability to set Auto for all databases except xyz. I'm not sure it's something we will see in the near future, but it would be a nice one.
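To make the difference concrete, here is a small Python sketch of the three behaviours: picking databases by hand, selecting the instance (Auto), and the wished-for Auto with exceptions. It is purely an illustration of the selection logic; the function, its parameters and the database names are invented, and the excluded option models the missing feature, not something DPM offers.

```python
from typing import Iterable, Optional, Set


def protected_databases(
    on_instance: Iterable[str],
    selected: Optional[Iterable[str]] = None,
    excluded: Iterable[str] = (),
) -> Set[str]:
    """Return the databases that end up protected.

    selected=None models selecting the instance itself (Auto): everything
    on the instance is protected, minus any hypothetical exclusions.
    A concrete selection models picking databases by hand.
    """
    if selected is None:
        return set(on_instance) - set(excluded)
    return set(on_instance) & set(selected)


if __name__ == "__main__":
    databases = ["Sales", "HR", "Scratch", "NewDb"]   # NewDb was created later
    print(sorted(protected_databases(databases, selected=["Sales", "HR"])))
    print(sorted(protected_databases(databases)))
    print(sorted(protected_databases(databases, excluded=["Scratch"])))
```

With a hand-picked selection NewDb stays unprotected until someone adds it, with Auto it is picked up on its own, and the last call shows what "Auto for all databases except xyz" would give us.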