Chris' Musings

about

Name: Christian Thibodeau
From: Victoria, British Columbia, Canada
About me: I am a baby-boomer who is now in his second career after spending 10 years in the military. I now work in the computer field, which also happens to be one of my hobbies. I also like to sing, play the guitar to accompany my singing, and play the piano when I practice (which is pretty much never these days).

Love and Hate

Hate : ---
Love : My wife and family
My music : ---
My books : ----

Monday, August 15, 2005

Shooting ourselves in the foot?

It would appear that the backup problems of the last couple of days stem from two things: the size of the environment and its configuration settings. A scheduler starts every 20 minutes to run scheduled backups. When it starts, it verifies that it can talk to every backup client before running any jobs. Normally that is not a problem. However, if it cannot talk to a backup client, it waits for the full connect timeout before moving on to the next client, and that is where the trouble starts.

We currently have just over 800 clients in this particular backup environment. Judging from the logs, it takes about 5 seconds to communicate with each client, so even if everything works perfectly, it takes about 67 minutes to go through the complete list. Add in the clients that we cannot reach, with a connect timeout of 20 minutes each, and it does not take many unreachable clients before you have a real problem. Last night we had 9 unreachable clients: that's 180 minutes of waiting! Therefore, if the scheduler received the list of jobs to start at 7pm, it would be after 11pm before those jobs would start.

I have reduced the connect timeout to 5 minutes, which is the default for NetBackup. At least now it will be less than 2 hours from submission to running if the same problem occurs again.
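
For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above in Python. It is only a back-of-the-envelope model, not anything NetBackup provides: the function name and parameters are made up for illustration, the client count is rounded to 800, and it assumes the clients are checked strictly one after another, with each unreachable client costing one full connect timeout.

```python
def scheduler_delay_minutes(total_clients, unreachable, secs_per_client, timeout_minutes):
    """Estimate the minutes spent checking clients before any job can start.

    Hypothetical model: reachable clients each cost `secs_per_client` seconds,
    and each unreachable client costs one full connect timeout.
    """
    reachable = total_clients - unreachable
    check_minutes = reachable * secs_per_client / 60
    timeout_wait = unreachable * timeout_minutes
    return check_minutes + timeout_wait


# Figures from this post: about 800 clients, about 5 seconds each, 9 unreachable.
delay_old = scheduler_delay_minutes(800, 9, 5, timeout_minutes=20)
delay_new = scheduler_delay_minutes(800, 9, 5, timeout_minutes=5)

print(f"20-minute timeout: about {delay_old:.0f} minutes before jobs start")
print(f" 5-minute timeout: about {delay_new:.0f} minutes before jobs start")
```

With these inputs the model gives roughly 246 minutes with the old timeout and roughly 111 minutes with the new one, which lines up with jobs submitted at 7pm starting after 11pm before the change and in under 2 hours after it.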

posted by Christian Thibodeau at 12:53 PM
