Backup of 19 GB in 10 to 11 hours? Something’s wrong.

Written by Anders on May 1, 2008 – 5:39 am

Another client example took 18 months to solve, mainly because the consultant on the project was called away to architect another backup solution, only to return and find that the original problem had never been remedied. This didn’t involve failed backup jobs, crashing servers, or anything else so drastic. It was a simple case of backup speed. The client assured the consultant that there was a full T3 (45 Mb/sec) between their buildings, so backups ought to be fast. (We won’t debate why they chose to back up across this T3 rather than put another backup server in the other facility. Let’s just say budget was the primary reason.) So it was on the consultant’s shoulders to make this work, and work well.

The numbers say that a full T3 should push 5.625 MB (that’s megabytes) per second, 337.5 MB per minute, or about 20 GB per hour. The client’s server network was 100BT (100 Mb/sec), which should push 12.5 MB/second, 750 MB/minute, or 45 GB/hour. As you can see, it definitely looked like the T3 was going to be the bottleneck for our backup jobs. However, as we began testing and tweaking, we found that backups of the remote servers were taking exceptionally long, far longer than the T3 alone would explain. We made sure that the servers were configured properly: specifically, that their NICs were set to 100 FULL and not AUTO, since we know that some switches, and a certain server manufacturer’s motherboard Ethernet interface, do not play well when set to AUTO-NEGOTIATE.
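The conversion is just division by eight (bits to bytes) followed by multiplication by 60 and 3,600. A minimal Python sketch, assuming decimal units (1 GB = 1,000 MB) and the article’s rounded 45 Mb/sec for the T3, reproduces those figures and picks the slowest hop as the bottleneck. The `link_capacity` helper and the `links` table are illustrative names, not part of any backup product.

```python
# Back-of-the-envelope link capacity using the article's round numbers:
# 45 Mb/sec for the T3 and 100 Mb/sec for 100BT.
# Decimal units throughout: 1 MB = 1,000,000 bytes, 1 GB = 1,000 MB.

BITS_PER_BYTE = 8


def link_capacity(mbit_per_sec: float) -> dict:
    """Convert a line rate in megabits/sec to megabytes per second/minute and GB per hour."""
    mb_per_sec = mbit_per_sec / BITS_PER_BYTE
    return {
        "MB/sec": mb_per_sec,
        "MB/min": mb_per_sec * 60,
        "GB/hour": mb_per_sec * 3600 / 1000,
    }


# Hypothetical table of the hops on the backup path, in Mb/sec.
links = {"T3": 45.0, "100BT": 100.0}

for name, rate in links.items():
    cap = link_capacity(rate)
    print(f"{name:6} {cap['MB/sec']:7.3f} MB/sec "
          f"{cap['MB/min']:8.1f} MB/min {cap['GB/hour']:6.2f} GB/hour")

# The slowest hop on the path sets the ceiling for the whole transfer.
bottleneck = min(links, key=links.get)
print("bottleneck:", bottleneck)
```

Exact arithmetic gives 20.25 GB/hour for the T3, which the text rounds to 20. These are ceilings; protocol overhead means real transfers come in somewhat lower.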

All of the network issues were worked out on our end, and we made sure the administrators at the other location verified all of their network points as well, including server, switch, and router. They assured us everything was A-OK. When we ran the second test, we got the same performance numbers. With all of our network connection points properly configured, we felt it was necessary to take the issue to the network group and have them look at the link between the buildings. Maybe the backup T1 was actually carrying the traffic as primary and the T3 was secondary.
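On a Linux server, one way to confirm what a NIC actually negotiated is to read the kernel’s sysfs attributes `/sys/class/net/<iface>/speed` and `/sys/class/net/<iface>/duplex`; the Python sketch below does that. It is a hypothetical helper (`read_link` and `check_link` are our names), the story doesn’t say which operating system the servers ran, and it says nothing about switch or router ports, which have to be checked with their own management tools. It also only reads the negotiated state; forcing 100 FULL is a separate step (with `ethtool` on Linux, for example).

```python
import sys
from pathlib import Path

# Hypothetical helper: report what a Linux NIC actually negotiated.
# The kernel exposes /sys/class/net/<iface>/speed (in Mb/sec) and
# /sys/class/net/<iface>/duplex ("full", "half", or "unknown").
# This only reads the negotiated state; it does not change any settings.

SYSFS_NET = "/sys/class/net"


def read_link(iface: str, sysfs_root: str = SYSFS_NET) -> tuple:
    """Return (speed_mbit, duplex) for an interface; speed is None if unreadable."""
    base = Path(sysfs_root) / iface
    try:
        speed = int((base / "speed").read_text().strip())
    except (OSError, ValueError):
        speed = None  # e.g. link down, or a driver that does not report speed
    try:
        duplex = (base / "duplex").read_text().strip()
    except OSError:
        duplex = "unknown"
    return speed, duplex


def check_link(iface: str, want_speed: int = 100, want_duplex: str = "full",
               sysfs_root: str = SYSFS_NET) -> list:
    """List mismatches against the expected setting; an empty list means OK."""
    speed, duplex = read_link(iface, sysfs_root)
    problems = []
    if speed != want_speed:
        problems.append(f"{iface}: speed is {speed} Mb/sec, expected {want_speed}")
    if duplex != want_duplex:
        problems.append(f"{iface}: duplex is {duplex}, expected {want_duplex}")
    return problems


if __name__ == "__main__":
    # Usage: python check_link.py eth0 eth1
    for name in sys.argv[1:]:
        for problem in check_link(name):
            print(problem)
```

Taking the sysfs root as a parameter keeps the helper testable against a fake directory tree instead of a live machine.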

One thing we ought to mention, though, is that the IT team in the remote facility didn’t get along very well with their counterparts in the primary facility. This is never a good thing. Teamwork and communication are key to any successful IT organization.

After the network department assured us that the T3 was primary and wasn’t even close to being fully utilized, we turned our attention back to the network configurations in the two buildings. Naturally, we reviewed the primary building, where our consultant was working, a second time to make absolutely sure it was configured properly. However, the IT team at the remote site didn’t feel it was necessary to recheck their work, since, as they told us, they had already done so.

About this time, our consultant was pulled off to work on the architecture job. He had told the customer that the problem appeared to be at the remote site, but without physical access it was difficult to prove. While the consultant was away, the test backup policies went into production, and backup speeds remained very poor. Unfortunately, the customer didn’t pursue the speed issue either, mainly because of limited staff.

When the consultant returned to complete the project, he found that the problem still existed; the client had simply grown accustomed to the performance and figured that was the way it was going to be. Well, not our consultant: he insisted that the network group put a sniffer (network analyzer) on the line to see exactly what was happening between the main backup server, the switch, the router, and the backup client. At 10:00 P.M., the network administrator found the problem. Everything was connecting perfectly at 100 FULL DUPLEX at the primary site until the traffic crossed the T3. The consultant noted that the connection was dropping to 10 Mb/sec after the router, which would explain the performance numbers.

With hard information in hand, we approached the remote site again and asked them to walk the wire, because somewhere on their end the connection was dropping to 10 Mb/sec. Fortunately, we found an administrator at the remote site who was more than willing to work with us. He traced the problem to the switch, which was set not to 100 FULL but to AUTO-NEGOTIATE. Once that was changed, our backup speeds went from 19 GB in 10 to 11 hours to 19 GB in just a little over 90 minutes.
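It is worth sanity-checking those before-and-after figures against the link speeds discussed earlier. The sketch below, again in Python with decimal units and illustrative names, treats "a little over 90 minutes" as exactly 90, so the post-fix rate is a slight overestimate. By this arithmetic the pre-fix backups averaged roughly 0.5 MB/sec (about 4 Mb/sec), well under even a 10 Mb/sec hop, while the post-fix rate of roughly 3.5 MB/sec is nearly three times the 10 Mb/sec ceiling and about 63% of the T3’s. At full T3 speed with no overhead, 19 GB would need about 56 minutes.

```python
# Effective throughput from the figures in the story: 19 GB in 10 to 11 hours
# before the fix, and a little over 90 minutes after (treated here as 90).
# Decimal units: 1 GB = 1,000 MB.


def throughput_mb_per_sec(gigabytes: float, seconds: float) -> float:
    """Average transfer rate in megabytes per second."""
    return gigabytes * 1000 / seconds


def as_mbit(mb_per_sec: float) -> float:
    """Convert megabytes/sec to megabits/sec."""
    return mb_per_sec * 8


T3_MB_PER_SEC = 45 / 8          # 5.625 MB/sec ceiling over the T3
TEN_MBIT_MB_PER_SEC = 10 / 8    # 1.25 MB/sec ceiling for a 10 Mb/sec hop

runs = {
    "before (11 h)": throughput_mb_per_sec(19, 11 * 3600),
    "before (10 h)": throughput_mb_per_sec(19, 10 * 3600),
    "after (90 min)": throughput_mb_per_sec(19, 90 * 60),
}

for label, rate in runs.items():
    fits_10 = "yes" if rate < TEN_MBIT_MB_PER_SEC else "no"
    print(f"{label:15} {rate:5.2f} MB/sec = {as_mbit(rate):5.1f} Mb/sec, "
          f"{rate / T3_MB_PER_SEC:.0%} of the T3, "
          f"fits under a 10 Mb/sec hop: {fits_10}")

# Minimum time for 19 GB if the T3 ran flat out with no overhead.
best_case_minutes = 19 * 1000 / T3_MB_PER_SEC / 60
print(f"best case over the T3: {best_case_minutes:.0f} minutes")
```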

Sometimes the culprit is not the software configuration, the hardware configuration, or the backup administration; it is simply an oversight somewhere in the network architecture. This problem differed from the previous one in that the consultant was the de facto backup administrator for the group, so he didn’t necessarily have to ask any questions, but he did have to understand where the breakdown was occurring. An intimate understanding of how the backup product works is a considerable help when troubleshooting problems like this.

About The Author: Anders

Anders is a freelance graphic designer. He specializes in CSS/XHTML web design and the design of print materials, including business cards, brochures, and flyers. You can view his portfolio at andershaig.com.
