Castle Rock, Colorado, is a town of 50,000 people located in the foothills of the Rocky Mountains, about 30 miles south of Denver. The town’s IT department supports a full range of municipal government applications for public safety, utilities, public works, fire, police and finance. The IT environment largely runs on VMware, with 8 ESXi hosts and 50 virtual machines running a combination of Windows 2003, Windows 2008 and Windows 2008 R2. The server population consists of SQL servers, application servers, file servers and file shares, with application servers making up the bulk. The backing storage for the virtual servers is a NetApp FAS2040 SAN.
The Problem: System I/O Issues and Backing Up 7 TB+ of Data
The Castle Rock IT team has a mandate to perform weekly full system backups of its 7+ terabytes of data. The team uses a state-of-the-art CommVault enterprise backup platform that has 4 drives totaling 10 TB of storage. Because of the town’s backup retention requirements, the backups run in a disk-to-disk-to-tape sequence. The problem was that the backups were taking over 72 hours to complete, degrading system performance and user response time while they ran. With backup times this long, there was no margin for error in the backup process; the IT staff could not afford restarts. Furthermore, as data volume increased, the problem was only going to get worse.
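To put those figures in perspective, the short sketch below estimates the average throughput implied by the numbers quoted above. It assumes exactly 7 TB moved in exactly 72 hours and uses decimal units (1 TB = 10^12 bytes); the function name is illustrative only. Since the case study says "7+" terabytes and "over 72 hours", the result is a rough figure, not a measured one.

```python
def effective_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average throughput in MB/s, using decimal units (assumption):
    1 TB = 1e12 bytes, 1 MB = 1e6 bytes."""
    total_bytes = data_tb * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    # Figures taken from the text: roughly 7 TB in roughly 72 hours.
    rate = effective_throughput_mb_s(7, 72)
    print(f"Average backup throughput: {rate:.1f} MB/s")
```

Under these assumptions the backup averages about 27 MB/s across the whole window, which gives a sense of how slowly data was moving end to end.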
The IT staff looked at all the usual areas that are considered when I/O issues arise. The backup system was tweaked and optimized according to specifications. The network, always a suspect when bottlenecks are an issue, was analyzed and adjustments were made. There were meetings with HP to troubleshoot the disk drives. All of these adjustments provided some improvement, but it just wasn’t enough. Despite all the team’s efforts, backups that started at 5 p.m. on Friday were still completing on Monday or Tuesday. Something had to be done.
The Culprit: Fragmented Thin-Provisioned Virtual Servers & Backup Disks
John Kilman, Castle Rock’s Server Administrator, did some research and came across an article that indicated file and free space fragmentation can affect system performance. John wondered if file and free space fragmentation might be part of the backup problem. After all, it takes longer to read and write a fragmented file, and backup is a read/write-intensive process.
To read more, download the case study: 33% Savings in Backup Time of VMware Environment