Windows guest servers can rob 25-40% of virtual machine (VM) performance, and this gradual erosion goes largely unnoticed. Companies make an enormous investment in their virtual platform and the storage that supports it. There are a lot of advantages to a virtual environment, but better performance is not necessarily one of them. This blog explains how Windows guest servers can have an adverse impact on total IOPS, disk latency, read rates, and sequential I/O. It also explains VMware’s recommended solution to the problem.
Understanding the Problem
Windows was originally designed to run on a stand-alone computer; no consideration was given to running it as one of several guest servers on a platform designed by another vendor. It should be no surprise that there are areas where the way Windows operates is not friendly to VMware. One of those areas is how Windows saves and processes files.
To understand how Windows impacts VMware performance, we need to grasp how Windows creates and saves a file. Let’s say a user saves a 1GB file to a G: drive on a guest.
- First, the Windows file system (NTFS) creates a file record in the Master File Table (MFT) on the G: drive. The user provides a file name using “Save As…”, and NTFS assigns a File ID and a few attributes. There is a data structure inside the file record called the Extent List. The Extent List is empty at this point, but that is about to change.
- Next, NTFS accesses the $Bitmap File on the G: drive. NTFS created the $Bitmap File when the G: drive was formatted; it keeps track of which clusters on the drive are used and which are free. NTFS moves through the $Bitmap File and grabs chunks of free logical address space wherever it finds them until it has enough space to accommodate the file being saved. Each chunk of reserved space is recorded in the Extent List in the file record. Each Extent List entry records which piece of the file it describes (the Virtual Cluster Number, or VCN), the starting Logical Cluster Number (LCN), and the size of the piece in clusters (Run Length), as seen in Figure 1.
- For our example we will arbitrarily say NTFS “saved” the 1GB file in 100 chunks of about 10MB each, so the Extent List will have 100 rows. The information in each row, plus its 10MB of data, is sent across the hypervisor to the storage controller as a SCSI command (IOP).
- The storage controller maps the incoming SCSI information to its own view of the array, and the controller software decides where to write that chunk of the file. If there are 100 incoming SCSI commands, the controller will issue a minimum of 100 disk I/Os (striping can increase the I/O count per SCSI command).
- Any subsequent access to this file will require 100 SCSI commands and at least 100 disk I/Os. When you consider all the users accessing millions of files every day, the SCSI and disk I/O workloads can have a big impact on LUN queue contention, disk latency, read rates and sequential reads.
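The steps above can be sketched in a few lines of Python. This is a simplified model rather than NTFS code: the names `Extent` and `allocate` are made up for illustration, the bitmap is a plain list of booleans, and the first-fit walk is an assumption about allocation order. It only shows how scattered free space turns one file into several Extent List rows.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    vcn: int         # Virtual Cluster Number: offset of this piece within the file
    lcn: int         # Logical Cluster Number: where the piece starts on the volume
    run_length: int  # size of the piece in clusters


def allocate(bitmap: List[bool], clusters_needed: int) -> List[Extent]:
    """Claim free runs from the bitmap until the file fits; return its extent list.

    bitmap[i] is True when cluster i is in use. Hypothetical first-fit model.
    """
    if bitmap.count(False) < clusters_needed:
        raise ValueError("not enough free clusters on the volume")

    extents: List[Extent] = []
    vcn = 0
    i = 0
    while vcn < clusters_needed:
        if bitmap[i]:
            i += 1
            continue
        start = i
        # Extend the run while clusters are free and the file still needs space.
        while (i < len(bitmap) and not bitmap[i]
               and vcn + (i - start) < clusters_needed):
            bitmap[i] = True  # mark the cluster as used
            i += 1
        extents.append(Extent(vcn, start, i - start))
        vcn += i - start
    return extents


# A tiny volume: True = used, False = free. Free space sits in three separate runs.
volume = [True, False, False, True, False, False, False, True, False, False]
extent_list = allocate(volume, clusters_needed=6)
for row in extent_list:
    print(row)
print(len(extent_list), "extents; one SCSI command each in the model described above")
```

On this toy volume the six-cluster file ends up in three extents, because no single free run is large enough to hold it.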
This mechanism was developed for Windows NT4 in 1996 and has been the same in every Windows server and workstation version since. As explained here, the NTFS file saving process creates excess IOPs and disk I/O that clog the system, impacting performance and throughput.
VMware recognizes that Windows creates excess I/O and addresses the issue in its documentation. Under the topic Disk I/O Performance Enhancement Advice, VMware recommends that you “Defragment the file systems on all guests.” VMware has this absolutely correct: fragmentation and defragmentation occur at the file system level, not at the storage level. Note that this recommendation does not differentiate between flash-based and non-flash storage; the advice is to defragment regardless of the underlying storage.
File fragmentation occurs when NTFS “saves” those chunks of free space. For Windows and NTFS, the file in Figure 1 above is fragmented because its Extent List has more than one entry. As described, the storage controller decides where data is written in the array. Windows and NTFS deal only with the MFT and the Extent List; they have no idea what kind of storage is attached to the servers/hosts.
What will the recommended defragmentation do? Defragmentation software scans the MFT and identifies the Extent Lists with more than one entry; these are the fragmented files. Using the Windows FSCTL_MOVE_FILE control code and its own algorithms, it attempts to consolidate each file into a single, contiguous chunk of logical clusters, as seen in Figure 2. In our example, a single, larger SCSI command now conveys the file to the controller instead of 100, and that’s a 99% reduction in the SCSI workload!
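To make the arithmetic concrete, here is a rough Python sketch of that scan-and-consolidate idea. It is a model only: the function names are hypothetical, the 4KB cluster size is an assumption, and `new_lcn` stands in for the contiguous free region a real tool would have to find before moving anything with FSCTL_MOVE_FILE.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int, int]  # (VCN, starting LCN, run length in clusters)


def fragmented_files(extent_lists: Dict[str, List[Extent]]) -> List[str]:
    """Return the files whose Extent List has more than one entry."""
    return sorted(name for name, extents in extent_lists.items() if len(extents) > 1)


def consolidate(extents: List[Extent], new_lcn: int) -> List[Extent]:
    """Model the result of defragmenting one file: a single contiguous run.

    new_lcn is a hypothetical target location; finding it is not modeled here.
    """
    total_clusters = sum(run_length for _, _, run_length in extents)
    return [(0, new_lcn, total_clusters)]


def command_reduction_pct(extents_before: int, extents_after: int) -> float:
    """Percentage drop in SCSI commands, assuming one command per extent."""
    return 100.0 * (extents_before - extents_after) / extents_before


# The example from the text: a 1GB file saved as 100 pieces of 10MB each.
# Assuming 4KB clusters, 10MB is 2,560 clusters per piece.
report = [(i * 2560, 10_000 + i * 5_000, 2560) for i in range(100)]
files = {"report.dat": report, "notes.txt": [(0, 42, 1)]}

print(fragmented_files(files))
defragmented = consolidate(report, new_lcn=900_000)
print(command_reduction_pct(len(report), len(defragmented)), "% fewer SCSI commands")
```

The one-command-per-extent assumption mirrors the model used in this post; a real I/O stack may split or merge requests differently.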
While more attention is given to file fragmentation, the real culprit is free space fragmentation. Free space fragments are created when files are deleted. As free space fragmentation increases, the chunks of free space available per the $Bitmap File are smaller. As a result, free space fragmentation exacerbates file fragmentation. A good enterprise defragmentation tool should also consolidate the free space on the logical drive.
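A small Python sketch shows why deleted files matter. It is illustrative only: the helper name `free_runs` and the twelve-cluster volume are invented for the example, and a real $Bitmap File is a packed bit array rather than a list of booleans.

```python
from typing import List, Tuple


def free_runs(bitmap: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_cluster, length) for each contiguous run of free clusters.

    bitmap[i] is True when cluster i is in use.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i                      # a free run begins here
        elif used and start is not None:
            runs.append((start, i - start))  # the free run just ended
            start = None
    if start is not None:
        runs.append((start, len(bitmap) - start))
    return runs


# Files A (clusters 0-2), B (3-4), C (5-7) and D (8-9) were written back to back,
# leaving clusters 10-11 free. Then B and D were deleted.
volume = [True] * 3 + [False] * 2 + [True] * 3 + [False] * 4

runs = free_runs(volume)
largest = max(length for _, length in runs)
print(runs)
print("largest free run:", largest, "clusters")
```

Six clusters are free in total, but the largest contiguous run is only four, so a new five-cluster file cannot be saved in one piece. That is how free space fragmentation feeds file fragmentation.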
Defragmentation consolidates logical address space at the NTFS level so files can be sent to the controller in fewer, larger SCSI commands. How does this affect everyday performance in a VMware environment? Testing was performed using VMware’s own vSCSIStats utility. The same tests were run on identical sets of disks, one set defragmented and the other fragmented. Here are the results:
- 28% improvement in total IOPS
- 49% improvement in disk latency
- 1200% improvement in large I/O (>524K, the largest size measured by vSCSIStats)
- 58% improvement in sequential I/O
In a real-world environment, a UK company that had never defragmented any of its 340 VMs had HP personnel monitor its 3PAR array before and after defragmentation. HP reported the following changes:
- 31% improvement in total IOPS
- 80% improvement in peak latency
- 33% improvement in core latency
- 43-hour improvement in full system backup time
The company determined they could add 110 new VMs with no additional hardware.
Quirks and Concerns
As technologies evolve, issues arise and there are often mixed opinions on how to deal with them. This is certainly true of defragmentation in virtual environments. Here are several of the areas where defragmentation has adapted to new ways of doing things:
Changed Block Tracking (CBT) – Backup products using CBT will see a lot of changed blocks when a defragmentation occurs. A good enterprise defragmentation solution will offer the ability to set alerts/warnings that tell the user when a machine is having an issue (e.g., file fragmentation exceeds 20%). When notified, you only need to act on a single drive, minimizing any CBT concerns. Admittedly, there is a tradeoff between the impact of fragmentation and the impact of CBT.
Thin-provisioned Drives – Enterprise defragmentation software will detect a thin-provisioned drive and use alternative defrag strategies that avoid inflating the storage allocated to the drive.
“Management By Exception” – This approach to defragmentation focuses on the systems that are having issues. Once all the VMs are optimized, the alerts/warnings in the defrag software notify you when a system is having issues. There is no real need to defragment systems on a scheduled basis; you only address the exceptions.
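The exception-based approach boils down to a threshold check. Here is a minimal sketch in Python, assuming you already have a per-drive fragmentation percentage from your defrag tool's reporting; the function name, the drive labels and the sample numbers are all hypothetical, and the 20% threshold is simply the example figure used above.

```python
from typing import Dict, List


def drives_needing_defrag(fragmentation_pct: Dict[str, float],
                          threshold: float = 20.0) -> List[str]:
    """Return the drives whose file fragmentation exceeds the alert threshold."""
    return sorted(drive for drive, pct in fragmentation_pct.items() if pct > threshold)


# Made-up sample readings, keyed by "guest:drive".
readings = {
    "sql01:G": 34.5,
    "file02:E": 8.0,
    "web03:C": 20.0,   # exactly at the threshold, so no alert
    "exch01:F": 27.1,
}

for drive in drives_needing_defrag(readings):
    print("alert:", drive)
```

Only the flagged drives are defragmented, which also limits how many changed blocks the next CBT-based backup has to pick up.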
VMkernel Metrics and Defragmentation
There are several VMkernel internal settings that control how many I/Os a VM can issue to a LUN before, in the interest of fairness, it must relinquish its turn and allow another VM to issue I/O to the same LUN. Defragmentation can get you extra I/O requests and more sequential I/O based on these VMkernel settings:
- Disk.SchedNumReqOutstanding – This setting controls the number of outstanding I/O requests a VM can have on a LUN it shares with other VMs. Defragmentation affects it indirectly, through the two settings described next. VMware documents the default value for this setting as 32.
- Disk.SchedQuantum – This is the maximum number of “sequential” I/Os a VM may issue before the hypervisor must switch to a different VM. If a VM shows a pattern of sequential I/O, it makes sense to allow it more I/O requests than Disk.SchedNumReqOutstanding alone would permit, so that sequential processing isn’t interrupted. The default value for Disk.SchedQuantum is 8. But how do you know a VM is doing sequential I/O?
- Disk.SectorMaxDiff – The default for this setting is 2000 sectors. If the next I/O lands within 2000 sectors of the previous I/O, the two are considered sequential. When the I/O is sequential, the VM’s allowance under Disk.SchedNumReqOutstanding can be extended by the 8 I/Os allowed by Disk.SchedQuantum.
This demonstrates how defragmentation can help squeeze more performance out of a VM by taking advantage of VMware actions to enforce fair distribution of resources. Defragmented guests increase the chance of any VM being granted additional I/O requests due to an increased incidence of sequential data.
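The proximity test is easy to model. The Python sketch below is not the VMkernel scheduler, only an illustration of the rule as described above; the function names and the sample sector numbers are invented, and treating a gap of exactly 2000 sectors as sequential is an assumption.

```python
from typing import Sequence

SECTOR_MAX_DIFF = 2000  # Disk.SectorMaxDiff default cited in the text


def is_sequential(prev_sector: int, next_sector: int,
                  max_diff: int = SECTOR_MAX_DIFF) -> bool:
    """True when the next I/O starts within max_diff sectors of the previous one."""
    return abs(next_sector - prev_sector) <= max_diff


def sequential_fraction(start_sectors: Sequence[int]) -> float:
    """Share of consecutive I/O pairs that pass the proximity test."""
    pairs = list(zip(start_sectors, start_sectors[1:]))
    if not pairs:
        return 0.0
    return sum(is_sequential(a, b) for a, b in pairs) / len(pairs)


# A defragmented file read in 512KB requests: each I/O starts 1024 sectors later.
contiguous_reads = [0, 1024, 2048, 3072]
# The same file when fragmented: its pieces are scattered across the volume.
scattered_reads = [0, 50_000, 1_200, 90_000]

print(sequential_fraction(contiguous_reads))
print(sequential_fraction(scattered_reads))
```

In this toy comparison every I/O in the contiguous stream qualifies as sequential and none in the scattered stream does, which is the difference that earns a VM its extra requests.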
When looking at enterprise defragmentation software there are several things you want to consider that can make its administration easier. Some of the basic functions include:
- Centralized management
- Ability to work with thin-provisioned drives
- Prevention of clone or snapshot defragmentation
- No interference with operations
- Alerts and warnings (identifying any single machine with issues)
- Organizational unit (OU) support
Defragmentation can be done on a regular schedule or selectively, as the situation requires. An initial defragmentation can take some time to get things in order, but subsequent runs are usually quick.
As you have learned here, there is a reason VMware recommends that you “Defragment the file systems on all guests.” File and free space fragmentation are natural byproducts of using Windows. Microsoft has published numerous articles on the need for defragmentation, and it still includes a defrag tool with Windows (albeit one not designed for a VMware environment). With use, fragmentation can reduce virtual server performance by 25-40%.
The Windows mechanism that creates fragmentation is the same today as it was when Windows NT4 rolled out in 1996. Virtual servers running Windows should expect the same behavior observed on physical servers. As we have seen here, fragmentation creates excess and unnecessary I/O workloads. As a result, I/O bottlenecks can occur at the LUN queue level and at the disk level in the form of increased latency. Defragmentation resolves these issues.
Making file system defragmentation part of your routine maintenance allows you to get more from the investment in your virtual IT environment. Servers run at their potential performance levels, and system errors are minimized. Remember, no one moved to VMware to have slow servers.