Did you ever wonder why virtual server I/O performance is slow? You have the most up-to-date virtualization software, the newest version of Windows Server on the guests, and fast storage, so why is performance poor? There are three main reasons this can happen:
- Virtual servers share resources on the host, so busy systems may not get the resources they need
- All the I/O goes to one array, increasing the potential for I/O bottlenecks
- Proper file system maintenance isn’t being done on guest servers, impacting performance
Problem 1 can be addressed with some tweaks to the VM priorities. The vSphere Resource Management Guide does a good job of explaining how to do this: https://docs.vmware.com/en/VMware-vSphere/7.0/vsphere-esxi-vcenter-server-70-resource-management-guide.pdf
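To make the idea of VM priorities concrete, here is a minimal sketch of proportional sharing, the general principle behind resource shares: when VMs compete for a resource, each gets a slice in proportion to its share value. The VM names, share values, and capacity figure below are hypothetical, and the function is an illustration only, not vSphere's actual scheduler.

```python
def split_by_shares(shares, capacity):
    """Divide a contended resource in proportion to each VM's share value.

    Simplified illustration: ignores reservations, limits, and idle VMs.
    """
    total = sum(shares.values())
    return {vm: capacity * value / total for vm, value in shares.items()}


# Hypothetical VMs and share values; capacity is in arbitrary units (e.g. IOPS).
allocation = split_by_shares(
    {"sql-vm": 2000, "web-vm": 1000, "test-vm": 1000},
    capacity=8000,
)

for vm, amount in allocation.items():
    print(f"{vm}: {amount:.0f}")
```

In this sketch the VM with twice the shares receives twice the slice under contention, which is why raising the share value of a busy system can help with Problem 1.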
Problems 2 and 3 are related. The intent of this article is to explain how proper file system maintenance on guest servers eliminates I/O bottlenecks and improves overall VM performance by 25-40%.
File System Operation
Windows is the dominant OS on most VMs, so we are going to focus there. Not surprisingly, the Windows file system (NTFS) is responsible for all I/O on Windows systems, and it also creates some of the problems that impact I/O performance. The following paragraphs explain how NTFS, in the course of its normal operations, creates I/O bottlenecks that impact performance.
When a user “saves” a file, NTFS reserves chunks of logical address space on the target disk. These chunks of space are recorded in the file record in the MFT. NTFS then sends information about each chunk of space, along with the appropriate amount of file data, to the storage controller in the form of a SCSI command. If NTFS saved the file in 100 chunks of space, it will take 100 SCSI commands to move the file to the controller.
The controller takes the incoming SCSI commands and maps them to its view of the array. An incoming file delivered in 100 SCSI commands will generate at least 100 disk I/Os to the array (1 SCSI command = 1 disk I/O). Every time a user requests this file, it will require 100 SCSI commands and 100 disk I/Os. When you consider all the files on the system and all the file requests, it is easy to see how the SCSI workload can cause I/O contention at the HBA-LUN queue. Since the SCSI workload also drives the number of disk I/Os generated, an increased SCSI workload also increases disk latency.
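The arithmetic above can be sketched in a few lines of Python. This is a simplified model that takes the article's 1:1 rule (one chunk = one SCSI command = at least one disk I/O) at face value. The Extent type, the cluster numbers, and the read count are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous chunk of logical address space (hypothetical units)."""
    start: int   # first logical cluster of the chunk
    length: int  # number of contiguous clusters in the chunk


def scsi_commands_per_access(extents):
    # Simplified model from the text: one SCSI command per chunk.
    return len(extents)


def total_scsi_commands(extents, accesses):
    # Every access to the file repeats the full set of commands.
    return scsi_commands_per_access(extents) * accesses


# A file saved in 100 separate chunks of 4 clusters each.
fragmented_file = [Extent(start=i * 10, length=4) for i in range(100)]

print(scsi_commands_per_access(fragmented_file))           # 100
print(total_scsi_commands(fragmented_file, accesses=50))   # 5000
```

Under this model, 50 requests for one 100-chunk file already produce 5,000 commands, which is how the workload across all files builds up at the queue.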
File System Maintenance
As we described above, the number of SCSI commands sent to the storage controller is determined by how many chunks of logical address space NTFS reserved to save a file. To reduce the SCSI workload, you must reduce the number of chunks of logical address space. VMware understands this, which is why the vSphere documentation recommends that you “Defragment the file systems on all guests” (scroll to Disk I/O Performance Enhancement Advice): https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-174326D5-238B-48CA-B030-02009E388523.html Microsoft has also written extensively on the importance of defragmentation.
Guest defragmentation consolidates the chunks of address space into a single chunk of contiguous space. This means the file can be accessed in a single, larger SCSI command. When all the files on a network are optimized in this way, there is a significant reduction in the total SCSI workload, which eliminates any I/O contention at the LUN queue. The SCSI workload reduction results in the storage controller issuing fewer and larger disk I/Os, which improves disk latency.
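The effect of consolidation can be illustrated with a rough before-and-after count. This sketch assumes a hypothetical cap on how many clusters one command can carry, since a real transfer has some maximum size. The chunk sizes and the cap are illustrative numbers, not measurements, and defragment() here only models the end state rather than any product's algorithm.

```python
import math


def commands_needed(chunk_lengths, max_clusters_per_command):
    """Count commands when each chunk is sent separately, split at the cap."""
    return sum(
        math.ceil(length / max_clusters_per_command)
        for length in chunk_lengths
    )


def defragment(chunk_lengths):
    """Model the end state only: all chunks merged into one contiguous run."""
    return [sum(chunk_lengths)]


CAP = 256                  # hypothetical maximum clusters per command

before = [4] * 100         # 100 chunks of 4 clusters each
after = defragment(before) # one chunk of 400 clusters

print(commands_needed(before, CAP))  # 100
print(commands_needed(after, CAP))   # 2
```

With a cap large enough to cover the whole file, the consolidated file needs a single command, as the text describes. With the smaller cap assumed here it needs two, still far fewer than the 100 required before consolidation.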
Quantifiable Results with PerfectDisk vSphere or PerfectDisk Hyper-V
A UK company had a large SQL, SharePoint and Exchange environment running on VMware. The IT team had HP come and monitor their 3PAR array for a week before and after running PerfectDisk. After PerfectDisk ran, HP reported the following:
- 31% improvement in total IOPS
- 80% improvement in disk latency
- 43 hours slashed from full system backup
The only significant action on the system that could have caused these results was PerfectDisk. The company determined that the resources they were saving would let them add 110 more VMs with no additional hardware.