Virtualization and Benchmarking
The old phrase ‘what cannot be measured cannot be improved’ is a favorite in the computer industry, and it contains more than a kernel of truth. That logic is behind a variety of industry organizations, such as SPEC and TPC, which seek to establish standard benchmarks for various workloads.
Virtualization is certainly one of the trendiest technologies, and it is ripe for measurement. Recent CPU announcements from Intel and AMD have all explicitly showcased improvements in virtualization performance, along with a flurry of feature names like VPID, Pacifica, Nested Page Tables and Extended Page Tables. I’ve described most of these in prior articles, but to summarize, many of these features shift the burden of virtualization from software (that is, the VMM) onto the hardware, by having the hardware take over work that the VMM previously had to do itself. Take, for example, the VPID feature in Nehalem (incidentally, AMD has had an equivalent feature for a while), which reduces transition times between VMs by about a third compared to the prior-generation Penryn. While it’s great that VM transitions are faster, it’s hard for an average user to understand what that means for overall virtualization performance.
Unfortunately, virtualization benchmarking is a very complex and difficult subject. To start with, it’s rather hard to define. Virtualization by itself does no useful work, so it isn’t the thing worth measuring. Server users ultimately care about the performance of their virtualized workloads, which is pretty well understood thanks to benchmarks like SAP Sales and Distribution (SAP SD). In theory, a user could run SAP SD on bare hardware, then on the same hardware with virtualization, and measure the change in performance. That’s by far the simplest and easiest test, but it doesn’t really capture the complexity of virtualization.
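The arithmetic behind that simple native-versus-virtualized test is just the fraction of throughput lost. The sketch below assumes both runs report a throughput figure where higher is better (the function name and inputs are hypothetical, not part of SAP SD or any other benchmark kit):

```python
def virtualization_overhead(native_score: float, virtual_score: float) -> float:
    """Fraction of native throughput lost when the same workload runs in a VM.

    Assumes both scores are throughputs (higher is better) measured on
    identical hardware; the names here are illustrative only.
    """
    if native_score <= 0:
        raise ValueError("native_score must be positive")
    # 0.0 means no loss; 0.25 means the VM delivered 75% of native throughput
    return 1.0 - virtual_score / native_score


# Hypothetical numbers: 1000 units/s on bare hardware, 900 units/s in a VM
print(virtualization_overhead(1000.0, 900.0))  # roughly 0.1, i.e. ~10% overhead
```

A single percentage like this is easy to report, which is exactly why it is tempting, but it says nothing about what happens once several VMs share the machine.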
The aggregate performance of a server across a set of virtualized workloads is much more interesting. It better reflects how virtualization is actually used in the data center, and it adds the complexity of multiple VMs contending for CPU, memory and I/O. That contention should help distinguish mediocre platforms from excellent ones. Next, there is the question of which workloads to put in the VMs: should they be homogeneous (e.g., all print servers on a single piece of hardware) or heterogeneous (mixing DBMS, file servers, etc.), and if heterogeneous, what should the mix be? These questions are rather difficult, and the answers depend on what IT staff actually do in the real world.
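One way to fold a heterogeneous mix into a single number is to normalize each VM’s throughput against a reference run of the same workload and take the geometric mean of those ratios, the approach SPEC uses for its CPU suites. Whether a virtualization benchmark should score this way is an assumption on my part; the `VmResult` type, the workload labels and `aggregate_score` below are hypothetical and only illustrate the idea:

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VmResult:
    workload: str      # hypothetical label, e.g. "database" or "file server"
    measured: float    # throughput inside the VM while all VMs run concurrently
    reference: float   # throughput of the same workload on a reference run


def aggregate_score(results: Sequence[VmResult]) -> float:
    """Geometric mean of per-VM normalized throughputs (an assumed scoring rule)."""
    if not results:
        raise ValueError("need at least one VM result")
    logs = []
    for r in results:
        if r.measured <= 0 or r.reference <= 0:
            raise ValueError(f"non-positive throughput for {r.workload}")
        # Normalizing puts a DBMS and a file server on the same unitless scale
        logs.append(math.log(r.measured / r.reference))
    # Averaging logs and exponentiating is the geometric mean, computed
    # without multiplying many ratios together directly
    return math.exp(sum(logs) / len(logs))


mix = [
    VmResult("database", measured=50.0, reference=100.0),
    VmResult("file server", measured=200.0, reference=100.0),
]
print(aggregate_score(mix))  # 1.0: halving one VM and doubling another cancels out
```

The geometric mean keeps any one workload from dominating the score, but it also hides trade-offs like the one in the example, which is part of why choosing the mix and the scoring rule is genuinely hard.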
Once these questions have been answered, users will have a clearer picture of which features provide real value for virtualization. Unfortunately, there are no standard virtualization benchmarks right now, although SPEC has had a working committee focused on this problem since 2006.
Of course, the work won’t stop there, as other interesting questions remain, such as how to assess the power efficiency of a virtualized solution or how to compare different architectures. Still, some standardized benchmarks would go a long way toward making virtualization more buyer-friendly.