[Ksummit-2012-discuss] [ATTEND] Memory management
mgorman at suse.de
Wed Jun 20 12:18:22 UTC 2012
On Tue, Jun 19, 2012 at 07:16:07PM +0200, Johannes Weiner wrote:
> I would like to discuss if there is interest in a framework for
> benchmark result comparison. Everyone seems to have their own
> (ad-hoc) benchmark suites with custom invocation, evaluation, and
> comparison scripts.
We should be wary of creating a new one because there are a few out there
already. autotest was created years ago, although I have no idea how well
it is maintained. The Phoronix Test Suite also exists and, while I have never
used it myself, it should certainly be considered before creating a
new framework. There are others. I use MMTests myself but have been very
poor at releasing it and making its data available. By rights I
should have dumped the basis of MMTests a long time ago and reimplemented
everything in autotest, but I didn't.
Here is an example report from it which is ugly as hell but it exists at
> To make both test and evaluation recipes easily
> exchangable, I hacked together some tools that take job spec files
> describing workload and data gathering, and evaluation spec files
> describing how to extract and present items from the collected data,
> to compare separate collection runs (kernel versions) at the push of a
> button.
MMTests does the same thing although, because of the age of parts of it and
how quickly some parts of it were actually implemented, it can be a bit
messy. The config files are there and also describe which external monitors
should be run (such as vmstat, iostat, slabinfo etc).
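For illustration, an MMTests config is roughly a shell fragment that the
driver sources before running a benchmark. The variable names below are
approximations of that style, not the exact MMTests interface:

```shell
# Illustrative sketch only -- variable names approximate the MMTests
# config style and are not its exact interface.
export MMTESTS="kernbench"             # which benchmark(s) to run
export RUN_MONITOR=yes                 # enable external monitors
export MONITORS="vmstat iostat slabinfo"
export MONITOR_UPDATE_FREQUENCY=10     # sample interval in seconds
```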
> It also has individual tools that can be stuck together in
> shell pipes to quickly explore, plot etc. a set of data.
This is a mixed bag for me. Some of the scripts are properly split out,
but the reporting is generally a monolithic mess that needs fixing. In
the past this was mostly a manual task that I have only recently begun to
address. It was all a lower priority once the test data and basic
reporting were in place.
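As a hypothetical sketch of the kind of shell-pipe exploration described
above (the file name and column layout are assumptions, not statutils'
actual interface), extracting one metric column and summarising it is a
one-liner:

```shell
# Hypothetical data file: <timestamp> <threads> <ops_per_sec> per line.
printf '1 1 100\n2 2 200\n3 4 300\n' > results.log

# Pull out the third column and compute the mean with awk.
awk '{ sum += $3; n++ } END { printf "%.0f\n", sum / n }' results.log
# prints "200"
```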
> It is not meant to replace existing suites, but rather to wrap them in job and
> evaluation recipes that are easy to modify, extend, and share. Maybe
> this can be a foundation for building a common benchmark suite that
> can be transferred and set up quickly, and includes agreed upon
> performance/behaviour metrics so people can do regression testing
> without much effort? Code for the daring is available here:
I've already done quite a bit of work on creating test configurations
aimed at measuring particular scenarios or focusing on a particular
area. Conceivably this could be ported to another framework if necessary,
but it would be very time-consuming for me, which is why I have been
reluctant. Not all of them have been properly validated yet, as some are
relatively new and some double-checking is required.
I glanced through statutils. The job config spec file feels similar to
autotest at least and is sensible in concept. I took a different approach
with split-out configuration and benchmark files. Not necessarily better or
worse, just what I did. I am completely inconsistent about how configuration
parameters are passed in to the scripts because of the history of some
of the scripts being ad-hoc and run differently. The benchmark scripts
should also have used macros, but this is still possible to implement as
the scripts in the tarball are actually generated by a basic preprocessor.
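The basic-preprocessor approach mentioned above can be sketched as simple
template substitution. The template file and placeholder syntax here are
hypothetical, not MMTests' actual ones:

```shell
# Hypothetical template: placeholders like @@ITERATIONS@@ are filled in
# by a trivial sed-based preprocessor to emit the runnable script.
cat > run-benchmark.sh.in <<'EOF'
#!/bin/sh
ITERATIONS=@@ITERATIONS@@
echo "running $ITERATIONS iterations"
EOF

# Expand the macro to generate the final benchmark script.
sed 's/@@ITERATIONS@@/5/' run-benchmark.sh.in > run-benchmark.sh
```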
Your approach to data extraction and analysis makes sense and is what I
should have done in the first place, but didn't because it was faster to
hack out a monolithic reporting script. So while MMTests has some analysis
stuff, it's all wedged in together with the reporting and makes no attempt
to automatically analyse whether the results are "good" or "bad".
Either way, there is scope for a discussion on whether we want to adopt
a framework with some test configurations and how that might best be
distributed. It's not the first time we have discussed this, and the caveats
from last time still apply, but there is no harm in revisiting it.
One issue we should be aware of is that if such a framework becomes
popular then everyone ends up running the same tests, which is not
necessarily as valuable. This would be particularly true if we were
all running it in virtual machine instances with duplicate images and