13 Commits

Author SHA1 Message Date
Chris Wilson
9024a72d29 benchmark/gem_wait: poc for benchmarking i915_wait_request overhead
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrently (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-30 15:04:55 +00:00
Chris Wilson
8253e7dc84 benchmarks: Measure BLT performance
Execute N blits and time how long they take to complete, to measure both
GPU-limited bandwidth and submission overhead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 10:24:07 +01:00
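
The measurement described above (run N operations, time the whole run, derive per-operation cost and bandwidth) can be sketched as follows. This is a minimal illustration, not the code from the commit: `time_ops`, `bandwidth`, `op_fn` and `count_op` are hypothetical names, and `count_op` is a trivial CPU-side stand-in for submitting a blit.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC samples. */
static double elapsed(const struct timespec *start,
		      const struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) +
	       1e-9 * (end->tv_nsec - start->tv_nsec);
}

/* Hypothetical callback type: one call stands in for one blit. */
typedef void (*op_fn)(void *data);

/* Trivial stand-in operation (assumption): just counts its calls. */
static void count_op(void *data)
{
	unsigned int *counter = data;
	++*counter;
}

/* Run op n times and return the average wall-clock seconds per call. */
static double time_ops(op_fn op, void *data, unsigned int n)
{
	struct timespec start, end;
	unsigned int i;

	assert(op && n > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < n; i++)
		op(data);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed(&start, &end) / n;
}

/* Bytes per second, given bytes moved per call and seconds per call. */
static double bandwidth(double bytes_per_op, double seconds_per_op)
{
	return bytes_per_op / seconds_per_op;
}
```

With small N the result is dominated by per-submission overhead; with large copies it approaches the bandwidth limit, which is why a single timing loop of this shape can expose both figures.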
Chris Wilson
38b3bd6b7c benchmarks: Add a microbenchmark for relocation overhead
Allow specification of the many different busyness modes and relocation
interfaces, along with the number of buffers to use and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-11 15:31:02 +01:00
Chris Wilson
4c74a683c1 benchmarks: Do not install to system-wide bin/
These benchmarks are first-and-foremost development tools, not aimed at
general users. As such they should not be installed into the system-wide
bin/ directory, but installed into libexec/.

v2: Now actually install beneath ${libexec}

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 15:53:08 +01:00
Chris Wilson
0393e7288b benchmarks: Record and replay calls to EXECBUFFER2
This slightly idealises the behaviour of clients with the aim of
measuring the kernel overhead of different workloads. This test focuses
on the cost of relocating batchbuffers.

A trace file is generated with an LD_PRELOAD intercept around
execbuffer, which we can then replay at our leisure. The replay replaces
the real buffers with a set of empty ones so the only thing that the
kernel has to do is parse the relocations, but without a real workload
we lose the impact of having to rewrite active buffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-09 19:20:46 +01:00
Chris Wilson
cd306d4e65 benchmark: Measure allocation time for objects
A basic measurement, how fast can we create and populate an object with
backing storage?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 18:56:00 +01:00
Chris Wilson
e984d4965f benchmarks: Benchmarkify gem_exec_ctx
Measure the overhead of execution when doing nothing, switching between
a pair of contexts, or creating a new context every time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 18:55:49 +01:00
Chris Wilson
d88981f62b benchmarks: Measure round-trip time for an immediate vblank
By measuring both the query and the event round trip time, we can make a
reasonable estimate of how long it takes for the query to send the
vblank following an interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 15:52:53 +01:00
Chris Wilson
f8628a2c98 benchmarks: Add simple mmap benchmarks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 12:20:43 +01:00
Chris Wilson
f689e2aa81 benchmarks: Add simple pread/pwrite benchmarks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 12:20:05 +01:00
Chris Wilson
b7c33e0939 benchmarks: Benchmarkify gem_exec_nop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-22 15:14:05 +01:00
Tvrtko Ursulin
d3057d7a1e tests/gem_userptr_benchmark: Benchmarking userptr surfaces and impact
This adds a small benchmark for the new userptr functionality.

Apart from basic surface creation and destruction, also tested is the
impact of having userptr surfaces in the process address space. The
reason is that MMU notifiers affect common address space operations
like munmap(), and that impact is per process.

v2:
  * Moved to benchmarks.
  * Added pointer read/write tests.
  * Changed output to say iterations per second instead of
    operations per second.
  * Multiply result by batch size for multi-create* tests
    for a more comparable number with create-destroy test.

v3:
  * Use ALIGN macro.
  * Catch up with big lib/ reorganization.
  * Removed unused code and one global variable.
  * Fixed up some warnings.

v4:
  * Fixed feature test, does not matter here but makes it
    consistent with gem_userptr_blits and clearer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-25 17:48:49 +02:00
Tvrtko Ursulin
5d7649690c benchmarks: Build them on Android.
They build fine so give them some exposure.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2014-04-24 13:49:20 +01:00