61 Commits

Author SHA1 Message Date
Chris Wilson
b68a6428db benchmarks: Add a set-domain benchmark
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-22 20:54:04 +00:00
Chris Wilson
4c14aa18c1 benchmarks/gem_blt: Fixup a couple of non-llc foibles
When extending the batch for multiple copies, we need to remember to
flag it as being in the CPU write domain so that the new values get
flushed out to main memory before execution. We also have to be careful
not to specify NO_RELOC for the extended batch as the execobjects will
have been updated but we write the wrong presumed offsets. Subsequent
iterations will be correct and we can tell the kernel then to skip the
relocations entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-12 10:54:11 +00:00
Thomas Wood
2643793255 Fix comparison of unsigned integers
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-11-11 14:20:55 +00:00
Chris Wilson
3bc3ab27ea benchmarks: Add README
Add a README to introduce the ezbench.sh benchmark runner.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-10 14:04:58 +00:00
Chris Wilson
5cabb8c543 benchmarks/gem_blt: Report peak throughput
Report the highest throughput measured from a large set of runs to
improve sensitivity.
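
As a rough sketch of the idea (not the code in gem_blt.c), reporting the peak amounts to keeping the best rate seen across repeated runs, presumably so that runs slowed by unrelated system noise do not mask small changes. The function name and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from gem_blt.c: given the elapsed time in
 * seconds of each run and the number of bytes copied per run, return the
 * highest throughput (bytes per second) observed across all runs.
 */
static double peak_throughput(const double *elapsed, size_t nruns,
			      double bytes_per_run)
{
	double peak = 0.0;
	size_t i;

	for (i = 0; i < nruns; i++) {
		double rate;

		if (elapsed[i] <= 0.0)
			continue; /* skip runs with an unusable timing */

		rate = bytes_per_run / elapsed[i];
		if (rate > peak)
			peak = rate;
	}

	return peak;
}
```

With no usable runs the helper returns 0.0, which a caller can treat as "no measurement".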

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-10 14:04:58 +00:00
Chris Wilson
ce65232cf5 benchmarks/gem_wait: Remove pthread_cancel()
Apparently the pthread shim on Android doesn't have pthread cancellation,
so use the plain old volatile to terminate the CPU hogs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-30 15:51:21 +00:00
Chris Wilson
9024a72d29 benchmark/gem_wait: poc for benchmarking i915_wait_request overhead
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrently (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-30 15:04:55 +00:00
Derek Morton
0ab76a22d1 benchmarks/gem_blt: Include igt.h in gem_blt.c
To fix a build error on android

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-10-15 16:59:59 +01:00
Ville Syrjälä
f52e7ec787 Replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with gem_mmap__{cpu,gtt,wc}()
gem_mmap__{cpu,gtt,wc}() already has the assert built in, so replace
 __gem_mmap__{cpu,gtt,wc}() + igt_assert() with it.

Mostly done with coccinelle, with some manual help:
@@
identifier I;
expression E1, E2, E3, E4, E5, E6;
@@
(
-  I = __gem_mmap__gtt(E1, E2, E3, E4);
+  I = gem_mmap__gtt(E1, E2, E3, E4);
...
-  igt_assert(I);
|
-  I = __gem_mmap__cpu(E1, E2, E3, E4, E5);
+  I = gem_mmap__cpu(E1, E2, E3, E4, E5);
...
-  igt_assert(I);
|
-  I = __gem_mmap__wc(E1, E2, E3, E4, E5);
+  I = gem_mmap__wc(E1, E2, E3, E4, E5);
...
-  igt_assert(I);
)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-12 19:57:58 +03:00
Ville Syrjälä
b8a77dd6c8 Make gem_mmap__{cpu,gtt,wc}() assert on failure
Rename the current gem_mmap__{cpu,gtt,wc}() functions into
__gem_mmap__{cpu,gtt,wc}(), and add back wrappers with the original name
that assert that the pointer is valid. Most callers will expect a valid
pointer and shouldn't have to bother with failures.
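
The wrapper pattern described here can be sketched in isolation as follows. This is only an illustration: demo_map_unchecked() and demo_map() are hypothetical stand-ins for the __gem_mmap__*() and gem_mmap__*() pairs, calloc() stands in for the real mapping call, and the standard assert() stands in for igt_assert().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical unchecked variant: may return NULL, and the caller is
 * expected to handle that (the role played by the __-prefixed functions).
 */
static void *demo_map_unchecked(size_t size)
{
	return size ? calloc(1, size) : NULL;
}

/*
 * Hypothetical checked variant: asserts that the pointer is valid, so
 * ordinary callers never see a NULL pointer.
 */
static void *demo_map(size_t size)
{
	void *ptr = demo_map_unchecked(size);

	assert(ptr);
	return ptr;
}
```

Callers that want to handle failure themselves keep using the unchecked variant; everyone else gets the assertion for free.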

To avoid changing anything (yet), sed 's/gem_mmap__/__gem_mmap__/g'
over the entire codebase.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-09 19:16:26 +03:00
Ville Syrjälä
7eaae3c201 Sprinkle igt_assert(ptr) after gem_mmap__{cpu,gtt,wc}
Do the following
 ptr = gem_mmap__{cpu,gtt,wc}()
+igt_assert(ptr);

whenever the code doesn't handle the NULL ptr in any kind of
specific way.

Makes it easier to move the assert into gem_mmap__{cpu,gtt,wc}() itself.

Mostly done with coccinelle, with some manual cleanups:
@@
identifier I;
@@
<... when != igt_assert(I)
     when != igt_require(I)
     when != igt_require_f(I, ...)
     when != I != NULL
     when != I == NULL
(
  I = gem_mmap__gtt(...);
+ igt_assert(I);
|
  I = gem_mmap__cpu(...);
+ igt_assert(I);
|
  I = gem_mmap__wc(...);
+ igt_assert(I);
)
...>

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-09 18:33:30 +03:00
Chris Wilson
d878e18dfd benchmarks/gem_blt: Fix compilation after rebase and add batch-size
Add an option to do more than one copy per batch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 17:04:31 +01:00
Chris Wilson
8253e7dc84 benchmarks: Measure BLT performance
Execute N blits and time how long they take to complete, to measure both
GPU-limited bandwidth and submission overhead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 10:24:07 +01:00
Derek Morton
1b492e311c benchmarks: Fix build errors on Android M-Dessert
Android M-Dessert treats implicit declaration of function warnings
as errors resulting in igt failing to build.

This patch fixes the errors by including missing header files as
required. Mostly this involved including igt.h in the benchmarks.

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-10-02 09:42:30 +02:00
Micah Fedke
c81d293aed convert drm_open_any*() calls to drm_open_driver*(DRIVER_INTEL) calls with cocci
Apply the new API to all call sites within the test suite using the following
semantic patch:

// Semantic patch for replacing drm_open_any* with arch-specific drm_open_driver* calls
@@
identifier i =~ "\bdrm_open_any\b";
@@
- i()
+ drm_open_driver(DRIVER_INTEL)

@@
identifier i =~ "\bdrm_open_any_master\b";
@@
- i()
+ drm_open_driver_master(DRIVER_INTEL)

@@
identifier i =~ "\bdrm_open_any_render\b";
@@
- i()
+ drm_open_driver_render(DRIVER_INTEL)

@@
identifier i =~ "\b__drm_open_any\b";
@@
- i()
+ __drm_open_driver(DRIVER_INTEL)

Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-09-11 14:39:43 +01:00
Thomas Wood
1dcace3018 build: fix unused-result warnings
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-09-08 16:15:16 +01:00
Chris Wilson
5e68ad9f82 benchmarks/gem_exec_reloc: Allow profiling 0 relocs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-21 22:27:35 +01:00
Chris Wilson
77b8af218c benchmark/gem_exec_trace: Inline everything
Avoid the globals and make the dispatch one huge function and hope GCC
works some magic.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-14 20:54:35 +01:00
Chris Wilson
a64e6c39b1 benchmark/gem_exec_tracer: Tweak to handle SNA
SNA starts by feeding in deliberately bad ioctls in order to detect the
kernel interface versions. A quick solution is to always feed it to the
ioctl and only record the trace if it is valid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-14 20:34:21 +01:00
Derek Morton
d524a964fc benchmarks/Android.mk: Fix building benchmarks for Android
The commit "benchmarks: Do not install to system-wide bin/" changed
the benchmark file list from bin_PROGRAMS to benchmarks_PROGRAMS.
However Android.mk was not updated, resulting in IGT failing to
build for Android.
This commit adds that change. It also adds LOCAL_MODULE_PATH to
specify where the built benchmarks should be put.

v2: I discovered that the existing definitions of LOCAL_MODULE_PATH
were creating what should have been an invalid path. Not sure how it
was ever working previously, but fixed now.

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-08-13 11:28:22 +01:00
Chris Wilson
38b3bd6b7c benchmarks: Add a microbenchmark for relocation overhead
Allow specification of the many different busyness modes and relocation
interfaces, along with the number of buffers to use and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-11 15:31:02 +01:00
Chris Wilson
98bcc18572 benchmarks/gem_exec_trace: Unmap each trace after replay
Just on the off chance someone is replaying a bunch of traces, remember
to clean up.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 18:53:26 +01:00
Chris Wilson
b483e68173 benchmarks/gem_exec_trace: Mark the mmap as sequentially read
Use madvise(MADV_SEQUENTIAL) to let the kernel optimise for our
straightforward sequential read pattern.
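
A minimal sketch of the technique, assuming a trace is simply a regular file mapped read-only. map_trace() is a hypothetical helper, not the function in gem_exec_trace.c; the mmap(), madvise() and munmap() calls are the standard POSIX/Linux ones.

```c
#define _DEFAULT_SOURCE /* for madvise() under strict -std= modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file read-only and tell the kernel that it
 * will be read sequentially. Returns NULL on failure; on success *len is
 * set to the mapping length, to be passed to munmap() after replay.
 */
static void *map_trace(const char *path, size_t *len)
{
	struct stat st;
	void *ptr;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the fd is closed */
	if (ptr == MAP_FAILED)
		return NULL;

	/* Purely advisory: a failure here only loses the readahead hint. */
	madvise(ptr, (size_t)st.st_size, MADV_SEQUENTIAL);

	*len = (size_t)st.st_size;
	return ptr;
}
```

After replaying a trace, the caller releases it with munmap(ptr, len).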

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 18:53:26 +01:00
Chris Wilson
3911621d0d benchmarks: Rename the gem_exec_trace tracer module
Now that we actually install the benchmarks into a sane location,
slightly abuse it to put the tracer for gem_exec_trace alongside.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 18:24:15 +01:00
Chris Wilson
d9462e61f9 benchmarks/gem_exec_trace: Clear all new bo handles
When reallocing the bo array, remember to set the new entries to 0.
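
The underlying pitfall is that realloc() leaves the newly added tail uninitialised. A minimal sketch of the fix, using a hypothetical bo_table structure rather than the actual variables in gem_exec_trace.c:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for buffer object handles; 0 means "unused". */
struct bo_table {
	uint32_t *handles;
	size_t count;
};

/* Grow the table to new_count entries, zeroing only the new tail. */
static int bo_table_grow(struct bo_table *t, size_t new_count)
{
	uint32_t *h;

	if (new_count <= t->count)
		return 0; /* nothing to do */

	h = realloc(t->handles, new_count * sizeof(*h));
	if (!h)
		return -1; /* the old array is still valid on failure */

	/* realloc() does not clear the added space, so do it by hand. */
	memset(h + t->count, 0, (new_count - t->count) * sizeof(*h));

	t->handles = h;
	t->count = new_count;
	return 0;
}
```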

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 16:16:36 +01:00
Chris Wilson
4c74a683c1 benchmarks: Do not install to system-wide bin/
These benchmarks are first-and-foremost development tools, not aimed at
general users. As such they should not be installed into the system-wide
bin/ directory, but installed into libexec/.

v2: Now actually install beneath ${libexec}

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 15:53:08 +01:00
Chris Wilson
0393e7288b benchmarks: Record and replay calls to EXECBUFFER2
This slightly idealises the behaviour of clients with the aim of
measuring the kernel overhead of different workloads. This test focuses
on the cost of relocating batchbuffers.

A trace file is generated with an LD_PRELOAD intercept around
execbuffer, which we can then replay at our leisure. The replay replaces
the real buffers with a set of empty ones so the only thing that the
kernel has to do is parse the relocations, but without a real workload
we lose the impact of having to rewrite active buffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-09 19:20:46 +01:00
Derek Morton
1ae1d290bf benchmarks/Android.mk, tools/Android.mk: Fix android build error
Recently added tools / benchmarks have the same module name as
existing tests. Android does not allow duplicate modules. This
patch appends _benchmark and _tool to the module names used when
building benchmarks and tools to prevent clashes with tests of
the same name.

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-06 14:39:44 +02:00
Chris Wilson
cd306d4e65 benchmark: Measure allocation time for objects
A basic measurement, how fast can we create and populate an object with
backing storage?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 18:56:00 +01:00
Chris Wilson
42a386b83b benchmarks: Measure mmap fault latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 18:55:49 +01:00
Chris Wilson
e984d4965f benchmarks: Benchmarkify gem_exec_ctx
Measure the overhead of execution when doing nothing, switching between
a pair of contexts, or creating a new context every time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 18:55:49 +01:00
Chris Wilson
e14507ce98 benchmarks: Add kms_vblank to .gitignore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-24 14:16:44 +01:00
Chris Wilson
d88981f62b benchmarks: Measure round-trip time for an immediate vblank
By measuring both the query and the event round trip time, we can make a
reasonable estimate of how long it takes for the query to send the
vblank following an interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 15:52:53 +01:00
Chris Wilson
af510c249d benchmarks: gem_prw add the read/write switch to getopt
In my haste to merge the two gem_pread/gem_pwrite, I forgot to wire up
the command line switch to getopt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 12:30:06 +01:00
Chris Wilson
f8628a2c98 benchmarks: Add simple mmap benchmarks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 12:20:43 +01:00
Chris Wilson
f689e2aa81 benchmarks: Add simple pread/pwrite benchmarks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-23 12:20:05 +01:00
Chris Wilson
b7c33e0939 benchmarks: Benchmarkify gem_exec_nop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-07-22 15:14:05 +01:00
Tvrtko Ursulin
85ee6e7b36 gem_userptr_benchmark: Test overlapping bo mmu notifier performance impact
Current userptr kernel implementation downgrades tracking VMA ranges (real
userspace ones) to an inefficient linear walk for any process which has
instantiated overlapping userptr objects.

This adds a test which shows the performance cliff on, most visibly, generic
userspace mmap(2) and munmap(2) operations between unsync, non-overlapping
and overlapping userptr objects.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Daniel <thomas.daniel@intel.com>
2015-06-02 13:51:41 +01:00
Thomas Wood
277ca2b992 lib: print a stack trace when a test assertion fails
Add an optional dependency on libunwind to print stack traces when a
test assertion fails.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-03-26 15:50:05 +00:00
Tim Gore
a11117e42f Android.mk: replace std=c99 with std=gnu99
The android makefiles were passing the -std=c99 flag to the
compiler which disables the typeof keyword. This causes a
build fail for a recent addition to igt_aux.h.
Change this to -std=gnu99, which is the flag used in the
Linux build.

Signed-off-by: Tim Gore <tim.gore@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2014-12-12 12:18:22 +00:00
Chris Wilson
10552b5ca6 batch: Specify number of relocations to accommodate
Since relocations are variable size, depending upon generation, it is
easier to handle the resizing of the batch request inside the
BEGIN_BATCH macro. This still leaves us with having to resize commands
in a few places - which still need adaption for gen8+.
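
To illustrate why the relocation count is useful to the macro, here is a sketch of the size computation. It assumes that a relocation slot takes one dword before gen8 and two dwords from gen8 onwards; batch_dwords() is a hypothetical helper and is not the BEGIN_BATCH macro itself.

```c
/*
 * Hypothetical helper: number of dwords to reserve for a command.
 * cmd_dwords is the command length counted with 32-bit relocation slots;
 * relocs is the number of relocations the command contains.
 * Assumption: gen8+ needs one extra dword per relocation for the upper
 * half of the 64-bit address.
 */
static unsigned int batch_dwords(unsigned int cmd_dwords,
				 unsigned int relocs, int gen)
{
	if (gen >= 8)
		cmd_dwords += relocs;

	return cmd_dwords;
}
```

Under that assumption a 6-dword command with 2 relocations reserves 6 dwords before gen8 and 8 dwords on gen8+.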

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2014-08-30 11:44:51 +01:00
Chris Wilson
982f7eb238 Prepare for 64bit relocation addresses
This revealed that quite a few locations were writing relocation offsets
but only allowing for 32 bit addresses. To reveal such places in active
tests, we also now double check that we do not use more batch space than
declared.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2014-08-29 20:02:10 +01:00
Tvrtko Ursulin
bf57e93f50 igt/gem_userptr_benchmark: Fix for upstream ioctl number
Hardcoding has upsides and downsides.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-23 14:52:31 +02:00
Daniel Vetter
1b55886c4b test/gem_userptr_*: Fix compile fail
Also shut up warnings. Those revealed incorrect usage of local
variables in conjunction with igt_fixture/igt_subtest. Since those use
longjmps we need to move them out of the stack frame those magic blocks
are declared in.
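
For background: in standard C, a non-volatile automatic variable that is modified between setjmp() and longjmp() has an indeterminate value after the jump. The sketch below shows the safe arrangement with plain setjmp/longjmp; block_env, block_body() and run_block() are hypothetical names standing in for the harness, not igt code.

```c
#include <setjmp.h>

static jmp_buf block_env; /* hypothetical stand-in for the harness's jump buffer */
static int fd_opened;     /* file scope, so its value survives the longjmp */

/* Body of a "magic block": sets some state, then bails out via longjmp. */
static void block_body(void)
{
	fd_opened = 1;
	longjmp(block_env, 1); /* e.g. a failed assertion or skip */
}

/* Run the block and report the state it left behind. */
static int run_block(void)
{
	fd_opened = 0;

	if (setjmp(block_env) == 0)
		block_body();

	/*
	 * Reached after the longjmp. Had fd_opened been a non-volatile local
	 * of this function, modified after setjmp(), its value here would
	 * be indeterminate.
	 */
	return fd_opened;
}
```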

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-25 17:54:08 +02:00
Tvrtko Ursulin
d3057d7a1e tests/gem_userptr_benchmark: Benchmarking userptr surfaces and impact
This adds a small benchmark for the new userptr functionality.

Apart from basic surface creation and destruction, also tested is the
impact of having userptr surfaces in the process address space. Reason
for that is the impact of MMU notifiers on common address space
operations like munmap() which is per process.

v2:
  * Moved to benchmarks.
  * Added pointer read/write tests.
  * Changed output to say iterations per second instead of
    operations per second.
  * Multiply result by batch size for multi-create* tests
    for a more comparable number with create-destroy test.

v3:
  * Use ALIGN macro.
  * Catchup with big lib/ reorganization.
  * Removed unused code and one global variable.
  * Fixed up some warnings.

v4:
  * Fixed feature test, does not matter here but makes it
    consistent with gem_userptr_blits and clearer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-25 17:48:49 +02:00
Tvrtko Ursulin
5d7649690c benchmarks: Build them on Android.
They build fine so give them some exposure.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2014-04-24 13:49:20 +01:00
Daniel Vetter
c03c6ceb29 lib: rename intel_gpu_tools.h to intel_io.h
With the header cleanup we can now give this header a suitable name,
since it now really only contains register access and other I/O
functions and assorted definitions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-22 21:34:29 +01:00
Daniel Vetter
254f19ba8d lib: unnecessary header removal for drmtest.h, part 2
I've left unistd.h in it - it's not strictly required but most users
of drmtest.h want it for the open helpers, and then you kinda need to
close that file descriptor again ...

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-22 21:29:01 +01:00
Daniel Vetter
e49ceb8690 lib: unnecessary header removal for drmtest.h, part 1
Brought a few missing headers to light in ioctl_wrappers.h, too.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-22 21:07:37 +01:00
Ben Widawsky
f4dfa37e85 bdw: Update obvious missing blit support
This provides a macro that allows us to update all the arbitrary blit
commands we have stuck throughout the code. It assumes we don't actually
use 64b relocs (which is currently true). This also allows us to easily find
all the areas we need to update later when we really use the upper dword.

This block was done mostly with a sed job, and represents the easier
in-test blit implementations.

v2 by Oscar: s/OUT_BATCH/BEGIN_BATCH in BLIT_COPY_BATCH_START

CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
2013-11-06 09:34:35 -08:00