98 Commits

Author SHA1 Message Date
Chris Wilson
d545610861 lib/igt_aux: Divert ioctls for signal injection
To simplify and speed up running interruptible tests, use a custom
ioctl() function that controls the signaling and detects when no more
iterations are needed to trigger an interruption.

We use a realtime timer to inject the signal after a certain delay,
increasing the delay on every loop to try and exercise different code
paths within the function. The first delay is very short such that we
hopefully enter the kernel with a pending signal.

Clients should use

struct igt_sigiter iter = {};
while (igt_sigiter_repeat(&iter, enable_interrupts=true))
	do_test();

to automatically repeat the test until we can inject no more signals
into the ioctls. This is condensed into a macro

igt_interruptible(enable_interrupts=true)
	do_test();

for convenience.
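
The escalating-delay idea can be sketched outside of IGT. The following is a minimal illustration only, not the igt_sigiter implementation: it assumes setitimer() and nanosleep() as stand-ins for the realtime timer and the diverted ioctl(), and count_interruptions() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* The handler does nothing; the signal only has to interrupt the syscall. */
static void on_alarm(int sig) { (void)sig; }

/*
 * Hypothetical helper: repeat a blocking call (nanosleep() stands in for the
 * ioctl under test), arming a one-shot signal with a delay that doubles on
 * every pass. Stop once the call completes before the signal can land.
 * Returns how many passes were interrupted.
 */
static int count_interruptions(long op_usec, long first_delay_usec)
{
	struct sigaction sa, old;
	int interrupted = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;	/* no SA_RESTART, so the call fails with EINTR */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, &old);

	for (long delay = first_delay_usec; ; delay *= 2) {
		struct itimerval arm = {
			.it_value = { .tv_sec = delay / 1000000, .tv_usec = delay % 1000000 },
		};
		struct itimerval disarm = { { 0, 0 }, { 0, 0 } };
		struct timespec ts = { op_usec / 1000000, (op_usec % 1000000) * 1000 };
		int ret, err;

		setitimer(ITIMER_REAL, &arm, NULL);
		ret = nanosleep(&ts, NULL);
		err = errno;
		setitimer(ITIMER_REAL, &disarm, NULL);

		if (ret == 0 || err != EINTR)
			break;	/* no more signals can be injected into this call */
		interrupted++;
	}

	sigaction(SIGALRM, &old, NULL);
	return interrupted;
}
```

Calling count_interruptions(50000, 1000) interrupts a 50ms sleep with delays of 1ms, 2ms, 4ms and so on, and returns once a delay outlasts the sleep.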

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-19 15:20:43 +00:00
Derek Morton
d264c73929 benchmarks/gem_syslatency: Add extra android guard to attr_setaffinity_np
Android defines __USE_GNU but does not provide pthread_attr_setaffinity_np(),
so add an extra guard around pthread_attr_setaffinity_np().

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
2016-03-11 11:34:48 +00:00
Chris Wilson
3e2443f838 igt/gem_exec_nop: Fix logical inversion for checking of valid execbuf
Only if the trial __gem_execbuf reports an error do we want to remove
the fancy LUT flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-10 12:46:52 +00:00
Chris Wilson
544ba6ca88 benchmarks/gem_syslatency: Guard setaffinity_np
pthread_setaffinity_np is a GNU extension, so add some __USE_GNU
ifdeffery and hope for the best if unavailable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-10 12:28:26 +00:00
Chris Wilson
3e0d9ef02c benchmarks/gem_syslatency: Subtract the clock_gettime() overhead
Since clock_gettime() should be a fixed overhead that adds to the
latency result, subtract it from the result.
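
A rough sketch of the correction, assuming the overhead is estimated as the smallest gap between two back-to-back clock_gettime() calls (the commit message does not say how gem_syslatency estimates it; both helper names are hypothetical):

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000 +
	       (b->tv_nsec - a->tv_nsec);
}

/* Assumption: take the minimum over many samples as the fixed call cost. */
static int64_t clock_overhead_ns(int samples)
{
	int64_t best = INT64_MAX;

	for (int i = 0; i < samples; i++) {
		struct timespec a, b;
		int64_t d;

		clock_gettime(CLOCK_MONOTONIC, &a);
		clock_gettime(CLOCK_MONOTONIC, &b);
		d = elapsed_ns(&a, &b);
		if (d < best)
			best = d;
	}
	return best;
}

/* Remove the fixed overhead from a measured latency, clamping at zero. */
static int64_t corrected_latency_ns(int64_t measured, int64_t overhead)
{
	return measured > overhead ? measured - overhead : 0;
}
```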

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-10 10:15:31 +00:00
Chris Wilson
2a41c4b183 benchmarks/gem_syslatency: Prevent CPU sleeps (C-states)
In order to keep the latency as low as possible for the idle load, we
need to keep the CPU awake. Otherwise we end up with the busy workload
having lower latency than the idle workload!
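
The message does not say which mechanism is used. One way to do this on Linux, shown here purely as an assumption, is the PM QoS character device /dev/cpu_dma_latency: writing a 32-bit latency bound in microseconds and keeping the file descriptor open asks the kernel to avoid deeper idle states. The helper name is hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns an fd that must stay open for as long as the
 * request should remain in force, or -1 on failure (for example when not
 * running with sufficient privileges).
 */
static int request_low_cpu_latency(int32_t max_latency_us)
{
	int fd = open("/dev/cpu_dma_latency", O_RDWR);

	if (fd < 0)
		return -1;

	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    (ssize_t)sizeof(max_latency_us)) {
		close(fd);
		return -1;
	}
	return fd;	/* closing the fd drops the request */
}
```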

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-10 09:29:26 +00:00
Chris Wilson
c084c2b88b benchmarks/gem_syslatency: Measure unloaded latency
Also useful to know how much worse than baseline the latency is when the
gem load is applied. For slower systems, presenting in nanoseconds makes
it hard to read, so switch to microseconds for output.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-10 08:41:25 +00:00
Chris Wilson
6cd15fb930 benchmarks: Add gem_syslatency
Instead of measuring the wakeup latency of a GEM client, we turn the
tables here and ask what is the wakeup latency of a normal process
competing with GEM. In particular, a realtime process that expects
deterministic latency.
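
As a minimal sketch of what "wakeup latency of a normal process" can mean, the hypothetical helper below sleeps until an absolute deadline and reports how late it woke up. It assumes clock_nanosleep() with TIMER_ABSTIME; the commit message does not describe how the benchmark itself waits, and the GEM load that would run alongside is omitted.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep until now + sleep_ns, then return the overshoot in nanoseconds. */
static int64_t wakeup_latency_ns(long sleep_ns)
{
	struct timespec target, now;

	clock_gettime(CLOCK_MONOTONIC, &target);
	target.tv_nsec += sleep_ns;
	while (target.tv_nsec >= 1000000000L) {
		target.tv_nsec -= 1000000000L;
		target.tv_sec++;
	}

	/* clock_nanosleep() returns the error number; retry only on EINTR. */
	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL) == EINTR)
		;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (int64_t)(now.tv_sec - target.tv_sec) * 1000000000 +
	       (now.tv_nsec - target.tv_nsec);
}
```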

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-09 23:40:21 +00:00
Chris Wilson
74761382b3 benchmarks/gem_latency: Replace igt_stats with igt_mean
Use a simpler statically allocated struct for computing the mean as
otherwise we may run out of memory!
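
The memory argument can be illustrated with an incremental mean that stores no samples. This is a sketch with hypothetical names, not the layout of struct igt_mean:

```c
#include <stddef.h>

/* Fixed-size state: nothing grows with the number of samples. */
struct running_mean {
	double mean;
	unsigned long count;
};

static void running_mean_init(struct running_mean *m)
{
	m->mean = 0.0;
	m->count = 0;
}

/* Fold one sample into the mean without keeping the sample itself. */
static void running_mean_add(struct running_mean *m, double value)
{
	m->count++;
	m->mean += (value - m->mean) / m->count;
}
```

A benchmark that runs for a long time can call running_mean_add() for every sample while using the same few bytes throughout.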

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-08 14:58:59 +00:00
Chris Wilson
f3751d53bd benchmarks/gem_blt: Measure the throughput of synchronous copies
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-01 15:07:29 +00:00
Tiago Vignatti
e1f663b543 lib: Add gem_userptr and __gem_userptr helpers
This patch moves the userptr definitions and helper implementations that were
local to gem_userptr_benchmark and gem_userptr_blits into the library, so other
tests can make use of them as well. There are no functional changes.

v2: added __ function to differentiate when errors want to be handled back in
the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc.
v8: remove local_i915_gem_userptr from gem_concurrent_all.c to use the global
helpers instead.

Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-02-11 18:15:44 +01:00
Chris Wilson
e3b68bb666 lib: Share common __gem_execbuf()
An oft-repeated function to check EXECBUFFER2 for a particular fail
condition.
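
The general shape of such a helper, sketched with a generic ioctl rather than EXECBUFFER2 (try_ioctl() is a hypothetical name, and the i915 request and argument struct are deliberately left out):

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl and report failure as a negative errno instead of
 * asserting, so the caller can test for one particular error.
 */
static int try_ioctl(int fd, unsigned long request, void *arg)
{
	int err = 0;

	if (ioctl(fd, request, arg) == -1)
		err = -errno;
	errno = 0;
	return err;
}
```

A caller can then write `if (try_ioctl(fd, request, &arg) == -EINVAL)` to probe for an unsupported flag.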

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-27 14:45:18 +00:00
Chris Wilson
51bb53663e benchmarks/gem_latency: Allow setting an infinite time
Well, 24000 years.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-06 10:21:40 +00:00
Chris Wilson
e21368c53a benchmarks/gem_mmap: Convert to run over a fixed period
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 16:31:05 +00:00
Chris Wilson
9b90234414 benchmarks/gem_exec_nop: Convert to running for a fixed time
Like the previous patch to gem_exec_ctx, restrict gem_exec_nop to running
for a fixed length of time, rather than over a range of different
execution counts. In order to retain some measurement of that range,
allow measuring individual execution versus continuous dispatch.
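
A fixed-duration measurement loop can be sketched as follows. The names are hypothetical and bump_counter() is only a stand-in workload; the real benchmark submits execbufs instead.

```c
#define _GNU_SOURCE
#include <time.h>

static double seconds_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Stand-in workload: counts how often it was called. */
static void bump_counter(void *data)
{
	(*(unsigned long *)data)++;
}

/*
 * Call op() repeatedly until at least `duration` seconds have passed, then
 * return the average time per call. The number of calls is stored in
 * *count_out.
 */
static double average_call_time(void (*op)(void *), void *data,
				double duration, unsigned long *count_out)
{
	unsigned long count = 0;
	double start = seconds_now();
	double end;

	do {
		op(data);
		count++;
		end = seconds_now();
	} while (end - start < duration);

	*count_out = count;
	return (end - start) / count;
}
```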

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 15:43:52 +00:00
Chris Wilson
6953899beb benchmarks/gem_exec_ctx: Run for a fixed time
Rather than investigate the curve for dispatch latency, just run for a
fixed time and report an average latency. Instead offer two modes:
average single dispatch latency and average continuous dispatch latency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 12:31:51 +00:00
Chris Wilson
276fb3d3f4 benchmarks/gem_exec_ctx: Fix fd switching between default contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-01 14:38:07 +00:00
Chris Wilson
3d5b50b4f0 benchmarks/gem_blt: Estimate memory bandwidth to improve test runtime
If we autotune the workload to only take 0.1s and then repeat the
measurements over 2s, we can bound the benchmark runtime. (Roughly of
course! Sometimes the disparity between main memory CPU bandwidth, and
GPU execution bandwidth throws off the runtime, but that's the purpose
of the benchmark!)
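
One way to read the autotuning step, as a sketch under assumptions: double the amount of work until a single pass takes the target time, then derive the number of passes from the total budget. All names are hypothetical, and spin_work() is a CPU stand-in for the GPU copies.

```c
#define _GNU_SOURCE
#include <time.h>

static double seconds_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static volatile unsigned long sink;

/* Stand-in workload whose runtime grows with n. */
static void spin_work(unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		sink += i;
}

/*
 * Double the work count until one pass takes at least `target` seconds.
 * The cap keeps the loop finite even if the workload is unexpectedly cheap.
 */
static unsigned long calibrate_count(void (*work)(unsigned long),
				     double target, double *elapsed_out)
{
	unsigned long count = 1;

	for (;;) {
		double start = seconds_now();
		double elapsed;

		work(count);
		elapsed = seconds_now() - start;
		if (elapsed >= target || count >= (1UL << 30)) {
			*elapsed_out = elapsed;
			return count;
		}
		count *= 2;
	}
}

/* Number of passes that fit in the total budget, at least one. */
static unsigned int passes_for(double total, double per_pass)
{
	unsigned int passes = (unsigned int)(total / per_pass + 0.5);

	return passes ? passes : 1;
}
```

With a 0.1s pass and a 2s budget, passes_for(2.0, 0.1) gives 20 repetitions.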

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-31 22:10:11 +00:00
Chris Wilson
1b9085b979 benchmarks/gem_latency: Hide spinlocks for android
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 16:32:08 +00:00
Chris Wilson
a1d465a3c5 benchmarks/gem_latency: Serialise mmio reads
The joy of our hardware; don't let two threads attempt to read the same
register at the same time.
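
Serialising the reads amounts to taking a lock around the register access. The sketch below assumes a pthread mutex and a hypothetical helper name; the message does not say which locking primitive the benchmark uses.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t mmio_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only one thread at a time may touch the register. */
static uint32_t read_register_locked(const volatile uint32_t *reg)
{
	uint32_t value;

	pthread_mutex_lock(&mmio_lock);
	value = *reg;
	pthread_mutex_unlock(&mmio_lock);

	return value;
}
```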

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 13:34:58 +00:00
Chris Wilson
3ebce37b65 benchmarks/gem_latency: Guard against inferior pthreads.h
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 10:00:21 +00:00
Chris Wilson
3cc8f957f1 benchmarks/gem_latency: Measure CPU usage
Try and gauge the amount of CPU time used for each dispatch/wait cycle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 21:22:35 +00:00
Chris Wilson
a91ee853b1 benchmarks/gem_latency: Measure effect of using RealTime priority
Allow the producers to be set with maximum RT priority to verify that
the waiters are not exhibiting priority inversion.
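
For illustration, raising the calling thread to the highest SCHED_FIFO priority might look like the sketch below. It assumes sched_setscheduler() applied to the caller; the benchmark may instead configure each producer thread through pthread attributes, which the message does not specify. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Returns 0 on success or a negative errno (typically -EPERM when the
 * caller lacks the privilege to use realtime scheduling).
 */
static int set_max_rt_priority(void)
{
	struct sched_param param = {
		.sched_priority = sched_get_priority_max(SCHED_FIFO),
	};

	if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
		return -errno;
	return 0;
}
```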

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 21:22:35 +00:00
Chris Wilson
27e093dd1f benchmarks/gem_latency: Use RCS on Sandybridge
Reading BCS_TIMESTAMP just returns 0...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
Chris Wilson
c0942bf528 benchmarks/gem_latency: Rearrange thread cancellation
Try a different pattern to cascade the cancellation from producers to
their consumers in order to avoid one potential deadlock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
Chris Wilson
8ea61ec1ff benchmarks/gem_latency: Tweak workload
Do the workload before the nop, so that if combining both, there is a
better chance for the spurious interrupts. Emit just one workload batch
(use the nops to generate spurious interrupts) and apply the factor to
the number of copies to make inside the workload - the intention is that
this gives sufficient time for all producers to run concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
Chris Wilson
db011021a1 benchmarks/gem_latency: Add output field specifier
Just to make it easier to integrate into ezbench.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 15:07:56 +00:00
Chris Wilson
646cab4c0c benchmarks/gem_latency: Split the nop/work/latency measurement
Split the distinct phases (generate interrupts, busywork, measure
latency) into separate batches for finer control.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
e37a4c8092 benchmarks/gem_latency: Add time control
Allow the user to choose a time to run for, default 10s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
2ef368acfa benchmarks/gem_latency: Add nop dispatch latency measurement
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
1db5b05243 benchmarks/gem_latency: Expose the workload factor
Allow the user to select how many batches each producer submits before
waiting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
6dbe0a3012 benchmarks/gem_latency: Measure whole execution throughput
Knowing how long it takes to execute the workload (and how that scales)
is interesting to put the latency figures into perspective.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
2f74892ebd benchmarks/gem_latency: Fix for !LLC
Late last night I forgot I had only added the llc CPU mmapping and not
the !llc GTT mapping for byt/bsw.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 10:32:38 +00:00
Chris Wilson
39bad606c5 benchmarks: Remove gem_wait
Superseded by gem_latency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 01:31:06 +00:00
Chris Wilson
c9da0b5221 benchmark: Measure of latency of producers -> consumers, gem_latency
The goal is to measure how long it takes for clients waiting on results to
wake up after a buffer completes, and in doing so ensure scalability of
the kernel to a large number of clients.

We spawn a number of producers. Each producer submits a busyload to the
system and records in the GPU the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch completion and measure the current BCS timestamp register and
compare against the recorded value.

By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 01:30:57 +00:00
Chris Wilson
39e44dfa4c benchmarks/gem_exec_nop: Flush retirement lists before executing
wait-ioctl skips a couple of side-effects of retiring, so provoke them
using set-domain before we sleep.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-04 13:59:37 +00:00
Chris Wilson
d44100ed23 benchmarks/gem_exec_ctx: Measure switching between fds
Switching between fds also involves a context switch, so include it amongst
the measurements.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-27 10:03:56 +00:00
Chris Wilson
b68a6428db benchmarks: Add a set-domain benchmark
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-22 20:54:04 +00:00
Chris Wilson
4c14aa18c1 benchmarks/gem_blt: Fixup a couple of non-llc foibles
When extending the batch for multiple copies, we need to remember to
flag it as being in the CPU write domain so that the new values get
flushed out to main memory before execution. We also have to be careful
not to specify NO_RELOC for the extended batch as the execobjects will
have been updated but we write the wrong presumed offsets. Subsequent
iterations will be correct and we can tell the kernel then to skip the
relocations entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-12 10:54:11 +00:00
Thomas Wood
2643793255 Fix comparison of unsigned integers
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-11-11 14:20:55 +00:00
Chris Wilson
3bc3ab27ea benchmarks: Add README
Add a README to introduce the ezbench.sh benchmark runner.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-10 14:04:58 +00:00
Chris Wilson
5cabb8c543 benchmarks/gem_blt: Report peak throughput
Report the highest throughput measured from a large set of runs to
improve sensitivity.
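
Reporting the peak is a simple reduction over the runs. In this sketch the names are hypothetical, and the rationale in the comment (that interference can only lower a run's throughput) is an assumption rather than something the message states.

```c
#include <stddef.h>

/*
 * Return the highest bytes-per-second figure across all runs.
 * Assumption: noise only slows a run down, so the best run is the
 * closest estimate of what the hardware can sustain.
 */
static double peak_throughput(const double *bytes, const double *seconds,
			      size_t runs)
{
	double peak = 0.0;

	for (size_t i = 0; i < runs; i++) {
		double rate = bytes[i] / seconds[i];

		if (rate > peak)
			peak = rate;
	}
	return peak;
}
```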

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-10 14:04:58 +00:00
Chris Wilson
ce65232cf5 benchmarks/gem_wait: Remove pthread_cancel()
Apparently the pthread shim on Android doesn't have pthread cancellation,
so use the plain old volatile to terminate the CPU hogs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-30 15:51:21 +00:00
Chris Wilson
9024a72d29 benchmark/gem_wait: poc for benchmarking i915_wait_request overhead
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrently (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-30 15:04:55 +00:00
Derek Morton
0ab76a22d1 benchmarks/gem_blt: Include igt.h in gem_blt.c
To fix a build error on android

Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-10-15 16:59:59 +01:00
Ville Syrjälä
f52e7ec787 Replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with gem_mmap__{cpu,gtt,wc}()
gem_mmap__{cpu,gtt,wc}() already has the assert built in, so replace
 __gem_mmap__{cpu,gtt,wc}() + igt_assert() with it.

Mostly done with coccinelle, with some manual help:
@@
identifier I;
expression E1, E2, E3, E4, E5, E6;
@@
(
-  I = __gem_mmap__gtt(E1, E2, E3, E4);
+  I = gem_mmap__gtt(E1, E2, E3, E4);
...
-  igt_assert(I);
|
-  I = __gem_mmap__cpu(E1, E2, E3, E4, E5);
+  I = gem_mmap__cpu(E1, E2, E3, E4, E5);
...
-  igt_assert(I);
|
-  I = __gem_mmap__wc(E1, E2, E3, E4, E5);
+  I = gem_mmap__wc(E1, E2, E3, E4, E5);
...
-  igt_assert(I);
)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-12 19:57:58 +03:00
Ville Syrjälä
b8a77dd6c8 Make gem_mmap__{cpu,gtt,wc}() assert on failure
Rename the current gem_mmap__{cpu,gtt,wc}() functions into
__gem_mmap__{cpu,gtt,wc}(), and add back wrappers with the original name
that assert that the pointer is valid. Most callers will expect a valid
pointer and shouldn't have to bother with failures.
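
The wrapper pattern can be sketched with a plain anonymous mmap() standing in for the GEM mappings. The names are hypothetical: raw_map_anon() plays the role of the __-prefixed variant that reports failure, and map_anon() the asserting wrapper, with assert() used in place of igt_assert().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Non-asserting variant: returns NULL on failure so callers can handle it. */
static void *raw_map_anon(size_t size)
{
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return ptr == MAP_FAILED ? NULL : ptr;
}

/* Asserting wrapper: most callers just want a valid pointer. */
static void *map_anon(size_t size)
{
	void *ptr = raw_map_anon(size);

	assert(ptr);
	return ptr;
}
```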

To avoid changing anything (yet), sed 's/gem_mmap__/__gem_mmap__/g'
over the entire codebase.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-09 19:16:26 +03:00
Ville Syrjälä
7eaae3c201 Sprinkle igt_assert(ptr) after gem_mmap__{cpu,gtt,wc}
Do the following
 ptr = gem_mmap__{cpu,gtt,wc}()
+igt_assert(ptr);

whenever the code doesn't handle the NULL ptr in any kind of
specific way.

Makes it easier to move the assert into gem_mmap__{cpu,gtt,wc}() itself.

Mostly done with coccinelle, with some manual cleanups:
@@
identifier I;
@@
<... when != igt_assert(I)
     when != igt_require(I)
     when != igt_require_f(I, ...)
     when != I != NULL
     when != I == NULL
(
  I = gem_mmap__gtt(...);
+ igt_assert(I);
|
  I = gem_mmap__cpu(...);
+ igt_assert(I);
|
  I = gem_mmap__wc(...);
+ igt_assert(I);
)
...>

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-09 18:33:30 +03:00
Chris Wilson
d878e18dfd benchmarks/gem_blt: Fix compilation after rebase and add batch-size
Add an option to do more than one copy per batch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 17:04:31 +01:00
Chris Wilson
8253e7dc84 benchmarks: Measure BLT performance
Execute N blits and time how long they take to complete to measure both
GPU-limited bandwidth and submission overhead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 10:24:07 +01:00