17 Commits

Chris Wilson
51bb53663e benchmarks/gem_latency: Allow setting an infinite time
Well, 24000 years.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-06 10:21:40 +00:00
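
A hedged sketch of what "infinite" can mean here, assuming the run time is parsed as a signed integer of seconds with negative values clamped to the type's maximum; the benchmark's actual representation is what yields the ~24000 years quoted above, and the helper below is illustrative only:

    #include <limits.h>
    #include <stdlib.h>

    /* Negative means "forever": clamp to the largest representable
     * duration. The real benchmark's bound works out to ~24000 years;
     * clamping to INT_MAX seconds (~68 years) is just an illustration. */
    static int parse_runtime(const char *arg)
    {
        int seconds = atoi(arg);
        return seconds < 0 ? INT_MAX : seconds;
    }
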
Chris Wilson
1b9085b979 benchmarks/gem_latency: Hide spinlocks for android
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 16:32:08 +00:00
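
Android's bionic libc historically shipped without pthread spinlocks, so one plausible reading of "hide spinlocks for android" is a compile-time fallback to mutexes. A minimal sketch of that idiom, with illustrative macro names:

    #include <pthread.h>

    #ifdef ANDROID
    /* bionic has no pthread_spinlock_t; fall back to a mutex */
    typedef pthread_mutex_t igt_spinlock_t;
    #define igt_spin_init(l)   pthread_mutex_init(l, NULL)
    #define igt_spin_lock(l)   pthread_mutex_lock(l)
    #define igt_spin_unlock(l) pthread_mutex_unlock(l)
    #else
    typedef pthread_spinlock_t igt_spinlock_t;
    #define igt_spin_init(l)   pthread_spin_init(l, PTHREAD_PROCESS_PRIVATE)
    #define igt_spin_lock(l)   pthread_spin_lock(l)
    #define igt_spin_unlock(l) pthread_spin_unlock(l)
    #endif
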
Chris Wilson
a1d465a3c5 benchmarks/gem_latency: Serialise mmio reads
The joy of our hardware; don't let two threads attempt to read the same
register at the same time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 13:34:58 +00:00
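
A sketch of the serialisation described above: one process-wide mutex around the register sample, assuming igt's intel_register_read() mmio helper (the lock and function names here are illustrative):

    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t mmio_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Only one thread may sample a register at a time; assumes
     * intel_register_access_init() has already been called. */
    static uint32_t read_reg_serialised(uint32_t reg)
    {
        uint32_t value;

        pthread_mutex_lock(&mmio_lock);
        value = intel_register_read(reg);
        pthread_mutex_unlock(&mmio_lock);

        return value;
    }
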
Chris Wilson
3ebce37b65 benchmarks/gem_latency: Guard against inferior pthreads.h
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 10:00:21 +00:00
Chris Wilson
3cc8f957f1 benchmarks/gem_latency: Measure CPU usage
Try to gauge the amount of CPU time used for each dispatch/wait cycle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 21:22:35 +00:00
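
One plausible way to gauge that, sketched here with CLOCK_THREAD_CPUTIME_ID; whether the commit samples this clock or getrusage() is an assumption:

    #include <stdint.h>
    #include <time.h>

    /* CPU time consumed by the calling thread only */
    static uint64_t thread_cpu_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    /* sample before and after N dispatch/wait cycles: */
    /* cpu_per_cycle_ns = (thread_cpu_ns() - start_ns) / N; */
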
Chris Wilson
a91ee853b1 benchmarks/gem_latency: Measure effect of using RealTime priority
Allow the producers to be set with maximum RT priority to verify that
the waiters are not exhibiting priority inversion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 21:22:35 +00:00
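
A minimal sketch of raising a producer to maximum RT priority with SCHED_FIFO (error handling trimmed; the helper name is illustrative, and the caller needs CAP_SYS_NICE, typically root):

    #include <pthread.h>
    #include <sched.h>

    /* Raise the calling thread to the top SCHED_FIFO priority */
    static int set_max_rt_priority(void)
    {
        struct sched_param param = {
            .sched_priority = sched_get_priority_max(SCHED_FIFO),
        };

        return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    }
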
Chris Wilson
27e093dd1f benchmarks/gem_latency: Use RCS on Sandybridge
Reading BCS_TIMESTAMP just returns 0...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
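
A hedged sketch of the implied ring selection, keyed off igt's intel_gen() chipset helper; the exact condition used by the commit is an assumption:

    #include <stdint.h>
    #include <i915_drm.h> /* libdrm */

    /* BCS_TIMESTAMP reads as 0 on Sandybridge, so use the render ring
     * (and its RCS timestamp) on gen6; the blitter everywhere else. */
    static unsigned select_ring(uint32_t devid)
    {
        if (intel_gen(devid) == 6)
            return I915_EXEC_RENDER;

        return I915_EXEC_BLT;
    }
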
Chris Wilson
c0942bf528 benchmarks/gem_latency: Rearrange thread cancellation
Try a different pattern to cascade the cancellation from producers to
their consumers in order to avoid one potential deadlock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
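
One cascade pattern that avoids such a deadlock is sketched below: a producer marks itself done under its lock and broadcasts, so consumers re-check a flag on every wakeup instead of sleeping on a condition that will never fire (struct and field names are illustrative):

    #include <pthread.h>
    #include <stdbool.h>

    struct producer {
        pthread_mutex_t lock;
        pthread_cond_t wake;
        bool done;
    };

    /* Producer side: rather than cancelling consumers directly, mark
     * ourselves done and broadcast so every sleeper gets to re-check. */
    static void producer_stop(struct producer *p)
    {
        pthread_mutex_lock(&p->lock);
        p->done = true;
        pthread_cond_broadcast(&p->wake);
        pthread_mutex_unlock(&p->lock);
    }

    /* Consumer side: sleep until cancelled, re-checking on every wakeup */
    static void consumer_wait_cancel(struct producer *p)
    {
        pthread_mutex_lock(&p->lock);
        while (!p->done)
            pthread_cond_wait(&p->wake, &p->lock);
        pthread_mutex_unlock(&p->lock);
    }
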
Chris Wilson
8ea61ec1ff benchmarks/gem_latency: Tweak workload
Do the workload before the nop, so that when combining both there is a
better chance of provoking spurious interrupts. Emit just one workload
batch (using the nops to generate the spurious interrupts) and apply the
factor to the number of copies made inside the workload; the intention
is that this gives sufficient time for all producers to run
concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 13:02:02 +00:00
Chris Wilson
db011021a1 benchmarks/gem_latency: Add output field specifier
Just to make it easier to integrate into ezbench.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 15:07:56 +00:00
Chris Wilson
646cab4c0c benchmarks/gem_latency: Split the nop/work/latency measurement
Split the distinct phases (generate interrupts, busywork, measure
latency) into separate batches for finer control.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
e37a4c8092 benchmarks/gem_latency: Add time control
Allow the user to choose a time to run for (default: 10s).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
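
A sketch of such time control, assuming a runtime in seconds and a monotonic-clock deadline loop (the option parsing itself is omitted):

    #include <time.h>

    static double elapsed(const struct timespec *start)
    {
        struct timespec now;

        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start->tv_sec) +
               (now.tv_nsec - start->tv_nsec) * 1e-9;
    }

    /* main loop: keep dispatching until the requested time (10s by
     * default) has passed */
    /* while (elapsed(&start) < runtime) run_one_cycle(); */
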
Chris Wilson
2ef368acfa benchmarks/gem_latency: Add nop dispatch latency measurement
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
1db5b05243 benchmarks/gem_latency: Expose the workload factor
Allow the user to select how many batches each producer submits before
waiting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
Chris Wilson
6dbe0a3012 benchmarks/gem_latency: Measure whole execution throughput
Knowing how long it takes to execute the workload (and how that scales)
is useful for putting the latency figures into perspective.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 12:16:52 +00:00
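
A sketch of the throughput figure this yields, assuming we count completed batches over wall-clock time on the monotonic clock; the counter name is illustrative, not the commit's:

    #include <stdint.h>
    #include <time.h>

    /* Whole-run throughput: batches completed per second of wall time */
    static double throughput(uint64_t total_batches,
                             const struct timespec *start,
                             const struct timespec *end)
    {
        double secs = (end->tv_sec - start->tv_sec) +
                      (end->tv_nsec - start->tv_nsec) * 1e-9;

        return total_batches / secs;
    }
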
Chris Wilson
2f74892ebd benchmarks/gem_latency: Fix for !LLC
Late last night I forgot I had only added the llc CPU mmapping and not
the !llc GTT mapping for byt/bsw.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 10:32:38 +00:00
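
A sketch of the llc/!llc split, querying I915_PARAM_HAS_LLC and choosing between igt's gem_mmap__cpu() and gem_mmap__gtt() helpers; the exact call sites in the commit are assumptions:

    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <i915_drm.h> /* libdrm */

    static bool has_llc(int fd)
    {
        int val = 0;
        struct drm_i915_getparam gp = {
            .param = I915_PARAM_HAS_LLC,
            .value = &val,
        };

        ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
        return val;
    }

    /* On llc parts a cached CPU mmap is coherent with the GPU; on !llc
     * parts such as byt/bsw, fall back to a WC mapping through the GTT. */
    static void *map_results(int fd, uint32_t handle, uint64_t size)
    {
        if (has_llc(fd))
            return gem_mmap__cpu(fd, handle, 0, size, PROT_READ);

        return gem_mmap__gtt(fd, handle, size, PROT_READ);
    }
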
Chris Wilson
c9da0b5221 benchmark: Measure the latency of producers -> consumers, gem_latency
The goal is to measure how long it takes for clients waiting on results
to wake up after a buffer completes, and in doing so ensure scalability
of the kernel to a large number of clients.

We spawn a number of producers. Each producer submits a busyload to the
system and records in the GPU the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch's completion, read the current BCS timestamp register, and
compare it against the recorded value.

By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 01:30:57 +00:00
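
As a rough sketch of the measurement described above: the producer's batch stores the ring timestamp into a results buffer on completion, and a woken waiter samples the same timestamp register over mmio and takes the wrap-safe difference. The register offset and the helper below are assumptions, not the commit's exact code:

    #include <stdint.h>

    /* BCS ring base + TIMESTAMP offset; gen-dependent, illustrative */
    #define BCS_TIMESTAMP (0x22000 + 0x358)

    /* Waiter side: the batch wrote its completion timestamp at map[0].
     * Latency is how much later the waiter observed its wakeup. */
    static uint32_t wakeup_latency_ticks(volatile uint32_t *map)
    {
        uint32_t completed = map[0];
        uint32_t now = intel_register_read(BCS_TIMESTAMP); /* igt mmio */

        return now - completed; /* unsigned subtraction survives wrap */
    }
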