If the object is at offset 0, which is quite likely when using
full-ppgtt, then a presumed_offset also set to 0 causes the relocation
to be skipped.
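A minimal sketch of the kind of fix this implies (the actual change may
differ), assuming the usual drm_i915_gem_relocation_entry setup:

    /* -1 is never a valid GPU address, so the kernel cannot
     * mistake the relocation as already satisfied and skip it */
    reloc.presumed_offset = -1;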
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Just a set of scripts to integrate these benchmarks with ezbench. They
need to be revised to plug in to the latest version of ezbench.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For sync, it really is just the average latency across all rings, but
for continuous we can expect to see the effect of concurrent dispatch
across rings. Hopefully.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we listen to the uevents from the kernel, we can detect when the GPU
hangs. This requires us to fork a helper process to do the listening
and send a signal back to the parent.
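A rough sketch of such a helper using libudev (the property checked and
the signal used are assumptions):

    struct udev *udev = udev_new();
    struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
    struct pollfd pfd;

    udev_monitor_filter_add_match_subsystem_devtype(mon, "drm", NULL);
    udev_monitor_enable_receiving(mon);

    pfd.fd = udev_monitor_get_fd(mon);
    pfd.events = POLLIN;

    while (poll(&pfd, 1, -1) > 0) {
        struct udev_device *dev = udev_monitor_receive_device(mon);

        if (dev && udev_device_get_property_value(dev, "ERROR")) {
            /* tell the parent a GPU hang was reported */
            kill(getppid(), SIGRTMIN);
        }
        if (dev)
            udev_device_unref(dev);
    }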
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To simplify and speed up running interruptible tests, use a custom
ioctl() function that controls the signaling and detects when no more
iterations are needed to trigger an interruption.
We use a realtime timer to inject the signal after a certain delay,
increasing the delay on every loop to try and exercise different code
paths within the function. The first delay is very short such that we
hopefully enter the kernel with a pending signal.
Clients should use

    struct igt_sigiter iter = {};
    while (igt_sigiter_repeat(&iter, enable_interrupts=true))
        do_test();

to automatically repeat the test until we can inject no more signals
into the ioctls. This is condensed into a macro

    igt_interruptible(enable_interrupts=true)
        do_test();

for convenience.
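A sketch of the injection mechanism itself, assuming a POSIX interval
timer (the real helper's details will differ):

    /* arm a one-shot realtime timer to deliver a signal mid-ioctl;
     * assumes delay_ns < 1s for brevity */
    struct sigevent sev = {
        .sigev_notify = SIGEV_SIGNAL,
        .sigev_signo = SIGRTMIN, /* signal choice is an assumption */
    };
    struct itimerspec its = {
        .it_value.tv_nsec = delay_ns, /* grows on every iteration */
    };
    timer_t timer;

    timer_create(CLOCK_MONOTONIC, &sev, &timer);
    timer_settime(timer, 0, &its, NULL);

    if (ioctl(fd, request, arg) == -1 && errno == EINTR)
        interrupted = true; /* the signal landed inside the ioctl */

    timer_delete(timer);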
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Android defines __USE_GNU but does not provide
pthread_attr_setaffinity_np(), so add an extra guard around
pthread_attr_setaffinity_np().
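The combined guard ends up looking roughly like this (the exact Android
macro is an assumption):

    #if defined(__USE_GNU) && !defined(ANDROID)
        pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
    #endif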
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
pthread_setaffinity_np is a GNU extension, so add some __USE_GNU
ifdeffery and hope for the best if it is unavailable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since clock_gettime() should add a fixed overhead to the latency
result, subtract it from the result.
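The overhead can be estimated by timing a tight loop of calls; a
sketch, assuming an elapsed() helper that returns nanoseconds:

    struct timespec start, end;
    int n = 1024;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++)
        clock_gettime(CLOCK_MONOTONIC, &end);

    /* per-call cost, subtracted from every latency sample */
    overhead = elapsed(&start, &end) / n;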
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to keep the latency as low as possible for the idle load, we
need to keep the CPU awake. Otherwise we end up with the busy workload
having lower latency than the idle workload!
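One way to do that (not necessarily what this patch does) is the
kernel's PM QoS interface, which holds the CPU out of deep idle states
for as long as the file descriptor stays open:

    /* hypothetical: request zero cpuidle exit latency */
    static int keep_cpu_awake(void)
    {
        int32_t latency_us = 0;
        int fd = open("/dev/cpu_dma_latency", O_RDWR);

        if (fd >= 0)
            write(fd, &latency_us, sizeof(latency_us));

        return fd; /* keep open; closing drops the request */
    }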
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It is also useful to know how much worse than baseline the latency is
when the gem load is applied. For slower systems, presenting the
results in nanoseconds makes them hard to read, so switch to
microseconds for output.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead of measuring the wakeup latency of a GEM client, we turn the
tables here and ask what is the wakeup latency of a normal process
competing with GEM. In particular, a realtime process that expects
deterministic latency.
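What the realtime process measures is essentially cyclictest's loop; a
sketch (interval and helpers assumed):

    struct timespec target, now;

    clock_gettime(CLOCK_MONOTONIC, &target);
    target.tv_nsec += 100 * 1000; /* wake up in 100us */
    if (target.tv_nsec >= 1000000000) {
        target.tv_sec++;
        target.tv_nsec -= 1000000000;
    }

    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);

    /* how late did the scheduler actually wake us? */
    record(elapsed(&target, &now));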
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use a simpler statically allocated struct for computing the mean, as
otherwise we may run out of memory!
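A single-pass accumulator needs only a couple of words of storage; a
sketch:

    /* running mean without retaining every sample */
    struct mean {
        double mean;
        unsigned long count;
    };

    static void mean_add(struct mean *m, double value)
    {
        m->count++;
        m->mean += (value - m->mean) / m->count;
    }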
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This patch moves the userptr definitions and helper implementations
that lived locally in gem_userptr_benchmark and gem_userptr_blits into
the library, so other tests can make use of them as well. There are no
functional changes.
v2: added a __-prefixed variant so that errors can be handled by the
caller; brought gem_userptr_sync back to gem_userptr_blits; added gtkdoc.
v8: removed local_i915_gem_userptr from gem_concurrent_all.c to use the
global helpers instead.
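The double-underscore convention referred to in v2, sketched with
abbreviated names (the exact signatures in the library may differ):

    /* __gem_userptr() hands the error back to the caller... */
    int __gem_userptr(int fd, void *ptr, int size, int read_only,
                      uint32_t *handle)
    {
        struct local_i915_gem_userptr arg = {
            .user_ptr = (uintptr_t)ptr,
            .user_size = size,
            .flags = read_only ? LOCAL_I915_USERPTR_READ_ONLY : 0,
        };

        if (drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &arg))
            return -errno;

        *handle = arg.handle;
        return 0;
    }

    /* ...while gem_userptr() treats any error as a test failure */
    void gem_userptr(int fd, void *ptr, int size, int read_only,
                     uint32_t *handle)
    {
        igt_assert_eq(__gem_userptr(fd, ptr, size, read_only, handle), 0);
    }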
Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Like the previous patch to gem_exec_ctx, restrict gem_exec_nop to running
for a fixed length of time, rather than over a range of different
execution counts. In order to retain some measurement of that range,
allow measuring individual execution versus continuous dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than investigate the curve for dispatch latency, just run for a
fixed time and report an average latency. Instead, offer two modes:
average single dispatch latency and average continuous dispatch latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we autotune the workload to only take 0.1s and then repeat the
measurements over 2s, we can bound the benchmark runtime. (Roughly, of
course! Sometimes the disparity between main memory (CPU) bandwidth and
GPU execution bandwidth throws off the runtime, but that's the purpose
of the benchmark!)
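The autotuning can be as simple as doubling the workload until a single
pass overshoots the budget; a sketch, assuming elapsed() returns
seconds:

    unsigned int count = 1;
    struct timespec start, end;

    for (;;) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        run_workload(count); /* hypothetical helper */
        clock_gettime(CLOCK_MONOTONIC, &end);

        if (elapsed(&start, &end) > 0.1)
            break; /* one pass now takes ~0.1s */

        count *= 2;
    }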
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The joy of our hardware; don't let two threads attempt to read the same
register at the same time.
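i.e. serialise the readers; a minimal sketch:

    static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;

    static uint32_t read_register_locked(void)
    {
        uint32_t value;

        pthread_mutex_lock(&reg_lock);
        value = read_register(); /* the racy read, name assumed */
        pthread_mutex_unlock(&reg_lock);

        return value;
    }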
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow the producers to be set with maximum RT priority to verify that
the waiters are not exhibiting priority inversion.
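Setting the producer threads to maximum RT priority is roughly:

    struct sched_param param = {
        .sched_priority = sched_get_priority_max(SCHED_FIFO),
    };

    /* producers run at maximum realtime priority;
     * the waiters stay at normal priority */
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);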
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Try a different pattern to cascade the cancellation from producers to
their consumers in order to avoid one potential deadlock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Do the workload before the nop, so that when combining both there is a
better chance of seeing the spurious interrupts. Emit just one workload
batch (use the nops to generate spurious interrupts) and apply the
factor to the number of copies to make inside the workload - the
intention is that this gives sufficient time for all producers to run
concurrently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Split the distinct phases (generate interrupts, busywork, measure
latency) into separate batches for finer control.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Knowing how long it takes to execute the workload (and how that scales)
is interesting to put the latency figures into perspective.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Late last night I forgot I had only added the llc CPU mmapping and not
the !llc GTT mapping for byt/bsw.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal is to measure how long it takes for clients waiting on results
to wake up after a buffer completes, and in doing so ensure the
scalability of the kernel to a large number of clients.
We spawn a number of producers. Each producer submits a busyload to the
system and records, on the GPU, the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch completion, read the current BCS timestamp register and
compare it against the recorded value.
By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.
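Each waiter's measurement then reduces to a sketch like this (the
timestamp helper is assumed; values are in GPU clock ticks):

    /* block until the producer's batch has completed */
    gem_sync(fd, batch_handle);

    /* compare the live BCS timestamp against the value the
     * GPU wrote out when the batch finished */
    uint32_t now = read_bcs_timestamp(fd);
    uint32_t then = *(volatile uint32_t *)recorded;

    record_latency(now - then);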
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
wait-ioctl skips a couple of side-effects of retiring, so provoke them
using set-domain before we sleep.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.
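The core of the measurement is timing a domain ping-pong; a sketch
using the igt helpers:

    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < reps; i++) {
        /* each switch may cost a clflush of the object,
         * unless the driver can prove it is unneeded */
        gem_set_domain(fd, handle,
                       I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
        gem_set_domain(fd, handle,
                       I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);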
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When extending the batch for multiple copies, we need to remember to
flag it as being in the CPU write domain so that the new values get
flushed out to main memory before execution. We also have to be careful
not to specify NO_RELOC for the extended batch, as the execobjects will
have been updated with the wrong presumed offsets. Subsequent
iterations will be correct, and then we can tell the kernel to skip the
relocations entirely.
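In code, that amounts to roughly the following sketch:

    /* the batch was rewritten on the CPU: tell the kernel so the
     * new commands are flushed to memory before execution */
    gem_set_domain(fd, batch_handle,
                   I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);

    /* first submission after extending: the presumed offsets in
     * the execobjects are stale, so let the relocations run */
    execbuf.flags &= ~I915_EXEC_NO_RELOC;
    gem_execbuf(fd, &execbuf);

    /* later iterations can skip the relocations again */
    execbuf.flags |= I915_EXEC_NO_RELOC;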
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Apparently the pthread shim on Android doesn't have pthread cancellation,
so use a plain old volatile flag to terminate the CPU hogs.
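i.e. replace pthread_cancel() with a shared flag:

    static volatile bool done;

    static void *cpu_hog(void *arg)
    {
        while (!done)
            ; /* burn cycles until asked to stop */
        return NULL;
    }

    /* instead of pthread_cancel(thread): */
    done = true;
    pthread_join(thread, NULL);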
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrently (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of CPU hogs; less
kernel overhead in waiting should allow more CPU throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>