intel-gpu-tools

mirror of https://github.com/tiagovignatti/intel-gpu-tools.git synced 2025-11-10 15:07:17 +00:00

Author	SHA1	Message	Date
Chris Wilson	3cc8f957f1	benchmarks/gem_latency: Measure CPU usage Try and gauge the amount of CPU time used for each dispatch/wait cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-20 21:22:35 +00:00
Chris Wilson	a91ee853b1	benchmarks/gem_latency: Measure effect of using RealTime priority Allow the producers to be set with maximum RT priority to verify that the waiters are not exhibiting priority-inversion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-20 21:22:35 +00:00
Chris Wilson	27e093dd1f	benchmarks/gem_latency: Use RCS on Sandybridge Reading BCS_TIMESTAMP just returns 0... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-20 13:02:02 +00:00
Chris Wilson	c0942bf528	benchmarks/gem_latency: Rearrange thread cancellation Try a different pattern to cascade the cancellation from producers to their consumers in order to avoid one potential deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-20 13:02:02 +00:00
Chris Wilson	8ea61ec1ff	benchmarks/gem_latency: Tweak workload Do the workload before the nop, so that if combining both, there is a better chance for the spurious interrupts. Emit just one workload batch (use the nops to generate spurious interrupts) and apply the factor to the number of copies to make inside the workload - the intention is that this gives sufficient time for all producers to run concurrently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-20 13:02:02 +00:00
Chris Wilson	db011021a1	benchmarks/gem_latency: Add output field specifier Just to make it easier to integrate into ezbench. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 15:07:56 +00:00
Chris Wilson	646cab4c0c	benchmarks/gem_latency: Split the nop/work/latency measurement Split the distinct phases (generate interrupts, busywork, measure latency) into separate batches for finer control. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 12:16:52 +00:00
Chris Wilson	e37a4c8092	benchmarks/gem_latency: Add time control Allow the user to choose a time to run for, default 10s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 12:16:52 +00:00
Chris Wilson	2ef368acfa	benchmarks/gem_latency: Add nop dispatch latency measurement Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 12:16:52 +00:00
Chris Wilson	1db5b05243	benchmarks/gem_latency: Expose the workload factor Allow the user to select how many batches each producer submits before waiting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 12:16:52 +00:00
Chris Wilson	6dbe0a3012	benchmarks/gem_latency: Measure whole execution throughput Knowing how long it takes to execute the workload (and how that scales) is interesting to put the latency figures into perspective. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 12:16:52 +00:00
Chris Wilson	2f74892ebd	benchmarks/gem_latency: Fix for !LLC Late last night I forgot I had only added the llc CPU mmaping and not the !llc GTT mapping for byt/bsw. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 10:32:38 +00:00
Chris Wilson	39bad606c5	benchmarks: Remove gem_wait Superseded by gem_latency. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 01:31:06 +00:00
Chris Wilson	c9da0b5221	benchmark: Measure of latency of producers -> consumers, gem_latency The goal is to measure how long it takes for clients waiting on results to wake up after a buffer completes, and in doing so ensure scalability of the kernel to a large number of clients. We spawn a number of producers. Each producer submits a busyload to the system and records in the GPU the BCS timestamp of when the batch completes. Then each producer spawns a number of waiters, who wait upon the batch completion and measure the current BCS timestamp register and compare against the recorded value. By varying the number of producers and consumers, we can study different aspects of the design, in particular how many wakeups the kernel does for each interrupt (end of batch). The more wakeups on each batch, the longer it takes for any one client to finish. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-19 01:30:57 +00:00
Chris Wilson	39e44dfa4c	benchmarks/gem_exec_nop: Flush retirement lists before executing wait-ioctl skips a couple of side-effects of retiring, so provoke them using set-domain before we sleep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-12-04 13:59:37 +00:00
Chris Wilson	d44100ed23	benchmarks/gem_exec_ctx: Measure switching between fds Switching between fds also involves a context switch, include it amongst the measurements. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-27 10:03:56 +00:00
Chris Wilson	b68a6428db	benchmarks: Add a set-domain benchmark Benchmark the overhead of changing from GTT to CPU domains and vice versa. Effectively this measures the cost of a clflush, and how well the driver can avoid them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-22 20:54:04 +00:00
Chris Wilson	4c14aa18c1	benchmarks/gem_blt: Fixup a couple of non-llc foibles When extending the batch for multiple copies, we need to remember to flag it as being in the CPU write domain so that the new values get flushed out to main memory before execution. We also have to be careful not to specify NO_RELOC for the extended batch as the execobjects will have been updated but we write the wrong presumed offsets. Subsequent iterations will be correct and we can tell the kernel then to skip the relocations entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-12 10:54:11 +00:00
Thomas Wood	2643793255	Fix comparison of unsigned integers Signed-off-by: Thomas Wood <thomas.wood@intel.com>	2015-11-11 14:20:55 +00:00
Chris Wilson	3bc3ab27ea	benchmarks: Add README Add a README to introduce the ezbench.sh benchmark runner. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-10 14:04:58 +00:00
Chris Wilson	5cabb8c543	benchmarks/gem_blt: Report peak throughput Report the highest throughput measured from a large set of runs to improve sensitivity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-10 14:04:58 +00:00
Chris Wilson	ce65232cf5	benchmarks/gem_wait: Remove pthread_cancel() Apparently the pthread shim on Android doesn't have pthread cancellation, so use the plain old volatile to terminate the CPU hogs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-30 15:51:21 +00:00
Chris Wilson	9024a72d29	benchmark/gem_wait: poc for benchmarking i915_wait_request overhead One scenario under recent discussion is that of having a thundering herd in i915_wait_request - where the overhead of waking up every waiter for every batchbuffer was significantly impacting customer throughput. This benchmark tries to replicate something to that effect by having a large number of consumers generating a busy load (a large copy followed by lots of small copies to generate lots of interrupts) and tries to wait upon all the consumers concurrently (to reproduce the thundering herd effect). To measure the overhead, we have a bunch of cpu hogs - less kernel overhead in waiting should allow more CPU throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-30 15:04:55 +00:00
Derek Morton	0ab76a22d1	benchmarks/gem_blt: Include igt.h in gem_blt.c To fix a build error on android Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Thomas Wood <thomas.wood@intel.com>	2015-10-15 16:59:59 +01:00
Ville Syrjälä	f52e7ec787	Replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with gem_mmap__{cpu,gtt,wc}() gem_mmap__{cpu,gtt,wc}() already has the assert built in, so replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with it. Mostly done with coccinelle, with some manual help: @@ identifier I; expression E1, E2, E3, E4, E5, E6; @@ ( - I = __gem_mmap__gtt(E1, E2, E3, E4); + I = gem_mmap__gtt(E1, E2, E3, E4); ... - igt_assert(I); \| - I = __gem_mmap__cpu(E1, E2, E3, E4, E5); + I = gem_mmap__cpu(E1, E2, E3, E4, E5); ... - igt_assert(I); \| - I = __gem_mmap__wc(E1, E2, E3, E4, E5); + I = gem_mmap__wc(E1, E2, E3, E4, E5); ... - igt_assert(I); ) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-12 19:57:58 +03:00
Ville Syrjälä	b8a77dd6c8	Make gem_mmap__{cpu,gtt,wc}() assert on failure Rename the current gem_mmap__{cpu,gtt,wc}() functions into __gem_mmap__{cpu,gtt,wc}(), and add back wrappers with the original name that assert that the pointer is valid. Most callers will expect a valid pointer and shouldn't have to bother with failures. To avoid changing anything (yet), sed 's/gem_mmap__/__gem_mmap__/g' over the entire codebase. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-09 19:16:26 +03:00
Ville Syrjälä	7eaae3c201	Sprinkle igt_assert(ptr) after gem_mmap__{cpu,gtt,wc} Do the following ptr = gem_mmap__{cpu,gtt,wc}() +igt_assert(ptr); whenever the code doesn't handle the NULL ptr in any kind of specific way. Makes it easier to move the assert into gem_mmap__{cpu,gtt,wc}() itself. Mostly done with coccinelle, with some manual cleanups: @@ identifier I; @@ <... when != igt_assert(I) when != igt_require(I) when != igt_require_f(I, ...) when != I != NULL when != I == NULL ( I = gem_mmap__gtt(...); + igt_assert(I); \| I = gem_mmap__cpu(...); + igt_assert(I); \| I = gem_mmap__wc(...); + igt_assert(I); ) ...> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-09 18:33:30 +03:00
Chris Wilson	d878e18dfd	benchmarks/gem_blt: Fix compilation after rebase and add batch-size Add an option to do more than one copy per batch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-06 17:04:31 +01:00
Chris Wilson	8253e7dc84	benchmarks: Measure BLT performance Execute N blits and time how long they complete to measure both GPU limited bandwidth and submission overhead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-10-06 10:24:07 +01:00
Derek Morton	1b492e311c	benchmarks: Fix build errors on Android M-Dessert Android M-Dessert treats implicit declaration of function warnings as errors resulting in igt failing to build. This patch fixes the errors by including missing header files as required. Mostly this involved including igt.h in the benchmarks. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-10-02 09:42:30 +02:00
Micah Fedke	c81d293aed	convert drm_open_any() calls to drm_open_driver(DRIVER_INTEL) calls with cocci Apply the new API to all call sites within the test suite using the following semantic patch: // Semantic patch for replacing drm_open_any* with arch-specific drm_open_driver* calls @@ identifier i =~ "\bdrm_open_any\b"; @@ - i() + drm_open_driver(DRIVER_INTEL) @@ identifier i =~ "\bdrm_open_any_master\b"; @@ - i() + drm_open_driver_master(DRIVER_INTEL) @@ identifier i =~ "\bdrm_open_any_render\b"; @@ - i() + drm_open_driver_render(DRIVER_INTEL) @@ identifier i =~ "\b__drm_open_any\b"; @@ - i() + __drm_open_driver(DRIVER_INTEL) Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk> Signed-off-by: Thomas Wood <thomas.wood@intel.com>	2015-09-11 14:39:43 +01:00
Thomas Wood	1dcace3018	build: fix unused-result warnings Signed-off-by: Thomas Wood <thomas.wood@intel.com>	2015-09-08 16:15:16 +01:00
Chris Wilson	5e68ad9f82	benchmarks/gem_exec_reloc: Allow profiling 0 relocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-21 22:27:35 +01:00
Chris Wilson	77b8af218c	benchmark/gem_exec_trace: Inline everything Avoid the globals and make the dispatch one huge function and hope GCC works some magic. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-14 20:54:35 +01:00
Chris Wilson	a64e6c39b1	benchmark/gem_exec_tracer: Tweak to handle SNA SNA starts by feeding in deliberately bad ioctls in order to detect the kernel interface versions. A quick solution is to always feed it to the ioctl and only record the trace if it is valid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-14 20:34:21 +01:00
Derek Morton	d524a964fc	benchmarks/Android.mk: Fix building benchmarks for Android The commit "benchmarks: Do not install to system-wide bin/" changed the benchmark file list from bin_PROGRAMS to benchmarks_PROGRAMS. However Android.mk was not updated, resulting in IGT failing to build for Android. This commit adds that change. It also adds LOCAL_MODULE_PATH to specify where the built benchmarks should be put. v2: I discovered that the existing definitions of LOCAL_MODULE_PATH were creating what should have been an invalid path. Not sure how it was ever working previously, but fixed now. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Thomas Wood <thomas.wood@intel.com>	2015-08-13 11:28:22 +01:00
Chris Wilson	38b3bd6b7c	benchmarks: Add a microbenchmark for relocation overhead Allow specification of the many different busyness modes and relocation interfaces, along with the number of buffers to use and relocations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-11 15:31:02 +01:00
Chris Wilson	98bcc18572	benchmarks/gem_exec_trace: Unmap each trace after replay Just on the off chance someone is replaying a bunch of traces, remember to clean up. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-10 18:53:26 +01:00
Chris Wilson	b483e68173	benchmarks/gem_exec_trace: Mark the mmap as sequentially read Use madvise(MADV_SEQUENTIAL) to let the kernel optimise for our straightforward sequential read pattern. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-10 18:53:26 +01:00
Chris Wilson	3911621d0d	benchmarks: Rename the gem_exec_trace tracer module Now that we actually install the benchmarks into a sane location, slightly abuse it to put the tracer for gem_exec_trace alongside. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-10 18:24:15 +01:00
Chris Wilson	d9462e61f9	benchmarks/gem_exec_trace: Clear all new bo handles When reallocing the bo array, remember to set the new entries to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-10 16:16:36 +01:00
Chris Wilson	4c74a683c1	benchmarks: Do not install to system-wide bin/ These benchmarks are first-and-foremost development tools, not aimed at general users. As such they should not be installed into the system-wide bin/ directory, but installed into libexec/. v2: Now actually install beneath ${libexec} Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-10 15:53:08 +01:00
Chris Wilson	0393e7288b	benchmarks: Record and replay calls to EXECBUFFER2 This slightly idealises the behaviour of clients with the aim of measuring the kernel overhead of different workloads. This test focuses on the cost of relocating batchbuffers. A trace file is generated with an LD_PRELOAD intercept around execbuffer, which we can then replay at our leisure. The replay replaces the real buffers with a set of empty ones so the only thing that the kernel has to do is parse the relocations, but without a real workload we lose the impact of having to rewrite active buffers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-08-09 19:20:46 +01:00
Derek Morton	1ae1d290bf	benchmarks/Android.mk, tools/Android.mk: Fix android build error Recently added tools / benchmarks have the same module name as existing tests. Android does not allow duplicate modules. This patch appends _benchmark and _tool to the module names used when building benchmarks and tools to prevent clashes with tests of the same name. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-08-06 14:39:44 +02:00
Chris Wilson	cd306d4e65	benchmark: Measure allocation time for objects A basic measurement, how fast can we create and populate an object with backing storage? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-24 18:56:00 +01:00
Chris Wilson	42a386b83b	benchmarks: Measure mmap fault latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-24 18:55:49 +01:00
Chris Wilson	e984d4965f	benchmarks: Benchmarkify gem_exec_ctx Measure the overhead of execution when doing nothing, switching between a pair of contexts, or creating a new context every time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-24 18:55:49 +01:00
Chris Wilson	e14507ce98	benchmarks: Add kms_vblank to .gitignore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-24 14:16:44 +01:00
Chris Wilson	d88981f62b	benchmarks: Measure round-trip time for an immediate vblanks By measuring both the query and the event round trip time, we can make a reasonable estimate of how long it takes for the query to send the vblank following an interrupt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-23 15:52:53 +01:00
Chris Wilson	af510c249d	benchmarks: gem_prw add the read/write switch to getopt In my haste to merge the two gem_pread/gem_pwrite, I forgot to write up the command line switch to getopt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-07-23 12:30:06 +01:00

77 Commits