Extremely basic check that we can dispatch an execbuf on every ring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Moved gem_quiescent_gpu() call to the run path.
Signed-off-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The bufmgr import creates a new handle from a name for the userptr - we
can discard our original handle immediately.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_residency has a cairo dependency through igt_fb.c. Remove it
if ANDROID_HAS_CAIRO is not defined.
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Fixup some fallout from the connector probing changes so testdisplay -m
will pick up newly hotplugged displays correctly.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org.
If we have to share the GTT with others, we cannot rely on being able to
fill it and have to factor in some slack for others.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As well as ensuring the kernel doesn't simply crash when asked to do
lots of objects, check it actually aligns them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Daniel has suggested that I put vc4 testing into igt, since it's got
the piglit integration and KMS coverage already. This gets the ccore
building so that I can start writing tests.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Some bits can't be built on non-x86 architectures, mostly because they
require x86-specific assembly primitives. Disable these by default on
non-x86 architectures.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can move it from softpin test into lib, and since softpin support is
highly unlikely to go away in-between getparam ioctl calls, let's just
do a single call and store the value.
v2: rebase
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No functional changes.
While I'm here, let's also rename gem_uses_aliasing_ppgtt (since it's
being used to indicate if we are using ANY kind of ppgtt) and introduce
gem_uses_full_ppgtt to drop some unnecessary code from tests that were
previously calling getparam directly instead of using ioctl wrapper.
v2: drop gem_uses_full_48b_ppgtt since it's no longer used anywhere,
s/48b/64b (Chris)
v3: rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After the recent discussions regarding the effects of the vblank
disabling policies on PC state residencies, I started running some
experiments to reevaluate some non-intuitive conclusions I had
reached. In order to help me do this, I decided to write this tool.
The idea is very simple: the tool puts the system on an screen-on idle
state, checks which PC state residency is the deepest we can reach,
measures its residency, then does some not-so-idle tests and measures
the residencies. You can use the tool to compare different Kernel
trees and you can also use the tool to compare enabled vs disabled
features.
It's obvious that these cases do not represent real-world use cases of
our driver, but they are already enough to highlight differences
between the many patches I wrote. I was even able to catch a bug in
one of my patches by spotting an unexpected regression in the
residencies.
I've been using this tool for FBC, but I expect it to also be useful
for PSR, DRRS and similar features. I've been measuring the effects of
different optimizations I wrote, and I've also been measuring the FBC
vs no-FBC cases.
It is also important to highlight that if your system is not properly
configured for efficient power savings the tool may not be able to
show differences between the results. On my Broadwell machine, for
example, if I don't run "powertop --auto-tune" before running the
tool, I get PC2 as the deepest state, and 90%+ residency for every
workload. After properly configuring the machine, I get PC7 as the
deepest state, which is the expected.
So far I only tested this tool on BDW and SKL, and it may hit some
unexpected assertions for older platforms.
I only implemented the cases that are immediately useful for me, but
we may also expand the tool in the future. We can add more important
workloads. We can add support for screen-off cases, so we can compare
the effects of runtime PM and other screen-off features. There's a lot
we can do, but none of this is on my current priority list.
And remember: /usr/bin/paste is your friend when comparing results.
v2:
- Be more idle at setup_idle().
- Improve printing for /usr/bin/paste usage.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
If we don't close the handle from the last pass, we don't free up the
previous pass's vma immediately, changing the hole allocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
snooped objects are not allowed to abutt uncached objects on older gen
(!llc and global GTT) or else the GPU may hang if it prefetches across a
page boundary into a different memory type (i.e. CS reading from snoop).
The kernel should be checking the alignment rules as normal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we use a MAP_SHARED mmaping for the our backing storage for userptr,
then it will be inherited across the fork with the same address. ideal
for continuity testing of children.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We have to avoid the COW alias for the intel_bufmgr and intel_batch
cache as the child may close the object (in its local cache) leaving an
alias in the parent cache pointing to a stale object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The issue here is that the pointer inherited upon the child is
copied-on-write, i.e. the pointer is private to each process, but the
handle is shared. This means that writes and reads in the child are
going to a different set of pages than the GPU's object - the test is
simply broken. To overcome this we would need to mmap the shared buffer
into the child.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The current forked modes recreate their handles in the children and just
look at any complications arising from contention. This mode looks at
inheriting the fd+handles from the parent into the child and seeing if
we can use them within the child.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After setting the flag for NORELOC (to avoid having to pay the cost of
validating the relocations on every pass), we need to make sure that
we set EXEC_OBJECT_WRITE so that we do track the outstanding writes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we have the same function in a few places to read the
debugfs/i915_ring_missed_irq file, move it to the core.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For the older gen, MI_STORE_DATA_IMM is a privileged command so we need
to set the "secure" batch flag, and we also need to instruct the command
to use the GTT virtual address.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since we need a lot of memory, trim off the less significant digits for
easier human consumption.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The objective of this test is to check how the driver handles a full
ring. To that end we need only submit enough work to fill the ring by
submitting work faster than the GPU can execute it. If we are more
careful in our batch construction, we can feed them much faster and
achieve the same results much quicker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With softpin we can explicitly manage the layout of the objects to be
executed, deliberately forcing the reuse of active pages in an attempt
to spot misbehaviour in the CS TLBs. Being explicit allows us to
eliminate a lot of the CPU overhead between execbuf, hopefully
increasing the likelihood of a conflict.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Some potential callers want to inject a hang into a particular context,
some want to trigger an actual ban and others may or may not want to
capture the associated error state. Expand the hang injection interface
to suit all.
v2: Disable the new kernel API, but push to provide a missing piece of
infrastucture to unbreak compilation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the hang injection now itself checks for validity before use, the
tests don't need to do so themselves. Except in certain situations! If
the test forks, it should do requirement checks before the fork (so that
we don't anger the igt gods) and if the test plays around i915.reset
then it needs to do an early igt_require_hang_ring() that is not
affected by the changes to i915.reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we move the igt_require() into the hang injector, this makes simple
test cases even more convenient. More complex test cases can always do
their own precursory check before settting up the test.
However, this does embed the assumption that the first context we are
called from is safe (i.e no i915.enable_hangcheck/i915.reset
interferrence).
v2: A couple of environment variables to skip hang testing or to force
hang injection even if the GPU cannot be reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The range we chose to overwrite in the target had an off-by-one error
that could cause it to compute a size that went past the end of the
buffer (and so trigger EINVAL). Fortuituously with our seed this did not
occur. Whilst changing the range calculation, update the error logging
to include the range information.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For softpinning, we do not require either userptr or extended ppgtt, so
remove those requirements and make the tests work universally. (Certain
ABI tests require large GTT, or per-process GTT.)
In the process, make the tests more extensive - validate overlapping
handling more careful, explicitly test no-relocation support, validate
more ABI handling. And for fun, cause a kernel GPF.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Compute the largest alignment for the most number of objects we can create,
then trying an execbuf with them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The test just aims to execute batches on alternating rings with a write
target such that every batch must be executed after the previous
completes. This stresses the inter-ring synchronisation, which is
interrupt driven if the gpu does not support semaphores, and so is a
good stress tests for detecting "missed interrupt syndrome". Make that
detection explicit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>