drv_hangman.c: In function ‘hangcheck_unterminated’:
drv_hangman.c:290:27: warning: integer overflow in expression [-Woverflow]
int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The hangcheck logic will not flag an hang if acthd keeps increasing.
However, if a malformed batch jumps to an invalid offset in the ppgtt it
can potentially continue executing through the whole address space
without triggering the hangcheck mechanism.
This patch adds a test to simulate the issue. I've kept the test running
for more than 10 minutes before killing it on a BDW and no hang occurred.
I've sampled i915_hangcheck_info a few times during the run and got the
following:
Hangcheck active, fires in 468ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x47df685ecc [current 0x4926b81d90]
max ACTHD = 0x47df685ecc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 424ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
max ACTHD = 0x6c953d3a34
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 1692ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
max ACTHD = 0x1f49b0366dc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
v2: use the new gem_wait() function (Chris)
v3: switch to unterminated batch and rename test, remove redundant
check, update test requirements (Chris), update top comment
v4: force gpu reset if the hang detection fails (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[Mika: removed batch_len=8]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
All the external viewer expects of the GPU error capture is to extract
the exact batch that triggered the hang. Everything else is internal
detail to aide in post-mortem debugging of the kernel driver (i.e.
subject to change) and not of the userspace portion (under control of
the test).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
No functional changes.
While I'm here, let's also rename gem_uses_aliasing_ppgtt (since it's
being used to indicate if we are using ANY kind of ppgtt) and introduce
gem_uses_full_ppgtt to drop some unnecessary code from tests that were
previously calling getparam directly instead of using ioctl wrapper.
v2: drop gem_uses_full_48b_ppgtt since it's no longer used anywhere,
s/48b/64b (Chris)
v3: rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we correctly auto-skip instead of falling over the
lack of i915 debugfs files first and fail the testcase due to
that.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Apply the new API to all call sites within the test suite using the following
semantic patch:
// Semantic patch for replacing drm_open_any* with arch-specific drm_open_driver* calls
@@
identifier i =~ "\bdrm_open_any\b";
@@
- i()
+ drm_open_driver(DRIVER_INTEL)
@@
identifier i =~ "\bdrm_open_any_master\b";
@@
- i()
+ drm_open_driver_master(DRIVER_INTEL)
@@
identifier i =~ "\bdrm_open_any_render\b";
@@
- i()
+ drm_open_driver_render(DRIVER_INTEL)
@@
identifier i =~ "\b__drm_open_any\b";
@@
- i()
+ __drm_open_driver(DRIVER_INTEL)
Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
Add a header that includes all the headers for the library. This allows
reorganisation of the library without affecting programs using it and
also simplifies the headers that need to be included to use the library.
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
The integer comparison macros give us better error output by including
the actual values that failed the comparison.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Also move forcewake and stop_rings code from igt_debugfs to igt_gt
since it fits better. And move the hang injection fork helpers from
igt_aux to igt_gt, too.
Also push the intel_gen call into igt_hang_ring while at it.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This test will not run on Android as the coreu service
remains running even after the android system is stopped.
Coreu is a client of drm and when the test finds this it
fails an assert.
Coreu is started by the init process and there is no
tidy, non invasive way to stop it (init just restarts it).
Coreu isn't doing anything and would not be expected to
interfere with this test. In addition, all the other
igt tests just rely on the user/test script to ensure
that there are no other drm clients, so this test can
do the same. On Android we must rely on coreu being
dormant when this test runs.
Signed-off-by: Tim Gore <tim.gore@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This test has a few checks that batch buffer addresses in the error
state match the expected address for the userspace supplied batch.
But the batch buffer copy piece of the command parser means that
the logged addresses are actually _supposed_ to be different. So
skip just those checks.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 6903ab04e5f9048e3932eb3225e94b6a228681ba.
The igt_assert conversion rule is broken and doesn't invert the check
as it should.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Guarantees that error capture works at a very basic level.
v2: Also check that the ring object contains a reloc with MI_BB_START
for the presumed batch object's address.
v3: Chris review comments:
- Move variables to local scope.
- Do not assume there is only one request.
- Some gen encode flags into the BB start address.
Also, use igt_set/get_stop_rings as suggested by Mika Kuoppala.
v4: Make as a subtest of drv_hangman.
v5: Rebase
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> <v4>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Mixing script and standlone tests didn't mix well with the
strict i915_ring_stop flags handling. Also squash drv_missed_irq_hang
to the new test.
v2: - Remove missed irq test (Daniel Vetter)
- gitignore fixed (Oscar Mateo)
- fix check_other_clients to handle dangling fd's
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78322
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> <v1>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>