If we don't reset exit_handler_count before forking, we may have a
case where the forked process is killed before it even does
"exit_handler_count = 0": in that case, it is still finishing forking.
When that happens, we may end up calling our exit handlers. On the
specific bug I'm investigating, we call igt_reset_connectors(), which
ends up in a deadlock inside malloc_atfork. If we attach gdb to the
forked process and get a backtrace, we have:
(gdb) bt
0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
1 0x00007f15634d36bf in _L_lock_10524 () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f15634d12ef in malloc_atfork (sz=139729840351352, caller=<optimized out>) at arena.c:181
3 0x00007f15640466a1 in drmMalloc () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
4 0x00007f1564049ad7 in drmModeGetResources () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
5 0x0000000000408f84 in igt_reset_connectors () at igt_kms.c:1656
6 0x00000000004092dc in call_exit_handlers (sig=15) at igt_core.c:1130
7 fatal_sig_handler (sig=15) at igt_core.c:1154
8 <signal handler called>
9 0x00007f15634cce60 in ptmalloc_unlock_all2 () at arena.c:298
10 0x00007f156350ca3f in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:188
11 0x000000000040a029 in __igt_fork_helper (proc=proc@entry=0x610fc4 <signal_helper>) at igt_core.c:910
12 0x000000000040459d in igt_fork_signal_helper () at igt_aux.c:110
13 0x0000000000402ab7 in __real_main63 () at bug.c:76
14 0x000000000040296e in main (argc=<optimized out>, argv=<optimized out>) at bug.c:63
After doing some searches for "stuck at malloc_atfork", it seems to me
we probably shouldn't be doing any malloc calls at this point of the
code, so the best way to avoid them is to make sure we can't run
the exit handlers at all.
So in this patch, instead of resetting the exit handlers after
forking, we reset them before forking, and then restore the original
value on the parent process.
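
The reordering can be sketched as below. This is a minimal illustration
and not the actual igt_core.c code: exit_handler_count is a local stand-in
with an arbitrary starting value, and fork_without_exit_handlers() and
child_handler_count() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the library's count of installed exit handlers. */
static int exit_handler_count = 3;

/*
 * Clear the handler count *before* fork(), so a child that is signalled
 * while it is still inside fork() finds no exit handlers to run. The
 * parent (or a failed fork) restores the saved value afterwards.
 */
static pid_t fork_without_exit_handlers(void)
{
	int saved = exit_handler_count;
	pid_t pid;

	exit_handler_count = 0;
	pid = fork();
	if (pid != 0)
		exit_handler_count = saved;

	return pid;
}

/* Returns the handler count the child observed, or -1 on failure. */
static int child_handler_count(void)
{
	int status = 0;
	pid_t pid = fork_without_exit_handlers();

	if (pid == 0)
		_exit(exit_handler_count);	/* child: report what it sees */
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

The child never has a window in which the count is non-zero, while the
parent keeps its handlers once fork() has returned.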
I can reproduce this problem by running "./kms_flip --run-subtest
2x-flip-vs-modeset" under an infinite loop. Usually after a few
hundred calls, we end up stuck on the deadlock mentioned above. QA
says this problem happens every time, but I'm not sure which
difference between our environments makes the race condition so
much easier for them to hit.
The kms_flip.c problem can be considered a regression introduced by:
commit eef768f283466b6d7cb3f08381f72ccf3951dc99
Author: Thomas Wood <thomas.wood@intel.com>
Date: Wed Jun 18 14:28:43 2014 +0100
tests: enable extra connectors in kms_flip and kms_pipe_crc_basic
even though this commit is not the one that introduced the real
problem.
It is also possible to reproduce this problem with a few modifications
to template.c:
- Add a call to igt_enable_connectors() inside the first fixture.
- Add igt_fork_signal_helper() and igt_stop_signal_helper() calls
around subtest B.
Note that the crucial piece is that the parent actively kills helper
children, and if we skip tests this can happen _really_ fast. See e.g.
commit a031a1bf93b828585e7147f06145fc5030814547
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Sep 13 16:43:22 2013 +0200
lib/drmtest: ducttape over fork race
for past hilarity in this area.
Cc: Thomas Wood <thomas.wood@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81367
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I was thrown by the routine calling itself gen7 when it is gen8 specific
and required 64-bit relocation fixes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since relocations are variable size, depending upon generation, it is
easier to handle the resizing of the batch request inside the
BEGIN_BATCH macro. This still leaves us with having to resize commands
in a few places - which still need adaptation for gen8+.
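
A rough sketch of the idea, assuming one extra dword per relocation on
gen8+. The sketch_* names and the fixed-size buffer are made up for
illustration and are not the real intel_batchbuffer API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BATCH_DWORDS 64

/* Hypothetical, simplified batch: a fixed array of dwords. */
struct sketch_batch {
	int gen;				/* hardware generation */
	uint32_t buf[SKETCH_BATCH_DWORDS];
	int used;				/* dwords emitted so far */
	int end;				/* index where the current command must stop */
};

/* Declare a command of n dwords that contains r relocations. */
static void sketch_begin(struct sketch_batch *b, int n, int r)
{
	if (b->gen >= 8)
		n += r;		/* assumed: each relocation is one dword wider */
	assert(b->used + n <= SKETCH_BATCH_DWORDS);
	b->end = b->used + n;
}

static void sketch_out(struct sketch_batch *b, uint32_t dw)
{
	assert(b->used < b->end);	/* never use more space than declared */
	b->buf[b->used++] = dw;
}

/* Emit a relocation: 32-bit address before gen8, 64-bit from gen8 on. */
static void sketch_reloc(struct sketch_batch *b, uint64_t addr)
{
	sketch_out(b, (uint32_t)addr);
	if (b->gen >= 8)
		sketch_out(b, (uint32_t)(addr >> 32));
}

/* Emit a 3 dword command with one relocation; returns dwords used, or -1. */
static int sketch_emit(int gen)
{
	struct sketch_batch b = { .gen = gen };

	sketch_begin(&b, 3, 1);
	sketch_out(&b, 0x1234);		/* placeholder opcode */
	sketch_out(&b, 0);
	sketch_reloc(&b, 0x100001000ull);

	return b.used == b.end ? b.used : -1;
}
```

The caller always declares the pre-gen8 length plus a relocation count,
and the begin step does the per-generation resizing in one place.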
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This revealed that quite a few locations were writing relocation offsets
but only allowing for 32 bit addresses. To reveal such places in active
tests, we also now double check that we do not use more batch space than
declared.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Except in igt_core since that would lead to some hilarious recursions.
v2: Don't fflush any more, spotted by Chris.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Various stuff all over. Most done with the igt.cocci spatch, but
with a few fixups by hand. And add igt_core.h includes where needed.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We were either returning 0, or a negative value cast to an unsigned int
for errors and the clients of that API weren't exactly checking
anything.
We're in luck, we can take shortcuts in a testing library to just assert
when an unexpected error occurs.
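
To illustrate the difference, here is a small sketch with made-up names
(sketch_create(), create_old(), create_checked()). It is not the real
library API, only the shape of the problem and of the fix.

```c
#include <assert.h>

/* Hypothetical primitive: returns an id, or a negative errno-style code. */
static int sketch_create(int want_fail)
{
	return want_fail ? -22 : 7;
}

/*
 * Old style: the negative error is silently converted to a huge unsigned
 * value, which a caller that only checks for 0 will take for a valid id.
 */
static unsigned int create_old(int want_fail)
{
	return sketch_create(want_fail);
}

/* New style: a testing library can simply assert instead of returning. */
static unsigned int create_checked(int want_fail)
{
	int ret = sketch_create(want_fail);

	assert(ret >= 0);
	return ret;
}
```

With the checked variant the callers need no error handling at all, which
matches what they were already (not) doing.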
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
In the future, we'll need more than X tiling here. So take a full tiling
enum instead of a bool meaning X-tiled.
It's fine to do this change without updating the users just yet as
'true' happens to be I915_TILING_X.
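
A small sketch of why existing callers keep working. The SKETCH_TILING_*
constants are local stand-ins, assumed to mirror the I915_TILING_NONE/X/Y
values, and sketch_tiling_name() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-ins, assumed to mirror I915_TILING_NONE/X/Y. */
enum {
	SKETCH_TILING_NONE = 0,
	SKETCH_TILING_X = 1,
	SKETCH_TILING_Y = 2,
};

/* Takes a tiling mode rather than a "tiled" bool. */
static const char *sketch_tiling_name(unsigned int tiling)
{
	switch (tiling) {
	case SKETCH_TILING_NONE:
		return "linear";
	case SKETCH_TILING_X:
		return "X";
	case SKETCH_TILING_Y:
		return "Y";
	default:
		return "unknown";
	}
}
```

A caller that still passes true or false gets X or linear, exactly as
before, and new callers can ask for other modes.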
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Plus a bit of an overview section explaining the split in the library - a
few people (everyone except me it seems) didn't really understand it.
v2: Fix typo'ed s/kmstest_set_vt_graphics_mode/kmstest_get_pipe_from_crtc_id/
in a doc comment spotted by Imre.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Group them a bit both in the header and .c file, and make sure they
appear in the same order in both.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also shuffle things around a bit to make sure the order in the header
matches the order in the .c file.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Both pm_rpm.c and pm_lpsp.c call it "disable_all_screens", but let's
give it a name that better describes what the implementation does.
v2: Rename to kmstest_unset_all_crtcs (Daniel).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
So we can use this function in places that also need the property
pointer, without having to call drmModeGetProperty() again with the
returned id.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Since these functions only really use the drm_fd. The goal is to be
able to reuse these functions on programs that don't use the
igt_display_t structure.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Enable gem_media_fill test for CHV platform. In addition to differences in
media IP blocks from Broadwell, the command sequence also differs for
programming the media pipeline, e.g., should not send a MEDIA_STATE_FLUSH
right before the MI_BATCH_BUFFER_END of batch buffers using MEDIA_OBJECT.
Uses explicit IS_BROADWELL / IS_CHERRYVIEW to distinguish in gen8 media
fill handling.
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Reviewed-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ensure tests using igt_enable_connectors can still run even if the
relevant debugfs files are not available.
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
Most tests use a printable character as the value for getopt to return,
so avoid conflicts by using non-printing values for the standard options.
v2: fix "-h" short option
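
As a sketch of the approach: the standard long options get small
non-printing values, so a test remains free to use any printable
character for its own options. The enum values, the option table and the
test-specific "-l" short option below are assumptions for illustration,
not a copy of the library code.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Assumed values: control codes can never clash with a printable option. */
enum {
	OPT_LIST_SUBTESTS = 1,
	OPT_RUN_SUBTEST,
	OPT_HELP = 'h',		/* keeps its short form */
};

/* Returns the value getopt_long() reports for the first option in argv. */
static int first_option(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "list-subtests", no_argument, NULL, OPT_LIST_SUBTESTS },
		{ "run-subtest", required_argument, NULL, OPT_RUN_SUBTEST },
		{ "help", no_argument, NULL, OPT_HELP },
		{ NULL, 0, NULL, 0 },
	};

	optind = 1;	/* restart scanning for each call */

	/* "l" stands for a hypothetical test-specific short option. */
	return getopt_long(argc, argv, "hl", long_opts, NULL);
}
```

Here a test's own "-l" and the standard "--list-subtests" are reported as
different values, so neither shadows the other.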
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
If you don't do this, it is excluded from the tarball generated by make
distcheck.
Both 1.6 and 1.7 are unbuildable as a result.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
We don't need to keep a reference to the surface, the cairo context will
keep a reference to it until we destroy it.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
This is preparation work for when we need a different way to get a
linear buffer we can use with cairo.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Before issuing any i915 specific ioctls, check the driver is i915
otherwise we make other drivers emit nasty errors at the start of every
test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
commit 743dc7997aa9f5210055896940d87c88983dcda6
breaks the build under Android because version.h
is not created. This happens because the android
make executes from the ANDROID_BUILD_TOP directory
rather than from the directory containing the source
files, so we need to differentiate between Android
and linux builds. This is V2 of this patch based on
Thomas Wood's suggestion.
Signed-off-by: Tim Gore <tim.gore@intel.com>
[Thomas: Fix distcheck issues]
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
One of the side-effects we test for is a kernel oops, and knowing the
guilty subtest can help speed up debugging. We can write to /dev/kmsg to
inject messages into dmesg, so let's do so before the start of every
test.
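
A minimal sketch of such a marker. kmsg_announce() is a hypothetical
helper and the message format is an assumption; the path is a parameter
only so the sketch can be tried on a regular file instead of /dev/kmsg.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a "starting subtest" marker to path (normally "/dev/kmsg").
 * Returns 0 on success, -1 if the message does not fit or the file
 * cannot be opened or written.
 */
static int kmsg_announce(const char *path, const char *test,
			 const char *subtest)
{
	char buf[256];
	ssize_t written;
	int len, fd;

	len = snprintf(buf, sizeof(buf), "%s: starting subtest %s\n",
		       test, subtest);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* One write() per message, so it lands as a single record. */
	written = write(fd, buf, (size_t)len);
	close(fd);

	return written == (ssize_t)len ? 0 : -1;
}
```

Failure to open the file is not fatal for the caller, since the marker is
only a debugging aid.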
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reduce code duplication as the igt_stop_helper can reuse
igt_wait_helper() to replace its own waiting routine.
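
The shape of the refactoring, sketched with a cut-down helper record.
struct sketch_helper and the sketch_* functions are hypothetical and only
mirror the idea, not the real igt_helper_process code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, cut-down helper process record. */
struct sketch_helper {
	pid_t pid;
	bool running;
};

/* Start a helper that idles until it is signalled. */
static bool sketch_start_helper(struct sketch_helper *h)
{
	h->pid = fork();
	if (h->pid == 0) {
		for (;;)
			pause();
	}
	h->running = h->pid > 0;

	return h->running;
}

/* The single waiting routine: reap the helper, retrying on EINTR. */
static int sketch_wait_helper(struct sketch_helper *h)
{
	int status = -1;

	while (waitpid(h->pid, &status, 0) == -1 && errno == EINTR)
		;
	h->running = false;

	return status;
}

/* Stopping is "signal the helper, then reuse the waiting routine". */
static int sketch_stop_helper(struct sketch_helper *h)
{
	kill(h->pid, SIGTERM);

	return sketch_wait_helper(h);
}
```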
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The atexit() and signal() callbacks must both use only async-signal-safe
functions, which excludes assert. So simplify
fork_helper_exit_handler() and children_exit_handler().
__lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
0x00007fd630883d2b in _L_lock_13840 () from /lib/x86_64-linux-gnu/libc.so.6
0x00007fd630881df8 in __GI___libc_realloc (oldmem=0xfcb010, bytes=88) at malloc.c:3025
0x00007fd63087111b in _IO_vasprintf (result_ptr=0x7fff35dc4780, format=<optimised out>, args=args@entry=0x7fff35dc4658) at vasprintf.c:84
0x00007fd630852907 in ___asprintf (string_ptr=string_ptr@entry=0x7fff35dc4780, format=format@entry=0x7fd63097f718 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n") at asprintf.c:35
0x00007fd63082dd92 in __assert_fail_base (fmt=0x7fd63097f718 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x40cff5 "ret == 0", file=file@entry=0x4108d1 "igt_core.c", line=line@entry=872, function=function@entry=0x410ea0 <__PRETTY_FUNCTION__.8052> "children_exit_handler") at assert.c:57
0x00007fd63082dee2 in __GI___assert_fail (assertion=assertion@entry=0x40cff5 "ret == 0", file=file@entry=0x4108d1 "igt_core.c", line=line@entry=872, function=function@entry=0x410ea0 <__PRETTY_FUNCTION__.8052> "children_exit_handler") at assert.c:101
0x000000000040b03f in children_exit_handler (sig=<optimised out>) at igt_core.c:872
0x000000000040b089 in call_exit_handlers (sig=2) at igt_core.c:1029
fatal_sig_handler (sig=2) at igt_core.c:1053
<signal handler called>
0x00007fd6308bfe63 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:130
0x00007fd630bd6045 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
0x000000000040c51a in __igt_fork () at igt_core.c:900
0x00000000004036c2 in forking_evictions (ops=0x614360 <fault_ops>, surface_size=1048576, flags=5, trash_surfaces=<optimised out>, working_surfaces=338, fd=4) at eviction_common.c:203
test_forking_evictions (size=1048576, flags=5, count=338, fd=4) at gem_userptr_blits.c:1086
main (argc=1, argv=0x7fff35dc5328) at gem_userptr_blits.c:1478
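
A sketch of a handler that stays within the async-signal-safe set
(kill() and waitpid() only, no assert, no printf, no malloc). The names
and the single-helper bookkeeping are hypothetical, not the simplified
igt_core.c handlers themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: pid of the one helper we forked, or -1. */
static volatile pid_t sketch_helper_pid = -1;

/* May run from a fatal signal: no checks that could allocate or lock. */
static void sketch_exit_handler(int sig)
{
	int saved_errno = errno;
	pid_t pid = sketch_helper_pid;
	int status;

	(void)sig;
	if (pid > 0) {
		kill(pid, SIGTERM);
		waitpid(pid, &status, 0);	/* result deliberately unchecked */
		sketch_helper_pid = -1;
	}
	errno = saved_errno;
}

/* Install the handler for sig; returns 0 on success. */
static int sketch_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_exit_handler;
	sigemptyset(&sa.sa_mask);

	return sigaction(sig, &sa, NULL);
}

/* Fork a helper that idles until it is signalled. */
static pid_t sketch_spawn_helper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	sketch_helper_pid = pid;

	return pid;
}
```

The handler ignores the waitpid() result on purpose: there is nothing
safe it could do about a failure from inside a signal handler.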
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move version.h generation into lib/Makefile.sources so that it can be
shared between the Autotools and Android build systems. Also make sure the
"updating version.h" message is only displayed when version.h actually
changes and remove unnecessary includes of version.h.
This also includes changes from Tvrtko Ursulin to prevent a build from
within the git repository failing when git is not available.
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>