mirror of
https://github.com/ioacademy-jikim/debugging
synced 2025-06-08 08:26:14 +00:00
1249 lines
64 KiB
HTML
<html>
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||
<title>7. Helgrind: a thread error detector</title>
|
||
<link rel="stylesheet" type="text/css" href="vg_basic.css">
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
|
||
<link rel="home" href="index.html" title="Valgrind Documentation">
|
||
<link rel="up" href="manual.html" title="Valgrind User Manual">
|
||
<link rel="prev" href="cl-manual.html" title="6. Callgrind: a call-graph generating cache and branch prediction profiler">
|
||
<link rel="next" href="drd-manual.html" title="8. DRD: a thread error detector">
|
||
</head>
|
||
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
||
<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
|
||
<td width="22px" align="center" valign="middle"><a accesskey="p" href="cl-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
|
||
<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
|
||
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
|
||
<th align="center" valign="middle">Valgrind User Manual</th>
|
||
<td width="22px" align="center" valign="middle"><a accesskey="n" href="drd-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
|
||
</tr></table></div>
|
||
<div class="chapter">
|
||
<div class="titlepage"><div><div><h1 class="title">
|
||
<a name="hg-manual"></a>7. Helgrind: a thread error detector</h1></div></div></div>
|
||
<div class="toc">
|
||
<p><b>Table of Contents</b></p>
|
||
<dl class="toc">
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.overview">7.1. Overview</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.api-checks">7.2. Detected errors: Misuses of the POSIX pthreads API</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.lock-orders">7.3. Detected errors: Inconsistent Lock Orderings</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.data-races">7.4. Detected errors: Data Races</a></span></dt>
|
||
<dd><dl>
|
||
<dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.example">7.4.1. A Simple Data Race</a></span></dt>
|
||
<dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.algorithm">7.4.2. Helgrind's Race Detection Algorithm</a></span></dt>
|
||
<dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.errmsgs">7.4.3. Interpreting Race Error Messages</a></span></dt>
|
||
</dl></dd>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.effective-use">7.5. Hints and Tips for Effective Use of Helgrind</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.options">7.6. Helgrind Command-line Options</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.monitor-commands">7.7. Helgrind Monitor Commands</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.client-requests">7.8. Helgrind Client Requests</a></span></dt>
|
||
<dt><span class="sect1"><a href="hg-manual.html#hg-manual.todolist">7.9. A To-Do List for Helgrind</a></span></dt>
|
||
</dl>
|
||
</div>
|
||
<p>To use this tool, you must specify
|
||
<code class="option">--tool=helgrind</code> on the Valgrind
|
||
command line.</p>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.overview"></a>7.1. Overview</h2></div></div></div>
|
||
<p>Helgrind is a Valgrind tool for detecting synchronisation errors
|
||
in C, C++ and Fortran programs that use the POSIX pthreads
|
||
threading primitives.</p>
|
||
<p>The main abstractions in POSIX pthreads are: a set of threads
|
||
sharing a common address space, thread creation, thread joining,
|
||
thread exit, mutexes (locks), condition variables (inter-thread event
|
||
notifications), reader-writer locks, spinlocks, semaphores and
|
||
barriers.</p>
|
||
<p>Helgrind can detect three classes of errors, which are discussed
|
||
in detail in the next three sections:</p>
|
||
<div class="orderedlist"><ol class="orderedlist" type="1">
|
||
<li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.api-checks" title="7.2. Detected errors: Misuses of the POSIX pthreads API">
|
||
Misuses of the POSIX pthreads API.</a></p></li>
|
||
<li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.lock-orders" title="7.3. Detected errors: Inconsistent Lock Orderings">
|
||
Potential deadlocks arising from lock
|
||
ordering problems.</a></p></li>
|
||
<li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.data-races" title="7.4. Detected errors: Data Races">
|
||
Data races -- accessing memory without adequate locking
|
||
or synchronisation</a>.
|
||
</p></li>
|
||
</ol></div>
|
||
<p>Problems like these often result in unreproducible,
|
||
timing-dependent crashes, deadlocks and other misbehaviour, and
|
||
can be difficult to find by other means.</p>
|
||
<p>Helgrind is aware of all the pthread abstractions and tracks
|
||
their effects as accurately as it can. On x86 and amd64 platforms, it
|
||
understands and partially handles implicit locking arising from the
|
||
use of the LOCK instruction prefix. On PowerPC/POWER and ARM
|
||
platforms, it partially handles implicit locking arising from
|
||
load-linked and store-conditional instruction pairs.
|
||
</p>
|
||
<p>Helgrind works best when your application uses only the POSIX
|
||
pthreads API. However, if you want to use custom threading
|
||
primitives, you can describe their behaviour to Helgrind using the
|
||
<code class="varname">ANNOTATE_*</code> macros defined
|
||
in <code class="varname">helgrind.h</code>.</p>
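<p>As a minimal sketch of what such annotations might look like, the
following C fragment hands a value from one thread to another through a
hand-rolled spin flag built on C11 atomics, and describes the hand-off with
<code class="varname">ANNOTATE_HAPPENS_BEFORE</code> and
<code class="varname">ANNOTATE_HAPPENS_AFTER</code>. The names
<code class="varname">payload</code>, <code class="varname">ready</code>,
<code class="function">producer</code> and
<code class="function">run_handoff</code> are invented for this example,
the include path <code class="varname">valgrind/helgrind.h</code> is an
assumption about how the header is installed, and the fallback no-op macros
exist only so that the sketch still compiles where the header is absent.</p>

```c
#include <pthread.h>
#include <stdatomic.h>

/* Assumption: helgrind.h is installed as <valgrind/helgrind.h>.  If it is
   not found, fall back to no-op macros so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int        payload;   /* plain data handed from producer to consumer */
static atomic_int ready;     /* hand-rolled "message sent" flag             */

static void* producer ( void* arg ) {
   (void)arg;
   payload = 42;
   ANNOTATE_HAPPENS_BEFORE(&ready);   /* accesses above are published here */
   atomic_store(&ready, 1);
   return NULL;
}

/* Hypothetical driver: returns the value received from the producer. */
int run_handoff ( void ) {
   pthread_t t;
   int       seen;
   atomic_store(&ready, 0);
   if (pthread_create(&t, NULL, producer, NULL) != 0) return -1;
   while (atomic_load(&ready) == 0) { /* spin until the flag is set */ }
   ANNOTATE_HAPPENS_AFTER(&ready);    /* pairs with the annotation above */
   seen = payload;
   pthread_join(t, NULL);
   return seen;
}
```

<p>The two annotations name the same object (here the address of the flag),
which is how the sending and receiving sides are paired up. Without them,
Helgrind would be likely to treat the accesses to
<code class="varname">payload</code> as unordered.</p>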
|
||
<p>Following those three sections is a section containing
|
||
<a class="link" href="hg-manual.html#hg-manual.effective-use" title="7.5. Hints and Tips for Effective Use of Helgrind">
|
||
hints and tips on how to get the best out of Helgrind.</a>
|
||
</p>
|
||
<p>Then there is a
|
||
<a class="link" href="hg-manual.html#hg-manual.options" title="7.6. Helgrind Command-line Options">summary of command-line
|
||
options.</a>
|
||
</p>
|
||
<p>Finally, there is
|
||
<a class="link" href="hg-manual.html#hg-manual.todolist" title="7.9. A To-Do List for Helgrind">a brief summary of areas in which Helgrind
|
||
could be improved.</a>
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.api-checks"></a>7.2. Detected errors: Misuses of the POSIX pthreads API</h2></div></div></div>
|
||
<p>Helgrind intercepts calls to many POSIX pthreads functions, and
|
||
is therefore able to report on various common problems. Although
|
||
these are unglamorous errors, their presence can lead to undefined
|
||
program behaviour and hard-to-find bugs later on. The detected errors
|
||
are:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>unlocking an invalid mutex</p></li>
|
||
<li class="listitem"><p>unlocking a not-locked mutex</p></li>
|
||
<li class="listitem"><p>unlocking a mutex held by a different
|
||
thread</p></li>
|
||
<li class="listitem"><p>destroying an invalid or a locked mutex</p></li>
|
||
<li class="listitem"><p>recursively locking a non-recursive mutex</p></li>
|
||
<li class="listitem"><p>deallocation of memory that contains a
|
||
locked mutex</p></li>
|
||
<li class="listitem"><p>passing mutex arguments to functions expecting
|
||
reader-writer lock arguments, and vice
|
||
versa</p></li>
|
||
<li class="listitem"><p>when a POSIX pthread function fails with an
|
||
error code that must be handled</p></li>
|
||
<li class="listitem"><p>when a thread exits whilst still holding locked
|
||
locks</p></li>
|
||
<li class="listitem"><p>calling <code class="function">pthread_cond_wait</code>
|
||
with a not-locked mutex, an invalid mutex,
|
||
or one locked by a different
|
||
thread</p></li>
|
||
<li class="listitem"><p>inconsistent bindings between condition
|
||
variables and their associated mutexes</p></li>
|
||
<li class="listitem"><p>invalid or duplicate initialisation of a pthread
|
||
barrier</p></li>
|
||
<li class="listitem"><p>initialisation of a pthread barrier on which threads
|
||
are still waiting</p></li>
|
||
<li class="listitem"><p>destruction of a pthread barrier object which was
|
||
never initialised, or on which threads are still
|
||
waiting</p></li>
|
||
<li class="listitem"><p>waiting on an uninitialised pthread
|
||
barrier</p></li>
|
||
<li class="listitem"><p>for all of the pthreads functions that Helgrind
|
||
intercepts, an error is reported, along with a stack
|
||
trace, if the system threading library routine returns
|
||
an error code, even if Helgrind itself detected no
|
||
error</p></li>
|
||
</ul></div>
|
||
<p>Checks pertaining to the validity of mutexes are generally also
|
||
performed for reader-writer locks.</p>
|
||
<p>Various kinds of this-can't-possibly-happen events are also
|
||
reported. These usually indicate bugs in the system threading
|
||
library.</p>
|
||
<p>Reported errors always contain a primary stack trace indicating
|
||
where the error was detected. They may also contain auxiliary stack
|
||
traces giving additional information. In particular, most errors
|
||
relating to mutexes will also tell you where that mutex first came to
|
||
Helgrind's attention (the "<code class="computeroutput">was first observed
|
||
at</code>" part), so you have a chance of figuring out which
|
||
mutex it is referring to. For example:</p>
|
||
<pre class="programlisting">
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
</pre>
|
||
<p>Helgrind has a way of summarising thread identities, as
|
||
you see here with the text "<code class="computeroutput">Thread
|
||
#1</code>". This is so that it can speak about threads and
|
||
sets of threads without overwhelming you with details. See
|
||
<a class="link" href="hg-manual.html#hg-manual.data-races.errmsgs" title="7.4.3. Interpreting Race Error Messages">below</a>
|
||
for more information on interpreting error messages.</p>
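<p>A minimal sketch of a program that commits the same kind of error is
shown below. It is not the <code class="computeroutput">tc09_bad_unlock.c</code>
test itself; the helper name
<code class="function">unlock_without_lock</code> is invented for the
example, and an error-checking mutex is used (an assumption made here so
that the misuse has a defined result rather than undefined behaviour).</p>

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK; must precede includes */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that was never locked and returns
   the error code reported by the threading library. */
int unlock_without_lock ( void ) {
   pthread_mutexattr_t attr;
   pthread_mutex_t     mx;
   int                 rc;

   pthread_mutexattr_init(&attr);
   /* An error-checking mutex makes the misuse well defined: the unlock
      fails with EPERM instead of invoking undefined behaviour. */
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   pthread_mutex_init(&mx, &attr);
   pthread_mutexattr_destroy(&attr);

   rc = pthread_mutex_unlock(&mx);    /* bug: mx is not locked */

   pthread_mutex_destroy(&mx);
   return rc;
}
```

<p>Run natively, the call simply returns
<code class="computeroutput">EPERM</code>, which is easy to ignore. Under
Helgrind, one would expect both the unlock of a not-locked mutex and the
failing library call to be reported, each with a stack trace.</p>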
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.lock-orders"></a>7.3. Detected errors: Inconsistent Lock Orderings</h2></div></div></div>
|
||
<p>In this section, and in general, to "acquire" a lock simply
|
||
means to lock that lock, and to "release" a lock means to unlock
|
||
it.</p>
|
||
<p>Helgrind monitors the order in which threads acquire locks.
|
||
This allows it to detect potential deadlocks which could arise from
|
||
the formation of cycles of locks. Detecting such inconsistencies is
|
||
useful because, whilst actual deadlocks are fairly obvious, potential
|
||
deadlocks may never be discovered during testing and could later lead
|
||
to hard-to-diagnose in-service failures.</p>
|
||
<p>The simplest example of such a problem is as
|
||
follows.</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>Imagine some shared resource R, which, for whatever
|
||
reason, is guarded by two locks, L1 and L2, which must both be held
|
||
when R is accessed.</p></li>
|
||
<li class="listitem"><p>Suppose a thread acquires L1, then L2, and proceeds
|
||
to access R. The implication of this is that all threads in the
|
||
program must acquire the two locks in the order first L1 then L2.
|
||
Not doing so risks deadlock.</p></li>
|
||
<li class="listitem"><p>The deadlock could happen if two threads -- call them
|
||
T1 and T2 -- both want to access R. Suppose T1 acquires L1 first,
|
||
and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries
|
||
to acquire L1, but those locks are both already held. So T1 and T2
|
||
become deadlocked.</p></li>
|
||
</ul></div>
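<p>The scenario above can be reduced to a small C sketch. The names
<code class="varname">l1</code>, <code class="varname">l2</code>,
<code class="varname">r</code> and the three functions are invented for
this example. Only one thread runs, so no deadlock can actually occur, but
both acquisition orders are exercised, which is enough to make the
inconsistency observable.</p>

```c
#include <pthread.h>

static pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t l2 = PTHREAD_MUTEX_INITIALIZER;
static int r;   /* the shared resource R, guarded by both locks */

/* Establishes the order "L1 before L2". */
static void access_r_good ( void ) {
   pthread_mutex_lock(&l1);
   pthread_mutex_lock(&l2);
   r++;
   pthread_mutex_unlock(&l2);
   pthread_mutex_unlock(&l1);
}

/* Violates that order by taking L2 first. */
static void access_r_bad ( void ) {
   pthread_mutex_lock(&l2);
   pthread_mutex_lock(&l1);
   r++;
   pthread_mutex_unlock(&l1);
   pthread_mutex_unlock(&l2);
}

/* Hypothetical driver: exercises both orderings from a single thread and
   returns the number of accesses made to R. */
int exercise_both_orders ( void ) {
   r = 0;
   access_r_good();
   access_r_bad();
   return r;
}
```

<p>If two threads ran <code class="function">access_r_good</code> and
<code class="function">access_r_bad</code> concurrently, each could end up
holding one lock while waiting for the other.</p>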
|
||
<p>Helgrind builds a directed graph indicating the order in which
|
||
locks have been acquired in the past. When a thread acquires a new
|
||
lock, the graph is updated, and then checked to see if it now contains
|
||
a cycle. The presence of a cycle indicates a potential deadlock involving
|
||
the locks in the cycle.</p>
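<p>The graph check described above can be sketched as follows. This is only
an illustration of the idea, not Helgrind's implementation: locks are
represented by small integer indices, and
<code class="varname">MAX_LOCKS</code>, <code class="varname">edge</code>,
<code class="function">reachable</code>,
<code class="function">note_acquisition</code> and
<code class="function">reset_graph</code> are all names invented for the
example.</p>

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* arbitrary bound for this sketch */

/* edge[a][b] is true once lock a has been held while acquiring lock b. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable ( int from, int to, bool* visited ) {
   if (from == to) return true;
   visited[from] = true;
   for (int next = 0; next < MAX_LOCKS; next++) {
      if (edge[from][next] && !visited[next]
          && reachable(next, to, visited))
         return true;
   }
   return false;
}

/* Records "held before acquired".  Returns true if the new edge closes a
   cycle, that is, if 'held' was already reachable from 'acquired'. */
bool note_acquisition ( int held, int acquired ) {
   bool visited[MAX_LOCKS];
   memset(visited, 0, sizeof visited);
   bool cycle = reachable(acquired, held, visited);
   edge[held][acquired] = true;
   return cycle;
}

/* Forgets all recorded orderings. */
void reset_graph ( void ) {
   memset(edge, 0, sizeof edge);
}
```

<p>Recording the orders 0 before 1 and then 1 before 2 produces no cycle,
but a later acquisition of lock 0 while holding lock 2 closes the loop and
is flagged.</p>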
|
||
<p>In general, Helgrind will choose two locks involved in the cycle
|
||
and show you how their acquisition ordering has become inconsistent.
|
||
It does this by showing the program points that first defined the
|
||
ordering, and the program points which later violated it. Here is a
|
||
simple example involving just two locks:</p>
|
||
<pre class="programlisting">
Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated

Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400825: main (tc13_laog1.c:23)

 followed by a later acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400853: main (tc13_laog1.c:24)

Required order was established by acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40076D: main (tc13_laog1.c:17)

 followed by a later acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40079B: main (tc13_laog1.c:18)
</pre>
|
||
<p>When there are more than two locks in the cycle, the error is
|
||
equally serious. However, at present Helgrind does not show the locks
|
||
involved, sometimes because that information is not available, but
|
||
also so as to avoid flooding you with information. For example, a
|
||
naive implementation of the famous Dining Philosophers problem
|
||
involves a cycle of five locks
|
||
(see <code class="computeroutput">helgrind/tests/tc14_laog_dinphils.c</code>).
|
||
In this case Helgrind has detected that all 5 philosophers could
|
||
simultaneously pick up their left fork and then deadlock whilst
|
||
waiting to pick up their right forks.</p>
|
||
<pre class="programlisting">
Thread #6: lock order "0x80499A0 before 0x8049A00" violated

Observed (incorrect) order is: acquisition of lock at 0x8049A00
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485B4: dine (tc14_laog_dinphils.c:18)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)

 followed by a later acquisition of lock at 0x80499A0
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485CD: dine (tc14_laog_dinphils.c:19)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)
</pre>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.data-races"></a>7.4. Detected errors: Data Races</h2></div></div></div>
|
||
<p>A data race happens, or could happen, when two threads access a
|
||
shared memory location without using suitable locks or other
|
||
synchronisation to ensure single-threaded access. Such missing
|
||
locking can cause obscure timing dependent bugs. Ensuring programs
|
||
are race-free is one of the central difficulties of threaded
|
||
programming.</p>
|
||
<p>Reliably detecting races is a difficult problem, and most
|
||
of Helgrind's internals are devoted to dealing with it.
|
||
We begin with a simple example.</p>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="hg-manual.data-races.example"></a>7.4.1. A Simple Data Race</h3></div></div></div>
|
||
<p>The simplest possible example of a race is as follows. In
this program, it is impossible to know what the value
of <code class="computeroutput">var</code> is at the end of the program.
Is it 2? Or 1?</p>
|
||
<pre class="programlisting">
#include &lt;pthread.h&gt;

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&amp;child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
</pre>
|
||
<p>The problem is there is nothing to
|
||
stop <code class="varname">var</code> being updated simultaneously
|
||
by both threads. A correct program would
|
||
protect <code class="varname">var</code> with a lock of type
|
||
<code class="function">pthread_mutex_t</code>, which is acquired
|
||
before each access and released afterwards. Helgrind's output for
|
||
this program is:</p>
|
||
<pre class="programlisting">
Thread #1 is the program's root thread

Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x400605: main (simple_race.c:12)

Possible data race during read of size 4 at 0x601038 by thread #1
Locks held: none
   at 0x400606: main (simple_race.c:13)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x4005DC: child_fn (simple_race.c:6)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601038 is 0 bytes inside global var "var"
declared at simple_race.c:3
</pre>
|
||
<p>This is quite a lot of detail for an apparently simple error.
The main error message is the clause beginning
"<code class="computeroutput">Possible data race during read</code>".
It says there is a race as a result of a read of size 4 (bytes), at
0x601038, which is the address of <code class="computeroutput">var</code>,
happening in function <code class="computeroutput">main</code> at line 13
in the program.</p>
|
||
<p>Two important parts of the message are:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem">
|
||
<p>Helgrind shows two stack traces for the error, not one. By
|
||
definition, a race involves two different threads accessing the
|
||
same location in such a way that the result depends on the relative
|
||
speeds of the two threads.</p>
|
||
<p>
|
||
The first stack trace follows the text "<code class="computeroutput">Possible
|
||
data race during read of size 4 ...</code>" and the
|
||
second trace follows the text "<code class="computeroutput">This conflicts with
|
||
a previous write of size 4 ...</code>". Helgrind is
|
||
usually able to show both accesses involved in a race. At least
|
||
one of these will be a write (since two concurrent, unsynchronised
|
||
reads are harmless), and they will of course be from different
|
||
threads.</p>
|
||
<p>By examining your program at the two locations, you should be
|
||
able to get at least some idea of what the root cause of the
|
||
problem is. For each location, Helgrind shows the set of locks
|
||
held at the time of the access. This often makes it clear which
|
||
thread, if any, failed to take a required lock. In this example
|
||
neither thread holds a lock during the access.</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>For races which occur on global or stack variables, Helgrind
|
||
tries to identify the name and defining point of the variable.
|
||
Hence the text "<code class="computeroutput">Location 0x601038 is 0 bytes inside
|
||
global var "var" declared at simple_race.c:3</code>".</p>
|
||
<p>Showing names of stack and global variables carries no
|
||
run-time overhead once Helgrind has your program up and running.
|
||
However, it does require Helgrind to spend considerable extra time
|
||
and memory at program startup to read the relevant debug info.
|
||
Hence this facility is disabled by default. To enable it, you need
|
||
to give the <code class="varname">--read-var-info=yes</code> option to
|
||
Helgrind.</p>
|
||
</li>
|
||
</ul></div>
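<p>For comparison, here is a sketch of the example program with the race
removed by guarding <code class="varname">var</code> with a
<code class="function">pthread_mutex_t</code>, as suggested earlier. The
names <code class="varname">var_lock</code> and
<code class="function">run_locked_increment</code> are invented for the
example; the latter stands in for <code class="function">main</code> and
returns the final value instead of exiting.</p>

```c
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;

static void* child_fn ( void* arg ) {
   (void)arg;
   pthread_mutex_lock(&var_lock);
   var++;                       /* now protected relative to parent */
   pthread_mutex_unlock(&var_lock);
   return NULL;
}

/* Hypothetical driver standing in for main(): returns the final value. */
int run_locked_increment ( void ) {
   pthread_t child;
   var = 0;                     /* safe: the child does not exist yet */
   if (pthread_create(&child, NULL, child_fn, NULL) != 0) return -1;
   pthread_mutex_lock(&var_lock);
   var++;                       /* now protected relative to child */
   pthread_mutex_unlock(&var_lock);
   pthread_join(child, NULL);
   return var;
}
```

<p>With both increments performed under the same lock, the final value is
always 2, and the "<code class="computeroutput">Locks held</code>" lines
for the two accesses would no longer be empty.</p>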
|
||
<p>The following section explains Helgrind's race detection
|
||
algorithm in more detail.</p>
|
||
</div>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="hg-manual.data-races.algorithm"></a>7.4.2. Helgrind's Race Detection Algorithm</h3></div></div></div>
|
||
<p>Most programmers think about threaded programming in terms of
|
||
the basic functionality provided by the threading library (POSIX
|
||
Pthreads): thread creation, thread joining, locks, condition
|
||
variables, semaphores and barriers.</p>
|
||
<p>The effect of using these functions is to impose
|
||
constraints upon the order in which memory accesses can
|
||
happen. This implied ordering is generally known as the
|
||
"happens-before relation". Once you understand the happens-before
|
||
relation, it is easy to see how Helgrind finds races in your code.
|
||
Fortunately, the happens-before relation is itself easy to understand,
|
||
and is by itself a useful tool for reasoning about the behaviour of
|
||
parallel programs. We now introduce it using a simple example.</p>
|
||
<p>Consider first the following buggy program:</p>
|
||
<pre class="programlisting">
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;                              var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
</pre>
|
||
<p>The parent thread creates a child. Both then write different
|
||
values to some variable <code class="computeroutput">var</code>, and the
|
||
parent then waits for the child to exit.</p>
|
||
<p>What is the value of <code class="computeroutput">var</code> at the
|
||
end of the program, 10 or 20? We don't know. The program is
|
||
considered buggy (it has a race) because the final value
|
||
of <code class="computeroutput">var</code> depends on the relative rates
|
||
of progress of the parent and child threads. If the parent is fast
|
||
and the child is slow, then the child's assignment may happen later,
|
||
so the final value will be 10; and vice versa if the child is faster
|
||
than the parent.</p>
|
||
<p>The relative rates of progress of parent and child are not something
the programmer can control, and will often change from run to run.
They depend on factors such as the load on the machine, what else is
running, and the kernel's scheduling strategy.</p>
|
||
<p>The obvious fix is to use a lock to
|
||
protect <code class="computeroutput">var</code>. It is however
|
||
instructive to consider a somewhat more abstract solution, which is to
|
||
send a message from one thread to the other:</p>
|
||
<pre class="programlisting">
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;
// send message to child
                                       // wait for message to arrive
                                       var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
</pre>
|
||
<p>Now the program reliably prints "10", regardless of the speed of
|
||
the threads. Why? Because the child's assignment cannot happen until
|
||
after it receives the message. And the message is not sent until
|
||
after the parent's assignment is done.</p>
|
||
<p>The message transmission creates a "happens-before" dependency
|
||
between the two assignments: <code class="computeroutput">var = 20;</code>
|
||
must now happen-before <code class="computeroutput">var = 10;</code>.
|
||
And so there is no longer a race
|
||
on <code class="computeroutput">var</code>.
|
||
</p>
|
||
<p>Note that it's not significant that the parent sends a message
|
||
to the child. Sending a message from the child (after its assignment)
|
||
to the parent (before its assignment) would also fix the problem, causing
|
||
the program to reliably print "20".</p>
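<p>One possible concrete form of the parent-to-child message is a POSIX
semaphore, sketched below. The choice of a semaphore, and the names
<code class="varname">msg</code> and
<code class="function">run_message_example</code>, are assumptions made
for illustration; any primitive that creates a happens-before dependency
would serve. The sketch also assumes a platform that supports unnamed
semaphores via <code class="function">sem_init</code>.</p>

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int   var;
static sem_t msg;   /* the "message" from parent to child */

static void* child_fn ( void* arg ) {
   (void)arg;
   /* wait for message to arrive, retrying if interrupted by a signal */
   while (sem_wait(&msg) == -1 && errno == EINTR)
      continue;
   var = 10;
   return NULL;
}

/* Hypothetical driver: returns the value the program would print. */
int run_message_example ( void ) {
   pthread_t child;
   if (sem_init(&msg, 0, 0) != 0) return -1;
   if (pthread_create(&child, NULL, child_fn, NULL) != 0) {
      sem_destroy(&msg);
      return -1;
   }
   var = 20;
   sem_post(&msg);              /* send message to child */
   pthread_join(child, NULL);   /* wait for child */
   sem_destroy(&msg);
   return var;
}
```

<p>Because the child cannot assign until the post has happened, and the
post follows the parent's assignment, the result is 10 on every run.</p>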
|
||
<p>Helgrind's algorithm is (conceptually) very simple. It monitors all
accesses to memory locations. If a location -- in this example,
<code class="computeroutput">var</code> --
is accessed by two different threads, Helgrind checks to see if the
two accesses are ordered by the happens-before relation. If so,
that's fine; if not, it reports a race.</p>
|
||
<p>It is important to understand that the happens-before relation
|
||
creates only a partial ordering, not a total ordering. An example of
|
||
a total ordering is comparison of numbers: for any two numbers
|
||
<code class="computeroutput">x</code> and
|
||
<code class="computeroutput">y</code>, either
|
||
<code class="computeroutput">x</code> is less than, equal to, or greater
|
||
than
|
||
<code class="computeroutput">y</code>. A partial ordering is like a
|
||
total ordering, but it can also express the concept that two elements
|
||
are neither equal, less or greater, but merely unordered with respect
|
||
to each other.</p>
|
||
<p>In the fixed example above, we say that
|
||
<code class="computeroutput">var = 20;</code> "happens-before"
|
||
<code class="computeroutput">var = 10;</code>. But in the original
|
||
version, they are unordered: we cannot say that either happens-before
|
||
the other.</p>
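<p>To make the idea of a partial ordering concrete, the sketch below
compares two events using vector clocks, a standard textbook way of
representing happens-before. It is offered purely as an illustration and
is not a description of Helgrind's internals; the types
<code class="varname">VClock</code> and <code class="varname">VOrder</code>,
the function <code class="function">vc_compare</code> and the constant
<code class="varname">NTHREADS</code> are invented for the example.</p>

```c
#include <stdbool.h>

#define NTHREADS 2   /* arbitrary for this sketch */

/* One logical-time component per thread. */
typedef struct { unsigned c[NTHREADS]; } VClock;

typedef enum { VC_EQUAL, VC_BEFORE, VC_AFTER, VC_UNORDERED } VOrder;

/* Compares two vector clocks.  a is "before" b when every component of a
   is <= the matching component of b and at least one is strictly less.
   If each clock is ahead of the other somewhere, they are unordered. */
VOrder vc_compare ( const VClock* a, const VClock* b ) {
   bool a_less = false, b_less = false;
   for (int i = 0; i < NTHREADS; i++) {
      if (a->c[i] < b->c[i]) a_less = true;
      if (a->c[i] > b->c[i]) b_less = true;
   }
   if (a_less && b_less) return VC_UNORDERED;
   if (a_less)           return VC_BEFORE;
   if (b_less)           return VC_AFTER;
   return VC_EQUAL;
}
```

<p>Unlike a comparison of plain numbers, this comparison has a fourth
outcome. In the original program the two assignments would compare as
<code class="computeroutput">VC_UNORDERED</code>; in the fixed version the
child's clock has absorbed the parent's through the message, so the
parent's assignment compares as
<code class="computeroutput">VC_BEFORE</code> the child's.</p>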
|
||
<p>What does it mean to say that two accesses from different
|
||
threads are ordered by the happens-before relation? It means that
|
||
there is some chain of inter-thread synchronisation operations which
|
||
cause those accesses to happen in a particular order, irrespective of
|
||
the actual rates of progress of the individual threads. This is a
|
||
required property for a reliable threaded program, which is why
|
||
Helgrind checks for it.</p>
|
||
<p>The happens-before relations created by standard threading
|
||
primitives are as follows:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>When a mutex is unlocked by thread T1 and later (or
|
||
immediately) locked by thread T2, then the memory accesses in T1
|
||
prior to the unlock must happen-before those in T2 after it acquires
|
||
the lock.</p></li>
|
||
<li class="listitem"><p>The same idea applies to reader-writer locks,
|
||
although with some complication so as to allow correct handling of
|
||
reads vs writes.</p></li>
|
||
<li class="listitem"><p>When a condition variable (CV) is signalled on by
|
||
thread T1 and some other thread T2 is thereby released from a wait
|
||
on the same CV, then the memory accesses in T1 prior to the
|
||
signalling must happen-before those in T2 after it returns from the
|
||
wait. If no thread was waiting on the CV then there is no
|
||
effect.</p></li>
|
||
<li class="listitem"><p>If instead T1 broadcasts on a CV, then all of the
|
||
waiting threads, rather than just one of them, acquire a
|
||
happens-before dependency on the broadcasting thread at the point it
|
||
did the broadcast.</p></li>
|
||
<li class="listitem"><p>A thread T2 that continues after completing
<code class="function">sem_wait</code> on a semaphore that thread T1
posts on acquires a happens-before dependency on the posting thread,
much like the dependencies created by mutex unlock-lock pairs. However,
since a semaphore can be posted on many times, it is unspecified from
which of the post calls the wait call gets its happens-before
dependency.</p></li>
|
||
<li class="listitem"><p>For a group of threads T1 .. Tn which arrive at a
|
||
barrier and then move on, each thread after the call has a
|
||
happens-after dependency from all threads before the
|
||
barrier.</p></li>
|
||
<li class="listitem"><p>A newly-created child thread acquires an initial
|
||
happens-after dependency on the point where its parent created it.
|
||
That is, all memory accesses performed by the parent prior to
|
||
creating the child are regarded as happening-before all the accesses
|
||
of the child.</p></li>
|
||
<li class="listitem"><p>Similarly, when an exiting thread is reaped via a
|
||
call to <code class="function">pthread_join</code>, once the call returns, the
|
||
reaping thread acquires a happens-after dependency relative to all memory
|
||
accesses made by the exiting thread.</p></li>
|
||
</ul></div>
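<p>As an illustration of the condition variable case from the list above,
the following sketch has T1 write a value and then signal, while the
calling thread (playing T2) waits and then reads. All names
(<code class="varname">mx</code>, <code class="varname">cv</code>,
<code class="varname">ready</code>, <code class="varname">data</code>,
<code class="function">t1_fn</code>,
<code class="function">run_cv_example</code>) are invented for the
example.</p>

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool ready;    /* predicate guarded by mx */
static int  data;     /* written by T1 before signalling, read by T2 after */

static void* t1_fn ( void* arg ) {
   (void)arg;
   data = 7;                      /* access prior to the signal */
   pthread_mutex_lock(&mx);
   ready = true;
   pthread_cond_signal(&cv);
   pthread_mutex_unlock(&mx);
   return NULL;
}

/* Hypothetical driver playing the role of T2: returns the value read. */
int run_cv_example ( void ) {
   pthread_t t1;
   int       seen;
   ready = false;
   if (pthread_create(&t1, NULL, t1_fn, NULL) != 0) return -1;
   pthread_mutex_lock(&mx);
   while (!ready)                 /* loop guards against spurious wakeups */
      pthread_cond_wait(&cv, &mx);
   pthread_mutex_unlock(&mx);
   seen = data;                   /* access after returning from the wait */
   pthread_join(t1, NULL);
   return seen;
}
```

<p>If T1 happens to signal before T2 starts waiting, the signal itself has
no effect, as noted above. T2 then finds
<code class="varname">ready</code> already set and never waits, and the
ordering comes from the mutex unlock-lock pair instead.</p>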
|
||
<p>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
dependencies. It also monitors all memory accesses.</p>
|
||
<p>If a location is accessed by two different threads, but Helgrind
|
||
cannot find any path through the happens-before graph from one access
|
||
to the other, then it reports a race.</p>
|
||
<p>There are a couple of caveats:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>Helgrind doesn't check for a race in the case where
|
||
both accesses are reads. That would be silly, since concurrent
|
||
reads are harmless.</p></li>
|
||
<li class="listitem"><p>Two accesses are considered to be ordered by the
happens-before dependency even through arbitrarily long chains of
synchronisation events. For example, if T1 accesses some location
L, and then signals T2 using <code class="function">pthread_cond_signal</code>,
and T2 later signals T3 in the same way, and T3 then accesses L, then
a suitable happens-before dependency exists between the first and second
accesses, even though it involves two different inter-thread
synchronisation events.</p></li>
|
||
</ul></div>
|
||
</div>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="hg-manual.data-races.errmsgs"></a>7.4.3. Interpreting Race Error Messages</h3></div></div></div>
|
||
<p>Helgrind's race detection algorithm collects a lot of
|
||
information, and tries to present it in a helpful way when a race is
|
||
detected. Here's an example:</p>
|
||
<pre class="programlisting">
Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Thread #3 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Possible data race during read of size 4 at 0x601070 by thread #3
Locks held: none
   at 0x40087A: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x400883: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601070 is 0 bytes inside local var "unprotected2"
declared at tc21_pthonce.c:51, in frame #0 of thread 3
</pre>
|
||
<p>Helgrind first announces the creation points of any threads
|
||
referenced in the error message. This is so it can speak concisely
|
||
about threads without repeatedly printing their creation point call
|
||
stacks. Each thread is only ever announced once, the first time it
|
||
appears in any Helgrind error message.</p>
|
||
<p>The main error message begins at the text
|
||
"<code class="computeroutput">Possible data race during read</code>". At
|
||
the start is information you would expect to see -- address and size
|
||
of the racing access, whether a read or a write, and the call stack at
|
||
the point it was detected.</p>
|
||
<p>A second call stack is presented starting at the text
|
||
"<code class="computeroutput">This conflicts with a previous
|
||
write</code>". This shows a previous access which also
|
||
accessed the stated address, and which is believed to be racing
|
||
against the access in the first call stack. Note that this second
|
||
call stack is limited to a maximum of 8 entries to limit the
|
||
memory usage.</p>
|
||
<p>Finally, Helgrind may attempt to give a description of the
|
||
raced-on address in source level terms. In this example, it
|
||
identifies it as a local variable, shows its name, declaration point,
|
||
and in which frame (of the first call stack) it lives. Note that this
|
||
information is only shown when <code class="varname">--read-var-info=yes</code>
|
||
is specified on the command line. That's because reading the DWARF3
|
||
debug information in enough detail to capture variable type and
|
||
location information makes Helgrind much slower at startup, and also
|
||
requires considerable amounts of memory, for large programs.
|
||
</p>
|
||
<p>Once you have your two call stacks, how do you find the root
|
||
cause of the race?</p>
|
||
<p>The first thing to do is examine the source locations referred
|
||
to by each call stack. They should both show an access to the same
|
||
location, or variable.</p>
|
||
<p>Now figure out how that location should have been made
thread-safe:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>Perhaps the location was intended to be protected by
a mutex? If so, you need to lock and unlock the mutex at both
access points, even if one of the accesses is reported to be a read.
Did you perhaps forget the locking at one or other of the accesses?
To help you check this, Helgrind shows the set of locks held by each
thread at the time it accessed the raced-on location.</p></li>
|
||
<li class="listitem">
|
||
<p>Alternatively, perhaps you intended to use some
|
||
other scheme to make it safe, such as signalling on a condition
|
||
variable. In all such cases, try to find a synchronisation event
|
||
(or a chain thereof) which separates the earlier-observed access (as
|
||
shown in the second call stack) from the later-observed access (as
|
||
shown in the first call stack). In other words, try to find
|
||
evidence that the earlier access "happens-before" the later access.
|
||
See the previous subsection for an explanation of the happens-before
|
||
relation.</p>
|
||
<p>
|
||
The fact that Helgrind is reporting a race means it did not observe
|
||
any happens-before relation between the two accesses. If
|
||
Helgrind is working correctly, it should also be the case that you
cannot find any such relation, even on detailed inspection
|
||
of the source code. Hopefully, though, your inspection of the code
|
||
will show where the missing synchronisation operation(s) should have
|
||
been.</p>
|
||
</li>
|
||
</ul></div>
|
||
</div>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.effective-use"></a>7.5. Hints and Tips for Effective Use of Helgrind</h2></div></div></div>
|
||
<p>Helgrind can be very helpful in finding and resolving
|
||
threading-related problems. Like all sophisticated tools, it is most
|
||
effective when you understand how to play to its strengths.</p>
|
||
<p>Helgrind will be less effective when you merely throw an
|
||
existing threaded program at it and try to make sense of any reported
|
||
errors. It will be more effective if you design threaded programs
|
||
from the start in a way that helps Helgrind verify correctness. The
|
||
same is true for finding memory errors with Memcheck, but applies more
|
||
here, because thread checking is a harder problem. Consequently it is
|
||
much easier to write a correct program for which Helgrind falsely
|
||
reports (threading) errors than it is to write a correct program for
|
||
which Memcheck falsely reports (memory) errors.</p>
|
||
<p>With that in mind, here are some tips, listed most important first,
|
||
for getting reliable results and avoiding false errors. The first two
|
||
are critical. Any violations of them will swamp you with huge numbers
|
||
of false data-race errors.</p>
|
||
<div class="orderedlist"><ol class="orderedlist" type="1">
|
||
<li class="listitem">
|
||
<p>Make sure your application, and all the libraries it uses,
|
||
use the POSIX threading primitives. Helgrind needs to be able to
|
||
see all events pertaining to thread creation, exit, locking and
|
||
other synchronisation events. To do so it intercepts many POSIX
|
||
pthreads functions.</p>
|
||
<p>Do not roll your own threading primitives (mutexes, etc)
|
||
from combinations of the Linux futex syscall, atomic counters, etc.
|
||
These throw Helgrind's internal what's-going-on models
|
||
way off course and will give bogus results.</p>
|
||
<p>Also, do not reimplement existing POSIX abstractions using
|
||
other POSIX abstractions. For example, don't build your own
|
||
semaphore routines or reader-writer locks from POSIX mutexes and
|
||
condition variables. Instead use POSIX reader-writer locks and
|
||
semaphores directly, since Helgrind supports them directly.</p>
|
||
<p>Helgrind directly supports the following POSIX threading
|
||
abstractions: mutexes, reader-writer locks, condition variables
|
||
(but see below), semaphores and barriers. Currently spinlocks
|
||
are not supported, although they could be in future.</p>
|
||
<p>At the time of writing, the following popular Linux packages
|
||
are known to implement their own threading primitives:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>Qt version 4.X. Qt 3.X is harmless in that it
|
||
only uses POSIX pthreads primitives. Unfortunately Qt 4.X
|
||
has its own implementation of mutexes (QMutex) and thread reaping.
|
||
Helgrind 3.4.x contains direct support
|
||
for Qt 4.X threading, which is experimental but is believed to
|
||
work fairly well. A side effect of supporting Qt 4 directly is
|
||
that Helgrind can be used to debug KDE4 applications. As this
|
||
is an experimental feature, we would particularly appreciate
|
||
feedback from folks who have used Helgrind to successfully debug
|
||
Qt 4 and/or KDE4 applications.</p></li>
|
||
<li class="listitem">
|
||
<p>Runtime support library for GNU OpenMP (part of
|
||
GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime
|
||
library (<code class="filename">libgomp.so</code>) constructs its own
|
||
synchronisation primitives using combinations of atomic memory
|
||
instructions and the futex syscall, which causes total chaos in
Helgrind since it cannot "see" those.</p>
|
||
<p>Fortunately, this can be solved using a configuration-time
|
||
option (for GCC). Rebuild GCC from source, and configure using
|
||
<code class="varname">--disable-linux-futex</code>.
|
||
This makes libgomp.so use the standard
|
||
POSIX threading primitives instead. Note that this was tested
|
||
using GCC 4.2.3 and has not been re-tested using more recent GCC
|
||
versions. We would appreciate hearing about any successes or
|
||
failures with more recent versions.</p>
|
||
</li>
|
||
</ul></div>
|
||
<p>If you must implement your own threading primitives, there
is a set of client request macros
|
||
in <code class="computeroutput">helgrind.h</code> to help you
|
||
describe your primitives to Helgrind. You should be able to
|
||
mark up mutexes, condition variables, etc, without difficulty.
|
||
</p>
|
||
<p>
|
||
It is also possible to mark up the effects of thread-safe
|
||
reference counting using the
|
||
<code class="computeroutput">ANNOTATE_HAPPENS_BEFORE</code>,
|
||
<code class="computeroutput">ANNOTATE_HAPPENS_AFTER</code> and
|
||
<code class="computeroutput">ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</code>
|
||
macros. Thread-safe reference counting using an atomically
|
||
incremented/decremented refcount variable causes Helgrind
|
||
problems because a one-to-zero transition of the reference count
|
||
means the accessing thread has exclusive ownership of the
|
||
associated resource (normally, a C++ object) and can therefore
|
||
access it (normally, to run its destructor) without locking.
|
||
Helgrind doesn't understand this, and markup is essential to
|
||
avoid false positives.
|
||
</p>
|
||
<p>
|
||
Here are recommended guidelines for marking up thread safe
|
||
reference counting in C++. You only need to mark up your
|
||
release methods -- the ones which decrement the reference count.
|
||
Given a class like this:
|
||
</p>
|
||
<pre class="programlisting">
class MyClass {
   unsigned int mRefCount;

public:
   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         delete this;
      }
   }
};
</pre>
|
||
<p>
|
||
the release method should be marked up as follows:
|
||
</p>
|
||
<pre class="programlisting">
void Release ( void ) {
   unsigned int newCount = atomic_decrement(&mRefCount);
   if (newCount == 0) {
      ANNOTATE_HAPPENS_AFTER(&mRefCount);
      ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
      delete this;
   } else {
      ANNOTATE_HAPPENS_BEFORE(&mRefCount);
   }
}
</pre>
|
||
<p>
|
||
There are a number of complex, mostly-theoretical objections to
|
||
this scheme. From a theoretical standpoint it appears to be
|
||
impossible to devise a markup scheme which is completely correct
|
||
in the sense of guaranteeing to remove all false races. The
|
||
proposed scheme however works well in practice.
|
||
</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>Avoid memory recycling. If you can't avoid it, you must
tell Helgrind what is going on via the
|
||
<code class="function">VALGRIND_HG_CLEAN_MEMORY</code> client request (in
|
||
<code class="computeroutput">helgrind.h</code>).</p>
|
||
<p>Helgrind is aware of standard heap memory allocation and
|
||
deallocation that occurs via
|
||
<code class="function">malloc</code>/<code class="function">free</code>/<code class="function">new</code>/<code class="function">delete</code>
|
||
and from entry and exit of stack frames. In particular, when memory is
|
||
deallocated via <code class="function">free</code>, <code class="function">delete</code>,
|
||
or function exit, Helgrind considers that memory clean, so when it is
|
||
eventually reallocated, its history is irrelevant.</p>
|
||
<p>However, it is common practice to implement memory recycling
|
||
schemes. In these, memory to be freed is not handed to
|
||
<code class="function">free</code>/<code class="function">delete</code>, but instead put
|
||
into a pool of free buffers to be handed out again as required. The
|
||
problem is that Helgrind has no
|
||
way to know that such memory is logically no longer in use, and
|
||
its history is irrelevant. Hence you must make that explicit,
|
||
using the <code class="function">VALGRIND_HG_CLEAN_MEMORY</code> client request
|
||
to specify the relevant address ranges. It's easiest to put these
|
||
requests into the pool manager code, and use them either when memory is
|
||
returned to the pool, or is allocated from it.</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>Avoid POSIX condition variables. If you can, use POSIX
|
||
semaphores (<code class="function">sem_t</code>, <code class="function">sem_post</code>,
|
||
<code class="function">sem_wait</code>) to do inter-thread event signalling.
|
||
Semaphores with an initial value of zero are particularly useful for
|
||
this.</p>
|
||
<p>Helgrind only partially correctly handles POSIX condition
|
||
variables. This is because Helgrind can see inter-thread
|
||
dependencies between a <code class="function">pthread_cond_wait</code> call and a
|
||
<code class="function">pthread_cond_signal</code>/<code class="function">pthread_cond_broadcast</code>
|
||
call only if the waiting thread actually gets to the rendezvous first
|
||
(so that it actually calls
|
||
<code class="function">pthread_cond_wait</code>). It can't see dependencies
|
||
between the threads if the signaller arrives first. In the latter case,
|
||
POSIX guidelines imply that the associated boolean condition still
|
||
provides an inter-thread synchronisation event, but one which is
|
||
invisible to Helgrind.</p>
|
||
<p>The result of Helgrind missing some inter-thread
|
||
synchronisation events is to cause it to report false positives.
|
||
</p>
|
||
<p>The root cause of this synchronisation lossage is
|
||
particularly hard to understand, so an example is helpful. It was
|
||
discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
|
||
in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The
|
||
canonical POSIX-recommended usage scheme for condition variables
|
||
is as follows:</p>
|
||
<pre class="programlisting">
b   is a Boolean condition, which is False most of the time
cv  is a condition variable
mx  is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
</pre>
|
||
<p>Assume <code class="computeroutput">b</code> is False most of
|
||
the time. If the waiter arrives at the rendezvous first, it
|
||
enters its while-loop, waits for the signaller to signal, and
|
||
eventually proceeds. Helgrind sees the signal, notes the
|
||
dependency, and all is well.</p>
|
||
<p>If the signaller arrives
|
||
first, <code class="computeroutput">b</code> is set to true, and the
|
||
signal disappears into nowhere. When the waiter later arrives, it
|
||
does not enter its while-loop and simply carries on. But even in
|
||
this case, the waiter code following the while-loop cannot execute
|
||
until the signaller sets <code class="computeroutput">b</code> to
|
||
True. Hence there is still the same inter-thread dependency, but
|
||
this time it is through an arbitrary in-memory condition, and
|
||
Helgrind cannot see it.</p>
|
||
<p>By comparison, Helgrind's detection of inter-thread
|
||
dependencies caused by semaphore operations is believed to be
|
||
exactly correct.</p>
|
||
<p>As far as I know, a solution to this problem that does not
|
||
require source-level annotation of condition-variable wait loops
|
||
is beyond the current state of the art.</p>
|
||
</li>
|
||
<li class="listitem"><p>Make sure you are using a supported Linux distribution. At
|
||
present, Helgrind only properly supports glibc-2.3 or later. This
|
||
in turn means we only support glibc's NPTL threading
|
||
implementation. The old LinuxThreads implementation is not
|
||
supported.</p></li>
|
||
<li class="listitem"><p>If your application is using thread local variables,
Helgrind might report false positive race conditions on these
variables, even though they are very probably race free. On Linux, you can
|
||
use <code class="option">--sim-hints=deactivate-pthread-stack-cache-via-hack</code>
|
||
to avoid such false positive error messages
|
||
(see <a class="xref" href="manual-core.html#opt.sim-hints">--sim-hints</a>).
|
||
</p></li>
|
||
<li class="listitem">
|
||
<p>Round up all finished threads using
|
||
<code class="function">pthread_join</code>. Avoid
|
||
detaching threads: don't create threads in the detached state, and
|
||
don't call <code class="function">pthread_detach</code> on existing threads.</p>
|
||
<p>Using <code class="function">pthread_join</code> to round up finished
|
||
threads provides a clear synchronisation point that both Helgrind and
|
||
programmers can see. If you don't call
|
||
<code class="function">pthread_join</code> on a thread, Helgrind has no way to
|
||
know when it finishes, relative to any
|
||
significant synchronisation points for other threads in the program. So
|
||
it assumes that the thread lingers indefinitely and can potentially
|
||
interfere indefinitely with the memory state of the program. It
|
||
has every right to assume that -- after all, it might really be
|
||
the case that, for scheduling reasons, the exiting thread did run
|
||
very slowly in the last stages of its life.</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>Perform thread debugging (with Helgrind) and memory
|
||
debugging (with Memcheck) together.</p>
|
||
<p>Helgrind tracks the state of memory in detail, and memory
|
||
management bugs in the application are liable to cause confusion.
|
||
In extreme cases, applications which do many invalid reads and
|
||
writes (particularly to freed memory) have been known to crash
|
||
Helgrind. So, ideally, you should make your application
|
||
Memcheck-clean before using Helgrind.</p>
|
||
<p>It may be impossible to make your application Memcheck-clean
|
||
unless you first remove threading bugs. In particular, it may be
|
||
difficult to remove all reads and writes to freed memory in
|
||
multithreaded C++ destructor sequences at program termination.
|
||
So, ideally, you should make your application Helgrind-clean
|
||
before using Memcheck.</p>
|
||
<p>Since this circularity is obviously unresolvable, at least
|
||
bear in mind that Memcheck and Helgrind are to some extent
|
||
complementary, and you may need to use them together.</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>POSIX requires that implementations of standard I/O
|
||
(<code class="function">printf</code>, <code class="function">fprintf</code>,
|
||
<code class="function">fwrite</code>, <code class="function">fread</code>, etc) are thread
|
||
safe. Unfortunately GNU libc implements this by using internal locking
|
||
primitives that Helgrind is unable to intercept. Consequently Helgrind
|
||
generates many false race reports when you use these functions.</p>
|
||
<p>Helgrind attempts to hide these errors using the standard
|
||
Valgrind error-suppression mechanism. So, at least for simple
|
||
test cases, you don't see any. Nevertheless, some may slip
|
||
through. Just something to be aware of.</p>
|
||
</li>
|
||
<li class="listitem">
|
||
<p>Helgrind's error checks do not work properly inside the
|
||
system threading library itself
|
||
(<code class="computeroutput">libpthread.so</code>), and it usually
|
||
observes large numbers of (false) errors in there. Valgrind's
|
||
suppression system then filters these out, so you should not see
|
||
them.</p>
|
||
<p>If you see any race errors reported
|
||
where <code class="computeroutput">libpthread.so</code> or
|
||
<code class="computeroutput">ld.so</code> is the object associated
|
||
with the innermost stack frame, please file a bug report at
|
||
<a class="ulink" href="http://www.valgrind.org/" target="_top">http://www.valgrind.org/</a>.
|
||
</p>
|
||
</li>
|
||
</ol></div>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.options"></a>7.6. Helgrind Command-line Options</h2></div></div></div>
|
||
<p>The following end-user options are available:</p>
|
||
<div class="variablelist">
|
||
<a name="hg.opts.list"></a><dl class="variablelist">
|
||
<dt>
|
||
<a name="opt.free-is-write"></a><span class="term">
|
||
<code class="option">--free-is-write=no|yes
|
||
[default: no] </code>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<p>When enabled (not the default), Helgrind treats freeing of
|
||
heap memory as if the memory was written immediately before
|
||
the free. This exposes races where memory is referenced by
|
||
one thread, and freed by another, but there is no observable
|
||
synchronisation event to ensure that the reference happens
|
||
before the free.
|
||
</p>
|
||
<p>This functionality is new in Valgrind 3.7.0, and is
|
||
regarded as experimental. It is not enabled by default
|
||
because its interaction with custom memory allocators is not
|
||
well understood at present. User feedback is welcomed.
|
||
</p>
|
||
</dd>
|
||
<dt>
|
||
<a name="opt.track-lockorders"></a><span class="term">
|
||
<code class="option">--track-lockorders=no|yes
|
||
[default: yes] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>When enabled (the default), Helgrind performs lock order
|
||
consistency checking. For some buggy programs, the large number
|
||
of lock order errors reported can become annoying, particularly
|
||
if you're only interested in race errors. You may therefore find
|
||
it helpful to disable lock order checking.</p></dd>
|
||
<dt>
|
||
<a name="opt.history-level"></a><span class="term">
|
||
<code class="option">--history-level=none|approx|full
|
||
[default: full] </code>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<p><code class="option">--history-level=full</code> (the default) causes
Helgrind to collect enough information about "old" accesses that
it can produce two stack traces in a race report -- both the
stack trace for the current access, and the trace for the
older, conflicting access. To limit memory usage, stack traces for
"old" accesses are limited to a maximum of 8 entries, even if the
<code class="option">--num-callers</code> value is larger.</p>
|
||
<p>Collecting such information is expensive in both speed and
|
||
memory, particularly for programs that do many inter-thread
|
||
synchronisation events (locks, unlocks, etc). Without such
|
||
information, it is more difficult to track down the root
|
||
causes of races. Nonetheless, you may not need it in
|
||
situations where you just want to check for the presence or
|
||
absence of races, for example, when doing regression testing
|
||
of a previously race-free program.</p>
|
||
<p><code class="option">--history-level=none</code> is the opposite
|
||
extreme. It causes Helgrind not to collect any information
|
||
about previous accesses. This can be dramatically faster
|
||
than <code class="option">--history-level=full</code>.</p>
|
||
<p><code class="option">--history-level=approx</code> provides a
|
||
compromise between these two extremes. It causes Helgrind to
|
||
show a full trace for the later access, and approximate
|
||
information regarding the earlier access. This approximate
|
||
information consists of two stacks, and the earlier access is
|
||
guaranteed to have occurred somewhere between program points
|
||
denoted by the two stacks. This is not as useful as showing
|
||
the exact stack for the previous access
|
||
(as <code class="option">--history-level=full</code> does), but it is
|
||
better than nothing, and it is almost as fast as
|
||
<code class="option">--history-level=none</code>.</p>
|
||
</dd>
|
||
<dt>
|
||
<a name="opt.conflict-cache-size"></a><span class="term">
|
||
<code class="option">--conflict-cache-size=N
|
||
[default: 1000000] </code>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<p>This flag only has any effect
|
||
at <code class="option">--history-level=full</code>.</p>
|
||
<p>Information about "old" conflicting accesses is stored in
|
||
a cache of limited size, with LRU-style management. This is
|
||
necessary because it isn't practical to store a stack trace
|
||
for every single memory access made by the program.
|
||
Historical information on locations that have not been accessed
recently is periodically discarded, to free up space in the cache.</p>
|
||
<p>This option controls the size of the cache, in terms of the
|
||
number of different memory addresses for which
|
||
conflicting access information is stored. If you find that
|
||
Helgrind is showing race errors with only one stack instead of
|
||
the expected two stacks, try increasing this value.</p>
|
||
<p>The minimum value is 10,000 and the maximum is 30,000,000
|
||
(thirty times the default value). Increasing the value by 1
|
||
increases Helgrind's memory requirement by very roughly 100
|
||
bytes, so the maximum value will easily eat up three extra
|
||
gigabytes or so of memory.</p>
|
||
</dd>
|
||
<dt>
|
||
<a name="opt.check-stack-refs"></a><span class="term">
|
||
<code class="option">--check-stack-refs=no|yes
|
||
[default: yes] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>
|
||
By default Helgrind checks all data memory accesses made by your
|
||
program. This flag enables you to skip checking for accesses
|
||
to thread stacks (local variables). This can improve
|
||
performance, but comes at the cost of missing races on
|
||
stack-allocated data.
|
||
</p></dd>
|
||
<dt>
|
||
<a name="opt.ignore-thread-creation"></a><span class="term">
|
||
<code class="option">--ignore-thread-creation=&lt;yes|no&gt;
|
||
[default: no]</code>
|
||
</span>
|
||
</dt>
|
||
<dd>
|
||
<p>
|
||
Controls whether all activities during thread creation should be
|
||
ignored. By default enabled only on Solaris.
|
||
Solaris provides higher throughput, parallelism and scalability than
|
||
other operating systems, at the cost of more fine-grained locking
|
||
activity. This means for example that when a thread is created under
|
||
glibc, just one big lock is used for all thread setup. Solaris libc
|
||
uses several fine-grained locks and the creator thread resumes its
|
||
activities as soon as possible, leaving for example stack and TLS setup
|
||
sequence to the created thread.
|
||
This situation confuses Helgrind as it assumes there is some false
|
||
ordering in place between creator and created thread; and therefore many
|
||
types of race conditions in the application would not be reported.
|
||
To prevent such false ordering, this command line option is set to
|
||
<code class="computeroutput">yes</code> by default on Solaris.
|
||
All activity (loads, stores, client requests) is therefore ignored
|
||
during:</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem"><p>
|
||
pthread_create() call in the creator thread
|
||
</p></li>
|
||
<li class="listitem"><p>
|
||
thread creation phase (stack and TLS setup) in the created thread
|
||
</p></li>
|
||
</ul></div>
|
||
<p>
|
||
Also, new memory allocated during thread creation is untracked;
that is, race reporting is suppressed there. DRD does the same thing
|
||
implicitly. This is necessary because Solaris libc caches many objects
|
||
and reuses them for different threads and that confuses
|
||
Helgrind.</p>
|
||
</dd>
|
||
</dl>
|
||
</div>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="hg-manual.monitor-commands"></a>7.7. Helgrind Monitor Commands</h2></div></div></div>
|
||
<p>The Helgrind tool provides monitor commands handled by Valgrind's
|
||
built-in gdbserver (see <a class="xref" href="manual-core-adv.html#manual-core-adv.gdbserver-commandhandling" title="3.2.5. Monitor command handling by the Valgrind gdbserver">Monitor command handling by the Valgrind gdbserver</a>).
|
||
</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem">
|
||
<p><code class="varname">info locks [lock_addr]</code> shows the list of locks
|
||
and their status. If <code class="varname">lock_addr</code> is given, only shows
|
||
the lock located at this address. </p>
|
||
<p>
|
||
In the following example, helgrind knows about one lock. This
|
||
lock is located at the guest address <code class="varname">ga
|
||
0x8049a20</code>. The lock kind is <code class="varname">rdwr</code>
|
||
indicating a reader-writer lock. Other possible lock kinds
|
||
are <code class="varname">nonRec</code> (simple mutex, non recursive)
|
||
and <code class="varname">mbRec</code> (simple mutex, possibly recursive).
|
||
The lock kind is then followed by the list of threads holding the
lock. In the example below, <code class="varname">R1:thread #6 tid 3</code>
indicates that the Helgrind thread #6 has acquired (once, as the
counter following the letter R is one) the lock in read mode. The
Helgrind thread number is incremented for each started thread. The
presence of 'tid 3' indicates that thread #6 has not exited
yet and is the Valgrind tid 3. If a thread has terminated, then
this is indicated with 'tid (exited)'.
|
||
</p>
|
||
<pre class="programlisting">
(gdb) monitor info locks
Lock ga 0x8049a20 {
   kind   rdwr
 { R1:thread #6 tid 3 }
}
(gdb)
</pre>
|
||
<p> If you give the option <code class="varname">--read-var-info=yes</code>,
|
||
then more information will be provided about the lock location, such as
|
||
the global variable or the heap block that contains the lock:
|
||
</p>
|
||
<pre class="programlisting">
Lock ga 0x8049a20 {
 Location 0x8049a20 is 0 bytes inside global var "s_rwlock"
 declared at rwlock_race.c:17
   kind   rdwr
 { R1:thread #3 tid 3 }
}
</pre>
|
||
</li>
|
||
<li class="listitem">
|
||
<p><code class="varname">accesshistory &lt;addr&gt; [&lt;len&gt;]</code>
shows the access history recorded for &lt;len&gt; (default 1) bytes
starting at &lt;addr&gt;. For each recorded access that overlaps
|
||
with the given range, <code class="varname">accesshistory</code> shows the operation
|
||
type (read or write), the address and size read or written, the helgrind
|
||
thread nr/valgrind tid number that did the operation and the locks held
|
||
by the thread at the time of the operation.
|
||
The oldest access is shown first, the most recent access is shown last.
|
||
</p>
|
||
<p>
|
||
In the following example, we first see a recorded write of 4 bytes by
thread #7 that modified the given 2-byte range.
The second recorded write is the most recent one: thread #9
modified the same 2 bytes as part of a 4-byte write operation.
The list of locks held by each thread at the time of the write operation
is also shown.
|
||
</p>
|
||
<pre class="programlisting">
(gdb) monitor accesshistory 0x8049D8A 2
write of size 4 at 0x8049D88 by thread #7 tid 3
==6319== Locks held: 2, at address 0x8049D8C (and 1 that can't be shown)
==6319==    at 0x804865F: child_fn1 (locked_vs_unlocked2.c:29)
==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
==6319==    by 0x39B924: start_thread (pthread_create.c:297)
==6319==    by 0x2F107D: clone (clone.S:130)

write of size 4 at 0x8049D88 by thread #9 tid 2
==6319== Locks held: 2, at addresses 0x8049DA4 0x8049DD4
==6319==    at 0x804877B: child_fn2 (locked_vs_unlocked2.c:45)
==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
==6319==    by 0x39B924: start_thread (pthread_create.c:297)
==6319==    by 0x2F107D: clone (clone.S:130)

</pre>
|
||
</li>
|
||
</ul></div>
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="hg-manual.client-requests"></a>7.8. Helgrind Client Requests</h2></div></div></div>
<p>The following client requests are defined in
<code class="filename">helgrind.h</code>. See that file for exact details of their
arguments.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><code class="function">VALGRIND_HG_CLEAN_MEMORY</code></p>
<p>This makes Helgrind forget everything it knows about a
specified memory range. This is particularly useful for memory
allocators that wish to recycle memory.</p>
</li>
<li class="listitem"><p><code class="function">ANNOTATE_HAPPENS_BEFORE</code></p></li>
<li class="listitem"><p><code class="function">ANNOTATE_HAPPENS_AFTER</code></p></li>
<li class="listitem"><p><code class="function">ANNOTATE_NEW_MEMORY</code></p></li>
<li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_CREATE</code></p></li>
<li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_DESTROY</code></p></li>
<li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_ACQUIRED</code></p></li>
<li class="listitem">
<p><code class="function">ANNOTATE_RWLOCK_RELEASED</code></p>
<p>These are used to describe to Helgrind the behaviour of
custom (non-POSIX) synchronisation primitives, which it otherwise
has no way to understand. See the comments
in <code class="filename">helgrind.h</code> for further
documentation.</p>
</li>
</ul></div>
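<p>The following C sketch suggests how some of these requests might be used.
A producer hands a value to a consumer through a hand-rolled flag, and
<code class="function">ANNOTATE_HAPPENS_BEFORE</code> and
<code class="function">ANNOTATE_HAPPENS_AFTER</code> describe that hand-off to
Helgrind; a recycled buffer is passed to
<code class="function">VALGRIND_HG_CLEAN_MEMORY</code>. The argument lists
shown, the include path <code class="filename">valgrind/helgrind.h</code>, and
the names <code class="varname">ready</code>, <code class="varname">payload</code>,
<code class="varname">buffer</code> and <code class="function">recycle_block</code>
are assumptions made for illustration; check
<code class="filename">helgrind.h</code> for the exact forms. When the header
is not available, the sketch substitutes no-op macros so that it still
builds.</p>
<pre class="programlisting">
/* Sketch only: annotating a custom flag hand-off and a recycled buffer. */
#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#ifdef __has_include
#  if __has_include(&lt;valgrind/helgrind.h&gt;)
#    include &lt;valgrind/helgrind.h&gt;
#    define HAVE_HELGRIND_H 1
#  endif
#endif
#ifndef HAVE_HELGRIND_H
/* Assumed fallback: no-ops so the sketch builds without Valgrind headers. */
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)0)
#  define ANNOTATE_HAPPENS_BEFORE(obj)         ((void)0)
#  define ANNOTATE_HAPPENS_AFTER(obj)          ((void)0)
#endif

static char buffer[64];      /* scratch memory that gets recycled */
static int payload;          /* data handed from producer to consumer */
static atomic_int ready;     /* hand-rolled flag, not a POSIX primitive */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    ANNOTATE_HAPPENS_BEFORE(&amp;ready);   /* writes above precede the signal */
    atomic_store(&amp;ready, 1);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (atomic_load(&amp;ready) == 0)
        ;                                /* spin until the flag is set */
    ANNOTATE_HAPPENS_AFTER(&amp;ready);    /* pairs with the producer's annotation */
    printf("payload = %d\n", payload);
    return NULL;
}

/* Hypothetical allocator hook: drop Helgrind's history before reuse. */
static void recycle_block(void *p, size_t n)
{
    VALGRIND_HG_CLEAN_MEMORY(p, n);
    memset(p, 0, n);
}

int main(void)
{
    pthread_t prod, cons;

    pthread_create(&amp;cons, NULL, consumer, NULL);
    pthread_create(&amp;prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    recycle_block(buffer, sizeof buffer);
    return 0;
}
</pre>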
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="hg-manual.todolist"></a>7.9. A To-Do List for Helgrind</h2></div></div></div>
<p>The following is a list of loose ends that should be tidied up
at some point.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>For lock order errors, print the complete lock
cycle, rather than only for size-2 cycles, as at
present.</p></li>
<li class="listitem"><p>The conflicting access mechanism sometimes
mysteriously fails to show the conflicting access's stack, even
when provided with unbounded storage for conflicting access info.
This should be investigated.</p></li>
<li class="listitem"><p>Document races caused by GCC's thread-unsafe code
generation for speculative stores. In the interim, see
<code class="computeroutput">http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html</code>
and <code class="computeroutput">http://lkml.org/lkml/2007/10/24/673</code>.
</p></li>
<li class="listitem"><p>Don't update the lock-order graph, and don't check
for errors, when a "try"-style lock operation happens (e.g.
<code class="function">pthread_mutex_trylock</code>). Such calls do not add any real
restrictions to the locking order, since they can always fail to
acquire the lock, resulting in the caller going off and doing Plan
B (presumably it will have a Plan B). Doing such checks could
generate false lock-order errors and confuse users.</p></li>
<li class="listitem"><p>Performance can be very poor. Slowdowns on the
order of 100:1 are not unusual. There is limited scope for
performance improvements.
</p></li>
</ul></div>
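<p>The reasoning behind the "try"-style lock item above can be sketched in C.
The function below takes one lock, then only <span class="emphasis"><em>tries</em></span>
the other; if the attempt fails it backs off rather than blocking, so it never
waits while holding the first lock and cannot take part in a lock-order
deadlock. The names <code class="varname">mx_a</code>,
<code class="varname">mx_b</code> and <code class="function">update_both</code>
are hypothetical, and the "Plan B" here is simply to report failure to the
caller.</p>
<pre class="programlisting">
/* Sketch only: a try-lock with a fallback imposes no real lock ordering. */
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;

static pthread_mutex_t mx_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mx_b = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if both locks were taken and the work done, 0 for "Plan B". */
static int update_both(void)
{
    pthread_mutex_lock(&amp;mx_b);
    if (pthread_mutex_trylock(&amp;mx_a) != 0) {
        /* mx_a is busy: give up instead of blocking while holding mx_b. */
        pthread_mutex_unlock(&amp;mx_b);
        return 0;                      /* Plan B: caller retries or defers */
    }
    /* ... work that needs both locks would go here ... */
    pthread_mutex_unlock(&amp;mx_a);
    pthread_mutex_unlock(&amp;mx_b);
    return 1;
}

int main(void)
{
    /* Single-threaded call: mx_a is free, so the try-lock succeeds. */
    printf("took both locks: %d\n", update_both());
    return 0;
}
</pre>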
</div>
</div>
</body>
</html>