mirror of
https://github.com/ioacademy-jikim/debugging
synced 2025-06-08 16:36:21 +00:00
206 lines
9.2 KiB
Plaintext
206 lines
9.2 KiB
Plaintext
|
|
This file describes in detail how Calltree accurately tracks function
|
|
entry/exit, one of those harder-than-you'd-think things.
|
|
|
|
-----------------------------------------------------------------------------
|
|
Josef's description
|
|
-----------------------------------------------------------------------------
|
|
From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
|
|
To: Nicholas Nethercote <njn25@cam.ac.uk>
|
|
Cc: valgrind-developers@lists.sourceforge.net
|
|
Subject: [Valgrind-developers] Re: Tracking function entry/exit
|
|
|
|
On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote:
|
|
> Josef,
|
|
>
|
|
> The topic of tracking function entry/exit has come up a few times on the
|
|
> mailing lists recently. My usual answer is that it's difficult to do
|
|
> correctly. However, you seem to do it with Calltree. I looked at the
|
|
> source code a bit, and it looks like you are doing some reasonably
|
|
> complicated things to get it right, eg. unwinding the stack. How robust
|
|
> is your approach? Can you briefly explain how it works?
|
|
|
|
A note before describing the mechanism: I need to have a helper call at start
|
|
of every BB anyway, so I use this helper to do the tracking. This of course
|
|
has some overhead, and perhaps can be avoided, but it seems to add to the
|
|
robustness. I have a bug fix here for reentrent entering of a signal handler
|
|
(2 bug reports). Otherwise I have no bug reports, so I assume that the
|
|
mechanism to be quite robust.
|
|
|
|
I have a shadow call stack for every thread. For signal handlers of a thread,
|
|
I first PUSH a separation marker on the shadow stack, and use the stack as
|
|
normal. The marker is used for unwinding when leaving the signal handler.
|
|
This is fine as there is no scheduling among signal handlers of one thread.
|
|
|
|
Instrumentation of calltree:
|
|
* Store at the end of each basic block the jmpkind into a tool-global, static
|
|
variable.
|
|
* At the start of every BB, jump to a helper function.
|
|
|
|
The helper function does the following regarding function call tracking:
|
|
- for a control transfer to another ELF object/ELF section, override jmpkind
|
|
with a CALL (*1)
|
|
- for a control transfer to the 1st basic block of a function, override
|
|
jmpkind with a CALL (*2)
|
|
- do unwinding if needed (i.e, POPs of the shadow call stack)
|
|
- if jmpkind is RET and there was no unwinding/POP:
|
|
- if our call stack is empty, simulate a CALL lasting from beginning
|
|
(with Valgrind 2.1.x, this is not needed any more, as we run on
|
|
simulated CPU from first client instruction)
|
|
- otherwise this is a JMP using a RET instruction (typically used in
|
|
the runtime linker). Do a POP, setting previous BB address to call
|
|
site and override jmpkind with a CALL. By this, you get 2 function
|
|
calls from a calling site.
|
|
- when jmpkind is a CALL, push new function call from previous BB to current
|
|
BB on shadow call stack.
|
|
- Save current BB address to be available for call to handler in next BB.
|
|
|
|
Special care is needed at thread switches and enter/leave of signal handlers,
|
|
as we need separate shadow call stacks.
|
|
|
|
Known bug: We should check for the need of unwinding when ESP is explicitly
|
|
written to. I hope this doesn't create too much overhead.
|
|
|
|
Remarks:
|
|
(*1) Jumps between ELF objects are function calls to a shared library. This is
|
|
mainly done to catch the JMP from PLT code.
|
|
(*2) This is what your function tracking skin/tool does. It is needed here
|
|
mainly to catch tail recursion. In general, for functions doing a
|
|
"return otherfunction()", GCC produces JMPs with -O2.
|
|
|
|
Additional points:
|
|
- If I need a name for a function, but there is no debug info, I use the
|
|
instruction address minus the load offset of the corresponding ELF object
|
|
(if there is one) to get a relative address for that ELF object. This
|
|
offset can be used with objdump later in postprocessing tools (e.g.
|
|
objdump). I would suggest this change even for cachegrind instead of a
|
|
"???".
|
|
- I introduced the ability to specify functions to be "skipped". This means
|
|
that execution of these functions is attributed to the calling function.
|
|
The default is to skip all functions located in PLT sections. Thus, in
|
|
effect, costs of PLT functions are attributed to callers, and the call to
|
|
a shared library function starts directly with code in the other ELF
|
|
object.
|
|
- As Vg 2.1.x does pointerchecking, the instrumentation can't write to
|
|
memory space of Valgrind any longer. Currently, my tool needs
|
|
"--pointercheck=no" to be able to run. Jeremy and me already agreed on
|
|
replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer
|
|
check and keep original LD/ST for tool usage without pointerchecking.
|
|
|
|
Looking at these things, it seems possible to do function tracking at end of a
|
|
basic block instead of the beginning of the next BB. This way, we can perhaps
|
|
avoid calls to helpers at every BB.
|
|
|
|
From my point of view, it would be great to integrate optional function
|
|
tracking into Valgrind core with some hooks.
|
|
|
|
Josef
|
|
|
|
|
|
-----------------------------------------------------------------------------
|
|
Josef's clarification of Nick's summary of Josef's description
|
|
-----------------------------------------------------------------------------
|
|
On Monday 21 June 2004 12:15, Nicholas Nethercote wrote:
|
|
|
|
> I've paraphrased your description to help me understand it better, but I'm
|
|
> still not quite clear on some points. I looked at the code, but found it
|
|
> hard to understand. Could you help me? I've written my questions in
|
|
> square brackets. Here's the description.
|
|
>
|
|
> --------
|
|
>
|
|
> Data structures:
|
|
>
|
|
> - have a shadow call stack for every thread
|
|
> [not sure exactly what goes on this]
|
|
|
|
That's the resizable array of struct _call_entry's.
|
|
Probably most important for call tracking is the %ESP value
|
|
directly after a CALL, and a pointer to some struct storing information
|
|
about the call arc or the called function.
|
|
|
|
The esp value is needed to be able to robustly unwind correctly at %esp
|
|
changes with %esp > stored esp on shadow stack.
|
|
|
|
> Action at BB start -- depends on jmp_kind from previous BB:
|
|
>
|
|
> - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring,
|
|
> JmpCond or JmpSyscall) and we transferred from one ELF object/section to
|
|
> another, it must be a function call to a shared library -- treat as a
|
|
> call. This catches jmps from PLT code.
|
|
>
|
|
> - If this is the first BB of a function, treat as a call. This catches
|
|
> tail calls (which gcc uses for "return f()" with -O2).
|
|
> [What if a function had a 'goto' back to its beginning? Would that be
|
|
> interpreted as a call?]
|
|
|
|
Yes. IMHO, there is no way to distinguish between optimized tail recursion
|
|
using a jump and regular jumping. But as most functions need parameters on
|
|
the stack, a normal jump will rarely jump to the first BB of a function,
|
|
wouldn't it?
|
|
|
|
> - Unwind the shadow call stack if necessary.
|
|
> [when is "necessary"? If the real %esp > the shadow stack %esp?]
|
|
|
|
Yes. Currently I do this at every BB boundary, but perhaps it should be
|
|
checked at every %esp change. Then, OTOH, it would look strange to attribute
|
|
instructions of one BB to different functions?
|
|
|
|
> - If this is a function return and there was no shadow stack unwinding,
|
|
> this must be a RET control transfer (typically used in the runtime
|
|
> linker). Pop the shadow call stack, setting the previous BB address to
|
|
> call site and override jmpkind with a CALL. By this, you get 2 function
|
|
> calls from a calling site.
|
|
> [I don't understand this... What is a "RET control transfer"? Why do
|
|
> you end up with 2 function calls -- is that a bad thing?]
|
|
|
|
If there is a RET instruction, this usually should unwind (i.e. leave a
|
|
function) at least one entry of the shadow call stack. But this doesn't need
|
|
to be the case, i.e. even after a RET, %esp could be lower or equal to the
|
|
one on the shadow stack. E.g. suppose
|
|
|
|
PUSH addr
|
|
RET
|
|
|
|
This is only another way of saying "JMP addr", and doesn't add/remove any
|
|
stack frame at all.
|
|
Now, if addr is (according to debug information) inside of another function,
|
|
this is a JMP between functions, let's say from B to C. Suppose B was called
|
|
from A, I generate a RETURN event to A and a CALL event from A to C in this
|
|
case.
|
|
|
|
> - If we're treating the control transfer as a call, push new function call
|
|
> from previous BB to current BB on shadow call stack.
|
|
> [when is this information used?]
|
|
|
|
I meant: Append a struct call_entry to the shadow stack (together with the
|
|
current %esp value). As I said before, the shadow stack is used for robust
|
|
unwinding.
|
|
|
|
> - Save current BB address to be available for call to handler in next BB.
|
|
>
|
|
>
|
|
> Other actions:
|
|
>
|
|
> When entering a signal handler, first push a separation marker on the
|
|
> thread's shadow stack, then use it as normal. The marker is used for
|
|
> unwinding when leaving the signal handler. This is fine as there is no
|
|
> scheduling among signal handlers of one thread.
|
|
>
|
|
> Special care is needed at thread switches and enter/leave of signal
|
|
> handlers, as we need separate shadow call stacks.
|
|
> [Do you mean "separate shadow call stacks for each thread"?]
|
|
|
|
Yes.
|
|
|
|
> What about stack switching -- does it cope with that? (Not that Valgrind
|
|
> in general does...)
|
|
|
|
No.
|
|
If you could give me a hint how to do it, I would be pleased. The problem here
|
|
IMHO is: How to distinguish among a stack switch and allocating a huge array
|
|
on the stack?
|
|
|
|
Josef
|
|
|