Verification todo
~~~~~~~~~~~~~~~~~
check that illegal insns on all targets don't cause the _toIR.c's to
assert.  [DONE: amd64 x86 ppc32 ppc64 arm s390]

check also with --vex-guest-chase-cond=yes

check that all targets can run their insn set tests with
--vex-guest-max-insns=1.

all targets: run some tests using --profile-flags=... to exercise
function patchProfInc_<arch>  [DONE: amd64 x86 ppc32 ppc64 arm s390]

figure out if there is a way to write a test program that checks
that event checks are actually getting triggered


Cleanups
~~~~~~~~
host_arm_isel.c and host_arm_defs.c: get rid of global var arm_hwcaps.

host_x86_defs.c, host_amd64_defs.c: return proper VexInvalRange
records from the patchers, instead of {0,0}, so that transparent
self hosting works properly.

host_ppc_defs.h: is RdWrLR still needed?  If not, delete it.

ditto ARM, Ld8S

Comments that used to be in m_scheduler.c:
   tchaining tests:
   - extensive spinrounds
   - with sched quantum = 1 -- check that handle_noredir_jump
     doesn't return with INNER_COUNTERZERO
   other:
   - out of date comment w.r.t. bit 0 set in libvex_trc_values.h
   - can VG_TRC_BORING still happen?  if not, rm it
   - memory leaks in m_transtab (InEdgeArr/OutEdgeArr leaking?)
   - move do_cacheflush out of m_transtab
   - more economical unchaining when nuking an entire sector
   - ditto w.r.t. cache flushes
   - verify the case of 2 paths from A to B
   - check -- is IP_AT_SYSCALL still right?


Optimisations
~~~~~~~~~~~~~
ppc: chain_XDirect: generate short-form jumps when possible

ppc64: immediate generation is terrible; we should be able
to do better

arm codegen: generate ORRS for CmpwNEZ32(Or32(x,y))

all targets: when nuking an entire sector, don't bother to undo the
patching for any translations within the sector (nor their
invalidations).

(somewhat implausible) for jumps to disp_cp_indir, have multiple
copies of disp_cp_indir, one for each of the possible registers that
could have held the target guest address before jumping to the stub.
Then disp_cp_indir wouldn't have to reload it from memory each time.
This might also spread the indirect-mispredict burden somewhat,
across the multiple copies.


Implementation notes
~~~~~~~~~~~~~~~~~~~~
T-chaining changes -- summary

* The code generators (host_blah_isel.c, host_blah_defs.[ch]) interact
  more closely with Valgrind than before.  In particular the
  instruction selectors must use one of 3 different kinds of
  control-transfer instructions: XDirect, XIndir and XAssisted.
  All archs must use these in the same way; no more ad-hoc control
  transfer instructions.
  (more detail below)


* With T-chaining, translations can jump between each other without
  going through the dispatcher loop every time.  This means that the
  event check (counter decrement, and exit if negative) that the
  dispatcher loop previously did now needs to be compiled into each
  translation.


* The assembly dispatcher code (dispatch-arch-os.S) is still
  present.  It still provides table lookup services for
  indirect branches, but it also provides a new feature:
  dispatch points, to which the generated code jumps.  There
  are 5:

  VG_(disp_cp_chain_me_to_slowEP):
  VG_(disp_cp_chain_me_to_fastEP):
     These are chain-me requests, used for Boring conditional and
     unconditional jumps to destinations known at JIT time.  The
     generated code calls these (doesn't jump to them) and the
     stub recovers the return address.  These calls never return;
     instead the call is done so that the stub knows where the
     calling point is.  It needs to know this so it can patch
     the calling point to the requested destination.  (There is a
     conceptual sketch of this after the list.)
  VG_(disp_cp_xindir):
     Old-style table lookup and go; used for indirect jumps.
  VG_(disp_cp_xassisted):
     Most general and slowest kind.  Can transfer to anywhere, but
     first returns to the scheduler to do some other event (eg a
     syscall) before continuing.
  VG_(disp_cp_evcheck_fail):
     Code jumps here when the event check fails.
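
  A conceptual sketch of what a chain-me request amounts to.  This is
  illustrative only: the helper names below are hypothetical, and how
  the guest destination and the call-site length are obtained is a
  scheduler/backend detail, not shown here.

     #include <stddef.h>

     /* Hypothetical helpers, for illustration only. */
     extern void*  find_or_make_translation(unsigned long long guest_dest,
                                            int to_fast_ep);
     extern void   patch_call_site_to_jump(void* call_site, void* dest);
     extern size_t CALL_SEQUENCE_LEN;   /* bytes in the emitted
                                           call-to-stub sequence */

     static void handle_chain_me(void* return_addr,
                                 unsigned long long guest_dest,
                                 int to_fast_ep)
     {
        /* The generated code CALLed the stub, so the return address
           pinpoints the call site that needs patching. */
        void* call_site = (char*)return_addr - CALL_SEQUENCE_LEN;
        /* Find (or create) the translation of the guest destination,
           then patch the call site into a direct jump to it, using
           the backend's chainXDirect_<arch> (see "Patching" below). */
        void* dest = find_or_make_translation(guest_dest, to_fast_ep);
        patch_call_site_to_jump(call_site, dest);
        /* Control then continues at 'dest'; the request never returns
           to its caller. */
     }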


* New instructions in the backends: XDirect, XIndir and XAssisted.
  XDirect is used for chainable jumps.  It is compiled into a
  call to VG_(disp_cp_chain_me_to_slowEP) or
  VG_(disp_cp_chain_me_to_fastEP).

  XIndir is used for indirect jumps.  It is compiled into a jump
  to VG_(disp_cp_xindir).

  XAssisted is used for "assisted" (do something first, then jump)
  transfers.  It is compiled into a jump to VG_(disp_cp_xassisted).

  All 3 of these may be conditional.

  More complexity: in some circumstances (no-redir translations)
  all transfers must be done with XAssisted.  In such cases the
  instruction selector will be told this.
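
  A minimal sketch of how a backend might represent these three
  instructions.  The type and field names here are illustrative only,
  not the real host_<arch>_defs.h definitions:

     /* Illustrative only -- not the real backend instruction types. */
     typedef enum { Xin_XDirect, Xin_XIndir, Xin_XAssisted } XferTag;

     typedef struct {
        XferTag tag;
        int     cond;        /* condition code, or "always" */
        union {
           struct { unsigned long long dstGA;   /* dest known at JIT time */
                    int toFastEP; } XDirect;    /* patchable later */
           struct { int dstReg; } XIndir;       /* dest held in a register */
           struct { int dstReg;
                    int jumpKind; } XAssisted;  /* why we are exiting */
        } u;
     } XferInstr;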


* Patching: XDirect is compiled basically into
     %r11 = &VG_(disp_cp_chain_me_to_{slow,fast}EP)
     call *%r11
  Backends must provide a function (eg) chainXDirect_AMD64
  which converts it into a jump to a specified destination, either
     jmp $delta-of-PCs
  or
     %r11 = 64-bit immediate
     jmpq *%r11
  depending on branch distance.

  Backends must provide a function (eg) unchainXDirect_AMD64
  which restores the original call-to-the-stub version.
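
  A sketch of the "depending on branch distance" decision such a
  patcher has to make on amd64.  This is not the real
  chainXDirect_AMD64, just the reachability test it implies:

     #include <stdint.h>

     /* Can 'dst' be reached from the patch site with a 5-byte
        "jmp rel32"?  rel32 is measured from the end of that insn. */
     static int reachable_by_rel32(uintptr_t patch_site, uintptr_t dst)
     {
        intptr_t delta = (intptr_t)dst - (intptr_t)(patch_site + 5);
        return delta >= -0x80000000LL && delta <= 0x7FFFFFFFLL;
     }

     /* If reachable, emit the short form "jmp $delta-of-PCs";
        otherwise emit "movabsq $dst, %r11 ; jmpq *%r11". */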


* Event checks.  Each translation now has two entry points,
  the slow one (slowEP) and the fast one (fastEP).  Like this:

     slowEP:
        counter--
        if (counter < 0) goto VG_(disp_cp_evcheck_fail)
     fastEP:
        (rest of the translation)

  slowEP is used for control-flow transfers that are, or might be,
  a back edge in the control flow graph.  Insn selectors are
  given the address of the highest guest byte in the block so
  they can determine which edges are definitely not back edges.

  The counter is placed in the first 8 bytes of the guest state,
  and the address of VG_(disp_cp_evcheck_fail) is placed in
  the next 8 bytes.  This allows very compact checks on all
  targets, since no immediates need to be synthesised, eg:

     decq 0(%baseblock-pointer)
     jns  fastEP
     jmpq *8(%baseblock-pointer)
     fastEP:

  On amd64 a non-failing check is therefore 2 insns; all 3 occupy
  just 8 bytes.

  On amd64 the event check is created by a special single
  pseudo-instruction, AMD64_EvCheck.
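
  Two small sketches of what the above implies, sketched for a 64-bit
  host (per the amd64 example).  Field and helper names are
  illustrative, not the real guest-state or isel definitions:

     /* Layout implied above: the first 16 bytes of the guest state
        hold the event counter and the failure address, so the check
        needs no synthesised immediates. */
     typedef struct {
        long long evcheck_counter;    /* bytes 0..7                     */
        void*     evcheck_failaddr;   /* bytes 8..15:
                                         &VG_(disp_cp_evcheck_fail)     */
        /* ... the guest registers proper follow ... */
     } GuestStateFront;

     /* One plausible test for picking the entry point of a direct
        transfer: a destination above the highest guest address in the
        block ('max_ga', which the insn selector is given) cannot be a
        back edge, so the fast entry point can be used. */
     static int can_use_fastEP(unsigned long long dst_ga,
                               unsigned long long max_ga)
     {
        return dst_ga > max_ga;
     }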


* BB profiling (for --profile-flags=).  The dispatch assembly
  dispatch-arch-os.S no longer deals with this and so is much
  simplified.  Instead the profile inc is compiled into each
  translation, as the insn immediately following the event
  check.  Again, on amd64 a pseudo-insn AMD64_ProfInc is used.
  Counters are now 64-bit even on 32-bit hosts, to avoid overflow.

  One complication is that the address of the counter is not known
  at JIT time.  To solve this, VexTranslateResult now returns the
  offset of the profile inc in the generated code.  When the counter
  address is known, VEX can be called again to patch it in.  Backends
  must supply eg patchProfInc_AMD64 to make this happen.
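
  A sketch of the resulting two-step flow.  The types and functions
  below are hypothetical stand-ins for LibVEX_Translate and the
  arch-specific patcher; the real entry points and signatures live in
  libvex.h and the backends:

     #include <stdint.h>
     #include <stddef.h>

     /* Hypothetical stand-ins, for illustration only. */
     typedef struct { size_t offs_profInc; /* ... */ } TransResult;
     extern TransResult translate_block(void* host_code_buf);
     extern void        patch_prof_inc(void* prof_inc_insn,
                                       uint64_t* counter_addr);

     static void translate_then_patch(void* host_code_buf,
                                      uint64_t* counter_for_block)
     {
        /* Step 1: translate; remember where the profile inc landed. */
        TransResult res = translate_block(host_code_buf);

        /* Step 2: once the 64-bit counter's address is known, patch
           it into the generated code at that offset. */
        patch_prof_inc((char*)host_code_buf + res.offs_profInc,
                       counter_for_block);
     }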


* Front end changes (guest_blah_toIR.c)

  The way the guest program counter is handled has changed
  significantly.  Previously, the guest PC was updated (in IR)
  at the start of each instruction, except for the first insn
  in an IRSB.  This was inconsistent and doesn't work with the
  new framework.

  Now, each instruction must update the guest PC as its last
  IR statement -- not its first -- and there is no longer a special
  exemption for the first insn in the block.  As before, most of
  these updates are optimised out by ir_opt, so there are no
  efficiency concerns.

  As a logical side effect of this, exits (IRStmt_Exit) and the
  block-end transfer are both considered to write to the guest state
  (the guest PC) and so need to be told its offset.

  IR generators (eg disInstr_AMD64) are no longer allowed to set
  IRSB::next to specify the block-end transfer address.  Instead they
  now indicate, to the generic steering logic that drives them (iow,
  guest_generic_bb_to_IR.c), that the block has ended.  This then
  generates effectively "goto GET(PC)" (which, again, is optimised
  away).  What this does mean is that if the IR generator function
  ends the IR of the last instruction in the block with an incorrect
  assignment to the guest PC, execution will transfer to an incorrect
  destination -- making the error obvious quickly.
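
  Two small sketches of what this looks like in IR terms, assuming the
  usual libvex_ir.h constructors.  'offB_PC' stands for the guest PC's
  state offset (which the front end knows); this is not the actual
  front-end code, just the shape of what it emits:

     #include "libvex_ir.h"

     /* The PC update is now the LAST statement emitted for each
        instruction, e.g. for an insn whose successor is 'nextAddr'. */
     static void put_pc_as_last_stmt(IRSB* irsb, Int offB_PC,
                                     ULong nextAddr)
     {
        addStmtToIRSB(irsb,
           IRStmt_Put(offB_PC, IRExpr_Const(IRConst_U64(nextAddr))));
     }

     /* What the generic driver effectively appends at the block end:
        "goto GET(PC)". */
     static void end_block_with_goto_get_pc(IRSB* irsb, Int offB_PC)
     {
        irsb->next     = IRExpr_Get(offB_PC, Ity_I64);
        irsb->jumpkind = Ijk_Boring;
     }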