mirror of
https://github.com/ioacademy-jikim/debugging
synced 2025-06-08 08:26:14 +00:00
-----------------------------------------------------------------------------
Notes on performance
-----------------------------------------------------------------------------
The intent of this file is to record progress in improving performance.

-----------------------------------------------------------------------------
Just before 3.1.0:
- Julian made LibVEX_Alloc() inlinable. Saved a couple of percent.
- Julian started building Vex at -O2. Saved up to 8% or so(?) in some
  cases.

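As a rough illustration of why making an allocator inlinable can help,
here is a minimal sketch of a bump allocator whose fast path is small
enough to inline, with the rare case kept out of line. This is not the
Vex code: the arena, its size, and the names arena_alloc and
arena_alloc_slow are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size arena; names and size are illustrative only. */
#define ARENA_SIZE 4096
static unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Slow path, kept out of line: handles arena exhaustion.  This sketch
   just reports failure; a real allocator would get more memory or abort. */
static void* arena_alloc_slow(size_t nbytes)
{
    (void)nbytes;
    return NULL;
}

/* Fast path: round the request up to a multiple of 8 and bump the
   offset.  Being a few instructions long, it is cheap to inline at
   every call site, which avoids the call/return overhead. */
static inline void* arena_alloc(size_t nbytes)
{
    size_t rounded = (nbytes + 7u) & ~(size_t)7u;
    assert(arena_used <= ARENA_SIZE);
    /* First test catches wrap-around in the rounding above. */
    if (rounded < nbytes || rounded > ARENA_SIZE - arena_used)
        return arena_alloc_slow(nbytes);
    void* p = &arena[arena_used];
    arena_used += rounded;
    return p;
}
```

With this sketch, arena_alloc(3) followed by arena_alloc(8) returns two
pointers 8 bytes apart, and a request that does not fit in the remaining
space falls through to the slow path.
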
Post 3.1.0:
- Julian made the tree builder linear. Saved 2--13% on a range of programs.
- Nick improved vg_SP_update_pass() to identify more small constant
  increments/decrements of SP, so the fast cases can be used more often.
  Saved 1--3% on a few programs.
- r5345, r5346, r5352: Julian improved the dispatcher so that x86 and
  amd64 use jumps instead of call/return for calling translations.
  Also, on x86, amd64, ppc32 and ppc64, --profile-flags style profiling was
  removed from the dispatch loop unless --profile-flags is being used.
  Improved Nulgrind performance typically by 10--20%, and Memcheck
  performance typically by 2--20%.
- Julian changed findSb to slowly move superblocks to the front of the list
  as they are accessed. This sped up perf/heap by 25--50%, and some big
  programs (e.g. ktuberling) by a couple of percent.
- Nick reduced the iteration count of the loop in swizzle() from 20 to 5,
  which gave almost identical results while saving 2% in perf/tinycc and 10%
  in perf/heap on a 3GHz Prescott P4.
- Nick changed ExeContext gathering to not record/save extra zeroes at the
  end. Saved 7% on perf/heap with --num-callers=50, and about 1% on
  perf/tinycc.
- Julian vectorised copy_address_range_perms for common cases, which
  gives about a 40% speedup on artificial programs which just do
  realloc() and nothing else, and about a 3--4% speedup on starting
  kpresenter-1.5.0 and loading a 16-slide presentation.

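The findSb item above describes moving superblocks gradually toward the
front of the list as they are accessed. A minimal sketch of one way to
get that effect, assuming a transposition heuristic (swap the hit entry
with its predecessor), is given below. It is not the actual findSb code:
the Range type and find_range() are hypothetical, and an array stands in
for the list to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock: the range [start, start+size). */
typedef struct {
    size_t start;
    size_t size;
} Range;

/* Linear search for the range containing addr.  On a hit at index i > 0
   the entry is swapped with its predecessor, so frequently hit ranges
   drift toward the front one step per access.  Returns the entry's new
   index, or -1 if no range contains addr. */
static int find_range(Range* ranges, int n, size_t addr)
{
    assert(ranges != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (addr >= ranges[i].start
            && addr - ranges[i].start < ranges[i].size) {
            if (i > 0) {
                Range tmp = ranges[i - 1];
                ranges[i - 1] = ranges[i];
                ranges[i] = tmp;
                return i - 1;
            }
            return 0;
        }
    }
    return -1;
}
```

Moving one step per hit, rather than straight to the front, means a
single stray access does not displace entries that are genuinely hot.
With three ranges starting at 0, 100 and 200, two lookups of an address
in the third range bring it to index 1 and then to index 0.
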
COMPVBITS branch:
- Nick converted Memcheck to use compressed V bits. The initial version
  saved 0--5% in most cases, with a 30% improvement in one case (tsim_arch)
  which calls set_address_range_perms() a lot.
- Nick rewrote set_address_range_perms(), which gained 0--3% typically,
  and 22% on tsim_arch.
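
To make the idea of compressed V bits concrete, here is a minimal
sketch, assuming a scheme in which the state of each byte of client
memory is summarised in 2 bits, so that one map byte covers four client
bytes. The state names, their numeric values, and the get_state() and
set_state() helpers are assumptions made for this example, not
Memcheck's real definitions. A byte in the "partly defined" state would
need its full V bits stored elsewhere, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 2-bit states for one byte of client memory.  The numeric
   encoding is an assumption made for this sketch. */
enum {
    ST_NOACCESS    = 0,
    ST_UNDEFINED   = 1,
    ST_DEFINED     = 2,
    ST_PARTDEFINED = 3
};

/* Read the 2-bit state of client byte number idx from a packed map. */
static unsigned get_state(const uint8_t* map, size_t idx)
{
    unsigned shift = (unsigned)(idx & 3u) * 2u;
    return (map[idx >> 2] >> shift) & 3u;
}

/* Overwrite the 2-bit state of client byte number idx, leaving the
   other three states held in the same map byte untouched. */
static void set_state(uint8_t* map, size_t idx, unsigned st)
{
    assert(st <= 3u);
    unsigned shift = (unsigned)(idx & 3u) * 2u;
    uint8_t cleared = (uint8_t)(map[idx >> 2] & ~(3u << shift));
    map[idx >> 2] = (uint8_t)(cleared | (st << shift));
}
```

Under these assumptions, marking a run of four aligned bytes as defined
becomes a single byte store into the map, which suggests why a routine
such as set_address_range_perms() can benefit from a packed layout.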