<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<chapter id="bbv-manual" xreflabel="BBV">
<title>BBV: an experimental basic block vector generation tool</title>

<para>To use this tool, you must specify
<option>--tool=exp-bbv</option> on the Valgrind
command line.</para>

<sect1 id="bbv-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>
A basic block is a linear section of code with one entry point and one exit
point. A <emphasis>basic block vector</emphasis> (BBV) is a list of all
basic blocks entered during program execution, and a count of how many
times each basic block was run.
</para>

<para>
BBV is a tool that generates basic block vectors for use with the
<ulink url="http://www.cse.ucsd.edu/~calder/simpoint/">SimPoint</ulink>
analysis tool.
The SimPoint methodology speeds up architectural
simulations by running only a small portion of a program
and then extrapolating total behavior from this
small portion. Most programs exhibit phase-based behavior, which
means that at various times during execution a program will encounter
intervals of time where the code behaves similarly to a previous
interval. If you can detect these intervals and group them together,
you can approximate the total program behavior
by simulating only a small number of representative intervals and then
scaling the results.
</para>

<para>
In computer architecture research, running a
benchmark on a cycle-accurate simulator can cause slowdowns on the order
of 1000 times, so that full benchmarks take days, weeks, or even longer
to run. Using SimPoint can reduce this time significantly,
usually by 90-95%, while still retaining reasonable accuracy.
</para>

<para>
A more complete introduction to how SimPoint works can be
found in the paper "Automatically Characterizing Large Scale
Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and
B. Calder.
</para>

</sect1>

<sect1 id="bbv-manual.quickstart" xreflabel="Quick Start">
<title>Using Basic Block Vectors to create SimPoints</title>

<para>
To quickly create a basic block vector file, run Valgrind
like this:

<programlisting>valgrind --tool=exp-bbv /bin/ls</programlisting>

Here the program being run is <filename>/bin/ls</filename>,
but it can be any program. By default a file called
<computeroutput>bb.out.PID</computeroutput> will be created,
where PID is replaced by the process ID of the running process.
This file contains the basic block vector. For long-running programs
this file can be quite large, so it might be wise to compress
it with gzip or some other compression program.
</para>

<para>
To create actual SimPoint results, you will need the SimPoint utility,
available from the
<ulink url="http://www.cse.ucsd.edu/~calder/simpoint/">SimPoint webpage</ulink>.
Assuming you have downloaded SimPoint 3.2 and compiled it,
create SimPoint results with a command like the following:

<programlisting><![CDATA[
./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
-loadFVFile bb.out.1234.gz \
-k 5 -saveSimpoints results.simpts \
-saveSimpointWeights results.weights]]></programlisting>

where bb.out.1234.gz is your compressed basic block vector file
generated by BBV.
</para>

<para>
The SimPoint utility performs a random linear projection down to 15 dimensions,
then uses k-means clustering to determine which intervals are
of interest. In this example we request 5 intervals with the
<option>-k 5</option> option.
</para>

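<para>
The projection step can be illustrated with a short Python sketch. This is
not SimPoint's implementation: the function name
<computeroutput>random_projection</computeroutput>, the uniform [-1, 1]
projection entries, the normalization of each interval so that its
frequencies sum to 1, and the fixed seed are all assumptions made for
illustration. The clustering step is omitted.
</para>

<programlisting><![CDATA[
```python
import random


def random_projection(intervals, dims=15, seed=0):
    """Project sparse basic block vectors down to `dims` dimensions.

    `intervals` is a list of dicts mapping block ID -> frequency.
    Each block ID gets one random row of `dims` values in [-1, 1]; an
    interval's projection is the frequency-weighted sum of the rows of
    the blocks it executed.
    """
    rng = random.Random(seed)
    block_ids = sorted({b for vec in intervals for b in vec})
    rows = {b: [rng.uniform(-1.0, 1.0) for _ in range(dims)] for b in block_ids}

    projected = []
    for vec in intervals:
        total = sum(vec.values())
        point = [0.0] * dims
        for block, freq in vec.items():
            share = freq / total  # normalize so interval length does not matter
            for d in range(dims):
                point[d] += share * rows[block][d]
        projected.append(point)
    return projected


# Hypothetical input: three intervals with made-up block IDs and counts.
intervals = [
    {45: 1024, 189: 99343},
    {11: 78573, 15: 1353, 56: 1},
    {45: 2048, 189: 198686},  # same block mix as the first, twice as long
]
points = random_projection(intervals)
```
]]></programlisting>

<para>
In this sketch the first and third intervals land on the same 15-dimensional
point because they execute the same mix of blocks, which is the property a
clustering step would rely on to group them into one phase.
</para>
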
<para>
The outputs from the SimPoint run are the
<computeroutput>results.simpts</computeroutput>
and <computeroutput>results.weights</computeroutput> files.
The first holds the 5 most relevant intervals of the program.
The second holds the weight to scale each interval by when
extrapolating full-program behavior. The intervals and the weights
can be used in conjunction with a simulator that supports
fast-forwarding; you fast-forward to the interval of interest,
collect statistics for the desired interval length, then combine
the gathered statistics with the weights to
calculate your results.
</para>

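<para>
The final scaling step amounts to a weighted sum. The Python sketch below
assumes that a simulator has already produced one cycles-per-instruction
(CPI) figure per chosen interval and that the weights sum to 1. The CPI
values, the weights, and the function name
<computeroutput>estimate_metric</computeroutput> are hypothetical; nothing
here is read from real SimPoint output files.
</para>

<programlisting><![CDATA[
```python
def estimate_metric(weights, per_interval_values):
    """Weighted average of a per-interval metric (for example, CPI).

    The weights are assumed to sum to 1 (within rounding), each one being
    the fraction of the program that its interval represents.
    """
    if len(weights) != len(per_interval_values):
        raise ValueError("need exactly one value per weight")
    total_weight = sum(weights)
    if abs(total_weight - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1, got %r" % total_weight)
    return sum(w * v for w, v in zip(weights, per_interval_values))


# Hypothetical numbers for a run with -k 5: one weight and one simulated
# CPI value per chosen interval.
weights = [0.40, 0.25, 0.20, 0.10, 0.05]
cpi = [1.2, 0.8, 2.0, 1.5, 3.0]
estimated_cpi = estimate_metric(weights, cpi)
```
]]></programlisting>
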
</sect1>

<sect1 id="bbv-manual.usage" xreflabel="BBV Command-line Options">
<title>BBV Command-line Options</title>

<para> BBV-specific command-line options are:</para>

<!-- start of xi:include in the manpage -->
<variablelist id="bbv.opts.list">

<varlistentry id="opt.bb-out-file" xreflabel="--bb-out-file">
<term>
<option><![CDATA[--bb-out-file=<name> [default: bb.out.%p] ]]></option>
</term>
<listitem>
<para>
This option selects the name of the basic block vector file. The
<option>%p</option> and <option>%q</option> format specifiers can be
used to embed the process ID and/or the contents of an environment
variable in the name, as is the case for the core option
<option><xref linkend="opt.log-file"/></option>.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.pc-out-file" xreflabel="--pc-out-file">
<term>
<option><![CDATA[--pc-out-file=<name> [default: pc.out.%p] ]]></option>
</term>
<listitem>
<para>
This option selects the name of the PC file.
This file holds program counter addresses
and function name info for the various basic blocks.
This can be used in conjunction
with the basic block vector file to fast-forward via function names
instead of just instruction counts. The
<option>%p</option> and <option>%q</option> format specifiers can be
used to embed the process ID and/or the contents of an environment
variable in the name, as is the case for the core option
<option><xref linkend="opt.log-file"/></option>.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.interval-size" xreflabel="--interval-size">
<term>
<option><![CDATA[--interval-size=<number> [default: 100000000] ]]></option>
</term>
<listitem>
<para>
This option selects the size of the interval to use.
The default is 100
million instructions, which is a commonly used value.
Other sizes can be used; smaller intervals can help programs
with finer-grained phases. However, smaller interval sizes
can lead to accuracy issues due to warm-up effects.
(When fast-forwarding, the various architectural features
will be uninitialized, and it will take some number
of instructions before they "warm up" to the state a
full simulation would have reached without the fast-forwarding.
Large interval sizes tend to mitigate this.)
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.instr-count-only" xreflabel="--instr-count-only">
<term>
<option><![CDATA[--instr-count-only [default: no] ]]></option>
</term>
<listitem>
<para>
This option tells the tool to only display instruction count
totals, and to not generate the actual basic block vector file.
This is useful for debugging, and for gathering instruction count
info without generating the large basic block vector files.
</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->

</sect1>

<sect1 id="bbv-manual.fileformat" xreflabel="BBV File Format">
<title>Basic Block Vector File Format</title>

<para>
The basic block vector is dumped at fixed intervals, by default
every 100 million instructions; the
<option>--interval-size</option> option can be
used to change this.
</para>

<para>
The output file looks like this:
</para>

<programlisting><![CDATA[
T:45:1024 :189:99343
T:11:78573 :15:1353 :56:1
T:18:45 :12:135353 :56:78 :314:4324263]]></programlisting>

<para>
Each new interval starts with a T. This is followed on the same line
by a series of basic block and frequency pairs, one for each
basic block that was entered during the interval. The format for
each block/frequency pair is a colon, followed by a number that
uniquely identifies the basic block, another colon, and then
the frequency (which is the number of times the block was entered,
multiplied by the number of instructions in the block). The
pairs are separated from each other by a space.
</para>

<para>
The frequency count is multiplied by the number of instructions that are
in the basic block, in order to weight the count so that instructions in
small basic blocks aren't counted as more important than instructions
in large basic blocks.
</para>

<para>
The SimPoint program only processes lines that start with a "T". All
other lines are ignored. Traditionally comments are indicated by
starting a line with a "#" character. Some other BBV generation tools,
such as PinPoints, generate lines beginning with letters other than "T"
to indicate more information about the program being run. We do
not generate these, as the SimPoint utility ignores them.
</para>

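<para>
The format described in this section is simple enough to read with a few
lines of Python. The sketch below is illustrative only: the function name
<computeroutput>parse_bbv</computeroutput> and the sample lines are
assumptions, and the parser does no error handling for malformed input.
</para>

<programlisting><![CDATA[
```python
def parse_bbv(lines):
    """Return one dict (block ID -> weighted frequency) per interval.

    Only lines starting with "T" are parsed; comments and any other
    record types are skipped.
    """
    intervals = []
    for line in lines:
        if not line.startswith("T"):
            continue
        vector = {}
        for pair in line[1:].split():
            # Each pair looks like ":<block id>:<frequency>".
            _, block_id, frequency = pair.split(":")
            vector[int(block_id)] = int(frequency)
        intervals.append(vector)
    return intervals


sample = [
    "# comment lines are ignored",
    "T:45:1024 :189:99343",
    "T:11:78573 :15:1353 :56:1",
]
intervals = parse_bbv(sample)
# The frequencies are already multiplied by block size, so summing them
# gives the weighted instruction total recorded for each interval.
instructions_per_interval = [sum(v.values()) for v in intervals]
```
]]></programlisting>

<para>
To read a real file, pass an open file object instead of a list, for
example one returned by <computeroutput>gzip.open(name, "rt")</computeroutput>
for a compressed file.
</para>
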
</sect1>

<sect1 id="bbv-manual.implementation" xreflabel="Implementation">
<title>Implementation</title>

<para>
Valgrind provides all of the information necessary to create
BBV files. In the current implementation, all instructions
are instrumented. This is slower (by approximately a factor
of two) than a method that instruments at the basic block level,
but there are some complications (especially with rep prefix
detection) that make that method more difficult.
</para>

<para>
Valgrind actually provides instrumentation at a superblock level.
A superblock has one entry point but, unlike a basic block, can
have multiple exit points. Once a branch occurs into the middle
of a block, it is split into a new basic block. Because
Valgrind cannot produce "true" basic blocks, the generated
basic block vectors will differ from those generated by other tools.
In practice this does not seem to affect the accuracy of the
SimPoint results. We do internally force the
<option>--vex-guest-chase-thresh=0</option>
option to Valgrind, which forces a more basic-block-like
behavior.
</para>

<para>
When a superblock is run for the first time, it is instrumented
with our BBV routine. A block info (bbInfo) structure is allocated
which holds the various information and statistics for the block.
A unique block ID is assigned to the block, and then the
structure is placed into an ordered set.
Then each native instruction in the block is instrumented to
call an instruction counting routine with a pointer to the block
info structure as an argument.
</para>

<para>
At run-time, our instruction counting routines are called once
per native instruction. The relevant block info structure is accessed
and the block count and total instruction count are updated.
If the total instruction count overflows the interval size
then we walk the ordered set, writing out the statistics for
any block that was accessed in the interval, then resetting the
block counters to zero.
</para>

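<para>
The counting scheme described in the two preceding paragraphs can be modelled
in a few lines of Python. This is a toy model, not the tool's source code:
the class and method names are hypothetical, an
<computeroutput>OrderedDict</computeroutput> stands in for the ordered set,
the exact comparison against the interval size is an assumption, and the
rep-prefix handling is left out.
</para>

<programlisting><![CDATA[
```python
from collections import OrderedDict


class BlockInfo:
    """Per-block record, loosely modelled on the bbInfo structure."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.count = 0  # instructions executed in this block, this interval


class Counter:
    def __init__(self, interval_size):
        self.interval_size = interval_size
        self.blocks = OrderedDict()  # stands in for the ordered set
        self.total = 0
        self.output = []

    def instrument(self, address):
        """Called the first time a block is seen; assigns a unique ID."""
        info = BlockInfo(len(self.blocks) + 1)
        self.blocks[address] = info
        return info

    def per_instruction(self, info):
        """Called once per executed native instruction."""
        info.count += 1
        self.total += 1
        if self.total >= self.interval_size:  # assumed comparison
            self.dump()

    def dump(self):
        """Emit one "T" line for the interval and reset the counters."""
        pairs = []
        for info in self.blocks.values():
            if info.count:
                pairs.append(":%d:%d" % (info.block_id, info.count))
                info.count = 0
        self.output.append("T" + " ".join(pairs))
        self.total = 0


# Hypothetical run: block a has 3 instructions, block b has 1.
counter = Counter(interval_size=4)
a = counter.instrument(0x1000)
b = counter.instrument(0x2000)
for info in (a, a, a, b, b, a, a, a):
    counter.per_instruction(info)
```
]]></programlisting>

<para>
Because the counter is bumped once per instruction, each block's count is
already weighted by its size, which matches the frequency definition given
for the file format.
</para>
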
<para>
On the x86 and amd64 architectures the counting code has extra
code to handle rep-prefixed string instructions. This is because
actual hardware counts a rep-prefixed instruction
as one instruction, while a naive Valgrind implementation
would count it as many (possibly hundreds, thousands or even millions)
of instructions. We handle rep-prefixed instructions specially,
in order to make the results match those obtained with hardware performance
counters.
</para>

<para>
BBV also counts the fldcw instruction. This instruction is used on
x86 machines in various ways; it is most commonly found when converting
floating point values into integers.
On Pentium 4 systems the retired instruction performance
counter counts this instruction as two instructions (all other
known processors only count it as one).
This can affect results when using SimPoint on Pentium 4 systems.
We provide the fldcw count so that users can evaluate whether it
will impact their results enough to avoid using Pentium 4 machines
for their experiments. It would be possible to add an option to
this tool that mimics the double-counting so that the generated BBV
files would be usable for experiments using hardware performance
counters on Pentium 4 systems.
</para>

</sect1>

<sect1 id="bbv-manual.threadsupport" xreflabel="BBV Threaded Support">
<title>Threaded Executable Support</title>

<para>
BBV supports threaded programs. When a program has multiple threads,
an additional basic block vector file is created for each thread (each
additional file is the specified filename with the thread number
appended at the end).
</para>

<para>
There is no official method of using SimPoint with
threaded workloads. The most common method is to run
SimPoint on each thread's results independently, and use
some method of deterministic execution to try to match the
original workload. This should be possible with the current
BBV.
</para>

</sect1>

<sect1 id="bbv-manual.validation" xreflabel="BBV Validation">
<title>Validation</title>

<para>
BBV has been tested on x86, amd64, and ppc32 platforms.
An earlier version of BBV was tested in detail using
hardware performance counters; this work is described in a paper
from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation
to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
V.M. Weaver and S.A. McKee.
</para>

</sect1>

<sect1 id="bbv-manual.performance" xreflabel="BBV Performance">
<title>Performance</title>

<para>
Using this program slows down execution by roughly a factor of 40
over native execution. This varies depending on the machine
used and the benchmark being run.
On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D
processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
</para>

</sect1>

</chapter>