mirror of
https://github.com/ioacademy-jikim/debugging
synced 2025-06-09 00:46:12 +00:00
367 lines
17 KiB
HTML
367 lines
17 KiB
HTML
<html>
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||
<title>12. BBV: an experimental basic block vector generation tool</title>
|
||
<link rel="stylesheet" type="text/css" href="vg_basic.css">
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
|
||
<link rel="home" href="index.html" title="Valgrind Documentation">
|
||
<link rel="up" href="manual.html" title="Valgrind User Manual">
|
||
<link rel="prev" href="sg-manual.html" title="11. SGCheck: an experimental stack and global array overrun detector">
|
||
<link rel="next" href="lk-manual.html" title="13. Lackey: an example tool">
|
||
</head>
|
||
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
||
<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
|
||
<td width="22px" align="center" valign="middle"><a accesskey="p" href="sg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
|
||
<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
|
||
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
|
||
<th align="center" valign="middle">Valgrind User Manual</th>
|
||
<td width="22px" align="center" valign="middle"><a accesskey="n" href="lk-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
|
||
</tr></table></div>
|
||
<div class="chapter">
|
||
<div class="titlepage"><div><div><h1 class="title">
|
||
<a name="bbv-manual"></a>12. BBV: an experimental basic block vector generation tool</h1></div></div></div>
|
||
<div class="toc">
|
||
<p><b>Table of Contents</b></p>
|
||
<dl class="toc">
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.overview">12.1. Overview</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.quickstart">12.2. Using Basic Block Vectors to create SimPoints</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.usage">12.3. BBV Command-line Options</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.fileformat">12.4. Basic Block Vector File Format</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.implementation">12.5. Implementation</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.threadsupport">12.6. Threaded Executable Support</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.validation">12.7. Validation</a></span></dt>
|
||
<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.performance">12.8. Performance</a></span></dt>
|
||
</dl>
|
||
</div>
|
||
<p>To use this tool, you must specify
|
||
<code class="option">--tool=exp-bbv</code> on the Valgrind
|
||
command line.</p>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.overview"></a>12.1. Overview</h2></div></div></div>
|
||
<p>
|
||
A basic block is a linear section of code with one entry point and one exit
|
||
point. A <span class="emphasis"><em>basic block vector</em></span> (BBV) is a list of all
|
||
basic blocks entered during program execution, and a count of how many
|
||
times each basic block was run.
|
||
</p>
|
||
<p>
|
||
BBV is a tool that generates basic block vectors for use with the
|
||
<a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint</a>
|
||
analysis tool.
|
||
The SimPoint methodology enables speeding up architectural
|
||
simulations by only running a small portion of a program
|
||
and then extrapolating total behavior from this
|
||
small portion. Most programs exhibit phase-based behavior, which
|
||
means that at various times during execution a program will encounter
|
||
intervals of time where the code behaves similarly to a previous
|
||
interval. If you can detect these intervals and group them together,
|
||
an approximation of the total program behavior can be obtained
|
||
by only simulating a bare minimum number of intervals, and then scaling
|
||
the results.
|
||
</p>
|
||
<p>
|
||
In computer architecture research, running a
|
||
benchmark on a cycle-accurate simulator can cause slowdowns on the order
|
||
of 1000 times, making it take days, weeks, or even longer to run full
|
||
benchmarks. By utilizing SimPoint this can be reduced significantly,
|
||
usually by 90-95%, while still retaining reasonable accuracy.
|
||
</p>
|
||
<p>
|
||
A more complete introduction to how SimPoint works can be
|
||
found in the paper "Automatically Characterizing Large Scale
|
||
Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and
|
||
B. Calder.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.quickstart"></a>12.2. Using Basic Block Vectors to create SimPoints</h2></div></div></div>
|
||
<p>
|
||
To quickly create a basic block vector file, you will call Valgrind
|
||
like this:
|
||
|
||
</p>
|
||
<pre class="programlisting">valgrind --tool=exp-bbv /bin/ls</pre>
|
||
<p>
|
||
|
||
In this case we are running on <code class="filename">/bin/ls</code>,
|
||
but this can be any program. By default a file called
|
||
<code class="computeroutput">bb.out.PID</code> will be created,
|
||
where PID is replaced by the process ID of the running process.
|
||
This file contains the basic block vector. For long-running programs
|
||
this file can be quite large, so it might be wise to compress
|
||
it with gzip or some other compression program.
|
||
</p>
|
||
<p>
|
||
To create actual SimPoint results, you will need the SimPoint utility,
|
||
available from the
|
||
<a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint webpage</a>.
|
||
Assuming you have downloaded SimPoint 3.2 and compiled it,
|
||
create SimPoint results with a command like the following:
|
||
|
||
</p>
|
||
<pre class="programlisting">
|
||
./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
|
||
-loadFVFile bb.out.1234.gz \
|
||
-k 5 -saveSimpoints results.simpts \
|
||
-saveSimpointWeights results.weights</pre>
|
||
<p>
|
||
|
||
where bb.out.1234.gz is your compressed basic block vector file
|
||
generated by BBV.
|
||
</p>
|
||
<p>
|
||
The SimPoint utility does random linear projection using 15-dimensions,
|
||
then does k-mean clustering to calculate which intervals are
|
||
of interest. In this example we specify 5 intervals with the
|
||
-k 5 option.
|
||
</p>
|
||
<p>
|
||
The outputs from the SimPoint run are the
|
||
<code class="computeroutput">results.simpts</code>
|
||
and <code class="computeroutput">results.weights</code> files.
|
||
The first holds the 5 most relevant intervals of the program.
|
||
The seconds holds the weight to scale each interval by when
|
||
extrapolating full-program behavior. The intervals and the weights
|
||
can be used in conjunction with a simulator that supports
|
||
fast-forwarding; you fast-forward to the interval of interest,
|
||
collect stats for the desired interval length, then use
|
||
statistics gathered in conjunction with the weights to
|
||
calculate your results.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.usage"></a>12.3. BBV Command-line Options</h2></div></div></div>
|
||
<p> BBV-specific command-line options are:</p>
|
||
<div class="variablelist">
|
||
<a name="bbv.opts.list"></a><dl class="variablelist">
|
||
<dt>
|
||
<a name="opt.bb-out-file"></a><span class="term">
|
||
<code class="option">--bb-out-file=<name> [default: bb.out.%p] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>
|
||
This option selects the name of the basic block vector file. The
|
||
<code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
|
||
used to embed the process ID and/or the contents of an environment
|
||
variable in the name, as is the case for the core option
|
||
<code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
|
||
</p></dd>
|
||
<dt>
|
||
<a name="opt.pc-out-file"></a><span class="term">
|
||
<code class="option">--pc-out-file=<name> [default: pc.out.%p] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>
|
||
This option selects the name of the PC file.
|
||
This file holds program counter addresses
|
||
and function name info for the various basic blocks.
|
||
This can be used in conjunction
|
||
with the basic block vector file to fast-forward via function names
|
||
instead of just instruction counts. The
|
||
<code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
|
||
used to embed the process ID and/or the contents of an environment
|
||
variable in the name, as is the case for the core option
|
||
<code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
|
||
</p></dd>
|
||
<dt>
|
||
<a name="opt.interval-size"></a><span class="term">
|
||
<code class="option">--interval-size=<number> [default: 100000000] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>
|
||
This option selects the size of the interval to use.
|
||
The default is 100
|
||
million instructions, which is a commonly used value.
|
||
Other sizes can be used; smaller intervals can help programs
|
||
with finer-grained phases. However smaller interval size
|
||
can lead to accuracy issues due to warm-up effects
|
||
(When fast-forwarding the various architectural features
|
||
will be un-initialized, and it will take some number
|
||
of instructions before they "warm up" to the state a
|
||
full simulation would be at without the fast-forwarding.
|
||
Large interval sizes tend to mitigate this.)
|
||
</p></dd>
|
||
<dt>
|
||
<a name="opt.instr-count-only"></a><span class="term">
|
||
<code class="option">--instr-count-only [default: no] </code>
|
||
</span>
|
||
</dt>
|
||
<dd><p>
|
||
This option tells the tool to only display instruction count
|
||
totals, and to not generate the actual basic block vector file.
|
||
This is useful for debugging, and for gathering instruction count
|
||
info without generating the large basic block vector files.
|
||
</p></dd>
|
||
</dl>
|
||
</div>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.fileformat"></a>12.4. Basic Block Vector File Format</h2></div></div></div>
|
||
<p>
|
||
The Basic Block Vector is dumped at fixed intervals. This
|
||
is commonly done every 100 million instructions; the
|
||
<code class="option">--interval-size</code> option can be
|
||
used to change this.
|
||
</p>
|
||
<p>
|
||
The output file looks like this:
|
||
</p>
|
||
<pre class="programlisting">
|
||
T:45:1024 :189:99343
|
||
T:11:78573 :15:1353 :56:1
|
||
T:18:45 :12:135353 :56:78 314:4324263</pre>
|
||
<p>
|
||
Each new interval starts with a T. This is followed on the same line
|
||
by a series of basic block and frequency pairs, one for each
|
||
basic block that was entered during the interval. The format for
|
||
each block/frequency pair is a colon, followed by a number that
|
||
uniquely identifies the basic block, another colon, and then
|
||
the frequency (which is the number of times the block was entered,
|
||
multiplied by the number of instructions in the block). The
|
||
pairs are separated from each other by a space.
|
||
</p>
|
||
<p>
|
||
The frequency count is multiplied by the number of instructions that are
|
||
in the basic block, in order to weigh the count so that instructions in
|
||
small basic blocks aren't counted as more important than instructions
|
||
in large basic blocks.
|
||
</p>
|
||
<p>
|
||
The SimPoint program only processes lines that start with a "T". All
|
||
other lines are ignored. Traditionally comments are indicated by
|
||
starting a line with a "#" character. Some other BBV generation tools,
|
||
such as PinPoints, generate lines beginning with letters other than "T"
|
||
to indicate more information about the program being run. We do
|
||
not generate these, as the SimPoint utility ignores them.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.implementation"></a>12.5. Implementation</h2></div></div></div>
|
||
<p>
|
||
Valgrind provides all of the information necessary to create
|
||
BBV files. In the current implementation, all instructions
|
||
are instrumented. This is slower (by approximately a factor
|
||
of two) than a method that instruments at the basic block level,
|
||
but there are some complications (especially with rep prefix
|
||
detection) that make that method more difficult.
|
||
</p>
|
||
<p>
|
||
Valgrind actually provides instrumentation at a superblock level.
|
||
A superblock has one entry point but unlike basic blocks can
|
||
have multiple exit points. Once a branch occurs into the middle
|
||
of a block, it is split into a new basic block. Because
|
||
Valgrind cannot produce "true" basic blocks, the generated
|
||
BBV vectors will be different than those generated by other tools.
|
||
In practice this does not seem to affect the accuracy of the
|
||
SimPoint results. We do internally force the
|
||
<code class="option">--vex-guest-chase-thresh=0</code>
|
||
option to Valgrind which forces a more basic-block-like
|
||
behavior.
|
||
</p>
|
||
<p>
|
||
When a superblock is run for the first time, it is instrumented
|
||
with our BBV routine. A block info (bbInfo) structure is allocated
|
||
which holds the various information and statistics for the block.
|
||
A unique block ID is assigned to the block, and then the
|
||
structure is placed into an ordered set.
|
||
Then each native instruction in the block is instrumented to
|
||
call an instruction counting routine with a pointer to the block
|
||
info structure as an argument.
|
||
</p>
|
||
<p>
|
||
At run-time, our instruction counting routines are called once
|
||
per native instruction. The relevant block info structure is accessed
|
||
and the block count and total instruction count is updated.
|
||
If the total instruction count overflows the interval size
|
||
then we walk the ordered set, writing out the statistics for
|
||
any block that was accessed in the interval, then resetting the
|
||
block counters to zero.
|
||
</p>
|
||
<p>
|
||
On the x86 and amd64 architectures the counting code has extra
|
||
code to handle rep-prefixed string instructions. This is because
|
||
actual hardware counts a rep-prefixed instruction
|
||
as one instruction, while a naive Valgrind implementation
|
||
would count it as many (possibly hundreds, thousands or even millions)
|
||
of instructions. We handle rep-prefixed instructions specially,
|
||
in order to make the results match those obtained with hardware performance
|
||
counters.
|
||
</p>
|
||
<p>
|
||
BBV also counts the fldcw instruction. This instruction is used on
|
||
x86 machines in various ways; it is most commonly found when converting
|
||
floating point values into integers.
|
||
On Pentium 4 systems the retired instruction performance
|
||
counter counts this instruction as two instructions (all other
|
||
known processors only count it as one).
|
||
This can affect results when using SimPoint on Pentium 4 systems.
|
||
We provide the fldcw count so that users can evaluate whether it
|
||
will impact their results enough to avoid using Pentium 4 machines
|
||
for their experiments. It would be possible to add an option to
|
||
this tool that mimics the double-counting so that the generated BBV
|
||
files would be usable for experiments using hardware performance
|
||
counters on Pentium 4 systems.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.threadsupport"></a>12.6. Threaded Executable Support</h2></div></div></div>
|
||
<p>
|
||
BBV supports threaded programs. When a program has multiple threads,
|
||
an additional basic block vector file is created for each thread (each
|
||
additional file is the specified filename with the thread number
|
||
appended at the end).
|
||
</p>
|
||
<p>
|
||
There is no official method of using SimPoint with
|
||
threaded workloads. The most common method is to run
|
||
SimPoint on each thread's results independently, and use
|
||
some method of deterministic execution to try to match the
|
||
original workload. This should be possible with the current
|
||
BBV.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.validation"></a>12.7. Validation</h2></div></div></div>
|
||
<p>
|
||
BBV has been tested on x86, amd64, and ppc32 platforms.
|
||
An earlier version of BBV was tested in detail using
|
||
hardware performance counters, this work is described in a paper
|
||
from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation
|
||
to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
|
||
V.M. Weaver and S.A. McKee.
|
||
</p>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="bbv-manual.performance"></a>12.8. Performance</h2></div></div></div>
|
||
<p>
|
||
Using this program slows down execution by roughly a factor of 40
|
||
over native execution. This varies depending on the machine
|
||
used and the benchmark being run.
|
||
On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D
|
||
processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
|
||
</p>
|
||
</div>
|
||
</div>
|
||
<div>
|
||
<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
|
||
<tr>
|
||
<td rowspan="2" width="40%" align="left">
|
||
<a accesskey="p" href="sg-manual.html"><< 11. SGCheck: an experimental stack and global array overrun detector</a> </td>
|
||
<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
|
||
<td rowspan="2" width="40%" align="right"> <a accesskey="n" href="lk-manual.html">13. Lackey: an example tool >></a>
|
||
</td>
|
||
</tr>
|
||
<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|