/*
* File: INSTALL.txt
* CVS: $Id$
* Author: Kevin London
* Mods: Dan Terpstra
* Mods: Philip Mucci
* Mods: <your name here>
* <your email address>
*/
*****************************************************************************
HOW TO INSTALL PAPI ONTO YOUR SYSTEM
*****************************************************************************
On some of the systems that PAPI supports, you can install PAPI right
out of the box without any additional setup. Others require drivers or
patches to be installed first.
The general installation steps are below, but first find your particular
Operating System's section for any additional steps that may be necessary.
NOTE: the configure and make files are located in the papi/src directory.
General Installation
1. % ./configure
% make
2. Check for errors.
a) Run a simple test case: (This will run ctests/zero)
% make test
If you get good counts, you can optionally run all the test programs
with the included test harness. This will run the tests in quiet mode,
which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the
functionality being tested is not supported by that platform.
% make fulltest (This will run ./run_tests.sh)
To run the tests in verbose mode:
% ./run_tests.sh -v
3. Create a PAPI binary distribution or install PAPI directly.
a) To install PAPI libraries and header files from the build tree:
% make install
b) To install PAPI manual pages from the build tree:
% make install-man
c) To install PAPI test programs from the build tree:
% make install-tests
d) To install all of the above in one step from the build tree:
% make install-all
e) To create a binary kit, papi-<arch>.tgz:
% make dist
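The general steps above can be strung together in a small wrapper script. The following is a minimal sketch and not part of the PAPI distribution: the names run, papi_build and DRY_RUN are illustrative assumptions. It is meant to be run from the papi/src directory. With DRY_RUN=1 (the default here) each command is only printed, prefixed with "+", so the sequence can be reviewed before anything is built.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PAPI): run, or just print,
# the general installation steps in order.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop at the first step that fails.
papi_build() {
    run ./configure &&
    run make &&
    run make test &&     # runs ctests/zero
    run make install     # libraries and header files
}

papi_build
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands instead of printing them.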
*****************************************************************************
MORE ABOUT CONFIGURE OPTIONS
*****************************************************************************
There is an extensive array of options available from the configure
command-line. These can differ significantly from version to version of
PAPI. For complete details on the command-line options, use:
% ./configure --help
*****************************************************************************
DOCUMENTATION BY DOXYGEN
*****************************************************************************
PAPI now ships with documentation generated by doxygen.
Documentation for the public APIs can be created by running
doxygen from the doc directory.
More complete documentation of all internal APIs and structures can be
generated with:
% doxygen Doxyfile-html
Doxygen documentation for the currently released version of PAPI is also
available on the website.
*****************************************************************************
Operating System Specific Installation Steps (In Alphabetical Order by OS)
*****************************************************************************
AIX - IBM POWER5 and POWER6 and POWER7
*****************************************************************************
PAPI is supported on AIX 5.x for POWER5 and POWER6.
PAPI is also tested on AIX 6.1 for POWER7.
Use ./configure to select the desired make options for your system,
specifying the --with-bitmode=32 or --with-bitmode=64 to select wordlength.
32 bits is the default.
1. On AIX 5.x, the bos.pmapi is a product level fileset (part of the OS).
However, it is not installed by default. Consult your sysadmin to
make sure it is installed.
2. Follow the general instructions for installing PAPI.
WARNING: PAPI requires XLC version 6 or greater.
Your version can be determined by running 'lslpp -a -l | grep -i xlc'.
BG/P
*****************************************************************************
BG/P is a cross-compiled environment. The machine on which PAPI is compiled
is not the machine on which PAPI runs. To compile PAPI on BG/P, specify the
BG/P environment as shown below:
% ./configure --with-OS=bgp
% make
NOTE: ./configure might fail if the cross compiler is not in your path.
If that is the case, just add it to your path and everything should work:
% export PATH=$PATH:/bgsys/drivers/ppcfloor/gnu-linux/bin
By default this will make a subset of tests in the ctests directory and all
tests in the ftests directory.
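The PATH adjustment mentioned in the NOTE above can be made idempotent, so that repeating it (for example from a login script) does not add the same directory twice. This is a minimal sketch; the function name path_append and the variable CROSS_BIN are illustrative assumptions, and the directory is the one quoted in the NOTE.

```shell
# Location of the BG/P cross compiler, as quoted in the NOTE above.
CROSS_BIN=/bgsys/drivers/ppcfloor/gnu-linux/bin

# path_append DIR: append DIR to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$CROSS_BIN"
export PATH
```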
There is an additional C test program provided for the BG/P environment
that exercises the specific BG/P events and demonstrates how to
intermix the PAPI and BG/P UPC native calls. This test program is built with
the normal make sequence and can be found in the ctests/bgp directory.
The testing targets in the make file will not work in the BG/P environment.
Since BG/P supports multiple queuing systems, you must manually execute
individual programs in the ctests and ftests directories to check for successful
library creation. You can also manually edit the run_tests.sh script to
automate testing for your installation.
Most papi utilities work for BGP, including papi_avail, papi_native_avail, and
papi_command_line. Many ctests pass for BGP, but many others produce errors due
to the non-traditional architecture of BGP. In particular, PAPI_TOT_CYC always
seems to produce 0 counts, although papi_get_virt_usec and papi_get_real_usec
appear to work.
The IBM RedPaper: http://www.redbooks.ibm.com/abstracts/redp4256.html provides
further discussion about PAPI on BGP along with other performance issues.
BG/Q
*****************************************************************************
Five new components have been added to PAPI to support hardware performance
monitoring for the BG/Q platform; in particular the BG/Q network, the I/O system,
the Compute Node Kernel in addition to the processing core. There are no specific
component configure scripts for L2unit, IOunit, NWunit, CNKunit. In order to
configure PAPI for BG/Q, use the following configure options at the papi/src level:
% ./configure --prefix=< your_choice > \
--with-OS=bgq \
--with-bgpm_installdir=/bgsys/drivers/ppcfloor \
CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc \
F77=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gfortran \
--with-components="bgpm/L2unit bgpm/CNKunit bgpm/IOunit bgpm/NWunit"
CLE - Cray XT and XE Opteron
*****************************************************************************
The Cray XT/XE is a cross-compiled environment. You must specify the
perfmon version to configure as shown below.
Before running configure to create the makefile that supports a Cray XT/XE CLE
build of PAPI, execute the following module commands:
% module purge
% module load gcc
Note: do not load the programming environment module (e.g. PrgEnv-gnu)
but the compiler module (e.g. gcc) as shown above.
Check CLE compute nodes for the version of perfmon2 that it supports:
% aprun -b -a xt cat /sys/kernel/perfmon/version
and use this version when configuring PAPI for a perfmon2 substrate:
% configure CFLAGS="-D__crayxt" \
--with-perfmon=2.82 --prefix=<install-dir> \
--with-virtualtimer=times --with-tls=__thread \
--with-walltimer=cycle --with-ffsll --with-shared-lib=no \
--with-static-tools
Configure PAPI for a perf events substrate:
% configure CFLAGS="-D__crayxt" \
--with-perf-events --with-pe-incdir=<perf-events-hdr-dir> \
--with-assumed-kernel=2.6.34 --prefix=<install-dir> \
--with-virtualtimer=times --with-tls=__thread \
--with-walltimer=cycle --with-ffsll --with-shared-lib=no \
--with-static-tools
Invoke the make accordingly:
% make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n
% make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n install
The testing targets in the makefile will not work in the XT/XE CLE environment.
It is necessary to log into an interactive session and run the tests
manually through the job submission system. For example, instead of:
% make test
use:
% aprun -n1 ctests/zero
and instead of:
% make fulltest
use:
% ./run_cat_tests.sh
after substituting "aprun -n1" for "yod -sz 1" in run_cat_tests.sh.
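The launcher substitution described above can be done with sed rather than by hand. This is a minimal sketch: the function name swap_launcher and the output file name in the usage comment are illustrative assumptions. It writes a modified copy so the original run_cat_tests.sh is left untouched.

```shell
# swap_launcher IN OUT: copy script IN to OUT, replacing every
# "yod -sz 1" launcher with "aprun -n1".
swap_launcher() {
    sed 's/yod -sz 1/aprun -n1/g' "$1" > "$2"
}

# Example, run from the directory that holds run_cat_tests.sh
# (run_cat_tests_aprun.sh is a hypothetical name for the copy):
#   swap_launcher run_cat_tests.sh run_cat_tests_aprun.sh
#   sh ./run_cat_tests_aprun.sh
```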
FreeBSD - i386 & amd64
*****************************************************************************
PAPI requires FreeBSD 6 or higher to work.
The kernel needs some modifications to give PAPI access to the performance
monitoring counters. Add "options HWPMC_HOOKS" and "device hwpmc" to
the kernel configuration file. For i386 systems, also add "device apic".
(See hwpmc(4) for more information, and NOTE 1 below for the supported
hardware.)
After this step, recompile the kernel and boot it.
FreeBSD 7 (or greater) does not ship with a Fortran compiler. To compile the
Fortran tests you will need to install a Fortran compiler first (e.g.
from /usr/ports/lang/gcc42), and set the F77 environment
variable to the compiler you want to use (e.g. gfortran42).
Fortran compilers may issue errors due to "Integer too big for its kind *".
Add a compiler option to the FFLAGS environment variable to use int*8 by default
(in gfortran42 it is -fdefault-integer-8).
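The two Fortran settings described above could be applied in the shell before running configure. This is a minimal sketch that assumes gfortran42 is the compiler that was installed; substitute the compiler and flag for your own setup.

```shell
# Assumes gfortran42 has been installed (e.g. from /usr/ports/lang/gcc42).
F77=gfortran42

# Append the int*8 default to any Fortran flags that are already set.
FFLAGS="${FFLAGS:+$FFLAGS }-fdefault-integer-8"

export F77 FFLAGS
```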
Follow the "General Installation" steps.
NOTE 1:
--
HWPMC driver supports the following processors: Intel Pentium 2,
Intel Pentium Pro, Intel Pentium 3, Intel Pentium M, Intel Celeron,
Intel Pentium 4, AMD K7 (AMD Athlon) and AMD K8 (AMD Athlon64 / Opteron).
FreeBSD 8 also adds support for Core/Core2/Core-i[357]/Atom processors.
There is also a patch for FreeBSD 7/7.1 in http://wiki.freebsd.org/PmcTools
Linux - Xeon Phi [MIC, KNC, Knight's Corner]
*****************************************************************************
Full PAPI support of the MIC card requires MPSS Gold Update 2 or above and a
cross-compilation toolchain from Intel. The Intel C compiler is also
supported.
The compiler
-----------------------------------------------------------------------------
* Download one of the MPSS full source bundles at
[http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss]
* Untar the download.
* Extract gpl/package-cross-k1om.tar.bz2
Building PAPI - gcc cross compiler
-----------------------------------------------------------------------------
* Add usr/linux-k1om-4.7/bin or equivalent to your PATH so PAPI can find the
cross-build utils. (see above for instructions on acquiring the cross
compilation toolchain)
* You will need to invoke configure with options:
> ./configure --with-mic --host=x86_64-k1om-linux --with-arch=k1om
This sets up cross-compilation and sets options needed by PAPI.
* Run make to build the library.
Building PAPI - icc
-----------------------------------------------------------------------------
If icc is in your path,
> ./configure --with-mic
You may have to provide additional configuration options... try
> ./configure --with-mic --with-ffsll --with-walltimer=cycle --with-tls=__thread --with-virtualtimer=clock_thread_cputime_id
This builds a mic native version of the library.
Offload Code
------------
To use PAPI in MIC offload code, build a mic-native version of PAPI
as detailed above.
The PAPI utility programs can be run on the MIC using the
micnativeloadex tool provided by Intel. The MIC events may require
additional qualifiers to set the exclude_guest and exclude_host bits
to 0 (eventname:mg=1:mh=1). For example, get a list of events
available on the MIC by calling:
micnativeloadex ./utils/papi_native_avail
Then get an event count while setting the appropriate qualifiers
micnativeloadex ./utils/papi_command_line -a "CPU_CLK_UNHALTED:mg=1:mh=1"
To add offload code into your program, wrap the papi.h header as
follows:
#pragma offload_attribute (push,target(mic))
#include "papi.h"
#pragma offload_attribute (pop)
Make PAPI calls from offload code as normal.
Finally, add -offload-option,mic,ld,$(path_to_papi)/libpapi.a
to your compile command line. If that does not pick up the PAPI library, try
-offload-option,mic,compiler,"-lpapi -L<path/to dir containing libpapi.a>"
instead.
Linux - Itanium II & Montecito
*****************************************************************************
PAPI on Itanium Linux links to the perfmon library. The library version and
the Itanium version are automatically determined by configure.
If you wish to override the defaults, a number of pfm options are available
to configure. Use:
% ./configure --help
to learn more about these options.
Follow the general installation instructions to complete your installation.
PLATFORM NOTES:
The earprofile test fails under perfmon for Itanium II. It has been
reconfigured to work on the upcoming perfmon2 interface.
Linux - PPC64 (POWER5, POWER5+, POWER6 and PowerPC970)
****************************************************************************
Linux/PPC64 requires that the kernel be patched and recompiled with the
PerfCtr patch if the kernel is version 2.6.30 or older. The required patches
and complete installation instructions are provided in the
papi/src/perfctr-2.7.x directory. PPC64 is the ONLY platform that REQUIRES
use of PerfCtr 2.7.x.
*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*
WARNING: You should always use a PerfCtr distribution that has been distributed
with a version of PAPI or your build will fail. The reason for this is that
PAPI builds a shared library of the Perfctr runtime, on which libpapi.so
depends. PAPI also depends on the .a file, which it decomposes into component
object files and includes in the libpapi.a file for convenience. If you
install a new perfctr, even a shared library, YOU MUST REBUILD PAPI to get
a proper, working libpapi.a.
There are several options in configure to allow you to specify your perfctr
version and location. Use:
% ./configure --help
to learn more about these options.
Follow the general installation instructions to complete your installation.
Linux Perf Events ( with kernel 2.6.32 and newer )
*****************************************************************************
Performance counter support has been merged as the "Perf Events"
subsystem as of Linux 2.6.32. This means that PAPI can be built
without patching the kernel on new enough systems.
Perf Events support is new, and certain functionality does not work.
If you need any of the functionality listed below, we recommend
you install the PerfCtr patchset and use that in conjunction with PAPI.
+ PAPI requires at least Linux kernel 2.6.32, as the earlier 2.6.31
version had some significant API changes.
+ Kernels before 2.6.33 have extra overhead when determining
whether events conflict or not.
+ Counter multiplexing is handled by PAPI (rather than perf_events)
on kernels before 2.6.33 due to a bug in the kernel perf_events code.
+ Nehalem EX support requires kernel 2.6.34 or newer.
+ Pentium 4 support requires kernel 2.6.35 or newer.
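The kernel version thresholds listed above can be checked from the shell before building. This is a minimal sketch, not something PAPI provides: the function name version_ge is an illustrative assumption, it compares numeric dotted fields only, and it assumes that `uname -r` on your system starts with a dotted version number.

```shell
# version_ge A B: succeed if dotted version A is >= B (numeric fields only).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; x=${x:-0}       # leading field of A (0 if exhausted)
        y=${b%%.*}; y=${y:-0}       # leading field of B (0 if exhausted)
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Keep only the leading numeric part of the release string,
# dropping suffixes such as "-generic".
kernel=$(uname -r | sed 's/[^0-9.].*//')

if version_ge "$kernel" 2.6.32; then
    echo "kernel $kernel: new enough for Perf Events"
else
    echo "kernel $kernel: older than 2.6.32, PerfCtr patch required"
fi
```

The same function can be reused for the other thresholds, for example `version_ge "$kernel" 2.6.34` for Nehalem EX support.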
The PAPI configure script should auto-detect the availability of
Perf Events on new enough distributions (this mainly requires
that perf_event.h be available in /usr/include/linux)
On older distributions (even ones that include the 2.6.32 kernel)
the perf_event.h file might not be there. One fix is to install
your distribution's Linux kernel headers package, which is often
an optional package not installed by default.
If you cannot install the kernel headers, you can obtain the
perf_event.h file from your kernel and run configure as such:
./configure --with-pe-incdir=INCDIR
replacing INCDIR with the directory that perf_event.h is in.
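The header check described above can be scripted so that --with-pe-incdir is passed only when it is needed. This is a minimal sketch: the function name pe_configure_args and the fallback directory in the usage comment are illustrative assumptions. Only the --with-pe-incdir option itself comes from this document.

```shell
# pe_configure_args [SYS_INCDIR] [FALLBACK_INCDIR]
# Print the extra configure argument needed for Perf Events, if any.
pe_configure_args() {
    sys_inc=${1:-/usr/include}
    fallback=$2
    if [ -f "$sys_inc/linux/perf_event.h" ]; then
        # Header is in the standard place; configure should auto-detect it.
        echo ""
    elif [ -n "$fallback" ] && [ -f "$fallback/perf_event.h" ]; then
        echo "--with-pe-incdir=$fallback"
    else
        echo "perf_event.h not found; install the kernel headers package" >&2
        return 1
    fi
}

# Example ($HOME/kernel-headers is a hypothetical directory holding a
# copy of perf_event.h):
#   ./configure $(pe_configure_args /usr/include "$HOME/kernel-headers")
```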
Linux PerfCtr (requires patching the kernel)
*****************************************************************************
When using Linux kernels before 2.6.32 the kernel must be patched with
the PerfCtr patch set. (This patchset can also be used on more recent
kernels if the support provided by Perf Events is not enough for your
workload). The required patches and complete installation instructions
are provided in the papi/src/perfctr-x.y directory. Please see the INSTALL
file in that directory.
Do not forget, you also need to build your kernel with APIC support in order
for hardware overflow to work. This is very important for accurate statistical
profiling a la gprof via the hardware counters.
So, when you configure your kernel to build with PERFCTR as above, make
sure you turn on APIC support in the "Processor type and features" section.
This should be enabled by default if you are on an SMP, but it is disabled
by default on a UP.
In our 2.4.x kernels:
> grep PIC /usr/src/linux/.config
/usr/src/linux/.config:CONFIG_X86_GOOD_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_IOAPIC=y
/usr/src/linux/.config:CONFIG_X86_LOCAL_APIC=y
/usr/src/linux/.config:CONFIG_X86_IO_APIC=y
You can verify the APIC is working after rebooting with the new kernel
by running the 'perfex -i' command found in the perfctr/examples/perfex
directory.
PAPI on x86 assumes PerfCtr 2.6.x. NOTE: THE VERSIONS OF PERFCTR DO NOT
CORRESPOND TO LINUX KERNEL VERSIONS.
*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*
WARNING: You should always use a PerfCtr distribution that has been distributed
with a version of PAPI or your build may fail. Newer versions with backward
compatibility may also work. PAPI builds a shared library of the Perfctr
runtime, on which libpapi.so depends. PAPI also depends on the .a file,
which it decomposes into component object files and includes in the libpapi.a
file for convenience. If you install a new PerfCtr, even a shared library,
YOU MUST REBUILD PAPI to get a proper, working libpapi.a.
There are several options in configure to allow you to specify your perfctr
version and location. Use:
% ./configure --help
to learn more about these options.
Follow the general installation instructions to complete your installation.
*- IF PERFCTR IS INSTALLED BUT PAPI FAILS TO INITIALIZE -*
You may be running udev, which is not smart enough to know the permissions of
dynamically created devices. To fix this, find your udev/devices directory,
often /lib/udev/devices or /etc/udev/devices and perform the following actions:
mknod perfctr c 10 182
chmod 644 perfctr
On Ubuntu 6.06 (and probably other debian distros), add a line to
/etc/udev/rules.d/40-permissions.rules like this:
KERNEL=="perfctr", MODE="0666"
On SuSE, you may need to add something like the following to
/etc/udev/rules.d/50-udev-default.rules:
(SuSE does not have the 40-permissions.rules file.)
# cpu devices
KERNEL=="cpu[0-9]*", NAME="cpu/%n/cpuid"
KERNEL=="msr[0-9]*", NAME="cpu/%n/msr"
KERNEL=="microcode", NAME="cpu/microcode", MODE="0600"
KERNEL=="perfctr", NAME="perfctr", MODE="0644"
These lines tell udev to always create the device file with the appropriate permissions.
Use 'perfex -i' from the perfctr distribution to test this fix.
PLATFORM NOTES:
Opteron fails the matrix-hl test because the default definition of PAPI_FP_OPS
overcounts speculative floating point operations.
Solaris 8 - Ultrasparc
*****************************************************************************
The only requirement for Solaris is that you must be running version 2.8 or
newer. As long as that requirement is met, no additional steps are required
to install PAPI and you can follow the general installation guide.
Solaris 10 - UltraSPARC T2/Niagara 2
*****************************************************************************
PAPI supports the Niagara 2 on Solaris 10. The substrate offers support for
common basic operations like adding/reading/etc and the advanced features
multiplexing (see below), overflow handling and profiling. The implementation
for Solaris 10 is based on libcpc 2, which offers access to the underlying
performance counters. Performance counters for the UltraSPARC architecture
are described in the UltraSPARC architecture manual in general with detailed
descriptions in the actual processor manual. In case of this substrate the
documentation for performance counters can be found at:
- http://www.opensparc.net/publications/specifications/
In order to install PAPI on this platform make sure the packages SUNWcpc and
SUNWcpcu are installed. Sun Studio 12 was used for compilation while the
substrate was being developed. GNU GCC has not been tested and would require
modifying the makefiles Makefile.solaris-niagara2 (32 bit) and
Makefile.solaris-niagara2-64bit (64 bit).
The steps required for installation are as follows:
./configure --with-bitmode=[32|64] --prefix=/is/optional
If no --with-bitmode parameter is present a default of
32 bit is assumed.
If no --prefix is used, a default of /usr/local is assumed.
make
make install
If you want to link your application against your installation you should
make sure to include at least the following linker options:
-lpapi -lcpc
PLEASE NOTE: This is the first revision of Niagara 2/libcpc 2/Solaris 10
support and needs further testing! Contributions, especially for the preset
definitions, would be greatly appreciated.
MULTIPLEXING: As the Niagara 2 offers no native event to count the cycles
elapsed, a "synthetic event" was created offering access to the cycle count.
This event is not as accurate as the native events, nor should it be
used for anything other than the multiplexing mode, which needs the cycle
count in order to work. Therefore multiplexing and the preset PAPI_TOT_CYC
should be used only with caution. BEWARE OF WRONG COUNTER RESULTS!
Windows XP/2000/Server 2003 - Intel Pentium III or AMD Athlon / Opteron
*****************************************************************************
Please use PAPI 3.7 (http://icl.cs.utk.edu/projects/papi/downloads/papi-3.7.2.tar.gz)
The Windows source tree comes with Microsoft Visual Studio Version 8 projects
to build a graphical shell application, the PAPI library as a DLL, a kernel
driver to provide access to the counters, and a collection of C test programs.
The WinPMC driver must be installed with administrator privileges. See the
winpmc.html file in the papi/win2k/winpmc directory for details on building
and installing this driver.
The general installation instructions are irrelevant for Windows.
Other Platforms
*****************************************************************************
PAPI can be compiled and installed on most platforms that have GNU compilers
regardless of operating system or hardware. This includes, for example,
Macintosh systems running recent versions of OSX. However, PAPI can only
provide access to the CPU hardware counters on platforms that are directly
supported. Unsupported platforms will run, but only provide basic timing
functions, and potential access to some non-cpu components.
*****************************************************************************
CREATING AND RUNNING COMPONENTS
*****************************************************************************
Basic instructions on how to create a new component can be found in
src/components/README. The components directory contains several components
developed by the PAPI team along with a simple yet functional "example"
component which can be used as a guide to aid third-party developers.
Assuming components are developed according to the specified guidelines,
they will function within the PAPI framework without requiring any changes
to PAPI source code.
A separate directory for each component is in the papi/src/components/
directory; e.g. the NVIDIA cuda component is in papi/src/components/cuda.
Within each component directory is a README file which should be consulted.
Typically the component needs an environment variable to be exported; e.g.
the cuda component requires the PAPI_CUDA_ROOT environment variable be set
to the directory where cuda libraries can be found.
Some components require multiple environment variables. Additional
instructions and how to address special circumstances can be found in the
README files.
The components to be added to PAPI are specified during the configuration of
PAPI by adding the --with-components=<component list> command line option to
configure. For example, to add the acpi, lustre, and net components, the
option would be:
% ./configure --with-components="acpi lustre net"
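Because each component lives in its own directory under papi/src/components/, a component list can be checked before it is handed to configure. This is a minimal sketch: the function name components_arg is an illustrative assumption and not part of PAPI. It only checks that the directories exist and does not verify the environment variables a component may need.

```shell
# components_arg SRC_DIR NAME...
# Check that each named component has a directory under SRC_DIR/components,
# then print the matching --with-components option.
components_arg() {
    src=$1
    shift
    list=
    for c in "$@"; do
        if [ ! -d "$src/components/$c" ]; then
            echo "unknown component: $c" >&2
            return 1
        fi
        list="${list:+$list }$c"    # space-separated, as configure expects
    done
    printf '%s\n' "--with-components=$list"
}

# Example, run from papi/src (the quotes keep the list as one argument):
#   ./configure "$(components_arg . acpi lustre net)"
```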