This file documents changes in recent PAPI releases in reverse chronological
order.
For details on installing PAPI on your machine, consult the INSTALL.txt file
in this directory.
===============================================================================
PAPI 7.0.1 RELEASE NOTES 22 Feb 2023
===============================================================================
This is a minor release of PAPI. It introduces the following changes:
* Support for AMD Zen4 CPUs in libpfm4
* Support for ARM Neoverse V1 and V2 in libpfm4
* Fix a build error encountered when building the library with gcc 10 and later
* Resolve build warnings across different components
* Fix bug in the ROCm component when monitoring multiple GPUs in sampling mode
* Refactor ROCm component to simplify code and prepare it for rocmtools support
* Refactor ROCm SMI component and support XGMI events
===============================================================================
PAPI 7.0.0 RELEASE NOTES 14 Nov 2022
===============================================================================
This is a major release of PAPI, which offers several new components, including
"intel_gpu" with monitoring capabilities on Intel GPUs; "sysdetect" (along with
a new user API) for detecting details of the available hardware on a given
compute system; a significant revision of the "rocm" component for AMD GPUs;
the extension of the "cuda" component to enable performance monitoring on
NVIDIA's compute capabilities 7.0 and beyond. Furthermore, PAPI 7.0.0 ships
with a standalone "libsde" library and a new C++ API for software developers to
define software-defined events from within their applications.
For specific and detailed information on changes made for this release, see
ChangeLogP700.txt for filenames or keywords of interest and change summaries,
or go directly to the PAPI git repository.
Some Major Changes for PAPI 7.0.0 include:
* A new "intel_gpu" component with monitoring support for Intel GPUs,
including GPU hardware events and memory performance metrics (e.g.,
bytes read/written/transferred from/to L3). The PAPI "intel_gpu" component
offers two collection modes: (1) "Time-based Collection Mode," where metrics
can be read at any given time during the execution of kernels, and
(2) "Kernel-based Collection Mode," where performance counter data is
available once the kernel execution is finished.
* A new "sysdetect" component for detecting a machine's architectural details,
including the hardware's topology, specific aspects about the memory hierarchy,
number and type of GPUs and CPUs on a node, thread affinity to NUMA nodes and
GPU devices, etc. Additionally, PAPI offers a new API that enables users to get
"sysdetect" details from within their application.
* A major redesign of the "rocm" component for advanced monitoring features for
the latest AMD GPUs. The PAPI "rocm" component is now thread-safe and offers
two collection modes: "sampling" and "kernel intercept" mode.
* Support for NVIDIA compute capability 7.0 and greater. This implies support for
CUPTI's new Profiling and Perfworks APIs. The PAPI CUDA component has been
refactored to work equally for NVIDIA compute capabilities <7.0 and >= 7.0.
* A significant redesign of the "sde" component into two separate entities: (1)
a standalone library "libsde" with a new API for software developers to define
software-based metrics from within their applications, and (2) the PAPI "sde"
component that enables monitoring of these new software-based events.
* A new C++ interface for "libsde," which enables software developers to define
software-defined events from within their C++ applications.
* New Counter Analysis Toolkit (CAT) benchmarks and refinements of PAPI's CAT
data analysis, specifically, the extension of PAPI's CAT with MPI and
"distributed memory"-aware benchmarks and analysis to stress all cores per
node.
* Support for FUGAKU's A64FX Arm architecture, including monitoring capabilities
for memory bandwidth and other node-wide metrics.
===============================================================================
PAPI 6.0.0 RELEASE NOTES 29 Jan 2020
===============================================================================
PAPI 6.0 is now available. This release includes a new API for SDEs (Software
Defined Events), a major revision of the 'high-level API', and several new
components, including ROCM and ROCM_SMI (for AMD GPUs), powercap_ppc and
sensors_ppc (for IBM Power9 and later), SDE, and the IO component (exposes I/O
statistics exported by the Linux kernel). Furthermore, PAPI 6.0 ships CAT, a
new Counter Analysis Toolkit that assists with native performance counter
disambiguation through micro-benchmarks.
For specific and detailed information on changes made for this release, see
ChangeLogP600.txt for filenames or keywords of interest and change summaries,
or go directly to the PAPI git repository.
Major Changes
* Added the rocm component to support performance counters on AMD GPUs.
* Added the rocm_smi component (SMI is the System Management Interface) to
monitor power usage on AMD GPUs. Power settings are also writeable by the
user, e.g. to reduce power consumption on non-critical operations.
* Added 'io' component to expose I/O statistics exported by the Linux kernel
(/proc/self/io).
* Added 'SDE' component, Software Defined Events, which allows HPC software
layers to expose internal performance-critical behavior via Software Defined
Events (SDEs) through the PAPI interface.
* Added 'SDE API' to register performance-critical events that originate from
HPC software layers, and which are recognized as 'PAPI counters' and, thus,
can be monitored with the standard PAPI interface.
* Added powercap_ppc component to support monitoring and capping of power usage
on IBM PowerPC architectures (Power9 and later) using the powercap interface
exposed through the Linux kernel.
* Added 'sensors_ppc' component to support monitoring of system metrics on IBM
PowerPC architectures (Power9 and later) using the opal/exports sysfs
interface.
* Retired the infiniband_umad component; it is superseded by infiniband.
* Revived PAPI's 'high-level API' to make it more intuitive and effective for
novice users and quick event reporting.
* Added 'counter_analysis_toolkit' sub-directory (CAT): A tool to assist with
native performance counter disambiguation through micro-benchmarks, which are
used to probe different important aspects of modern CPUs, to aid the
classification of native performance events.
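As an illustration of the raw data behind the 'io' component listed above, the
following is a minimal sketch that reads /proc/self/io directly, without going
through PAPI. The function names io_field and read_self_io are hypothetical
helpers introduced for this example, and the field names (rchar, wchar,
read_bytes, ...) are those the Linux kernel reports; this is not the
component's actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find "key: value" in text formatted like /proc/self/io and store the value.
 * Returns 0 on success, -1 if the key is missing or its value is malformed. */
int io_field(const char *text, const char *key, unsigned long long *out)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    while (*line != '\0') {
        const char *nl;
        /* Match only at the start of a line, followed directly by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
        nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return -1;
}

/* Read /proc/self/io for the calling process and extract one field.
 * Returns -1 if the file cannot be opened (e.g. on non-Linux systems). */
int read_self_io(const char *key, unsigned long long *out)
{
    char buf[1024];
    size_t n;
    FILE *f = fopen("/proc/self/io", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return io_field(buf, key, out);
}
```

Calling read_self_io("read_bytes", &v) before and after a code region and
subtracting the two values gives the bytes read in that region, which is the
kind of measurement the component makes available through PAPI events.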
Other Changes
* Standardized our environment variables and implemented a simplified,
unified approach for specifying libraries necessary for components, with
overrides possible for special circumstances. Eliminated component level
'configure' requirements.
* Corrected TLS issues (Thread Local Storage) and race conditions.
* Several bug fixes, documentation fixes and enhancements, improvements to
README files for user instruction and code comments.
Acknowledgements: This release is the result of efforts from many people. The
PAPI team would like to express special Thanks to Vince Weaver, Stephane
Eranian (for libpfm4), William Cohen, Steve Kaufmann, Phil Mucci, Kevin Huck,
Yunqiang Su, Carl Love, Andreas Beckmann, Al Grant and Evgeny Shcherbakov.
The PAPI release can be downloaded from http://icl.cs.utk.edu/papi/software.
===============================================================================
PAPI 5.7.0 RELEASE NOTES 4 Mar 2019
===============================================================================
PAPI 5.7 is now available. This release includes a new component, called "pcp",
which interfaces to the Performance Co-Pilot (PCP). It enables PAPI users to
monitor IBM POWER9 hardware performance events, particularly shared “NEST”
events without root access.
This release also upgrades the (to date read-only) PAPI “nvml” component with
write access to the information and controls exposed via the NVIDIA Management
Library. The PAPI “nvml” component now supports both measuring and capping
power usage on recent NVIDIA GPU architectures (e.g. V100).
We have added power monitoring as well as PMU support for recent Intel
architectures such as Cascade Lake, Kaby Lake, Skylake, and Knights Mill (KNM).
Furthermore, measuring power usage for AMD Fam17h chips is now available via
the “rapl” component.
For specific and detailed information on changes made for this release, see
ChangeLogP570.txt for filenames or keywords of interest and change summaries,
or go directly to the PAPI git repository.
Major Changes
* Added the component PCP (Performance Co-Pilot, IBM) which allows access to
PCP events via the PAPI interface.
* Added support for IBM POWER9 processors.
* Added power monitoring support for AMD Fam17h architectures via RAPL.
* Added power capping support for NVIDIA GPUs.
* Added benchmarks and testing for the “nvml” component, which allows
power-management (reporting and setting) for NVIDIA GPUs.
* Re-implementation of the “cuda” component to better handle GPU events,
metrics (values computed from multiple events), and NVLink events, each of
which has different handling requirements and may require separate read
groupings.
* Enhanced NVLink support, and added additional tests and example code for
NVLink (high-speed GPU interconnect).
* Extension of test suite with more advanced testing: attach_cpu_sys_validate,
attach_cpu_validate, event_destroy test, openmp.F test, attach_validate test
(rdpmc issue).
Other Changes
* ARM64 configuration now works with newer Linux kernels (>=3.19).
* As part of the “cuda” component, expanded CUPTI-only tests to distinguish
between PAPI or non-PAPI issues with NVIDIA events and metrics.
* Many memory leaks have been corrected, though not all: some third-party
library codes still exhibit memory leaks.
* Better reporting and error handling of bugs. Changes to “infiniband_umad”
name reporting to distinguish it from the “infiniband” component.
* Cleaning up of the source code, added documentation and test/utility files.
Acknowledgements: This release is the result of efforts from many people. The
PAPI team would like to express special Thanks to Vince Weaver, Stephane
Eranian (for libpfm4), William Cohen, Steve Kaufmann, Phil Mucci, and
Konstantin Stefanov.
The PAPI release can be downloaded from http://icl.cs.utk.edu/papi/software.
===============================================================================
PAPI 5.6.0 RELEASE NOTES 19 Dec 2017
===============================================================================
PAPI 5.6.0 contains a major cleanup of the source code and the build
system to have consistent code structure, eliminate errors, and reduce
redundancies. A number of validation tests have been added to PAPI to
verify the PAPI preset events. Improvements and changes to multiple
PAPI components have been made, varying from supporting new events to
fixes in the component testing.
For specific and detailed information on changes made in this release,
see ChangeLogP560.txt for keywords of interest or go directly to the
PAPI git repository.
Major changes
* Validation tests: A substantial effort to add validation tests to
PAPI to check and detect problems in the definition of PAPI preset
events.
* Event testing: Thorough cleanup of code in the C and Fortran testing
to add processor support, clean up output, and make the testing
behavior consistent.
* CUDA component: Updated and rewritten to support CUPTI Metric API
(combinations of basic events). This component now supports NVLink
information through the Metric API. Updated testing for the
component.
* NVML component: Updated to support power management limits and
improved event names. Minor other bug fixes.
* RAPL component: Added support for: Intel Atom models Goldmont /
Gemini_Lake / Denverton, Skylake-X / Kabylake
* PAPI preset events: Many updates to the PAPI preset event mappings;
Skylake X support, initial AMD fam17h, fix AMD fam16h, added more
Power8 events, initial Power9 events.
Other changes
* Updating man and help pages for papi_avail and papi_native_avail.
* Powercap component: Added test for setting power caps via PAPI
powercap component.
* Infiniband component: Bugfix for infiniband_umad component.
* Uncore component: Updated to support recent processors.
* Lmsensors component updated to support correct runtime linking,
better event names, and a number of bug fixes.
* Updated and fixed timer support for multiple architectures.
* All components: Cleanup and standardize testing behavior in the
components.
* Build system: Much needed cleanup of configure and make scripts.
* Support for C++ was enhanced.
* Enabling optional support for reading events using perfevent-rdpmc
on recent Linux kernels can speed up PAPI_read() by a factor of 5.
* Pthread testing limited to avoid excessive CPU consumption on highly
parallel machines.
Acknowledgements: This release is the result of efforts from many
people, with special Thanks to Vince Weaver, Phil Mucci, Steve
Kaufmann, William Cohen, Will Schmidt, and Stephane Eranian (for
libpfm4) from the internal PAPI team.
===============================================================================
PAPI 5.5.1 RELEASE NOTES 18 Nov 2016
===============================================================================
PAPI 5.5.1 is now available. This is a point release intended
primarily to add support for uncore performance monitoring events on
Intel Xeon Phi Knights Landing (KNL). Other minor bugfixes have also
been made.
For specific and detailed information on changes made in this release,
see ChangeLogP551.txt for keywords of interest or go directly to the
PAPI git repository.
New Platforms:
* Added Knights Landing (KNL) uncore event support via libpfm4.
Bug Fixes:
* Fix some possible string termination problems.
* Cleanup lustre and mx components.
* Enable RAPL for Broadwell-EP.
===============================================================================
PAPI 5.5.0 RELEASE NOTES 14 Sep 2016
===============================================================================
PAPI 5.5 is now available. This release provides a new component that
offers read and write access to the information and controls exposed
via the Linux powercap interface. The PAPI powercap component supports
measuring and capping power usage on recent Intel architectures.
We have added core support for Knights Landing (uncore support will be
released later) as well as power monitoring via the RAPL and powercap
components.
For specific and detailed information on changes made in this release,
see ChangeLogP550.txt for keywords of interest or go directly to the
PAPI git repo.
New Platforms:
* Added Knights Landing (KNL) core events and preset events.
* Added Intel Broadwell/Skylake/Knights Landing RAPL support
* Updated PAPI preset event support for Intel Broadwell/Skylake
New Component:
* Powercap component: PAPI now supports the Linux Power Capping
Framework which exposes power capping devices and power measurement
to user space via a sysfs virtual file system interface.
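To make the sysfs interface mentioned above concrete, here is a minimal sketch
that reads a powercap energy counter directly and turns two readings into an
average power figure. The helper names read_sysfs_u64 and average_watts are
hypothetical, and the path /sys/class/powercap/intel-rapl:0/energy_uj is an
assumption about a typical Intel system; zone names vary by machine, and this
is not the PAPI component's code.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned integer from a sysfs-style file, for example
 * /sys/class/powercap/intel-rapl:0/energy_uj (assumed path).
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_sysfs_u64(const char *path, unsigned long long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Average power in watts between two energy readings given in microjoules.
 * Assumes the counter did not wrap between the readings. */
double average_watts(unsigned long long e0_uj, unsigned long long e1_uj,
                     double seconds)
{
    assert(seconds > 0.0 && e1_uj >= e0_uj);
    return (double)(e1_uj - e0_uj) / 1e6 / seconds;
}
```

Reading the counter usually requires appropriate file permissions, so callers
should treat a -1 return as "not available" rather than as an error in the
measured program.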
Enhancements:
* Add support for multiple flavors of POWER8 processors.
* Force all processors to check event schedulability by checking that
PAPI can successfully read the counters.
* Support for Intel Broadwell-EP, Skylake, Goldmont, Haswell-EP
inherited from libpfm4.
* Shared memory object (.so) naming is made more limited so that minor
updates do not break ABI compatibility.
Bug Fixes:
* Improve testlib error messages if a component fails to initialize.
* Fix _papi_hwi_postfix_calc parsing and robustness.
* Clean build rules for CUDA sampling subcomponent.
* Correct IBM Power7 and Power8 computation of PAPI_L1_DCA.
* Eliminate the sole use of ftests_skip subroutine.
* Correct the event string names for tenth.c.
* Have Fortran test support code report errors more clearly.
* Cleanup output from libmsr component.
* PAPI internal functions were marked as static to avoid exposing them
externally.
* Multiple components were fixed to make internal functions static
where possible, to avoid exposing the functions as externally
accessible entry points.
* CUDA component configuration bug fixed.
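For readers unfamiliar with the postfix formulas that _papi_hwi_postfix_calc
(mentioned in the list above) evaluates for derived events, the following is a
simplified, standalone evaluator in the same spirit. The token format used here
(operands N0, N1, ..., numeric constants, and the operators + - * /, each
followed by '|') is an assumption made for this sketch, and postfix_eval is a
hypothetical function, not PAPI's internal implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 32

/* Evaluate a postfix formula such as "N0|N1|+|" over counter values.
 * Returns 0 on success, -1 on malformed input or division by zero. */
int postfix_eval(const char *ops, const long long *counters, int ncounters,
                 double *result)
{
    double stack[STACK_MAX];
    int top = 0;
    const char *p = ops;

    assert(ops != NULL && counters != NULL && result != NULL);
    while (*p != '\0') {
        char *end;
        if (*p == '|') {                       /* token separator */
            p++;
        } else if (*p == 'N' && isdigit((unsigned char)p[1])) {
            long idx = strtol(p + 1, &end, 10); /* operand: counter index */
            if (idx >= ncounters || top >= STACK_MAX)
                return -1;
            stack[top++] = (double)counters[idx];
            p = end;
        } else if (isdigit((unsigned char)*p)) {
            double v = strtod(p, &end);         /* operand: constant */
            if (top >= STACK_MAX)
                return -1;
            stack[top++] = v;
            p = end;
        } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
            double a, b;
            if (top < 2)                        /* operator needs 2 operands */
                return -1;
            b = stack[--top];
            a = stack[--top];
            if (*p == '+') a += b;
            else if (*p == '-') a -= b;
            else if (*p == '*') a *= b;
            else { if (b == 0.0) return -1; a /= b; }
            stack[top++] = a;
            p++;
        } else {
            return -1;                          /* unknown token */
        }
    }
    if (top != 1)                               /* leftover operands */
        return -1;
    *result = stack[0];
    return 0;
}
```

The explicit checks for stack underflow, stack overflow, out-of-range counter
indices, and unknown tokens are the kind of robustness concerns a parser of
this sort has to handle.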
===============================================================================
PAPI 5.4.3 RELEASE NOTES 26 Jan 2016
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP543.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
New Implementations:
-------------------
* libmsr component: Using LLNL's libmsr library to access Intel
RAPL (Running Average Power Limit) library adds power capping
abilities to PAPI.
* CUDA PC sampling: A new standalone CUDA sampling tool
(papi_cuda_sampling) has been added to the CUDA component
(components/cuda/sampling/) and can be used as a preloader to
perform PC sampling on Nvidia GPUs which support the CUPTI
sampling interface (e.g. Maxwell).
* ARM Cortex A53 support: Event definitions added.
Enhancements:
------------
* Added Haswell-EP uncore support
* Initial Broadwell, Skylake support
* Added a general CUDA example (components/cuda/test) that uses
LD_PRELOAD to attach to a running CUcontext.
* Added "-check" flag to papi_avail and papi_native_avail to
test counter availability/validity.
Bug Fixes:
----------
* Clean output from papi_avail tool when there are no user defined events.
* Support PAPI_GRN_SYS granularity for perf component.
* Bug fix for infiniband_umad component.
* Bug fix for vmware component.
* Bug fix for NVML component.
* Fixed RAPL component so it reports unsupported inside a guest VM.
* Cleanup ARM CPU detection.
* Bug fix for PAPI_overflow issue for multiple eventsets.
* Increased PERF_EVENT_MAX_MPX_COUNTERS to 192 from 128.
* Fixed memory leak in papi_preset.c.
* Free allocated memory in the stealtime component.
===============================================================================
PAPI 5.4.1 RELEASE NOTES 02 Mar 2015
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP541.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
The PAPI CUDA component is updated to support CUDA 6.5 with multiple GPUs.
New Platforms:
-------------
* Updated support for Intel Haswell and Haswell-EP
* Added ARM Cortex A7
* Added ARM 1176 cpu (original Raspberry Pi)
Enhancements:
------------
* Enhance PAPI preset events to allow user defined events.
* User defined events are set up via a user event definition file.
* CUDA component is updated to support multiple devices and contexts.
* Tested under and supports CUDA 6.5.
* Note: Events for different CUDA context MUST be added from within the context.
* New test demonstrating attaching an eventset to a single CPU rather than a thread.
* Use the term "event qualifiers" instead of "event masks" to clarify understanding.
* Added pkg-config support to PAPI.
Bug Fixes:
----------
* Fixed lustre segfault bug.
* Fixed compilation in the absence of a Fortran compiler.
* Fixed bug in krental_pthreads ctest to join threads properly on exit.
* Fixed bug in perf_events where event masks were not getting cleared properly.
* Fixed memory leak bug in perf_events.
===============================================================================
PAPI 5.4.0 RELEASE NOTES 13 Nov 2014
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP540.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
Full support for CUDA 6.5 has been delayed and will be included in the next
release.
New Platforms:
-------------
* EMON power component for IBM Blue Gene/Q
* Support for the Applied Micro X-Gene processor
* Support for IBM POWER non-virtualized platform
* RAPL support for Intel Haswell models (60,69,71)
Enhancements:
------------
* Added list of supported PMU names (core/uncore components)
* Support for extended event masks (core/uncore components)
* Extension of the RAPL energy measurements on Intel via msr-safe
* Updated IBM POWER7, POWER8 presets
* 'papi_native_avail --validate' supports events that require
multiple masks to be valid
Bug Fixes:
----------
* HW counter and event count added/fixed for BGPM components
* Reduce cost of using PAPI_name_to_code
* Non-null terminated strings fixed
* Growing list of native events in core/uncore components fixed
* Cleaned up Intel IvyBridge presets
* Addressed Coverity reported issues
===============================================================================
PAPI 5.3.2 RELEASE NOTES 30 Jun 2014
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP532.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
An internal 5.3.1 release was skipped; changes since 5.3 are detailed below.
New Platforms:
-------------
* Intel Silvermont
* ARM Qualcomm Krait
Enhancements:
------------
* Rapl component support for Intel Haswell-EP
* Add units to NVML component
* Refine the definition of a Flop on the *-Bridge Intel chips.
* Updated Intel Haswell presets
Bug Fixes:
----------
* FreeBSD build and component fixes
* Uncore enumeration
* Printf format specifiers standardized (use # for hex)
===============================================================================
PAPI 5.3.0 RELEASE NOTES 18 Nov 2013
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP530.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
New Platforms:
-------------
* Intel Xeon Phi ( for offload code )
Enhancements:
------------
* RAPL component better deals with counter wrap
* Floating point support added for Intel IvyBridge
* PAPI_L1_ICM event added for Intel Haswell
* AMD Fam15h gets Core select umasks
* CUDA component now sets the number of native events supported
* Installed tests' code can now be built.
* host-micpower utility
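The counter-wrap item above concerns the fact that Intel RAPL energy status
counters are 32 bits wide and roll over during long measurements. The sketch
below shows one common way to compute a wrap-safe difference and convert it to
joules. The function names are hypothetical, and the unit convention (one count
equals 1/2^ESU joules, with ESU taken from the RAPL power unit register) follows
Intel's documentation; this is an illustration, not the component's code.

```c
#include <assert.h>
#include <stdint.h>

/* Counter increments between two 32-bit readings. Unsigned subtraction is
 * performed modulo 2^32, so the result is correct across a single wrap. */
uint32_t rapl_raw_delta(uint32_t prev, uint32_t cur)
{
    return (uint32_t)(cur - prev);
}

/* Convert raw counts to joules. esu is the energy status unit exponent:
 * one count corresponds to 1 / 2^esu joules. */
double rapl_joules(uint32_t raw, unsigned esu)
{
    assert(esu < 32);
    return (double)raw / (double)(1u << esu);
}
```

This handles at most one wrap between readings, so the counter must be sampled
more often than its wrap period at the expected power draw.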
Bug Fixes:
----------
* command_line utility event skipping bug
* remove extraneous -openmp flag from icc builds
* Default to building all ctests, clean up much bit rot
===============================================================================
PAPI 5.2.0 RELEASE NOTES 06 Aug 2013
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP520.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
This release represents a major overhaul of several components. Support for
Intel Haswell and Power 8 has been added. Processor support code has been
moved to the components directory.
New Platform:
-------------
* Intel Haswell (initial support)
* Power 8 (initial support)
New Components:
---------------
* Host-side MIC power component
Enhancements:
------------
* Component tests are now included with install-tests make target.
* Components with external library dependencies load them at runtime
allowing better distribution (infiniband, cuda, vmware, nvml and
host-side micpower)
* Perf_events, perfctr[_ppc] and perfmon2[_ia64] have been moved under the
components directory
* (Intel) Uncore support has been split into its own component
* Lustre component better handles large numbers of filesystems
===============================================================================
PAPI 5.1.1 RELEASE NOTES 21 May 2013
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP511.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
This is a bug fix release.
New Platform:
-------------
* Intel IvyBridge-EP
Bug Fixes:
----------
* Many perf_event fixes
* Cuda component fixes
* IA64 and SPARC build fixes
Enhancements:
------------
* Better logic in run_tests.sh script
* ARM builds now use pthread_mutexes
* BG/Q overflow enhancements
===============================================================================
PAPI 5.1.0 RELEASE NOTES 11 Jan 2013
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP510.txt for keywords of interest or go directly to the PAPI git
repo.
New Platform:
-------------
* Intel Xeon Phi ( Knight's Corner or KNC or MIC )
Bug Fixes:
----------
* Various build system fixes.
* NVML component fix.
* Work around a sampling bug on Power Linux
Enhancements:
------------
* ARM Cortex A15 support.
* New API entry, PAPI_get_eventset_component
* Add options to papi_command_line to print in hex and unsigned formats
New Components:
---------------
* MIC Power component.
===============================================================================
PAPI 5.0.1 RELEASE NOTES 20 Sep 2012
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP501.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
This is a bug fix release of PAPI. Because it includes a major bug fix in the
preset code, we recommend that all users of PAPI 5.0 upgrade; see commit
866bd51c for a detailed discussion.
Bug Fixes:
----------
* Debugging macros without variadic macro support.
* Building PAPI with an external libpfm4 installation.
* Fix a major bug in the preset code.
Enhancements:
-------------
* CUDA configure script better supports Kepler architecture.
* rapl support for IvyBridge.
* Libpfm4 updates for SandyBridge-EP counters.
===============================================================================
PAPI 5.0.0 RELEASE NOTES 23 Aug 2012
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP500.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
This is a major release of PAPI. Parts of both the internal component and
external low-level interfaces have changed; this will break your 4.4-compliant
components. Numerous bug fixes are also included in this release.
New Platforms:
-------------
* Intel IvyBridge
* Intel Atom Cedarview
New / Improved Components:
---------------
* nVidia Management library component - support for various system health
and power measurements on supported nVidia gpus.
* stealtime - When running in a VM, this provides information on how much
time was "stolen" by the hypervisor due to the VM being disabled.
This is currently only supported on KVM.
* RAPL - a SandyBridge RAPL (Running Average Power Limit) Component
providing for energy measurement at the package-level.
* VMware component for VMware pseudo-counters
* appio - This application I/O component enables PAPI-C to determine
I/O used by the application.
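To show where "stolen" time, as reported by the stealtime component above,
can be observed on Linux, here is a minimal sketch that extracts the steal
field from the aggregate "cpu" line of /proc/stat. The function name
parse_steal_ticks is hypothetical, the field order (user, nice, system, idle,
iowait, irq, softirq, steal) is the one documented for Linux, and the value is
in clock ticks; this is an illustration rather than the component's code.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the aggregate "cpu" line at the start of /proc/stat-formatted text
 * and store the steal field (the 8th value) in *steal, in clock ticks.
 * Returns 0 on success, -1 if fewer than eight fields are present. */
int parse_steal_ticks(const char *stat_text, unsigned long long *steal)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;
    int n;

    assert(stat_text != NULL && steal != NULL);
    n = sscanf(stat_text, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq, steal);
    return n == 8 ? 0 : -1;
}
```

The text passed in is expected to begin with the aggregate line; per-CPU lines
such as "cpu0 ..." need separate handling. Sampling the value twice and
subtracting gives the ticks stolen by the hypervisor over the interval.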
Bug Fixes:
----------
* Numerous memory leaks, thread races, and compiler warnings corrected.
Enhancements:
-------------
* Major overhaul of the component interface.
* Update perf_event.c rdpmc support
* Minor uncore fixes plus changes for rdpmc.
* Add a PAPI_REF_CYC preset event, defined as UNHALTED_REFERENCE_CYCLES for
all Intel platforms on which this native event is supported.
* Component names are now standardized in a meaningful way.
* Multiplexing under perf_events has been improved.
* FreeBSD cleanup/updates
* appio component now intercepts recv()
* Power7 definition of L1_DCA and LST_INS updated to a countable definition
* Added BGPM's opcode and generic event functionality to PAPI for BG/Q
(requires Q32 driver V1R1M2).
Open Issues:
-------------
* SandyBridge PAPI_FP_* events only produce reasonable results when counted
by themselves.
* Ivy Bridge does not support floating point events.
Experimental:
-------------
Known Bugs:
-----------
* Software multiplexing is known to have a memory leak.
* The byte-profile test is known to fail on Power7/AIX
Deprecated:
---------------------
* Java PAPI wrappers
* Windows
===============================================================================
PAPI 4.4.0 RELEASE NOTES 17 Apr 2012
===============================================================================
For specific and detailed information on changes made in this release, grep
ChangeLogP440.txt for keywords of interest or go directly to the PAPI git
repo.
GENERAL NOTES
===============================================================================
This is a major release of PAPI-C. Support for IBM Blue Gene/Q has been added.
Multiple bug fixes are also included in this release.
This is also the first release of PAPI made from the git repository:
git clone http://icl.cs.utk.edu/git/papi.git
Visit the PAPI Reference pages for more information at:
http://icl.cs.utk.edu/projects/papi/wiki/Main_Page
And visit the PAPI website for the latest updates:
http://icl.cs.utk.edu/papi/
RECENT CHANGES IN PAPI 4.4.0
===============================================================================
New Platforms:
-------------
* src/Rules.bgpm... Added PAPI support for Blue Gene/Q.
Bug Fixes:
----------
* Fix buffer overrun in lmsensors component
* libpfm4: Update to current git libpfm4 snapshot
* Fix broken Pentium 4 Prescott support; we were missing the netburst_p
declaration in papi_events.csv
* Fix various locking issues in the threaded code.
* Fix multiplexing of large eventsets on perf_events systems.
The problem appeared when using more than 31 multiplexed events on perf_events.
Enhancements:
-------------
* Update the release machinery for git.
===============================================================================
PAPI 4.2.1 RELEASE NOTES 13 Feb 2012
===============================================================================
For specific and detailed information on changes made in this release, grep
the ChangeLogP421.txt file for keywords of interest or go directly to the PAPI
cvs tree.
GENERAL NOTES
===============================================================================
This is a minor release of PAPI-C. It does not break binary or semantic
compatibility with previous versions.
Visit the PAPI Reference pages for more information at:
http://icl.cs.utk.edu/projects/papi/wiki/Main_Page
And visit the PAPI website for the latest updates:
http://icl.cs.utk.edu/papi/
RECENT CHANGES IN PAPI 4.2.1
===============================================================================
Bug Fixes:
----------
* solaris substrate set_domain call was added.
* multiplexing math errors were fixed in perf_events.c
* more multiplexing read path errors were identified and fixed
* src/linux-timer.c: Fix compilation warning if you specify
--with-walltime=gettimeofday
* src/linux-timer.c: Fix the build on Linux systems using mmtimer
* src/linux-common.c: Update the linux MHz detection code to use
bogoMIPS when there is no MHz field available in /proc/cpuinfo.
* src/: configure, configure.in: Fix a typo in the perfctr section;
it was causing a machine to default to perfctr when it had no
performance interface (a CentOS VM image with a 2.6.18 kernel).
Also check that we actually have perfctr if we specify
--with-perfctr.
* Fix SMP ARM issues reported by Harald Servat.
Also, adds proper header dependency checking in the Rules files.
* src/ctests/api.c: Make the api test actually test PAPI_flops() as
it claims to do, rather than PAPI_flips().
* src/papi_events.csv: Update the coreduo (not core2) events. Most
notably the FP events were wrong.
* src/papi_events.csv: Modify Intel Sandybridge PAPI_FP_OPS and
PAPI_FP_INS events to not count x87 fp instructions.
The problem is that the current predefines were made by adding 5
events. With the NMI watchdog stealing an event and/or
hyperthreading reducing the number of available counters by half,
we just couldn't fit.
This now raises the potential for people using x87-compiled
floating point on Sandybridge and getting 0 FP_OPS. This is only
likely if running a 32-bit kernel and *not* compiling your code
with -msse.
A long-term solution might be trying to find a better set of FP
predefines for sandybridge.
* src/components/lmsensors/: Rules.lmsensors, configure.in: Fixed
configure error message and rules link error for shared object
linking. Thanks Will Cohen.
* src/components/lmsensors/linux-lmsensors.h: Added missing string
header
* src/components/net/tests/: net_values_by_code.c,
net_values_by_name.c: Apply patch suggested by Will Cohen to
check for system return values.
* src/Makefile.inc: Patch to clean up dependencies, allowing for
parallel makes. Patch from Will Cohen of Red Hat.
* src/: papi_internal.c, threads.c: Fix two race
conditions that are probably the cause of the pthrtough
double-free error.
When freeing a thread, we remove and free all eventsets belonging
to that thread. This could race with the thread itself removing
the eventset, causing some ESI fields to be freed twice.
The problem was found by using the Valgrind 3.8 Helgrind tool:
valgrind --tool=helgrind --free-is-write=yes ctests/pthrtough
In order for Helgrind to work, I had to temporarily modify PAPI
to use POSIX pthread mutexes for locking.
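The src/linux-common.c item above (falling back to bogoMIPS when
/proc/cpuinfo has no MHz field) can be sketched roughly as follows. This is
an illustration only: find_field and estimate_mhz are hypothetical names, the
field spellings checked are assumptions about typical cpuinfo output, and the
actual PAPI code may scale or select the value differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Search cpuinfo-style text for a line starting with key and parse the
 * number after the colon. Returns 0 on success, -1 if absent or malformed.
 */
static int find_field(const char *text, const char *key, double *out)
{
    const char *p = text;
    size_t klen = strlen(key);

    while (p != NULL && *p != '\0') {
        if (strncmp(p, key, klen) == 0) {
            const char *colon = strchr(p, ':');
            const char *eol = strchr(p, '\n');
            if (colon != NULL && (eol == NULL || colon < eol) &&
                sscanf(colon + 1, "%lf", out) == 1)
                return 0;
        }
        p = strchr(p, '\n');          /* advance to the next line */
        if (p != NULL)
            p++;
    }
    return -1;
}

/*
 * Prefer the "cpu MHz" field; fall back to the bogoMIPS field when no MHz
 * field exists. bogoMIPS is only a rough stand-in for the clock rate.
 * Returns -1.0 if neither field is found.
 */
double estimate_mhz(const char *cpuinfo)
{
    double v;

    assert(cpuinfo != NULL);
    if (find_field(cpuinfo, "cpu MHz", &v) == 0)
        return v;
    if (find_field(cpuinfo, "bogomips", &v) == 0)
        return v;
    if (find_field(cpuinfo, "BogoMIPS", &v) == 0)
        return v;
    return -1.0;
}
```

The caller would pass the full contents of /proc/cpuinfo as a string and
treat a negative result as "frequency unknown".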
Enhancements:
-------------
* general doxygen cleanups
* cleanup output of overflow_allcounters for clarity in debugging
* updates to most recent (as of Feb 1) libpfm4
* remove now-opaque event codes from papi_native_avail
and papi_xml_event_info
* src/papi_internal.c: Update the component initialization code
so that it can handle a PAPI ERROR return gracefully. Previously
there was no way to indicate initialization failure besides just
setting num_native_events to 0.
New Platforms:
-------------
* src/libpfm4/lib/: pfmlib_amd64_fam11h.c, events/amd64_events_fam11h.h
Support for AMD Family 11.
* src/libpfm4/lib/: pfmlib_amd64_fam12h.c, events/amd64_events_fam12h.h
Support for AMD Family 12.
Deprecated Platforms:
---------------------
* remove obsolete ACPI component
New / Improved Components:
---------------
* PAPI CUDA component updated for CUDA / CUPTI 4.1.
* SetCudaDevice() now works with the latest CUDA 4.1 version.
* Auto-detection of CUDA version for backward compatibility.
* PAPI_read() now accumulates event values. This fixes a bug
in earlier versions.
* extensive updates and cleanups to the example and coretemp components.
* significant updates to the lustre and mx components
* The linux net component underwent extensive updates and cleanups.
In particular, it now dynamically detects the network
interface names and exports 16 counters for each interface
(see also src/components/net/{CHANGES,README}).
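For the net component item above, the following sketch shows how interface
names and 16 counters per interface can be recovered from Linux's
/proc/net/dev, where each data line holds a name, a colon, and 8 receive plus
8 transmit columns. We assume that file is the data source; the function and
macro names are hypothetical and are not taken from the component itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NET_COUNTERS 16   /* 8 receive + 8 transmit columns */
#define IFNAME_MAX   32   /* illustrative buffer size */

/*
 * Parse one data line of /proc/net/dev, e.g.
 *   "  eth0: 100 2 0 0 0 0 0 0 300 4 0 0 0 0 0 0"
 * Returns 0 on success, -1 for header or malformed lines.
 */
int parse_net_dev_line(const char *line, char name[IFNAME_MAX],
                       unsigned long long counters[NET_COUNTERS])
{
    const char *colon;
    char *end;
    size_t len;
    int i;

    assert(line != NULL && name != NULL && counters != NULL);

    while (*line == ' ' || *line == '\t')
        line++;

    colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                /* the two header lines have no ':' */

    len = (size_t)(colon - line);
    if (len == 0 || len >= IFNAME_MAX)
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    line = colon + 1;
    for (i = 0; i < NET_COUNTERS; i++) {
        counters[i] = strtoull(line, &end, 10);
        if (end == line)
            return -1;            /* fewer than 16 numeric columns */
        line = end;
    }
    return 0;
}
```

Calling this on every line of /proc/net/dev and keeping the lines that return
0 yields the current set of interfaces without hard-coding names.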
Open Issues:
-------------
* multiplex1.c was rewritten to expose a multiplexing bug in the perf_events
kernel (3.0.3) for MIPS
* src/components/lmsensors/: Latest versions of lmsensors are incompatible
with current lmsensors component. Interface needs to be updated for forward
compatibility.
* There's a problem with broken overflow on POWER6 Linux systems.
We suspect a kernel problem, but don't know exactly which version(s).
We're running a 2.6.36 kernel where the problem has been identified.
It may be fixed in newer versions.
Experimental:
-------------
* a new vmware component has been added to report a variety of soft events
when running as a guest in a VMware environment
===============================================================================
PAPI 4.2.0 RELEASE NOTES 26 Oct 2011
===============================================================================
For specific and detailed information on changes made in this release, grep
the ChangeLogP420.txt file for keywords of interest or go directly to the PAPI
cvs tree.
GENERAL NOTES
===============================================================================
This is a major release of PAPI-C. It adds a significant new feature:
user-defined events. It also marks a shift from external (and outdated)
man pages to doxygen generated man pages. These pages can be found online at:
http://icl.cs.utk.edu/papi/docs/. They are also installable with "make install",
and you can build your own versions using doxygen.
Visit the PAPI Reference pages for more information at:
http://icl.cs.utk.edu/projects/papi/wiki/Main_Page
And visit the PAPI website for the latest updates:
http://icl.cs.utk.edu/papi/
RECENT CHANGES IN PAPI 4.2.0
===============================================================================
Bug Fixes:
----------
* Bug in CUDA v4.0 fixed. It caused a threaded application to hang when the
parent called cuInit() before fork() and the child also called cuInit().
All fork ctests pass now if papi is configured with cuda component.
* If papi is configured with cuda component and running a threaded
application, we need to make sure that a thread doesn't free the same
memory location(s) more than once. Now all pthread ctests pass with cuda.
* ctests/thrspecific works now with the CUDA component
* Added CudaRemoveEvent functionality (broken in earlier CUDA RC versions).
* ctests/all_native_events works now for the default CUDA device.
* Add locking to papi_pfm4_events so that adding/looking up event names
doesn't have a race condition when multiple threads are doing it at once.
* Fixed a series of problems with Itanium builds.
* Set FD_CLOEXEC on the overflow signal handler fd. Otherwise if we exec()
with overflow enabled, the exec'd process will quickly die due to lack
of signal handler. This patch is needed due to a change in behavior in
Linux 3.0. Mark Krentel first noticed this problem.
* Recent Ubuntu versions use the ld flag --as-needed by default, which
breaks the PAPI configure step for the libdl check, as the
--as-needed flag enforces the rule that libraries must come after the
object files on the command line, not before. The fix for this is to put
the libdl check in LIBS instead of in LDFLAGS.
* Removed an fopen() without an fclose() on /proc/cpuinfo in papi.c.
This was being done to set the event masks properly for itanium and p4.
Since the platform code sets CPU vendor and family for us we don't
really have to open cpuinfo. This fix may also work on non-Linux
systems.
* Update papi.h to properly detect if being built with a C99 compiler.
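The FD_CLOEXEC item in the list above relies on a standard POSIX idiom: read
the descriptor flags with fcntl(F_GETFD), then write them back with
FD_CLOEXEC added so the descriptor is closed automatically across exec(). A
minimal sketch follows; set_cloexec and is_cloexec are hypothetical helper
names, not functions from the PAPI source.

```c
#include <fcntl.h>

/*
 * Mark fd close-on-exec while preserving any other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Report whether fd is close-on-exec: 1 = yes, 0 = no, -1 = error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Applied to the descriptor used for overflow signalling, this keeps an exec'd
program from inheriting a descriptor for which it has no signal handler.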
Enhancements:
-------------
* Default support for libpfm4
* ./configure --with-libpfm3 to support legacy libpfm3 builds
* PERF_COUNT_SW software events are available under perf_events with
libpfm4
* Nehalem/Westmere/SandyBridge Offcore event support is ready,
but support is not yet available in the Linux kernel.
* Add new utility to display PAPI error codes and description strings.
* Add API to access error descriptions: PAPI_descr_error( int error_code).
* Add support for handling multiattach properly.
* Cleanups to avoid gcc-4.6 warnings.
* Added ability to add tests to components. All component tests are
compiled with PAPI when typing 'make' and cleaned up with 'make clean'
or 'make clobber'. Also added tests to the example and cuda components.
* CUDA component is now thread-safe. Multiple CPU
threads can access the same CUDA context. Note, it's possible to
create a different CUDA context for each thread, but then we are
likely running into a limitation that only one context can be
profiled at a time.
* LOTS of code cleanup thanks to Will Cohen of RedHat.
* Refactored test code so no-cpu-counters can build with components
* Build all utilities with no-cpu-counters
* Modify run_tests.sh so that you can set the VALGRIND command
externally via environment variable without having to edit
run_tests.sh itself. Also adds Date and cpuinfo information to the
beginning of run_tests.sh results. This can help when run_tests.sh
output is passed around when debugging a problem.
* Parallel make now works.
New Platforms:
-------------
* AMD Family 14h Bobcat (libpfm4 only)
* Intel SandyBridge (libpfm4 only)
* ARM Cortex-A8 and Cortex-A9 (libpfm4 only)
Deprecated Platforms:
---------------------
* although still technically supported, we are no longer actively testing
platforms based on the perfmon and perfctr patches. All linux kernels
> 2.6.32 provide internal support for perf_events.
New / Improved Components:
---------------
* Add a number of 'native' events to the component info structure in
example component.
* Introduce a papi_component_avail utility; lists the components we were
built with, optionally with native/preset counts and version number.
Open Issues:
-------------
* On newer Linux kernels (2.6.34+) the nmi_watchdog counter can steal one
of the counters, reducing by one the total available.
There's a bug in Linux where if you try to use the full number of
counters on such a system with a group leader, the sys_perf_open()
call will succeed only to fail at read time.
(instead of the proper error code at open time).
I do wish there were a way to notify the user more visibly,
because losing a counter (when you might only have 4 total to
begin with) is a big deal, and most Linux vendors are starting to
ship kernels with the nmi_watchdog enabled.
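One way for a tool to make the lost counter more visible is to check the
watchdog state itself. On Linux the setting is exposed in
/proc/sys/kernel/nmi_watchdog (1 when enabled, 0 when disabled). The sketch
below reads it and adjusts an expected counter budget; the function names are
hypothetical, and the "one counter lost" rule simply restates the note above.

```c
#include <assert.h>
#include <stdio.h>

/* Interpret the sysctl file contents: 1 = on, 0 = off, -1 = unknown. */
int parse_nmi_watchdog(const char *text)
{
    assert(text != NULL);
    if (text[0] == '1')
        return 1;
    if (text[0] == '0')
        return 0;
    return -1;
}

/* Read the watchdog state from path; -1 if the file cannot be read. */
int read_nmi_watchdog(const char *path)
{
    char buf[16];
    FILE *f;
    int state = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        state = parse_nmi_watchdog(buf);
    fclose(f);
    return state;
}

/* Counters left for the application if the watchdog holds one of them. */
int usable_counters(int hw_counters, int watchdog_state)
{
    if (watchdog_state == 1 && hw_counters > 0)
        return hw_counters - 1;
    return hw_counters;
}
```

For example, usable_counters(4, read_nmi_watchdog("/proc/sys/kernel/nmi_watchdog"))
gives the number of events that can safely be placed in one group on a
four-counter machine.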
Experimental:
-------------
* Preliminary support for MIPS 74K.
===============================================================================
PAPI 4.1.4 RELEASE NOTES 29 Aug 2011
===============================================================================
For specific and detailed information on changes made in this release, grep
the ChangeLogP414.txt file for keywords of interest or go directly to the PAPI
cvs tree.
GENERAL NOTES
===============================================================================
This is an internal release of PAPI-C targeted specifically for a Cray tools
release. It precedes a more general 4.2.0 release and incorporates changes and
updates since PAPI 4.1.3.
Detailed changes will be documented in the 4.2.0 release. Meanwhile the list
below highlights the most significant changes since 4.1.3.
* Intel SandyBridge is now supported
* libpfm4 support has been updated
* internal doxygen documentation has been added for the entire API
* the man pages have been replaced with doxygen generated man pages
* CUDA component support has been improved
* an infrastructure for testing components only has been implemented
* various bugs have been addressed