-
-
Notifications
You must be signed in to change notification settings - Fork 38
/
ChangeLog
128 lines (82 loc) · 3.44 KB
/
ChangeLog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
2024-06-29 Kim Walisch <[email protected]>
Version 3.1
* Improve AVX512 algorithm for trailing 64 bytes.
* AVX512 algorithm does not require AVX512-BITALG extension anymore.
2024-06-27 Kim Walisch <[email protected]>
Version 3.0
* Add ARM SVE algorithm.
* Replace AVX512BW algorithm by faster AVX512 VPOPCNTDQ algorithm.
* Add MSVC support for ARM NEON.
* Improve preprocessor checks using __has_include() macro.
* Port tests from AppVeyor to GitHub actions.
* Get rid of unaligned uint64_t memory acceses, this fixes
test failures when using GCC compiler sanitizers.
* Prefix all libpopcnt macros using LIBPOPCNT_ to avoid any naming collisions.
2020-08-10 Kim Walisch <[email protected]>
* Use unaligned memory accesses to improve performance.
Aligning memory causes branch mispredictions which
can significantly deteriorate performance especially
for small array sizes.
* On x86/x64 runtime checks are now removed if the user
compiles his code with e.g. -mpopcnt, -mavx2,
-march=native, ...
2020-07-19 Kim Walisch <[email protected]>
* Enable AVX2, AVX512 by default for MSVC.
Thanks to KOConchobhair for the pull request #15.
2019-12-31 Kim Walisch <[email protected]>
Version 2.3 released.
* Up to 10% speedup on ARM NEON.
* Fix unaligned memory access on ARM.
2018-01-13 Kim Walisch <[email protected]>
Version 2.2 released.
* Up to 6x faster on old x86 CPUs without POPCNT.
2017-11-10 Kim Walisch <[email protected]>
Version 2.1 released.
* Up to 20% ARM NEON speedup.
* test/test1.cpp: Add only 1 bits test case.
* test/test2.c: Add only 1 bits test case.
2017-10-27 Kim Walisch <[email protected]>
Version 2.0 released.
* Add AVX512 support.
* Support AVX2 and AVX512 for the MSVC compiler.
* Support CPUID for MSVC and the C programming language.
2017-09-09 Kim Walisch <[email protected]>
Version 1.9 released.
* libpopcnt.h: Fix MSVC POPCNT detection.
2017-04-24 Kim Walisch <[email protected]>
Version 1.8 released.
* Fixed "illegal instruction" runtime crash with GCC 4.4.7.
* benchmark.cpp: Print statistics and algorithm name.
* Reduce CMake minimum version to 2.8.
2017-04-08 Kim Walisch <[email protected]>
Version 1.7 released.
* Refactor ARM NEON popcount algorithm.
2017-04-04 Kim Walisch <[email protected]>
Version 1.6 released.
* Add ARM NEON popcount algorithm.
* Fix x86 segmentation fault for CPUs without POPCNT.
2017-04-02 Kim Walisch <[email protected]>
Version 1.5 released.
* libpopcnt.h now supports C and C++!
* Use CMake build system.
* test2.c: Add C test.
* Fix clang-cl bug on Windows.
2017-04-01 Kim Walisch <[email protected]>
Version 1.4 released.
* libpopcnt.h: Fix compiler warning.
2017-03-30 Kim Walisch <[email protected]>
Version 1.3 released.
libpopcnt.h is now C++ only (previously C/C++), the reason being
that the cpuid check cannot be made thread-safe using plain C,
whereas in C++ its trivial (Meyers Singleton).
* Add benchmark.cpp.
* Use AVX2 for arrays >= 512 bytes (previously 1024).
* Improve Makefile.
2017-03-26 Kim Walisch <[email protected]>
Version 1.2 released.
* Add cpuid check for x86 CPUs.
* Compiles without -mpopcnt, -mavx2 flags.
* Successfully tested on IBM POWER8 (generates popcntd, GCC 5.4).
* Successfully tested using clang-cl (Windows).
* Add ChangeLog.
* Update README.md.