Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSE2 compatibility should be warranted #10

Closed
weltling opened this issue Jun 8, 2015 · 21 comments
Closed

SSE2 compatibility should be warranted #10

weltling opened this issue Jun 8, 2015 · 21 comments
Assignees

Comments

@weltling
Copy link
Collaborator

weltling commented Jun 8, 2015

See the actual discussion here 1a6ea48, but in short - while various SSE versions have useful features, SSE2 is the standard set. The suggestion is to retain the compatibility with SSE2 even while some features from the upper SSE versions are used.

This is done by the following steps

  • retaining the emulation layer for the features that aren't available with SSE2
  • provide a legacy option to the makefiles, so then a fully SSE2 compatible version of the library can be compiled. This shouldn't involve any run time feature replacement (by usage of cpuid, etc).
@weltling weltling self-assigned this Jun 8, 2015
@wshager
Copy link

wshager commented Dec 30, 2015

This emulation of smmintrin.h (adapted from SSEPlus) suffices to compile. I wanted to see what Emscripten would make of this, but the generated javascript floods the CPU...

smmintrin.h.txt

Good luck with the project.

@wshager
Copy link

wshager commented Dec 30, 2015

Correction: the first two tests work, the simple_demo() hangs (compiled with Emscripten).

@weltling
Copy link
Collaborator Author

weltling commented Jan 8, 2016

Hi,

i've started to work on this having half a free day. But it is somehow strange :) I had to find out that i have no single machine not supporting sse4.1. Nevertheless, i made this small patch locally in the makefile:

diff --git a/makefile b/makefile
index 0e1c8de..228e737 100644
--- a/makefile
+++ b/makefile
@@ -2,10 +2,17 @@
 .SUFFIXES:
 #
 .SUFFIXES: .cpp .o .c .h
+
+ifeq ($(SSE),2)
+SSE_ARG=-msse2
+else
+SSE_ARG=-msse4.1
+endif
+
 ifeq ($(DEBUG),1)
-CFLAGS = -fPIC  -std=c89 -ggdb -msse4.1 -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC  -std=c89 -ggdb $(SSE_ARG) -march=native -Wall -Wextra -pedantic
 else
-CFLAGS = -fPIC -std=c89 -O3 -msse4.1  -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC -std=c89 -O3 $(SSE_ARG) -march=native -Wall -Wextra -pedantic
 endif # debug
 LDFLAGS = -shared
 LIBNAME=libsimdcomp.so.0.0.3

And then i compile like make SSE=2. Surprisingly, everything compiles fine and unit tests pass through. It is strange - firstly, it should not even compile, should it? Or, it could be good, that gcc already emulates something. Or gcc ignores this option as the VM supports SSE4_1. I have no chance to move the binaries and test on a machine without SSE4_1 as i don't have such HW. Can someone explain this?

Thanks.

@weltling
Copy link
Collaborator Author

weltling commented Jan 8, 2016

@wshager thanks for the interest. Could you please be more verbose about how you compile and test it, and what HW was used?

Thanks.

@wshager
Copy link

wshager commented Jan 8, 2016

@weltling I used Emscripten to compile, and since it only supports SIMD intrinsics up to SSE2, newer headers simply won't be found in the Emscripten lib. The HW doesn't seem relevant in this case. The current lack of SIMD support in javascript kinda puts this project on hold for me, since I'm looking for a solution in the browser.

@lemire
Copy link
Member

lemire commented Jan 8, 2016

@weltling

SSE 4.1 was introduced ~ 2007. I am not sure whether anyone makes x64 CPUs today without SSE 4.1.

Anyone knows?

lemire added a commit that referenced this issue Jan 8, 2016
SSE2 compatibility should be warranted #10
@lemire
Copy link
Member

lemire commented Jan 8, 2016

Checked in a version of the code that should compile fine given the proper compiler settings.

@wshager Hopefully, the code should be compilable by Emscripten if there is SIMD support.

@lemire lemire closed this as completed Jan 8, 2016
@weltling
Copy link
Collaborator Author

weltling commented Jan 8, 2016

@lemire, yeah, i was linking this ttp://store.steampowered.com/hwsurvey in another thread which tells that sse4.1 is still used by round 83% in the gaming, which should be the indicator.

I was planning long to go for the emulation layer, just it was quite busy times on my side inbetween. But now when i've came to it, the strange thing was that gcc -msse2 was compiling just fine the existing code even before your patch. So maybe gcc already inserts some emulation, or ignores the option. That's why I was wondering :) BTW Visual Studio seems to not to have options to enforce some specific SSE version compatibility, seems the runtime will do the job. But by the plan I had was to implement a compatibility layer explicitly.

Thanks.

@lemire
Copy link
Member

lemire commented Jan 8, 2016

The goal of my fix is to allow @wshager to go forward with his plan to compile this library with Emscripten. I'd be very interested in hearing about the results...

yeah, i was linking this http://store.steampowered.com/hwsurvey in another thread which tells that sse4.1 is still used by round 83% in the gaming, which should be the indicator.

It is an interesting reference.

the strange thing was that gcc -msse2 was compiling just fine the existing code even before your patch. So maybe gcc already inserts some emulation, or ignores the option.

I do not think that -msse2 precludes SSE 4.1. If you want to forbid the compiler to use SSE 4.1, you have to use a negative flag like -mno-sse4.1.

You can certainly emulate SSE 4.1 functions, but you will get awful performance. It is probably not worth it. Better not use SIMD instructions in the first place if you have to support obsolete hardware.

@weltling
Copy link
Collaborator Author

weltling commented Jan 8, 2016

@lemire great, -msse2 -mno-sse4.1 worked as expected. Looks like gcc expects the least version with -m, instead of a most one. Was my wrong interpretation. With these options now it properly fails to compile.

I got it exactly the way you told. With your fix it satisfies SSE2. But effectively it excludes several functionality. I was more talking about keeping the APIs, but having necessary parts emulated. Regarding performance - clear, SSE 2 were worse than 4.1. But I were more about compatibility. It's just down to whether to retain APIs compatible vs. moving forward with features.

Another point is also, that SIMD in any form will certainly provide a better performance compared to plain C/C++. More practical applicability vs. speed within SSE versions, as round 1/5 of cases sounds still significant to me. OFC i don't insist on this, just it was what we discussed earlier, why i opened this issue back then. I'm likely to call it a day then, mostly for the time reasons for my part.

Thanks.

@lemire
Copy link
Member

lemire commented Jan 8, 2016

SIMD in any form will certainly provide a better performance compared to plain C/C++.

Some non-SSE2 SIMD instructions we use (e.g., pshufb) cannot be emulated using SSE2 instructions. So you need to write the SIMD register to memory, reload it into regular registers, do the computation, then reload a SIMD register. That's going to be atrociously slow in some cases.

Even when the SIMD instruction can be emulated with SSE2 instructions (e.g., pmaxud), it may involve more operations than a non-SIMD algorithm would use. On the whole, you may end up with a significantly worse latency and throughput.

@wshager
Copy link

wshager commented Jan 9, 2016

@lemire thanks! The example now compiles to Emscripten. Steps to reproduce:

emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude
emcc -O2 src/simdcomputil.c -o simdcomputil.bc -msse2 -Iinclude
emcc -O2 src/simdintegratedbitpacking.c -o simdintegratedbitpacking.bc -msse2 -Iinclude
emcc -O2 example.c -o example.bc -msse2 -Iinclude
emcc -O2 example.bc simdbitpacking.bc simdintegratedbitpacking.bc simdcomputil.bc -o project.html -s ALLOW_MEMORY_GROWTH=1

I'll puzzle some more on dynamic calls from JS later, and report my experiences in #13. Moving forward, I expect Emscripten to support SSE4.1 in short term.
emscripten-core/emscripten#4030

@juj
Copy link

juj commented Jan 11, 2016

This looks like a very interesting use case of SIMD.js! I'd be very curious to see comparison graphs of native scalar vs native SIMD vs JS scalar vs JS SIMD performance to get a reality check of the suitability of the upcoming SIMD.js specification on your SIMD use case. If you generate such benchmark results at some point, please ping me in for the numbers!

@wshager
Copy link

wshager commented Jan 11, 2016

@juj I'd be happy to.

@lemire
Copy link
Member

lemire commented Jan 11, 2016

@wshager

Which version of emscripten do you use? The fact that there is no -msse2 flag on my version does not make me hopeful.

$ emcc --version
emcc (Emscripten GCC-like replacement) 1.10.0 ()
Copyright (C) 2013 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude
clang: warning: argument unused during compilation: '-msse2'
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:6:1: error: redefinition of '_mm_set_ps'
_mm_set_ps(float z, float y, float x, float w)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:6:1: note: previous definition is here
_mm_set_ps(float z, float y, float x, float w)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:12:1: error: redefinition of '_mm_set1_ps'
_mm_set1_ps(float w)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:12:1: note: previous definition is here
_mm_set1_ps(float w)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:18:1: error: redefinition of '_mm_setzero_ps'
_mm_setzero_ps(void)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:18:1: note: previous definition is here
_mm_setzero_ps(void)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:24:1: error: redefinition of '_mm_store_ps'
_mm_store_ps(float *p, __m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:24:1: note: previous definition is here
_mm_store_ps(float *p, __m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:30:1: error: redefinition of '_mm_movemask_ps'
_mm_movemask_ps(__m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:30:1: note: previous definition is here
_mm_movemask_ps(__m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:36:1: error: redefinition of '_mm_add_ps'
_mm_add_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:36:1: note: previous definition is here
_mm_add_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:42:1: error: redefinition of '_mm_sub_ps'
_mm_sub_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:42:1: note: previous definition is here
_mm_sub_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:48:1: error: redefinition of '_mm_mul_ps'
_mm_mul_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:48:1: note: previous definition is here
_mm_mul_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:54:1: error: redefinition of '_mm_div_ps'
_mm_div_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:54:1: note: previous definition is here
_mm_div_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:60:1: error: redefinition of '_mm_min_ps'
_mm_min_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:60:1: note: previous definition is here
_mm_min_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:66:1: error: redefinition of '_mm_max_ps'
_mm_max_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:66:1: note: previous definition is here
_mm_max_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:72:1: error: redefinition of '_mm_sqrt_ps'
_mm_sqrt_ps(__m128 a)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:72:1: note: previous definition is here
_mm_sqrt_ps(__m128 a)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:80:1: error: redefinition of '_mm_cmplt_ps'
_mm_cmplt_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:80:1: note: previous definition is here
_mm_cmplt_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:86:1: error: redefinition of '_mm_cmple_ps'
_mm_cmple_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:86:1: note: previous definition is here
_mm_cmple_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:92:1: error: redefinition of '_mm_cmpeq_ps'
_mm_cmpeq_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:92:1: note: previous definition is here
_mm_cmpeq_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:98:1: error: redefinition of '_mm_cmpge_ps'
_mm_cmpge_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:98:1: note: previous definition is here
_mm_cmpge_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:104:1: error: redefinition of '_mm_cmpgt_ps'
_mm_cmpgt_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:104:1: note: previous definition is here
_mm_cmpgt_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:110:1: error: redefinition of '_mm_and_ps'
_mm_and_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:110:1: note: previous definition is here
_mm_and_ps(__m128 a, __m128 b)
^
In file included from src/simdbitpacking.c:4:
In file included from include/simdbitpacking.h:14:
In file included from include/simdcomputil.h:11:
In file included from /usr/share/emscripten/system/include/emscripten/emmintrin.h:1:
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:116:1: error: redefinition of '_mm_andnot_ps'
_mm_andnot_ps(__m128 a, __m128 b)
^
/usr/share/emscripten/system/include/emscripten/xmmintrin.h:116:1: note: previous definition is here
_mm_andnot_ps(__m128 a, __m128 b)
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
ERROR    root: compiler frontend failed to generate LLVM bitcode, halting

@wshager
Copy link

wshager commented Jan 11, 2016

@lemire I use 1.35.0. Choose 'emsdk_portable install latest' / 'emsdk_portable activate latest'.

@lemire
Copy link
Member

lemire commented Jan 11, 2016

@wshager Makes sense. Thanks.

@lemire
Copy link
Member

lemire commented Jan 11, 2016

@wshager Right. It builds.

$ emcc --version
emcc (Emscripten gcc/clang-like replacement) 1.35.0 (commit 42fb486c53d627b203b77af6e5d0c088c0ad03ad)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ emcc -O2 src/simdbitpacking.c -o simdbitpacking.bc -msse2 -Iinclude
INFO:root:(Emscripten: Running sanity checks)
$ emcc -O2 src/simdcomputil.c -o simdcomputil.bc -msse2 -Iinclude
$ emcc -O2 src/simdintegratedbitpacking.c -o simdintegratedbitpacking.bc -msse2 -Iinclude
$ emcc -O2 example.c -o example.bc -msse2 -Iinclude
$ emcc -O2 example.bc simdbitpacking.bc simdintegratedbitpacking.bc simdcomputil.bc -o project.html -s ALLOW_MEMORY_GROWTH=1
WARNING:root:generating system library: libc.bc...

WARNING:root:                                     ok
WARNING:root:generating system library: dlmalloc.bc...
WARNING:root:                                         ok

@wshager
Copy link

wshager commented Jan 11, 2016

@lemire you can add -s EXPORTED_FUNCTIONS="['_main', '_compress_decompress_demo','_varying_bit_width_demo','_simple_demo']" to export functions separately.

@lemire
Copy link
Member

lemire commented Jan 11, 2016

Thanks.

I am just checking things out in case others want to try it out (so that it is documented).

@wshager
Copy link

wshager commented Jan 11, 2016

@lemire great. I'm going to try to rewrite some functions using SIMD.js. I'll report that too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants