Armv8.2 SM3 and SM4
Sun Yimin edited this page Feb 18, 2022 · 31 revisions
go test -v -short -bench . -run=^$ ./...
goos: linux
goarch: arm64
pkg: github.com/emmansun/gmsm/sm3
BenchmarkHash8Bytes
BenchmarkHash8Bytes-2 2738724 438.4 ns/op 18.25 MB/s
BenchmarkHash1K
BenchmarkHash1K-2 192519 6232 ns/op 164.32 MB/s
BenchmarkHash8K
BenchmarkHash8K-2 24950 48112 ns/op 170.27 MB/s
BenchmarkHash8K_SH256
BenchmarkHash8K_SH256-2 223354 5369 ns/op 1525.81 MB/s
PASS
ok github.com/emmansun/gmsm/sm3 5.857s
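That is roughly a 10x gap compared with an implementation backed by dedicated CPU instructions (SM3 at 170.27 MB/s versus SHA-256 at 1525.81 MB/s on 8 KB inputs).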
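The size of the gap can be checked from the ns/op figures in the benchmark output. The sketch below is only a small calculator; the helper name `throughputMBps` is made up for illustration, and it assumes 1 MB = 1e6 bytes, which matches how the MB/s column above relates to ns/op.

```go
package main

import "fmt"

// throughputMBps converts bytes processed per operation and nanoseconds per
// operation into MB/s, taking 1 MB as 1e6 bytes.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

func main() {
	sm3 := throughputMBps(8192, 48112)   // BenchmarkHash8K
	sha256 := throughputMBps(8192, 5369) // BenchmarkHash8K_SH256
	fmt.Printf("SM3 %.2f MB/s, SHA-256 %.2f MB/s, ratio %.2fx\n", sm3, sha256, sha256/sm3)
}
```

The ratio comes out slightly below 9, which is what "roughly 10x" refers to.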
The AESE instruction is equivalent to:
- AddRoundKey(State, RoundKey)
- ShiftRows(State)
- SubBytes(State)
So if RoundKey = 0, AESE performs only:
- ShiftRows(State)
- SubBytes(State)
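The equivalence above can be modelled in plain Go. This is a software sketch for illustration only, not the hardware instruction and not code from gmsm; the function names (`aesSBox`, `shiftRows`, `subBytes`, `aese`) are invented here, and the state is assumed to be 16 bytes in column-major order as in the AES specification. The S-box is generated from its definition to keep the example short.

```go
package main

import (
	"fmt"
	"math/bits"
)

// aesSBox builds the AES S-box from its definition: multiplicative inverse
// in GF(2^8) followed by the affine transform.
func aesSBox() [256]byte {
	var s [256]byte
	p, q := byte(1), byte(1)
	for {
		// Multiply p by 3 in GF(2^8).
		if p&0x80 != 0 {
			p ^= (p << 1) ^ 0x1b
		} else {
			p ^= p << 1
		}
		// Divide q by 3, so q stays the inverse of p.
		q ^= q << 1
		q ^= q << 2
		q ^= q << 4
		if q&0x80 != 0 {
			q ^= 0x09
		}
		// Affine transform of the inverse.
		x := q ^ bits.RotateLeft8(q, 1) ^ bits.RotateLeft8(q, 2) ^
			bits.RotateLeft8(q, 3) ^ bits.RotateLeft8(q, 4)
		s[p] = x ^ 0x63
		if p == 1 {
			break
		}
	}
	s[0] = 0x63 // zero has no inverse and is handled separately
	return s
}

// shiftRows rotates row r of the column-major state left by r positions.
func shiftRows(in [16]byte) [16]byte {
	var out [16]byte
	for c := 0; c < 4; c++ {
		for r := 0; r < 4; r++ {
			out[4*c+r] = in[4*((c+r)%4)+r]
		}
	}
	return out
}

// subBytes applies the S-box to every byte of the state.
func subBytes(in [16]byte, sbox *[256]byte) [16]byte {
	var out [16]byte
	for i, b := range in {
		out[i] = sbox[b]
	}
	return out
}

// aese models AESE: AddRoundKey, then ShiftRows and SubBytes.
func aese(state, roundKey [16]byte, sbox *[256]byte) [16]byte {
	var t [16]byte
	for i := range t {
		t[i] = state[i] ^ roundKey[i]
	}
	return subBytes(shiftRows(t), sbox)
}

func main() {
	sbox := aesSBox()
	var state, zeroKey [16]byte
	for i := range state {
		state[i] = byte(i)
	}
	// With a zero round key only ShiftRows and SubBytes remain.
	fmt.Println(aese(state, zeroKey, &sbox) == subBytes(shiftRows(state), &sbox))
}
```

Because SubBytes works byte by byte, it commutes with ShiftRows, so the order of the last two steps does not affect the result.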
go test -v -short -bench . -run=^$ ./...
goos: linux
goarch: arm64
pkg: github.com/emmansun/gmsm/sm4
BenchmarkEncrypt
BenchmarkEncrypt-2 2145859 559.1 ns/op 28.62 MB/s
BenchmarkDecrypt
BenchmarkDecrypt-2 2145296 559.4 ns/op 28.60 MB/s
BenchmarkExpand
BenchmarkExpand-2 2064466 581.2 ns/op
PASS
ok github.com/emmansun/gmsm/sm4 5.334s
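SM4EKEY / SM4E
Go does not yet support the SM4E/SM4EKEY instructions, but we can treat them as unsupported opcodes and emit the raw instruction words:
- Clone the code from https://github.com/golang/arch
- Modify arm64asm/tables.go: add the SM4E/SM4EKEY constants, add them to opstr, and add the instructions to instFormats.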
// SM4E <Vd>.4S, <Vn>.4S
{0xfffffc00, 0xcec08400, SM4E, instArgs{arg_Vd_arrangement_4S, arg_Vn_arrangement_4S}, nil},
// SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4S
{0xffe0fc00, 0xce60c800, SM4EKEY, instArgs{arg_Vd_arrangement_4S, arg_Vn_arrangement_4S, arg_Vm_arrangement_4S}, nil},
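- Modify arm64asm/plan9x.go: add SM4E and SM4EKEY to noSuffixOpSet. This step is optional; with it, the Plan 9 syntax output no longer carries the V prefix.
- Write tests. The testDecodeLine() method is extracted from the testDecode() method in decode_test.go. After reading the Decode() method, you can work out the 32-bit codes yourself.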
func TestDecodeSM4Codes(t *testing.T) {
//gnu syntax, load 16 bytes plaintext to v8 (need to reverse byte order first), 32 round keys to v0-v7, the final result should be reverse byte order again
testDecodeLine(t, "gnu", "0884c0ce| sm4e v8.4s, v0.4s")
testDecodeLine(t, "gnu", "2884c0ce| sm4e v8.4s, v1.4s")
testDecodeLine(t, "gnu", "4884c0ce| sm4e v8.4s, v2.4s")
testDecodeLine(t, "gnu", "6884c0ce| sm4e v8.4s, v3.4s")
testDecodeLine(t, "gnu", "8884c0ce| sm4e v8.4s, v4.4s")
testDecodeLine(t, "gnu", "a884c0ce| sm4e v8.4s, v5.4s")
testDecodeLine(t, "gnu", "c884c0ce| sm4e v8.4s, v6.4s")
testDecodeLine(t, "gnu", "e884c0ce| sm4e v8.4s, v7.4s")
//plan9 syntax, load 16 bytes plaintext to v8 (need to reverse byte order first), 32 round keys to v0-v7, the final result should be reverse byte order again
testDecodeLine(t, "plan9", "0884c0ce| SM4E V0.S4, V8.S4")
testDecodeLine(t, "plan9", "2884c0ce| SM4E V1.S4, V8.S4")
testDecodeLine(t, "plan9", "4884c0ce| SM4E V2.S4, V8.S4")
testDecodeLine(t, "plan9", "6884c0ce| SM4E V3.S4, V8.S4")
testDecodeLine(t, "plan9", "8884c0ce| SM4E V4.S4, V8.S4")
testDecodeLine(t, "plan9", "a884c0ce| SM4E V5.S4, V8.S4")
testDecodeLine(t, "plan9", "c884c0ce| SM4E V6.S4, V8.S4")
testDecodeLine(t, "plan9", "e884c0ce| SM4E V7.S4, V8.S4")
//gnu syntax, load 32 ck to v0-v7, root key (reverse byte order first) xor fk to v8, the result round keys will be in v9, need to move v9 to v8 from second invocation of sm4ekey
testDecodeLine(t, "gnu", "09c960ce| sm4ekey v9.4s, v8.4s, v0.4s")
testDecodeLine(t, "gnu", "09c961ce| sm4ekey v9.4s, v8.4s, v1.4s")
testDecodeLine(t, "gnu", "09c962ce| sm4ekey v9.4s, v8.4s, v2.4s")
testDecodeLine(t, "gnu", "09c963ce| sm4ekey v9.4s, v8.4s, v3.4s")
testDecodeLine(t, "gnu", "09c964ce| sm4ekey v9.4s, v8.4s, v4.4s")
testDecodeLine(t, "gnu", "09c965ce| sm4ekey v9.4s, v8.4s, v5.4s")
testDecodeLine(t, "gnu", "09c966ce| sm4ekey v9.4s, v8.4s, v6.4s")
testDecodeLine(t, "gnu", "09c967ce| sm4ekey v9.4s, v8.4s, v7.4s")
//gnu syntax, load 32 ck to v0-v7, root key (reverse byte order first) xor fk to v8, the result round keys will be in v9 (1,3,5,7) and v8 (2,4,6,8), which avoids register copies.
testDecodeLine(t, "gnu", "09c960ce| sm4ekey v9.4s, v8.4s, v0.4s")
testDecodeLine(t, "gnu", "28c961ce| sm4ekey v8.4s, v9.4s, v1.4s")
testDecodeLine(t, "gnu", "09c962ce| sm4ekey v9.4s, v8.4s, v2.4s")
testDecodeLine(t, "gnu", "28c963ce| sm4ekey v8.4s, v9.4s, v3.4s")
testDecodeLine(t, "gnu", "09c964ce| sm4ekey v9.4s, v8.4s, v4.4s")
testDecodeLine(t, "gnu", "28c965ce| sm4ekey v8.4s, v9.4s, v5.4s")
testDecodeLine(t, "gnu", "09c966ce| sm4ekey v9.4s, v8.4s, v6.4s")
testDecodeLine(t, "gnu", "28c967ce| sm4ekey v8.4s, v9.4s, v7.4s")
}
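The byte strings in the test above can be checked against the mask/value pairs added to instFormats. The following is a simplified stand-in for the decoder, not the actual arm64asm code: the function `describe` and the constant names are invented for illustration, and it handles only these two instructions. It assumes the hex strings are in memory (little-endian) byte order.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Mask/value pairs copied from the instFormats entries shown earlier.
const (
	sm4eMask, sm4eValue       = 0xfffffc00, 0xcec08400
	sm4ekeyMask, sm4ekeyValue = 0xffe0fc00, 0xce60c800
)

// describe turns 4 hex-encoded bytes in memory order into GNU-style text.
func describe(hexBytes string) (string, error) {
	raw, err := hex.DecodeString(hexBytes)
	if err != nil || len(raw) != 4 {
		return "", fmt.Errorf("want 4 hex-encoded bytes, got %q", hexBytes)
	}
	w := binary.LittleEndian.Uint32(raw)
	// Register fields: Rd in bits 0-4, Rn in bits 5-9, Rm in bits 16-20.
	rd, rn, rm := w&0x1f, (w>>5)&0x1f, (w>>16)&0x1f
	switch {
	case w&sm4eMask == sm4eValue:
		return fmt.Sprintf("sm4e v%d.4s, v%d.4s", rd, rn), nil
	case w&sm4ekeyMask == sm4ekeyValue:
		return fmt.Sprintf("sm4ekey v%d.4s, v%d.4s, v%d.4s", rd, rn, rm), nil
	}
	return "", fmt.Errorf("unknown instruction word %#x", w)
}

func main() {
	for _, s := range []string{"0884c0ce", "28c961ce"} {
		text, err := describe(s)
		fmt.Println(s, text, err)
	}
}
```

The SM4E mask leaves only the low 10 bits (Rn and Rd) free, while the SM4EKEY mask additionally frees bits 16-20 for Rm.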
Each sm4e/sm4ekey invocation performs only 4 rounds, so it has to be called 8 times to cover all 32 rounds.
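- Then you can use those 32-bit codes in Go's arm64 assembly. Note that the hex strings in the tests are in memory (little-endian) byte order, while WORD takes the instruction as a 32-bit value, so the bytes 0884c0ce become 0xcec08408.
WORD $0xcec08408 // SM4E V0.S4, V8.S4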
The main drawback of raw instruction words is poor readability; another is that they make macros impossible or awkward to write.
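One way to ease the readability problem is to generate the WORD lines instead of writing them by hand. The sketch below is illustrative only: the helper names (`sm4e`, `sm4ekey`, `memBytes`) are hypothetical, the base opcodes are the values from the instFormats entries, and it assumes the assembler's WORD directive takes the instruction as a 32-bit value stored in little-endian order.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sm4e returns the instruction word for SM4E Vd.4S, Vn.4S.
func sm4e(vd, vn uint32) uint32 {
	return 0xcec08400 | vn<<5 | vd
}

// sm4ekey returns the instruction word for SM4EKEY Vd.4S, Vn.4S, Vm.4S.
func sm4ekey(vd, vn, vm uint32) uint32 {
	return 0xce60c800 | vm<<16 | vn<<5 | vd
}

// memBytes renders a word as hex in memory (little-endian) byte order,
// the form used by the decoder test lines.
func memBytes(w uint32) string {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], w)
	return hex.EncodeToString(b[:])
}

func main() {
	// Eight SM4E calls, one per group of four round keys held in V0-V7.
	for k := uint32(0); k < 8; k++ {
		w := sm4e(8, k)
		fmt.Printf("WORD $0x%08x // SM4E V%d.S4, V8.S4 (bytes %s)\n", w, k, memBytes(w))
	}
	// Eight SM4EKEY calls, alternating V9 and V8 as destination.
	dst, src := uint32(9), uint32(8)
	for k := uint32(0); k < 8; k++ {
		w := sm4ekey(dst, src, k)
		fmt.Printf("WORD $0x%08x // sm4ekey v%d.4s, v%d.4s, v%d.4s (bytes %s)\n", w, dst, src, k, memBytes(w))
		dst, src = src, dst
	}
}
```

The byte strings printed in parentheses can be compared directly with the ones in TestDecodeSM4Codes.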
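Unfortunately, I have no environment to test this on!
I cannot find hardware with the CPU support needed for an instruction-level SM3 and SM4 implementation, so this is bookmarked for now.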
- Summary of A64 cryptographic instructions
- Arm A64 Instruction Set Architecture
- Linux arm64 crypto (https://github.com/torvalds/linux/tree/master/arch/arm64/crypto)
- A Quick Guide to Go's Assembler
- Golang arm instructions mapping
- A C/C++ header file that converts Intel SSE intrinsics to Arm/Aarch64 NEON intrinsics.
- asm2go