Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for CRYSTALS-Dilithium (NIST PQC Round2) #87

Merged
merged 2 commits into from
May 11, 2020

Conversation

bwesterb
Copy link
Member

@bwesterb bwesterb commented Feb 12, 2020

https://pq-crystals.org/dilithium/data/dilithium-specification-round2.pdf

  • This implementation is based on the spec with the reference
    implementation used as, what is in a name, reference. It turns out to
    be pretty close to the reference implementation. I did not simply
    mimic their choices, but checked them (for instance, whether
    coefficients stay bounded.) I did not find an issue except for some
    errors in comments.

Performance

On my MBP (coffee lake, 8569U) I get for Dilithium3 (using AVX2 optimized NTT):

BenchmarkSkUnpack-8            	   18980	     61840 ns/op
BenchmarkPkUnpack-8            	   19743	     60087 ns/op
BenchmarkVerify-8              	   39998	     30421 ns/op
BenchmarkSign-8                	    5420	    228631 ns/op
BenchmarkGenerateKey-8         	   12037	     99672 ns/op
BenchmarkPublicFromPrivate-8   	  107148	     11512 ns/op

The non-optimized C reference implementation:

BenchmarkKeygen-8   	   10000	    109216 ns/op
BenchmarkSign-8     	    2206	    545616 ns/op
BenchmarkVerify-8   	   10000	    108419 ns/op

The AVX2 optimized C reference implementation:

BenchmarkKeygen-8   	   36510	     32242 ns/op
BenchmarkSign-8     	   10000	    117060 ns/op
BenchmarkVerify-8   	   31605	     34123 ns/op

To be fair, we need to compare our Unpacking+Signing/Verification. So that's

keygen   99µs (our) vs 109µs (cref) and  32µs (avx2ref)
signing 290µs (our) vs 546µs (cref) and 117µs (avx2ref)
verif    60µs (our) vs 108µs (cref) and  34µs (avx2ref)

The biggest room for improvement is using AVX2 to compute four parallel SHAKE streams to derive the matrix A. (Mat.Derive)

To do

  • Add public API
  • Compare performance against C implementation
  • Support AES-based CRH
  • Compare against NIST test vectors
  • Cache A, NTT(s1), ... in public/private key structs.
  • Implement crypto.Signer.
  • Check some implementation details
  • Document Make/UseHint
  • Check whether the optimized checks on the signature (as also used
    in the reference implementation) correspond to those prescribed in
    the specification.

Nice to have

  • Add AVX2 optimized NTT
  • Fuzzing
  • Use internal/test where appropriate.

Investigate

  • Check how stupid the current Go compiler is --- it might be worthwhile
    to do one of the following.
    • Add explicit constants for literals like 1<<(D-1)
    • Manually unroll loops in Poly, VecK and VecL
    • Eliminate branches
  • Check whether Public Key has multiple valid packings. (No.)

@bwesterb
Copy link
Member Author

( @armfazh Not yet ready for review. )

@bwesterb bwesterb force-pushed the dilithium branch 3 times, most recently from 62a2da3 to c692ce5 Compare February 12, 2020 15:07
Copy link

@wbl wbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably cherrypick this one out

@bwesterb bwesterb force-pushed the dilithium branch 3 times, most recently from 4ae9fb8 to 60cc5e3 Compare March 12, 2020 14:34
@bwesterb
Copy link
Member Author

Added support for AES. Rearranged files a bit.

@bwesterb bwesterb force-pushed the dilithium branch 2 times, most recently from dfcb2e3 to 98731f4 Compare March 18, 2020 14:22
@bwesterb bwesterb changed the title [WIP] Support for CRYSTALS-Dilithium (NIST PQC Round2) Support for CRYSTALS-Dilithium (NIST PQC Round2) Mar 18, 2020
@codecov-io
Copy link

codecov-io commented Mar 18, 2020

Codecov Report

Merging #87 into master will decrease coverage by 2.93%.
The diff coverage is 60.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #87      +/-   ##
==========================================
- Coverage   79.59%   76.66%   -2.94%     
==========================================
  Files          61      107      +46     
  Lines        5656    11815    +6159     
==========================================
+ Hits         4502     9058    +4556     
- Misses       1107     2573    +1466     
- Partials       47      184     +137     
Impacted Files Coverage Δ
sign/dilithium/internal/aes.go 0.00% <0.00%> (ø)
sign/dilithium/internal/pack.go 0.00% <0.00%> (ø)
sign/dilithium/mode1/internal/dilithium.go 73.11% <ø> (ø)
sign/dilithium/mode1/internal/mat.go 100.00% <ø> (ø)
sign/dilithium/mode1/internal/pack.go 60.95% <ø> (ø)
sign/dilithium/mode1/internal/sample.go 80.85% <ø> (ø)
sign/dilithium/mode1/internal/vec.go 92.10% <ø> (ø)
sign/dilithium/mode1aes/internal/dilithium.go 72.45% <ø> (ø)
sign/dilithium/mode1aes/internal/mat.go 100.00% <ø> (ø)
sign/dilithium/mode1aes/internal/pack.go 60.95% <ø> (ø)
... and 82 more

@bwesterb
Copy link
Member Author

@armfazh Ready for initial review.

@armfazh armfazh self-requested a review March 19, 2020 00:18
Copy link

@wbl wbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few parts I didn't understand but it looks good

sign/dilithium/internal/field.go Show resolved Hide resolved
sign/dilithium/internal/field.go Outdated Show resolved Hide resolved
sign/dilithium/internal/ntt.go Show resolved Hide resolved
Copy link
Contributor

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My general comment is to ensure that dilithium package provides security levels specified by a name/identifier.
This can be achieved by a struct that contains the parameters of each instance. So the implementation runs generally using the parameters provided by the struct.

sign/dilithium/README.md Outdated Show resolved Hide resolved
sign/dilithium/internal/aes.go Show resolved Hide resolved
sign/dilithium/internal/aes.go Outdated Show resolved Hide resolved
sign/dilithium/internal/aes.go Show resolved Hide resolved
sign/dilithium/internal/field.go Show resolved Hide resolved
sign/dilithium/mode1/dilithium.go Show resolved Hide resolved
sign/dilithium/mode1/dilithium.go Show resolved Hide resolved
sign/dilithium/mode3/internal/dilithium_test.go Outdated Show resolved Hide resolved
g.update(nil)
}

func TestPQCgenKATSign(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When (for some reason) this test fails, it would be difficult to know what was the expected output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails, then it fails for, say, some signature. If you just get the bad signature, you're still stuck. Why is the signature bad? So we'd have to add debug statements for the intermediate values, etc. We'd need to dig down anyway then, so unfolding this first step already does not make a big difference in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, as is, there are some other reasons for which the hash could not match.
For example, the KAT vector could differ in its format.
Also, it would be good to include the original test vector file in the /testdata folder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the KAT vector could differ in its format.

Of course, but that still would require some digging. I would suspect the CI to truncate the difference between the expected values anyways.

Also, it would be good to include the original test vector file in the /testdata folder.

The reference implementation does not include these test vectors — it also generates them on the fly.

@bwesterb
Copy link
Member Author

bwesterb commented Apr 22, 2020

My general comment is to ensure that dilithium package provides security levels specified by a name/identifier.

I've added a top-level interface package to do this. I also keep the separate subpackages for the separate modes as this make quite a significant performance difference (due to the K and L parameters determining the size of vectors and matrices.)

Copy link
Contributor

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some extra comments.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
}

// Mode is a certain configuration of the Dilithium signature scheme.
type Mode interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file should be on the top package, so anyone can see the functionality that a Mode provides.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've liked that as well, but then there is a cyclic dependency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The top-level package needs to import the mode1, mode2, etc. subpackages. These subpackages themselves then would need to import the top-level package for the interfaces.

sign/dilithium/mode/mode.go Outdated Show resolved Hide resolved

// NewKeyFromSeed derives a public/private key pair using the given seed.
// Panics if len(seed) != SeedSize()
NewKeyFromSeed(seed []byte) (PublicKey, PrivateKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious if there is needed a method that returns the seed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand. A method on the PrivateKey? Or do you mean why this function exists at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the following scenario:
Alice generates a key pair using GenerateKey(nil) -> sk,pk.
Then, Alice knows that she can get the same keys using NewKeyFromSeed if the seed is provided.
So, how does she can retrieve the seed previously generated in order to use NewKeyFromSeed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is a bit tricky.

Preferably we would add a func Seed() [SeedSize]byte function on PrivateKey to return the seed that was used to generate it. But that requires us to store the seed inside the private key. However, the binary format of the Dilithium private key does not store the seed, so we can't store it. We could store it in the struct anyway, but then we have a Seed() function which only works in certain situations (namely when it has just been created instead of unmarshaled).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the solution is to hide NewKeyFromSeed from the API.

@bwesterb
Copy link
Member Author

bwesterb commented May 2, 2020

@armfazh Can you have another look?

@armfazh
Copy link
Contributor

armfazh commented May 2, 2020

Did you integrate some of the suggested changes proposed here?

Hope it helps you to better organize the code. For example, the code for running KAT tests can reside in a single place, same reasoning applies for other parts of the code.

Ideally, I would like to get ride of having duplicated files per mode.

@bwesterb
Copy link
Member Author

bwesterb commented May 3, 2020

Did you integrate some of the suggested changes proposed here?

Oh sorry, I missed that. Will take a look tomorrow.

@bwesterb
Copy link
Member Author

bwesterb commented May 7, 2020

@armfazh Most of the duplicated code is now generated by gen.go. I also got rid of the separate mode package using your suggestion. This PR now includes the AVX2 optimized NTT.

@bwesterb
Copy link
Member Author

bwesterb commented May 7, 2020

@wbl Perhaps you would like to have another look as several things have changed after you reviewed this.

go.mod Outdated Show resolved Hide resolved
sign/dilithium/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to merge this soon.
could you please solve only this issues (so we can proceed to the merge) and any other changes or improvements can reside in subsequent PRs.

Name() string
}

var modes = make(map[string]Mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to query a mode by ID (an integer) rather than by a string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone knows at compile-time which mode they want, they can use the global such as dilithium.Mode3AES or simply the subpackage dilithium/mode3.

If it only becomes known at compile-time, then the program only has the name of the mode to work with.

Or would you prefer a switch on the string instead of a map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my concern is with using a string rather than a simple integer.
the map is ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But which integers? There are no standardised IDs for the modes of Dilithium and if we're adding constants for them a user might just as well simply use the globals for each mode.

sign/dilithium/gen.go Show resolved Hide resolved
sign/dilithium/gen.go Outdated Show resolved Hide resolved
sign/dilithium/internal/field.go Show resolved Hide resolved

func TestMakeHint(t *testing.T) {
if !*runVeryLongTest {
t.SkipNow()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.SkipNow()
t.Log("Skipped one long test, add -long flag to run longer tests")
t.SkipNow()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test takes several hours to run and so uses the -very-long flag. You're sure we want to advertise it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the added value to run 1h tests?

Copy link
Member Author

@bwesterb bwesterb May 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered adding a fuzzer for the tricky makeHint() and useHint() functions, but noticed that the amount of inputs are small enough to simply test them all. This function does that: it tests every possible input for makeHint() and checks whether it works as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still an issue?

sign/dilithium/internal/params/params.go Outdated Show resolved Hide resolved
sign/dilithium/mode1/dilithium.go Show resolved Hide resolved
sign/dilithium/mode1/dilithium.go Show resolved Hide resolved
sign/dilithium/mode1/internal/params_test.go Outdated Show resolved Hide resolved
sign/dilithium/mode1/internal/params_test.go Outdated Show resolved Hide resolved
@bwesterb
Copy link
Member Author

bwesterb commented May 9, 2020

I would like to merge this soon.
could you please solve only this issues (so we can proceed to the merge) and any other changes or improvements can reside in subsequent PRs.

Ok, I replied to all issues.

Copy link
Contributor

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments

go.sum Outdated Show resolved Hide resolved
sign/dilithium/mode1.go Outdated Show resolved Hide resolved
sign/dilithium/mode3/internal/dilithium.go Show resolved Hide resolved
@bwesterb
Copy link
Member Author

bwesterb commented May 11, 2020

@armfazh The CLI fails now because I reverted go.mod and go.sum as you requested.

(Also see mmcloughlin/avo#147 )

https://pq-crystals.org/dilithium/data/dilithium-specification-round2.pdf

This implementation is based on the spec with the reference
implementation used as, what is in a name, a reference.  It turns out
to be pretty close to the reference implementation.  I did not
simply mimic their choices, but checked them (for instance, whether
coefficients stay bounded.)  I did not find an issue except for
some errors in comments.

If available, AVX2 is used to speed up the NTT.  Our AVX2 NTT and
inverse-NTT implementations are different from the optimized reference
implementation.  I chose to use a simpler "scheduling" for ease of
review.  Our NTT is a tiny bit faster, but our inverse-NTT is a bit slower.

There is still room for improvement: the ExpandA (i.e. Mat.Derive)
and ExpandY (i.e. PolyDeriveUniformLeGamma1) can be optimized with AVX2.

There is a considerable amount of precomputation that can be shared
between the computation of multiple sigantures with the same private
key.  We do this precomputation when unpacking the private key and store
it in the PrivateKey struct.   Same applies for verification of multiple
signatures for a public key.

Performance (coffee lake, 8569U) for Dilithium mode 3 using AVX2.

    BenchmarkSkUnpack-8            	   18980	     61840 ns/op
    BenchmarkPkUnpack-8            	   19743	     60087 ns/op
    BenchmarkVerify-8              	   39998	     30421 ns/op
    BenchmarkSign-8                	    5420	    228631 ns/op
    BenchmarkGenerateKey-8         	   12037	     99672 ns/op
    BenchmarkPublicFromPrivate-8   	  107148	     11512 ns/op

Non-optmized C reference implementation

    BenchmarkKeygen-8   	   10000	    109216 ns/op
    BenchmarkSign-8     	    2206	    545616 ns/op
    BenchmarkVerify-8   	   10000	    108419 ns/op

AVX2 optimized C reference implementation

    BenchmarkKeygen-8   	   36510	     32242 ns/op
    BenchmarkSign-8     	   10000	    117060 ns/op
    BenchmarkVerify-8   	   31605	     34123 ns/op

To make a fair comparisson (due to the precomputations), one needs to
take Sign+SkUnpack and Verify+PkUnpack together.  So that is:

    keygen   99µs (our) vs 109µs (cref) and  32µs (avx2ref)
    signing 290µs (our) vs 546µs (cref) and 117µs (avx2ref)
    verif    60µs (our) vs 108µs (cref) and  34µs (avx2ref)
@bwesterb
Copy link
Member Author

I've squashed all commits, created a new commit message and rebased on upstream master.

@armfazh
Copy link
Contributor

armfazh commented May 11, 2020

The CLI fails now because I reverted go.mod and go.sum as you requested.

For circl, I solved this by updating Makefile as:

generate: clean
	$(GO) generate -v ./...
	$(GO) mod tidy

Try this, hope it makes CI happier.

Makefile Show resolved Hide resolved
@armfazh armfazh self-requested a review May 11, 2020 19:28
Copy link
Contributor

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments.

@armfazh armfazh merged commit 52b0d7a into cloudflare:master May 11, 2020
@bwesterb
Copy link
Member Author

Thanks for addressing the comments.

Of course! Thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants