Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute basis in parallel #42

Open
iliailmer opened this issue Aug 6, 2022 · 20 comments
Open

Compute basis in parallel #42

iliailmer opened this issue Aug 6, 2022 · 20 comments

Comments

@iliailmer
Copy link

Hello! I was wondering if Groebner.jl currently supports parallel computation for Groebner basis?

@sumiya11
Copy link
Owner

sumiya11 commented Aug 6, 2022

Hi, currently there is no support for parallel computation (but it is in Todo list 😄 )

@iliailmer
Copy link
Author

@sumiya11 thanks for your answer!

@sumiya11
Copy link
Owner

sumiya11 commented Feb 2, 2023

Hi @iliailmer , if you are still interested, there is now pr #46 for parallel gb over rationals, results there look quite promising.

Let me know if you have some specific challenging systems over rationals where you want to compute a Groebner basis. If you have the ability to share them, it could perhaps help me optimize parallel computation a bit

@iliailmer
Copy link
Author

@sumiya11 Thank you for letting me know! Here is a link to the gist with 3 systems that are pretty challenging
https://gist.github.com/iliailmer/d42db418229ced3510fb78647d475be3

Please let me know if the link works for you.

@sumiya11
Copy link
Owner

sumiya11 commented Feb 2, 2023

Thanks very much for sharing! These are over GF, but my multi-threading is designed only over rationals :(

I wonder if for these systems computing the GB of the original problem over rationals would also make sense ?

@iliailmer
Copy link
Author

oh, I think it's my mistake, I think I copied the incorrect definition. I will change them

@iliailmer
Copy link
Author

@sumiya11 here is the updated link https://gist.github.com/iliailmer/2a8b0a4a689c755a65dba638dc205a49

@sumiya11
Copy link
Owner

Hi @iliailmer , Groebner.jl v0.6+ has an option to use multi-threading internally.
For now, it is turned off by default, but you can enable it like this:

groebner(sys, threaded=:yes)

Note that it is quite experimental still, you may observe no speedup for smaller problems. :^)

@sumiya11
Copy link
Owner

Also, I have a question, is it correct that systems that you have shared above are similar to the ones from the paper "Obtaining weights for Gröbner basis computation in parameter identifiability problems" ? Thanks!

@iliailmer
Copy link
Author

Thank you.

Also, I have a question, is it correct that systems that you have shared above are similar to the ones from the paper "Obtaining weights for Gröbner basis computation in parameter identifiability problems" ? Thanks!

Yes, that's right.

@sumiya11
Copy link
Owner

Cool. I use the systems that you provided for benchmarking, and I wish to mention this fact in my preprint about Groebner.jl, would it be alright with you?

I have another question: would it be possible to obtain the other systems from that paper, such as HPV, COVID m1, etc?
For example, if there is a script that generates these equations given a state-space model, I could try to produce the systems myself. Thanks !

@iliailmer
Copy link
Author

iliailmer commented Jan 11, 2024

@sumiya11 thats fine with me

you can use SIAN in maple or its Julia version to obtain those (the system in the source code of both is called Et_hat)

Edit: there are also files here

Hope this helps!

@sumiya11
Copy link
Owner

Thank you a lot! Just in case, I made a bundle of systems from the available examples in SIAN:

https://github.com/sumiya11/Groebner.jl/tree/master/benchmark/generate/benchmark_systems/SIAN

@parisseb
Copy link

Is there a limitation to the number of threads? I tried cyclic9 with julia -t16 julia_c9 where the file julia_c9 content is below. The message "[ Info: Using 16 threads." is displayed, but checking with top I only get CPU %800.
`using Groebner, AbstractAlgebra, Base.Threads
@info "Using $(nthreads()) threads."
R, (x1, x2, x3,x4,x5,x6,x7,x8,x9) = QQ["x1","x2","x3","x4","x5","x6","x7","x8","x9"]
c9=[x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9, x1x2 + x2x3 + x3x4 + x4x5 + x5*x6 \

  • x6x7 + x7x8 + x1x9 + x8x9, x1x2x3 + x2x3x4 + x3x4x5 + x4x5x6 + x5x6x7 \
  • x6x7x8 + x1x2x9 + x1x8x9 + x7x8x9, x1x2x3x4 + x2x3x4x5 + x3x4x5x6 +
    x4
    x5x6x7 + x5x6x7x8 + x1x2x3x9 + x1x2x8x9 + x1x7x8x9 + x6x7x8x9, x1
    x2x3
    x4x5 + x2x3x4x5x6 + x3x4x5x6x7 + x4x5x6x7x8 + x1x2x3x4x9 + x1
    x2x3x8x9 + x1x2x7x8x9 + x1x6x7x8x9 + x5x6x7x8x9, x1x2x3x4x5x6 + x2
    x3x4x5x6x7 + x3x4x5x6x7x8 + x1x2x3x4x5x9 + x1x2x3x4x8x9 + x1x2x3
    x7x8x9 + x1x2x6x7x8x9 + x1x5x6x7x8x9 + x4x5x6x7x8x9, x1x2x3x4x5*
    x6x7 + x2x3x4x5x6x7x8 + x1x2x3x4x5x6x9 + x1x2x3x4x5x8x9 + x1x2x3
    x4x7x8x9 + x1x2x3x6x7x8x9 + x1x2x5x6x7x8x9 + x1x4x5x6x7x8x9 + x3
    x4x5x6x7x8x9, x1x2x3x4x5x6x7x8 + x1x2x3x4x5x6x7x9 + x1x2x3x4x5
    x6x8x9 + x1x2x3x4x5x7x8x9 + x1x2x3x4x6x7x8x9 + x1x2x3x5x6x7x8*x9\
  • x1x2x4x5x6x7x8x9 + x1x3x4x5x6x7x8x9 + x2x3x4x5x6x7x8x9, x1x2*
    x3x4x5x6x7x8x9 - 1]
    @time G=groebner(c9,ordering=DegRevLex(),threaded=:yes);
    `

Also do you have any idea of the RAM required to run cyclic9 or any Groebner basis? I'm not a julia user, just interested in comparing your code with my own gbasis code inside giac. It seems julia does memory allocation with garbage collecting,therefore it's somewhat hard to know the real memory required, I see with top about 20G of RAM for cyclic9 (RES, 30G VIRT), but it might be an overestimation.

@sumiya11
Copy link
Owner

sumiya11 commented Feb 17, 2024

Hi @parisseb ,

There is a limitation on the number of threads: at most 8 threads are used during the first multi-modular batches.
Eventually, after several iterations, the batch size increases (so that >8 threads become relevant).

This is done 1) to not overshoot the required number of primes by a lot, and 2) bearing in mind that many threads on a single process may be not very Julia-friendly because of many memory allocations and garbage collection.

Also do you have any idea of the RAM required to run cyclic9 or any Groebner basis?

I also observe maximum RSS about 26 GB for cyclic9.
Edit: I use the command Sys.maxrss() in Julia.
Though I note that we haven't measured memory usage systematically yet (or optimized for it).

I'm not a julia user, just interested in comparing your code with my own gbasis code inside giac.

That would be very interesting to see! Please do let me know what you find out :-).

Feel free to let me know if you have other questions.

@sumiya11
Copy link
Owner

sumiya11 commented Feb 17, 2024

For what it's worth, cyclic9 takes 33 minutes on i9-13900 with 8 threads with Groebner.jl

@parisseb
Copy link

My best timing for cyclic9 with giac (on a Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz) is 1300 seconds with 32 threads (16*2 threads for 16 parallel modular computations and 2 threads by modular computation). Total memory used with giac is 8.3G (giac has optimizations for memory usage). Groebner.jl takes 2050s with 8 threads on the same server, and 17.3G of RAM. This is not representative: while RAM usage is in favor of giac in all situtations speed is not, your code is most of the time faster, that's the reason why I'm interested to compare.

I found some tricks that I can probably implement in giac. The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl. The next trick is vectorization for rational Groebner basis computation, by computing 4 primes at the same time, you are taking full advantage of SIMD instructions, without paying too much for memory. I will certainly have a look if I can improve giac speed with SIMD instructions...

If you are interested in giac (e.g. to include it in your very nice automatic benchmarking system), the latest source code is available here
https://www-fourier.univ-grenoble-alpes.fr/~parisse/giac/giac-1.9.0.tar.bz2
(I also maintain a deb package frequently and popular Linux distros have also packages but they are sometimes outdated, but compilation of giac should be straightforward).
There are some gbasis benchmarks files in examples/groebner, in short the syntax is very similar to (old) maple syntax (with :; instead of : for running a command without display). Multi-threading is controlled by commands like threads:=16; gbasis_simult_primes(8); (means 16 threads are used, 8 parallel modular computations, 16/8=2 threads per prime).
You can get intermediate info logging with commands like debug_infolevel:=1 or 2 or ... (or from the shell export GIAC_DEBUG=1).

@sumiya11
Copy link
Owner

@parisseb Thanks for the info! I have not used giac before, I will check it out.

My best timing for cyclic9 with giac (on a Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz) is 1300 seconds with 32 threads (16*2 threads for 16 parallel modular computations and 2 threads by modular computation).

Do I understand you correctly that giac computes for different primes in parallel, and also at the same time parallelizes some parts (linear algebra?) for each prime ?

BTW, with 8 threads, Groebner.jl uses batches of size 32 (8 threads * 4 primes at a time), this could maybe explain high memory usage.

The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl.

This is a very cool trick indeed. You can also find a description of this probabilistic linear algebra in Section 2 of "An Algorithm For Splitting Polynomial Systems Based On F4" by Monagan & Pearce.

I might also have a couple of questions about F4 in giac, could you tell me where it is better to ask them ?

@parisseb
Copy link

Do I understand you correctly that giac computes for different primes in parallel, and also at the same time parallelizes some parts (linear algebra?) for each prime ?

Yes; that's what I do. Linear algebra is parallelized, this does not cost much additionnal memory (a little bit for preallocation to avoid memory locks between threads).

BTW, with 8 threads, Groebner.jl uses batches of size 32 (8 threads * 4 primes at a time), this could maybe explain high memory usage.

From my understanding, the largest matrix you build for cyclic9 has 100e6 non zero coefficients, and in what you call part A/B of the matrix, there are repetitions, it's mainly in part D where all coefficients are distincts, here you have 20M non zero coefficients at most. That's 160M bytes for 1 prime (4+4 bytes per coeff), or 360M bytes for 4 primes (4+32 bytes per coeff). Add about 400M bytes for the indices of the whole matrix, we should have about 800M for 1 thread for this matrix, say 1G by thread, 8G for 8 threads + additional memory for chinese remaindering and storing the basis, you can probably optimize a little bit memory footprint,perhaps to 10G.

The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl.

This is a very cool trick indeed. You can also find a description of this probabilistic linear algebra in Section 2 of "An Algorithm For Splitting Polynomial Systems Based On F4" by Monagan & Pearce.

This is nice, but it won't help much for rational coefficients, because all blocks reducing to 0 will already be cancelled by learning.

I might also have a couple of questions about F4 in giac, could you tell me where it is better to ask them ?

On Xcas forum https://xcas.univ-grenoble-alpes.fr/forum/, or by email, or we can open an comparison issue here.

@sumiya11
Copy link
Owner

Now Groebner.jl by default uses multithreading if julia is started with --threads=N, N > 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants