Compute basis in parallel #42
Hello! I was wondering if Groebner.jl currently supports parallel computation of Groebner bases?
Hi, currently there is no support for parallel computation (but it is on the TODO list 😄).
@sumiya11 thanks for your answer!
Hi @iliailmer, if you are still interested, there is now PR #46 for parallel GB over the rationals, and the results there look quite promising. Let me know if you have specific challenging systems over the rationals for which you want to compute a Groebner basis. If you are able to share them, it could perhaps help me optimize the parallel computation a bit.
@sumiya11 Thank you for letting me know! Here is a link to a gist with 3 systems that are pretty challenging. Please let me know if the link works for you.
Thanks very much for sharing! These are over a finite field; I wonder if for these systems computing the GB of the original problem over the rationals would also make sense?
Oh, I think it's my mistake; I think I copied the incorrect definition. I will change them.
@sumiya11 Here is the updated link: https://gist.github.com/iliailmer/2a8b0a4a689c755a65dba638dc205a49
Hi @iliailmer, Groebner.jl v0.6+ has an option to use multi-threading internally: groebner(sys, threaded=:yes). Note that it is still quite experimental, and you may observe no speedup for smaller problems. :^)
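A minimal sketch of how that option can be invoked, assuming a small system built with AbstractAlgebra (the ring and polynomials below are placeholders chosen only for illustration):

```julia
using AbstractAlgebra, Groebner

# Toy input; any list of polynomials accepted by Groebner.jl works here.
R, (x, y, z) = polynomial_ring(QQ, ["x", "y", "z"])
sys = [x*y + z, y*z + x, x + y + z - 1]

# Ask Groebner.jl to use multi-threading internally (experimental in v0.6+);
# Julia itself must be started with more than one thread, e.g. `julia -t 8`.
gb = groebner(sys, threaded=:yes)
```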
Also, I have a question: is it correct that the systems you have shared above are similar to the ones from the paper "Obtaining weights for Gröbner basis computation in parameter identifiability problems"? Thanks!
Thank you.
Yes, that's right.
Cool. I use the systems that you provided for benchmarking, and I wish to mention this fact in my preprint about Groebner.jl; would that be alright with you? I have another question: would it be possible to obtain the other systems from that paper, such as HPV, COVID m1, etc.?
Thanks a lot! Just in case, I made a bundle of systems from the available examples in SIAN: https://github.com/sumiya11/Groebner.jl/tree/master/benchmark/generate/benchmark_systems/SIAN
Is there a limitation on the number of threads? I tried cyclic9 with julia -t16 julia_c9, where the content of the file julia_c9 is below. The message "[ Info: Using 16 threads." is displayed, but checking with top I only see about 800% CPU.
Also, do you have any idea of the RAM required to run cyclic9, or any Groebner basis computation? I'm not a Julia user, just interested in comparing your code with my own gbasis code inside giac. It seems Julia does memory allocation with garbage collection, so it is somewhat hard to know the real memory required; with top I see about 20G of RAM for cyclic9 (RES, 30G VIRT), but it might be an overestimate.
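(The julia_c9 script itself is not reproduced in this excerpt. Purely as a hypothetical stand-in, a driver of that kind, with cyclic-9 written out by hand over the rationals, might look roughly like this; the construction below is an illustration, not the original file, and the computation is heavy, as the replies below discuss.)

```julia
using AbstractAlgebra, Groebner

# Build the classical cyclic-9 system over the rationals.
n = 9
R, x = polynomial_ring(QQ, ["x$i" for i in 1:n])
sys = [sum(prod(x[mod1(i + k, n)] for k in 0:d-1) for i in 1:n) for d in 1:n-1]
push!(sys, prod(x) - 1)

@info "Using $(Threads.nthreads()) threads."
# Run as e.g. `julia -t16 julia_c9`; expect a long runtime and tens of GB of RAM.
gb = groebner(sys, threaded=:yes)
```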
Hi @parisseb, there is a limitation on the number of threads: at most 8 threads are used during the first multi-modular batches. This is done 1) to not overshoot the required number of primes by a lot, and 2) bearing in mind that many threads in a single process may not be very Julia-friendly because of the many memory allocations and garbage collection.
I also observe a maximum RSS of about 26 GB for cyclic9.
That would be very interesting to see! Please do let me know what you find out :-). Feel free to let me know if you have other questions.
For what it's worth, cyclic9 takes 33 minutes on an i9-13900 with 8 threads with Groebner.jl.
My best timing for cyclic9 with giac (on an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz) is 1300 seconds with 32 threads (16*2 threads: 16 parallel modular computations, with 2 threads per modular computation). Total memory used by giac is 8.3G (giac has optimizations for memory usage). Groebner.jl takes 2050s with 8 threads on the same server, and 17.3G of RAM. This is not representative: while RAM usage is in favor of giac in all situations, speed is not; your code is faster most of the time, and that's the reason I'm interested in comparing.

I found some tricks that I can probably implement in giac. The first one is what you call probabilistic linear algebra for modular Groebner bases: cyclic9 modulo a prime is more than 2x faster than with linalg=:deterministic in Groebner.jl. The next trick is vectorization for rational Groebner basis computation: by computing 4 primes at the same time, you are taking full advantage of SIMD instructions without paying too much in memory. I will certainly have a look at whether I can improve giac's speed with SIMD instructions.

If you are interested in giac (e.g. to include it in your very nice automatic benchmarking system), the latest source code is available here.
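For concreteness, a small sketch of switching between the default (probabilistic) linear algebra and the linalg=:deterministic option mentioned above; the toy system and prime below are placeholders, and the 2x gap quoted above refers to cyclic9, not to anything this small:

```julia
using AbstractAlgebra, Groebner

R, (x, y, z) = polynomial_ring(GF(65537), ["x", "y", "z"])
sys = [x^2*y - z^2, y^2*z - x^2, z^2*x - y^2]

@time groebner(sys)                         # default: probabilistic linear algebra
@time groebner(sys, linalg=:deterministic)  # deterministic linear algebra
```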
@parisseb Thanks for the info! I have not used giac before, I will check it out.
Do I understand you correctly that giac computes for different primes in parallel, and at the same time also parallelizes some parts (linear algebra?) within each prime? BTW, with 8 threads, Groebner.jl uses batches of size 32 (8 threads * 4 primes at a time), which could maybe explain the high memory usage.
This is a very cool trick indeed. You can also find a description of this probabilistic linear algebra in Section 2 of "An Algorithm For Splitting Polynomial Systems Based On F4" by Monagan & Pearce. I might also have a couple of questions about F4 in giac; could you tell me where it is best to ask them?
Yes, that's what I do. Linear algebra is parallelized; this does not cost much additional memory (a little bit for preallocation, to avoid memory locks between threads).
From my understanding, the largest matrix you build for cyclic9 has 100e6 non-zero coefficients, and in what you call parts A/B of the matrix there are repetitions; it is mainly in part D that all coefficients are distinct, and there you have at most 20M non-zero coefficients. That's 160M bytes for 1 prime (4+4 bytes per coefficient), or 360M bytes for 4 primes (4+32 bytes per coefficient). Add about 400M bytes for the indices of the whole matrix, and we should have about 800M for 1 thread for this matrix, say 1G per thread, i.e. 8G for 8 threads, plus additional memory for Chinese remaindering and storing the basis. You can probably optimize the memory footprint a little bit, perhaps to 10G.
This is nice, but it won't help much for rational coefficients, because all blocks reducing to 0 will already be cancelled by learning (see the sketch after this comment).
On the Xcas forum https://xcas.univ-grenoble-alpes.fr/forum/, by email, or we can open a comparison issue here.
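As an aside on the learning mentioned above: this presumably refers to a learn & apply strategy in the multi-modular run. Groebner.jl appears to expose a similar mechanism through groebner_learn / groebner_apply!; a rough sketch under the assumption that this interface behaves as documented, with a placeholder system and prime:

```julia
using AbstractAlgebra, Groebner

R, (x, y) = polynomial_ring(GF(65537), ["x", "y"])
sys = [x^2 + y^2 - 1, x*y - 1]

# Learn: run F4 once and record which reductions actually contribute.
trace, gb = groebner_learn(sys)

# Apply: replay the recorded computation on a structurally identical input;
# in a multi-modular run this would be the same system modulo another prime.
flag, gb2 = groebner_apply!(trace, sys)
```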
Now Groebner.jl by default uses multi-threading if Julia is started with --threads=N, N > 1.
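So with a recent version it should be enough to start Julia with several threads; a minimal sketch (the script name and system are placeholders):

```julia
# Run as e.g. `julia --threads=8 script.jl`; with N > 1 threads available,
# groebner() uses multi-threading by default, no extra keyword needed.
using AbstractAlgebra, Groebner

@info "Julia threads: $(Threads.nthreads())"
R, (x, y) = polynomial_ring(QQ, ["x", "y"])
gb = groebner([x^3*y - 1, x*y^2 + x])
```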