Fortran code and python runners for SPH
This code was written by me during my PhD at MoCA.
The code is meant to be self-explanatory.
Clearly, it is not.
The goal, however, was to write readable, Python-like, almost-human-like code in the runners.
All runners are based on playground/runner/fortdriver.
- FortDriver reads all the constants and enumerators from const.f90 (see the sketch after this list)
- FortDriver compiles the SPH engine
- FortDriver runs the initial setup of the SPH engine according to Context.setup
- FortDriver can access all the results of a run and modify the dumps as you like
- Additional runs after the dumps have been modified? Sure thing.
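To give an idea of what FortDriver parses, here is a minimal sketch of the constants-as-enumerators pattern; every identifier below is illustrative, not the actual content of const.f90.

module const_sketch
  implicit none
  ! Hypothetical enumerators in the spirit of const.f90; the real names differ.
  ! Equation sets
  integer, parameter :: eqs_hydro     = 100
  integer, parameter :: eqs_mhd       = 101
  integer, parameter :: eqs_diffusion = 102
  ! Border conditions
  integer, parameter :: ebc_free      = 200
  integer, parameter :: ebc_periodic  = 201
  integer, parameter :: ebc_mirror    = 202
  integer, parameter :: ebc_fixed     = 203
end module const_sketch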
Typical examples are the convergence of anisotropic diffusion on a glass-like lattice for different methods and initial conditions (paper.pdf), and flux-limited diffusion of radiation on a shock tube.
It is possible to run the code without FortDriver in backward-compatibility mode, but that requires a lot of hardcoded lines and ambiguous execute commands, for example:
>> make kernel=m6 useomp=t debug=f
>> ./execute --xmax 1.0 --nsnapshots 10.0 --silent no --au 1.0
--equations hydro --influencefile influence.info
--xmin -1.0 --artts yes --adden yes --resolution 512
--hfac 1.0 --coordsys cartesian --ics shock12
--resultfile result.info --process relaxation
--usedumps yes --dim 2.0 --tfinish 20.0 --ddw fab
- N-dimensional SPH in two loops
- a bunch of kernels that can be chosen at compile time only (FortDriver can handle this) (playground/src/kernel)
- a bunch of second-derivative methods that can be chosen at run time and are implemented with function pointers (playground/src/kernel/kernel.f90); see the sketch after this list
- close-packed, glass-like and, of course, uniform initial lattices (playground/src/IC)
- a bunch of specific initial conditions, which are deprecated now that it is the age of FortDriver (playground/src/IC/IC.f90)
- own implementations of appendable arrays, (python-like) lists, dictionaries and others
- free, periodic, mirroring and fixed border conditions (playground/src/BC.f90)
- hydrodynamics, magnetohydrodynamics, diffusion, radiative hydrodynamics (playground/src/circuit2.f90)
- stateful HDF5 dumps and restoring a stopped/modified simulation (playground/src/state.f90)
- Leapfrog and Super-Time-Stepping integrators (playground/src/sts_integrator.f90)
- KDTree based neighbour search (playground/src/utils/neighboursearch.f90)
- Cartesian and cylindrical coordinate systems
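As for the run-time choice of the second-derivative method, the sketch below shows the procedure-pointer mechanism in its simplest form; apart from 'fab' (which appears in the execute command above), the names and the formulas are placeholders, not the actual code from playground/src/kernel/kernel.f90.

module ddw_sketch
  implicit none
  abstract interface
    pure real(8) function ddw_t(r, h)
      real(8), intent(in) :: r, h
    end function ddw_t
  end interface
  ! the rest of the engine calls ddw(...) without knowing which method is active
  procedure(ddw_t), pointer :: ddw => null()
contains
  pure real(8) function ddw_fab(r, h)
    real(8), intent(in) :: r, h
    ddw_fab = (1d0 - r/h)/h**2        ! placeholder formula
  end function ddw_fab

  pure real(8) function ddw_direct(r, h)
    real(8), intent(in) :: r, h
    ddw_direct = (1d0 - r/h)**2/h**2  ! placeholder formula
  end function ddw_direct

  subroutine select_ddw(name)
    character(len=*), intent(in) :: name
    select case (name)
    case ('fab')
      ddw => ddw_fab
    case default
      ddw => ddw_direct
    end select
  end subroutine select_ddw
end module ddw_sketch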
The code was implemented only for research purposes.
There are plenty more fish in the sea.
M_{6/2} kernel for Phantom
sph_playground/kernel_zipm6.f90
A simple test which checks that in Fortran it is indeed up to 30% faster to use *0.25 rather than /4.
sph_playground/test_4vs025.f90
Compiled with gfortran -O4 -Wall test_4vs025.f90 -fopenmp -o test_4vs025
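This is not the actual test_4vs025.f90, just a minimal sketch of the idea: sum the same array once with /4 and once with *0.25 and compare the wall-clock times (the sketch uses omp_get_wtime, hence -fopenmp).

program div_vs_mul
  use omp_lib, only: omp_get_wtime
  implicit none
  integer, parameter :: n = 10**7
  real(8), allocatable :: a(:)
  real(8) :: t0, t1, t2, s1, s2
  integer :: i

  allocate(a(n))
  call random_number(a)

  ! same reduction loop twice: division by 4, then multiplication by 0.25
  s1 = 0d0
  t0 = omp_get_wtime()
  do i = 1, n
    s1 = s1 + a(i)/4d0
  end do
  t1 = omp_get_wtime() - t0

  s2 = 0d0
  t0 = omp_get_wtime()
  do i = 1, n
    s2 = s2 + a(i)*0.25d0
  end do
  t2 = omp_get_wtime() - t0

  ! printing the sums keeps the compiler from dropping the loops
  print *, 'division:       ', t1, ' s (sum =', s1, ')'
  print *, 'multiplication: ', t2, ' s (sum =', s2, ')'
end program div_vs_mul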
sph_playground/test_soa_aos.f90
Compiled with gfortran -O4 -Wall test_soa_aos.f90 -fopenmp -o test_soa_aos
This test is simple, which is why the code is just horrible boilerplate.
The idea of the test is to determine which memory layout is the fastest for SPH in Fortran.
We iterate over the array of "particles" with the number of particles N=10^4...10^12.
For each particle, we store mass, force, acceleration, velocity, and position in 3D. All are randomly generated numbers outside of the time measure regions.
For each particle we do some "physics" (by the way, this is the example for the winner -- AOS):
P(i)%Acceleration%x = P(i)%Force%x / P(i)%Mass
P(i)%Acceleration%y = P(i)%Force%y / P(i)%Mass
P(i)%Acceleration%z = P(i)%Force%z / P(i)%Mass
P(i)%Velocity%x = P(i)%Velocity%x + P(i)%Acceleration%x * dt
P(i)%Velocity%y = P(i)%Velocity%y + P(i)%Acceleration%y * dt
P(i)%Velocity%z = P(i)%Velocity%z + P(i)%Acceleration%z * dt
P(i)%Position%x = P(i)%Position%x + P(i)%Velocity%x * dt
P(i)%Position%y = P(i)%Position%y + P(i)%Velocity%y * dt
P(i)%Position%z = P(i)%Position%z + P(i)%Velocity%z * dt
The main reason for such "physics" is to enforce the interconnection of properties and simultaneous read/write operations for each particle.
We check sequential access (i = 1...N) and random access. For random access, we create a randomly shuffled array of indexes(1...N) that are visited during the main loop; we then iterate over the indexes in that array, so the number of operations is O(N) again.
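A short sketch of the random-access setup, assuming a Fisher-Yates shuffle of the index array (the actual test may build it differently); the shuffle happens outside the timed region.

program random_access_sketch
  implicit none
  integer, parameter :: n = 10
  integer :: idx(n), i, j, tmp
  real(8) :: r

  ! build idx = 1..N, then shuffle it in place (outside any timed region)
  do i = 1, n
    idx(i) = i
  end do
  do i = n, 2, -1
    call random_number(r)
    j = 1 + int(r*i)              ! uniform position in 1..i
    tmp = idx(i); idx(i) = idx(j); idx(j) = tmp
  end do
  print *, idx

  ! the timed update loop then visits particle idx(i) instead of i, e.g.
  ! do i = 1, n
  !   k = idx(i)
  !   P(k)%Acceleration%x = P(k)%Force%x / P(k)%Mass
  !   ...
  ! end do
end program random_access_sketch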
The memory layouts are (a sketch of the corresponding Fortran declarations follows this list):
SOA (1 loop) -- Structure Of Arrays.
Storage {
mass(1...N)
force_x(1...N)
...
position_z(1...N)
}
SOA (3 loops) -- same as SOA (1 loop), but the particle update uses three smaller main loops. Because why not.
AOS -- Array Of Structures.
Storage(1...N) = [
Particle {
mass
force {
x
y
z
}
...
position {
...
}
}
]
NDA -- N-Dimensional Arrays
Acceleration(x...z, 1...N) = Force(x...z, 1...N)/Mass(1...N)
Velocity(x...z, 1...N) = Velocity(x...z, 1...N) + Acceleration(x...z, 1...N)*dt
Position(x...z, 1...N) = Position(x...z, 1...N) + Velocity(x...z, 1...N)*dt
SAR -- Super-Array, where the second index is the number of a particle and the first index is the number of a property (mass, force_x, ... position_z)
Storage(1...13, 1...N)
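For reference, the four layouts can be declared in Fortran roughly as follows; the names are shortened and the actual declarations in test_soa_aos.f90 may differ.

module layouts_sketch
  implicit none

  ! SOA -- one array per property
  type soa_storage
    real(8), allocatable :: mass(:)
    real(8), allocatable :: force_x(:), force_y(:), force_z(:)
    real(8), allocatable :: accel_x(:), accel_y(:), accel_z(:)
    real(8), allocatable :: vel_x(:),   vel_y(:),   vel_z(:)
    real(8), allocatable :: pos_x(:),   pos_y(:),   pos_z(:)
  end type soa_storage

  ! AOS -- an array of per-particle structures
  type vec3
    real(8) :: x, y, z
  end type vec3
  type particle
    real(8)    :: mass
    type(vec3) :: force, acceleration, velocity, position
  end type particle
  type(particle), allocatable :: p_aos(:)          ! (1:N)

  ! NDA -- plain multi-dimensional arrays, one per property
  real(8), allocatable :: mass_nda(:)              ! (1:N)
  real(8), allocatable :: force_nda(:,:)           ! (3, 1:N)
  real(8), allocatable :: accel_nda(:,:)           ! (3, 1:N)
  real(8), allocatable :: vel_nda(:,:), pos_nda(:,:)

  ! SAR -- one "super-array": first index is the property, second the particle
  real(8), allocatable :: storage_sar(:,:)         ! (13, 1:N)
end module layouts_sketch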