
Reducing particles from a set of IDs #21

Closed
KseniaBastrakova opened this issue Sep 22, 2020 · 47 comments

@KseniaBastrakova
Collaborator

KseniaBastrakova commented Sep 22, 2020

Currently, the library only supports selecting a particle species for reduction. There was a request to add functionality that reduces the number of particles only within a selected subset.

As a small feature, this could be an additional script that takes as an argument a file (in .dat format) containing the set of particle indices to which the reduction should be applied. The algorithm would then only be applied to the particles in the openPMD file with the selected indices.
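
A minimal sketch of the selection step such a script could perform (the file names, the group layout, and the use of np.loadtxt / np.isin are illustrative assumptions, not an existing interface of the reductor):

    import numpy as np
    import h5py

    # hypothetical inputs: a .dat file with one particle ID per line, and an openPMD HDF5 file
    selected_ids = np.loadtxt("selected_particles.dat", dtype=np.uint64)

    with h5py.File("simData_60000.h5", "r") as f:
        particle_ids = f["/data/60000/particles/en_all/particleId"][()]
        # boolean mask marking the particles the reduction should be applied to
        mask = np.isin(particle_ids, selected_ids)

    print("particles selected for reduction:", mask.sum())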

@alex-koe

After an offline discussion with @KseniaBastrakova and @PrometheusPi, I am using this approach:

  1. Copy the original file from PIConGPU, as it worked in Minimum working example #22
  2. Open the copied (!) file read-write, i.e., with h5py mode 'r+'
  3. Filter out everything but the wanted particles, e.g., the bunch particles within the FWHM of energy
    • This means: for the datasets
      {"particleId", "weighting",
      "momentum/x", "momentum/y", "momentum/z",
      "momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
      "position/x", "position/y", "position/z",
      "positionOffset/x", "positionOffset/y", "positionOffset/z"}
      delete all particles except for the wanted ones
    • make sure that the attributes of the above datasets are preserved
  4. Update the shape attribute of mass and charge to the new particle number
  5. Close the file and make sure that the data is successfully written

I then submitted a job as mentioned in #22.
It failed with:

Series.set_software_version is deprecated. Set the version with the second argument of Series.set_software
I == 0
...
I == 46
Traceback (most recent call last):
  File "reduction_main.py", line 822, in <module>
    base_reduction_function(args.hdf, args.hdf_re, "random", parameters, args.iteration)
  File "reduction_main.py", line 220, in base_reduction_function
    reduce_one_iteration(series_hdf, series_hdf_reduction, algorithm, iteration)
  File "reduction_main.py", line 206, in reduce_one_iteration
    process_iteration_group(algorithm, current_iteration, series_hdf, series_hdf_reduction, reduction_iteration)
  File "reduction_main.py", line 240, in process_iteration_group
    process_patches_in_group_v2(iteration.particles[name_group], series_hdf,
  File "reduction_main.py", line 686, in process_patches_in_group_v2
    data, dict_data_indexes, last_idx = get_data(series_hdf, particle_species, weights, idx_start, idx_end)
  File "reduction_main.py", line 642, in get_data
    if len(absolute_values[0]) == 1:
IndexError: index 0 is out of bounds for axis 0 with size 0

@alex-koe

I guess I should also update some other attributes/datasets when changing the particles. Could it be the particle patches?

@KseniaBastrakova
Collaborator Author

I guess I should also update some other attributes/datasets when changing the particles. Could it be the particle patches?

Yes, I think you missed the "unitSI" or "macroWeighted" attribute.
Could you send me a link to the file? It would be helpful for reproducing the issue.

@alex-koe

This is the path to the file on Taurus:
/scratch/ws/1/s6119265-simdata-exchange, where you will find the file simData_filtered_60000.h5. This is the file I got the error with.
I don't know if this is a good way to exchange files, but I hope it works. Do you have a Taurus account?

@sbastrakov
Member

Thanks @alex-koe.
Unfortunately, neither @KseniaBastrakova nor I have an account there. Could you transfer it to Hemera, please?

@alex-koe

@sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on hemera. It should be readable. Please copy the file for your usage.

@KseniaBastrakova
Collaborator Author

@sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on hemera. It should be readable. Please copy the file for your usage.

Thank you for providing the file.

I think I understand your problem:
in the file simData_filtered_60000.h5, the size of all particle datasets (momentum, charge, position, etc.) is 1705615.
You can check it, for example, with:

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particleId"]
    print(result.shape)

and the output is: (1705615,).
However, your numParticlesOffset dataset in the particle patches does not correspond to this: its values range from 0 up to about 1.30344e7, which is much bigger.

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particlePatches"]['numParticlesOffset'][0:193]
    print(result)

and the output is:

[       0    24357    29493    33502    56927    82990   104215   119799
   ...
 13029684 13029684 13029684 13029684 13029684 13032670 13033621 13034400]

numParticlesOffset has to be in the range [0, particle_species_size]. It indicates where each patch's chunk of particles starts.
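
A quick consistency check along these lines (assuming the same file and group layout as in the snippets above) would be:

    import h5py

    with h5py.File("simData_filtered_60000.h5", "r") as f:
        species = f["data"]["60000"]["particles"]["en_all"]
        n_particles = species["particleId"].shape[0]
        offsets = species["particlePatches"]["numParticlesOffset"][()]
        # every patch offset must point inside the (filtered) particle datasets
        print(n_particles, offsets.max(), bool(offsets.max() < n_particles))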

So, I can suggest two solutions:

  1. I can fix the file by changing 'numParticlesOffset' from the original version to one whose values only range from 0 to 1705615 (the patch records would be recomputed as in the sketch below)
  2. I can add an additional option to the reductor to skip the existing particle patches and create new ones
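
A sketch of what option 1 amounts to, assuming the original (unfiltered) patch records and a boolean keep mask over the original particle order are available (the function and variable names are illustrative, not part of the reductor):

    import numpy as np

    def rebuild_patch_records(num_particles_old, num_offset_old, keep):
        """Recompute numParticles / numParticlesOffset after particles were filtered out.

        num_particles_old, num_offset_old -- original patch records (1D integer arrays)
        keep -- boolean mask over the original particle order, True = particle survived
        """
        num_particles_new = np.array(
            [keep[off:off + n].sum() for off, n in zip(num_offset_old, num_particles_old)],
            dtype=np.uint64)
        # exclusive prefix sum: each patch starts where the previous one ends
        num_offset_new = np.concatenate(([0], np.cumsum(num_particles_new)[:-1])).astype(np.uint64)
        return num_particles_new, num_offset_new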

@sbastrakov
Member

So what may have gone wrong is that when removing all particles except the IDs you wanted to keep, the patch information was not updated correspondingly, @alex-koe. Not 100% sure, it just seems plausible. I think the easiest fix is then to remove the patch datasets altogether? (They are optional, and the reductor works when they are not present; it just does not work when they are present but wrong.)
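
For reference, removing the patch records with h5py is a one-liner; a minimal sketch, assuming the file and species names from the snippets above (particlePatches is an optional group in openPMD):

    import h5py

    with h5py.File("simData_filtered_60000.h5", "r+") as f:
        # delete the optional particlePatches group so the reductor ignores patch information
        del f["/data/60000/particles/en_all/particlePatches"]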

@alex-koe

alex-koe commented Oct 15, 2020

@sbastrakov I agree, the patch information was not updated. Just removing the patches is the easiest approach.
I deleted the patch datasets from the file and restarted the job.
The result was a segmentation fault. Does the script need to be adjusted?

@sbastrakov
Member

As I thought, no. I think @KseniaBastrakova will take a look.

@KseniaBastrakova
Collaborator Author

@alex-koe I think I reproduced your problem and fixed it.
Could you do
git fetch && git pull
and try the reduction once again?

@alex-koe

@KseniaBastrakova It is working! Great work.
And it is fast: it took only about 2min to run the job. :-)

@alex-koe

@sbastrakov @KseniaBastrakova I have only a minor question. When I read out the position dataset, it looks the same for the original and reduced particles, and the dataset is reduced in size. The weights seem to be OK, i.e., the sum of the weights gives the same charge. The average position and spread in position are consistent between the original and reduced set.

However, I do not understand the momentum:
This is the original set in (y, p_y):
[image]

This is the reduced set:
[image]
The distribution in y looks fine. The distribution in p_y seems to be scaled by a different factor.

For reading the momentum, I do the following:

""" label is one of {x,y,z}, species is *en*, ..."""
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]  # 
    u_unitSI = u.attrs["unitSI"]
    uy =  u_unitSI*np.array(u) /np.array(w)  /(const.m_e*const.c)   ## the normalized momenum gamma beta_i

Do i have to take special care here to get the correct momentum?

@sbastrakov
Member

Thanks for reporting. It could be that somewhere (not sure on whose side) the macroweighted flag is treated incorrectly.

@sbastrakov
Member

sbastrakov commented Oct 16, 2020

To expand on that thought: @alex-koe your way of computing the momentum is not suitable for general openPMD files, as the momentum record could in principle be stored per single particle or per macroparticle, and there is a special attribute that says which way was used. However, I assume you know how it is for your particular file (that the momentum is per macroparticle, as in PIConGPU output) and read it accordingly. So I think it is more likely that there is something fishy in the reductor, @KseniaBastrakova, e.g. maybe the reductor writes it out with a different macroweighted convention?

@alex-koe

@sbastrakov so what would then be the correct way? I used pretty much the same approach that gave me the correct momentum in all previous simulations and never had an issue. Is the 'reduced' file somehow in a different format?

@alex-koe

alex-koe commented Oct 22, 2020

This is another thing I tried out, the (x, ux) space with the random reductor:
[image]
These are about 5001 particles, shown as a scatter plot.

@KseniaBastrakova
Collaborator Author

@alex-koe thank you for the details. I think the problem is on my side: I found a bug that can cause this behavior. I will fix it today.

@KseniaBastrakova
Collaborator Author

@alex-koe , now it works better. Could you do
git fetch && git pull and try once again?

@alex-koe

@KseniaBastrakova I tried it again. It is now different :-D
[image]
[image]
The momentum and the (x, ux) space look strange.

@KseniaBastrakova
Collaborator Author

KseniaBastrakova commented Oct 28, 2020

@alex-koe So, how do you build the graph, and which simulation do you use? Maybe it will help me, because I can't reproduce the problem.
I just tried to build two graphs, before and after reduction, with this:


import h5py
import numpy as np
import matplotlib.pyplot as plt


def print_graph():
    f = h5py.File('simData_beta_60000.h5', 'r')
    timestep = 60000
    species = 'en_all'
    m_e = 9.1e-31          # electron mass [kg]
    c = 299792458.0        # speed of light [m/s]
    label = "y"
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]
    u_unitSI = u.attrs["unitSI"]
    uy = u_unitSI * np.array(u) / np.array(w) / (m_e * c)
    plt.scatter(u, uy, marker='x')
    plt.xlabel('y [um]')
    plt.ylabel("p_y [MeV/e]")
    plt.show()

First, simData_beta_60000 before reduction:
[image]

and after reduction with a ratio of deleted particles of 0.1:
[image]

The graphs look similar, so I can't reproduce the problem.

@alex-koe

So, how do you build the graph, and which simulation do you use? Maybe it will help me, because I can't reproduce the problem.
I just tried to build two graphs, before and after reduction, with this:

I put the submit script, the Jupyter notebook I used for the plots shown, and the simulation files (input and output) into the tar archive for-Ksenia.tar.bz2 on /bigdata/hplsim/production/LWFA_betatron_decoherence/ (on Hemera), so you can follow what I did. A major difference from your data is that I used a much larger ratio of deleted particles, close to one. I don't know if this could be causing something.

@sbastrakov
Member

I don't think a large ratio should be a problem. It would certainly make the sampling much worse, but if that's acceptable, it should not otherwise be an issue.

@KseniaBastrakova
Collaborator Author

@alex-koe , could you give me access to for-Ksenia.tar.bz2? I just don't have the rights to copy it

@alex-koe

alex-koe commented Nov 3, 2020

@alex-koe , could you give me access to for-Ksenia.tar.bz2? I just don't have the rights to copy it

@KseniaBastrakova you now have access rights to the file.

@KseniaBastrakova
Collaborator Author

@alex-koe, thank you for the scripts and the sample. They are really useful.

The problem is that the momenta are "macroweighted", which means that we store the total momentum of all the electrons in a macroparticle
(weight * single_particle_momentum).

So, if you delete a ratio of 0.9: the starting number of macroparticles is 13037395 and the remaining number is 1303739, so we deleted 11733655 macroparticles. Now we need to redistribute their weight among the remaining macroparticles, which means we add about 1300 weight to each remaining macroparticle (each surviving weighting grows by roughly a factor of 10).
And because we store the momentum as momentum * weight_of_macroparticle, we get such big momenta.
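
A small self-contained illustration of this point, using synthetic numbers rather than data from the actual file: when ~90% of the macroparticles are deleted, the surviving weightings grow by roughly a factor of 10, and any macroweighted record has to be rescaled by the same factor to stay consistent; a mismatch in this handling shows up as momenta that are off by about that factor.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.uniform(100.0, 200.0, size=10_000)      # macroparticle weightings
    p = rng.normal(0.0, 1.0, size=10_000) * w       # macroweighted momenta (per-macroparticle totals)

    keep = rng.random(10_000) > 0.9                 # randomly delete ~90% of the macroparticles
    scale = w.sum() / w[keep].sum()                 # redistribute the removed weight (~10x here)
    w_new = w[keep] * scale
    p_new = p[keep] * scale                         # macroweighted records must be rescaled too

    print(w.sum(), w_new.sum())                     # total weight (i.e. charge) is conserved
    print((p / w).mean(), (p_new / w_new).mean())   # per-particle momentum stays comparable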

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

@sbastrakov
Member

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

@KseniaBastrakova
Collaborator Author

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

I can make an additional script to fix it, as a temporary solution.

@alex-koe

alex-koe commented Nov 5, 2020

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

I can make an additional script to fix it, as a temporary solution.

Sorry, maybe I do not understand correctly: is it not possible to redistribute the weighting after the reduction process? I would like the total physical quantities, i.e. energy, momentum, charge, and mass, to be conserved. That is, if I have 100 pC of charge in the original file and the average momentum is 100 MeV/c, then the reduced file should have the same charge and average momentum, just with a reduced number of macroparticles.

@sbastrakov
Member

sbastrakov commented Nov 5, 2020

@alex-koe let me reiterate. Of course, it is possible to remove some macroparticles and redistribute their weights among the others so that the physical conservation laws hold, and that's the goal of the reduction.

The point @KseniaBastrakova was trying to make was merely that in your file the momenta are stored per macroparticle and not per physical particle. The openPMD standard has a special property for more or less each piece of particle data that says whether it is for a real or a macro-particle; this property is called macroWeighted. (E.g. PIConGPU outputs momenta per macro-particle.) The reduction script supports both cases, or at least is supposed to. So the value of macroWeighted determines whether the momenta and other affected data should be rescaled when a weighting changes. And of course the post-processing should take care of both options regardless of reduction. So when both the reduction and the post-processing handle macroWeighted correctly, there should be no violation of momentum conservation, etc., with any parameters of the reduction.
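
As an illustration, a macroweighted-aware read with plain h5py could look like the sketch below (assuming the file layout used earlier in this thread; in openPMD the record-level attributes are named macroWeighted and weightingPower):

    import h5py
    import numpy as np
    import scipy.constants as const

    with h5py.File("simData_beta_60000.h5", "r") as f:
        species = f["/data/60000/particles/en_all"]
        u = species["momentum/y"]
        w = species["weighting"][()]
        p_si = u[()] * u.attrs["unitSI"]            # momentum in SI units

        record = species["momentum"]                # record-level openPMD attributes live here
        if record.attrs.get("macroWeighted", 0) == 1:
            # stored per macroparticle: divide out the weighting to get per-particle momentum
            p_si = p_si / w ** record.attrs["weightingPower"]

        uy = p_si / (const.m_e * const.c)           # normalized momentum gamma * beta_y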

@PrometheusPi
Member

Sorry, @KseniaBastrakova is there an issue with the reduction script or with the input data?
(All PIConGPU momenta are macro-weighted, so if this is the issue, we could not apply the reduction to our simulations.)
Sorry, I am a bit confused by the discussion and am just trying to understand the current status and the remaining problem.

@sbastrakov
Member

sbastrakov commented Nov 9, 2020

I am also not sure what the current status is. To reiterate my previous point: this reduction package does in principle support both macro-weighted and non-macro-weighted data. So if it produces incorrect results, this is a bug in the implementation that we should investigate and fix, but it is not a fundamental limitation.

If I understood @KseniaBastrakova's last point correctly, she did not see an obvious error in the last run, but I guess @alex-koe did. Maybe it is better to talk offline.

@KseniaBastrakova
Collaborator Author

Yes, I think it makes sense to talk offline. We definitely have a misunderstanding here.

@PrometheusPi
Member

Okay - then let's meet. Today is quite busy for me; how about tomorrow at 11 AM?

@sbastrakov
Member

Works fine with me

@KseniaBastrakova
Collaborator Author

For me too

@PrometheusPi
Member

PrometheusPi commented Nov 10, 2020

Just spoke with @alex-koe offline - a meeting tomorrow at 11 AM works for him as well.

@KseniaBastrakova
Collaborator Author

Thanks for the really useful examples and scripts!
I think I made a fix.
Here is the distribution of the normalized momentum u_unitSI * np.array(u) / np.array(w) / (const.m_e * const.c)
before reduction:
[image: XY_before_reduction]
[image: YZ_before_reduction]

and after reduction (ratio of deleted particles = 0.9):
[image: XY_after_reduction]
[image: YZ_after_reduction]

Please do
git fetch && git pull to get the fix.

@PrometheusPi
Member

@KseniaBastrakova Great work! 👍 The momenta look correct. Could you please do a plt.hist2d together with a plt.colorbar() instead of a scatter plot? This would validate the weighting as well.

@KseniaBastrakova
Collaborator Author

@PrometheusPi here are the (coordinate, momentum) graphs for each dimension before and after reduction (for the file simData_beta_60000.h5).

[image: Xp_x_before_reduction]

[image: Xp_x_after_reduction]

[image: Yp_y_before_reduction]

[image: Yp_y_after_reduction]

[image: Zp_z_before_reduction]

[image: Zp_z_after_reduction]

@PrometheusPi
Member

PrometheusPi commented Nov 16, 2020

@KseniaBastrakova Thanks - could you please use the weights parameter of the hist2d plot? This would ensure that the weightings scale as required. So far, hist2d assumes a weighting of 1 and thus only counts macroparticles (causing a factor of 10 difference).

@KseniaBastrakova
Collaborator Author

@PrometheusPi , I added the weights: H = ax.hist2d(z_position, z_momentum, weights=weights_reduction, bins=50) and got these graphs. They look pretty similar.

[image: Xp_x_before_reduction]

[image: Xp_x_after_reduction]

[image: Yp_y_before_reduction]

[image: Yp_y_after_reduction]

[image: Zp_z_before_reduction]

[image: Zp_z_after_reduction]

@PrometheusPi
Member

PrometheusPi commented Nov 17, 2020

@KseniaBastrakova Thank you, that looks really good! d^2Q/dydp_y looks perfect (y \in {x,y,z}).

@PrometheusPi
Member

@alex-koe Can you please try this with your data?

@alex-koe

@PrometheusPi Here is a comparison between the original (left col.) and reduced particle sets (right col.):
[image]
Sorry for not adding any axis descriptions, but they would be too small in the output. The rows are as follows: (y, uy), (x, ux), (z, uz). The data is taken from run 006. The only very minor detail I see is that the colorbar scaling (auto) is slightly different, but I guess that is more my issue of how to sample it correctly for such a small data set of N=5000 (right side).
@KseniaBastrakova you did great work.

@alex-koe

For documentation purposes, I copy the script for the particle filtering here:

""" 
needed Input:
+ timestep
+ filename_bunch_idendifiers
+ filename_filtered_file
""" 
###### init ####
import numpy as np
import scipy as sp
import h5py
import scipy.constants as const

timestep                              = 60000
filename_bunch_idendifiers = "bunch-identifiers.dat"
################ WARING ################
# The filtered file will be overwritten, copy this file instead from the original data
filename_filtered_file           = "simData_run006_filtered_{}.h5".format(timestep)

# read particle ids for filtering
ids = np.loadtxt(filename_bunch_idendifiers, dtype=np.uint64)

##### open h5 files 
filtered_file = h5py.File(filename_filtered_file, "r+")

h = filtered_file["/data/{}/particles/en_all".format(timestep)]
current_ids = h["particleId"][()]
m   = np.in1d(current_ids, ids)
if m.sum() != len(ids):
    print("ERR: requested IDs are not fully contained in H5 file. Abort.")
    exit

paths = ["particleId", "weighting",
             "momentum/x", "momentum/y", "momentum/z",
             "momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
             "position/x", "position/y", "position/z",
             "positionOffset/x", "positionOffset/y", "positionOffset/z"]
for p in paths:
    temp = h[p][m]
    temp_items = h[p].attrs.items()
    h.__delitem__(p)
    h[p]  = temp
    for i in temp_items:
        h[p].attrs.create(name=i[0], data=i[1])
for p in ["mass", 'charge']:
    temp = h[p].attrs['shape']
    temp[0] = np.sum(m)
    h[p].attrs['shape'] = temp

#### delete particle patches because reduction script does not process them correctly
filtered_file["/data/{}/particles/en_all/".format(timestep)].__delitem__('particlePatches')
filtered_file.close()

@alex-koe

alex-koe commented Nov 30, 2020

@KseniaBastrakova I think this issue can be closed. I just do not know how to do it ...
