
Reducing particles from a set of IDs #21

Closed
KseniaBastrakova opened this issue Sep 22, 2020 · 47 comments

@KseniaBastrakova
Collaborator

KseniaBastrakova commented Sep 22, 2020

Currently, the library only supports selecting a particle species for reduction. There was a request to add functionality that reduces the number of particles only within a selected subset.

As a small feature, this could be an additional script that takes as an argument a file (in .dat format) containing the set of particle indices to which the reduction should be applied. The algorithm would then only be applied to the particles in the openPMD file with the selected indices.
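
A minimal sketch of the selection step such a script could perform (the file names, the group layout, and the use of np.loadtxt / np.isin are illustrative assumptions, not an existing interface of the reductor):

    import numpy as np
    import h5py

    # hypothetical inputs: a .dat file with one particle ID per line, and an openPMD HDF5 file
    selected_ids = np.loadtxt("selected_particles.dat", dtype=np.uint64)

    with h5py.File("simData_60000.h5", "r") as f:
        particle_ids = f["/data/60000/particles/en_all/particleId"][()]
        # boolean mask marking the particles the reduction should be applied to
        mask = np.isin(particle_ids, selected_ids)

    print("particles selected for reduction:", mask.sum())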

@alex-koe

After an offline discussion with @KseniaBastrakova and @PrometheusPi, I am using this approach:

  1. Copy the original file from PIConGPU, as it worked in Minimum working example #22
  2. Open the copied (!) file read-write, i.e., with h5py mode 'r+'
  3. Filter out everything but the wanted particles, e.g., the bunch particles within the FWHM of energy
    • This means: for the datasets
      {"particleId", "weighting",
      "momentum/x", "momentum/y", "momentum/z",
      "momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
      "position/x", "position/y", "position/z",
      "positionOffset/x", "positionOffset/y", "positionOffset/z"}
      delete all particles except for the wanted ones
    • make sure that the attributes of the above datasets are preserved
  4. Update the shape attribute of mass and charge to the new particle number
  5. Close the file and make sure that the data is successfully written

I then submitted a job as mentioned in #22.
It failed with:

Series.set_software_version is deprecated. Set the version with the second argument of Series.set_software
I == 0
...
I == 46
Traceback (most recent call last):
  File "reduction_main.py", line 822, in <module>
    base_reduction_function(args.hdf, args.hdf_re, "random", parameters, args.iteration)
  File "reduction_main.py", line 220, in base_reduction_function
    reduce_one_iteration(series_hdf, series_hdf_reduction, algorithm, iteration)
  File "reduction_main.py", line 206, in reduce_one_iteration
    process_iteration_group(algorithm, current_iteration, series_hdf, series_hdf_reduction, reduction_iteration)
  File "reduction_main.py", line 240, in process_iteration_group
    process_patches_in_group_v2(iteration.particles[name_group], series_hdf,
  File "reduction_main.py", line 686, in process_patches_in_group_v2
    data, dict_data_indexes, last_idx = get_data(series_hdf, particle_species, weights, idx_start, idx_end)
  File "reduction_main.py", line 642, in get_data
    if len(absolute_values[0]) == 1:
IndexError: index 0 is out of bounds for axis 0 with size 0

@alex-koe

I guess I should also update some other attributes/datasets when changing the particles. Could it be the particle patches?

@KseniaBastrakova
Collaborator Author

I guess I should also update some other attributes/datasets when changing the particles. Could it be the particle patches?

Yes, I think you missed the "unitSI" or "macroWeighted" attribute.
Could you send me a link to the file? It would be helpful for reproducing the issue.

@alex-koe

This is the path to the file on Taurus:
/scratch/ws/1/s6119265-simdata-exchange, where you will find the file simData_filtered_60000.h5. This is the file I got the error with.
I don't know if this is a good way to exchange files, but I hope it works. Do you have a Taurus account?

@sbastrakov
Member

Thanks @alex-koe.
Unfortunately, neither @KseniaBastrakova nor I have an account there. Could you transfer it to Hemera, please?

@alex-koe

@sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on hemera. It should be readable. Please copy the file for your usage.

@KseniaBastrakova
Collaborator Author

@sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on hemera. It should be readable. Please copy the file for your usage.

Thank you for providing the file.

I think I understand your problem:
in the file simData_filtered_60000.h5, the size of all particle datasets (momentum, charge, position, etc.) is 1705615.
You can check it, for example, with:

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particleId"]
    print(result.shape)

and the output is: (1705615,).
However, your numParticlesOffset dataset in the particle patches does not correspond to this: its values range from 0 up to about 1.30344e7, which is much bigger.

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particlePatches"]['numParticlesOffset'][0:193]
    print(result)

and the output is:

[       0    24357    29493    33502    56927    82990   104215   119799
   ...
 13029684 13029684 13029684 13029684 13029684 13032670 13033621 13034400]

numParticlesOffset has to be in the range [0, particle_species_size]. It indicates where each patch's chunk of particles starts.
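
A quick consistency check along these lines (assuming the same file and group layout as in the snippets above) would be:

    import h5py

    with h5py.File("simData_filtered_60000.h5", "r") as f:
        species = f["data"]["60000"]["particles"]["en_all"]
        n_particles = species["particleId"].shape[0]
        offsets = species["particlePatches"]["numParticlesOffset"][()]
        # every patch offset must point inside the (filtered) particle datasets
        print(n_particles, offsets.max(), bool(offsets.max() < n_particles))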

So, I can suggest two solutions:

  1. I can fix the file by changing 'numParticlesOffset' from the original version to one whose values only range from 0 to 1705615 (the patch records would be recomputed as in the sketch below)
  2. I can add an additional option to the reductor to skip the existing particle patches and create new ones
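
A sketch of what option 1 amounts to, assuming the original (unfiltered) patch records and a boolean keep mask over the original particle order are available (the function and variable names are illustrative, not part of the reductor):

    import numpy as np

    def rebuild_patch_records(num_particles_old, num_offset_old, keep):
        """Recompute numParticles / numParticlesOffset after particles were filtered out.

        num_particles_old, num_offset_old -- original patch records (1D integer arrays)
        keep -- boolean mask over the original particle order, True = particle survived
        """
        num_particles_new = np.array(
            [keep[off:off + n].sum() for off, n in zip(num_offset_old, num_particles_old)],
            dtype=np.uint64)
        # exclusive prefix sum: each patch starts where the previous one ends
        num_offset_new = np.concatenate(([0], np.cumsum(num_particles_new)[:-1])).astype(np.uint64)
        return num_particles_new, num_offset_new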

@sbastrakov
Member

So what may have gone wrong is that when removing all particles except the IDs you wanted to keep, the patch information was not updated correspondingly, @alex-koe. Not 100% sure, it just seems plausible. I think the easiest fix is then to remove the patch datasets altogether? (They are optional, and the reductor works when they are not present; it just does not work when they are present but wrong.)
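
For reference, removing the patch records with h5py is a one-liner; a minimal sketch, assuming the file and species names from the snippets above (particlePatches is an optional group in openPMD):

    import h5py

    with h5py.File("simData_filtered_60000.h5", "r+") as f:
        # delete the optional particlePatches group so the reductor ignores patch information
        del f["/data/60000/particles/en_all/particlePatches"]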

@alex-koe

alex-koe commented Oct 15, 2020

@sbastrakov I agree, the patch information was not updated. Just removing the patches is the easiest approach.
I deleted the patch datasets from the file and restarted the job.
The result was a segmentation fault. Does the script need to be adjusted?

@sbastrakov
Member

As I thought, no. I think @KseniaBastrakova will take a look.

@KseniaBastrakova
Collaborator Author

@alex-koe I think I reproduced your problem and fixed it.
Could you do
git fetch && git pull
and try the reduction once again?

@alex-koe

@KseniaBastrakova It is working! Great work.
And it is fast: it took only about 2min to run the job. :-)

@alex-koe

@sbastrakov @KseniaBastrakova I have only a minor question. When I read out the position dataset, it looks the same for the original and reduced particles, and the dataset is reduced in size. The weights seem to be OK, i.e., the sum of the weights gives the same charge. The average position and spread in position are consistent between the original and reduced set.

However, I do not understand the momentum:
This is the original set in (y, p_y):
[image]

This is the reduced set:
[image]
The distribution in y looks fine. The distribution in p_y seems to be scaled by a different factor.

For reading the momentum, I do the following:

""" label is one of {x,y,z}, species is *en*, ..."""
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]  # 
    u_unitSI = u.attrs["unitSI"]
    uy =  u_unitSI*np.array(u) /np.array(w)  /(const.m_e*const.c)   ## the normalized momenum gamma beta_i

Do i have to take special care here to get the correct momentum?

@sbastrakov
Member

Thanks for reporting. It could be that somewhere (not sure on whose side) the macroweighted flag is treated incorrectly.

@sbastrakov
Member

sbastrakov commented Oct 16, 2020

To expand on that thought: @alex-koe your way of computing the momentum is not suitable for general openPMD files, as the momentum record could in principle be stored per single particle or per macroparticle, and there is a special attribute that says which way was used. However, I assume you know how it is for your particular file (that the momentum is per macroparticle, as in PIConGPU output) and read it accordingly. So I think it is more likely that there is something fishy in the reductor, @KseniaBastrakova, e.g. maybe the reductor writes it out with a different macroweighted convention?

@alex-koe

@sbastrakov so what would then be the correct way? I used pretty much the same approach that gave me the correct momentum in all previous simulations and never had an issue. Is the 'reduced' file somehow in a different format?

@alex-koe

alex-koe commented Oct 22, 2020

This is another thing I tried out, the (x, ux) space with the random reductor:
[image]
These are about 5001 particles, shown as a scatter plot.

@KseniaBastrakova
Collaborator Author

@alex-koe thank you for the details. I think the problem is on my side: I found a bug that can cause this behavior. I will fix it today.

@KseniaBastrakova
Collaborator Author

@alex-koe , now it works better. Could you do
git fetch && git pull and try once again?

@alex-koe

@KseniaBastrakova I tried it again. It is now different :-D
[image]
[image]
The momentum and the (x, ux) space look strange.

@KseniaBastrakova
Collaborator Author

KseniaBastrakova commented Oct 28, 2020

@alex-koe So, how do you build the graph, and which simulation do you use? Maybe it will help me, because I can't reproduce the problem.
I just tried to build two graphs, before and after reduction, with this:


import h5py
import numpy as np
import matplotlib.pyplot as plt


def print_graph():
    f = h5py.File('simData_beta_60000.h5', 'r')
    timestep = 60000
    species = 'en_all'
    m_e = 9.1e-31          # electron mass [kg]
    c = 299792458.0        # speed of light [m/s]
    label = "y"
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]
    u_unitSI = u.attrs["unitSI"]
    uy = u_unitSI * np.array(u) / np.array(w) / (m_e * c)
    plt.scatter(u, uy, marker='x')
    plt.xlabel('y [um]')
    plt.ylabel("p_y [MeV/e]")
    plt.show()

First, simData_beta_60000 before reduction:
[image]

and after reduction with a ratio of deleted particles of 0.1:
[image]

The graphs look similar, so I can't reproduce the problem.

@alex-koe

So, how do you build the graph, and which simulation do you use? Maybe it will help me, because I can't reproduce the problem.
I just tried to build two graphs, before and after reduction, with this:

I put the submit script, the Jupyter notebook I used for the plots shown, and the simulation files (input and output) into the tar archive for-Ksenia.tar.bz2 on /bigdata/hplsim/production/LWFA_betatron_decoherence/ (on Hemera), so you can follow what I did. A major difference from your data is that I used a much larger ratio of deleted particles, close to one. I don't know if this could be causing something.

@sbastrakov
Member

I don't think a large ratio should be a problem. It would certainly make the sampling much worse, but if that's acceptable, it should not otherwise be an issue.

@KseniaBastrakova
Collaborator Author

@alex-koe , could you give me access to for-Ksenia.tar.bz2? I just don't have the rights to copy it

@alex-koe

alex-koe commented Nov 3, 2020

@alex-koe , could you give me access to for-Ksenia.tar.bz2? I just don't have the rights to copy it

@KseniaBastrakova you now have access rights to the file.

@KseniaBastrakova
Collaborator Author

@alex-koe, thank you for the scripts and the sample. They are really useful.

The problem is that the momenta are "macroweighted", which means that we store the total momentum of all the electrons in a macroparticle
(weight * single_particle_momentum).

So, if you delete a ratio of 0.9: the starting number of macroparticles is 13037395 and the remaining number is 1303739, so we deleted 11733655 macroparticles. Now we need to redistribute their weight among the remaining macroparticles, which means we add about 1300 weight to each remaining macroparticle (each surviving weighting grows by roughly a factor of 10).
And because we store the momentum as momentum * weight_of_macroparticle, we get such big momenta.
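
A small self-contained illustration of this point, using synthetic numbers rather than data from the actual file: when ~90% of the macroparticles are deleted, the surviving weightings grow by roughly a factor of 10, and any macroweighted record has to be rescaled by the same factor to stay consistent; a mismatch in this handling shows up as momenta that are off by about that factor.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.uniform(100.0, 200.0, size=10_000)      # macroparticle weightings
    p = rng.normal(0.0, 1.0, size=10_000) * w       # macroweighted momenta (per-macroparticle totals)

    keep = rng.random(10_000) > 0.9                 # randomly delete ~90% of the macroparticles
    scale = w.sum() / w[keep].sum()                 # redistribute the removed weight (~10x here)
    w_new = w[keep] * scale
    p_new = p[keep] * scale                         # macroweighted records must be rescaled too

    print(w.sum(), w_new.sum())                     # total weight (i.e. charge) is conserved
    print((p / w).mean(), (p_new / w_new).mean())   # per-particle momentum stays comparable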

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

@sbastrakov
Member

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

@KseniaBastrakova
Collaborator Author

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

I can make an additional script to fix it, as a temporary solution.

@alex-koe

alex-koe commented Nov 5, 2020

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it's really easy to use it wrongly.

I can make an additional script to fix it, as a temporary solution.

Sorry, maybe I do not understand correctly: is it not possible to redistribute the weighting after the reduction process? I would like the total physical quantities, i.e. energy, momentum, charge, and mass, to be conserved. That is, if I have 100 pC of charge in the original file and the average momentum is 100 MeV/c, then the reduced file should have the same charge and average momentum, just with a reduced number of macroparticles.

@sbastrakov
Member

sbastrakov commented Nov 5, 2020

@alex-koe let me reiterate. Of course, it is possible to remove some macroparticles and redistribute their weights among the others so that the physical conservation laws hold, and that's the goal of the reduction.

The point @KseniaBastrakova was trying to make was merely that in your file the momenta are stored per macroparticle and not per physical particle. The openPMD standard has a special property for more or less each piece of particle data that says whether it is for a real or a macro-particle; this property is called macroWeighted. (E.g. PIConGPU outputs momenta per macro-particle.) The reduction script supports both cases, or at least is supposed to. So the value of macroWeighted determines whether the momenta and other affected data should be rescaled when a weighting changes. And of course the post-processing should take care of both options regardless of reduction. So when both the reduction and the post-processing handle macroWeighted correctly, there should be no violation of momentum conservation, etc., with any parameters of the reduction.
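
As an illustration, a macroweighted-aware read with plain h5py could look like the sketch below (assuming the file layout used earlier in this thread; in openPMD the record-level attributes are named macroWeighted and weightingPower):

    import h5py
    import numpy as np
    import scipy.constants as const

    with h5py.File("simData_beta_60000.h5", "r") as f:
        species = f["/data/60000/particles/en_all"]
        u = species["momentum/y"]
        w = species["weighting"][()]
        p_si = u[()] * u.attrs["unitSI"]            # momentum in SI units

        record = species["momentum"]                # record-level openPMD attributes live here
        if record.attrs.get("macroWeighted", 0) == 1:
            # stored per macroparticle: divide out the weighting to get per-particle momentum
            p_si = p_si / w ** record.attrs["weightingPower"]

        uy = p_si / (const.m_e * const.c)           # normalized momentum gamma * beta_y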

@PrometheusPi
Member

Sorry, @KseniaBastrakova is there an issue with the reduction script or with the input data?
(All PIConGPU momenta are macro-weighted, so if this is the issue, we could not apply the reduction to our simulations.)
Sorry, I am a bit confused by the discussion and am just trying to understand the current status and the remaining problem.

@sbastrakov
Member

sbastrakov commented Nov 9, 2020

I am also not sure what the current status is. To reiterate my previous point: this reduction package does in principle support both macro-weighted and non-macro-weighted data. So if it produces incorrect results, this is a bug in the implementation that we should investigate and fix, but it is not a fundamental limitation.

If I understood @KseniaBastrakova's last point correctly, she did not see an obvious error in the last run, but I guess @alex-koe did. Maybe it is better to talk offline.

@KseniaBastrakova
Collaborator Author

Yes, I think it makes sense to talk offline. We definitely have a misunderstanding here.

@PrometheusPi
Member

Okay - then let's meet. Today is quite busy for me; how about tomorrow at 11 AM?

@sbastrakov
Member

Works fine with me

@KseniaBastrakova
Collaborator Author

For me too

@PrometheusPi
Member

PrometheusPi commented Nov 10, 2020

Just spoke with @alex-koe offline - a meeting tomorrow at 11 AM works for him as well.

@KseniaBastrakova
Collaborator Author

Thanks for the really useful examples and scripts!
I think I made a fix.
Here is the distribution of the normalized momentum u_unitSI * np.array(u) / np.array(w) / (const.m_e * const.c)
before reduction:
[image: XY_before_reduction]
[image: YZ_before_reduction]

and after reduction (ratio of deleted particles = 0.9):
[image: XY_after_reduction]
[image: YZ_after_reduction]

Please do
git fetch && git pull to get the fix.

@PrometheusPi
Member

@KseniaBastrakova Great work! 👍 The momenta look correct. Could you please do a plt.hist2d together with a plt.colorbar() instead of a scatter plot? This would validate the weighting as well.

@KseniaBastrakova
Collaborator Author

@PrometheusPi here are the (coordinate, momentum) graphs for each dimension before and after reduction (for the file simData_beta_60000.h5).

[image: Xp_x_before_reduction]

[image: Xp_x_after_reduction]

[image: Yp_y_before_reduction]

[image: Yp_y_after_reduction]

[image: Zp_z_before_reduction]

[image: Zp_z_after_reduction]

@PrometheusPi
Member

PrometheusPi commented Nov 16, 2020

@KseniaBastrakova Thanks - could you please use the weights parameter of the hist2d plot? This would ensure that the weightings scale as required. So far, hist2d assumes a weighting of 1 and thus only counts macroparticles (causing a factor of 10 difference).

@KseniaBastrakova
Collaborator Author

@PrometheusPi , I added the weights: H = ax.hist2d(z_position, z_momentum, weights=weights_reduction, bins=50) and got these graphs. They look pretty similar.

[image: Xp_x_before_reduction]

[image: Xp_x_after_reduction]

[image: Yp_y_before_reduction]

[image: Yp_y_after_reduction]

[image: Zp_z_before_reduction]

[image: Zp_z_after_reduction]

@PrometheusPi
Member

PrometheusPi commented Nov 17, 2020

@KseniaBastrakova Thank you, that looks really good! d^2Q/dydp_y looks perfect (y \in {x,y,z}).

@PrometheusPi
Member

@alex-koe Can you please try this with your data?

@alex-koe

@PrometheusPi Here is a comparison between the original (left col.) and reduced particle sets (right col.):
[image]
Sorry for not adding any axis descriptions, but they would be too small in the output. The rows are as follows: (y, uy), (x, ux), (z, uz). The data is taken from run 006. The only very minor detail I see is that the colorbar scaling (auto) is slightly different, but I guess that is more my issue of how to sample it correctly for such a small data set of N=5000 (right side).
@KseniaBastrakova you did great work.

@alex-koe

For documentation purposes, I copy the script for the particle filtering here:

""" 
needed Input:
+ timestep
+ filename_bunch_idendifiers
+ filename_filtered_file
""" 
###### init ####
import numpy as np
import scipy as sp
import h5py
import scipy.constants as const

timestep                              = 60000
filename_bunch_idendifiers = "bunch-identifiers.dat"
################ WARING ################
# The filtered file will be overwritten, copy this file instead from the original data
filename_filtered_file           = "simData_run006_filtered_{}.h5".format(timestep)

# read particle ids for filtering
ids = np.loadtxt(filename_bunch_idendifiers, dtype=np.uint64)

##### open h5 files 
filtered_file = h5py.File(filename_filtered_file, "r+")

h = filtered_file["/data/{}/particles/en_all".format(timestep)]
current_ids = h["particleId"][()]
m   = np.in1d(current_ids, ids)
if m.sum() != len(ids):
    print("ERR: requested IDs are not fully contained in H5 file. Abort.")
    exit

paths = ["particleId", "weighting",
             "momentum/x", "momentum/y", "momentum/z",
             "momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
             "position/x", "position/y", "position/z",
             "positionOffset/x", "positionOffset/y", "positionOffset/z"]
for p in paths:
    temp = h[p][m]
    temp_items = h[p].attrs.items()
    h.__delitem__(p)
    h[p]  = temp
    for i in temp_items:
        h[p].attrs.create(name=i[0], data=i[1])
for p in ["mass", 'charge']:
    temp = h[p].attrs['shape']
    temp[0] = np.sum(m)
    h[p].attrs['shape'] = temp

#### delete particle patches because reduction script does not process them correctly
filtered_file["/data/{}/particles/en_all/".format(timestep)].__delitem__('particlePatches')
filtered_file.close()

@alex-koe

alex-koe commented Nov 30, 2020

@KseniaBastrakova I think this issue can be closed. I just do not know how to do it ...
