Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restartable models #111

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open

Restartable models #111

wants to merge 36 commits into from

Conversation

HDembinski
Copy link
Collaborator

@HDembinski HDembinski commented Dec 31, 2022

An attempt to close #106

This branch is based on #112, which should be merged first.

This is using two approaches. For most models, it is sufficient run some initialization code exactly once, which I ensure with the new _once private method. Concrete models should not implemented __init__ anymore, only _once which is called by MCRun.__init__ exactly once and after the underlying implementation has been initialized.

For Phojet and DPMJet, this does not work (at least I could not get it to work), so I followed the idea expressed in #106. I wrote new base class MCRunRemote, which a brilliant and terrible piece of engineering. Behind the scenes, it runs every call to cross_section and __call__ through a remote process which is created and dies just for the call. The beauty of it: a model which requires this workaround just has to inherit from MCRunRemote instead of MCRun, no other change required!

To maintain the illusion of having a single model and not one that is restarted all the time, I rely strongly on the random-state serialization feature that @jncots added. I further improved this so that the PRNG state of the numpy generator is also saved. Since this so important, I added more tests that check the RNG state serialization and found some issues. @jncots could you please look into this?

Other changes

  • MCRun.get_stable returns set of particles that were changed to stable
  • Kinematics is now passed to MCRun.__call__ instead of MCRun.__init__; this makes more sense for models that are restartable. Some models need a maximum energy upfront, this can be passed via init.
  • MCRun.kinematics property was removed. The kinematics should not be part of the state of the generator, we pass them explicitly to the two functions which need them, MCRun.__call__ and MCRun.cross_section. Less class state is generally better, it simplifies the design and the reasoning about something. It does not matter that the underlying Fortran stores the kinematics, because we abstracted that away in our high-level API.
  • MCRun.random_state now includes the state of the numpy PRNG
  • MCRun._composite_plan now generates heavy elements first, as a workaround for DPMJet
  • MCRun.set_unstable was renamed to MCRun.maydecay

@HDembinski HDembinski marked this pull request as ready for review January 2, 2023 17:20
@HDembinski HDembinski changed the title Turn models into singletons Restartable models Jan 2, 2023
@HDembinski
Copy link
Collaborator Author

@afedynitch You said that you are looking into fixing cross-sections for the models. I added a new test test_cross_sections.py which checks the output of all models for various combinations of projectiles and targets. The references for such checks are very rough, of course, because the models seem to disagree a lot especially on nuclear cross-sections, however, the test still checks basic sanity of the model response and I found several bugs this way, too.

@HDembinski HDembinski requested review from jncots and afedynitch and removed request for jncots January 2, 2023 23:35
@HDembinski
Copy link
Collaborator Author

From my side, this PR is ready for review, I need your help to fix the remaining things.

Tests are failing largely because of the new tests for rng_state persistence that I added (that is unrelated to my changes).

@HDembinski
Copy link
Collaborator Author

HDembinski commented Jan 2, 2023

At least one of the tests is stalling, on my own computer and on CI. I haven't figured out which one yet. This means you have to use a keyboard interrupt to complete the tests at the moment. On Windows the tests abort with a memory error, these things could be related, although I did not notice a large increase in RAM on my computer. Parallel computation works differently on Windows and Unix, so it could well be that there is a mistake which Windows reveals more drastically.

Edit: Stalling is now fixed, the issue is well understood.

@HDembinski
Copy link
Collaborator Author

The test which runs forever is test_to_hepmc3.py, I am investigating.

return CrossSectionData(total=stot, elastic=sela, inelastic=stot - sela)

assert kin.p1.is_hadron
assert kin.p2.is_hadron
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed targets are _projectiles, but in reality it's only Nuclei(), proton and neutron.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts are just an executable comment, I put this here to remind myself that we only deal with hadrons at this point in the function. The nuclear targets are handled by the part above.


# TODO set more cross-sections

def _set_kinematics(self, kin):
# Save maximal mass that has been initialized
# (DPMJET sometimes crashes if higher mass requested than initialized)

# Relax momentum and energy conservation checks at very high energies
if kin.ecm > 5e4:
Copy link
Member

@afedynitch afedynitch Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do that without setting it back. Like this the output will depend on the sequence of requested energies since energy-momentum conservation checks can reject events. Either piece has to stay in init or, one needs to save the default values and reset them if kinematics change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be in init.

Ah, I see what you mean, but it does not really matter, I think. We need to use MCRunRemote which always uses a fresh copy of dpmjet when _set_kinematics is called, so it is always called only once per dpmjet instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix this a bit, but you are the DPMJet expert, so I am relying on your help.

@@ -160,9 +158,9 @@ def _set_kinematics(self, kin):
self._lib.dt_init(
-1,
max(kin.plab, 100.0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the maximal initialization energy cannot be exceeded. So, it has to be saved as well.

Copy link
Collaborator Author

@HDembinski HDembinski Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dt_init can be only called once, right? At least that was my impression from observing the code. MCRunRemote makes it so that _set_kinematics is called exactly once per instance.


super().__init__(seed)
def _once(self, ecm_max=1000 * TeV):
from impy import debug_level
Copy link
Member

@afedynitch afedynitch Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that the debug level cannot change in runtime? It is important if you search for something that goes wrong at event 12345 but don't want to see the output of 12344 events.

Another reason to keep it changeable in runtime is that different debug levels in the models can produce extremely different levels of output.

I don't think that the functionality was there before consistently. So might be no problem, when using this "remote run".

N-th edit: If I change the debug level in impy.debug_level in a notebook or python script, and do a second initialization, it won't be reflected here. Something like this would be great.

impy.debug_level = 0
m1 = Epos() # do something

impy.debug_level = 5
m2 = Phojet() # do something different

Copy link
Collaborator Author

@HDembinski HDembinski Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason to keep it changeable in runtime is that different debug levels in the models can produce extremely different levels of output.

I agree that being able to change debug levels for a model instance is good, but the API we have so far (also before my changes) does not allow this.

If you pass the debug level to fortran in a function that is called only at init, then you cannot change the debug level after init. However, we should be able to access the variable that is used in the fortran code to enable debug output, and then you can change it at any time.

Copy link
Collaborator Author

@HDembinski HDembinski Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this:

impy.debug_level = 0
m1 = Epos() # do something

impy.debug_level = 5
m2 = Phojet() # do something different

I thought this works, this is why debug_level is imported freshly from top level impy (and to avoid some circular import issue). Did you test it and found it does not work? It does not work, if you call Epos again instead of Phojet, but it should work if it is two different models, and both are created for the first time.

if self._lib.pho_event(-1, *self._beams)[1]:
raise RuntimeError(
"initialization failed with the current event kinematics"
)
Copy link
Member

@afedynitch afedynitch Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this is wrong. pho_event(-1, ....) has to be called once for the highest energy requested for a run. Different combinations of particles and energies are set with pho_setpar. Once an event generation is triggered (pho_event(>0, ..)), PHOJET runs the remaining initializations. So pho_event(-1 only once.

Copy link
Collaborator Author

@HDembinski HDembinski Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it once with a maximum energy that is never exceeded? like 1000 * TeV?


def _cross_section(self, kin=None):
def _cross_section(self, kin):
if (kin.p2.A or 1) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really that bad at nuclei :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, perhaps you have to query the cross-section from another variable. At some point I dereference a nullptr in the calls when nuclei are used. I didn't check what exactly happens.

def __init__(self, evt_kin, *, seed=None):
super().__init__(seed)
def _once(self):
from impy import debug_level
Copy link
Member

@afedynitch afedynitch Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, no change in run time is an unnecessary design limitation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I did not design the library in this way, it was already like that.

_version = "2.1"
_library_name = "_sib21"


class Sibyll23(SIBYLLRun):
# For some reason, Sibyll23 requires MCRunRemote, but the others don't
Copy link
Member

@afedynitch afedynitch Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that only Sibyll23 crashes on multiple inits? In SIBYLL, the initialization should not be called multiple times even if it works. If I remove the check in PHOJET it will also work but will result in undefined or altered behavior.

Copy link
Collaborator Author

@HDembinski HDembinski Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can try it out. Replace MCRunRemote with MCRun further below as base class for Sibyll23 and run the tests. I don't understand why, so it is good if someone else checks this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run models in separate processes to solve multiple issues
2 participants