BuildCausalityNetworkTask --> Too many open files #3
BuildCausalityNetworkTask writes 29100 JSON files into a sim generation's `seriesOut/` dir, leading to a `FileNotFoundException` (`Too many open files`) in Sisyphus. That's immediately followed by a

`java.lang.ClassCastException: class java.lang.Character cannot be cast to class java.util.Map$Entry`

which might be an error-handling bug.
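A minimal repro sketch of this failure mode (hypothetical Python, not Sisyphus code, which is on the JVM): a process that keeps every output file open until the batch finishes hits the OS descriptor limit long before it gets through 29100 files.

```python
import os
import tempfile

# Hypothetical repro: holding every file open until the end exhausts the
# per-process descriptor limit (commonly 1024) long before 29100 files.
out_dir = tempfile.mkdtemp()
handles = []
try:
    for i in range(29100):
        handles.append(open(os.path.join(out_dir, "series_%d.json" % i), "w"))
except OSError as err:
    print(err)  # e.g. "[Errno 24] Too many open files"
finally:
    for handle in handles:
        handle.close()
```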
Interesting... this happens during archiving. I have a feature branch that no longer archives but instead uploads each file, so this would probably not happen in that case. That said, 21k files per sim is a lot! Maybe too many. Is it possible to refactor the causality network task to emit a single archive of the 21k JSON files, which the CN then opens? The object store works best with fewer, larger files rather than multitudes of small files like this. To solve this in the immediate term we can always raise the open file limit (here is a dumb tutorial for this): https://easyengine.io/tutorials/linux/increase-open-files-limit/
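For that immediate-term workaround, a sketch of the in-process version of the `ulimit -n` change from that tutorial (the 65536 target is an arbitrary choice, and the hard limit caps how far a non-root process can raise its soft limit):

```python
import resource

# Raise this process's soft open-file limit toward its hard limit,
# the Python equivalent of `ulimit -n`. 65536 is an arbitrary target.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(65536, hard), hard))
```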
Storing individual files would be great. If we want to keep the Sisyphus archive feature, is this just a matter of closing each file after adding it to the archive rather than accumulating open files? Maybe the Causality builder should put all the JSON files into an archive, but they might have to get unpacked for the web viewer page. I suspect the JSON files are organized for the Causality viewer to open incrementally; that's probably related to the viewer's limitations that keep it from fitting more than ≈2 generations in memory.
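A sketch of that close-as-you-go idea, using Python's `zipfile` for illustration (the helper names and the `*.json` layout are assumptions): `ZipFile.write()` opens and closes each source file itself, so descriptors never accumulate, and zip members are individually addressable, which might let the viewer load series files incrementally without unpacking the whole archive.

```python
import zipfile
from pathlib import Path

def archive_series(series_dir, archive_path):
    # Hypothetical helper: each JSON file is opened, written, and closed
    # by ZipFile.write() before the next one, so at most a couple of
    # file descriptors are open at any moment.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(series_dir).glob("*.json")):
            zf.write(path, arcname=path.name)

def read_member(archive_path, member_name):
    # Zip members are randomly accessible, so a viewer could fetch one
    # series file at a time instead of unpacking everything.
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(member_name)
```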
`wcm.py` is like `fw_queue.py`, but it takes inputs from an argparse CLI, builds a Gaia workflow, launches a suitable number of Sisyphus workers, then submits the workflow to the Gaia server. The workers will time out afterwards. The steps to use it will get simpler and better documented. At the moment:

1. Install `gcloud` and authenticate (one-time setup).
2. Build the wcEcoli "runtime" container image if the pip requirements changed: `cloud/build-runtime.sh`
3. Build your own wcEcoli "wcm" code container image if you changed the code in your workspace (specifically the firetasks and the code they call): `cloud/build-wcm.sh`
   * It will be named `$USER-wcm-code` by default. You can give it an `ID` argument if you want more than one container image.
4. ssh to the Gaia server, opening a tunnel to its Gaia server port and (for now) to the Kafka cluster: `runscripts/sisyphus/ssh-tunnel.sh` (See that shell script for another setup step, soon to be obsolete.)
5. In another terminal tab, with the wcEcoli directory on the `PYTHONPATH`, run the workflow builder, e.g. `python runscripts/sisyphus/wcm.py -g2 -c2`.
   * The option `-g2` means 2 generations, and `-c2` requests 2 CPUs on each worker node, which matches their current configuration. `-c2` will run the parallel Parca and might speed up the analyses.
   * Each worker node runs one task at a time.
   * The workflow builder launches `variant_count * init_sims` workers by default. The `--workers` argument overrides that.
   * Don't use `--build_causality_network` yet because it will [break the current Sisyphus code](CovertLab/sisyphus#3).

   ```
   usage: wcm.py [-h] [--verbose [VERBOSE] | --no_verbose] [-c CPUS]
                 [--dump [DUMP] | --no_dump] [-w WORKERS]
                 [--ribosome_fitting [RIBOSOME_FITTING] | --no_ribosome_fitting]
                 [--rnapoly_fitting [RNAPOLY_FITTING] | --no_rnapoly_fitting]
                 [--debug_parca [DEBUG_PARCA] | --no_debug_parca]
                 [-v VARIANT_TYPE FIRST_INDEX LAST_INDEX] [-g GENERATIONS]
                 [-i INIT_SIMS] [-t TIMELINE] [--length_sec LENGTH_SEC]
                 [--timestep_safety_frac TIMESTEP_SAFETY_FRAC]
                 [--timestep_max TIMESTEP_MAX]
                 [--timestep_update_freq TIMESTEP_UPDATE_FREQ]
                 [--mass_distribution [MASS_DISTRIBUTION] | --no_mass_distribution]
                 [--growth_rate_noise [GROWTH_RATE_NOISE] | --no_growth_rate_noise]
                 [--d_period_division [D_PERIOD_DIVISION] | --no_d_period_division]
                 [--translation_supply [TRANSLATION_SUPPLY] | --no_translation_supply]
                 [--trna_charging [TRNA_CHARGING] | --no_trna_charging]
                 [--run_analysis [RUN_ANALYSIS] | --no_run_analysis]
                 [-p PLOT [PLOT ...]]
                 [--build_causality_network [BUILD_CAUSALITY_NETWORK] | --no_build_causality_network]
   ```

6. Watch the logs via the [GCP Logs Viewer](https://console.cloud.google.com/logs/viewer?resource=gce_instance&project=allen-discovery-center-mcovert&organizationId=302681460499&minLogLevel=0&expandAll=false&timestamp=2019-07-10T22:47:58.497000000Z&customFacets=&limitCustomFacetWidth=true&interval=PT1H&scrollTimestamp=2019-07-10T21:55:14.740000000Z&dateRangeStart=2019-07-10T21:47:58.496Z&dateRangeUnbound=forwardInTime) set to "GCE VM Instance".
7. Download the outputs via the [Google Cloud Storage browser](https://console.cloud.google.com/storage/browser/sisyphus/data/?project=allen-discovery-center-mcovert&organizationId=302681460499) or (soon) by mounting it via gcsfuse.

Additional code changes:

* Each firetask is now responsible for creating its output directories, since the tasks have access to the correct file system while the builder does not.
* Factor out a subroutine to name the output variant dirs rather than replicate that logic (see the sketch after this list).
* Add a ParcaTask firetask that bundles the 4 existing firetasks into one that fits Sisyphus' functional model, where inputs and outputs are distinct files and directories.
* BuildCausalityNetworkTask also breaks the functional model by treating `output_network_directory` as an output once per variant and otherwise as an input. The builder works around that by asking it to write its network and dynamics into the same directory. That does mean recomputing the network per sim rather than sharing it per variant; we could optimize by moving the network part of the builder to VariantSimDataTask, but in practice that would save very little space and time compared to the rest of its work.
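Hedged sketches of two of the helpers described above (the names, and the zero-padded `variant_type_index` directory format, are assumptions based on this description, not the actual wcEcoli code):

```python
def variant_directory_name(variant_type, variant_index):
    # Assumed naming scheme, e.g. "wildtype_000000"; factored out so the
    # builder and the firetasks agree on variant output dir names.
    return "{}_{:06d}".format(variant_type, variant_index)

def default_worker_count(variant_count, init_sims, workers=None):
    # One worker per variant x initial sim by default; the --workers
    # CLI argument overrides it.
    return workers if workers is not None else variant_count * init_sims
```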