Use ReFrame's CPU autodetect in test step #682

casparvl (Collaborator) commented Aug 22, 2024

I've figured out how we can use ReFrame's CPU autodetection with the local spawner: we inject the name of the SLURM partition we are currently running in into the ReFrame configuration file. This ensures that we get one autodetected topology file per SLURM partition. Note that the autodetection only needs to happen once for each architecture, and then it's there "forever" in the .reframe directory in the homedir of the bot.

It's good to use the autodetection, as it guarantees that all the CPU info we potentially rely on in the EESSI test suite is present. This is preferable to hard-coding it, and is actually recommended by our own documentation :D

To explain a bit: prior to this PR, we would create the ReFrame config file from a template. The template would contain placeholders like __NUM_CPUS__, __NUM_SOCKETS__, __NUM_CPUS_PER_CORE__ and __NUM_CPUS_PER_SOCKET__. In test_suite.sh we would fill these in from the output of lscpu. The reason we did it this way is that we thought we couldn't rely on CPU autodetection by ReFrame: ReFrame stores the detected topology in $HOME/.reframe/topology/<system_name>-<partition_name>/processor.json. Because we use the local spawner, the partition name here would always be the same (default), so it would autodetect once, and never again. That's problematic, since our tests actually run on different partitions (the different types of build nodes: zen2, zen3, zen4, haswell, etc.), with different node configurations.
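
For illustration only, the old lscpu-based flow could look roughly like the sketch below. It is written in Python rather than the shell used in test_suite.sh, and the helper names (`parse_lscpu`, `fill_template`), the choice of lscpu fields, and the way the per-socket count is derived are all assumptions, not the actual script.

```python
# Hypothetical mapping from lscpu field labels to intermediate names.
LSCPU_FIELDS = {
    "CPU(s)": "cpus",
    "Socket(s)": "sockets",
    "Thread(s) per core": "threads_per_core",
    "Core(s) per socket": "cores_per_socket",
}


def parse_lscpu(text):
    """Extract the integer fields listed in LSCPU_FIELDS from lscpu output."""
    values = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        name = LSCPU_FIELDS.get(label.strip())
        if sep and name:
            values[name] = int(value.strip())
    return values


def fill_template(template, info):
    """Replace the processor placeholders with values derived from lscpu."""
    replacements = {
        "__NUM_CPUS__": info["cpus"],
        "__NUM_SOCKETS__": info["sockets"],
        "__NUM_CPUS_PER_CORE__": info["threads_per_core"],
        # Assumption: logical CPUs per socket = cores per socket * threads per core.
        "__NUM_CPUS_PER_SOCKET__": info["cores_per_socket"] * info["threads_per_core"],
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, str(value))
    return template


# Made-up lscpu excerpt, not taken from any of the build nodes.
SAMPLE = """\
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
"""

info = parse_lscpu(SAMPLE)
print(fill_template("'num_cpus': __NUM_CPUS__, 'num_sockets': __NUM_SOCKETS__", info))
```

Any key that this kind of hand-written mapping forgets simply never ends up in the config, which is the weakness the autodetection avoids.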

The downside of that approach is that it doesn't guarantee a complete set of processor keywords in the ReFrame config file, whereas automatically detecting the processor config ensures that we have a predictable set defined. E.g. in #585 it turned out that we were missing num_cores_per_numa_node. This is the reason our documentation recommends using CPU autodetection. When I hit that issue, I realized: we can actually use CPU autodetection if, instead of the processor information, we simply detect which SLURM partition we are running on and use that in a template replacement. This means that when running on e.g. the x86-64-amd-zen2-node partition, __RFM_PARTITION__ gets replaced by x86-64-amd-zen2-node. ReFrame will then do CPU autodetection, and put the topology file in $HOME/.reframe/topology/BotBuildTests-x86-64-amd-zen2-node/processor.json. Then, the next time it runs on e.g. x86-64-intel-haswell-node, it will again do CPU autodetection, and store the result in $HOME/.reframe/topology/BotBuildTests-x86-64-intel-haswell-node/processor.json.
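
A minimal sketch of the partition substitution, in Python for illustration. It assumes the partition name is read from the SLURM_JOB_PARTITION environment variable that Slurm sets inside a job; the PR description does not say how the partition is detected, and the template fragment and function names here are hypothetical and heavily trimmed.

```python
import os

# Placeholder named in the PR description.
PLACEHOLDER = "__RFM_PARTITION__"


def current_partition(environ=os.environ):
    """Return the Slurm partition of the running job.

    Assumption: we are inside a Slurm job, so SLURM_JOB_PARTITION is set.
    """
    try:
        return environ["SLURM_JOB_PARTITION"]
    except KeyError:
        raise RuntimeError("SLURM_JOB_PARTITION is not set; not inside a Slurm job?")


def render_config(template, partition):
    """Fill the partition placeholder into the ReFrame config template."""
    return template.replace(PLACEHOLDER, partition)


# Hypothetical, trimmed template fragment; a real ReFrame config needs more keys.
TEMPLATE = """\
site_configuration = {
    'systems': [{
        'name': 'BotBuildTests',
        'partitions': [{
            'name': '__RFM_PARTITION__',
            'scheduler': 'local',
            'launcher': 'mpirun',
        }],
    }],
}
"""

if __name__ == "__main__":
    # Using a fixed partition name here so the example also runs outside Slurm.
    print(render_config(TEMPLATE, "x86-64-amd-zen2-node"))
```

Because only the partition name varies, the template no longer needs any processor fields; ReFrame fills those in itself for each distinct system-partition pair.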

That's perfect: that's exactly how it would work with a non-local spawner. It also means CPU info only needs to be detected once. Next time the bot runs (on that partition), the info is already there and the CPU autodetection step can (and will) be skipped by ReFrame.
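
The caching behaviour can be sketched as a simple path check. This is not ReFrame's own code: the function names are hypothetical, and the layout follows the $HOME/.reframe/topology/<system_name>-<partition_name>/processor.json location given earlier in this description.

```python
from pathlib import Path


def topology_file(system, partition, home=None):
    """Path of the cached topology file for one system/partition pair."""
    home = Path(home) if home is not None else Path.home()
    return home / ".reframe" / "topology" / f"{system}-{partition}" / "processor.json"


def needs_autodetection(system, partition, home=None):
    """True if no cached topology exists yet for this system/partition pair."""
    return not topology_file(system, partition, home).is_file()


if __name__ == "__main__":
    for part in ("x86-64-amd-zen2-node", "x86-64-intel-haswell-node"):
        state = "missing" if needs_autodetection("BotBuildTests", part) else "cached"
        print(f"{topology_file('BotBuildTests', part)}: {state}")
```

With a single fixed partition name, the check would report "cached" on every node type after the first run, which is exactly the problem the per-partition name avoids.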

Caspar van Leeuwen added 2 commits August 22, 2024 16:28

eessi-bot bot commented Aug 22, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Aug 22, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

@casparvl
Collaborator Author

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

    • no jobs were submitted

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16827

date job status comment
Aug 22 14:37:54 UTC 2024 submitted job id 16827 awaits release by job manager
Aug 22 14:38:15 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 14:44:28 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16827.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 14:44:28 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-16827.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16828

date job status comment
Aug 22 14:50:20 UTC 2024 submitted job id 16828 awaits release by job manager
Aug 22 14:50:39 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 14:51:43 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16828.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 14:51:43 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-16828.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16829

date job status comment
Aug 22 15:04:13 UTC 2024 submitted job id 16829 awaits release by job manager
Aug 22 15:05:06 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:06:11 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16829.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:06:11 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 18/18 test case(s) from 18 check(s) (18 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16829.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

…o auto-detect once per partition. Then, the file is cached and available to be used in the next run!
@casparvl
Collaborator Author

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Aug 22, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16830

date job status comment
Aug 22 15:08:48 UTC 2024 submitted job id 16830 awaits release by job manager
Aug 22 15:09:17 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:10:21 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16830.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:10:21 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 18/18 test case(s) from 18 check(s) (18 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16830.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

…sing local spawner. But right now, it doesn't seem to detect anything

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16832

date job status comment
Aug 22 15:18:25 UTC 2024 submitted job id 16832 awaits release by job manager
Aug 22 15:18:50 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:35:31 UTC 2024 running job 16832 is running
Aug 22 15:56:08 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16832.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:56:08 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16832.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-haswell for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16833

date job status comment
Aug 22 15:18:30 UTC 2024 submitted job id 16833 awaits release by job manager
Aug 22 15:18:52 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:37:41 UTC 2024 running job 16833 is running
Aug 22 15:56:09 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16833.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:56:09 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16833.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/235

date job status comment
Aug 22 15:18:30 UTC 2024 submitted job id 235 awaits release by job manager
Aug 22 15:19:06 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:30:10 UTC 2024 running job 235 is running
Aug 22 16:04:09 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-235.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 16:04:09 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 13/13 test case(s) from 13 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-235.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-skylake_avx512 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16834

date job status comment
Aug 22 15:18:34 UTC 2024 submitted job id 16834 awaits release by job manager
Aug 22 15:18:54 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:37:43 UTC 2024 running job 16834 is running
Aug 22 15:53:56 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16834.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:53:56 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16834.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16835

date job status comment
Aug 22 15:18:38 UTC 2024 submitted job id 16835 awaits release by job manager
Aug 22 15:18:45 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:31:21 UTC 2024 running job 16835 is running
Aug 22 15:49:23 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16835.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:49:23 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16835.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16836

date job status comment
Aug 22 15:18:42 UTC 2024 submitted job id 16836 awaits release by job manager
Aug 22 15:18:47 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:35:29 UTC 2024 running job 16836 is running
Aug 22 15:51:41 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16836.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 15:51:41 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16836.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16837

date job status comment
Aug 22 15:18:47 UTC 2024 submitted job id 16837 awaits release by job manager
Aug 22 15:19:59 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:38:45 UTC 2024 running job 16837 is running
Aug 22 16:26:29 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16837.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 16:26:29 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16837.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_n1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16838

date job status comment
Aug 22 15:18:51 UTC 2024 submitted job id 16838 awaits release by job manager
Aug 22 15:20:01 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:38:47 UTC 2024 running job 16838 is running
Aug 22 16:17:17 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16838.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 16:17:17 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16838.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Aug 22, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_682/16839

date job status comment
Aug 22 15:18:56 UTC 2024 submitted job id 16839 awaits release by job manager
Aug 22 15:20:03 UTC 2024 released job awaits launch by Slurm scheduler
Aug 22 15:39:58 UTC 2024 running job 16839 is running
Aug 22 16:07:53 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-16839.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 22 16:07:53 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-16839.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

casparvl pushed a commit to casparvl/software-layer that referenced this pull request Sep 3, 2024
@casparvl
Collaborator Author

casparvl commented Sep 3, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell
bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4
bot: build repo:eessi.io-2023.06-software arch:aarch64/generic
bot: build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1
bot: build repo:eessi.io-2023.06-software arch:aarch64/neoverse_v1

eessi-bot bot commented Sep 3, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_v1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/generic resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

eessi-bot bot commented Sep 3, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/haswell from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/generic from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/generic
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_v1 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/haswell resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/generic resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

    • no jobs were submitted

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17637

date job status comment
Sep 03 16:11:08 UTC 2024 submitted job id 17637 awaits release by job manager
Sep 03 16:11:22 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:16:41 UTC 2024 running job 17637 is running
Sep 03 16:37:35 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17637.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:37:35 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17637.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-haswell for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17638

date job status comment
Sep 03 16:11:12 UTC 2024 submitted job id 17638 awaits release by job manager
Sep 03 16:11:24 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:16:43 UTC 2024 running job 17638 is running
Sep 03 16:34:09 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17638.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:34:09 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17638.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/244

date job status comment
Sep 03 16:11:13 UTC 2024 submitted job id 244 awaits release by job manager
Sep 03 16:11:58 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:41:42 UTC 2024 running job 244 is running
Sep 03 17:14:44 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-244.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 17:14:44 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 13/13 test case(s) from 13 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-244.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-skylake_avx512 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17639

date job status comment
Sep 03 16:11:16 UTC 2024 submitted job id 17639 awaits release by job manager
Sep 03 16:11:26 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:17:48 UTC 2024 running job 17639 is running
Sep 03 16:35:18 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17639.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:35:18 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17639.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17640

date job status comment
Sep 03 16:11:20 UTC 2024 submitted job id 17640 awaits release by job manager
Sep 03 16:12:36 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:21:08 UTC 2024 running job 17640 is running
Sep 03 16:39:46 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17640.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:39:46 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17640.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen3 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17641

date job status comment
Sep 03 16:11:24 UTC 2024 submitted job id 17641 awaits release by job manager
Sep 03 16:12:38 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:21:11 UTC 2024 running job 17641 is running
Sep 03 16:37:34 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17641.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:37:34 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17641.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17642

date job status comment
Sep 03 16:11:29 UTC 2024 submitted job id 17642 awaits release by job manager
Sep 03 16:12:29 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:19:54 UTC 2024 running job 17642 is running
Sep 03 17:07:09 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17642.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 17:07:09 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17642.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_n1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17643

date job status comment
Sep 03 16:11:33 UTC 2024 submitted job id 17643 awaits release by job manager
Sep 03 16:12:31 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:19:56 UTC 2024 running job 17643 is running
Sep 03 16:56:53 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17643.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:56:53 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17643.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_682/17644

date job status comment
Sep 03 16:11:38 UTC 2024 submitted job id 17644 awaits release by job manager
Sep 03 16:12:33 UTC 2024 released job awaits launch by Slurm scheduler
Sep 03 16:19:58 UTC 2024 running job 17644 is running
Sep 03 16:46:18 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-17644.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Sep 03 16:46:18 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17644.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

casparvl commented Sep 3, 2024

Ah, I didn't realize: the topology files aren't stored in the bot's home directory, because everything is executed inside the container. The job dir is mounted there, so the files actually end up in the job dir:

/project/def-users/SHARED/jobs/2024.09/pr_682/17644/.reframe/topology/BotBuildTests-aarch64-neoverse-v1-node/processor.json

Although that means ReFrame needs to autodetect on every run, that's actually a plus: we'll always have an up-to-date topology file, even if we change the node types at some point. It also means that, technically, we could skip replacing the partition name, since the detected topology files end up in different prefixes (the job dirs) anyway. Still, the replacement doesn't hurt, and it makes things clearer.
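To make the layout concrete, here is a minimal sketch of how the topology file location is composed from the prefix (the job dir), the system name and the partition name. The function name `topology_file` is hypothetical and only for illustration; it is not part of test_suite.sh or ReFrame.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): compose the path at which the
# autodetected topology file is expected, following the layout
#   <prefix>/.reframe/topology/<system_name>-<partition_name>/processor.json
# Arguments: prefix (here: the job dir), system name, partition name.
topology_file() {
    local prefix="$1" system="$2" partition="$3"
    printf '%s/.reframe/topology/%s-%s/processor.json\n' \
        "$prefix" "$system" "$partition"
}

# Values taken from the example path above.
jobdir="/project/def-users/SHARED/jobs/2024.09/pr_682/17644"
topology_file "$jobdir" "BotBuildTests" "aarch64-neoverse-v1-node"
```

Since the job dir differs for every job, two jobs never share a topology file, regardless of the partition name.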

Anyway, I checked that processor.json, and it looks just fine:

[casparvl@login1 17644]$ head .reframe/topology/BotBuildTests-aarch64-neoverse-v1-node/processor.json
{
  "arch": "neoverse_n1",
  "topology": {
    "numa_nodes": [
      "0xffff"
    ],
    "sockets": [
      "0xffff"
    ],
    "cores": [
[casparvl@login1 17644]$ tail .reframe/topology/BotBuildTests-aarch64-neoverse-v1-node/processor.json
          "0xffff"
        ]
      }
    ]
  },
  "num_cpus": 16,
  "num_cpus_per_core": 1,
  "num_cpus_per_socket": 16,
  "num_sockets": 1
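A check like the manual one above could also be scripted. The following is a rough sketch, not something this PR adds: the helper `missing_keys` is hypothetical, and it does a plain-text search for each key name rather than parsing the JSON. The sample file is a trimmed stand-in that contains only fields visible in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): print every requested key that
# does NOT appear as a '"key":' entry in the given file. This is a simple
# text match; it does not validate the JSON structure or the values.
missing_keys() {
    local file="$1"; shift
    local key
    for key in "$@"; do
        grep -q "\"${key}\"[[:space:]]*:" "$file" || echo "$key"
    done
}

# Trimmed stand-in for processor.json, using values shown above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{
  "arch": "neoverse_n1",
  "num_cpus": 16,
  "num_cpus_per_core": 1,
  "num_cpus_per_socket": 16,
  "num_sockets": 1
}
EOF

# Prints nothing when all listed keys are present.
missing_keys "$sample" arch num_cpus num_cpus_per_core num_cpus_per_socket num_sockets
rm -f "$sample"
```

In a real job, the first argument would be the processor.json in the job dir instead of the temporary sample file.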

Collaborator

@bedroge bedroge left a comment

Lgtm

@bedroge bedroge merged commit 66219a9 into EESSI:2023.06-software.eessi.io Sep 6, 2024
33 checks passed
@casparvl casparvl deleted the fix_missing_num_cores_per_numa_node_in_test_step branch September 18, 2024 18:57