Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2024.10.17 preset extensions #284

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

dbarry9
Copy link
Contributor

@dbarry9 dbarry9 commented Nov 15, 2024

Pull Request Description

This PR adds support for presets defined using native events from non-CPU components. The only component that has been modified in this PR is the 'cuda' component, which is limited to loading the preset table for 'cuda' presets.

These changes have been tested on the NVIDIA Grace-Hopper architecture.

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@dbarry9 dbarry9 added the status-work-in-progress PR is still being worked on label Nov 15, 2024
src/utils/papi_avail.c Outdated Show resolved Hide resolved
}

if ( print_tabular ) {
printf( " Name Code " );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header of Name Code Avail Deriv Description (Note), currently sits off centered for both the PAPI Preset Events and PAPI Component Presets:

    Name        Code    Avail Deriv Description (Note)
Start printing index = 0
PAPI_L1_DCM      0x80000000  No    No   Level 1 data cache misses
    Name        Code    Avail Deriv Description (Note)
PAPI_CUDA_HP_FMA 0x80000080  Yes   No   CUDA Half precision FMA instructions
    :device=0
        Mandatory device qualifier [0]

The master code does not have this:

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses

src/utils/papi_avail.c Outdated Show resolved Hide resolved

if ( !print_event_info ) {
if ( print_avail_only == PAPI_PRESET_ENUM_CPU_AVAIL || print_avail_only == PAPI_PRESET_ENUM_AVAIL ) {
printf( "Of %d available events, %d ", avail_count, deriv_count );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a suggestive comment, I think it may be a good idea to have two different printouts. One printout for PAPI Preset Events and one for PAPI Component Presets.

src/components/cuda/linux-cuda.c Outdated Show resolved Hide resolved

/* check to see if a valid modifier is provided */
if (modifier != PAPI_ENUM_EVENTS &&
modifier != PAPI_ENUM_FIRST &&
modifier != PAPI_ENUM_ALL &&
modifier != PAPI_PRESET_ENUM_AVAIL &&
modifier != PAPI_PRESET_ENUM_CPU &&
modifier != PAPI_PRESET_ENUM_CPU_AVAIL &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the doxygen for PAPI_enum_event to reflect the addition of PAPI_PRESET_ENUM_CPU, PAPI_PRESET_ENUM_CPU_AVAIL, and PAPI_ENUM_FIRST_COMP?

i &= PAPI_PRESET_AND_MASK;

/* Iterate over all or all available presets. */
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested both of these modifiers:

  • PAPI_ENUM_EVENTS, does go over all available CPU Events, but for the Cuda presets I output PAPI_CUDA_HP_FMA (the first listed Cuda preset on hopper1 at Oregon) and the other three events return PAPI_ENOTPRESET.
  • PAPI_PRESET_ENUM_AVAIL, does not output just the available preset events.

i &= PAPI_PRESET_AND_MASK;

/* Iterate over all or all available presets. */
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the CPU:

PAPI_PRESET_ENUM_AVAIL is showing 108 available preset events on the machine I am testing on. While the master branch is only showing 47 (we should get 47 back).

For the GPU:
PAPI_ENUM_EVENTS is only showing PAPI_CUDA_HP_FMA when starting at 0x80000080. Likewise for PAPI_PRESET_ENUM_AVAIL.

prstPtr->num_quals++;

/* Store the qualifier in the preset struct. */
prstPtr->quals[prstPtr->num_quals-1] = (char*)malloc((1+strlen(qualPtr))*sizeof(char));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not increment prstPtr->num_quals++ further down? Then you would not have to do prstPtr->num_quals-1. I may be missing something though and you actually need to increment at that point.

/* Store the qualifier's description in the preset struct. */
prstPtr->descrs[prstPtr->num_quals-1] = (char*)malloc((1+strlen(qualPtr))*sizeof(char));
if( NULL != prstPtr->descrs[prstPtr->num_quals-1] ) {
status = sprintf(prstPtr->descrs[prstPtr->num_quals-1], "%s", qualPtr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned previously, I would probably go with snprintf instead of sprintf. There are a few other occurrences below this as well.

@dbarry9 dbarry9 force-pushed the 2024.10.17_preset-extensions branch 4 times, most recently from 935ea80 to eaa020e Compare November 25, 2024 18:25
@dbarry9 dbarry9 force-pushed the 2024.10.17_preset-extensions branch 4 times, most recently from b57602c to 6c536d1 Compare November 27, 2024 23:09
Create fields in the vector struct for components to define presets.

These changes have been tested on the NVIDIA Hopper architecture.
@dbarry9 dbarry9 force-pushed the 2024.10.17_preset-extensions branch from 6c536d1 to 1212e11 Compare December 5, 2024 17:04
Add functions to facilitate CUDA presets.

These changes have been tested on the NVIDIA Hopper architecture.
Update configure to track both the number of presets per component and
the arrays of presets belonging to each component.

These changes have been tested on the NVIDIA Hopper architecture.
Updates to framework to facilitate preset events defined by native
events of non-perf_event components.

These changes have been tested on the NVIDIA Hopper architecture.
This is an aesthetic change to improve the development process.

These changes have been tested on the NVIDIA Grace-Hopper architecture.
Replace modifiers with only those that enumerate the CPU preset events.

These changes have been tested on the NVIDIA Grace-Hopper architecture.
Enumerate presets for components as well as the CPU.

These changes have been tested on the NVIDIA Grace-Hopper architecture.
@dbarry9 dbarry9 force-pushed the 2024.10.17_preset-extensions branch from 1212e11 to 6406ed1 Compare December 5, 2024 20:23
@Treece-Burgess Treece-Burgess added the update-presets PRs related to updating the PAPI presets label Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status-work-in-progress PR is still being worked on update-presets PRs related to updating the PAPI presets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants