Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cuda: Fix papi_command_line segfault when passed non-existent event name #101

Conversation

anustuvicl
Copy link
Contributor

Before this fix $ papi_command_line xxxx causes segfault.

After this fix the segfault is replaced by

This utility lets you add events from the command line interface to see if they work.

Failed adding: xxxx
because: Invalid argument
No events specified!
Try running something like: /home/anustuv/papidev/papi-AP2/papi/src/install/bin/papi_command_line PAPI_TOT_CYC

This happens because the cuda component dynamically adds events to its internal structure when event_name_to_code is called without checking validity, and returns an event code. Only at PAPI_start it complains if the event cannot be added to the lower layers.

Enumeration of all cuda events is a heavy operation and thus it does not do it unless explicitly requested.

@@ -1438,11 +1438,11 @@ int cuptip_control_create(cuptiu_event_table_t *event_names, cuptic_info_t thr_i
if (papi_errno != PAPI_OK) {
goto fn_exit;
}
papi_errno = add_events_per_gpu(state, event_names);
papi_errno = nvpw_cuda_metricscontext_create(state);
if (papi_errno != PAPI_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was this working at all? How can add_events_per_gpu take a state object that is created only after?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvpw_cuda_metricscontext_create creates CUPTI metrics contexts which are needed for enumeration of metrics, and not required by add events. Before, if add events failed, the function would exit without creating these, and on exit the component would try to destroy these and fail with segfault. So Creating these before add events successfully destroys these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think naming is a little confusing. My interpretation of the code was that state is the control state object and nvpw_cuda_metricscontext_create creates the control state object, while add_events_per_gpu adds events to the control state object.

Copy link

@gulaki gulaki Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuptip_control_create creates the state object, then calls nvpw_cuda_metricscontext_create then add_events_per_gpu. nvpw_cuda_metricscontext_create creates the CUPTI MetricsContext object as members of the state object. The naming might be confusing because too many things are called "context".

@anustuvicl anustuvicl merged commit d9ffaf2 into icl-utk-edu:master Oct 23, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants