rface is not enabled. This is relevant, as the Profiling Interface
affects all the _hot_ code paths (in the target code, not in the
offloaded code). Users of the OpenACC Profiling Interface can be
expected to understand that performance will be impacted to some degree
once the Profiling Interface has been enabled: for example, because of
the _runtime_ (libgomp) calling into a third-party _library_ for every
event that has been registered.

We're not yet accounting for the fact that 'OpenACC events may occur
during event processing'. We just handle one case specially, as
required by CUDA 9.0 'nvprof': 'acc_get_device_type'
(*note acc_get_device_type::) may be called from
'acc_ev_device_init_start', 'acc_ev_device_init_end' callbacks.

We're not yet implementing initialization via an 'acc_register_library'
function that is either statically linked in, or dynamically loaded via
'LD_PRELOAD'. Initialization via 'acc_register_library' functions
dynamically loaded via the 'ACC_PROFLIB' environment variable does work,
as does directly calling 'acc_prof_register', 'acc_prof_unregister',
'acc_prof_lookup'.

As currently there are no inquiry functions defined, calls to
'acc_prof_lookup' will always return 'NULL'.

There aren't separate _start_, _stop_ events defined for the event types
'acc_ev_create', 'acc_ev_delete', 'acc_ev_alloc', 'acc_ev_free'. It's
not clear if these should be triggered before or after the actual
device-specific call is made. We trigger them after.

Remarks about data provided to callbacks:

'acc_prof_info.event_type'
     It's not clear if for _nested_ event callbacks (for example,
     'acc_ev_enqueue_launch_start' as part of a parent compute
     construct), this should be set for the nested event
     ('acc_ev_enqueue_launch_start'), or if the value of the parent
     construct should remain ('acc_ev_compute_construct_start'). In
     this implementation, the value will generally correspond to the
     innermost nested event type.

'acc_prof_info.device_type'
        * For 'acc_ev_compute_construct_start', and in presence of an
          'if' clause with _false_ argument, this will still refer to
          the offloading device type. It's not clear if that's the
          expected behavior.

        * Complementary to the item before, for
          'acc_ev_compute_construct_end', this is set to
          'acc_device_host' in presence of an 'if' clause with _false_
          argument. It's not clear if that's the expected behavior.

'acc_prof_info.thread_id'
     Always '-1'; not yet implemented.

'acc_prof_info.async'
        * Not yet implemented correctly for
          'acc_ev_compute_construct_start'.

        * In a compute construct, for host-fallback
          execution/'acc_device_host' it will always be
          'acc_async_sync'. It's not clear if that's the expected
          behavior.

        * For 'acc_ev_device_init_start' and 'acc_ev_device_init_end',
          it will always be 'acc_async_sync'. It's not clear if that's
          the expected behavior.

'acc_prof_info.async_queue'
     There is no 'limited number of asynchronous queues' in libgomp.
     This will always have the same value as 'acc_prof_info.async'.

'acc_prof_info.src_file'
     Always 'NULL'; not yet implemented.

'acc_prof_info.func_name'
     Always 'NULL'; not yet implemented.

'acc_prof_info.line_no'
     Always '-1'; not yet implemented.

'acc_prof_info.end_line_no'
     Always '-1'; not yet implemented.

'acc_prof_info.func_line_no'
     Always '-1'; not yet implemented.

'acc_prof_info.func_end_line_no'
     Always '-1'; not yet implemented.

'acc_event_info.event_type', 'acc_event_info.*.event_type'
     Relating to 'acc_prof_info.event_type' discussed above, in this
     implementation, this will always be the same value as
     'acc_prof_info.event_type'.

'acc_event_info.*.parent_construct'
        * Will be 'acc_construct_parallel' for all OpenACC compute
          constructs as well as many OpenACC Runtime API calls; should
          be the one matching the actual construct, or
          'acc_construct_runtime_api', respectively.

        * Will be 'acc_construct_enter_data' or
          'acc_construct_exit_data' when processing variable mappings
          specified in OpenACC _declare_ directives; should be
          'acc_construct_declare'.

        * For implicit 'acc_ev_device_init_start',
          'acc_ev_device_init_end', and explicit as well as implicit
          'acc_ev_alloc', 'acc_ev_free', 'acc_ev_enqueue_upload_start',
          'acc_ev_enqueue_upload_end', 'acc_ev_enqueue_download_start',
          and 'acc_ev_enqueue_download_end', will be
          'acc_construct_parallel'; should reflect the real parent
          construct.

'acc_event_info.*.implicit'
     For 'acc_ev_alloc', 'acc_ev_free', 'acc_ev_enqueue_upload_start',
     'acc_ev_enqueue_upload_end', 'acc_ev_enqueue_download_start', and
     'acc_ev_enqueue_download_end', this currently will be '1' also for
     explicit usage.

'acc_event_info.data_event.var_name'
     Always 'NULL'; not yet implemented.

'acc_event_info.data_event.host_ptr'
     For 'acc_ev_alloc', and 'acc_ev_free', this is always 'NULL'.

'typedef union acc_api_info'
     ... as printed in '5.2.3. Third Argument: API-Specific
     Information'. This should obviously be 'typedef _struct_
     acc_api_info'.

'acc_api_info.device_api'
     Possibly not yet implemented correctly for
     'acc_ev_compute_construct_start', 'acc_ev_device_init_start',
     'acc_ev_device_init_end': will always be 'acc_device_api_none' for
     these event types. For 'acc_ev_enter_data_start', it will be
     'acc_device_api_none' in some cases.

'acc_api_info.device_type'
     Always the same as 'acc_prof_info.device_type'.

'acc_api_info.vendor'
     Always '-1'; not yet implemented.

'acc_api_info.device_handle'
     Always 'NULL'; not yet implemented.

'acc_api_info.context_handle'
     Always 'NULL'; not yet implemented.

'acc_api_info.async_handle'
     Always 'NULL'; not yet implemented.

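Since several of the fields listed above currently carry placeholder values ('NULL' or '-1'), a profiling library may want to guard against them before formatting output. The following is a minimal sketch of that idea, not part of libgomp: the helper names 'describe_location' and 'describe_data_var' and the placeholder strings are hypothetical, and it assumes that GCC's 'acc_prof.h' header is available on the include path.

```c
#include <acc_prof.h>   /* OpenACC Profiling Interface types, as shipped with GCC */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the source location of an event into BUF.
   Returns 1 if the runtime supplied a location, and 0 if it had to fall
   back to a placeholder because 'src_file' is NULL or 'line_no' is
   negative (the values currently reported according to the remarks
   above).  */
static int
describe_location (const acc_prof_info *prof_info, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  if (prof_info->src_file == NULL || prof_info->line_no < 0)
    {
      snprintf (buf, size, "<unknown location>");
      return 0;
    }
  snprintf (buf, size, "%s:%d", prof_info->src_file, prof_info->line_no);
  return 1;
}

/* Hypothetical helper: name of the variable a data event refers to, or
   a placeholder when 'var_name' is NULL.  Only meaningful for data
   events, hence the 'acc_data_event_info' parameter.  */
static const char *
describe_data_var (const acc_data_event_info *data_event)
{
  return data_event->var_name != NULL ? data_event->var_name : "<unnamed>";
}
```

Written this way, the same callback code keeps working unchanged if a later libgomp version starts filling in these fields.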
Remarks about certain event types:

'acc_ev_device_init_start', 'acc_ev_device_init_end'
        * When a compute construct triggers implicit
          'acc_ev_device_init_start' and 'acc_ev_device_init_end'
          events, they currently aren't _nested within_ the
          corresponding 'acc_ev_compute_construct_start' and
          'acc_ev_compute_construct_end', but they're currently
          observed _before_ 'acc_ev_compute_construct_start'. It's not
          clear what to do: the standard asks us to provide a lot of
          details to the 'acc_ev_compute_construct_start' callback,
          without (implicitly) initializing a device before?

        * Callbacks for these event types will not be invoked for calls
          to the 'acc_set_device_type' and 'acc_set_device_num'
          functions. It's not clear if they should be.

'acc_ev_enter_data_start', 'acc_ev_enter_data_end', 'acc_ev_exit_data_start', 'acc_ev_exit_data_end'
        * Callbacks for these event types will also be invoked for
          OpenACC _host_data_ constructs. It's not clear if they should
          be.

        * Callbacks for these event types will also be invoked when
          processing variable mappings specified in OpenACC _declare_
          directives. It's not clear if they should be.

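To observe the ordering of device initialization relative to compute constructs described above, a small profiling library can count where 'acc_ev_device_init_start' arrives. The following is a sketch under assumptions rather than a reference tool: the names 'ordering_cb', 'ordering_report' and the counters are hypothetical, the counters are not thread-safe, and GCC's 'acc_prof.h' is assumed to be available.

```c
#include <acc_prof.h>   /* OpenACC Profiling Interface types, as shipped with GCC */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping; plain ints, so single-threaded use only.  */
static int construct_depth;  /* Compute constructs currently open.  */
static int init_nested;      /* Device inits seen inside a construct.  */
static int init_outside;     /* Device inits seen outside any construct.  */

/* One callback for all three event types; dispatch on 'event_type'.  */
static void
ordering_cb (acc_prof_info *prof_info, acc_event_info *event_info,
             acc_api_info *api_info)
{
  (void) event_info;
  (void) api_info;
  switch (prof_info->event_type)
    {
    case acc_ev_compute_construct_start:
      ++construct_depth;
      break;
    case acc_ev_compute_construct_end:
      assert (construct_depth > 0);
      --construct_depth;
      break;
    case acc_ev_device_init_start:
      if (construct_depth > 0)
        ++init_nested;
      else
        ++init_outside;
      break;
    default:
      break;
    }
}

/* Print the counters when the process exits.  */
static void
ordering_report (void)
{
  fprintf (stderr, "device init: %d nested, %d outside a compute construct\n",
           init_nested, init_outside);
}

/* Entry point looked up by the runtime in a library named by
   'ACC_PROFLIB'; registers the callback through the REG pointer.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;
  (void) lookup;
  reg (acc_ev_device_init_start, ordering_cb, acc_reg);
  reg (acc_ev_compute_construct_start, ordering_cb, acc_reg);
  reg (acc_ev_compute_construct_end, ordering_cb, acc_reg);
  atexit (ordering_report);
}
```

Built as a shared library and named in the 'ACC_PROFLIB' environment variable, such a tool would be expected to report implicit device initialization as occurring outside the compute construct with the current implementation.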
Callbacks for the following event types will be invoked, but dispatch
and the information provided therein have not yet been thoroughly
reviewed:

   * 'acc_ev_alloc'
   * 'acc_ev_free'
   * 'acc_ev_update_start', 'acc_ev_update_end'
   * 'acc_ev_enqueue_upload_start', 'acc_ev_enqueue_upload_end'
   * 'acc_ev_enqueue_download_start', 'acc_ev_enqueue_download_end'

During device initialization, and finalization, respectively, callbacks
for the following event types will not yet be invoked:

   * 'acc_ev_alloc'
   * 'acc_ev_free'

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

   * 'acc_ev_device_shutdown_start', 'acc_ev_device_shutdown_end'
   * 'acc_ev_runtime_shutdown'
   * 'acc_ev_create', 'acc_ev_delete'
   * 'acc_ev_wait_start', 'acc_ev_wait_end'

For the following runtime library functions, not all expected callbacks
will be invoked (mostly concerning implicit device initialization):

   * 'acc_get_num_devices'
   * 'acc_set_device_type'
   * 'acc_get_device_type'
   * 'acc_set_device_num'
   * 'acc_get_device_num'
   * 'acc_init'
   * 'acc_shutdown'

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

   * 'acc_malloc'
   * 'acc_free'
   * 'acc_copyin', 'acc_present_or_copyin', 'acc_copyin_async'
   * 'acc_create', 'acc_present_or_create', 'acc_create_async'
   * 'acc_copyout', 'acc_copyout_async', 'acc_copyout_finalize',
     'acc_copyout_finalize_async'
   * 'acc_delete', 'acc_delete_async', 'acc_delete_finalize',
     'acc_delete_finalize_async'
   * 'acc_update_device', 'acc_update_device_async'
   * 'acc_update_self', 'acc_update_self_async'
   * 'acc_map_data', 'acc_unmap_data'
   * 'acc_memcpy_to_device', 'acc_memcpy_to_device_async'
   * 'acc_memcpy_from_device', 'acc_memcpy_from_device_async'