Add scope_range_id support to ROCm profiler#670
Add scope_range_id support to ROCm profiler#670magaonka-amd wants to merge 1 commit intoROCm:mainfrom
Conversation
|
|
||
| void RocmTraceCollectorImpl::ExportScopeRangeIdTree(XSpace* space) { | ||
| XPlaneBuilder plane(FindOrAddMutablePlaneWithName( | ||
| space, tsl::profiler::kScopeRangeIdTreePlaneName)); | ||
| tensorflow::profiler::XStatMetadata metadata; | ||
| for (const auto& [child_id, parent_id] : scope_range_id_tree_) { | ||
| metadata.set_id(child_id); | ||
| plane.AddStatValue(metadata, parent_id); | ||
| } |
There was a problem hiding this comment.
Bug: scope_range_id_tree_ is never populated — exported XPlane will always be empty.
This method iterates over scope_range_id_tree_, but nothing in this PR ever inserts entries into it. Compare with the CUPTI path:
- In
cupti_tracer.cc:671-681,AddScopeRangeIdSequencewalks the full range ID sequence fromAnnotationStack::GetScopeRangeIds()and builds the child→parent mapping by iterating adjacent pairs (tree.emplace(*curr, *(curr - 1))). - In
cupti_collector.cc:726-734, per-thread trees are merged into the collector'sscope_range_id_tree_.
In this PR, only the leaf scope range ID (the last element of range_ids) is stored per correlation ID in the annotation map, but the parent-child tree relationships are never constructed. The tree-building logic equivalent to AddScopeRangeIdSequence is missing, so xprof will not be able to reconstruct the scope hierarchy.
There was a problem hiding this comment.
Resolved — addressed in this revision. The tree-building logic has been added in AnnotationMap::Add (walking adjacent pairs to build child-to-parent mappings), and TakeScopeRangeIdTree() transfers it to the collector before export.
There was a problem hiding this comment.
Resolved: scope_range_id_tree_ is now populated via the tree-building logic in AnnotationMap::Add and transferred through TakeScopeRangeIdTree(). The exported XPlane will now correctly contain the scope hierarchy.
| trace_event->scope_range_id = | ||
| annotation_map()->LookUpScopeRangeId(trace_event->correlation_id); |
There was a problem hiding this comment.
Bug: scope_range_id is set only for kernel dispatch events, not for HipApi or Memcpy events.
HipApiEvent (around line 144-195) and MemcpyEvent (around line 197-263) both look up annotations via annotation_map()->LookUp() but do not call LookUpScopeRangeId. Since the scope range ID is stored in the annotation map keyed by correlation ID, and both event types have correlation IDs, these events will have scope_range_id = 0 even when a valid scope range ID exists.
The CUPTI path sets scope_range_id uniformly on all events at callback time (see cupti_tracer.cc:996).
There was a problem hiding this comment.
Resolved — addressed in this revision. LookUpScopeRangeId is now called in HipApiEvent, MemcpyEvent, and KernelEvent uniformly.
There was a problem hiding this comment.
Resolved: LookUpScopeRangeId is now called uniformly for HipApi, Memcpy, and Kernel events, so all event types correctly receive their scope_range_id.
| #include "rocm/include/rocprofiler-sdk/internal_threading.h" | ||
| #include "rocm/include/rocprofiler-sdk/registration.h" | ||
| #include "rocm/include/rocprofiler-sdk/rocprofiler.h" | ||
| #include <unistd.h> |
There was a problem hiding this comment.
Nit: #include <unistd.h> moved to incorrect position.
This C system header is now placed between rocprofiler-sdk/rocprofiler.h (third-party) and xla/backends/profiler/gpu/rocm_collector.h (project). Per Google C++ style and the project's include order convention, C system headers should be grouped together before third-party headers. This will likely be flagged by include-order linters.
| void AddEvent(RocmTracerEvent&& event, bool is_auxiliary) override; | ||
| void Flush() override; | ||
| void Export(tsl::profiler::XSpace* space) override; | ||
| void ExportScopeRangeIdTree(tsl::profiler::XSpace* space); |
There was a problem hiding this comment.
Suggestion: ExportScopeRangeIdTree should be private.
This method is an implementation detail called only by Export(). The CUPTI-side equivalent in cupti_collector.cc is a private member function. Making it public exposes an internal detail of the export pipeline.
| void ExportScopeRangeIdTree(tsl::profiler::XSpace* space); | |
| void ExportScopeRangeIdTree(tsl::profiler::XSpace* space); |
(Move this declaration to the private: section below.)
There was a problem hiding this comment.
Resolved — addressed in this revision. ExportScopeRangeIdTree is now declared in the private: section of RocmTraceCollectorImpl.
There was a problem hiding this comment.
Resolved: ExportScopeRangeIdTree is now in the private: section.
| @@ -213,6 +216,7 @@ class RocmTraceCollectorImpl : public RocmTraceCollector { | |||
| ABSL_EXCLUSIVE_LOCKS_REQUIRED(event_maps_mutex_); | |||
|
|
|||
| absl::node_hash_map<uint32_t, PerDeviceCollector> per_device_collector_; | |||
There was a problem hiding this comment.
Nit: scope_range_id_tree_ lacks ABSL_GUARDED_BY annotation.
Other mutable fields in this class are annotated with ABSL_EXCLUSIVE_LOCKS_REQUIRED(event_maps_mutex_) or similar. While Export is likely called only after tracing is stopped (no concurrent access), adding a guard annotation would maintain consistency with the rest of the class and prevent future misuse.
| // Build scope range ID tree from events in this device | ||
| // (events already have scope_range_id set from annotation lookup) |
There was a problem hiding this comment.
Nit: Misleading comment.
This comment says "Build scope range ID tree from events in this device" and "events already have scope_range_id set from annotation lookup", but no tree-building from events actually happens here. The per_device_collector_[id].Export(...) call does not contribute to scope_range_id_tree_. This comment should either be removed or updated to reflect what actually happens (once the tree-building logic is implemented).
There was a problem hiding this comment.
Resolved — addressed in this revision. The misleading comments have been removed.
There was a problem hiding this comment.
Resolved: The misleading comments about tree building have been removed.
Review SummaryThis PR adds
Additional minor findings (include order, visibility, thread-safety annotation, misleading comment) are noted inline. No tests are included for the new functionality. Automated review by Claude |
dfb32b5 to
b33722d
Compare
| auto* collector = | ||
| static_cast<RocmTraceCollectorImpl*>(rocm_trace_collector_.get()); | ||
| collector->SetScopeRangeIdTree( | ||
| rocm_tracer_->annotation_map()->TakeScopeRangeIdTree()); |
There was a problem hiding this comment.
Design: static_cast to RocmTraceCollectorImpl* breaks the RocmTraceCollector abstraction.
rocm_trace_collector_ is declared as std::unique_ptr<RocmTraceCollector> (the base interface), but this code downcasts it to the concrete RocmTraceCollectorImpl* to call SetScopeRangeIdTree. This couples GpuTracer to the concrete implementation, bypassing the polymorphic interface.
Consider either:
- Adding a
SetScopeRangeIdTree(or a combinedExportoverload that accepts the tree) to theRocmTraceCollectorbase class, or - Passing the tree as an argument to
Export(space, tree).
The CUPTI path avoids this by merging the tree inside the collector itself (CuptiTraceCollectorImpl::Flush calls MergeScopeRangeIdTrees).
There was a problem hiding this comment.
Resolved — addressed in this revision. The static_cast to RocmTraceCollectorImpl* has been removed. SetScopeRangeIdTree is now a virtual method on the RocmTraceCollector base class, so the call goes through the polymorphic interface.
| absl::flat_hash_map<uint32_t, int64_t> scope_range_id_map; | ||
| ScopeRangeIdTree scope_range_id_tree; |
There was a problem hiding this comment.
Nit: New fields lack ABSL_GUARDED_BY(mutex) annotations.
scope_range_id_map and scope_range_id_tree are accessed under map_.mutex in Add, LookUpScopeRangeId, TakeScopeRangeIdTree, and Clear, but unlike the pattern established by the existing fields, they lack thread-safety annotations. Adding ABSL_GUARDED_BY(mutex) would let Clang's thread-safety analysis verify correctness at compile time.
(The existing annotations and correlation_map fields also lack these annotations, but since this PR is adding new fields, it's a good opportunity to set the right precedent.)
There was a problem hiding this comment.
Resolved — addressed in this revision. All fields in AnnotationMapImpl (annotations, correlation_map, scope_range_id_map, scope_range_id_tree) now carry ABSL_GUARDED_BY(mutex) annotations.
| namespace xla { | ||
| namespace profiler { | ||
|
|
||
| using ScopeRangeIdTree = absl::flat_hash_map<int64_t, int64_t>; |
There was a problem hiding this comment.
Nit: Potential ScopeRangeIdTree type alias collision.
This ScopeRangeIdTree alias is defined in xla::profiler, the same namespace where the CUPTI path defines an identical alias in cupti_buffer_events.h. If both headers are ever included in the same translation unit (e.g., during future refactoring or in shared test code), this would cause a redefinition error. Consider using a shared definition in a common header instead.
There was a problem hiding this comment.
Resolved — addressed in this revision. A comment has been added clarifying that ROCm and CUPTI are never compiled together, so the duplicate alias is safe.
|
|
||
| void RocmTraceCollectorImpl::ExportScopeRangeIdTree(XSpace* space) { | ||
| XPlaneBuilder plane(FindOrAddMutablePlaneWithName( | ||
| space, tsl::profiler::kScopeRangeIdTreePlaneName)); | ||
| tensorflow::profiler::XStatMetadata metadata; |
There was a problem hiding this comment.
Nit: Missing explanatory comment for empty metadata name.
The XStatMetadata object has only id set but no name. This is intentional (matching the CUPTI implementation), but the CUPTI code has a comment explaining the rationale: "No metadata is used for this plane, we just use the XStat to transfer the map without breaking any existing proto." Adding a similar comment here would help future maintainers understand why the metadata name is deliberately left empty.
There was a problem hiding this comment.
Resolved — addressed in this revision. An explanatory comment has been added: "No metadata is used for this plane, we just use the XStat to transfer the map without breaking any existing proto."
Re-review SummaryThe two critical bugs from the initial review (empty Remaining from initial review (2 items):
New findings (4 items):
No unit tests are included for the new scope_range_id functionality — consider adding tests modeled after the CUPTI-side coverage in |
b33722d to
b2bc42c
Compare
- Capture scope_range_id from AnnotationStack during HIP API correlation callbacks and propagate it through AnnotationMap to GPU kernel events in the profiler trace - Build the scope_range_id tree (child-to-parent mapping) from the full range ID sequence in AnnotationMap::Add, matching the CUPTI AddScopeRangeIdSequence pattern - Export the tree as a separate XPlane so xprof can reconstruct the scope hierarchy for derived timeline grouping - Set scope_range_id on all event types (kernel, HIP API, memcpy) for uniform coverage, matching the CUPTI path
b2bc42c to
084618d
Compare
| #include "rocm/include/rocprofiler-sdk/internal_threading.h" | ||
| #include "rocm/include/rocprofiler-sdk/registration.h" | ||
| #include "rocm/include/rocprofiler-sdk/rocprofiler.h" | ||
| #include <unistd.h> |
There was a problem hiding this comment.
#include <unistd.h> is a C system header and belongs with the other system headers (lines 22-28), not after the third-party rocprofiler-sdk headers. Google C++ style (and XLA's .clang-format) requires system headers to be grouped together after the corresponding header.
| #include <unistd.h> | |
| #include "xla/backends/profiler/gpu/rocm_collector.h" |
And move #include <unistd.h> up into the C/C++ system headers block (after <vector>).
| void ExportScopeRangeIdTree(tsl::profiler::XSpace* space); | ||
|
|
||
| absl::node_hash_map<uint32_t, PerDeviceCollector> per_device_collector_; | ||
| ScopeRangeIdTree scope_range_id_tree_; |
There was a problem hiding this comment.
scope_range_id_tree_ is accessed from SetScopeRangeIdTree (called from CollectData on the profiler thread) and from ExportScopeRangeIdTree (called from Export, also on the profiler thread). In the current usage pattern these are sequential so there is no actual data race, but the CUPTI side stores this field in the base class without a guard as well. For consistency and defensive coding, consider documenting the single-threaded access assumption with a comment, or protecting it with event_maps_mutex_ and adding ABSL_GUARDED_BY(event_maps_mutex_) annotation, similar to the other maps in this class.
| uint32_t num_events) = 0; | ||
| virtual void Flush() = 0; | ||
| virtual void Export(tsl::profiler::XSpace* space) = 0; | ||
| virtual void SetScopeRangeIdTree(ScopeRangeIdTree tree) {} |
There was a problem hiding this comment.
The base class SetScopeRangeIdTree has a default no-op body {}. This means if someone creates a custom subclass of RocmTraceCollector, the scope range ID tree will silently be dropped. Consider making this pure virtual alongside the other pure virtual methods (AddEvent, Flush, Export), or at minimum adding a comment explaining the intentional no-op default.
| if (!scope_range_ids.empty()) { | ||
| map_.scope_range_id_map.emplace(correlation_id, scope_range_ids.back()); | ||
| if (scope_range_ids.size() > 1) { | ||
| const int64_t* head = scope_range_ids.data(); | ||
| const int64_t* curr = &scope_range_ids.back(); | ||
| for (; curr > head && !map_.scope_range_id_tree.contains(*curr); | ||
| --curr) { | ||
| map_.scope_range_id_tree.emplace(*curr, *(curr - 1)); | ||
| } | ||
| } |
There was a problem hiding this comment.
The tree-building logic iterates backward through scope_range_ids and stops as soon as it finds a node already in the tree (map_.scope_range_id_tree.contains(*curr)). This relies on the assumption that if a node is already in the tree, all its ancestors are too. This is correct given that AnnotationStack::GetScopeRangeIds() always returns the full root-to-leaf path, so any previously-inserted subtree would have been built from a path sharing the same prefix. Worth adding a brief comment explaining this invariant — the CUPTI side has similar logic in CuptiBufferEvents::AddScopeRangeIdSequence but it is more explicit about the assumption.
| void RocmTraceCollectorImpl::ExportScopeRangeIdTree(XSpace* space) { | ||
| XPlaneBuilder plane(FindOrAddMutablePlaneWithName( | ||
| space, tsl::profiler::kScopeRangeIdTreePlaneName)); | ||
| // No metadata is used for this plane, we just use the XStat to | ||
| // transfer the map without breaking any existing proto. | ||
| tensorflow::profiler::XStatMetadata metadata; | ||
| for (const auto& [child_id, parent_id] : scope_range_id_tree_) { | ||
| metadata.set_id(child_id); | ||
| plane.AddStatValue(metadata, parent_id); | ||
| } |
There was a problem hiding this comment.
The ExportScopeRangeIdTree implementation matches the CUPTI side closely, which is good. One observation: metadata is being mutated in place across loop iterations (only set_id is called, never set_name). The name field of XStatMetadata defaults to empty string, which is the intended behavior per the CUPTI implementation. This is fine, but the CUPTI side has a comment about why metadata names are empty — consider mirroring that comment here for clarity.
SummaryThis PR adds The implementation addresses the two critical gaps identified in the first review round (empty tree, missing scope_range_id on HipApi/Memcpy events) and adds Findings
Automated review by Claude |
Re-review SummaryMost issues from the previous review have been addressed — nice work! Resolved 4 previously-flagged issues (static_cast abstraction break, missing ABSL_GUARDED_BY on new fields, ScopeRangeIdTree alias collision risk, missing explanatory comment for empty metadata name). Two minor nits remain from the prior review (include ordering of No new issues found in this revision. |
Re-review Summary (Round 3)The two critical bugs from the initial review (empty tree export and missing The remaining minor items from previous rounds (include order, 🤖 Generated with Claude Code |
Submission Checklist