feat(capture): redirect heatmap data from events to heatmaps topic#53514
feat(capture): redirect heatmap data from events to heatmaps topic#53514
Conversation
Prompt To Fix All With AIThis is a comment left during a code review.
Path: rust/capture/src/events/analytics.rs
Line: 228-252
Comment:
**Data loss when redirect creation fails**
`needs_heatmap_redirect` is built once and is never updated when `create_heatmap_redirect` returns `Err`. The second loop (lines 248–252) still iterates those same `true` flags, so `strip_heatmap_data` is called even for events whose redirect was never actually produced. The original event then has `$heatmap_data` removed with no corresponding redirect on the heatmaps topic — the heatmap data is silently lost.
The fix is to track which redirects were *successfully created* (not which events *needed* one) and only strip those:
```rust
let mut redirect_created: Vec<bool> = vec![false; needs_heatmap_redirect.len()];
let mut heatmap_redirects: Vec<ProcessedEvent> = Vec::new();
for (i, (e, needs_redirect)) in events.iter().zip(needs_heatmap_redirect.iter()).enumerate() {
if *needs_redirect {
match create_heatmap_redirect(e, historical_cfg.clone(), context) {
Ok(processed) => {
metrics::counter!("capture_heatmap_redirects_created").increment(1);
heatmap_redirects.push(processed);
redirect_created[i] = true; // only set on success
}
Err(err) => {
error!("failed to create heatmap redirect: {err:#}");
// redirect_created[i] stays false → original is not stripped
}
}
}
}
// …process events…
for (event, did_redirect) in events.iter_mut().zip(redirect_created.iter()) {
if *did_redirect {
strip_heatmap_data(event);
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: rust/capture/src/events/analytics.rs
Line: 1053-1062
Comment:
**`strip_heatmap_data` sets `process_heatmap = true` even when deserialization fails**
If `serde_json::from_str` on `event.event.data` fails (unlikely, but possible if data is malformed), the function falls through and still sets `process_heatmap = true`. The downstream Node.js pipeline will then see the flag and skip heatmap extraction, but the raw `data` field still contains `$heatmap_data`. The net result is the heatmap data is silently dropped.
Consider only setting the flag after a successful (or at worst, no-op) strip:
```rust
fn strip_heatmap_data(event: &mut ProcessedEvent) {
if let Ok(mut raw_event) = serde_json::from_str::<RawEvent>(&event.event.data) {
raw_event.properties.remove("$heatmap_data");
if let Ok(data) = serde_json::to_string(&raw_event) {
event.event.data = data;
}
event.metadata.process_heatmap = true;
}
// Do not set process_heatmap if we couldn't parse the event;
// leave it for the downstream pipeline to handle normally.
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "refactor: ignore heatmap data in events ..." | Re-trigger Greptile |
| for (e, needs_redirect) in events.iter().zip(needs_heatmap_redirect.iter()) { | ||
| if *needs_redirect { | ||
| match create_heatmap_redirect(e, historical_cfg.clone(), context) { | ||
| Ok(processed) => { | ||
| metrics::counter!("capture_heatmap_redirects_created").increment(1); | ||
| heatmap_redirects.push(processed); | ||
| } | ||
| Err(err) => { | ||
| error!("failed to create heatmap redirect: {err:#}"); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| let mut events: Vec<ProcessedEvent> = events | ||
| .iter() | ||
| .map(|e| process_single_event(e, historical_cfg.clone(), context)) | ||
| .collect::<Result<Vec<ProcessedEvent>, CaptureError>>()?; | ||
|
|
||
| // Strip heatmap data from originals that got redirects | ||
| for (event, needs_redirect) in events.iter_mut().zip(needs_heatmap_redirect.iter()) { | ||
| if *needs_redirect { | ||
| strip_heatmap_data(event); | ||
| } | ||
| } |
There was a problem hiding this comment.
Data loss when redirect creation fails
needs_heatmap_redirect is built once and is never updated when create_heatmap_redirect returns Err. The second loop (lines 248–252) still iterates those same true flags, so strip_heatmap_data is called even for events whose redirect was never actually produced. The original event then has $heatmap_data removed with no corresponding redirect on the heatmaps topic — the heatmap data is silently lost.
The fix is to track which redirects were successfully created (not which events needed one) and only strip those:
let mut redirect_created: Vec<bool> = vec![false; needs_heatmap_redirect.len()];
let mut heatmap_redirects: Vec<ProcessedEvent> = Vec::new();
for (i, (e, needs_redirect)) in events.iter().zip(needs_heatmap_redirect.iter()).enumerate() {
if *needs_redirect {
match create_heatmap_redirect(e, historical_cfg.clone(), context) {
Ok(processed) => {
metrics::counter!("capture_heatmap_redirects_created").increment(1);
heatmap_redirects.push(processed);
redirect_created[i] = true; // only set on success
}
Err(err) => {
error!("failed to create heatmap redirect: {err:#}");
// redirect_created[i] stays false → original is not stripped
}
}
}
}
// …process events…
for (event, did_redirect) in events.iter_mut().zip(redirect_created.iter()) {
if *did_redirect {
strip_heatmap_data(event);
}
}Prompt To Fix With AI
This is a comment left during a code review.
Path: rust/capture/src/events/analytics.rs
Line: 228-252
Comment:
**Data loss when redirect creation fails**
`needs_heatmap_redirect` is built once and is never updated when `create_heatmap_redirect` returns `Err`. The second loop (lines 248–252) still iterates those same `true` flags, so `strip_heatmap_data` is called even for events whose redirect was never actually produced. The original event then has `$heatmap_data` removed with no corresponding redirect on the heatmaps topic — the heatmap data is silently lost.
The fix is to track which redirects were *successfully created* (not which events *needed* one) and only strip those:
```rust
let mut redirect_created: Vec<bool> = vec![false; needs_heatmap_redirect.len()];
let mut heatmap_redirects: Vec<ProcessedEvent> = Vec::new();
for (i, (e, needs_redirect)) in events.iter().zip(needs_heatmap_redirect.iter()).enumerate() {
if *needs_redirect {
match create_heatmap_redirect(e, historical_cfg.clone(), context) {
Ok(processed) => {
metrics::counter!("capture_heatmap_redirects_created").increment(1);
heatmap_redirects.push(processed);
redirect_created[i] = true; // only set on success
}
Err(err) => {
error!("failed to create heatmap redirect: {err:#}");
// redirect_created[i] stays false → original is not stripped
}
}
}
}
// …process events…
for (event, did_redirect) in events.iter_mut().zip(redirect_created.iter()) {
if *did_redirect {
strip_heatmap_data(event);
}
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Doesn't seem like this can happen - it's just the internal interfaces that have error types. I don't want to engage in a big refactoring to break out the logic that doesn't trigger errors.
| ); | ||
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_process_events_creates_heatmap_redirect() { | ||
| let now = Utc::now(); | ||
| let context = create_test_context(now, None); | ||
| let events = vec![create_event_with_heatmap_data()]; | ||
|
|
||
| let sink = Arc::new(MockSink::new()); |
There was a problem hiding this comment.
strip_heatmap_data sets process_heatmap = true even when deserialization fails
If serde_json::from_str on event.event.data fails (unlikely, but possible if data is malformed), the function falls through and still sets process_heatmap = true. The downstream Node.js pipeline will then see the flag and skip heatmap extraction, but the raw data field still contains $heatmap_data. The net result is the heatmap data is silently dropped.
Consider only setting the flag after a successful (or at worst, no-op) strip:
fn strip_heatmap_data(event: &mut ProcessedEvent) {
if let Ok(mut raw_event) = serde_json::from_str::<RawEvent>(&event.event.data) {
raw_event.properties.remove("$heatmap_data");
if let Ok(data) = serde_json::to_string(&raw_event) {
event.event.data = data;
}
event.metadata.process_heatmap = true;
}
// Do not set process_heatmap if we couldn't parse the event;
// leave it for the downstream pipeline to handle normally.
}Prompt To Fix With AI
This is a comment left during a code review.
Path: rust/capture/src/events/analytics.rs
Line: 1053-1062
Comment:
**`strip_heatmap_data` sets `process_heatmap = true` even when deserialization fails**
If `serde_json::from_str` on `event.event.data` fails (unlikely, but possible if data is malformed), the function falls through and still sets `process_heatmap = true`. The downstream Node.js pipeline will then see the flag and skip heatmap extraction, but the raw `data` field still contains `$heatmap_data`. The net result is the heatmap data is silently dropped.
Consider only setting the flag after a successful (or at worst, no-op) strip:
```rust
fn strip_heatmap_data(event: &mut ProcessedEvent) {
if let Ok(mut raw_event) = serde_json::from_str::<RawEvent>(&event.event.data) {
raw_event.properties.remove("$heatmap_data");
if let Ok(data) = serde_json::to_string(&raw_event) {
event.event.data = data;
}
event.metadata.process_heatmap = true;
}
// Do not set process_heatmap if we couldn't parse the event;
// leave it for the downstream pipeline to handle normally.
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
That doesn't make much sense to me. First, if the header is set to true, node would process the heatmap data, but we'll remove that branch. Second, if we can't parse the data field, it's unlikely we'll parse it in Node. Parsing into raw event should fail in some other part of capture and discard the event, I assume, but would need to check.
There was a problem hiding this comment.
You're right on points 2 and 3 - if the data is malformed enough that Rust can't deserialize it, Node.js won't be able to either, and it should fail validation earlier in the pipeline.
For point 1, I think we're saying the same thing differently: when process_heatmap = true, Node skips extraction (per the PR description). So my concern was that setting the flag when stripping fails means Node would skip extraction even though $heatmap_data is still in the raw data. But as you noted, this will be moot once you remove extractHeatmapDataStep entirely, and the malformed-data scenario is already handled by earlier validation.
Fair pushback - the error path I flagged is either unreachable or benign.
Tip: You can give me more context with .greptile/rules so I get this right next time.
Problem
The joined analytics pipeline has two branches that both process heatmap data:
$heatmap_dataas a side effect alongside full event processing (persons, groups, CDP transforms, event emission)$$heatmapevents, skips person processing and event emission, just extracts heatmap dataNon-
$$heatmapevents (e.g.$pageview) that carry$heatmap_dataor scroll depth properties are currently only processed through the events branch, meaning heatmap extraction is coupled to the full event pipeline.Changes
Rust capture service:
$$heatmapevent carries heatmap data ($heatmap_data, or$prev_pageview_pathname+$current_url), capture now produces a second message to the heatmaps topic$$heatmapand a new UUID$heatmap_datastripped from its serialized data field (other properties like$prev_pageview_pathnameare kept — web analytics queries depend on them)skip_heatmap_processing: trueKafka header so the events branch knows to skip extractionNode.js ingestion pipeline:
extractHeatmapDataStepnow checks theskip_heatmap_processingheader — when true, it skips extraction and just strips$heatmap_datafrom the eventEventHeaderstype extended withskip_heatmap_processing: booleanskip_heatmap_processingTest refactors:
EventHeadersconstructions across test files replaced withcreateTestEventHeaders()helperOnce this is rolled out and confirmed stable, we can remove
extractHeatmapDataStepfrom the events branch entirely — heatmap processing will be fully handled by the heatmaps branch.How did you test this code?
has_heatmap_data,create_heatmap_redirect,strip_heatmap_data, andprocess_eventsintegration (redirect creation, no-duplicate for$$heatmap, no-redirect without heatmap data)extractHeatmapDataStepcovering theskip_heatmap_processingskip path, immutability, and fallback to normal extractionskip_heatmap_processingheader parsingPublish to changelog?
No
Docs update
N/A