fix: filter out bad timestamp issue activities to prevent overflows#3693
|
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title. |
joanagmaia
left a comment
LGTM ✅
Do you think this might also be an issue for pull requests analysis data? So basically in all widgets where we rely on the timestamps and where we need to make calculations like resolvedAt, mergedAt, shouldn't we always exclude these activities? 🤔
Everywhere else where we have totals we should still include them
Hey @epipav I believe this is already in production, can we merge this one?
Pull request overview
This PR updates the Tinybird issue analysis copy pipe to drop obviously-bad (pre-1971) issue activity timestamps that can lead to incorrect/overflowing duration calculations in downstream metrics.
Changes:
- Filters `issues-opened` and `issues-closed` activities to `toYear(timestamp) >= 1971` to exclude 1970-era bad data.
- Normalizes the pipe `TYPE` value and adjusts whitespace/formatting in the pipe file.
| @@ -1,5 +1,6 @@ | |||
| DESCRIPTION > | |||
| Compacts activities from same issue into one, keeping necessary information in a single row. Helps to serve issue-wide widgets in the development tab. | |||
| Compacts activities from same issue into one, keeping necessary information in a single row. Helps to serve issue-wide widgets in the development tab. | |||
Line 2 uses a tab indentation under DESCRIPTION >, while other Tinybird pipes in this repo use spaces (e.g., pull_request_analysis_copy_pipe.pipe). Tabs can make diffs noisy and may be interpreted differently by tooling—please switch this to consistent space indentation.
| Compacts activities from same issue into one, keeping necessary information in a single row. Helps to serve issue-wide widgets in the development tab. | |
| Compacts activities from same issue into one, keeping necessary information in a single row. Helps to serve issue-wide widgets in the development tab. |
| FROM activityRelations_deduplicated_cleaned_bucket_union | ||
| WHERE type = 'issues-opened' | ||
| WHERE type = 'issues-opened' AND toYear(timestamp) >= 1971 |
The new `toYear(timestamp) >= 1971` predicate is non-sargable in ClickHouse/Tinybird and can prevent partition/primary-key pruning. Prefer a direct range filter like `timestamp >= toDateTime('1971-01-01 00:00:00')` (or `toDate('1971-01-01')` if `timestamp` is a `Date`) to keep the filter efficient.
| FROM activityRelations_deduplicated_cleaned_bucket_union | ||
| WHERE type = 'issues-closed' AND sourceParentId != '' | ||
| WHERE type = 'issues-closed' AND sourceParentId != '' AND toYear(timestamp) >= 1971 | ||
| GROUP BY sourceParentId |
Same as above: `toYear(timestamp) >= 1971` is likely to be less efficient than a direct timestamp range predicate. Consider switching this to `timestamp >= toDateTime('1971-01-01 00:00:00')` for better query pruning.
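For reference, a minimal sketch of what the suggested range predicate could look like in the `issues-closed` node. The table, `type`, and `sourceParentId` names come from the diff above; the `closedAt` alias, the `min(timestamp)` aggregate, and the assumption that `timestamp` is a `DateTime` column are illustrative only and not taken from the PR.

```sql
-- Sketch only: the SELECT list is hypothetical; the WHERE clause mirrors the diff.
SELECT
    sourceParentId,
    min(timestamp) AS closedAt  -- hypothetical alias, not from the PR
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE type = 'issues-closed'
  AND sourceParentId != ''
  -- Plain range comparison on the raw column, replacing toYear(timestamp) >= 1971.
  -- Assumes timestamp is a DateTime; both forms keep rows from 1971-01-01 onward.
  AND timestamp >= toDateTime('1971-01-01 00:00:00')
GROUP BY sourceParentId
```

Both predicates select the same rows, so the choice between them is about how the filter is evaluated rather than about the result set.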
| LEFT JOIN issues_comment AS comment ON opened.sourceId = comment.sourceParentId | ||
|
|
||
| TYPE COPY | ||
| TYPE copy |
TYPE copy deviates from the convention used across the Tinybird pipes in this repo (TYPE COPY). If Tinybird parsing or internal validation is case-sensitive (or if tooling expects the canonical form), this could break deployments—please keep it consistent as TYPE COPY.
| TYPE copy | |
| TYPE COPY |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Signed-off-by: anil <epipav@gmail.com>
090ae14 to 9d1ab68
Signed-off-by: anil <epipav@gmail.com>

We have 1970-dated issue activities that cause overflows when calculating `closedInSeconds` and `respondedInSeconds`. We now filter these activities out in the issue analysis copy pipe.

Note

Low Risk

Low risk: adds a simple timestamp-year filter in the `issue_analysis_copy_pipe` to prevent overflow errors, with the main impact being that some malformed historical records will no longer contribute to issue metrics.

Overview

Filters out issue activity rows with invalid/epoch-like timestamps by adding `toYear(timestamp) >= 1971` constraints to the `issues-opened`, `issues-closed`, and `issue-comment` queries in `issue_analysis_copy_pipe.pipe`.

This prevents `closedInSeconds`/`respondedInSeconds` calculations from overflowing when bad 1970-dated events are present, at the cost of excluding those records from the `issues_analyzed` datasource.

Written by Cursor Bugbot for commit 444beef. This will update automatically on new commits.
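To gauge how many records the filter excludes, a diagnostic query along the following lines might help. This is a sketch under assumptions: the table and activity type names are those mentioned in this PR, `timestamp` is assumed to be a `DateTime` column, and the `bad_rows` and `earliest` aliases are hypothetical.

```sql
-- Sketch only: counts the pre-1971 rows that the new filter would drop, per activity type.
SELECT
    type,
    count() AS bad_rows,        -- hypothetical alias
    min(timestamp) AS earliest  -- hypothetical alias
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE type IN ('issues-opened', 'issues-closed', 'issue-comment')
  AND timestamp < toDateTime('1971-01-01 00:00:00')
GROUP BY type
ORDER BY bad_rows DESC
```

Running something like this before and after the change could show whether the excluded rows are a handful of malformed records or a larger share of the data.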