Add breadcrumb when rate limiting fails to read the clock by bendk · Pull Request #7281 · mozilla/application-services

bendk · 2026-03-19T16:22:38Z

This rate limit check doesn't seem to be working in some cases. For example, for https://bugzilla.mozilla.org/show_bug.cgi?id=2004954 I saw many reports per minute from the same user. I'm wondering if the cause is this check failing.

Pull Request checklist

Breaking changes: This PR follows our breaking change policy
- This PR follows the breaking change policy:
  - This PR has no breaking API changes, or
  - There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
Quality: This PR builds and tests run cleanly
- Note:
  - For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  - If this pull request includes a breaking change, consider cutting a new release after merging.
Tests: This PR includes thorough tests or an explanation of why it does not
Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
- Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
Dependencies: This PR follows our dependency management guidelines
- Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.


mhammond · 2026-03-19T22:25:35Z

components/support/error/src/error_tracing.rs

                // through seems okay in this case. We should get back into a good state soon
                // after.
-                _ => (),
+                _ => {


I'm a little skeptical of this - I can see how extra single events might get through as the clock changes, but not how that could cause many reports per minute. What's your theory about how that actually happens in practice?

bendk · 2026-03-20

It does seem really weird.

Here's what I'm looking at in Grafana. One client is generating multiple error pings per second (I need to add a client filter, but for now I just added enough other filters that I'm pretty sure I'm only capturing one person).

I don't really understand what's going on. One wild theory is that something is very wrong at the system level. My other theory was that Firefox was restarting and clearing out the rate limiting data, but it seems impossible for Firefox to restart that often in practice. My only other theory was that it was part of some automation. This was really just a shot in the dark. Do you have any ideas on what could be happening?

(BTW, the unit in that graph is "errors / day" which is not right for unique clients. I think it's just one unique client, but since the interval is 2 hours it's multiplying to get 12 unique clients / day. I'm going to try to fix that today.)
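To make the question in this thread concrete, here is a minimal sketch of a windowed rate limiter that lets an event through and records a breadcrumb when the elapsed time cannot be computed. It is not the code in `error_tracing.rs`: the `RateLimiter` type, its fields, the `should_report` method, and the breadcrumb text are all hypothetical, and the clock failure is modeled as `SystemTime::duration_since` returning `Err` because the clock moved backwards.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical limiter: at most `max_per_window` reports per error type
/// within each `window`. Names are illustrative, not the real API.
struct RateLimiter {
    window: Duration,
    max_per_window: u32,
    /// Error type name -> (start of the current window, reports sent in it).
    state: HashMap<String, (SystemTime, u32)>,
    /// Stand-in for a real breadcrumb sink.
    breadcrumbs: Vec<String>,
}

impl RateLimiter {
    fn new(window: Duration, max_per_window: u32) -> Self {
        Self {
            window,
            max_per_window,
            state: HashMap::new(),
            breadcrumbs: Vec::new(),
        }
    }

    /// Returns true if a report for `type_name` should be sent at time `now`.
    fn should_report(&mut self, type_name: &str, now: SystemTime) -> bool {
        let entry = self
            .state
            .entry(type_name.to_string())
            .or_insert((now, 0));
        match now.duration_since(entry.0) {
            // Still inside the current window: enforce the limit.
            Ok(elapsed) if elapsed < self.window => {
                if entry.1 >= self.max_per_window {
                    return false;
                }
                entry.1 += 1;
                true
            }
            // The window expired: start a new one with this report.
            Ok(_) => {
                *entry = (now, 1);
                true
            }
            // `now` is earlier than the window start (clock went backwards).
            // Let the report through, leave a breadcrumb, and restart the
            // window so limiting resumes on the next call.
            Err(_) => {
                self.breadcrumbs.push(format!(
                    "rate limiter: could not compute elapsed time for {}",
                    type_name
                ));
                *entry = (now, 1);
                true
            }
        }
    }
}
```

Under these assumptions a single backwards clock jump admits one extra report and then limiting resumes, which matches the skepticism above that a clock failure alone would produce many reports per minute. A burst would need the failure to recur on every call, or the in-memory state to be cleared repeatedly.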

bendk requested review from a team and skhamis March 19, 2026 16:22

mhammond reviewed Mar 19, 2026

bendk wants to merge 1 commit into mozilla:main from bendk:push-qpkrqorszswl
