Improve the retry strategy

### Checklist

- [x] I agree to the terms within the [OpenFGA Code of Conduct](https://github.com/openfga/.github/blob/main/CODE_OF_CONDUCT.md).

### Describe the problem you'd like to have solved


[Retry-After](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After) is a standard header used by APIs to indicate when the SDK can retry.

The SDKs should:
* Honor this header on 429s
* Expose this header value in the error when received
* Fall back to exponential backoff when this header is not available (e.g. not sent by the server on 429s, or on other errors such as 500s)
* Drop support for retrying based on `X-Rate-Limit-Reset` (currently only the .NET SDK supports that), though still expose it in the logs

### Current State

| SDK | Retries on | Default Num Retries | Max Num Retries | State |
| --- | --- | --- | --- | --- |
| Python | 429s, 5xxs except for 501 | 3 | 15 | Does not consider headers. Implements exponential backoff with the following [algorithm](https://github.com/openfga/go-sdk/blob/392f47aa42b2761277e43d0dafcf300b712b1b5f/internal/utils/randomtime.go#L22-L24): a random time between `2^loopCount * 100ms` and `2^(loopCount + 1) * 100ms` |


### Describe the ideal solution

#### Retry On
- For queries that affect state:
  - Retry on 429s, falling back to exponential backoff when no `Retry-After` header is sent
  - Retry on 5xxs (except 501 Not Implemented) only if a `Retry-After` header is sent; do not fall back to exponential backoff
- For all others:
  - Retry on all 429s and on all 5xxs (except 501 Not Implemented)
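The rules above can be sketched as a small decision function. This is only an illustration of the proposal, not code from any SDK: the name `should_retry` and its parameters are hypothetical, and whether a call "affects state" is assumed to be known by the caller.

```python
def should_retry(status: int, affects_state: bool, has_retry_after: bool) -> bool:
    """Decide whether a response is retryable under the proposed rules.

    `affects_state` and `has_retry_after` are hypothetical inputs that the
    caller is assumed to derive from the request and the response headers.
    """
    if status == 429:
        # Rate limited: always retryable. If Retry-After is missing,
        # the delay falls back to exponential backoff.
        return True
    if 500 <= status <= 599 and status != 501:
        if affects_state:
            # State-affecting calls retry on 5xx only when the server
            # sent Retry-After; no exponential backoff fallback.
            return has_retry_after
        return True
    return False
```

For example, a 500 on a state-affecting call without `Retry-After` would not be retried, while the same status on a read-only call would be.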

#### Max Allowable Retries

15

#### Default Number of Retries

SDKs: 3

#### Retry Parameters
1.  If the `Retry-After` header is found, use it
    1.  if it is an integer, treat it as the number of seconds from now to retry; if it is less than 1 second or more than 1800 seconds from now (i.e. more than 30 min), assume it is invalid and continue to the next step
    2.  if it is a date, parse it; if it is less than 1 second or more than 1800 seconds from now (i.e. more than 30 min), assume it is invalid and continue to the next step

2.  If no valid `Retry-After` header is found, use exponential backoff with some jitter, so the retry delay is a random number between
    1.  `2^loopCount * 500ms` and `2^(loopCount + 1) * 500ms`
    2.  if the result of the previous step is > 120s, cap it at 120s, which should happen between the 8th and 9th retry
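A minimal Python sketch of these parameters follows. The function names are hypothetical and not part of any SDK. It assumes `loop_count` starts at 0 for the first retry, uses the 500ms base and 120s cap from the steps above, and treats a date without a timezone as UTC.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MIN_RETRY_AFTER_S = 1      # anything sooner is treated as invalid
MAX_RETRY_AFTER_S = 1800   # 30 minutes; anything later is treated as invalid
BASE_DELAY_S = 0.5         # 500ms base from the formula above
MAX_BACKOFF_S = 120.0      # cap for the exponential backoff


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds, or None if the header is missing or invalid."""
    if value is None:
        return None
    value = value.strip()
    now = now or datetime.now(timezone.utc)
    if value.isascii() and value.isdigit():
        # Integer form: number of seconds from now
        seconds = float(value)
    else:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        seconds = (when - now).total_seconds()
    if seconds < MIN_RETRY_AFTER_S or seconds > MAX_RETRY_AFTER_S:
        return None
    return seconds


def backoff_delay(loop_count: int, rng: Optional[random.Random] = None) -> float:
    """Random delay in [2^n * base, 2^(n+1) * base], capped at MAX_BACKOFF_S."""
    rng = rng or random.Random()
    low = (2 ** loop_count) * BASE_DELAY_S
    high = (2 ** (loop_count + 1)) * BASE_DELAY_S
    return min(rng.uniform(low, high), MAX_BACKOFF_S)


def retry_delay(retry_after: Optional[str], loop_count: int) -> float:
    """Prefer a valid Retry-After value; otherwise use jittered backoff."""
    delay = parse_retry_after(retry_after)
    return delay if delay is not None else backoff_delay(loop_count)
```

Under these assumptions, `retry_delay("5", 3)` uses the header and waits 5 seconds, while `retry_delay(None, 3)` picks a random delay between 4 and 8 seconds.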

That means:
* if the retry-after header was returned and is valid, we'll use it, so if it says to retry in 4 min, we wait 4 min
* if retry-after header was not returned, we will retry at:
  * 100ms
  * 200ms
  * 400ms
  * 800ms
  * 1.6s
  * 3.2s
  * 6.4s
  * 12.8s
  * 25.6s
  * 51.2s
  * 102.4s
  * 120s ← at this point it is >4min since the initial call
  * 120s
  * 120s
  * 120s
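As a rough check of the schedule above, the sketch below computes the minimum (jitter-free) delay for each of the 15 retries. The function name is hypothetical. The listed values correspond to a 100ms base, so the base is left as a parameter and the 500ms base from the formula can be plugged in the same way.

```python
from itertools import accumulate
from typing import List


def min_delay_schedule(base_s: float, retries: int = 15, cap_s: float = 120.0) -> List[float]:
    """Lower bound of the backoff window for each retry, capped at cap_s."""
    return [min((2 ** n) * base_s, cap_s) for n in range(retries)]


schedule = min_delay_schedule(0.1)       # 100ms base, as in the list above
elapsed = list(accumulate(schedule))     # cumulative minimum wait after each retry

for attempt, (delay, total) in enumerate(zip(schedule, elapsed), start=1):
    print(f"retry {attempt:2d}: wait >= {delay:6.1f}s, elapsed >= {total:6.1f}s")
```

With a 100ms base, the first capped delay is the twelfth, and that is also the first retry at which the cumulative minimum wait exceeds four minutes.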

### Alternatives and current workarounds

_No response_

### References

_No response_

### Additional context

_No response_

