feat: Add backpressure management config for concurrent requests #315
matt-codecov wants to merge 2 commits into `main`
Conversation
```dockerfile
ENV OPENSSL_INCLUDE_DIR=/usr/include

WORKDIR /workspace
RUN git config --global --add safe.directory /workspace
```
oh whoops, i forget whether this is actually necessary now. might be an artifact from when i was doing hooked_objectstore stuff in the sandbox. i will revert this before merging if it is not necessary
Or otherwise move this to a separate PR; it would be nice to capture the motivation of this change.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```rust
    merni::gauge!("server.requests.in_flight": count);
});

let (serve_result, _) = tokio::join!(server, emitter);
```
i don't think the emitter has an exit condition so this would probably never finish
That would mean the server couldn't shut down at all right now. I need to refresh my memory, but I think the emitter finishes automatically once the server drops its other handler (in the middleware) and then the emitter has run its last emission.
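The drop-based exit described above can be sketched in plain Rust. This is an illustrative sketch, not the PR's actual code: here the emitter holds only a `Weak` handle to the in-flight counter, so once the server side (the middleware) drops the last strong `Arc`, the next `upgrade()` fails and the emitter loop ends on its own. The names `spawn_emitter` and the plain-thread setup are assumptions for the sketch; the real code uses an async task.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Weak};
use std::thread;
use std::time::Duration;

// Sketch: the emitter only weakly references the counter, so it exits
// once the server drops the last strong handle.
fn spawn_emitter(counter: Weak<AtomicUsize>) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        match counter.upgrade() {
            // Server still holds the counter: emit the gauge value.
            Some(c) => {
                let _in_flight = c.load(Ordering::Relaxed);
            }
            // Last strong handle dropped: the final emission already ran, exit.
            None => break,
        }
        thread::sleep(Duration::from_millis(5));
    })
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let emitter = spawn_emitter(Arc::downgrade(&counter));
    counter.fetch_add(1, Ordering::Relaxed);
    drop(counter); // simulates server shutdown dropping its handle
    emitter.join().unwrap(); // joins promptly instead of hanging forever
    println!("emitter exited");
}
```

With this shape, `tokio::join!(server, emitter)` would indeed complete: the emitter's exit condition is the server releasing its side of the counter.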
Force-pushed 3fb2483 to eefe8a5
Force-pushed eefe8a5 to a48d6e4
```rust
}

let too_many_requests = match state.config.backpressure_thresholds.in_flight_requests {
    Some(max_concurrent_requests) => {
```
**Bug**: The `/ready` health check includes its own request in the `in_flight_requests` count, causing an off-by-one error in the concurrency limit and allowing one extra request.

**Severity**: MEDIUM

**Suggested Fix**: Adjust the comparison in the `/ready` handler to account for its own request. For example, change the check to `state.system_metrics.in_flight_requests.get() > state.max_concurrent_requests + 1`, or subtract one from the current count before comparing: `(state.system_metrics.in_flight_requests.get() - 1) >= state.max_concurrent_requests`.
**Prompt for AI Agent**

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: `objectstore-server/src/endpoints/health.rs#L30`

Potential issue: The `/ready` health check endpoint checks if the number of in-flight requests exceeds the `max_concurrent_requests` limit. However, the middleware that counts in-flight requests increments the counter before the request handler is executed. This means the `/ready` check's own request is included in the total. When the server has `max_concurrent_requests - 1` active requests, a `/ready` probe brings the count to `max_concurrent_requests`. The check `count > max_concurrent_requests` then incorrectly evaluates to false, causing the server to report as ready and accept one more request than its configured limit.
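The off-by-one Bugbot describes, and its second suggested fix, can be sketched in isolation. The function name and parameters below are illustrative, not the PR's code: the middleware has already counted the probe's own request by the time the handler runs, so the handler subtracts it before comparing against the limit.

```rust
// Sketch of the adjusted readiness check (names are illustrative).
// `in_flight_including_probe` is what the gauge reports while the /ready
// handler itself is executing, i.e. real in-flight requests + 1.
fn is_ready(in_flight_including_probe: usize, max_concurrent: usize) -> bool {
    // Subtract the probe's own request before comparing, per the
    // suggested fix `(count - 1) >= max` for the not-ready condition.
    in_flight_including_probe.saturating_sub(1) < max_concurrent
}

fn main() {
    // 9 real requests + the probe itself (count = 10), limit 10: ready.
    assert!(is_ready(10, 10));
    // 10 real requests + the probe (count = 11), limit 10: not ready.
    assert!(!is_ready(11, 10));
    println!("ok");
}
```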
```rust
    None => false,
};
if too_many_requests {
    tracing::warn!("Serving too many concurrent requests already, failing readiness");
```
Note that this will send errors to sentry given the warn log level.
I think it's fine to keep, at least initially
jan-auer left a comment
Preliminary review.
Since we already have this counter of concurrent requests in our middleware, could we simply inject backpressure there -- as middleware? Or does this intentionally admit more requests until k8s reacts to the readiness endpoint?
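The middleware-based alternative raised above can be sketched with a small RAII admission guard. Everything here is hypothetical (the `InFlightGuard` type and `try_acquire` are not from the PR): the middleware would try to acquire a slot before calling the handler, answer 503 immediately on failure, and rely on `Drop` to release the slot when the response completes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical admission guard: increments the in-flight counter on
// entry and decrements it on drop; returns None when the configured
// limit is already reached (the middleware would answer 503 there).
struct InFlightGuard(Arc<AtomicUsize>);

impl InFlightGuard {
    fn try_acquire(counter: &Arc<AtomicUsize>, max: usize) -> Option<Self> {
        let prev = counter.fetch_add(1, Ordering::AcqRel);
        if prev >= max {
            // Over the limit: undo the increment and reject.
            counter.fetch_sub(1, Ordering::AcqRel);
            None
        } else {
            Some(InFlightGuard(counter.clone()))
        }
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let a = InFlightGuard::try_acquire(&counter, 2).expect("admitted");
    let _b = InFlightGuard::try_acquire(&counter, 2).expect("admitted");
    assert!(InFlightGuard::try_acquire(&counter, 2).is_none()); // third -> 503
    drop(a); // request finished, slot freed
    assert!(InFlightGuard::try_acquire(&counter, 2).is_some());
    println!("admission control ok");
}
```

Compared with the readiness-based approach, this rejects in-process and per-request rather than waiting for the orchestrator to react to `/ready`.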
```rust
/// Definitions for thresholds at which backpressure management or load shedding should kick
/// in.
pub backpressure_thresholds: BackpressureThresholds,
```
Would this fit into rate_limits? Or maybe, we rename rate_limits into more generic limits and then have throughput, bandwidth, and concurrency in there?
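One possible shape for the generalized `limits` idea, purely as a strawman (every name below is a suggestion, not code from the PR):

```rust
/// Hypothetical generalized `limits` config grouping throughput,
/// bandwidth, and concurrency under one roof (names are suggestions).
#[derive(Debug, Default)]
pub struct Limits {
    /// Requests per second (roughly today's `rate_limits`).
    pub throughput: Option<u64>,
    /// Bytes per second.
    pub bandwidth: Option<u64>,
    /// Max in-flight requests (today's `backpressure_thresholds.in_flight_requests`).
    pub concurrency: Option<usize>,
}

fn main() {
    let limits = Limits {
        concurrency: Some(10_000),
        ..Default::default()
    };
    assert_eq!(limits.concurrency, Some(10_000));
    println!("{limits:?}");
}
```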
Adds a web-tier flood protection limit to the objectstore HTTP server. When the number of in-flight requests reaches `http.max_requests` (default: 10,000), new requests are rejected immediately with HTTP 503. `/health` and `/ready` are excluded so orchestration probes always get through.

The `Http` config struct groups HTTP-layer settings separately from the service-layer `Service` struct. This keeps concerns cleanly separated and gives a natural home for future HTTP-level settings (timeouts, body size limits, etc.). The env var is `OS__HTTP__MAX_REQUESTS`.

**Why direct 503 instead of readiness-based backpressure**: failing `/ready` under load lets Kubernetes drain the pod, but probe intervals (5–10s) leave a window of continued overload, recovery wastes capacity waiting for probe cycles, and multiple pods failing probes simultaneously concentrates traffic onto the remaining pods. A busy pod is also semantically still *ready*; conflating load with readiness muddies alerting. Direct rejection responds in milliseconds, recovers instantly, keeps every pod in the pool, and works in any deployment environment.

Tested with two unit tests (pass-through and rejection/exemption via a `Notify`-based gating pattern) and an integration test via `TestServer` with `max_requests = 0` for deterministic coverage without needing concurrent requests.

Ref FS-171
Closes #315
- `SystemMetrics` struct added to service state, which currently houses our in-flight requests counter
- `BackpressureThresholds` struct added to config, which has a field for max in-flight requests
- `/ready` returns 503 if we are processing more concurrent requests than the config allows
- don't love any of the names, bikeshedding welcome
Ref FS-171