feat: Add backpressure management config for concurrent requests #315
matt-codecov wants to merge 2 commits into `main`
Conversation
```dockerfile
ENV OPENSSL_INCLUDE_DIR=/usr/include

WORKDIR /workspace
RUN git config --global --add safe.directory /workspace
```
oh whoops, i forget whether this is actually necessary now. might be an artifact from when i was doing hooked_objectstore stuff in the sandbox. i will revert this before merging if it is not necessary
Or otherwise move this to a separate PR; it would be nice to capture the motivation of this change.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```rust
    merni::gauge!("server.requests.in_flight": count);
});

let (serve_result, _) = tokio::join!(server, emitter);
```
i don't think the emitter has an exit condition so this would probably never finish
That would mean the server couldn't shut down at all right now. I need to refresh my memory, but I think the emitter finishes automatically once the server drops its other handler (in the middleware) and then the emitter has run its last emission.
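The drop-based exit described above can be sketched in plain Rust. This is an illustrative sketch, not the PR's actual code: here the emitter holds only a `Weak` handle to the in-flight counter, so once the server side (the middleware) drops the last strong `Arc`, the next `upgrade()` fails and the emitter loop ends on its own. The names `spawn_emitter` and the plain-thread setup are assumptions for the sketch; the real code uses an async task.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Weak};
use std::thread;
use std::time::Duration;

// Sketch: the emitter only weakly references the counter, so it exits
// once the server drops the last strong handle.
fn spawn_emitter(counter: Weak<AtomicUsize>) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        match counter.upgrade() {
            // Server still holds the counter: emit the gauge value.
            Some(c) => {
                let _in_flight = c.load(Ordering::Relaxed);
            }
            // Last strong handle dropped: the final emission already ran, exit.
            None => break,
        }
        thread::sleep(Duration::from_millis(5));
    })
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let emitter = spawn_emitter(Arc::downgrade(&counter));
    counter.fetch_add(1, Ordering::Relaxed);
    drop(counter); // simulates server shutdown dropping its handle
    emitter.join().unwrap(); // joins promptly instead of hanging forever
    println!("emitter exited");
}
```

With this shape, `tokio::join!(server, emitter)` would indeed complete: the emitter's exit condition is the server releasing its side of the counter.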
Force-pushed 3fb2483 to eefe8a5
Force-pushed eefe8a5 to a48d6e4
```rust
}

let too_many_requests = match state.config.backpressure_thresholds.in_flight_requests {
    Some(max_concurrent_requests) => {
```
**Bug**: The `/ready` health check includes its own request in the `in_flight_requests` count, causing an off-by-one error in the concurrency limit and allowing one extra request.

**Severity**: MEDIUM

**Suggested Fix**: Adjust the comparison in the `/ready` handler to account for its own request. For example, change the check to `state.system_metrics.in_flight_requests.get() > state.max_concurrent_requests + 1`, or subtract one from the current count before comparing: `(state.system_metrics.in_flight_requests.get() - 1) >= state.max_concurrent_requests`.
**Prompt for AI Agent**

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: `objectstore-server/src/endpoints/health.rs#L30`

Potential issue: The `/ready` health check endpoint checks if the number of in-flight requests exceeds the `max_concurrent_requests` limit. However, the middleware that counts in-flight requests increments the counter before the request handler is executed. This means the `/ready` check's own request is included in the total. When the server has `max_concurrent_requests - 1` active requests, a `/ready` probe brings the count to `max_concurrent_requests`. The check `count > max_concurrent_requests` then incorrectly evaluates to false, causing the server to report as ready and accept one more request than its configured limit.
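The off-by-one Bugbot describes, and its second suggested fix, can be sketched in isolation. The function name and parameters below are illustrative, not the PR's code: the middleware has already counted the probe's own request by the time the handler runs, so the handler subtracts it before comparing against the limit.

```rust
// Sketch of the adjusted readiness check (names are illustrative).
// `in_flight_including_probe` is what the gauge reports while the /ready
// handler itself is executing, i.e. real in-flight requests + 1.
fn is_ready(in_flight_including_probe: usize, max_concurrent: usize) -> bool {
    // Subtract the probe's own request before comparing, per the
    // suggested fix `(count - 1) >= max` for the not-ready condition.
    in_flight_including_probe.saturating_sub(1) < max_concurrent
}

fn main() {
    // 9 real requests + the probe itself (count = 10), limit 10: ready.
    assert!(is_ready(10, 10));
    // 10 real requests + the probe (count = 11), limit 10: not ready.
    assert!(!is_ready(11, 10));
    println!("ok");
}
```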
```rust
    None => false,
};
if too_many_requests {
    tracing::warn!("Serving too many concurrent requests already, failing readiness");
```
Note that this will send errors to sentry given the warn log level.
I think it's fine to keep, at least initially
jan-auer left a comment
Preliminary review.
Since we already have this counter of concurrent requests in our middleware, could we simply inject backpressure there -- as middleware? Or does this intentionally admit more requests until k8s reacts to the readiness endpoint?
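The middleware-based alternative raised above can be sketched with a small RAII admission guard. Everything here is hypothetical (the `InFlightGuard` type and `try_acquire` are not from the PR): the middleware would try to acquire a slot before calling the handler, answer 503 immediately on failure, and rely on `Drop` to release the slot when the response completes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical admission guard: increments the in-flight counter on
// entry and decrements it on drop; returns None when the configured
// limit is already reached (the middleware would answer 503 there).
struct InFlightGuard(Arc<AtomicUsize>);

impl InFlightGuard {
    fn try_acquire(counter: &Arc<AtomicUsize>, max: usize) -> Option<Self> {
        let prev = counter.fetch_add(1, Ordering::AcqRel);
        if prev >= max {
            // Over the limit: undo the increment and reject.
            counter.fetch_sub(1, Ordering::AcqRel);
            None
        } else {
            Some(InFlightGuard(counter.clone()))
        }
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let a = InFlightGuard::try_acquire(&counter, 2).expect("admitted");
    let _b = InFlightGuard::try_acquire(&counter, 2).expect("admitted");
    assert!(InFlightGuard::try_acquire(&counter, 2).is_none()); // third -> 503
    drop(a); // request finished, slot freed
    assert!(InFlightGuard::try_acquire(&counter, 2).is_some());
    println!("admission control ok");
}
```

Compared with the readiness-based approach, this rejects in-process and per-request rather than waiting for the orchestrator to react to `/ready`.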
```rust
/// Definitions for thresholds at which backpressure management or load shedding should kick
/// in.
pub backpressure_thresholds: BackpressureThresholds,
```
Would this fit into rate_limits? Or maybe, we rename rate_limits into more generic limits and then have throughput, bandwidth, and concurrency in there?
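One possible shape for the generalized `limits` idea, purely as a strawman (every name below is a suggestion, not code from the PR):

```rust
/// Hypothetical generalized `limits` config grouping throughput,
/// bandwidth, and concurrency under one roof (names are suggestions).
#[derive(Debug, Default)]
pub struct Limits {
    /// Requests per second (roughly today's `rate_limits`).
    pub throughput: Option<u64>,
    /// Bytes per second.
    pub bandwidth: Option<u64>,
    /// Max in-flight requests (today's `backpressure_thresholds.in_flight_requests`).
    pub concurrency: Option<usize>,
}

fn main() {
    let limits = Limits {
        concurrency: Some(10_000),
        ..Default::default()
    };
    assert_eq!(limits.concurrency, Some(10_000));
    println!("{limits:?}");
}
```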
Adds a web-tier flood protection limit to the objectstore HTTP server. When the number of in-flight requests reaches `http.max_requests` (default: 10,000), new requests are rejected immediately with HTTP 503. `/health` and `/ready` are excluded so orchestration probes always get through.

The `Http` config struct groups HTTP-layer settings separately from the service-layer `Service` struct. This keeps concerns cleanly separated and gives a natural home for future HTTP-level settings (timeouts, body size limits, etc.). The env var is `OS__HTTP__MAX_REQUESTS`.

**Why direct 503 instead of readiness-based backpressure**: failing `/ready` under load lets Kubernetes drain the pod, but probe intervals (5–10s) leave a window of continued overload, recovery wastes capacity waiting for probe cycles, and multiple pods failing probes simultaneously concentrates traffic onto the remaining pods. A busy pod is also semantically still *ready*; conflating load with readiness muddies alerting. Direct rejection responds in milliseconds, recovers instantly, keeps every pod in the pool, and works in any deployment environment.

Tested with two unit tests (pass-through and rejection/exemption via a `Notify`-based gating pattern) and an integration test via `TestServer` with `max_requests = 0` for deterministic coverage without needing concurrent requests.

Ref FS-171
Closes #315
- `SystemMetrics` struct added to service state, which currently houses our in-flight requests counter
- `BackpressureThresholds` struct added to config, which has a field for max in-flight requests
- `/ready` returns 503 if we are processing more concurrent requests than the config allows
- don't love any of the names, bikeshedding welcome
Ref FS-171