fix(engine) prevent MVCC race in blockRunWithWaitpoint pending check#3075
fix(engine) prevent MVCC race in blockRunWithWaitpoint pending check#3075
Conversation
|
WalkthroughThe change restructures waitpoint blocking into two statements: (1) a CTE inserts blocking Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
430-442: Excellent fix for the MVCC snapshot race — minor type nit onBigInt.The separation into a distinct SQL statement is the correct approach under READ COMMITTED isolation, and the explanatory comment is thorough. One small type issue:
Prisma's
$queryRawreturns PostgreSQLbigint(fromCOUNT(*)) as the JavaScript primitivebigint, not the wrapper objectBigInt. This won't cause a runtime error sinceNumber()handles both, but the type annotation is technically inaccurate.Nit: use lowercase
bigintfor the primitive type- const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>` + const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` around lines 430 - 442, The type annotation on the result of prisma.$queryRaw is using the wrapper BigInt instead of the JS primitive bigint; update the generic on prisma.$queryRaw from { pending_count: BigInt }[] to { pending_count: bigint }[] so the type accurately reflects PostgreSQL COUNT(*) returning a primitive bigint, leaving the surrounding logic (pendingCheck, Number(pendingCheck.at(0)?.pending_count ?? 0), Prisma.join($waitpoints)) unchanged.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from@trigger.dev/sdk, never use@trigger.dev/sdk/v3or deprecatedclient.defineJobpattern
Every Trigger.dev task must be exported and have a uniqueidproperty with no timeouts in the run function
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from
@trigger.dev/coreusing subpaths only, never import from root
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: sdk-compat / Cloudflare Workers
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2)
400-428: Clean separation of the INSERT CTE from the pending check.The CTE now correctly focuses solely on inserting the blocking associations, with its result intentionally discarded since the pending count is determined by the separate query below. The inline comments clearly explain why the
SELECT COUNT(*)is needed (to force PostgreSQL to execute theinsertedCTE branch). Looks good.
493-504: Debounce-based safety net is sound.Even if both
blockRunWithWaitpoint(seeing 0 pending) and a concurrentcompleteWaitpointboth enqueuecontinueRunIfUnblockedfor the same run, the shared job IDcontinueRunIfUnblocked:${runId}ensures the work is debounced. The two-statement approach combined with this debounce makes the race harmless from both directions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Around line 430-442: The type annotation on the result of prisma.$queryRaw is
using the wrapper BigInt instead of the JS primitive bigint; update the generic
on prisma.$queryRaw from { pending_count: BigInt }[] to { pending_count: bigint
}[] so the type accurately reflects PostgreSQL COUNT(*) returning a primitive
bigint, leaving the surrounding logic (pendingCheck,
Number(pendingCheck.at(0)?.pending_count ?? 0), Prisma.join($waitpoints))
unchanged.
c603987 to
64ddf12
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
435-435: Nit: use lowercasebigintfor the TypeScript primitive type.
BigInt(uppercase) refers to theBigIntconstructor/interface, not the primitive type. Prisma's$queryRawreturns thebigintprimitive for PostgreSQLCOUNT(*)results.Proposed fix
- const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>` + const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` at line 435, The type for the raw query result uses the wrong BigInt identifier; update the type annotation on the prisma.$queryRaw call (the pendingCheck declaration) to use the TypeScript primitive `bigint` instead of the `BigInt` constructor/interface so the inferred type matches PostgreSQL COUNT(*) results (e.g., change `{ pending_count: BigInt }[]` to `{ pending_count: bigint }[]` for pendingCheck).
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from@trigger.dev/sdk, never use@trigger.dev/sdk/v3or deprecatedclient.defineJobpattern
Every Trigger.dev task must be exported and have a uniqueidproperty with no timeouts in the run function
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from
@trigger.dev/coreusing subpaths only, never import from root
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
400-442: Well-designed fix for the MVCC snapshot race condition.The separation of the INSERT CTE from the pending-status check into distinct SQL statements is the correct approach under PostgreSQL's READ COMMITTED isolation. The inline comments clearly explain the rationale. 👍
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Line 435: The type for the raw query result uses the wrong BigInt identifier;
update the type annotation on the prisma.$queryRaw call (the pendingCheck
declaration) to use the TypeScript primitive `bigint` instead of the `BigInt`
constructor/interface so the inferred type matches PostgreSQL COUNT(*) results
(e.g., change `{ pending_count: BigInt }[]` to `{ pending_count: bigint }[]` for
pendingCheck).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
451-458: Nit:BigInt(wrapper object) vsbigint(primitive) in the type annotation.Prisma's
$queryRawreturns the JavaScriptbigintprimitive for PostgreSQLbigint/int8columns. The type annotation on Line 451 usesBigInt(capital B), which is the wrapper object type. This doesn't cause a runtime issue (TypeScript types are erased), but it's technically the wrong type.Proposed fix
- const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>` + const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` around lines 451 - 458, The type annotation for the result of prisma.$queryRaw uses the wrapper object `BigInt` instead of the primitive `bigint`; update the generic on prisma.$queryRaw to use `{ pending_count: bigint }[]` (referencing the `pendingCheck` variable and the `prisma.$queryRaw` call) so the `pending_count` property has the correct primitive type and subsequent `Number(pendingCheck.at(0)?.pending_count ?? 0)` usage (used to compute `isRunBlocked`) aligns with the actual returned type.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from@trigger.dev/sdk, never use@trigger.dev/sdk/v3or deprecatedclient.defineJobpattern
Every Trigger.dev task must be exported and have a uniqueidproperty with no timeouts in the run function
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from
@trigger.dev/coreusing subpaths only, never import from root
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
🔇 Additional comments (3)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (3)
367-383: Well-documented MVCC rationale — this is excellent.The comment thoroughly explains the snapshot isolation semantics, why the two statements must be separate, and why the pending check covers all requested waitpoint IDs (not just newly inserted ones). This is exactly the kind of documentation that prevents future maintainers from "optimizing" the two statements back into one.
416-444: CTE restructuring looks correct.The CTE now only handles the INSERT side effects, and the outer
SELECT COUNT(*) FROM insertedsatisfies the syntactic requirement for a final SELECT while keeping the data-modifying CTEs executable. Discarding the result is intentional since the blocking decision is now made by the separate query.
509-520: Blocking logic correctly derived from the fresh pending check — LGTM.The
isRunBlockedflag now reflects the latest committed state, ensuring that if a concurrentcompleteWaitpointcommitted between the CTE and this check, the run won't get stuck. The debouncedcontinueRunIfUnblockedenqueue is safe even if both this path andcompleteWaitpointfire it (idempotent job ID).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Around line 451-458: The type annotation for the result of prisma.$queryRaw
uses the wrapper object `BigInt` instead of the primitive `bigint`; update the
generic on prisma.$queryRaw to use `{ pending_count: bigint }[]` (referencing
the `pendingCheck` variable and the `prisma.$queryRaw` call) so the
`pending_count` property has the correct primitive type and subsequent
`Number(pendingCheck.at(0)?.pending_count ?? 0)` usage (used to compute
`isRunBlocked`) aligns with the actual returned type.
Split the CTE in blockRunWithWaitpoint so the pending waitpoint check is a separate SQL statement. In READ COMMITTED isolation, each statement gets its own snapshot, so a separate SELECT sees the latest committed state from concurrent completeWaitpoint calls. Previously, the CTE did INSERT + pending check in one statement (one snapshot). If completeWaitpoint committed between the CTE start and the SELECT, the SELECT would still see PENDING due to the stale snapshot. Neither side would enqueue continueRunIfUnblocked, leaving the run stuck forever.
…in blockRunWithWaitpoint
86190be to
39fa6d2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
444-444: Optional: replaceSELECT COUNT(*) FROM insertedwithSELECT 1.The CTE result is discarded. Since PostgreSQL data-modifying CTEs execute regardless of outer-query reference (as the comment correctly notes),
SELECT COUNT(*) FROM insertedcan be replaced withSELECT 1to avoid the unnecessary aggregation.♻️ Proposed simplification
- SELECT COUNT(*) FROM inserted`; + SELECT 1`;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` at line 444, Replace the discarded CTE result expression "SELECT COUNT(*) FROM inserted" with "SELECT 1" so the data-modifying CTE still runs but avoids an unnecessary aggregation; locate the SQL string containing "SELECT COUNT(*) FROM inserted" in waitpointSystem.ts and update it to "SELECT 1" while keeping the surrounding CTE structure and any parameter bindings unchanged.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from@trigger.dev/sdk, never use@trigger.dev/sdk/v3or deprecatedclient.defineJobpattern
Every Trigger.dev task must be exported and have a uniqueidproperty with no timeouts in the run function
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from
@trigger.dev/coreusing subpaths only, never import from root
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: typecheck / typecheck
🔇 Additional comments (3)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (3)
367-383: LGTM — comment is accurate and well-structured.The READ COMMITTED snapshot semantics are correctly described, and the rationale for querying all requested waitpoint IDs (not just successfully inserted ones) is clearly explained.
458-520: LGTM — theisRunBlockedderivation and enqueue logic are correct.The two-statement design correctly eliminates the MVCC race:
pendingCheckgets a fresh snapshot that reflects anycompleteWaitpointthat committed after the CTE started. The debounce key (continueRunIfUnblocked:${runId}) matchescompleteWaitpoint's key, so concurrent enqueueing from both sides is safely idempotent.
438-442: The_WaitpointRunConnectionsA/B column assignment is correct. The migration confirms "A" references "TaskRun" and "B" references "Waitpoint", matching the raw SQL's assignment of"A" = runIdand"B" = w.id. The table name reflects the explicit@relation("WaitpointRunConnections")name in the schema rather than implicit alphabetical model order; regardless, the A/B columns follow the correct alphabetical model ordering (TaskRun < Waitpoint).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Line 451: The type annotation for the result of prisma.$queryRaw in
pendingCheck is using the constructor type "BigInt" but should use the primitive
"bigint"; update the generic from <{ pending_count: BigInt }[]> to <{
pending_count: bigint }[]> (locate the prisma.$queryRaw call that assigns
pendingCheck in waitpointSystem.ts) so TypeScript correctly types the COUNT(*)
result.
---
Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Line 444: Replace the discarded CTE result expression "SELECT COUNT(*) FROM
inserted" with "SELECT 1" so the data-modifying CTE still runs but avoids an
unnecessary aggregation; locate the SQL string containing "SELECT COUNT(*) FROM
inserted" in waitpointSystem.ts and update it to "SELECT 1" while keeping the
surrounding CTE structure and any parameter bindings unchanged.
| // isolation, each statement gets its own snapshot. The CTE's snapshot is taken when | ||
| // it starts, so if a concurrent completeWaitpoint commits during the CTE, the CTE | ||
| // won't see it. This fresh query gets a new snapshot that reflects the latest commits. | ||
| const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>` |
There was a problem hiding this comment.
BigInt type annotation should be bigint (lowercase primitive type).
BigInt (capital) is the TypeScript constructor object type, not the bigint primitive. Prisma returns bigint primitives from COUNT(*). While Number() handles both at runtime, the type annotation is incorrect and may cause type-checker confusion.
🛠️ Proposed fix
- const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>`
+ const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>` | |
| const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` at line
451, The type annotation for the result of prisma.$queryRaw in pendingCheck is
using the constructor type "BigInt" but should use the primitive "bigint";
update the generic from <{ pending_count: BigInt }[]> to <{ pending_count:
bigint }[]> (locate the prisma.$queryRaw call that assigns pendingCheck in
waitpointSystem.ts) so TypeScript correctly types the COUNT(*) result.
Split the CTE in blockRunWithWaitpoint so the pending waitpoint check
is a separate SQL statement. In READ COMMITTED isolation, each statement
gets its own snapshot, so a separate SELECT sees the latest committed
state from concurrent completeWaitpoint calls.
Previously, the CTE did INSERT + pending check in one statement (one
snapshot). If completeWaitpoint committed between the CTE start and
the SELECT, the SELECT would still see PENDING due to the stale
snapshot. Neither side would enqueue continueRunIfUnblocked, leaving
the run stuck forever.