
Multi tenancy review #1545

Open
parmesant wants to merge 22 commits into parseablehq:main from parmesant:multi-tenancy-review

Conversation

@parmesant (Contributor) commented Feb 13, 2026

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Broad multi-tenant support across datasets, streams, storage, alerts, targets, dashboards, users, roles, RBAC, cluster, and object-store APIs.
    • Tenant management and per-tenant service suspension controls; SSE and cluster broadcasts are tenant-aware.
  • Improvements

    • Tenant-scoped metrics, billing, ingestion, query, hot-tier, migrations and persistence for per-tenant isolation and visibility.
    • HTTP handlers derive and propagate tenant context with sensible defaults.
  • Chores

    • Added dependency: parking_lot v0.12.5

coderabbitai bot (Contributor) commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

System-wide multi-tenant support is added: an optional tenant_id is threaded through the metastore, object storage, parseable/streams, RBAC, HTTP handlers, alerts, metrics, hot-tier, OTEL, and migrations. The PR introduces TENANT_METADATA, per-tenant in-memory layouts, tenant-prefixed object paths, and tenant-aware APIs.

Changes

Cohort / File(s) Summary
Core tenant infra
src/lib.rs, src/tenants/mod.rs, src/parseable/mod.rs, src/cli.rs
Add TENANT_METADATA, DEFAULT_TENANT, tenant lifecycle (add/load/list/suspend/delete), Parseable gains tenant-aware APIs, CLI flag for multi-tenancy.
Metastore traits & impl
src/metastore/metastore_traits.rs, src/metastore/metastores/object_store_metastore.rs
Most metastore methods gain tenant_id: &Option<String>; returns and listings become tenant-keyed maps; add delete_tenant; tenant-prefixed object paths.
Object storage core & providers
src/storage/object_storage.rs, src/storage/{s3,gcs,azure_blob,localfs}.rs
Thread tenant_id through ObjectStorage trait and providers; path helpers (schema/stream/alert/mttr/targets) accept tenant; list/put/delete and metrics/latency include tenant context.
Parseable / Streams / Staging
src/parseable/..., src/parseable/streams.rs, src/parseable/staging/mod.rs
Streams/staging convert to tenant-nested maps; Parseable APIs accept tenant_id; staging errors updated; added tenant management and per-tenant stream lifecycle handling.
HTTP handlers & modal
src/handlers/http/**, src/handlers/**
Most handlers now extract tenant_id from requests (HttpRequest), accept tenant-aware signatures and forward tenant_id to PARSEABLE, metastore, storage and RBAC calls.
RBAC & sessions/users
src/rbac/{map,mod,role,user,utils}.rs, src/handlers/http/*rbac.rs
Roles, Users, Groups, Sessions and DEFAULT_ROLE refactored to per-tenant maps; permission checks and session handling are tenant-aware; added suspension handling and tenant-scoped helpers.
Alerts & Targets
src/alerts/..., src/alerts/target.rs, src/alerts/alert_traits.rs
Alerts, AlertState, MTTRHistory and Targets become tenant-scoped: add tenant fields, extend traits/signatures, propagate tenant_id through evaluation, persistence and target lookup.
Metrics / Stats / Metadata
src/metrics/mod.rs, src/stats.rs, src/metadata.rs
Prometheus metrics and stats label sets extended with tenant_id; many metric helper signatures expanded and instrumentation updated for tenant dimension.
Query & execution
src/query/mod.rs, src/query/stream_schema_provider.rs, src/handlers/http/query.rs
Introduce InMemorySessionContext for tenant schema isolation; execute/get_dataframe/get_bin_density and schema provider accept tenant_id; manifest and stream resolution are tenant-aware.
Hot-tier, retention, enterprise utils
src/hottier.rs, src/storage/retention.rs, src/enterprise/utils.rs
Hot-tier, retention and enterprise utilities accept and iterate by tenant_id; manifest/parquet paths, validations and processing operate per-tenant.
OTEL, Connectors & Event formatting
src/otel/{logs,metrics,traces}.rs, src/connectors/kafka/*, src/event/*
OTEL flattening, Kafka consumer records and EventFormat accept tenant_id; Events include tenant_id and downstream flows use it.
Migrations & storage metadata
src/storage/store_metadata.rs, src/migration/*
Staging/remote metadata and migrations become tenant-aware; StorageMetadata extended (suspended_services, global_query_auth); migrations run per-tenant.
UI modules, dashboards, filters
src/users/**, src/prism/**, src/handlers/http/*
Dashboards, filters, prism endpoints and handlers migrated to per-tenant storage and fetch operations; handlers extract tenant_id.
Misc & tooling
Cargo.toml, src/main.rs, many src/*
Add parking_lot = "0.12.5"; many call sites updated to pass extra tenant/context params (e.g., get_node_info, get_latency, object-store check) and minor Docker base image bumps.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    rect rgba(200,220,255,0.5)
    participant Client
    participant HTTP as "HTTP Handler"
    participant TEN as "TENANT_METADATA"
    participant RBAC
    participant Parseable
    participant Metastore
    participant Storage
    end
    Client->>HTTP: POST /api/streams/{name} (X-TENANT-ID)
    HTTP->>TEN: get_tenant_id_from_request(req)
    TEN-->>HTTP: tenant_id
    HTTP->>RBAC: authorize(session, action, tenant_id)
    RBAC-->>HTTP: Ok
    HTTP->>Parseable: create_stream_and_schema_from_storage(name, tenant_id)
    Parseable->>Metastore: put_stream_json(..., tenant_id)
    Metastore->>Storage: put_object(path_with_tenant_prefix, bytes, tenant_id)
    Storage-->>Metastore: Ok
    Metastore-->>Parseable: Ok
    Parseable-->>HTTP: StreamRef
    HTTP-->>Client: 201 Created
sequenceDiagram
    autonumber
    rect rgba(220,255,220,0.5)
    participant Scheduler
    participant AlertMgr as "AlertManager"
    participant Metastore
    participant Auth as "TENANT_METADATA/Auth"
    participant QueryExec
    end
    Scheduler->>AlertMgr: evaluate_alert(alert_id, tenant_id)
    AlertMgr->>Metastore: get_alert(alert_id, tenant_id)
    Metastore-->>AlertMgr: AlertConfig
    AlertMgr->>Auth: get_global_query_auth(tenant_id)
    Auth-->>AlertMgr: auth_headers
    AlertMgr->>QueryExec: execute(query, auth_headers, tenant_id)
    QueryExec-->>AlertMgr: results
    AlertMgr->>Metastore: put_alert_state(..., tenant_id)
    Metastore-->>AlertMgr: Ok

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Feat: Metastore #1424 — extensive metastore abstraction changes; strongly related to the tenant-aware metastore method additions and return-type reshaping.
  • Modularize Alerts #1390 — earlier alerts refactor touching AlertTrait/AlertManager and alert types that this PR extends for multi-tenancy.
  • refactor: DRY object_storage #1147 — object-storage/provider changes overlapping with threading tenant parameters through storage implementations and metrics.

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable
  • de-sh

"🐇
I hopped through folds of code and loam,
Planted tenants so each stream has home.
Alerts and metrics now wear a tag,
Paths prefixed — no more global gag.
A carrot wink — the cluster's grown!"

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Description check: ⚠️ Warning
Explanation: The PR description is incomplete. It contains only the repository's contribution template with unfilled placeholders (e.g., 'Fixes #XXXX'), unchecked boxes, and no actual description of changes, goals, or rationale for the multi-tenancy implementation.
Resolution: Complete the description by filling in the issue number, describing the multi-tenancy implementation goals, explaining the chosen approach, documenting key changes across modules (alerts, streams, metastore, RBAC, etc.), and confirming testing and documentation have been completed.

Docstring Coverage: ⚠️ Warning
Explanation: Docstring coverage is 59.49%, which is below the required threshold of 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

Title check: ❓ Inconclusive
Explanation: The title 'Multi tenancy review' is vague and does not clearly summarize the main changes. It uses generic terminology without conveying the specific scope of the multi-tenancy feature.
Resolution: Revise the title to be more specific, such as 'Add multi-tenant support for alerts, streams, and metadata operations', to better reflect the actual scope of the implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment
Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (18)
src/migration/mod.rs (2)

480-493: ⚠️ Potential issue | 🟡 Minor

get_staging_metadata lacks tenant awareness.

put_staging_metadata writes to tenant-specific paths (staging_dir/{tenant}/.parseable.json), but get_staging_metadata only reads from the default path. This asymmetry means tenant-specific staging metadata won't be read.

If this is intentional for one-time migration of legacy data to tenant paths, consider adding a comment. Otherwise, get_staging_metadata should also accept tenant_id.
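If the asymmetry is fixed rather than documented, the read path could mirror the write path. A minimal sketch, assuming the staging_dir/{tenant}/.parseable.json layout described above; the helper name is illustrative, not the actual Parseable API:

```rust
use std::path::PathBuf;

// Hypothetical helper mirroring the write-side layout: the default tenant
// reads staging_dir/.parseable.json, while a named tenant reads
// staging_dir/{tenant}/.parseable.json.
fn staging_metadata_path(staging_dir: &str, tenant_id: &Option<String>) -> PathBuf {
    let mut path = PathBuf::from(staging_dir);
    if let Some(tenant) = tenant_id {
        path.push(tenant);
    }
    path.push(".parseable.json");
    path
}
```

Both get_staging_metadata and put_staging_metadata could then share this one path helper, which removes the read/write asymmetry by construction.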


134-166: ⚠️ Potential issue | 🟠 Major

Add v6 handler to migrate_staging to align with remote metadata migration.

The v5_v6 migration produces v6 format, not v7. Remote metadata migration has an explicit v6 handler calling v6_v7, but migrate_staging lacks one. If staging metadata is at v6, it will never be migrated to v7. Additionally, the v5 case should chain v5_v6 with v6_v7, as remote metadata migration does, rather than stopping at v6.

Add a v6 case handler:

Some("v6") => {
    let metadata = metadata_migration::v6_v7(staging_metadata);
    put_staging_metadata(config, &metadata, tenant_id)?;
}

And update the v5 case to call v6_v7 after v5_v6:

Some("v5") => {
    let mut metadata = metadata_migration::v5_v6(staging_metadata);
    metadata = metadata_migration::v6_v7(metadata);
    put_staging_metadata(config, &metadata, tenant_id)?;
}
src/handlers/http/cluster/mod.rs (2)

1692-1779: ⚠️ Potential issue | 🟠 Major

Tenant-agnostic querier state can mix tenants.

get_available_querier is tenant-scoped, but QUERIER_MAP and LAST_USED_QUERIER are global. That allows querier availability and round‑robin state to bleed across tenants, potentially selecting a querier that isn’t registered for the current tenant. Consider keying the map by tenant and storing a per‑tenant last_used value.

♻️ Suggested direction (per‑tenant state)
- static ref QUERIER_MAP: Arc<RwLock<HashMap<String, QuerierStatus>>> =
-     Arc::new(RwLock::new(HashMap::new()));
- static ref LAST_USED_QUERIER: Arc<RwLock<Option<String>>> = Arc::new(RwLock::new(None));
+ static ref QUERIER_MAP: Arc<RwLock<HashMap<String, HashMap<String, QuerierStatus>>>> =
+     Arc::new(RwLock::new(HashMap::new()));
+ static ref LAST_USED_QUERIER: Arc<RwLock<HashMap<String, Option<String>>>> =
+     Arc::new(RwLock::new(HashMap::new()));

1880-1926: ⚠️ Potential issue | 🔴 Critical

Forward tenant context to querier in send_query_request.

The tenant_id parameter is unused. When auth_token is None, the request only carries the node token with no tenant identifier. Even when auth_token is provided, the "tenant" header may not be included. In multi-tenant mode, the querier relies on the "tenant" header for request scoping; without it, the request falls back to the default tenant, risking cross-tenant data exposure. Add the tenant header to the outgoing request:

if let Some(tenant) = tenant_id {
    if let Ok(val) = reqwest::header::HeaderValue::from_str(tenant) {
        map.insert(
            reqwest::header::HeaderName::from_static("tenant"),
            val,
        );
    }
}
src/storage/retention.rs (1)

97-119: ⚠️ Potential issue | 🟡 Minor

Avoid panic in Task::new when days=0.
NonZeroU32::new(days).unwrap() will panic on 0; this is a public constructor and should return a validation error instead.

💡 Suggested fix
-    pub fn new(description: String, days: u32) -> Self {
-        let days = NonZeroU32::new(days).unwrap();
-        Self {
-            description,
-            days,
-            action: Action::Delete,
-        }
-    }
+    pub fn new(description: String, days: u32) -> Result<Self, String> {
+        let days = NonZeroU32::new(days).ok_or("days must be > 0")?;
+        Ok(Self {
+            description,
+            days,
+            action: Action::Delete,
+        })
+    }
src/prism/home/mod.rs (1)

156-160: ⚠️ Potential issue | 🟠 Major

Bug confirmed: users().len() counts tenants, not users in the current tenant

The Users struct is a HashMap<String, HashMap<String, User>> where the outer map is keyed by tenant. Calling .len() on it returns the number of tenants, making user_added incorrect for multi-tenant deployments.

Update to count users for the current tenant:

Suggested fix
-    let user_count = users().len();
+    let user_count = users()
+        .get(tenant_id.as_deref().unwrap_or(DEFAULT_TENANT))
+        .map(|m| m.len())
+        .unwrap_or(0);
     let user_added = user_count > 1; // more than just the default admin user

Note: tenant_id is available in scope (line 106), and the import for DEFAULT_TENANT is already present.

src/connectors/kafka/processor.rs (1)

49-103: ⚠️ Potential issue | 🔴 Critical

Add missing dataset_tag parameter and fix critical tenant-mixing bug in chunk processing

  1. Build breaker: create_stream_if_not_exists requires 7 arguments; the call is missing the final dataset_tag: Option<DatasetTag> parameter. Pass None explicitly.

  2. Critical data isolation bug: tenant_id is extracted only from records.first(), but records in a single chunk can have mixed tenant IDs (from Kafka message headers). All records in the chunk are then processed under the first record's tenant, causing cross-tenant data mixing. Chunks must be split by tenant before processing, or tenant consistency must be enforced per-chunk with an assertion.

  3. TelemetryType::default() correctly resolves to Logs, matching the into_event call—no inconsistency here.

Compile fix for missing parameter
         PARSEABLE
             .create_stream_if_not_exists(
                 stream_name,
                 StreamType::UserDefined,
                 None,
                 vec![log_source_entry],
                 TelemetryType::default(),
                 tenant_id,
+                None,
             )
             .await?;
src/handlers/http/mod.rs (1)

104-141: ⚠️ Potential issue | 🟠 Major

Add tenant context header to ingestor query request to maintain multi-tenant isolation

send_query_request_to_ingestor correctly selects ingestors using get_node_info(NodeType::Ingestor, tenant_id), but the outgoing HTTP request omits the tenant context. The ingestor's query handler derives tenant via get_tenant_id_from_request(&req), which looks for a "tenant" header. Without this header, the ingestor defaults to the wrong tenant, breaking multi-tenant data isolation.

Add tenant to the request headers:

.header("tenant", tenant_id.as_deref().unwrap_or(DEFAULT_TENANT))
src/handlers/http/alerts.rs (1)

247-329: ⚠️ Potential issue | 🟠 Major

Tenant context must be bound to the authenticated session, not derived from untrusted header.

This endpoint derives tenant_id from the tenant header via get_tenant_id_from_request(&req), but the subsequent authorization validation via alert.validate(&session_key) uses a different tenant context extracted from the session key. No validation exists to ensure these match.

Attack scenario: A caller with a valid session for tenant A can specify tenant B in the tenant header. The validation will succeed (checking access to tables in tenant A's context via user_auth_for_query), but the alert is stored and executed in tenant B's isolated storage, allowing unauthorized resource creation.

This pattern affects all alert endpoints and is systemic across the codebase. Remediation options:

  • Derive tenant exclusively from session key (preferred)
  • Validate tenant header == session's tenant and reject if mismatch
  • Remove header-based tenant specification and enforce proxy-only tenant injection with header stripping
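The second option (validate the tenant header against the session's tenant) can be sketched as a pure check; the function name and error type are stand-ins for the actual handler types:

```rust
// Sketch of the "validate header == session tenant" remediation. Reject the
// request whenever a caller-supplied tenant header disagrees with the tenant
// bound to the authenticated session, instead of trusting the header.
fn resolve_request_tenant(
    header_tenant: Option<&str>,
    session_tenant: &str,
) -> Result<String, String> {
    match header_tenant {
        // No header: fall back to the tenant bound to the session.
        None => Ok(session_tenant.to_owned()),
        // Header agrees with the session: accept.
        Some(t) if t == session_tenant => Ok(t.to_owned()),
        // Mismatch: reject rather than creating resources in another tenant.
        Some(t) => Err(format!(
            "tenant header '{t}' does not match session tenant '{session_tenant}'"
        )),
    }
}
```

Running this check once in middleware, before any storage call, would close the attack scenario for every alert endpoint at once.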
src/rbac/role.rs (1)

91-172: ⚠️ Potential issue | 🔴 Critical

Add Action::SuperAdmin check to is_admin() function.

The is_admin() function (src/utils/mod.rs:221) only recognizes Permission::Resource(Action::All, Some(ParseableResourceType::All)) but does not check for Action::SuperAdmin, which is recognized elsewhere in permission checks (src/rbac/map.rs:343, 373, 379). This creates an inconsistency where users with SuperAdmin permissions may be denied access in code paths relying on is_admin() for authorization (e.g., handlers/http/users/dashboards.rs, filters.rs). SuperAdmin should be included in the is_admin() check.
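A minimal sketch of the extended check, using stand-in enums for the real Action, ParseableResourceType, and Permission types in src/rbac:

```rust
// Stand-in types; the real definitions live in src/rbac.
enum Action { All, SuperAdmin, Query }
enum ResourceType { All, Stream }
enum Permission { Resource(Action, Option<ResourceType>) }

// is_admin extended to also recognize SuperAdmin, matching the permission
// checks cited above in src/rbac/map.rs.
fn is_admin(permissions: &[Permission]) -> bool {
    permissions.iter().any(|p| match p {
        Permission::Resource(Action::All, Some(ResourceType::All)) => true,
        Permission::Resource(Action::SuperAdmin, _) => true,
        _ => false,
    })
}
```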

src/handlers/http/modal/query/querier_logstream.rs (1)

93-111: ⚠️ Potential issue | 🔴 Critical

Pass tenant header to ingestors in delete requests.

The delete operation does not propagate the tenant header to ingestors, unlike the PUT operation. When send_stream_delete_request sends the DELETE request, it only includes CONTENT_TYPE and AUTHORIZATION headers. The ingestor's delete handler expects to extract the tenant from the request header via get_tenant_id_from_request(), which will fail if the tenant header is missing. This causes the ingestor to operate without proper tenant context.

Update send_stream_delete_request to accept and forward the tenant header, matching the pattern used in sync_streams_with_ingestors for PUT requests.

src/correlation.rs (1)

204-212: ⚠️ Potential issue | 🟠 Major

Delete should remove from the tenant bucket, not the top-level map

self.write().await.remove(&correlation.id) removes a tenant entry keyed by correlation ID (or no-ops), leaving stale data in memory.

🛠️ Suggested fix
-        // Delete from memory
-        self.write().await.remove(&correlation.id);
+        // Delete from memory
+        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        if let Some(corrs) = self.write().await.get_mut(tenant) {
+            corrs.remove(&correlation.id);
+        }
src/handlers/http/oidc.rs (1)

260-298: ⚠️ Potential issue | 🟠 Major

OIDC new-user creation ignores tenant context

put_user(..., None) forces new OIDC users into the default tenant even when login is tenant-scoped. Pass the request tenant (clone before move).

🛠️ Suggested fix
-    let existing_user = find_existing_user(&user_info, tenant_id);
+    let existing_user = find_existing_user(&user_info, tenant_id.clone());
@@
-        (None, roles) => put_user(&user_id, roles, user_info, bearer, None).await?,
+        (None, roles) => put_user(&user_id, roles, user_info, bearer, tenant_id.clone()).await?,
src/parseable/mod.rs (1)

515-524: ⚠️ Potential issue | 🔴 Critical

Add missing dataset_tag argument to create_stream_if_not_exists call in kafka/processor.rs.

The call at src/connectors/kafka/processor.rs:60 is missing the new dataset_tag parameter. All other call sites have been updated correctly (passing None where appropriate). This build-breaking issue needs to be fixed immediately.

src/storage/s3.rs (1)

919-962: ⚠️ Potential issue | 🟡 Minor

Use DEFAULT_TENANT for list_old_streams HEAD metrics.
The HEAD calls inside list_old_streams are labeled with "" while other metrics in the same method use DEFAULT_TENANT, splitting default-tenant telemetry.

🛠️ Suggested fix
-                increment_object_store_calls_by_date(
-                    "HEAD",
-                    &Utc::now().date_naive().to_string(),
-                    "",
-                );
+                increment_object_store_calls_by_date(
+                    "HEAD",
+                    &Utc::now().date_naive().to_string(),
+                    tenant_str,
+                );

Based on learnings: default-tenant paths are intentionally built with an empty string segment (""), so the concern here is metrics labeling rather than path resolution.

src/hottier.rs (1)

186-205: ⚠️ Potential issue | 🟠 Major

Tenant-aware hot-tier paths still use tenant-agnostic directories.

Line 203 uses get_oldest_date_time_entry(stream) (tenant-agnostic), Line 216 deletes hot_tier_path.join(stream) (tenant-agnostic), and Line 454–463 writes manifests using get_stream_path_for_date(stream, …) (tenant-agnostic). For non-default tenants, parquet files are stored under hot_tier_path/<tenant>/… (from file_path), while metadata/cleanup operates under hot_tier_path/<stream>/…, causing cross-tenant mixing and orphaned data.

🔧 Suggested direction (tenant-aware stream root)
+    fn tenant_stream_root(&self, stream: &str, tenant_id: &Option<String>) -> PathBuf {
+        if let Some(tenant) = tenant_id.as_ref() {
+            self.hot_tier_path.join(tenant).join(stream)
+        } else {
+            self.hot_tier_path.join(stream)
+        }
+    }
...
-        let path = self.hot_tier_path.join(stream);
+        let path = self.tenant_stream_root(stream, tenant_id);

Then thread tenant_id into get_stream_path_for_date, fetch_hot_tier_dates, and get_oldest_date_time_entry, and use the tenant-aware root in all call sites.

Also applies to: 208-218, 454-465

src/metastore/metastores/object_store_metastore.rs (2)

317-412: ⚠️ Potential issue | 🟠 Major

Alert state paths are not tenant-isolated (and default-tenant path mismatch).

Line 321 uses DEFAULT_TENANT in the path for None, while alert_state_json_path (Line 349/373) writes to .alerts/... without tenant. This means the default tenant reads from default/.alerts but writes to .alerts, and non-default tenants collide on the same file names.

🔧 Suggested fix (tenant-aware alert-state paths)
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
-        let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);
+        let tenant = tenant_id.as_deref().unwrap_or("");
+        let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);
...
-        let path = alert_state_json_path(*alert_id);
+        let path = if let Some(t) = tenant_id.as_ref() {
+            RelativePathBuf::from_iter([t, ALERTS_ROOT_DIRECTORY, &format!("alert_state_{alert_id}.json")])
+        } else {
+            alert_state_json_path(*alert_id)
+        };

Apply the same tenant-aware path logic in put_alert_state and delete_alert_state.

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.


913-958: ⚠️ Potential issue | 🟠 Major

Tenant-prefixed root is ignored when listing per-date manifests.

Line 936 builds date_path without the tenant prefix, so manifest listing for non-default tenants will read from the wrong location. Reuse the tenant-prefixed root when constructing date_path.

🔧 Suggested fix
-            let date_path = object_store::path::Path::from(format!("{}/{}", stream_name, &date));
+            let date_path = object_store::path::Path::from(format!("{root}/{date}"));
🤖 Fix all issues with AI agents
In `@src/alerts/mod.rs`:
- Around line 1045-1141: The code inserts alerts under the raw tenant_id string
which is empty for the default tenant, but lookups use DEFAULT_TENANT, so
default-tenant alerts become unreachable; when inserting into the map (sites
where map.entry(tenant_id.clone()).or_default().insert(...)) compute a
normalized map key: if tenant_id.is_empty() use DEFAULT_TENANT (string) else use
tenant_id.clone(), and use that normalized key for map.entry(...) while keeping
alert.tenant_id.clone_from(tenant) unchanged so the AlertConfig still reflects
None for the default tenant.
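The normalization described above reduces to one helper applied at every map.entry call site; DEFAULT_TENANT's value is assumed here (use the crate's constant):

```rust
// Assumed value; the real constant is defined in the crate.
const DEFAULT_TENANT: &str = "default";

// Normalize a raw tenant_id into the key used for in-memory maps, so inserts
// with an empty string (default tenant) and lookups via DEFAULT_TENANT agree.
fn tenant_map_key(tenant_id: &str) -> String {
    if tenant_id.is_empty() {
        DEFAULT_TENANT.to_owned()
    } else {
        tenant_id.to_owned()
    }
}
```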

In `@src/correlation.rs`:
- Around line 134-144: The create flow writes the correlation to metastore but
only inserts into the in-memory map if a tenant bucket already exists, so the
first correlation for a tenant is never cached; after calling
PARSEABLE.metastore.put_correlation(...) and computing tenant via
get_tenant_id_from_key(...) / DEFAULT_TENANT, ensure you create or get the
tenant bucket before inserting by using the map entry semantics on self (the
write lock) instead of get_mut(tenant) so you always create a new HashMap for a
missing tenant and then insert correlation.id / correlation.clone() into it.
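The entry-based insert the prompt asks for, as a standalone sketch over a plain HashMap (the real code does this under a write lock, and the String value stands in for the correlation type):

```rust
use std::collections::HashMap;

// Insert into a per-tenant bucket, creating the bucket on first use.
// entry().or_default() avoids the get_mut() bug described above, where a
// missing tenant bucket silently drops the first insert for that tenant.
fn cache_correlation(
    map: &mut HashMap<String, HashMap<String, String>>,
    tenant: &str,
    correlation_id: &str,
    correlation: String,
) {
    map.entry(tenant.to_owned())
        .or_default()
        .insert(correlation_id.to_owned(), correlation);
}
```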

In `@src/event/mod.rs`:
- Around line 110-111: LiveTail registry currently keys only by stream_name
causing cross-tenant leakage; update the LiveTailRegistry HashMap to include
tenant_id in its key (e.g., use a composite key struct or nested HashMap) and
change the LiveTail API to accept tenant_id for scoping: modify
LIVETAIL.process(&self.stream_name, &self.rb) to LIVETAIL.process(&tenant_id,
&self.stream_name, &self.rb) (and any LiveTailRegistry lookup to use tenant_id),
and update all call sites including new_pipe in handlers/livetail.rs
(new_pipe(...) should pass tenant_id into subscription creation) so tenant_id is
used both when creating/looking up registry entries and when broadcasting
events. Ensure method signatures on LiveTail, LiveTailRegistry, and any helper
functions are updated consistently and that keys are constructed or nested
lookups use tenant_id first then stream_name.

In `@src/handlers/http/correlation.rs`:
- Around line 45-52: The tenant is being taken directly from the request header
via get_tenant_id_from_request(&req) which enables header spoofing; instead
derive the tenant from the authenticated session obtained by
extract_session_key_from_req(&req) (or the session store/context) and use that
tenant when calling CORRELATIONS.get_correlation(&correlation_id, &tenant_id).
If a tenant header is present, compare it to the session-derived tenant and
return an authorization/error when they mismatch (unless an explicit
override/authorization check permits the header), i.e., replace the direct
header usage in the correlation lookup with the session-derived tenant and add a
mismatch check before calling CORRELATIONS.get_correlation.

In `@src/handlers/http/middleware.rs`:
- Around line 171-243: The tenant header handling allows spoofing — only
BasicAuth ingestion is validated and non-ingest flows may leave caller-supplied
tenant headers; fix by deriving the session tenant once via
get_user_and_tenant_from_request/get_user_from_request and/or
extract_session_key (inspect SessionKey variants), then for all auth types
compare any incoming "tenant" header to the derived tenant and either clear the
header (remove from req.headers_mut()) or set header_error to the same
PostError::Header(ParseHeaderError::InvalidTenantId) when the derived tenant is
missing or mismatches; update the logic around user_and_tenant_id, the Ingest
block that calls Users.validate_basic_user_tenant_id, and any branches that
currently insert the tenant into req.headers_mut() so all auth types enforce the
same validation/clearing behavior.

In `@src/handlers/http/modal/ingest/ingestor_rbac.rs`:
- Around line 49-57: The tenant-check logic in get_tenant_id_from_request/... is
inverted: currently it errors when req_tenant != DEFAULT_TENANT && req_tenant ==
user.tenant; change it to error when a non-default requested tenant does not
match the requesting user's tenant. Concretely, within the block using
DEFAULT_TENANT and user.tenant (the code that returns RBACError::Anyhow), flip
the second comparison so the condition becomes req_tenant.ne(DEFAULT_TENANT) &&
!req_tenant.eq(user.tenant.as_deref().unwrap_or(DEFAULT_TENANT)) (or
equivalent), ensuring cross-tenant creation attempts are rejected. Ensure you
still use RBACError::Anyhow(anyhow::Error::msg(...)) for the error return.
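The corrected predicate, extracted as a standalone sketch (DEFAULT_TENANT's value is assumed; true means the cross-tenant attempt should be rejected):

```rust
// Assumed value; use the crate's constant in the real code.
const DEFAULT_TENANT: &str = "default";

// Reject when a non-default requested tenant does not match the requesting
// user's tenant; this is the flipped second clause described above.
fn is_cross_tenant(req_tenant: &str, user_tenant: Option<&str>) -> bool {
    req_tenant != DEFAULT_TENANT
        && req_tenant != user_tenant.unwrap_or(DEFAULT_TENANT)
}
```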

In `@src/handlers/http/modal/ingest/ingestor_role.rs`:
- Around line 45-51: The tenant-check condition is inverted: in the block around
get_tenant_id_from_request, req_tenant, DEFAULT_TENANT and sync_req.tenant_id
you currently reject when the request tenant equals the payload tenant and allow
cross-tenant operations; change it to reject when the request tenant is not the
default AND the request tenant does not match the payload tenant. Concretely, in
the if that uses req_tenant.ne(DEFAULT_TENANT) &&
(req_tenant_id.eq(&sync_req.tenant_id)), replace the second clause with a
comparison that verifies the request tenant and sync_req.tenant_id differ (e.g.
compare req_tenant != sync_req.tenant_id or req_tenant_id.as_deref() !=
Some(&sync_req.tenant_id)), keeping the same error return
(RoleError::Anyhow(...)) and using get_tenant_id_from_request, req_tenant_id,
req_tenant, DEFAULT_TENANT and sync_req.tenant_id to locate the change.

In `@src/handlers/http/role.rs`:
- Around line 154-199: get_default currently returns Null for non-default
tenants when DEFAULT_ROLE lacks an entry; fix by adding a fallback to read
persisted metadata: in get_default (and the similar oidc usage) after acquiring
tenant_id and checking DEFAULT_ROLE.read().get(tenant_id), if the map has no
entry for that tenant call get_metadata(&tenant_id).await?, extract
metadata.default_role, backfill DEFAULT_ROLE via
DEFAULT_ROLE.write().insert(tenant_id.to_owned(),
metadata.default_role.clone()), and then return that role (or Null) — this
mirrors put_default behavior and ensures persisted per-tenant defaults are
returned and the in-memory map is hydrated.
- Around line 23-37: The tenant header parsing in get_tenant_id_from_request
currently calls header_value.to_str().unwrap(), which will panic on non-UTF8
header values; change get_tenant_id_from_request to handle invalid UTF-8
gracefully by using to_str().ok() (or mapping the Result) and fall back to a
safe value or return an appropriate error instead of unwrapping; ensure you
reference DEFAULT_TENANT/PARSEABLE behavior so the function either returns
DEFAULT_TENANT when the header is missing/invalid or returns a controlled error
Result, and update any callers expecting a string to handle the non-UTF8 case
accordingly.
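A non-panicking sketch of the parse, operating on raw header bytes (actix-web's HeaderValue::to_str likewise returns a Result that can replace the unwrap; the DEFAULT_TENANT value is assumed):

```rust
// Assumed value; use the crate's constant in the real code.
const DEFAULT_TENANT: &str = "default";

// Parse a raw header value without panicking on non-UTF-8 bytes, falling
// back to the default tenant when the header is missing, invalid, or blank.
fn tenant_from_header(raw: Option<&[u8]>) -> String {
    raw.and_then(|bytes| std::str::from_utf8(bytes).ok())
        .filter(|s| !s.trim().is_empty())
        .unwrap_or(DEFAULT_TENANT)
        .to_owned()
}
```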

In `@src/handlers/livetail.rs`:
- Around line 119-122: The livetail handler currently calls
PARSEABLE.get_stream(stream, &None) which ignores tenant isolation; extract the
tenant id from the authorized SessionKey by calling get_tenant_id_from_key(&key)
(same pattern used in logstream/health_check/query) and pass that tenant context
to get_stream (e.g., use Some(tenant_id) instead of &None) so the lookup
respects tenant boundaries when obtaining the schema via PARSEABLE.get_stream.

In `@src/hottier.rs`:
- Around line 92-115: The aggregation in get_hot_tiers_size incorrectly excludes
streams when either the stream name equals current_stream or the tenant_id
differs from current_tenant_id; change the exclusion logic so only the current
stream in the same tenant is skipped: call check_stream_hot_tier_exists(&stream,
&tenant_id) as before, but replace the condition "stream != current_stream &&
tenant_id != *current_tenant_id" with a check that continues only if NOT (stream
== current_stream && tenant_id == Some(current_tenant_id.clone())), i.e. skip
only when both the stream name and tenant match the provided
current_stream/current_tenant_id; then await get_hot_tier(&stream, &tenant_id)
and add sizes as before.
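The corrected skip condition can be isolated as a small predicate; names mirror the instruction above but are a sketch, not the actual hottier code:

```rust
/// Skip only the stream that matches BOTH the current stream name and tenant.
fn should_skip(
    stream: &str,
    tenant_id: &Option<String>,
    current_stream: &str,
    current_tenant_id: &str,
) -> bool {
    stream == current_stream && tenant_id.as_deref() == Some(current_tenant_id)
}

fn main() {
    // same stream, same tenant -> excluded from the aggregate
    assert!(should_skip("s1", &Some("t1".into()), "s1", "t1"));
    // same stream name under a different tenant -> still counted
    assert!(!should_skip("s1", &Some("t2".into()), "s1", "t1"));
    // different stream, same tenant -> still counted
    assert!(!should_skip("s2", &Some("t1".into()), "s1", "t1"));
}
```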

In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 515-541: The get_dashboards method is overwriting dashboards per
user because dashboards.insert(tenant.to_owned(), dashboard_bytes) is called
inside the user loop; change this to aggregate dashboard_bytes into the existing
entry for that tenant (e.g., lookup dashboards.get_mut(&tenant) and
append/extend the Vec<Bytes> or create a new Vec when missing) so all users'
dashboards for a tenant are combined; keep the DEFAULT_TENANT normalization
(tenant.clone_from(&DEFAULT_TENANT.to_string())) before inserting/merging so the
map key is consistent.
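The aggregate-instead-of-overwrite fix boils down to the map's entry API; this is a minimal sketch with `Vec<u8>` standing in for `Bytes`:

```rust
use std::collections::HashMap;

type Bytes = Vec<u8>; // stand-in for bytes::Bytes

/// Merge a user's dashboards into the tenant's bucket instead of
/// replacing whatever previous users contributed.
fn merge_dashboards(
    map: &mut HashMap<String, Vec<Bytes>>,
    tenant: String,
    dashboard_bytes: Vec<Bytes>,
) {
    map.entry(tenant).or_default().extend(dashboard_bytes);
}

fn main() {
    let mut map: HashMap<String, Vec<Bytes>> = HashMap::new();
    merge_dashboards(&mut map, "default".into(), vec![b"user1".to_vec()]);
    merge_dashboards(&mut map, "default".into(), vec![b"user2".to_vec()]);
    // both users' dashboards are retained under the one tenant key
    assert_eq!(map["default"].len(), 2);
}
```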

In `@src/rbac/mod.rs`:
- Around line 289-326: The BasicAuth tenant selection is ambiguous across
tenants; update get_user_tenant_from_basic and mutate_request_with_basic_user to
avoid arbitrarily picking a tenant: in get_user_tenant_from_basic (and any
callers like get_user_and_tenant_from_request) collect all tenants that match
username+password and return Some(tenant) only if exactly one tenant matched
(otherwise return None); similarly, change mutate_request_with_basic_user to
search for all matching (tenant, user) entries rather than using
find_any()/first-match and only insert the tenant header when a single
unambiguous match exists (if multiple matches, do not set the header and let
upstream validation fail or require explicit tenant header).
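The "exactly one match" rule can be captured independently of the credential lookup; a sketch over an already-collected list of matching tenants:

```rust
/// Return the tenant only when exactly one tenant matched the credentials;
/// zero or multiple matches yield None so callers must require an explicit header.
fn unambiguous_tenant(matches: Vec<String>) -> Option<String> {
    let mut it = matches.into_iter();
    match (it.next(), it.next()) {
        (Some(t), None) => Some(t), // exactly one match
        _ => None,                  // zero or ambiguous (2+) matches
    }
}

fn main() {
    assert_eq!(unambiguous_tenant(vec!["t1".to_string()]), Some("t1".to_string()));
    assert_eq!(unambiguous_tenant(vec![]), None);
    assert_eq!(
        unambiguous_tenant(vec!["t1".to_string(), "t2".to_string()]),
        None
    );
}
```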

In `@src/rbac/user.rs`:
- Around line 349-363: The validate method currently checks
read_user_groups().contains_key(&self.name) which tests tenant IDs rather than
group names; change the existence check to scope by tenant using the computed
tenant variable (let tenant =
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT))—e.g., fetch
read_user_groups().get(tenant) and then check .contains_key(&self.name) on that
tenant's map (or .map_or(false, |m| m.contains_key(&self.name))) and return
RBACError::UserGroupExists(self.name.clone()) only if the group exists within
that tenant; keep the rest of validate (role checks, non_existent_roles)
unchanged.

In `@src/storage/localfs.rs`:
- Around line 442-460: delete_stream and other LocalFS methods build paths using
DEFAULT_TENANT so non-default tenants end up using the default path; change path
construction to include the tenant path segment (empty string for default
tenant) instead of DEFAULT_TENANT: compute let tenant_segment =
tenant_id.as_deref().unwrap_or("") and use
self.root.join(tenant_segment).join(stream_name) (or join the tenant_segment
before listing dirs in list_dates, list_hours, list_minutes), and keep the
existing metric calls but ensure they use the appropriate tenant identifier as
before; apply the same pattern to the other affected functions (list_dates,
list_hours, list_minutes) and return results unchanged.
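The empty-segment trick works because joining an empty path component is effectively a no-op; a sketch of the suggested path construction (function name is illustrative):

```rust
use std::path::{Path, PathBuf};

/// Build a stream path under the tenant segment; the default tenant
/// (None) maps to an empty segment so its paths stay at the root.
fn stream_path(root: &Path, tenant_id: &Option<String>, stream_name: &str) -> PathBuf {
    let tenant_segment = tenant_id.as_deref().unwrap_or("");
    root.join(tenant_segment).join(stream_name)
}

fn main() {
    let root = Path::new("/data");
    assert_eq!(
        stream_path(root, &Some("acme".into()), "s1"),
        PathBuf::from("/data/acme/s1")
    );
    // default tenant: empty segment keeps the legacy root layout
    assert_eq!(stream_path(root, &None, "s1"), PathBuf::from("/data/s1"));
}
```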

In `@src/tenants/mod.rs`:
- Around line 70-112: The runtime suspension set (tenant.suspended_services) is
being mutated in suspend_service and resume_service but the persisted metadata
returned by get_tenants (tenant.meta.suspended_services) is left stale; update
the metadata whenever you change runtime state and/or derive the metadata on
read. Specifically, in suspend_service and resume_service (and any code that
mutates suspended_services) also update tenant.value().meta.suspended_services
to mirror tenant.suspended_services (or rebuild meta from suspended_services
before returning in get_tenants); ensure get_tenants reads the up-to-date
suspended_services into the returned StorageMetadata so listings reflect current
suspensions.

In `@src/users/filters.rs`:
- Around line 128-133: Filters::update currently no-ops when the tenant bucket
doesn't exist (using s.get_mut(...)), which drops newly created filters; change
the logic in Filters::update to use the map's entry API
(entry(tenant_key).or_default()) to obtain or create the Vec<Filter> for the
tenant (use tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) as the key), then
perform the retain by filter.filter_id and push the cloned Filter so new-tenant
entries are created in-memory immediately.
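The entry-API upsert described above, sketched with a stripped-down `Filter` (field and constant names follow the instruction but are stand-ins):

```rust
use std::collections::HashMap;

const DEFAULT_TENANT: &str = "default"; // assumed constant name

#[derive(Clone, Debug)]
struct Filter {
    filter_id: Option<String>,
}

/// Upsert a filter, creating the tenant bucket on first use so a
/// brand-new tenant's filter is not silently dropped.
fn update(map: &mut HashMap<String, Vec<Filter>>, tenant_id: &Option<String>, filter: &Filter) {
    let bucket = map
        .entry(tenant_id.as_deref().unwrap_or(DEFAULT_TENANT).to_owned())
        .or_default();
    bucket.retain(|f| f.filter_id != filter.filter_id);
    bucket.push(filter.clone());
}

fn main() {
    let mut map: HashMap<String, Vec<Filter>> = HashMap::new();
    let f = Filter { filter_id: Some("f1".into()) };
    update(&mut map, &Some("new-tenant".into()), &f); // bucket created on the fly
    assert_eq!(map["new-tenant"].len(), 1);
    update(&mut map, &Some("new-tenant".into()), &f); // replaces, doesn't duplicate
    assert_eq!(map["new-tenant"].len(), 1);
}
```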

In `@src/utils/mod.rs`:
- Around line 69-79: The function mutate_request_with_tenant currently calls
basic.password().unwrap(), which can panic if the password is omitted; change
this to safely handle missing passwords by using a safe default (e.g., empty
string) or passing the Option to Users.mutate_request_with_basic_user instead of
unwrapping. Locate mutate_request_with_tenant and the BasicAuth::extract result,
and replace the unwrap with a non-panicking branch (e.g.,
basic.password().map_or("", |p| p) or adjust
Users.mutate_request_with_basic_user to accept Option<&str>) so no panic occurs
when password is absent.
- Around line 118-121: The get_tenant_id_from_request function currently calls
to_str().unwrap() which can panic on invalid header bytes; change it to use
to_str().ok() and propagate the Option safely (e.g., map(|v|
v.to_str().ok().map(|s| s.to_owned())).flatten or and_then) so invalid header
values simply return None instead of panicking; update the function body for
safe header parsing consistent with the middleware's pattern.
🟡 Minor comments (3)
src/handlers/http/llm.rs-19-32 (1)

19-32: ⚠️ Potential issue | 🟡 Minor

Avoid panic on malformed tenant header.

get_tenant_id_from_request uses to_str().unwrap(), which can panic on non‑UTF8 header values. Consider safe parsing to keep the handler from crashing on bad input.

🛠️ Proposed fix (safe header parsing)
-use crate::{
-    parseable::{PARSEABLE, StreamNotFound},
-    utils::get_tenant_id_from_request,
-};
+use crate::parseable::{PARSEABLE, StreamNotFound};

@@
-    let tenant_id = get_tenant_id_from_request(&req);
+    let tenant_id = req
+        .headers()
+        .get("tenant")
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_owned());

Also applies to: 92-104

src/cli.rs-523-529 (1)

523-529: ⚠️ Potential issue | 🟡 Minor

Add tenant ID validation to prevent directory traversal.

The tenant_id from the HTTP header is used directly in filesystem paths without validation. A malicious tenant value like "../../../etc" could escape the staging directory. Validate that tenant_id contains only safe characters (e.g., alphanumeric, hyphens, underscores) or use path normalization to prevent traversal before path operations.

src/parseable/streams.rs-1791-1795 (1)

1791-1795: ⚠️ Potential issue | 🟡 Minor

Guard against unwrap on an empty tenant map in the test.
The map is empty at that point, so unwrap() will panic.

🛠️ Suggested fix
-        assert!(!guard.get(DEFAULT_TENANT).unwrap().contains_key(stream_name));
+        assert!(guard.get(DEFAULT_TENANT).is_none());
🧹 Nitpick comments (10)
src/migration/mod.rs (1)

512-520: Use PARSEABLE_METADATA_FILE_NAME constant instead of hardcoded string.

get_staging_metadata uses the PARSEABLE_METADATA_FILE_NAME constant, but here .parseable.json is hardcoded. Use the constant for consistency and maintainability.

♻️ Suggested fix
 pub fn put_staging_metadata(
     config: &Parseable,
     metadata: &serde_json::Value,
     tenant_id: &Option<String>,
 ) -> anyhow::Result<()> {
     let path = if let Some(tenant) = tenant_id.as_ref() {
         config
             .options
             .staging_dir()
             .join(tenant)
-            .join(".parseable.json")
+            .join(PARSEABLE_METADATA_FILE_NAME)
     } else {
-        config.options.staging_dir().join(".parseable.json")
+        config.options.staging_dir().join(PARSEABLE_METADATA_FILE_NAME)
     };
src/query/stream_schema_provider.rs (2)

282-293: Commented-out tenant URL logic appears intentionally simplified.

The staging execution plan retrieves files from staging.parquet_files() which returns absolute paths from the tenant-scoped data_path. Since PartitionedFile uses absolute paths, the file:/// base URL should resolve correctly. Consider removing the commented code if this approach is confirmed as the final design.


193-193: Minor: Typo in function name get_hottier_exectuion_plan.

The function name has a typo: "exectuion" should be "execution". Consider renaming to get_hottier_execution_plan for clarity.

✏️ Suggested rename
-    async fn get_hottier_exectuion_plan(
+    async fn get_hottier_execution_plan(

Note: This would require updating the call site at line 619 as well.

src/handlers/http/correlation.rs (1)

73-75: Avoid redundant String allocations when hashing user IDs.
get_hash accepts &str, so s.to_string() is an unnecessary clone in all three call sites.

♻️ Suggested change
-    let user_id = get_user_and_tenant_from_request(&req)
-        .map(|(s, _)| get_hash(&s.to_string()))
+    let user_id = get_user_and_tenant_from_request(&req)
+        .map(|(s, _)| get_hash(&s))
         .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
-    correlation.user_id = get_user_and_tenant_from_request(&req)
-        .map(|(s, _)| get_hash(&s.to_string()))
+    correlation.user_id = get_user_and_tenant_from_request(&req)
+        .map(|(s, _)| get_hash(&s))
         .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
-    let (user_id, tenant_id) = get_user_and_tenant_from_request(&req)
-        .map(|(s, t)| (get_hash(&s.to_string()), t))
+    let (user_id, tenant_id) = get_user_and_tenant_from_request(&req)
+        .map(|(s, t)| (get_hash(&s), t))
         .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;

Also applies to: 89-91, 106-112

src/handlers/http/query.rs (1)

118-145: Reuse tenant_id to avoid duplicate header parsing.
get_tenant_id_from_request is called twice; storing it once reduces work and ensures consistent usage.

♻️ Suggested refactor
-    create_streams_for_distributed(tables.clone(), &get_tenant_id_from_request(&req)).await?;
-
-    let tenant_id = get_tenant_id_from_request(&req);
+    let tenant_id = get_tenant_id_from_request(&req);
+    create_streams_for_distributed(tables.clone(), &tenant_id).await?;
src/stats.rs (1)

235-273: Consider unifying the tenant_id parameter types for consistency.

The event_labels and storage_size_labels functions accept &Option<String> while their date variants (event_labels_date, storage_size_labels_date) accept &str. This inconsistency may cause confusion for callers.

♻️ Suggested unification

Consider either:

  1. Make all label functions accept &Option<String> and handle the default internally, or
  2. Make all label functions accept &str and have callers resolve the default before calling.

Option 1 (preferred for internal consistency):

 pub fn event_labels_date<'a>(
     stream_name: &'a str,
     format: &'static str,
     date: &'a str,
-    tenant_id: &'a str,
+    tenant_id: &'a Option<String>,
 ) -> [&'a str; 4] {
-    [stream_name, format, date, tenant_id]
+    let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+    [stream_name, format, date, tenant]
 }
src/handlers/http/role.rs (2)

41-86: Avoid tenant_id shadowing in put (reduces tenant-scoping mistakes)

In put, tenant_id is first an Option<String> and later shadowed as &str. This is easy to misuse when threading tenant context through storage + in-memory maps + session invalidation.

Suggested refactor
 pub async fn put(
     req: HttpRequest,
     name: web::Path<String>,
     Json(privileges): Json<Vec<DefaultPrivilege>>,
 ) -> Result<impl Responder, RoleError> {
     let name = name.into_inner();
     let tenant_id = get_tenant_id_from_request(&req);
+    let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);

     // validate the role name
     validator::user_role_name(&name).map_err(RoleError::ValidationError)?;

     let mut metadata = get_metadata(&tenant_id).await?;
     metadata.roles.insert(name.clone(), privileges.clone());

     put_metadata(&metadata, &tenant_id).await?;

-    let tenant_id = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
     mut_roles()
-        .entry(tenant_id.to_owned())
+        .entry(tenant.to_owned())
         .or_default()
         .insert(name.clone(), privileges.clone());

     let mut session_refresh_users: HashSet<String> = HashSet::new();
-    if let Some(groups) = read_user_groups().get(tenant_id) {
+    if let Some(groups) = read_user_groups().get(tenant) {
         for user_group in groups.values() {
             if user_group.roles.contains(&name) {
                 session_refresh_users
                     .extend(user_group.users.iter().map(|u| u.userid().to_string()));
             }
         }
     }

-    if let Some(users) = users().get(tenant_id) {
+    if let Some(users) = users().get(tenant) {
         for user in users.values() {
             if user.roles.contains(&name) {
                 session_refresh_users.insert(user.userid().to_string());
             }
         }
     }

     for userid in session_refresh_users {
-        mut_sessions().remove_user(&userid, tenant_id);
+        mut_sessions().remove_user(&userid, tenant);
     }

     Ok(HttpResponse::Ok().finish())
 }

201-220: Consider deduplicating RBAC metadata helpers

This file defines get_metadata/put_metadata, while src/handlers/http/modal/utils/rbac_utils.rs provides equivalent helpers (per the snippets). Keeping two copies increases drift risk, especially with multi-tenant signature changes.

src/handlers/http/modal/query/querier_role.rs (1)

21-91: Tenant-scoped role updates look consistent; consider sync_role_update failure semantics

Good: tenant is derived once (tenant), metadata + mut_roles() are updated under that tenant, and session invalidation only targets tenant-scoped users/groups. Passing tenant_id into sync_role_update(...) is the right multi-tenant fix.

Potential reliability nit: sync_role_update(...).await? happens after metadata + in-memory state are already updated. If sync fails, the handler returns an error although the change is committed locally; that can lead to confusing retries. If sync is best-effort, consider logging + returning 200 (or returning a partial-success response) instead of failing the request.

src/handlers/http/ingest.rs (1)

441-461: Stream existence check should only load from storage in Query mode.

The new condition attempts create_stream_and_schema_from_storage for any non‑Mode::All, which can trigger unexpected storage access on ingest/index nodes. Restore the prior Query‑only check to keep behavior consistent.

♻️ Suggested adjustment
-    if PARSEABLE.options.mode != Mode::All {
+    if PARSEABLE.options.mode == Mode::Query {
         match PARSEABLE
             .create_stream_and_schema_from_storage(&stream_name, &tenant_id)
             .await
         {
             Ok(true) => {}
             Ok(false) | Err(_) => return Err(StreamNotFound(stream_name.clone()).into()),
         }
     } else {
         return Err(StreamNotFound(stream_name.clone()).into());
     }
Based on learnings: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/metastore/metastores/object_store_metastore.rs (1)

940-954: ⚠️ Potential issue | 🟠 Major

Tenant prefix missing from date_path construction.

When tenant_id is provided, root correctly includes the tenant prefix (line 925-929), but date_path at line 941 only uses stream_name without the tenant prefix. This causes subsequent storage operations to access incorrect paths.

Example with tenant_id = Some("acme"):

  • root = "acme/my_stream"
  • date_path = "my_stream/2024-01-01" ✗ (should be "acme/my_stream/2024-01-01")
🔧 Proposed fix
         for date in dates {
-            let date_path = object_store::path::Path::from(format!("{}/{}", stream_name, &date));
+            let date_path = if let Some(tenant) = tenant_id {
+                object_store::path::Path::from(format!("{}/{}/{}", tenant, stream_name, &date))
+            } else {
+                object_store::path::Path::from(format!("{}/{}", stream_name, &date))
+            };
             let resp = self.storage.list_with_delimiter(Some(date_path)).await?;
🤖 Fix all issues with AI agents
In `@src/correlation.rs`:
- Line 213: The current delete call removes from the outer tenant map using
correlation.id; instead look up the tenant bucket and remove the correlation id
from that inner CorrelationMap. Replace
self.write().await.remove(&correlation.id) with logic that acquires a mutable
write lock, gets the bucket by the tenant key (e.g., correlation.tenant_id or
correlation.tenant), and then calls remove(&correlation.id) on that bucket
(handling the bucket-missing case appropriately); reference the types/methods
involved: the outer HashMap (tenant -> CorrelationMap), the inner
CorrelationMap, and the correlation.id / correlation.tenant_id fields.
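The bucket-scoped delete can be sketched with plain `HashMap`s (the inner value type is a stand-in for the real `CorrelationConfig`):

```rust
use std::collections::HashMap;

type CorrelationMap = HashMap<String, String>; // id -> serialized config (stand-in)

/// Remove a correlation from its tenant's inner bucket, leaving the
/// outer tenant entry intact; missing buckets are a no-op.
fn delete_correlation(
    map: &mut HashMap<String, CorrelationMap>,
    tenant: &str,
    correlation_id: &str,
) -> bool {
    map.get_mut(tenant)
        .map(|bucket| bucket.remove(correlation_id).is_some())
        .unwrap_or(false)
}

fn main() {
    let mut map: HashMap<String, CorrelationMap> = HashMap::new();
    map.entry("t1".to_string())
        .or_default()
        .insert("c1".to_string(), "cfg".to_string());
    assert!(delete_correlation(&mut map, "t1", "c1"));
    assert!(map.contains_key("t1")); // outer tenant entry survives the delete
    assert!(!delete_correlation(&mut map, "t2", "c1")); // unknown tenant: no-op
}
```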

In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 696-730: The migration loop is deleting the parent directory
(filters_path) instead of the specific v1 file because get_objects() only
returns file contents; update the loop that calls get_objects() and the
migration block (migrate_v1_v2, filter_value handling, and the delete_object
call) to track each file's path alongside its bytes and call delete_object with
that specific file path (not filters_path), or change get_objects() to return
(file_path, file_bytes) tuples and use the returned file_path when invoking
self.storage.delete_object and when deciding the new v2 path (filter_path) so
the original v1 file is removed after creating the v2 file.

In `@src/rbac/user.rs`:
- Around line 349-377: The validate method currently skips existence checks when
roles().get(tenant) or users().get(tenant) returns None; change validate (in the
function validate) so that if roles().get(tenant) is None you treat all
requested roles as non-existent by pushing each role from self.roles into
non_existent_roles, and likewise if users().get(tenant) is None push each
group_user.userid() from self.users into non_existent_users; keep the existing
loops when the map exists (i.e., only branch to "mark all missing" when the
Option is None) and preserve the existing error reporting (e.g., RBACError
return) behavior.
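The "missing tenant map means everything is missing" branch can be sketched for the roles half (the users half is symmetric); names are illustrative:

```rust
use std::collections::{HashMap, HashSet};

/// If the tenant has no role map at all, every requested role is missing;
/// otherwise only roles absent from that tenant's map are reported.
fn non_existent_roles(
    roles: &HashMap<String, HashSet<String>>, // tenant -> known role names
    tenant: &str,
    requested: &[String],
) -> Vec<String> {
    match roles.get(tenant) {
        Some(known) => requested
            .iter()
            .filter(|r| !known.contains(*r))
            .cloned()
            .collect(),
        None => requested.to_vec(), // unknown tenant: mark all as missing
    }
}

fn main() {
    let mut roles: HashMap<String, HashSet<String>> = HashMap::new();
    roles.insert("t1".to_string(), HashSet::from(["admin".to_string()]));
    let requested = vec!["admin".to_string(), "editor".to_string()];
    assert_eq!(non_existent_roles(&roles, "t1", &requested), vec!["editor"]);
    // tenant with no role map: everything is non-existent
    assert_eq!(non_existent_roles(&roles, "t2", &requested).len(), 2);
}
```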
🧹 Nitpick comments (3)
src/correlation.rs (1)

180-185: update silently drops in-memory changes if tenant bucket doesn't exist.

If the tenant bucket was removed from memory (e.g., during a reload race), the in-memory update is silently skipped even though the metastore was updated. Consider using entry().or_default() for consistency with create.

♻️ Suggested improvement
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
-        }
+        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT).to_string();
+        self.write()
+            .await
+            .entry(tenant)
+            .or_default()
+            .insert(updated_correlation.id.to_owned(), updated_correlation.clone());
src/parseable/mod.rs (1)

1099-1126: Consider adding storage cleanup in delete_tenant.

The delete_tenant function removes in-memory state (users, roles, tenant metadata) but doesn't appear to clean up storage artifacts (streams, stored data, parseable.json for the tenant). If tenants should be fully deleted including their data, this may need additional cleanup. If this is intentional (soft delete preserving data), consider adding a comment to clarify.

src/rbac/user.rs (1)

502-508: Consider removing the commented-out update_in_metadata block.

If it’s deprecated, remove it or replace with a short TODO and an issue link to avoid lingering dead code.

@parmesant force-pushed the multi-tenancy-review branch from 8caed34 to ccb78d3 on February 13, 2026 at 09:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/prism/home/mod.rs (2)

381-383: ⚠️ Potential issue | 🟡 Minor

Potential panic if dashboard_id is None.

The unwrap() on dashboard_id will panic if a dashboard lacks an ID. Consider using ? with filter_map to gracefully skip such entries.

🛡️ Proposed fix to handle missing dashboard_id gracefully
     .filter_map(|dashboard| {
-        let dashboard_id = *dashboard.dashboard_id.as_ref().unwrap();
-        let dashboard_id = dashboard_id.to_string();
+        let dashboard_id = dashboard.dashboard_id.as_ref()?.to_string();
         if dashboard.title.to_lowercase().contains(query_value)
             || dashboard_id.to_lowercase().contains(query_value)
         {

409-412: ⚠️ Potential issue | 🟡 Minor

Potential panic if filter_id is None.

Similar to the dashboard case, unwrap() on filter_id will panic if a filter lacks an ID.

🛡️ Proposed fix to handle missing filter_id gracefully
     .filter_map(|filter| {
-        let filter_id = filter.filter_id.as_ref().unwrap().clone();
+        let filter_id = filter.filter_id.as_ref()?.clone();
         if filter.filter_name.to_lowercase().contains(query_value)
src/correlation.rs (2)

56-75: ⚠️ Potential issue | 🟠 Major

Normalize default-tenant key during load.

If the metastore returns an empty tenant ID for the default tenant, these correlations get stored under "" and later lookups (which use DEFAULT_TENANT) will miss them. Normalize empty keys to DEFAULT_TENANT before inserting into the in-memory map.

🛠️ Proposed fix
-        for (tenant_id, correlations_bytes) in all_correlations {
+        for (tenant_id, correlations_bytes) in all_correlations {
+            let tenant_key = if tenant_id.is_empty() {
+                DEFAULT_TENANT.to_string()
+            } else {
+                tenant_id
+            };
             let mut corrs = HashMap::new();
             for corr in correlations_bytes {
                 let correlation = match serde_json::from_slice::<CorrelationConfig>(&corr) {
                     Ok(c) => c,
                     Err(e) => {
                         error!("Unable to load correlation file : {e}");
                         continue;
                     }
                 };
                 corrs.insert(correlation.id.clone(), correlation);
             }
 
-            guard.insert(tenant_id, corrs);
+            guard.insert(tenant_key, corrs);
         }

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.


151-188: ⚠️ Potential issue | 🟠 Major

Update flow overwrites new data and validates the wrong config.

updated_correlation.update(correlation) copies the old correlation into the new one, effectively discarding user changes, and validation runs on the old payload. The flow should apply updates to the existing object, validate the updated payload, and persist that.

🛠️ Proposed fix
-        let correlation = self
+        let mut correlation = self
             .get_correlation(&updated_correlation.id, &tenant_id)
             .await?;
         if correlation.user_id != updated_correlation.user_id {
             return Err(CorrelationError::AnyhowError(anyhow::Error::msg(format!(
                 r#"User "{}" isn't authorized to update correlation with ID - {}"#,
                 updated_correlation.user_id, correlation.id
             ))));
         }
 
-        correlation.validate(session_key).await?;
-        updated_correlation.update(correlation);
+        correlation.update(updated_correlation);
+        correlation.validate(session_key).await?;
 
         // Update in metastore
         PARSEABLE
             .metastore
-            .put_correlation(&updated_correlation, &tenant_id)
+            .put_correlation(&correlation, &tenant_id)
             .await?;
 
         let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
         // Update in memory
         if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
+            corrs.insert(correlation.id.to_owned(), correlation.clone());
         }
 
-        Ok(updated_correlation)
+        Ok(correlation)
src/metastore/metastores/object_store_metastore.rs (2)

317-342: ⚠️ Potential issue | 🟠 Major

Default-tenant alert-state path uses the DEFAULT_TENANT directory.

When tenant_id is None, base_path is built with DEFAULT_TENANT, which will look under default/alerts even though default-tenant storage paths are intended to live at the root (""). This can make default-tenant alert states undiscoverable.

🔧 Suggested fix
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        let tenant = tenant_id.as_deref().unwrap_or("");
         let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment ("") so no extra tenant directory is created.


931-976: ⚠️ Potential issue | 🟠 Major

Fix dropped tenant prefix when listing manifest dates for non-default tenants.

For non-default tenants, the initial listing is scoped to {tenant}/{stream_name}, but list_with_delimiter returns full paths (e.g., tenant/stream/2024-01-01). Calling path.parts() on these extracts all segments: ["tenant", "stream", "2024-01-01"]. The filter removes stream_name but leaves tenant, so dates incorrectly contains the tenant name alongside actual dates. Subsequent date_path construction at line 953 uses only stream_name/{date}, producing paths like stream_name/tenant or stream_name/2024-01-01—both missing the tenant prefix. This causes manifest files for non-default tenants to be missed.

🔧 Suggested fix
-        let root = if let Some(tenant) = tenant_id {
-            format!("{tenant}/{stream_name}")
-        } else {
-            stream_name.into()
-        };
-        let resp = self.storage.list_with_delimiter(Some(root.into())).await?;
+        let base = if let Some(tenant) = tenant_id {
+            format!("{tenant}/{stream_name}")
+        } else {
+            stream_name.into()
+        };
+        let resp = self.storage.list_with_delimiter(Some(base.clone().into())).await?;
 
         let dates = resp
             .common_prefixes
             .iter()
-            .flat_map(|path| path.parts())
-            .filter(|name| name.as_ref() != stream_name && name.as_ref() != STREAM_ROOT_DIRECTORY)
-            .map(|name| name.as_ref().to_string())
+            .filter_map(|path| path.filename().map(|name| name.as_ref().to_string()))
+            .filter(|name| name.as_ref() != STREAM_ROOT_DIRECTORY)
             .collect::<Vec<_>>();
 
         for date in dates {
-            let date_path = object_store::path::Path::from(format!("{}/{}", stream_name, &date));
+            let date_path = object_store::path::Path::from(format!("{base}/{date}"));
             let resp = self.storage.list_with_delimiter(Some(date_path)).await?;
🤖 Fix all issues with AI agents
In `@src/connectors/kafka/processor.rs`:
- Around line 53-57: The current code pulls tenant_id from the first record and
reuses it for the whole chunk (the let tenant_id = ... using records.first()),
which can mix tenants; change processing so each Kafka record is handled with
its own tenant_id: in the function that iterates over records (use the existing
records iterator and the record variable r), extract r.tenant_id per-record and
pass that value into whatever downstream calls you currently give the single
tenant_id (or, alternatively, validate inside the loop that all r.tenant_id
values are identical and reject the chunk if not). Update any usages of the
previous tenant_id variable to use the per-record tenant value (e.g., replace
references to tenant_id with r.tenant_id or a local let tenant_id = r.tenant_id
inside the record-processing loop) to ensure tenant isolation.
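The validate-the-chunk alternative can be sketched independently of the Kafka types; each record here is a `(topic, tenant_id)` pair, which is a stand-in for the real record struct:

```rust
/// Accept a chunk only if every record carries the same tenant id,
/// rejecting mixed-tenant chunks instead of silently using the first.
fn chunk_tenant(records: &[(String, Option<String>)]) -> Result<Option<String>, String> {
    let mut tenant: Option<&Option<String>> = None;
    for (_, t) in records {
        match tenant {
            None => tenant = Some(t),
            Some(seen) if seen != t => {
                return Err("mixed tenants in one chunk".to_string());
            }
            _ => {}
        }
    }
    Ok(tenant.cloned().unwrap_or(None))
}

fn main() {
    let same = vec![
        ("logs".to_string(), Some("t1".to_string())),
        ("logs".to_string(), Some("t1".to_string())),
    ];
    assert_eq!(chunk_tenant(&same), Ok(Some("t1".to_string())));

    let mixed = vec![
        ("logs".to_string(), Some("t1".to_string())),
        ("logs".to_string(), Some("t2".to_string())),
    ];
    assert!(chunk_tenant(&mixed).is_err()); // mixed chunk rejected
}
```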

In `@src/parseable/mod.rs`:
- Around line 1099-1125: The delete_tenant implementation removes users/roles
and TENANT_METADATA but never updates the in-memory tenant registry nor calls
the persistent metastore cleanup; refactor pub fn delete_tenant into an async fn
delete_tenant(&self, tenant_id: &str) -> Result<(), anyhow::Error> so you can
call self.metastore.delete_tenant(tenant_id).await and also remove the tenant
from self.tenants (e.g., self.tenants.remove(tenant_id) or the appropriate map
accessor) so list_tenants() stops returning it; keep existing removals
(mut_users(), mut_roles(), TENANT_METADATA.delete_tenant, Users.delete_user) but
ensure ordering handles errors from metastore.delete_tenant and propagate
failures via Result.
- Around line 1128-1163: The load_tenants function unconditionally populates
self.tenants even when is_multi_tenant is false; modify load_tenants (the
is_multi_tenant check near the top) to only extend/write self.tenants when
is_multi_tenant is true (and keep the existing TENANT_METADATA insertion logic
gated as-is), and when not multi-tenant return Ok(None) (or preserve the
previous None/Some semantics) instead of extending the tenants list; ensure the
write lock on self.tenants is only acquired and used in the multi-tenant branch
so list_tenants remains None for single-tenant mode.
🧹 Nitpick comments (4)
src/storage/retention.rs (1)

111-126: Minor style improvement: use direct equality comparison.

The days.eq(&0) can be simplified to days == 0 for better readability.

♻️ Suggested simplification
 impl Task {
     pub fn new(description: String, mut days: u32) -> Self {
-        if days.eq(&0) {
+        if days == 0 {
             days = 7;
             tracing::warn!(
                 "Using default 7 days for retention since an invalid value of 0 was provided by the user"
             );
         }
         let days = NonZeroU32::new(days).unwrap();
src/connectors/kafka/processor.rs (1)

49-57: Consider consistent extraction style for stream_name and tenant_id.

The stream_name extraction uses a fluent pattern, while tenant_id uses a more verbose if-let-else. For consistency and conciseness:

♻️ Suggested simplification
         let stream_name = records
             .first()
             .map(|r| r.topic.as_str())
             .unwrap_or_default();
-        let tenant_id = if let Some(r) = records.first() {
-            &r.tenant_id
-        } else {
-            &None
-        };
+        let tenant_id = records.first().map(|r| &r.tenant_id).unwrap_or(&None);
src/rbac/user.rs (1)

506-512: Consider removing stale commented-out method or tracking it via a TODO/issue.

src/handlers/http/ingest.rs (1)

284-338: Consider deriving tenant_id once for OTEL ingestion.
You currently re-parse the tenant header inside process_otel_content; passing a precomputed tenant_id from the handler avoids duplicate parsing and keeps a single source of truth.

♻️ Optional refactor sketch
-pub async fn setup_otel_stream(
-    req: &HttpRequest,
+pub async fn setup_otel_stream(
+    req: &HttpRequest,
+    tenant_id: &Option<String>,
     expected_log_source: LogSource,
     known_fields: &[&str],
     telemetry_type: TelemetryType,
 ) -> Result<(String, LogSource, LogSourceEntry, Option<String>), PostError> {
@@
-    let tenant_id = get_tenant_id_from_request(req);
     PARSEABLE
         .create_stream_if_not_exists(
@@
-            &tenant_id,
+            tenant_id,
             None,
         )
         .await?;
@@
-    if let Ok(stream) = PARSEABLE.get_stream(&stream_name, &tenant_id) {
+    if let Ok(stream) = PARSEABLE.get_stream(&stream_name, tenant_id) {
@@
-    PARSEABLE
-        .add_update_log_source(&stream_name, log_source_entry.clone(), &tenant_id)
+    PARSEABLE
+        .add_update_log_source(&stream_name, log_source_entry.clone(), tenant_id)
         .await?;
-async fn process_otel_content(
-    req: &HttpRequest,
+async fn process_otel_content(
+    req: &HttpRequest,
+    tenant_id: &Option<String>,
     body: web::Bytes,
     stream_name: &str,
     log_source: &LogSource,
     telemetry_type: TelemetryType,
 ) -> Result<(), PostError> {
@@
-            let tenant_id = get_tenant_id_from_request(req);
             if content_type == CONTENT_TYPE_JSON {
@@
-                    &tenant_id,
+                    tenant_id,
                 )
                 .await
                 {
 pub async fn handle_otel_logs_ingestion(
     req: HttpRequest,
     body: web::Bytes,
 ) -> Result<HttpResponse, PostError> {
+    let tenant_id = get_tenant_id_from_request(&req);
     let (stream_name, log_source, ..) = setup_otel_stream(
         &req,
+        &tenant_id,
         LogSource::OtelLogs,
         &OTEL_LOG_KNOWN_FIELD_LIST,
         TelemetryType::Logs,
     )
@@
-    process_otel_content(&req, body, &stream_name, &log_source, TelemetryType::Logs).await?;
+    process_otel_content(&req, &tenant_id, body, &stream_name, &log_source, TelemetryType::Logs).await?;

(Apply the same pattern to metrics and traces handlers.)

Comment on lines +53 to +57
let tenant_id = if let Some(r) = records.first() {
&r.tenant_id
} else {
&None
};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed against parseablehq/parseable:

# Search for ConsumerRecord struct definition and tenant_id field
rg -n 'struct ConsumerRecord|pub tenant_id' --type rust src/connectors/kafka/

# Look at the full context of the kafka processor file to understand the data flow
head -100 src/connectors/kafka/processor.rs | cat -n

# Search for where tenant_id is used in the processing flow
rg -n 'tenant_id' src/connectors/kafka/processor.rs -A 2 -B 2

# Find how records are created/consumed and if there's any tenant-aware partitioning
rg -n 'ConsumerRecord|consume|partition' src/connectors/kafka/ --type rust -A 1 | head -100

# Find the full ConsumerRecord struct and from_borrowed_msg implementation
sed -n '141,170p' src/connectors/kafka/mod.rs | cat -n

# Search for where tenant_id is set on ConsumerRecord
rg -n 'tenant_id.*=' src/connectors/kafka/ --type rust -B 3 -A 3

# Find where records are produced into the chunk stream and if there's any tenant-based filtering
rg -n 'ReceiverStream|process_partition|send.*ConsumerRecord' src/connectors/kafka/ --type rust -B 2 -A 2

# Check the partition_stream or consumer implementation to see if records are partitioned by tenant
rg -n 'partition_stream|fn.*consume|next.*message' src/connectors/kafka/ --type rust -A 3 | head -80


Process each record with its own tenant_id instead of assuming all records in a chunk share the same tenant.

Records are partitioned by Kafka topic/partition, not by tenant. Since tenant_id comes from individual message headers, a single chunk can contain messages with different tenant IDs. The current code at lines 53-57 extracts tenant_id only from the first record and applies it to all subsequent records, which causes cross-tenant data leakage if the chunk contains records from multiple tenants.

Iterate through records and either (1) process each record individually with its own tenant_id, (2) validate that all records in the chunk have the same tenant_id and reject mixed chunks, or (3) enforce tenant isolation at the producer level to guarantee records from different tenants never land in the same partition.

🤖 Prompt for AI Agents
In `@src/connectors/kafka/processor.rs` around lines 53 - 57, The current code
pulls tenant_id from the first record and reuses it for the whole chunk (the let
tenant_id = ... using records.first()), which can mix tenants; change processing
so each Kafka record is handled with its own tenant_id: in the function that
iterates over records (use the existing records iterator and the record variable
r), extract r.tenant_id per-record and pass that value into whatever downstream
calls you currently give the single tenant_id (or, alternatively, validate
inside the loop that all r.tenant_id values are identical and reject the chunk
if not). Update any usages of the previous tenant_id variable to use the
per-record tenant value (e.g., replace references to tenant_id with r.tenant_id
or a local let tenant_id = r.tenant_id inside the record-processing loop) to
ensure tenant isolation.

@parmesant parmesant force-pushed the multi-tenancy-review branch from ccb78d3 to 8f2b711 Compare February 13, 2026 09:48
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/metastore/metastores/object_store_metastore.rs (2)

317-341: ⚠️ Potential issue | 🟠 Major

Use an empty path segment for the default tenant.
tenant_id == None currently resolves to DEFAULT_TENANT, which makes the code look under a default/alerts directory that won’t exist in single-tenant mode. Use "" for path construction and keep tenant_id for storage calls.

🔧 Suggested fix
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        let tenant = tenant_id.as_deref().unwrap_or("");
         let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);
Based on learnings, default-tenant paths are intentionally built with an empty string segment so no extra tenant directory is created.

344-415: ⚠️ Potential issue | 🟠 Major

Alert state files are not tenant-scoped.
alert_state_json_path doesn’t include tenant context, so alert state files can collide across tenants and won’t be found by get_alert_states (which searches tenant-scoped directories).

🔧 Suggested fix
-        let path = alert_state_json_path(*alert_id);
+        let tenant = tenant_id.as_deref().unwrap_or("");
+        let path = RelativePathBuf::from_iter([
+            tenant,
+            ALERTS_ROOT_DIRECTORY,
+            format!("alert_state_{alert_id}.json"),
+        ]);
-        let path = alert_state_json_path(id);
+        let tenant = tenant_id.as_deref().unwrap_or("");
+        let path = RelativePathBuf::from_iter([
+            tenant,
+            ALERTS_ROOT_DIRECTORY,
+            format!("alert_state_{id}.json"),
+        ]);
🤖 Fix all issues with AI agents
In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 75-88: Update the doc comment on the get_objects async method to
fix the spelling mistake: change "Fetch mutiple .json objects" to "Fetch
multiple .json objects" in the comment above fn get_objects (the method that
calls self.storage.get_objects with the .json filter) so the docstring reads
correctly.
- Around line 654-760: In get_filters the migrated v1→v2 writes use
filter_path(...) which builds a RelativePath lacking the tenant prefix, so
migrated filters land in the shared/default tenant; update the write to include
tenant context (same way reads use users_dir) by prefixing the constructed
filter_path with the tenant (e.g. RelativePathBuf::from_iter([&tenant,
<filter_path>]) before calling storage.put_object) or change filter_path to
accept a tenant parameter; modify the code paths around get_filters,
filter_path, and the put_object call to ensure the RelativePath passed to
put_object contains the tenant prefix.

In `@src/parseable/mod.rs`:
- Around line 1040-1059: The add_tenant function has a TOCTOU race: the read
lock on self.tenants is released before acquiring the write lock, allowing
another thread to insert the same tenant; change the logic to acquire a write
lock (e.g., tenants.write()) once and perform the contains check and push under
that same lock so the check-and-insert is atomic, then call
TENANT_METADATA.insert_tenant while still holding (or immediately after holding)
the write lock to ensure consistency; keep the initial options.is_multi_tenant()
early-return intact and reference add_tenant, self.tenants, and TENANT_METADATA
when locating the code to modify.
🧹 Nitpick comments (8)
src/storage/retention.rs (1)

112-118: Minor style: prefer == over .eq(&0) for integer comparison.

Using days == 0 is more idiomatic Rust than days.eq(&0).

♻️ Suggested simplification
-        if days.eq(&0) {
+        if days == 0 {
             days = 7;
             tracing::warn!(
                 "Using default 7 days for retention since an invalid value of 0 was provided by the user"
             );
         }
src/metastore/metastores/object_store_metastore.rs (1)

1241-1309: Remove unnecessary tenant_id parameter or document its purpose.

put_node_metadata hardcodes tenant_id = None, but get_node_metadata accepts a tenant_id parameter passed by callers. Since the storage layer does not use tenant_id for path construction (it exists for trait compatibility only), this parameter creates inconsistency without adding value. Either remove the parameter and always pass None, or document why callers should pass specific tenant values.

src/tenants/mod.rs (1)

125-172: Consider removing the commented‑out legacy block.

It adds noise and can confuse the active serialization model.

🧹 Suggested cleanup
-// // Type for serialization and deserialization
-// #[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
-// pub struct TenantMetadata {
-//     pub tenant_id: String,
-//     pub version: String,
-//     pub mode: String,
-//     pub staging: PathBuf,
-//     pub storage: String,
-//     #[serde(default = "crate::utils::uid::generate_ulid")]
-//     pub deployment_id: uid::Uid,
-//     pub users: Vec<User>,
-//     pub user_groups: Vec<UserGroup>,
-//     pub streams: Vec<String>,
-//     pub server_mode: Mode,
-//     #[serde(default)]
-//     pub roles: HashMap<String, Vec<DefaultPrivilege>>,
-//     #[serde(default)]
-//     pub default_role: Option<String>,
-// }
-
-// impl TenantMetadata {
-//     pub fn from_storage_meta(meta: StorageMetadata, id: &str) -> Self {
-//         Self {
-//             tenant_id: id.to_owned(),
-//             version: meta.version,
-//             mode: meta.mode,
-//             staging: meta.staging,
-//             storage: meta.storage,
-//             deployment_id: meta.deployment_id,
-//             users: meta.users,
-//             user_groups: meta.user_groups,
-//             streams: meta.streams,
-//             server_mode: meta.server_mode,
-//             roles: meta.roles,
-//             default_role: meta.default_role,
-//         }
-//     }
-// }
-
-// impl MetastoreObject for TenantMetadata {
-//     fn get_object_path(&self) -> String {
-//         format!("{}/.parseable.json", &self.tenant_id)
-//     }
-
-//     fn get_object_id(&self) -> String {
-//         self.tenant_id.clone()
-//     }
-// }
src/parseable/mod.rs (5)

180-182: Remove commented-out code.

The commented-out tenant_storage field should be removed to keep the codebase clean. If this is planned for future implementation, consider tracking it in an issue instead.

Proposed fix
 pub struct Parseable {
     /// Configuration variables for parseable
     pub options: Arc<Options>,
     /// Storage engine backing parseable
     pub storage: Arc<dyn ObjectStorageProvider>,
-    // /// ObjectStorageProvider for each tenant
-    // pub tenant_storage: Arc<DashMap<String, Arc<dyn ObjectStorageProvider>>>,
-    /// Metadata and staging relating to each logstreams
+    /// Metadata and staging relating to each logstreams

1061-1097: Consider persisting to storage before updating in-memory state.

Both suspend_tenant_service and resume_tenant_service update in-memory state first (lines 1066 and 1086), then persist to storage. If put_remote_metadata fails, the in-memory state will be inconsistent with storage. Consider reversing the order: persist first, then update in-memory on success.

Proposed pattern for suspend_tenant_service
     pub async fn suspend_tenant_service(
         &self,
         tenant_id: String,
         service: Service,
     ) -> Result<(), anyhow::Error> {
-        TENANT_METADATA.suspend_service(&tenant_id, service.clone());
-
         // write to disk
         let tenant_id = &Some(tenant_id);
         let mut meta = get_metadata(tenant_id).await?;
         if let Some(sus) = meta.suspended_services.as_mut() {
             sus.insert(service);
         } else {
             meta.suspended_services = Some(HashSet::from_iter([service]));
         }
 
         put_remote_metadata(&meta, tenant_id).await?;
+
+        // Update in-memory state only after successful persistence
+        TENANT_METADATA.suspend_service(tenant_id.as_ref().unwrap(), service);
         Ok(())
     }

1100-1119: Remove commented-out code blocks.

Lines 1100, 1105-1107, and 1115-1118 contain commented-out code that should be removed. If this logic is needed for future implementation, track it in an issue instead.

Proposed cleanup
     pub fn delete_tenant(&self, tenant_id: &str) -> Result<(), anyhow::Error> {
-        // let mut metadata = get_metadata(&Some(tenant_id.to_owned())).await?;
         // delete users and sessions
         let users = mut_users().remove(tenant_id);
         if let Some(users) = users {
             for (userid, user) in users {
-                // metadata
-                //     .users
-                //     .retain(|u| u.tenant.eq(&Some(tenant_id.to_owned())));
-
                 Users.delete_user(&userid, &user.tenant);
             }
         }
 
         // delete roles
         mut_roles().remove(tenant_id);
-        // if let Some(roles) = mut_roles().remove(tenant_id) {
-        //     for (role, _) in roles {
-        //         // metadata.roles.retain(|r, _| !role.eq(r));
-        //     }
-        // }
 
         // delete resources

1166-1175: Consider logging lock acquisition failures.

When the read lock cannot be acquired (the else branch at line 1172), the function silently returns None, which could mask lock contention issues. Consider logging this condition for debugging purposes.

Proposed improvement
     pub fn list_tenants(&self) -> Option<Vec<String>> {
-        if let Ok(t) = self.tenants.as_ref().read()
-            && !t.is_empty()
-        {
-            let t = t.clone();
-            Some(t)
-        } else {
-            None
+        match self.tenants.as_ref().read() {
+            Ok(t) if !t.is_empty() => Some(t.clone()),
+            Ok(_) => None,
+            Err(e) => {
+                tracing::warn!("Failed to acquire read lock for tenants: {e}");
+                None
+            }
         }
     }

570-572: Consider replacing .expect() with proper error handling.

Multiple methods (lines 570-572, 853-855, 908-910, 1032-1034) use .expect(STREAM_EXISTS) which will panic if the stream is not found. While these paths assume the stream exists, race conditions could cause unexpected panics. Consider returning appropriate errors instead.

Example fix for add_update_log_source
-        let stream = self
-            .get_stream(stream_name, tenant_id)
-            .expect(STREAM_EXISTS);
+        let stream = self.get_stream(stream_name, tenant_id)?;

This requires changing the return type or mapping StreamNotFound to StreamError.

@parmesant parmesant force-pushed the multi-tenancy-review branch 2 times, most recently from f456045 to e8a40fd Compare February 13, 2026 10:03
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/metastore/metastores/object_store_metastore.rs (1)

316-342: ⚠️ Potential issue | 🟠 Major

Default-tenant alert state path should not use DEFAULT_TENANT.

When tenant_id is None, using DEFAULT_TENANT yields a default/alerts prefix, which won’t match default-tenant storage paths that use the empty segment. Use "" for the default tenant prefix instead.

🔧 Suggested fix
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        let tenant = tenant_id.as_deref().unwrap_or("");
         let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.

🤖 Fix all issues with AI agents
In `@src/rbac/user.rs`:
- Around line 370-382: The current control flow binds the else to if
!self.users.is_empty(), causing non_existent_users.extend(...) to run only when
self.users is empty; instead, when users().get(tenant) returns None we should
mark all requested users as non-existent. Fix by removing the outer else and
moving the extend(...) into the else branch of the if let Some(users) =
users().get(tenant) { ... } else {
non_existent_users.extend(self.users.iter().map(|u| u.userid().to_string())); },
keeping the initial check for self.users.is_empty() and the loop that checks
users.contains_key(group_user.userid()) intact (references: non_existent_users,
self.users, users(), tenant, userid()).
- Around line 408-418: The add_roles method inconsistently uses the external
tenant_id when calling mut_sessions().remove_user, which differs from
add_users/remove_roles/remove_users that use each group_user.tenant_id; update
add_roles (and its call site if needed) so after extending self.roles you
iterate group_user in &self.users and call
mut_sessions().remove_user(group_user.userid(), group_user.tenant_id()) (i.e.,
use the per-user tenant from group_user) to avoid invalidating sessions for the
wrong tenant while keeping the early-return for empty roles.
🧹 Nitpick comments (8)
src/storage/retention.rs (1)

111-126: Consider adding upper-bound validation for consistency.

TryFrom<Vec<TaskView>> validates that days <= 36500, but Task::new only validates days != 0. If Task::new is called directly with a large value, it bypasses the upper-bound check.

♻️ Suggested improvement for consistent validation
 impl Task {
     pub fn new(description: String, mut days: u32) -> Self {
         if days.eq(&0) {
             days = 7;
             tracing::warn!(
                 "Using default 7 days for retention since an invalid value of 0 was provided by the user"
             );
         }
+        if days > 36500 {
+            days = 36500;
+            tracing::warn!(
+                "Capping retention to 36500 days (approx 100 years) since an invalid value was provided"
+            );
+        }
         let days = NonZeroU32::new(days).unwrap();

Alternatively, if Task::new is only called internally with pre-validated values, this may be acceptable as-is.

src/rbac/user.rs (2)

161-165: Consider removing the commented-out code.

The salt generation change to explicit 32-byte fill is fine, but the commented-out line (// let salt = SaltString::generate(&mut OsRng);) should be removed to avoid code clutter.

🧹 Suggested cleanup
     let mut bytes = [0u8; 32];
     let r = &mut OsRng;
     r.fill_bytes(&mut bytes);
     let salt = SaltString::encode_b64(&bytes).unwrap();
-    // let salt = SaltString::generate(&mut OsRng);
     let argon2 = Argon2::default();

506-512: Remove commented-out code.

The update_in_metadata method is fully commented out. If it's no longer needed with the new tenant-aware architecture, remove it to avoid confusion.

🧹 Suggested cleanup
     pub fn remove_users_by_user_ids(&mut self, user_ids: HashSet<String>) -> Result<(), RBACError> {
         ...
         self.remove_users(users_to_remove)
     }
-
-    // pub async fn update_in_metadata(&self, tenant_id: &Option<String>) -> Result<(), RBACError> {
-    //     let mut metadata = get_metadata(tenant_id).await?;
-    //     metadata.user_groups.retain(|x| x.name != self.name);
-    //     metadata.user_groups.push(self.clone());
-    //     put_metadata(&metadata).await?;
-    //     Ok(())
-    // }
 }
src/parseable/mod.rs (5)

180-181: Consider removing commented-out code.

The commented tenant_storage field can be removed if it's no longer needed. Keeping dead code can lead to confusion.

♻️ Proposed removal
 pub struct Parseable {
     /// Configuration variables for parseable
     pub options: Arc<Options>,
     /// Storage engine backing parseable
     pub storage: Arc<dyn ObjectStorageProvider>,
-    // /// ObjectStorageProvider for each tenant
-    // pub tenant_storage: Arc<DashMap<String, Arc<dyn ObjectStorageProvider>>>,
     /// Metadata and staging relating to each logstreams

570-572: Consider returning errors instead of panicking on missing streams.

Multiple methods use .expect(STREAM_EXISTS) (here, and at lines 855, 910, 1034) which will panic if the stream doesn't exist. While the callers may validate stream existence beforehand, returning a proper StreamError would be more defensive and prevent potential runtime panics if invariants are violated.

♻️ Example refactor for this method
     pub async fn add_update_log_source(
         &self,
         stream_name: &str,
         log_source: LogSourceEntry,
         tenant_id: &Option<String>,
     ) -> Result<(), StreamError> {
-        let stream = self
-            .get_stream(stream_name, tenant_id)
-            .expect(STREAM_EXISTS);
+        let stream = self
+            .get_stream(stream_name, tenant_id)
+            .map_err(|_| StreamNotFound(stream_name.to_owned()))?;

1099-1129: Consider removing commented-out code in delete_tenant.

The implementation correctly handles in-memory cleanup as per the established contract (caller performs persistent cleanup via Metastore::delete_tenant). However, the commented-out code at lines 1105-1108 and 1115-1119 should be removed.

♻️ Proposed cleanup
     pub fn delete_tenant(&self, tenant_id: &str) -> Result<(), anyhow::Error> {
-        // let mut metadata = get_metadata(&Some(tenant_id.to_owned())).await?;
         // delete users and sessions
         let users = mut_users().remove(tenant_id);
         if let Some(users) = users {
             for (userid, user) in users {
-                // metadata
-                //     .users
-                //     .retain(|u| u.tenant.eq(&Some(tenant_id.to_owned())));
-
                 Users.delete_user(&userid, &user.tenant);
             }
         }

         // delete roles
         mut_roles().remove(tenant_id);
-        // if let Some(roles) = mut_roles().remove(tenant_id) {
-        //     for (role, _) in roles {
-        //         // metadata.roles.retain(|r, _| !role.eq(r));
-        //     }
-        // }

         // delete resources

Based on learnings: In src/parseable/mod.rs, enforce the separation of concerns for tenant deletion: Metastore::delete_tenant(tenant_id) must perform the persistent cleanup, while Parseable::delete_tenant(&self, tenant_id: &str) should only perform in-memory cleanup.


1143-1158: Conditional structure is correct but could be clearer.

The logic correctly handles the three cases (tenant with metadata in multi-tenant mode, non-multi-tenant mode, invalid tenant directory). However, the empty else if !is_multi_tenant {} branch (line 1153) is confusing. Consider restructuring for clarity.

♻️ Suggested refactor for clarity
         for tenant_id in dirs.iter() {
-            if let Some(meta) = PARSEABLE
+            let meta = PARSEABLE
                 .metastore
                 .get_parseable_metadata(&Some(tenant_id.clone()))
-                .await?
-                && is_multi_tenant
-            {
-                let metadata: StorageMetadata = serde_json::from_slice(&meta)?;
+                .await?;
 
-                TENANT_METADATA.insert_tenant(tenant_id.clone(), metadata.clone());
-            } else if !is_multi_tenant {
-            } else {
-                return Err(anyhow::Error::msg(format!(
-                    "Found invalid tenant directory with multi-tenant mode- {tenant_id}.\nExiting."
-                )));
+            match (meta, is_multi_tenant) {
+                (Some(meta), true) => {
+                    let metadata: StorageMetadata = serde_json::from_slice(&meta)?;
+                    TENANT_METADATA.insert_tenant(tenant_id.clone(), metadata.clone());
+                }
+                (None, true) => {
+                    return Err(anyhow::Error::msg(format!(
+                        "Found invalid tenant directory with multi-tenant mode- {tenant_id}.\nExiting."
+                    )));
+                }
+                (_, false) => {
+                    // Non-multi-tenant mode: skip tenant directory processing
+                }
             }
         }

290-302: Consider including tenant ID in error log for easier debugging.

When a tenant fails stream listing, the log at line 294 includes the error but not which tenant failed. Adding the tenant ID would aid debugging in multi-tenant deployments.

♻️ Enhanced logging
             for tenant in tenants {
                 if let Err(e) = PARSEABLE.metastore.list_streams(&Some(tenant)).await {
-                    tracing::error!("{e}");
+                    tracing::error!("Failed to list streams for tenant {tenant}: {e}");
                     has_stream = false;
                     break;
                 };

@parmesant parmesant force-pushed the multi-tenancy-review branch 2 times, most recently from 58b2c1b to b039be5 Compare February 13, 2026 10:20
@parmesant

@coderabbitai resume

coderabbitai bot commented Feb 13, 2026

✅ Actions performed

Reviews resumed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/metastore/metastores/object_store_metastore.rs (1)

344-415: ⚠️ Potential issue | 🔴 Critical

Alert state path is missing tenant context, causing multi-tenant data misplacement.

get_alert_states (lines 321-322) correctly constructs a tenant-prefixed path: [tenant, ALERTS_ROOT_DIRECTORY]. However, both get_alert_state_entry (line 349) and put_alert_state (line 373) call alert_state_json_path(id) which does not include the tenant prefix.

This inconsistency means:

  • Reading all alert states scans {tenant}/.alerts/
  • Reading/writing individual alert state uses .alerts/alert_state_{id}.json

In multi-tenant deployments, alert states will be written to and read from the wrong (shared) location.

Update alert_state_json_path to accept tenant_id (similar to alert_json_path) and update call sites:

// In src/storage/object_storage.rs
-pub fn alert_state_json_path(alert_id: Ulid) -> RelativePathBuf {
+pub fn alert_state_json_path(alert_id: Ulid, tenant_id: &Option<String>) -> RelativePathBuf {
+    if let Some(tenant) = tenant_id.as_ref() {
+        RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY, &format!("alert_state_{alert_id}.json")])
+    } else {
         RelativePathBuf::from_iter([ALERTS_ROOT_DIRECTORY, &format!("alert_state_{alert_id}.json")])
+    }
 }
// In get_alert_state_entry
-let path = alert_state_json_path(*alert_id);
+let path = alert_state_json_path(*alert_id, tenant_id);

// In put_alert_state  
-let path = alert_state_json_path(id);
+let path = alert_state_json_path(id, tenant_id);
src/correlation.rs (1)

157-185: ⚠️ Potential issue | 🟠 Major

Update path currently discards incoming changes.

correlation.validate(session_key) validates the old config, and updated_correlation.update(correlation) overwrites the new values with the old ones. This makes updates a no-op and skips validation for new tables. Validate the incoming config and apply it to the stored correlation before persisting.

🛠️ Proposed fix
-        correlation.validate(session_key).await?;
-        updated_correlation.update(correlation);
+        updated_correlation.validate(session_key).await?;
+        let mut correlation = correlation;
+        correlation.update(updated_correlation);

         // Update in metastore
         PARSEABLE
             .metastore
-            .put_correlation(&updated_correlation, &tenant_id)
+            .put_correlation(&correlation, &tenant_id)
             .await?;

         let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
         // Update in memory
         if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
+            corrs.insert(correlation.id.to_owned(), correlation.clone());
         }

-        Ok(updated_correlation)
+        Ok(correlation)
src/handlers/http/middleware.rs (1)

242-259: ⚠️ Potential issue | 🟠 Major

Re-check authorization after token refresh.

Line 242 computes auth_result synchronously before the async block. For non-BasicAuth sessions (e.g., OIDC-issued SessionId), the authorize() function returns Response::ReloadRequired if the session is not found in active_sessions or if no permissions are cached (see rbac/mod.rs:232). Later, when refresh_token() is called at line 258, it creates/updates the session in active_sessions, but the already-computed auth_result remains unchanged. The middleware then matches against this stale result at line 261, causing the request to fail with "session expired" even though the refresh just succeeded.

Solution: Either call the refresh before invoking authorize(), or re-run authorization after a successful refresh to ensure the decision reflects the refreshed session state.

src/rbac/user.rs (1)

334-403: ⚠️ Potential issue | 🟠 Major

Enforce tenant-scoped UserGroup membership.
validate doesn’t reject users whose tenant_id differs from the group’s tenant, which breaks tenant isolation. Please guard against cross-tenant membership here.

🛠️ Possible fix
 pub fn validate(&self, tenant_id: &Option<String>) -> Result<(), RBACError> {
     let valid_name = is_valid_group_name(&self.name);
     let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+    let mut cross_tenant_users = Vec::new();
+    for group_user in &self.users {
+        let user_tenant = group_user.tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        if user_tenant != tenant {
+            cross_tenant_users.push(group_user.userid().to_string());
+        }
+    }
     if let Some(tenant_ug) = read_user_groups().get(tenant)
         && tenant_ug.contains_key(&self.name)
     {
         return Err(RBACError::UserGroupExists(self.name.clone()));
     }
@@
         if !self.users.is_empty() {
             // validate that the users exist
             if let Some(users) = users().get(tenant) {
                 for group_user in &self.users {
                     if !users.contains_key(group_user.userid()) {
                         non_existent_users.push(group_user.userid().to_string());
                     }
                 }
             } else {
                 non_existent_users.extend(self.users.iter().map(|u| u.userid().to_string()));
             }
         }
+        if !cross_tenant_users.is_empty() {
+            non_existent_users.extend(cross_tenant_users);
+        }
Based on learnings: RBAC UserGroup is tenant-scoped in parseablehq/parseable (PR `#1545`): a group cannot contain users from different tenants. Avoid cross-tenant membership assumptions in group APIs.
src/storage/object_storage.rs (1)

1120-1138: ⚠️ Potential issue | 🟡 Minor

Handle empty tenant_id as default in stream_relative_path.
If tenant_id is Some(""), the current format produces a leading slash. Treat empty tenant IDs like the default tenant to keep paths consistent.

🛠️ Suggested fix
-    if let Some(tenant) = tenant_id
-        && !tenant.eq(DEFAULT_TENANT)
-    {
-        format!("{tenant}/{stream_name}/{file_suffix}")
-    } else {
-        format!("{stream_name}/{file_suffix}")
-    }
+    if let Some(tenant) = tenant_id.as_deref() {
+        if tenant.is_empty() || tenant == DEFAULT_TENANT {
+            format!("{stream_name}/{file_suffix}")
+        } else {
+            format!("{tenant}/{stream_name}/{file_suffix}")
+        }
+    } else {
+        format!("{stream_name}/{file_suffix}")
+    }
Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created.
🤖 Fix all issues with AI agents
In `@src/metastore/metastores/object_store_metastore.rs`:
- Around lines 713-727: The three fragile .as_object().unwrap() calls on
filter_value after migrate_v1_v2 can panic if the migration returns
non-object JSON. Before extracting user_id, filter_id, and stream_name,
verify the value with filter_value.as_object() (e.g., if let Some(obj) =
filter_value.as_object(), or a match) and only then call
.get(...).and_then(...). If the value is not an object, handle it
gracefully: log an error via the existing logger, skip the entry, or return
a controlled Err. That way the logic that calls migrate_v1_v2 and then
reads user_id/filter_id/stream_name no longer uses unwrap and cannot panic
on malformed migrated data.

In `@src/parseable/mod.rs`:
- Around lines 324-333: The docstring for validate_multi_tenancy is
misleading: it says the function should fail when multi-tenant traces exist
but the flag is off, yet the implementation simply delegates to
load_tenants(), which currently no-ops in that case. Either update the
comment to state that validate_multi_tenancy delegates to load_tenants()
and that load_tenants decides whether an error is returned, or enforce the
documented behavior: have validate_multi_tenancy call load_tenants(),
inspect the tenant traces and the multi-tenant start flag, and return an
Err when traces exist but the flag is not set.
🧹 Nitpick comments (3)
src/handlers/http/middleware.rs (1)

168-210: Tenant header normalization looks solid; update the inline comment to reflect header removal.

The non‑ingest branch now clears any caller‑supplied tenant header when derivation fails, which blocks spoofing. Line 196’s comment still reads as if we only add the header; consider updating it to mention the removal path too.

✏️ Suggested comment tweak
-                    // tenant header should not be present, modify request to add
+                    // derive tenant from session; set header when available, otherwise remove any caller-supplied header

Based on learnings: In src/handlers/http/middleware.rs, when running in multi-tenant mode, remove any incoming "tenant" header if the tenant cannot be derived from the authenticated session for non-ingest requests (excluding Login/SuperAdmin); and pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.

src/tenants/mod.rs (1)

125-172: Optional: remove the commented-out legacy TenantMetadata block.

🧹 Suggested cleanup
-// // Type for serialization and deserialization
-// #[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
-// pub struct TenantMetadata {
-//     pub tenant_id: String,
-//     pub version: String,
-//     pub mode: String,
-//     pub staging: PathBuf,
-//     pub storage: String,
-//     #[serde(default = "crate::utils::uid::generate_ulid")]
-//     pub deployment_id: uid::Uid,
-//     pub users: Vec<User>,
-//     pub user_groups: Vec<UserGroup>,
-//     pub streams: Vec<String>,
-//     pub server_mode: Mode,
-//     #[serde(default)]
-//     pub roles: HashMap<String, Vec<DefaultPrivilege>>,
-//     #[serde(default)]
-//     pub default_role: Option<String>,
-// }
-//
-// impl TenantMetadata {
-//     pub fn from_storage_meta(meta: StorageMetadata, id: &str) -> Self {
-//         Self {
-//             tenant_id: id.to_owned(),
-//             version: meta.version,
-//             mode: meta.mode,
-//             staging: meta.staging,
-//             storage: meta.storage,
-//             deployment_id: meta.deployment_id,
-//             users: meta.users,
-//             user_groups: meta.user_groups,
-//             streams: meta.streams,
-//             server_mode: meta.server_mode,
-//             roles: meta.roles,
-//             default_role: meta.default_role,
-//         }
-//     }
-// }
-//
-// impl MetastoreObject for TenantMetadata {
-//     fn get_object_path(&self) -> String {
-//         format!("{}/.parseable.json", &self.tenant_id)
-//     }
-//
-//     fn get_object_id(&self) -> String {
-//         self.tenant_id.clone()
-//     }
-// }
src/rbac/user.rs (1)

494-500: Optional: drop the commented-out update_in_metadata block.

🧹 Suggested cleanup
-    // pub async fn update_in_metadata(&self, tenant_id: &Option<String>) -> Result<(), RBACError> {
-    //     let mut metadata = get_metadata(tenant_id).await?;
-    //     metadata.user_groups.retain(|x| x.name != self.name);
-    //     metadata.user_groups.push(self.clone());
-    //     put_metadata(&metadata).await?;
-    //     Ok(())
-    // }

@parmesant parmesant force-pushed the multi-tenancy-review branch from b039be5 to 3186ae1 Compare February 13, 2026 10:47
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/storage/object_storage.rs (1)

1120-1138: ⚠️ Potential issue | 🟠 Major

Handle empty tenant_id to avoid leading-slash paths.

When tenant_id is Some(""), format!("{tenant}/...") yields a leading “/” and can create invalid relative paths for default-tenant uploads. Treat empty tenant as no prefix.

🛠️ Proposed fix
-    if let Some(tenant) = tenant_id
-        && !tenant.eq(DEFAULT_TENANT)
-    {
-        format!("{tenant}/{stream_name}/{file_suffix}")
-    } else {
-        format!("{stream_name}/{file_suffix}")
-    }
+    if let Some(tenant) = tenant_id.as_deref()
+        && !tenant.is_empty()
+        && !tenant.eq(DEFAULT_TENANT)
+    {
+        format!("{tenant}/{stream_name}/{file_suffix}")
+    } else {
+        format!("{stream_name}/{file_suffix}")
+    }

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.

src/parseable/mod.rs (1)

267-306: ⚠️ Potential issue | 🟠 Major

Call validate_multi_tenancy() before load_metadata() to populate the tenants registry in multi-tenant deployments.

In multi-tenant mode, validate_storage() relies on list_tenants() to iterate over tenant directories. However, validate_multi_tenancy() (which populates the tenants list via load_tenants()) is never invoked during startup, leaving the tenants registry empty. This causes list_tenants() to return None, making multi-tenant data appear as single-tenant data during initialization, leading to incorrect startup validation.

🤖 Fix all issues with AI agents
In `@src/parseable/mod.rs`:
- Around lines 1100-1129: delete_tenant currently removes users and roles
but leaves tenant-scoped UserGroup entries behind. Extend the cleanup to
remove the tenant's user groups as well: add or use a mut_user_groups()
accessor (mirroring mut_users()/mut_roles()) and call
mut_user_groups().remove(tenant_id), or call
write_user_groups().0.remove(tenant_id) before returning from
delete_tenant, so UserGroup entries are cleared when a tenant is deleted.

In `@src/rbac/user.rs`:
- Around lines 349-382: The validate() method only ensures that users exist
somewhere; it does not prevent adding users from other tenants. Update
UserGroup::validate to enforce tenant-scoped membership: check each group
user against the tenant-specific users map (use the local tenant variable
and users().get(tenant) as the primary lookup), and if a user is found in a
different tenant, treat that as invalid. Collect the cross-tenant user IDs
and return a validation error (e.g., an RBACError variant such as
UserGroupContainsForeignUser or UserNotInTenant) instead of allowing the
group to be created. Refer to validate(), users(), read_user_groups(),
self.users, and group_user.userid() to locate where to add the check.
🧹 Nitpick comments (4)
src/correlation.rs (1)

157-185: Optional: make cache update resilient with entry(...).or_default().

If the tenant bucket is missing between the read and write (e.g., tenant cleanup), the in-memory update is skipped. Using entry keeps the cache consistent in edge cases.

♻️ Suggested tweak
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
-        }
+        let mut guard = self.write().await;
+        let corrs = guard.entry(tenant.to_string()).or_default();
+        corrs.insert(
+            updated_correlation.id.to_owned(),
+            updated_correlation.clone(),
+        );
src/storage/retention.rs (3)

48-75: Include tenant_id in retention-load warnings

In multi-tenant mode, identical stream names across tenants can make this warning ambiguous.

💡 Suggested log context
-                    Err(err) => {
-                        warn!("failed to load retention config for {stream_name} due to {err:?}")
-                    }
+                    Err(err) => {
+                        warn!(
+                            "failed to load retention config for tenant={tenant_id:?} stream={stream_name} due to {err:?}"
+                        )
+                    }

111-126: Keep Task::new validation aligned with the 36500‑day cap

TryFrom<Vec<TaskView>> rejects durations > 36500 days, but Task::new allows any value. Consider enforcing the same bound (or returning a Result) to avoid creating invalid tasks programmatically.


211-245: Add tenant context to retention delete logs

The delete path now runs per tenant, but log lines don’t include tenant_id, making troubleshooting noisy in multi-tenant deployments.

💡 Suggested log context
-        info!("running retention task - delete for stream={stream_name}");
+        info!(
+            "running retention task - delete for tenant={tenant_id:?} stream={stream_name}"
+        );
 ...
-                error!(
-                    "Failed to update snapshot for retention cleanup (stream={}): {}. Aborting delete.",
-                    stream_name, err
-                );
+                error!(
+                    "Failed to update snapshot for retention cleanup (tenant={tenant_id:?} stream={}): {}. Aborting delete.",
+                    stream_name, err
+                );
 ...
-                    error!("Failed to run delete task {err:?}");
+                    error!(
+                        "Failed to run delete task for tenant={tenant_id:?} stream={stream_name}: {err:?}"
+                    );

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/storage/object_storage.rs (1)

1119-1138: ⚠️ Potential issue | 🟠 Major

Handle empty tenant IDs in stream paths (default-tenant).

If tenant_id is Some(""), the current formatting yields a leading “/” (e.g., "/stream/..."), which can misplace default-tenant uploads. Treat empty strings like None for path construction.

🔧 Suggested fix
-    if let Some(tenant) = tenant_id
-        && !tenant.eq(DEFAULT_TENANT)
-    {
-        format!("{tenant}/{stream_name}/{file_suffix}")
-    } else {
-        format!("{stream_name}/{file_suffix}")
-    }
+    if let Some(tenant) = tenant_id
+        .as_deref()
+        .filter(|t| !t.is_empty() && *t != DEFAULT_TENANT)
+    {
+        format!("{tenant}/{stream_name}/{file_suffix}")
+    } else {
+        format!("{stream_name}/{file_suffix}")
+    }

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.

src/metastore/metastores/object_store_metastore.rs (1)

316-342: ⚠️ Potential issue | 🟠 Major

Default-tenant alert-state listing should not use DEFAULT_TENANT for None.

When tenant_id is None (single-tenant/default), this lists under "default/.alerts" instead of ".alerts", so alert-state files may be missed. Use empty root for None.

🔧 Suggested fix
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        let tenant = tenant_id.as_deref().unwrap_or("");
         let base_path = RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY]);

Based on learnings: In src/metastore/metastores/object_store_metastore.rs, default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created. Passing Some("") to storage is acceptable for path creation; concerns, if any, are about metrics labeling rather than path resolution.

src/alerts/alert_structs.rs (1)

42-69: ⚠️ Potential issue | 🟡 Minor

Add tenant_id to reserved field list to avoid other_fields collisions.

Without this, a client can pass tenant_id/tenantId via other_fields and create conflicting flattened keys.

🔧 Suggested fix
 const RESERVED_FIELDS: &[&str] = &[
     "version",
     "id",
     "severity",
     "title",
     "query",
     "datasets",
     "alertType",
     "alert_type",
     "anomalyConfig",
     "anomaly_config",
     "forecastConfig",
     "forecast_config",
     "thresholdConfig",
     "threshold_config",
     "evalConfig",
     "eval_config",
     "targets",
     "state",
     "notificationState",
     "notification_state",
     "notificationConfig",
     "notification_config",
     "created",
     "tags",
     "lastTriggeredAt",
     "last_triggered_at",
+    "tenantId",
+    "tenant_id",
 ];
src/alerts/target.rs (2)

304-385: ⚠️ Potential issue | 🔴 Critical

Prevent Retry::Finite(0) underflow (times - 1) leading to wraparound / huge loops.

for _ in 0..(times - 1) is unsafe when times == 0 (a user-provided value) and will underflow. Even if the UI "should" send >= 1, this is a sharp edge.

Proposed fix (saturating subtraction + validate at parse)
-                Retry::Finite(times) => {
-                    for _ in 0..(times - 1) {
+                Retry::Finite(times) => {
+                    for _ in 0..times.saturating_sub(1) {
                         let current_state = if let Ok(state) =
                             alerts.get_state(alert_id, &tenant_id).await
                         {
                             state
                         } else {
                             *state.lock().unwrap() = TimeoutState::default();
                             warn!(
                                 "Unable to fetch state for given alert_id- {alert_id}, stopping target notifs"
                             );
                             return;
                         };
                         ...
                     }
                 }

And when parsing:

         if let Some(times) = notification_config.times {
-            timeout.times = Retry::Finite(times)
+            timeout.times = Retry::Finite(times.max(1))
         }

Also applies to: 441-444


185-195: ⚠️ Potential issue | 🟠 Major

Fix Target serde/API mismatch (tenant vs tenantId) and DEFAULT_TENANT path inconsistency.

Inbound JSON uses "tenantId" (via TargetVerifier + rename_all = "camelCase"), but Target serializes it as "tenant". This breaks round-tripping: a client cannot POST back a response from GET because the field name changes. Additionally, target_json_path() uses self.tenant directly without normalizing Some("DEFAULT_TENANT") to None, risking path mismatches when targets are initially stored with tenant=None then later loaded as Some("DEFAULT_TENANT").

Rename the field to tenant_id with explicit #[serde(rename = "tenantId")] to fix both round-trip serialization and clarify the DEFAULT_TENANT contract. Apply the same change at lines 410–452 (TargetVerifier→Target conversion) and line 78 (put_target call).

Diff
 #[derive(Debug, serde::Serialize, serde::Deserialize, Clone)]
 #[serde(rename_all = "camelCase")]
 #[serde(try_from = "TargetVerifier")]
 pub struct Target {
     pub name: String,
     #[serde(flatten)]
     pub target: TargetType,
     #[serde(default = "Ulid::new")]
     pub id: Ulid,
-    pub tenant: Option<String>,
+    #[serde(rename = "tenantId")]
+    pub tenant_id: Option<String>,
 }
 impl MetastoreObject for Target {
     fn get_object_path(&self) -> String {
-        target_json_path(&self.id, &self.tenant).to_string()
+        target_json_path(&self.id, &self.tenant_id).to_string()
     }
 Ok(Target {
     name: value.name,
     target: value.target,
     id: value.id,
-    tenant: value.tenant_id,
+    tenant_id: value.tenant_id,
 })
-        .put_target(&target, &target.tenant)
+        .put_target(&target, &target.tenant_id)
-        let tenant_id = target.tenant.as_deref().unwrap_or(DEFAULT_TENANT);
+        let tenant_id = target.tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
-                    t.tenant = Some(tenant_id.clone());
+                    t.tenant_id = Some(tenant_id.clone());
src/alerts/mod.rs (3)

1055-1157: ⚠️ Potential issue | 🟠 Major

Default-tenant representation looks inconsistent (Some(DEFAULT_TENANT) vs None) and can break default schema selection.

get_tenant_id_from_key() returns None for the default tenant, and query helpers treat None as "public" (see get_number_of_agg_exprs() / get_aggregate_projection()). But in Alerts::load(), the tenant option is derived via tenant_id.is_empty(), and then alert.tenant_id.clone_from(tenant). If the metastore returns DEFAULT_TENANT as the key (per repo behavior), tenant becomes Some(DEFAULT_TENANT), which can later set DataFusion’s default_schema to "default" instead of "public".

Proposed fix (treat DEFAULT_TENANT like “no tenant” in Option fields)
-        for (tenant_id, raw_bytes) in raw_objects {
-            let tenant = if tenant_id.is_empty() {
-                &None
-            } else {
-                &Some(tenant_id.clone())
-            };
+        for (tenant_id, raw_bytes) in raw_objects {
+            let tenant = if tenant_id.is_empty() || tenant_id == DEFAULT_TENANT {
+                &None
+            } else {
+                &Some(tenant_id.clone())
+            };
             for alert_bytes in raw_bytes {
                 ...
                 // ensure that alert config's tenant is correctly set
                 alert.tenant_id.clone_from(tenant);
                 ...
-                map.entry(tenant_id.clone())
+                map.entry(tenant_id.clone())
                     .or_default()
                     .insert(*alert.get_id(), alert);
             }
         }

Based on learnings, get_tenant_id_from_key() intentionally returns None for the default tenant (i.e., don’t persist/propagate Some(DEFAULT_TENANT) as the “tenant option”).

Also applies to: 733-775


616-651: ⚠️ Potential issue | 🔴 Critical

Prevent cross-tenant SSE notification leaks by filtering sessions to the alert’s tenant.

trigger_notifications() currently broadcasts to any active session that passes user_auth_for_query(). But user_auth_for_query() derives tenant from the session, not from the alert. In multi-tenant mode, this can allow users in other tenants (with similarly named datasets/streams) to receive alerts that belong to a different tenant.

Proposed fix (gate by tenant equality before auth)
         // get active sessions
         let active_session = sessions().get_active_sessions();
         let mut broadcast_to = vec![];
         for (session, _, _) in active_session {
+            let session_tenant = get_tenant_id_from_key(&session);
+            let alert_tenant = self.tenant_id.clone();
+            let same_tenant = match (session_tenant.as_deref(), alert_tenant.as_deref()) {
+                (None, None) => true,
+                (Some(a), Some(b)) => a == b,
+                _ => false,
+            };
+            if !same_tenant {
+                continue;
+            }
             if user_auth_for_query(&session, &self.query).await.is_ok()
                 && let SessionKey::SessionId(id) = &session
             {
                 broadcast_to.push(*id);
             }
         }

733-775: ⚠️ Potential issue | 🟠 Major

Normalize tenant_id to "public" schema when it is DEFAULT_TENANT or empty.

The code sets default_schema = tenant_id.as_deref().unwrap_or("public"), which passes DEFAULT_TENANT through literally as a schema name. However:

  • In single-tenant mode, only the "public" schema is registered (query/mod.rs:200)
  • In multi-tenant mode, schemas are registered with actual tenant names (query/mod.rs:185-190)
  • metastore.get_alerts() normalizes empty tenant IDs to DEFAULT_TENANT before returning (object_store_metastore.rs:276-278), so alerts may have tenant_id = Some("DEFAULT_TENANT")

Setting default_schema = "DEFAULT_TENANT" in single-tenant mode will cause schema resolution to fail. Normalize explicitly:

Suggested fix
-    session_state
-        .config_mut()
-        .options_mut()
-        .catalog
-        .default_schema = tenant_id.as_deref().unwrap_or("public").to_owned();
+    let schema = match tenant_id.as_deref() {
+        None | Some("") | Some(DEFAULT_TENANT) => "public",
+        Some(t) => t,
+    };
+    session_state
+        .config_mut()
+        .options_mut()
+        .catalog
+        .default_schema = schema.to_owned();

Note: get_ctx().state() returns a cloned SessionContext, so mutations are local to each call—no concurrency risk.

🤖 Fix all issues with AI agents
In `@src/alerts/target.rs`:
- Around lines 53-70: Default-tenant handling is inconsistent. Centralize
tenant normalization (e.g., a helper like normalize_tenant_option) and use
it everywhere. In TargetConfigs::load, normalize tenant_id before assigning
t.tenant, setting t.tenant = None for the default tenant instead of
Some(DEFAULT_TENANT) or Some(""). Update all lookups that currently use
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT), as well as the delete()
path, to call the normalization helper first and pass the normalized
Option into alerts.get_all_alerts and other tenant-aware methods, so
callers with Some("") cannot bypass checks.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/storage/gcs.rs (2)

180-207: ⚠️ Potential issue | 🟡 Minor

Record metrics only after successful GET and normalize empty tenant IDs.
Line 185-187 increments the call metric before confirming success, and empty tenant_id will label metrics with "" instead of DEFAULT_TENANT. Consider moving the call metric into the Ok branch and normalizing empty strings.

Suggested fix
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
-        increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant);
+        let resp = self.client.get(&to_object_store_path(path)).await;
+        let tenant = tenant_id
+            .as_deref()
+            .filter(|t| !t.is_empty())
+            .unwrap_or(DEFAULT_TENANT);
         match resp {
             Ok(resp) => {
+                increment_object_store_calls_by_date(
+                    "GET",
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
                 let body: Bytes = resp.bytes().await?;

Based on learnings: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted; and default-tenant paths are intentionally built with an empty string segment (""), so metrics labeling should normalize empty tenant IDs to DEFAULT_TENANT.


232-283: ⚠️ Potential issue | 🟡 Minor

Count DELETE metrics only on successful delete.
Line 249-255 increments delete counts regardless of delete outcome. This inflates DELETE metrics and file counts when a delete fails. Move files_deleted increments and DELETE call metrics into the Ok branch.

Suggested fix
-                    Ok(obj) => {
-                        files_deleted.fetch_add(1, Ordering::Relaxed);
-                        let delete_resp = self.client.delete(&obj.location).await;
-                        increment_object_store_calls_by_date(
-                            "DELETE",
-                            &Utc::now().date_naive().to_string(),
-                            tenant,
-                        );
-                        if delete_resp.is_err() {
+                    Ok(obj) => {
+                        let delete_resp = self.client.delete(&obj.location).await;
+                        if delete_resp.is_ok() {
+                            files_deleted.fetch_add(1, Ordering::Relaxed);
+                            increment_object_store_calls_by_date(
+                                "DELETE",
+                                &Utc::now().date_naive().to_string(),
+                                tenant,
+                            );
+                        } else {
                             error!(
                                 "Failed to delete object during delete stream: {:?}",
                                 delete_resp
                             );
                         }
                     }

Based on learnings: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted.

src/storage/azure_blob.rs (1)

215-239: ⚠️ Potential issue | 🟡 Minor

Record object-store call metrics only after a successful GET.
increment_object_store_calls_by_date runs even when GET fails (or the body read fails), skewing metrics. Move it into the success path after the body is read so only successful operations are counted.

🔧 Suggested fix
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant);
-
-        match resp {
-            Ok(resp) => {
-                let body: Bytes = resp.bytes().await?;
-                increment_files_scanned_in_object_store_calls_by_date(
-                    "GET",
-                    1,
-                    &Utc::now().date_naive().to_string(),
-                    tenant,
-                );
-                increment_bytes_scanned_in_object_store_calls_by_date(
-                    "GET",
-                    body.len() as u64,
-                    &Utc::now().date_naive().to_string(),
-                    tenant,
-                );
-                Ok(body)
-            }
-            Err(err) => Err(err.into()),
-        }
+        let resp = self.client.get(&to_object_store_path(path)).await;
+        match resp {
+            Ok(resp) => {
+                let body: Bytes = resp.bytes().await?;
+                increment_object_store_calls_by_date(
+                    "GET",
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
+                increment_files_scanned_in_object_store_calls_by_date(
+                    "GET",
+                    1,
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
+                increment_bytes_scanned_in_object_store_calls_by_date(
+                    "GET",
+                    body.len() as u64,
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
+                Ok(body)
+            }
+            Err(err) => Err(err.into()),
+        }
Based on learnings: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted. The increment_object_store_calls_by_date() calls should be placed after the await? to ensure they only execute on successful operations.
src/storage/s3.rs (2)

924-958: ⚠️ Potential issue | 🟡 Minor

Use DEFAULT_TENANT label consistently for HEAD metrics.

Line 956 uses an empty string for the tenant label, while the rest of list_old_streams uses tenant_str (DEFAULT_TENANT). This splits metrics across two labels for the same tenant.

🔧 Proposed fix
-                increment_object_store_calls_by_date(
-                    "HEAD",
-                    &Utc::now().date_naive().to_string(),
-                    "",
-                );
+                increment_object_store_calls_by_date(
+                    "HEAD",
+                    &Utc::now().date_naive().to_string(),
+                    tenant_str,
+                );

973-1064: ⚠️ Potential issue | 🔴 Critical

Include tenant prefix in list_dates/list_hours/list_minutes path construction.

The stream_name parameter is unprefixed (see streams.list() which returns plain names keyed by tenant). In multi-tenant mode, the paths must be prefixed with tenant/{stream_name} when tenant_id is provided, consistent with the pattern already used in delete_stream (line 874-889). Currently, list_dates (via _list_dates), list_hours, and list_minutes build paths without the tenant prefix, causing them to target incorrect paths. The extracted tenant_str is only used for metrics; add it to the path format. This pattern is consistent across all storage backends (S3, GCS, Azure, LocalFS).
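The prefixing convention this comment relies on can be sketched as a small helper. This is a minimal, illustrative sketch: `DEFAULT_TENANT` and `stream_relative` are stand-in names, not Parseable's actual API.

```rust
const DEFAULT_TENANT: &str = "default";

// Sketch of tenant-aware path construction under the convention the review
// describes: default-tenant data lives at the storage root, every other
// tenant under a `{tenant}/` prefix. Names here are illustrative.
fn stream_relative(tenant_id: &Option<String>, stream_name: &str, suffix: &str) -> String {
    match tenant_id
        .as_deref()
        .filter(|t| !t.is_empty() && *t != DEFAULT_TENANT)
    {
        Some(tenant) => format!("{tenant}/{stream_name}/{suffix}"),
        None => format!("{stream_name}/{suffix}"),
    }
}

fn main() {
    assert_eq!(
        stream_relative(&Some("acme".into()), "logs", "date=2026-02-13"),
        "acme/logs/date=2026-02-13"
    );
    // The default tenant adds no extra directory segment.
    assert_eq!(
        stream_relative(&None, "logs", "date=2026-02-13"),
        "logs/date=2026-02-13"
    );
    println!("ok");
}
```

Routing `list_dates`/`list_hours`/`list_minutes` through one such helper would keep them consistent with `delete_stream` across all four backends.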

@parmesant parmesant force-pushed the multi-tenancy-review branch from 327d44a to d2c9634 Compare February 13, 2026 13:05

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/storage/azure_blob.rs (1)

214-243: ⚠️ Potential issue | 🟡 Minor

Move object‑store call counters into success paths.

Across these methods, increment_object_store_calls_by_date fires before confirming a successful Result (or regardless of delete success), so failed operations are counted. Gate the increment on Ok (and only count DELETE when delete_resp succeeds).

Proposed fix (example for GET)
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant);
-
-        match resp {
-            Ok(resp) => {
+        let resp = self.client.get(&to_object_store_path(path)).await;
+        match resp {
+            Ok(resp) => {
+                increment_object_store_calls_by_date(
+                    "GET",
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
                 let body: Bytes = resp.bytes().await?;

Based on learnings, metrics should only be captured when operations succeed, not when they are attempted.

Also applies to: 245-266, 268-319, 321-357, 359-381, 383-472, 499-517, 639-661, 663-680, 697-722, 756-765, 896-926, 928-959
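The "count only on success" pattern recurring in these comments can be reduced to a tiny sketch. `fetch_object` and the `HashMap` counter below are illustrative stand-ins for the real object-store call and Prometheus metric.

```rust
use std::collections::HashMap;

// Stand-in for the fallible object-store GET.
fn fetch_object(path: &str) -> Result<Vec<u8>, String> {
    if path == "missing" {
        Err("not found".into())
    } else {
        Ok(vec![1, 2, 3])
    }
}

// The counter increments inside the Ok arm, after the fallible call,
// so failed operations are never counted.
fn get_with_metrics(
    path: &str,
    counters: &mut HashMap<&'static str, u64>,
) -> Result<Vec<u8>, String> {
    match fetch_object(path) {
        Ok(body) => {
            *counters.entry("GET").or_insert(0) += 1; // success path only
            Ok(body)
        }
        Err(e) => Err(e),
    }
}

fn main() {
    let mut counters = HashMap::new();
    let _ = get_with_metrics("logs/a", &mut counters);
    let _ = get_with_metrics("missing", &mut counters);
    assert_eq!(counters["GET"], 1); // the failed call was not counted
    println!("ok");
}
```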

src/storage/gcs.rs (1)

179-207: ⚠️ Potential issue | 🟡 Minor

Only increment object‑store metrics on successful operations.

These blocks increment call counters before confirming success, so failed calls are counted. Move increment_object_store_calls_by_date into the Ok path (or after ?) and only count DELETE on success.

Proposed fix (example for GET)
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
-        increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant);
-        match resp {
-            Ok(resp) => {
+        let resp = self.client.get(&to_object_store_path(path)).await;
+        let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+        match resp {
+            Ok(resp) => {
+                increment_object_store_calls_by_date(
+                    "GET",
+                    &Utc::now().date_naive().to_string(),
+                    tenant,
+                );
                 let body: Bytes = resp.bytes().await?;

Based on learnings, metrics should only be captured when operations succeed, not when they are attempted.

Also applies to: 209-230, 232-283, 285-321, 323-345, 347-438, 441-470, 481-499, 621-643, 645-663, 680-705, 738-747, 874-905, 907-938

src/storage/s3.rs (2)

342-374: ⚠️ Potential issue | 🟡 Minor

Gate object‑store call metrics on success.

These blocks increment counters before confirming success, which counts failed calls. Move increment_object_store_calls_by_date into the Ok path (or after ?) and only count DELETE on success.

Proposed fix (example for GET)
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        increment_object_store_calls_by_date(
-            "GET",
-            &Utc::now().date_naive().to_string(),
-            tenant_str,
-        );
-
-        match resp {
-            Ok(resp) => {
+        let resp = self.client.get(&to_object_store_path(path)).await;
+        match resp {
+            Ok(resp) => {
+                increment_object_store_calls_by_date(
+                    "GET",
+                    &Utc::now().date_naive().to_string(),
+                    tenant_str,
+                );
                 let body = resp.bytes().await?;

Based on learnings, metrics should only be captured when operations succeed, not when they are attempted.

Also applies to: 376-401, 403-460, 462-502, 504-531, 533-626, 629-662, 673-695, 826-848, 850-872, 889-915, 1093-1127, 1129-1165


924-958: ⚠️ Potential issue | 🟡 Minor

Avoid empty‑tenant metrics labels in list_old_streams HEAD checks.

The HEAD metric uses "" and is emitted before checking success. Use tenant_str (DEFAULT_TENANT) and increment only on Ok.

Proposed fix
-                let result = self.client.head(&StorePath::from(key)).await;
-                increment_object_store_calls_by_date(
-                    "HEAD",
-                    &Utc::now().date_naive().to_string(),
-                    "",
-                );
-                result.map(|_| ())
+                let result = self.client.head(&StorePath::from(key)).await;
+                if result.is_ok() {
+                    increment_object_store_calls_by_date(
+                        "HEAD",
+                        &Utc::now().date_naive().to_string(),
+                        tenant_str,
+                    );
+                }
+                result.map(|_| ())

Based on learnings, metrics should only be captured when operations succeed, not when they are attempted; default-tenant paths use empty segments only for path creation, so metrics labeling is the concern here.

@parmesant parmesant force-pushed the multi-tenancy-review branch 3 times, most recently from 5831e15 to fb5fa26 Compare February 16, 2026 10:38
@parmesant parmesant force-pushed the multi-tenancy-review branch from fb5fa26 to 0838989 Compare February 16, 2026 10:54
@parmesant parmesant force-pushed the multi-tenancy-review branch from 97ef6dd to e8b1912 Compare February 20, 2026 11:47
@coderabbitai

coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
src/handlers/http/cluster/mod.rs (1)

671-712: ⚠️ Potential issue | 🟡 Minor

Inconsistent auth handling pattern.

Unlike sync_users_with_roles_with_ingestors, sync_user_deletion_with_ingestors, and sync_user_creation, this function uses ingestor.token directly (line 688) instead of create_intracluster_auth_headermap. This inconsistency could cause issues if the headermap approach is needed to propagate tenant context or other auth headers.

🔧 Suggested fix for consistency
 pub async fn sync_password_reset_with_ingestors(
     req: HttpRequest,
     username: &str,
 ) -> Result<(), RBACError> {
     let username = username.to_owned();
     let tenant_id = get_tenant_id_from_request(&req);
-    for_each_live_node(&tenant_id, move |ingestor| {
+    let headers = req.headers().clone();
+    for_each_live_node(&tenant_id, move |ingestor| {
         let url = format!(
             "{}{}/user/{}/generate-new-password/sync",
             ingestor.domain_name,
             base_path_without_preceding_slash(),
             username
         );
 
+        let headermap = create_intracluster_auth_headermap(&headers, &ingestor.token);
         async move {
             let res = INTRA_CLUSTER_CLIENT
                 .post(url)
-                .header(header::AUTHORIZATION, &ingestor.token)
+                .headers(headermap)
                 .header(header::CONTENT_TYPE, "application/json")
                 .send()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/cluster/mod.rs` around lines 671 - 712, The function
sync_password_reset_with_ingestors is using ingestor.token directly when
building the request instead of the project's standard
create_intracluster_auth_headermap, causing inconsistent auth/tenant
propagation; update the async closure inside for_each_live_node to call
create_intracluster_auth_headermap (passing the ingestor/token and tenant
context as used elsewhere) and attach that header map to the
INTRA_CLUSTER_CLIENT POST (instead of .header(header::AUTHORIZATION,
&ingestor.token)), keeping the rest of the request flow the same so tenant
headers and any additional auth headers are propagated consistently like in
sync_users_with_roles_with_ingestors and sync_user_creation.
src/handlers/http/modal/server.rs (1)

74-105: ⚠️ Potential issue | 🟠 Major

Missing /roles list endpoint registration in main server.

get_roles_webscope() is defined at line 581-584 but not registered in configure_routes() (lines 74-105). This means the GET /roles endpoint will return 404 from the main server, while /role/{name} and other CRUD operations remain available. If clients depend on /roles, this is a breaking change.

Note: The /roles listing endpoint is still available in query_server.rs and ingest_server.rs, so the impact depends on whether the main server is the primary HTTP handler for this endpoint.

🛠️ Suggested fix
                    .service(Self::get_oauth_webscope())
+                   .service(Self::get_roles_webscope())
                    .service(Self::get_user_role_webscope())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/server.rs` around lines 74 - 105, configure_routes is
missing registration of the roles listing scope causing GET /roles to 404; add
Self::get_roles_webscope() to the chain of .service(...) calls inside
configure_routes (the same web::scope(&base_path()) block that registers other
scopes like get_user_webscope and get_users_webscope) so the roles list endpoint
is exposed by the main server; ensure you place Self::get_roles_webscope()
alongside the other .service(...) entries and apply the same middleware wrapping
if required (e.g., resource_check) to match behavior in
query_server.rs/ingest_server.rs.
src/parseable/streams.rs (1)

1480-1507: ⚠️ Potential issue | 🟡 Minor

Fix tenant label values in staging metrics test.

Metrics now use DEFAULT_TENANT when tenant_id is None, but the test asserts "tenant_id". This will fail.

🛠️ Proposed fix
-        let storage_size_arrows = metrics::STORAGE_SIZE
-            .with_label_values(&["staging", &stream, "arrows", "tenant_id"])
+        let storage_size_arrows = metrics::STORAGE_SIZE
+            .with_label_values(&["staging", &stream, "arrows", DEFAULT_TENANT])
             .get();
@@
-        let storage_size_parquet = metrics::STORAGE_SIZE
-            .with_label_values(&["staging", &stream, "parquet", "tenant_id"])
+        let storage_size_parquet = metrics::STORAGE_SIZE
+            .with_label_values(&["staging", &stream, "parquet", DEFAULT_TENANT])
             .get();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parseable/streams.rs` around lines 1480 - 1507, The test
test_convert_to_parquet_with_empty_staging asserts storage metrics using the
literal "tenant_id" label but metrics now use DEFAULT_TENANT when tenant_id is
None; update the label arrays passed to metrics::STORAGE_SIZE.with_label_values
for both "arrows" and "parquet" checks to use DEFAULT_TENANT instead of the
string "tenant_id" so the assertions match the actual label value.
src/users/dashboards.rs (1)

244-312: ⚠️ Potential issue | 🟠 Major

Set dashboard.tenant_id during create/update to avoid cross-tenant persistence.

get_object_path() depends on dashboard.tenant_id, but create/update never set it. If callers omit it, dashboards can be stored under the default tenant (or lose tenant metadata). Set it from tenant_id before saving.

🛠️ Proposed fix
 pub async fn create(
     &self,
     user_id: &str,
     dashboard: &mut Dashboard,
     tenant_id: &Option<String>,
 ) -> Result<(), DashboardError> {
     dashboard.created = Some(Utc::now());
+    dashboard.tenant_id = tenant_id.clone();
     dashboard.set_metadata(user_id, None);
     let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
     let mut dashboards = self.0.write().await;
@@
 pub async fn update(
     &self,
     user_id: &str,
     dashboard_id: Ulid,
     dashboard: &mut Dashboard,
     tenant_id: &Option<String>,
 ) -> Result<(), DashboardError> {
     let mut dashboards = self.0.write().await;
+    dashboard.tenant_id = tenant_id.clone();
     let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
     if let Some(dbs) = dashboards.get_mut(tenant) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/users/dashboards.rs` around lines 244 - 312, The create and update paths
never assign dashboard.tenant_id, but get_object_path() relies on it so saved
dashboards may end up in the wrong tenant; before calling
self.save_dashboard(...) in both create(...) and update(...), set
dashboard.tenant_id = Some(tenant.to_string()) (using the tenant variable
derived from tenant_id.as_deref().unwrap_or(DEFAULT_TENANT)) so the dashboard
carries the correct tenant when save_dashboard and get_object_path are used.
src/migration/mod.rs (1)

543-564: ⚠️ Potential issue | 🟠 Major

Create the tenant staging directory before writing metadata.

Opening staging_dir/<tenant>/.parseable.json will fail if the tenant directory doesn't exist. Create the parent directory before OpenOptions::open.

🛠️ Proposed fix
 pub fn put_staging_metadata(
     config: &Parseable,
     metadata: &serde_json::Value,
     tenant_id: &Option<String>,
 ) -> anyhow::Result<()> {
     let path = if let Some(tenant) = tenant_id.as_ref() {
         config
             .options
             .staging_dir()
             .join(tenant)
             .join(".parseable.json")
     } else {
         config.options.staging_dir().join(".parseable.json")
     };
+    if let Some(parent) = path.parent() {
+        std::fs::create_dir_all(parent)?;
+    }
     let mut file = OpenOptions::new()
         .create(true)
         .truncate(true)
         .write(true)
         .open(path)?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/migration/mod.rs` around lines 543 - 564, The put_staging_metadata
function may fail when writing staging_dir/<tenant>/.parseable.json because the
tenant directory might not exist; before calling OpenOptions::open in
put_staging_metadata, ensure the parent directory exists by computing the target
path (the same path variable) and calling std::fs::create_dir_all on
path.parent() (or staging_dir for the no-tenant case) to create the tenant
staging directory(s) if needed, then proceed to
OpenOptions::new().create(true).truncate(true).write(true).open(path) and write
the JSON as before.
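The proposed fix can be sketched as follows. Note this is a simplified stand-in: `staging_dir` is a parameter here for testability, whereas the real function reads it from Parseable's options.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Minimal sketch of the fix: ensure the (possibly tenant-scoped) parent
// directory exists before opening the metadata file.
fn put_staging_metadata(
    staging_dir: &Path,
    tenant_id: Option<&str>,
    json: &str,
) -> std::io::Result<()> {
    let path = match tenant_id {
        Some(tenant) => staging_dir.join(tenant).join(".parseable.json"),
        None => staging_dir.join(".parseable.json"),
    };
    // OpenOptions::open does not create missing directories, so the write
    // would fail for a newly seen tenant without this step.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut file = OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(&path)?;
    file.write_all(json.as_bytes())
}

fn main() -> std::io::Result<()> {
    let staging = std::env::temp_dir().join("parseable-staging-demo");
    put_staging_metadata(&staging, Some("acme"), "{\"version\":\"v1\"}")?;
    assert!(staging.join("acme").join(".parseable.json").is_file());
    fs::remove_dir_all(&staging)?;
    println!("ok");
    Ok(())
}
```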
src/rbac/user.rs (1)

336-421: ⚠️ Potential issue | 🟠 Major

Normalize tenant comparison and avoid warn-level logging in UserGroup::validate.

user.tenant.eq(tenant_id) treats None vs Some(DEFAULT_TENANT) as different, which can reject valid default-tenant users. Also, tracing::warn!(group_user=?) logs user identifiers at warn level on every validation. Normalize tenant IDs and drop (or lower) the log.

🛠️ Proposed fix
             if let Some(tenant_users) = users().get(tenant) {
                 for group_user in &self.users {
-                    tracing::warn!(group_user=?group_user);
-                    if let Some(user) = tenant_users.get(group_user.userid())
-                        && !user.protected
-                        && user.tenant.eq(tenant_id)
-                    {
-                        // user is alright
-                    } else {
-                        non_existent_users.push(group_user.userid().to_string());
-                    }
+                    let group_tenant =
+                        group_user.tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+                    if group_tenant != tenant {
+                        non_existent_users.push(group_user.userid().to_string());
+                        continue;
+                    }
+                    if let Some(user) = tenant_users.get(group_user.userid())
+                        && !user.protected
+                        && user.tenant.as_deref().unwrap_or(DEFAULT_TENANT) == tenant
+                    {
+                        // user is alright
+                    } else {
+                        non_existent_users.push(group_user.userid().to_string());
+                    }
                 }
             } else {
                 non_existent_users.extend(self.users.iter().map(|u| u.userid().to_string()));
             }
Based on learnings: RBAC UserGroup is tenant-scoped; groups must not contain users from different tenants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/user.rs` around lines 336 - 421, UserGroup::validate currently logs
each group_user at warn level and compares user.tenant to tenant_id directly,
causing mismatches when tenant_id is None vs DEFAULT_TENANT; remove or lower the
tracing::warn!(group_user=?group_user) call and normalize tenant values before
comparison (use the same tenant variable you derived via
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) — e.g., compute tenant =
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) once and compare
user.tenant.as_deref().unwrap_or(DEFAULT_TENANT) == tenant) when validating
users in users().get(tenant) so default-tenant users match correctly and
protected-user logic remains unchanged (keep checks for user.protected and
tenant equality).
src/rbac/map.rs (1)

256-273: ⚠️ Potential issue | 🟠 Major

Fix session tracking: you’re dropping active sessions and retaining expired ones.

track_new overwrites any existing sessions for the user, and remove_expired_session keeps expired entries (expiry < now). This can leave old sessions in active_sessions and prevents proper expiry.

🔧 Suggested fix
-        let sessions = self.user_sessions.entry(tenant_id.to_owned()).or_default();
-        sessions.insert(user.clone(), vec![(key.clone(), expiry)]);
+        let sessions = self.user_sessions.entry(tenant_id.to_owned()).or_default();
+        sessions.entry(user.clone()).or_default().push((key.clone(), expiry));
@@
-        sessions.retain(|(_, expiry)| expiry < &now);
+        sessions.retain(|(_, expiry)| expiry > &now);

Also applies to: 303-314

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/map.rs` around lines 256 - 273, track_new currently overwrites a
user's session list and remove_expired_session keeps expired entries, causing
stale entries in active_sessions; update remove_expired_session to retain only
non-expired entries (expiry > now) and remove any corresponding keys from
active_sessions for sessions that are dropped, and change track_new to append
the new (key, expiry) to the user's Vec in user_sessions (use
user_sessions.entry(...).or_default().push(...)) instead of replacing it,
ensuring active_sessions.insert still records the key with (user, tenant_id,
permissions).
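Both halves of the fix (append instead of replace; retain the still-valid sessions) can be sketched together. The types below are simplified stand-ins for the real SessionKey/DateTime used in src/rbac/map.rs.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Default)]
struct Sessions {
    // tenant -> user -> [(session key, expiry)]
    user_sessions: HashMap<String, HashMap<String, Vec<(String, Instant)>>>,
}

impl Sessions {
    fn track_new(&mut self, tenant: &str, user: &str, key: String, expiry: Instant) {
        // Append: inserting a fresh Vec here would drop the user's other
        // live sessions.
        self.user_sessions
            .entry(tenant.to_owned())
            .or_default()
            .entry(user.to_owned())
            .or_default()
            .push((key, expiry));
    }

    fn remove_expired(&mut self, tenant: &str, user: &str, now: Instant) {
        if let Some(users) = self.user_sessions.get_mut(tenant) {
            if let Some(sessions) = users.get_mut(user) {
                // Keep the still-valid sessions; the inverted predicate
                // (expiry < now) would keep only the expired ones.
                sessions.retain(|(_, expiry)| *expiry > now);
            }
        }
    }

    fn count(&self, tenant: &str, user: &str) -> usize {
        self.user_sessions
            .get(tenant)
            .and_then(|u| u.get(user))
            .map_or(0, |v| v.len())
    }
}

fn main() {
    let mut s = Sessions::default();
    let now = Instant::now();
    s.track_new("acme", "alice", "k1".into(), now - Duration::from_secs(1)); // already expired
    s.track_new("acme", "alice", "k2".into(), now + Duration::from_secs(3600));
    assert_eq!(s.count("acme", "alice"), 2);
    s.remove_expired("acme", "alice", now);
    assert_eq!(s.count("acme", "alice"), 1); // only the live session remains
    println!("ok");
}
```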
src/storage/object_storage.rs (1)

1120-1138: ⚠️ Potential issue | 🟠 Major

Normalize default-tenant identifiers when building storage paths.
When tenant_id is Some(DEFAULT_TENANT) or Some(""), these helpers can produce a prefixed path or a leading slash. Default tenant storage should remain at the root.

✅ Suggested normalization helper
+fn tenant_prefix(tenant_id: &Option<String>) -> Option<&str> {
+    tenant_id
+        .as_deref()
+        .filter(|t| !t.is_empty() && *t != DEFAULT_TENANT)
+}
@@
-    if let Some(tenant) = tenant_id
-        && !tenant.eq(DEFAULT_TENANT)
-    {
+    if let Some(tenant) = tenant_prefix(tenant_id) {
         format!("{tenant}/{stream_name}/{file_suffix}")
     } else {
         format!("{stream_name}/{file_suffix}")
     }
@@
-    let tenant = tenant_id.as_deref().unwrap_or("");
+    let tenant = tenant_prefix(tenant_id).unwrap_or("");
Based on learnings: default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created.

Also applies to: 1202-1309

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/object_storage.rs` around lines 1120 - 1138, The path builder
stream_relative_path currently treats Some(DEFAULT_TENANT) or Some("") as a real
tenant and can produce prefixed or leading-slash paths; update the tenant
normalization so that tenant_id values that are empty or equal to DEFAULT_TENANT
are treated as None (root) before formatting. In practice, inside
stream_relative_path (and the similar helpers in the 1202-1309 range) normalize
tenant_id by trimming/inspecting the Option<String> and only include the tenant
segment when it is Some(t) and t is non-empty and not equal to DEFAULT_TENANT,
then build the path using the existing format logic
(format!("{tenant}/{stream_name}/{file_suffix}") vs
format!("{stream_name}/{file_suffix}")). Ensure comparisons use DEFAULT_TENANT
symbol and handle whitespace/empty-string cases to avoid a stray leading slash.
src/storage/s3.rs (2)

767-805: ⚠️ Potential issue | 🟠 Major

Tenant scoping is ignored when listing ingestor meta files.

get_ingestor_meta_file_paths lists the root and then filters by filename, which can pull paths across tenants in multi-tenant deployments. This should list under the tenant prefix when provided.

🔧 Possible fix
-        let mut object_stream = self.client.list(Some(&self.root));
+        let prefix = tenant_id
+            .as_deref()
+            .filter(|t| !t.is_empty() && *t != DEFAULT_TENANT)
+            .map(|t| StorePath::from(format!("{t}/")));
+        let mut object_stream = self.client.list(prefix.as_ref().unwrap_or(&self.root));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/s3.rs` around lines 767 - 805, get_ingestor_meta_file_paths
currently lists from self.root and then filters by filename, which ignores
tenant scope; change the listing root to include the tenant prefix using
tenant_str (e.g., build a list_root by joining self.root and tenant_str or using
DEFAULT_TENANT when tenant_id is None) and call
self.client.list(Some(&list_root)) instead of
self.client.list(Some(&self.root)); keep the rest of the logic (flag check,
files_scanned, metrics) but ensure the listed paths come from the tenant-scoped
prefix so results are limited to that tenant.

462-507: ⚠️ Potential issue | 🟠 Major

Normalize DEFAULT_TENANT before building S3 prefixes.

When tenant_id is Some(DEFAULT_TENANT), these functions will incorrectly prefix paths with DEFAULT_TENANT/, but default-tenant storage is rooted at the bucket prefix (no extra directory).

✅ Suggested normalization
-        let (prefix, tenant_str) = if let Some(tenant) = tenant_id {
-            (format!("{}/{}/", tenant, stream), tenant.as_str())
+        let tenant_norm = tenant_id
+            .as_deref()
+            .filter(|t| !t.is_empty() && *t != DEFAULT_TENANT);
+        let (prefix, tenant_str) = if let Some(tenant) = tenant_norm {
+            (format!("{}/{}/", tenant, stream), tenant)
         } else {
             (format!("{}/", stream), DEFAULT_TENANT)
         };
Based on learnings: default-tenant paths are intentionally built with an empty string segment (""), so no extra tenant directory is created.

Also applies to: 878-891, 987-1031, 1033-1080

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/s3.rs` around lines 462 - 507, The code builds S3 prefixes using
tenant_id directly, causing paths like "DEFAULT_TENANT/..." for the default
tenant; fix by normalizing the tenant segment before composing prefixes: in
_list_dates (and the other prefix-building functions mentioned) derive two
values from tenant_id — tenant_str_for_metrics =
tenant.as_deref().unwrap_or(DEFAULT_TENANT) (used for metrics) and
tenant_segment = if tenant_str_for_metrics == DEFAULT_TENANT { "" } else {
tenant_str_for_metrics } (used when formatting the prefix like format!("{}/{}/",
tenant_segment, stream) or format!("{}/", stream) when tenant_segment is empty)
so the default tenant produces no extra directory segment while metrics still
report DEFAULT_TENANT.
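The dual derivation the prompt describes (a concrete metrics label, an empty path segment for the default tenant) can be sketched like this; `tenant_labels` and `list_prefix` are illustrative names, not the actual s3.rs helpers.

```rust
const DEFAULT_TENANT: &str = "default";

// Metrics always carry a concrete tenant label, while the default tenant
// contributes no extra path segment.
fn tenant_labels(tenant_id: &Option<String>) -> (&str, &str) {
    let label = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
    let segment = if label == DEFAULT_TENANT { "" } else { label };
    (label, segment)
}

fn list_prefix(tenant_id: &Option<String>, stream: &str) -> String {
    let (_, segment) = tenant_labels(tenant_id);
    if segment.is_empty() {
        format!("{stream}/")
    } else {
        format!("{segment}/{stream}/")
    }
}

fn main() {
    assert_eq!(tenant_labels(&None), ("default", ""));
    assert_eq!(list_prefix(&Some("acme".into()), "logs"), "acme/logs/");
    // Some(DEFAULT_TENANT) and None both resolve to the root-level prefix.
    assert_eq!(list_prefix(&Some("default".into()), "logs"), "logs/");
    assert_eq!(list_prefix(&None, "logs"), "logs/");
    println!("ok");
}
```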
🧹 Nitpick comments (8)
src/handlers/http/cluster/mod.rs (4)

581-584: Remove commented-out code.

Same cleanup needed here as in the previous function.

🧹 Proposed cleanup
             let res = INTRA_CLUSTER_CLIENT
                 .delete(url)
-                // .header(header::AUTHORIZATION, &ingestor.token)
-                .headers(headermap)
+                .headers(headermap)
                 .send()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/cluster/mod.rs` around lines 581 - 584, Remove the leftover
commented-out authorization header in the INTRA_CLUSTER_CLIENT delete call:
delete the line "// .header(header::AUTHORIZATION, &ingestor.token)" so the
request chain uses .headers(headermap) only; locate the call to
INTRA_CLUSTER_CLIENT.delete(url) in this module and remove that commented line
(same cleanup as the previous function).

382-386: Function name sync_streams_with_ingestors no longer accurate.

This function now uses for_each_live_node which syncs to both ingestors and queriers. The function name and closure parameter ingestor are misleading. Consider renaming to sync_streams_with_nodes for clarity.

Also applies to: 403-404

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/cluster/mod.rs` around lines 382 - 386, The function name
sync_streams_with_ingestors and the closure parameter ingestor are now
misleading because the implementation uses for_each_live_node to act on both
ingestors and queriers; rename the function to sync_streams_with_nodes and
update the closure parameter name (e.g., from ingestor to node or target_node)
wherever referenced to reflect it operates on nodes generally (update function
declaration, all internal uses, and any callers). Also update any
comments/docstrings mentioning ingestors to mention nodes, and ensure references
to for_each_live_node remain correct after the rename.

536-538: Remove commented-out code.

This appears to be leftover from refactoring. Commented-out code should be removed to keep the codebase clean.

🧹 Proposed cleanup
             let res = INTRA_CLUSTER_CLIENT
                 .patch(url)
                 .headers(headermap)
-                // .header(header::AUTHORIZATION, &ingestor.token)
                 .header(header::CONTENT_TYPE, "application/json")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/cluster/mod.rs` around lines 536 - 538, Remove the leftover
commented-out authorization header in the request builder chain: delete the
commented line ".header(header::AUTHORIZATION, &ingestor.token)" that appears
alongside ".headers(headermap)" and ".header(header::CONTENT_TYPE,
\"application/json\")" (inside the HTTP request construction where the ingestor
token was previously used), leaving only active header calls to keep the code
clean.

366-371: Minor naming inconsistency: variable still named ingestor but represents any node type.

The closure parameter is still named ingestor but for_each_live_node now iterates over both ingestors and queriers. Consider renaming for clarity.

♻️ Suggested rename
     // Process all live ingestors in parallel
-    let results = futures::future::join_all(live_nodes.into_iter().map(|ingestor| {
+    let results = futures::future::join_all(live_nodes.into_iter().map(|node| {
         let api_fn = api_fn.clone();
-        async move { api_fn(ingestor).await }
+        async move { api_fn(node).await }
     }))
     .await;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/cluster/mod.rs` around lines 366 - 371, The closure
parameter name `ingestor` is misleading because
`for_each_live_node`/`live_nodes` can contain different node types; rename the
parameter to a neutral identifier like `node` (and update any uses inside that
closure) in the mapping passed to futures::future::join_all so the code reads
something like live_nodes.into_iter().map(|node| { let api_fn = api_fn.clone();
async move { api_fn(node).await } }).await; ensure you update all occurrences of
`ingestor` in that closure to `node` to keep names consistent with
`for_each_live_node` and `live_nodes`.
src/handlers/http/modal/query/querier_role.rs (1)

69-76: Remove commented-out code or add a TODO.

This block appears to be placeholder logic. Either remove it or convert to a proper TODO with tracking:

-    // // iterate to find if a protected user has this role
-    // if let Some(users) = users().get(tenant) {
-    //     for user in users.values() {
-    //         if user.roles.contains(&name) && user.protected {
-    //             return Err(RoleError::RoleInUse);
-    //         }
-    //     }
-    // }
+    // TODO: Add protected user check if needed for role modification
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/query/querier_role.rs` around lines 69 - 76, The
commented-out loop in querier_role.rs that checks protected users (calling
users(), iterating users.values(), checking user.roles.contains(&name) and
user.protected and returning RoleError::RoleInUse) should not remain as dead
code: either delete the entire commented block, or replace it with a single-line
TODO comment referencing a ticket/issue number and a short note like "validate
protected users before role deletion" so reviewers know it is intentional;
ensure the TODO mentions the involved symbols (users(), tenant,
RoleError::RoleInUse, user.roles) for context.
src/rbac/utils.rs (1)

49-67: LGTM with minor cleanup suggestion.

The tenant-scoped role lookup is correct. However, there's commented-out code (lines 56-61) that should be removed:

                 .filter(|roles| roles.get(role_name).is_some())
-                // .map(|roles| {
-                //     if let Some(role) = roles.get(role_name) {
-                //         (role_name.to_owned(), role.clone())
-                //     }
-                // })
-                // .get(role_name)
                 .map(|roles| {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/utils.rs` around lines 49 - 67, Remove the dead commented-out block
inside the direct_roles builder: delete the commented .map / .get snippet
between the call to roles().get(tenant_id) and the active .map that unwraps role
into RoleUI so the chain remains as
Users.get_role(...).iter().filter_map(|role_name|
roles().get(tenant_id).filter(|roles|
roles.get(role_name).is_some()).map(|roles| { let role =
roles.get(role_name).unwrap(); (role_name.to_owned(), RoleUI(role.clone()))
})).collect(); This cleans up the commented code without changing behavior of
direct_roles, roles(), Users.get_role, or RoleUI.
src/handlers/http/middleware.rs (1)

382-394: Consider handling TenantNotFound explicitly.

When is_action_suspended returns Err(TenantNotFound), the function falls through to return Authorized. If a request includes a tenant header for a non-existent tenant, this silently authorizes the request.

Consider whether this should return an error or specific response for unknown tenants:

     if let Some(tenant) = req.headers().get("tenant")
         && let Ok(tenant) = tenant.to_str()
     {
-        if let Ok(Some(suspension)) = TENANT_METADATA.is_action_suspended(tenant, &action) {
-            return rbac::Response::Suspended(suspension);
-        } else {
-            // tenant does not exist
+        match TENANT_METADATA.is_action_suspended(tenant, &action) {
+            Ok(Some(suspension)) => return rbac::Response::Suspended(suspension),
+            Ok(None) => {}
+            Err(_) => {
+                // tenant does not exist in metadata - consider if this should error
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/middleware.rs` around lines 382 - 394, check_suspension
currently treats any Err from TENANT_METADATA.is_action_suspended as Authorized;
change it to explicitly match the Result and handle Err(TenantNotFound) by
returning a clear error response (e.g., rbac::Response::UnknownTenant or a new
variant you add) rather than falling through to Authorized; update the code
around check_suspension and the call to TENANT_METADATA.is_action_suspended to
match on Ok(Some), Ok(None), Err(TenantNotFound) and other Err cases, returning
rbac::Response::Suspended for suspensions, the new UnknownTenant (or appropriate
error) for TenantNotFound, and handling/logging other errors as needed.
src/storage/s3.rs (1)

341-401: Metrics are recorded even when the S3 call fails.
The increment happens before the result is checked, so failed GET/PUTs still count as successful operations.

💡 Example adjustment
-        let resp = self.client.get(&to_object_store_path(path)).await;
-        increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant_str);
+        let resp = self.client.get(&to_object_store_path(path)).await;
         match resp {
             Ok(resp) => {
+                increment_object_store_calls_by_date("GET", &Utc::now().date_naive().to_string(), tenant_str);
                 let body = resp.bytes().await?;
                 ...
Based on learnings: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/s3.rs` around lines 341 - 401, The metrics are being recorded
before checking the S3 result so failed calls are counted; update
S3::_get_object and S3::_put_object to only increment metrics after a successful
response: call self.client.get/put as now, but move
increment_object_store_calls_by_date (and for GET:
increment_files_scanned_in_object_store_calls_by_date and
increment_bytes_scanned_in_object_store_calls_by_date; for PUT:
increment_files_scanned_in_object_store_calls_by_date) into the Ok(resp) branch
(use resp.bytes().await? for GET to compute body.len()) and do not call any
increments in the Err(err) branch; keep tenant_str and to_object_store_path
usages as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/handlers/http/cluster/mod.rs`:
- Around line 1915-1920: The call using
HeaderValue::from_str(&querier.token).unwrap() can panic on malformed tokens;
replace the unwrap with proper error handling: call
HeaderValue::from_str(&querier.token) and match or use ? to propagate the error
(or convert it into an appropriate HTTP error/Result) before calling
map.insert(reqwest::header::AUTHORIZATION, header_value). Ensure you reference
and handle errors from reqwest::header::HeaderValue::from_str for querier.token
so the function returns an Err/HTTP error instead of panicking.
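A minimal, std-only sketch of the non-panicking pattern. The validator below is a hypothetical stand-in for `reqwest::header::HeaderValue::from_str`, which errors on bytes outside visible ASCII; real code would propagate that error with `?` instead of unwrapping:

```rust
// Hypothetical stand-in for HeaderValue::from_str: header values must be
// visible ASCII, so validate and return a Result instead of unwrapping.
fn try_header_value(token: &str) -> Result<String, String> {
    if token.bytes().all(|b| (32..127).contains(&b)) {
        Ok(token.to_owned())
    } else {
        Err("malformed header value".to_owned())
    }
}
```

The caller would then map the `Err` case into its own HTTP error type rather than panicking mid-request.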

In `@src/handlers/http/modal/query/querier_rbac.rs`:
- Around line 61-67: The current loop using if let Some(tenant_roles) =
roles().get(tenant_id.as_deref().unwrap_or(DEFAULT_TENANT)) will skip roles when
the tenant has no entry; change the logic so a missing tenant is treated as
having no roles and thus all requested roles are considered non-existent. In the
loop over user_roles, use roles().get(...).map_or(true, |tenant_roles|
!tenant_roles.contains_key(role)) (or an equivalent that treats None as empty)
to decide when to push into non_existent_roles, referencing the roles() call,
tenant_id/DEFAULT_TENANT, user_roles and non_existent_roles variables.
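A std-only sketch of the intended check, using a hypothetical stand-in for `roles()` (a map from tenant name to its set of role names); the key point is that `map_or(true, ...)` makes a missing tenant entry behave like an empty role set:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-in for roles(): tenant -> set of role names.
// A tenant with no entry is treated as having no roles, so every
// requested role is reported as non-existent.
fn find_non_existent_roles(
    roles: &HashMap<String, HashSet<String>>,
    tenant: &str,
    requested: &[&str],
) -> Vec<String> {
    requested
        .iter()
        .filter(|role| {
            roles
                .get(tenant)
                .map_or(true, |tenant_roles| !tenant_roles.contains(**role))
        })
        .map(|role| (*role).to_owned())
        .collect()
}
```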

In `@src/parseable/mod.rs`:
- Around line 1128-1167: The load_tenants logic incorrectly treats single-tenant
mode as an error path; update load_tenants so that parsing/metadata handling and
the tenants write lock are evaluated separately from the is_multi_tenant flag:
call PARSEABLE.metastore.get_parseable_metadata for each tenant_id and only
insert into TENANT_METADATA when that returns Some AND is_multi_tenant;
skip/continue for non-multi-tenant tenants instead of falling through to an
error; when acquiring self.tenants.write() (the write lock) do not require
is_multi_tenant to succeed—acquire the lock, and if is_multi_tenant then extend
with dirs, otherwise just return Ok(()) so single-tenant startup does not return
an error.

In `@src/prism/home/mod.rs`:
- Around line 254-260: get_dashboard_titles is currently returning all tenant
dashboards without user-level authorization; update its signature call site and
implementation to accept the key (like get_stream_titles does) and filter
results through user authorization (use Users.authorize or reuse
get_dashboard_by_user logic). Concretely, change the tokio::join() invocation to
pass key into get_dashboard_titles(key, query_value, tenant_id) (or the chosen
param order), then inside get_dashboard_titles call Users.authorize(...) (or
call/replicate get_dashboard_by_user filtering) to remove dashboards the user
cannot access (allow full tenant access only for admin users), and return the
filtered list.

---

Outside diff comments:
In `@src/handlers/http/cluster/mod.rs`:
- Around line 671-712: The function sync_password_reset_with_ingestors is using
ingestor.token directly when building the request instead of the project's
standard create_intracluster_auth_headermap, causing inconsistent auth/tenant
propagation; update the async closure inside for_each_live_node to call
create_intracluster_auth_headermap (passing the ingestor/token and tenant
context as used elsewhere) and attach that header map to the
INTRA_CLUSTER_CLIENT POST (instead of .header(header::AUTHORIZATION,
&ingestor.token)), keeping the rest of the request flow the same so tenant
headers and any additional auth headers are propagated consistently like in
sync_users_with_roles_with_ingestors and sync_user_creation.

In `@src/handlers/http/modal/server.rs`:
- Around line 74-105: configure_routes is missing registration of the roles
listing scope causing GET /roles to 404; add Self::get_roles_webscope() to the
chain of .service(...) calls inside configure_routes (the same
web::scope(&base_path()) block that registers other scopes like
get_user_webscope and get_users_webscope) so the roles list endpoint is exposed
by the main server; ensure you place Self::get_roles_webscope() alongside the
other .service(...) entries and apply the same middleware wrapping if required
(e.g., resource_check) to match behavior in query_server.rs/ingest_server.rs.

In `@src/migration/mod.rs`:
- Around line 543-564: The put_staging_metadata function may fail when writing
staging_dir/<tenant>/.parseable.json because the tenant directory might not
exist; before calling OpenOptions::open in put_staging_metadata, ensure the
parent directory exists by computing the target path (the same path variable)
and calling std::fs::create_dir_all on path.parent() (or staging_dir for the
no-tenant case) to create the tenant staging directory(s) if needed, then
proceed to OpenOptions::new().create(true).truncate(true).write(true).open(path)
and write the JSON as before.
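A sketch of the directory-creation fix under assumed names; the real signature in src/migration/mod.rs may differ, but the shape is the same: ensure the parent directory exists before opening the file:

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Sketch of the fix: create the tenant staging directory before opening
// staging_dir/<tenant>/.parseable.json. Path and names are illustrative.
fn put_staging_metadata(path: &Path, json: &[u8]) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        // No-op when the directory already exists.
        fs::create_dir_all(parent)?;
    }
    let mut file = OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(path)?;
    file.write_all(json)
}
```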

In `@src/parseable/streams.rs`:
- Around line 1480-1507: The test test_convert_to_parquet_with_empty_staging
asserts storage metrics using the literal "tenant_id" label but metrics now use
DEFAULT_TENANT when tenant_id is None; update the label arrays passed to
metrics::STORAGE_SIZE.with_label_values for both "arrows" and "parquet" checks
to use DEFAULT_TENANT instead of the string "tenant_id" so the assertions match
the actual label value.

In `@src/rbac/map.rs`:
- Around line 256-273: track_new currently overwrites a user's session list and
remove_expired_session keeps expired entries, causing stale entries in
active_sessions; update remove_expired_session to retain only non-expired
entries (expiry > now) and remove any corresponding keys from active_sessions
for sessions that are dropped, and change track_new to append the new (key,
expiry) to the user's Vec in user_sessions (use
user_sessions.entry(...).or_default().push(...)) instead of replacing it,
ensuring active_sessions.insert still records the key with (user, tenant_id,
permissions).

In `@src/rbac/user.rs`:
- Around line 336-421: UserGroup::validate currently logs each group_user at
warn level and compares user.tenant to tenant_id directly, causing mismatches
when tenant_id is None vs DEFAULT_TENANT; remove or lower the
tracing::warn!(group_user=?group_user) call and normalize tenant values before
comparison (use the same tenant variable you derived via
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) — e.g., compute tenant =
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT) once and compare
user.tenant.as_deref().unwrap_or(DEFAULT_TENANT) == tenant) when validating
users in users().get(tenant) so default-tenant users match correctly and
protected-user logic remains unchanged (keep checks for user.protected and
tenant equality).

In `@src/storage/object_storage.rs`:
- Around line 1120-1138: The path builder stream_relative_path currently treats
Some(DEFAULT_TENANT) or Some("") as a real tenant and can produce prefixed or
leading-slash paths; update the tenant normalization so that tenant_id values
that are empty or equal to DEFAULT_TENANT are treated as None (root) before
formatting. In practice, inside stream_relative_path (and the similar helpers in
the 1202-1309 range) normalize tenant_id by trimming/inspecting the
Option<String> and only include the tenant segment when it is Some(t) and t is
non-empty and not equal to DEFAULT_TENANT, then build the path using the
existing format logic (format!("{tenant}/{stream_name}/{file_suffix}") vs
format!("{stream_name}/{file_suffix}")). Ensure comparisons use DEFAULT_TENANT
symbol and handle whitespace/empty-string cases to avoid a stray leading slash.
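A sketch of the normalization, assuming DEFAULT_TENANT is a plain string constant (its real value in the codebase may differ); None, empty/whitespace, and the default tenant all collapse to the root path:

```rust
// Illustrative constant; the codebase's DEFAULT_TENANT value may differ.
const DEFAULT_TENANT: &str = "default";

// Treat None, empty/whitespace, or the default tenant as the storage root
// so the default tenant never gains a prefix or stray leading slash.
fn stream_relative_path(tenant_id: Option<&str>, stream: &str, suffix: &str) -> String {
    match tenant_id.map(str::trim) {
        Some(tenant) if !tenant.is_empty() && tenant != DEFAULT_TENANT => {
            format!("{tenant}/{stream}/{suffix}")
        }
        _ => format!("{stream}/{suffix}"),
    }
}
```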

In `@src/storage/s3.rs`:
- Around line 767-805: get_ingestor_meta_file_paths currently lists from
self.root and then filters by filename, which ignores tenant scope; change the
listing root to include the tenant prefix using tenant_str (e.g., build a
list_root by joining self.root and tenant_str or using DEFAULT_TENANT when
tenant_id is None) and call self.client.list(Some(&list_root)) instead of
self.client.list(Some(&self.root)); keep the rest of the logic (flag check,
files_scanned, metrics) but ensure the listed paths come from the tenant-scoped
prefix so results are limited to that tenant.
- Around line 462-507: The code builds S3 prefixes using tenant_id directly,
causing paths like "DEFAULT_TENANT/..." for the default tenant; fix by
normalizing the tenant segment before composing prefixes: in _list_dates (and
the other prefix-building functions mentioned) derive two values from tenant_id
— tenant_str_for_metrics = tenant.as_deref().unwrap_or(DEFAULT_TENANT) (used for
metrics) and tenant_segment = if tenant_str_for_metrics == DEFAULT_TENANT { "" }
else { tenant_str_for_metrics } (used when formatting the prefix like
format!("{}/{}/", tenant_segment, stream) or format!("{}/", stream) when
tenant_segment is empty) so the default tenant produces no extra directory
segment while metrics still report DEFAULT_TENANT.

In `@src/users/dashboards.rs`:
- Around line 244-312: The create and update paths never assign
dashboard.tenant_id, but get_object_path() relies on it so saved dashboards may
end up in the wrong tenant; before calling self.save_dashboard(...) in both
create(...) and update(...), set dashboard.tenant_id = Some(tenant.to_string())
(using the tenant variable derived from
tenant_id.as_deref().unwrap_or(DEFAULT_TENANT)) so the dashboard carries the
correct tenant when save_dashboard and get_object_path are used.

---

Duplicate comments:
In `@src/handlers/http/role.rs`:
- Around line 186-209: get_default currently only reads DEFAULT_ROLE and may
return null for non-default tenants that aren't hydrated; update get_default
(and use get_tenant_id_from_request and DEFAULT_TENANT) to first check the
in-memory cache (DEFAULT_ROLE.read() and clone the Option<String>), and if
missing call get_metadata(&tenant_id).await? to fetch metadata.default_role,
backfill the cache via DEFAULT_ROLE.write().insert(tenant.to_owned(),
Some(role.clone())), and return the found role as serde_json::Value::String; if
still missing return serde_json::Value::Null wrapped in Ok(web::Json(...)) and
propagate RoleError from get_metadata.

In `@src/rbac/mod.rs`:
- Around line 306-324: The current mutate_request_with_basic_user uses parallel
find_any which can return a non-deterministic tenant when identical credentials
exist across tenants; change the logic in mutate_request_with_basic_user (and
the analogous method around lines 331-343) to perform a deterministic scan:
iterate tenants in a stable order, collect all tenants where the provided
username/password match and user.tenant.is_some(), and only insert the tenant
header if exactly one tenant matched; if zero or multiple matches, do not set
the header (or handle as an explicit ambiguity/error) to avoid routing users to
the wrong tenant.
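A sketch of the deterministic scan under assumed in-memory types (the tenant map and credential pairs are illustrative): iterate in a stable order, and only resolve a tenant when exactly one matches:

```rust
use std::collections::BTreeMap;

// Sketch under assumed types: tenant -> [(username, password)]. BTreeMap
// gives a stable iteration order; only an unambiguous single match yields
// a tenant, so identical credentials across tenants are never routed blindly.
fn resolve_unique_tenant(
    tenants: &BTreeMap<String, Vec<(String, String)>>,
    username: &str,
    password: &str,
) -> Option<String> {
    let mut matched = tenants
        .iter()
        .filter(|(_, users)| users.iter().any(|(u, p)| u == username && p == password))
        .map(|(tenant, _)| tenant.clone());
    match (matched.next(), matched.next()) {
        (Some(tenant), None) => Some(tenant), // exactly one match
        _ => None, // zero or multiple: leave the tenant header unset
    }
}
```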

In `@src/utils/mod.rs`:
- Around line 119-123: The function get_tenant_id_from_request uses
HeaderValue::to_str().unwrap(), which can panic on invalid header bytes; change
it to handle the error and return None instead of crashing — e.g., replace the
unwrap with a safe conversion such as to_str().ok() (or match on to_str()) when
reading req.headers().get("tenant") so get_tenant_id_from_request returns None
on invalid UTF-8 header values rather than panicking.
- Around line 70-81: The function mutate_request_with_tenant currently calls
basic.password().unwrap() which can panic if the BasicAuth password is missing;
change the BasicAuth branch in mutate_request_with_tenant to safely handle the
optional password by matching or using if let Some(password) = basic.password()
and only call Users.mutate_request_with_basic_user(basic.user_id(), password,
req) when a password exists, otherwise skip or return early (and optionally log
or set an unauthorized response); keep the existing SessionKey/session cookie
branch (sessions().mutate_request_with_tenant(...)) unchanged.

---

Nitpick comments:
In `@src/handlers/http/cluster/mod.rs`:
- Around line 581-584: Remove the leftover commented-out authorization header in
the INTRA_CLUSTER_CLIENT delete call: delete the line "//
.header(header::AUTHORIZATION, &ingestor.token)" so the request chain uses
.headers(headermap) only; locate the call to INTRA_CLUSTER_CLIENT.delete(url) in
this module and remove that commented line (same cleanup as the previous
function).
- Around line 382-386: The function name sync_streams_with_ingestors and the
closure parameter ingestor are now misleading because the implementation uses
for_each_live_node to act on both ingestors and queriers; rename the function to
sync_streams_with_nodes and update the closure parameter name (e.g., from
ingestor to node or target_node) wherever referenced to reflect it operates on
nodes generally (update function declaration, all internal uses, and any
callers). Also update any comments/docstrings mentioning ingestors to mention
nodes, and ensure references to for_each_live_node remain correct after the
rename.
- Around line 536-538: Remove the leftover commented-out authorization header in
the request builder chain: delete the commented line
".header(header::AUTHORIZATION, &ingestor.token)" that appears alongside
".headers(headermap)" and ".header(header::CONTENT_TYPE, \"application/json\")"
(inside the HTTP request construction where the ingestor token was previously
used), leaving only active header calls to keep the code clean.
- Around line 366-371: The closure parameter name `ingestor` is misleading
because `for_each_live_node`/`live_nodes` can contain different node types;
rename the parameter to a neutral identifier like `node` (and update any uses
inside that closure) in the mapping passed to futures::future::join_all so the
code reads something like live_nodes.into_iter().map(|node| { let api_fn =
api_fn.clone(); async move { api_fn(node).await } }).await; ensure you update
all occurrences of `ingestor` in that closure to `node` to keep names consistent
with `for_each_live_node` and `live_nodes`.

In `@src/handlers/http/middleware.rs`:
- Around line 382-394: check_suspension currently treats any Err from
TENANT_METADATA.is_action_suspended as Authorized; change it to explicitly match
the Result and handle Err(TenantNotFound) by returning a clear error response
(e.g., rbac::Response::UnknownTenant or a new variant you add) rather than
falling through to Authorized; update the code around check_suspension and the
call to TENANT_METADATA.is_action_suspended to match on Ok(Some), Ok(None),
Err(TenantNotFound) and other Err cases, returning rbac::Response::Suspended for
suspensions, the new UnknownTenant (or appropriate error) for TenantNotFound,
and handling/logging other errors as needed.

In `@src/handlers/http/modal/query/querier_role.rs`:
- Around line 69-76: The commented-out loop in querier_role.rs that checks
protected users (calling users(), iterating users.values(), checking
user.roles.contains(&name) and user.protected and returning
RoleError::RoleInUse) should not remain as dead code: either delete the entire
commented block, or replace it with a single-line TODO comment referencing a
ticket/issue number and a short note like "validate protected users before role
deletion" so reviewers know it is intentional; ensure the TODO mentions the
involved symbols (users(), tenant, RoleError::RoleInUse, user.roles) for
context.

In `@src/rbac/utils.rs`:
- Around line 49-67: Remove the dead commented-out block inside the direct_roles
builder: delete the commented .map / .get snippet between the call to
roles().get(tenant_id) and the active .map that unwraps role into RoleUI so the
chain remains as Users.get_role(...).iter().filter_map(|role_name|
roles().get(tenant_id).filter(|roles|
roles.get(role_name).is_some()).map(|roles| { let role =
roles.get(role_name).unwrap(); (role_name.to_owned(), RoleUI(role.clone()))
})).collect(); This cleans up the commented code without changing behavior of
direct_roles, roles(), Users.get_role, or RoleUI.

In `@src/storage/s3.rs`:
- Around line 341-401: The metrics are being recorded before checking the S3
result so failed calls are counted; update S3::_get_object and S3::_put_object
to only increment metrics after a successful response: call self.client.get/put
as now, but move increment_object_store_calls_by_date (and for GET:
increment_files_scanned_in_object_store_calls_by_date and
increment_bytes_scanned_in_object_store_calls_by_date; for PUT:
increment_files_scanned_in_object_store_calls_by_date) into the Ok(resp) branch
(use resp.bytes().await? for GET to compute body.len()) and do not call any
increments in the Err(err) branch; keep tenant_str and to_object_store_path
usages as-is.

@parmesant parmesant force-pushed the multi-tenancy-review branch 4 times, most recently from 0ea8b0a to 480749d on February 23, 2026 10:13
@parmesant parmesant force-pushed the multi-tenancy-review branch from 480749d to d176a46 on February 23, 2026 10:21
@parmesant
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/rbac/map.rs (1)

256-315: ⚠️ Potential issue | 🟠 Major

Fix session tracking: overwrite + inverted expiry retain can orphan sessions.

track_new overwrites the user’s session vector, while remove_expired_session retains expired entries (expiry < now). Together, this can drop valid sessions from user_sessions while leaving stale ones in active_sessions, which disables expiry/cleanup.

🐛 Proposed fix
-        let sessions = self.user_sessions.entry(tenant_id.to_owned()).or_default();
-        sessions.insert(user.clone(), vec![(key.clone(), expiry)]);
+        let sessions = self.user_sessions.entry(tenant_id.to_owned()).or_default();
+        sessions.entry(user.clone()).or_default().push((key.clone(), expiry));
...
-        sessions.retain(|(_, expiry)| expiry < &now);
+        sessions.retain(|(_, expiry)| expiry > &now);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/map.rs` around lines 256 - 315, track_new currently overwrites the
user's session vector and remove_expired_session keeps expired entries (expiry <
now), which can orphan entries in active_sessions; change track_new to
append/push the new (key, expiry) into the existing vector instead of replacing
it, update remove_expired_session to retain only non-expired entries (expiry >
now or >= now depending on desired semantics) and when dropping expired sessions
also remove their keys from active_sessions to avoid orphans, and ensure
remove_session/remove_user behavior remains consistent with these changes (use
functions track_new, remove_expired_session, remove_session, remove_user to
locate edits).
♻️ Duplicate comments (11)
Dockerfile.debug (1)

17-17: Validate Rust 1.93 debug builder compatibility.

Please confirm the new Rust image tag exists and doesn’t break local debug builds/MSRV expectations.

docker.io/rust:1.93.0-bookworm tag availability and Rust 1.93 compatibility notes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.debug` at line 17, The Dockerfile.debug change uses "FROM
docker.io/rust:1.93.0-bookworm AS builder" which must be validated; pull and
inspect that image tag (docker pull docker.io/rust:1.93.0-bookworm) and confirm
it exists and contains Rust 1.93, then run a local debug build using the
Dockerfile.debug to ensure no MSRV breakage; if the image is missing or Rust
version mismatches the project's MSRV (check Cargo.toml/rust-toolchain), either
update the Dockerfile.debug to a supported tag or revert to the previously used
builder image and document the compatibility decision.
Dockerfile.dev (1)

17-17: Check Rust 1.93 toolchain compatibility for dev builds.

Confirm the tag’s availability and MSRV alignment for developer workflows.

rust:1.93.0-bookworm Docker Hub tag availability and Rust 1.93 release notes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.dev` at line 17, The Dockerfile’s base image line "FROM
rust:1.93.0-bookworm AS builder" may reference a tag that isn’t available or may
exceed our MSRV; verify the rust:1.93.0-bookworm tag exists on Docker Hub and
confirm Rust 1.93 is compatible with the repo MSRV (check CI config and
Cargo.toml/rust-toolchain); if the tag is missing or incompatible, update
Dockerfile.dev to a valid, available tag that matches the project MSRV (or pin
to a tested toolchain like rust:1.93 or the MSRV exact tag), and add a brief
inline comment documenting the chosen toolchain and the verification step.
Dockerfile.kafka (1)

17-17: Confirm Rust 1.93 builder image availability and MSRV alignment.

Same concern as other Dockerfiles: validate the new Rust tag and compatibility with the project’s MSRV/tooling.

rust:1.93.0-bookworm Docker Hub tag availability and Rust 1.93 release notes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.kafka` at line 17, The Dockerfile uses the base image reference
"FROM rust:1.93.0-bookworm AS builder" which may not exist or may not match the
project's MSRV; verify the "rust:1.93.0-bookworm" tag is available on Docker Hub
and that Rust 1.93 is compatible with the project's MSRV/tooling (check
cargo.toml rust-version or project README), and then update the Dockerfile to a
confirmed tag (or fallback to the project MSRV tag) or add a comment documenting
the verified tag; ensure the unique symbol "FROM rust:1.93.0-bookworm AS
builder" is changed to the tested image name or annotated if verification has
been performed.
src/handlers/http/middleware.rs (1)

259-271: ⚠️ Potential issue | 🟠 Major

Handle invalid tenant ID format to prevent panic.
HeaderValue::from_str(tid).unwrap() can panic if a tenant ID contains non-visible ASCII characters. Add validation to handle the error path and return an InvalidTenantId error instead of panicking.

Suggested fix
-            if let Ok((_, tenant)) = get_user_and_tenant_from_request(request.request())
-                && let Some(tid) = tenant.as_ref()
-            {
-                request.headers_mut().insert(
-                    HeaderName::from_static(TENANT_ID),
-                    HeaderValue::from_str(tid).unwrap(),
-                );
-                t = tenant;
-            } else {
+            if let Ok((_, tenant)) = get_user_and_tenant_from_request(request.request())
+                && let Some(tid) = tenant.as_ref()
+            {
+                if let Ok(val) = HeaderValue::from_str(tid) {
+                    request
+                        .headers_mut()
+                        .insert(HeaderName::from_static(TENANT_ID), val);
+                    t = tenant;
+                } else {
+                    request.headers_mut().remove(TENANT_ID);
+                    *header_error = Some(actix_web::Error::from(PostError::Header(
+                        crate::utils::header_parsing::ParseHeaderError::InvalidTenantId,
+                    )));
+                }
+            } else {
                 // remove the header if already present
                 request.headers_mut().remove(TENANT_ID);
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/middleware.rs` around lines 259 - 271, The current
middleware inserts the tenant header using HeaderValue::from_str(tid).unwrap(),
which can panic on invalid characters; change the logic in the block that
handles tenant insertion (the get_user_and_tenant_from_request handling and
request.headers_mut().insert call for TENANT_ID) to validate
HeaderValue::from_str(tid) and handle Err by returning an InvalidTenantId error
(or converting it into the existing error type/path used by this middleware)
instead of unwrapping, only inserting the header when HeaderValue::from_str
succeeds.
src/handlers/http/role.rs (1)

189-201: ⚠️ Potential issue | 🟠 Major

Default role lookup skips persisted metadata for non-default tenants.

get_default only consults DEFAULT_ROLE; tenants without a hydrated map entry will return null even if metadata has a default role. Add a metadata fallback that backfills the map.

🔧 Suggested fix
-    let tenant_id = get_tenant_id_from_request(&req);
-    let tenant_id = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
-    let res = if let Some(role) = DEFAULT_ROLE
-        .read()
-        // .unwrap()
-        .get(tenant_id)
-        && let Some(role) = role
-    {
-        serde_json::Value::String(role.to_string())
-    } else {
-        serde_json::Value::Null
-    };
+    let tenant_id = get_tenant_id_from_request(&req);
+    let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+    let res = if let Some(role) = DEFAULT_ROLE
+        .read()
+        .get(tenant)
+        .and_then(|role| role.clone())
+    {
+        serde_json::Value::String(role)
+    } else {
+        let metadata = get_metadata(&tenant_id).await?;
+        let role = metadata.default_role.clone();
+        DEFAULT_ROLE.write().insert(tenant.to_owned(), role.clone());
+        role.map(serde_json::Value::String)
+            .unwrap_or(serde_json::Value::Null)
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/role.rs` around lines 189 - 201, get_default currently only
reads DEFAULT_ROLE and returns Null for tenants that don't have an in-memory
entry, ignoring persisted metadata; modify get_default to, after computing
tenant_id (from get_tenant_id_from_request), check
DEFAULT_ROLE.read().get(tenant_id) and if missing attempt to load the default
role from persisted metadata (call the existing metadata lookup/loader used
elsewhere), backfill DEFAULT_ROLE (using DEFAULT_ROLE.write()) with the loaded
value for that tenant, and then return the found role (or Null if none). Ensure
you reference and update DEFAULT_ROLE and use the same metadata API used by
other code paths so the in-memory map is hydrated for future calls.
src/utils/mod.rs (2)

120-123: ⚠️ Potential issue | 🟠 Major

Handle invalid tenant header bytes without panicking.

HeaderValue::to_str() can fail for non-visible ASCII; unwrap() turns that into a panic/DoS. Return None on invalid headers instead.

🔧 Suggested fix
-    req.headers()
-        .get(TENANT_ID)
-        .map(|tenant_value| tenant_value.to_str().unwrap().to_owned())
+    req.headers()
+        .get(TENANT_ID)
+        .and_then(|tenant_value| tenant_value.to_str().ok())
+        .map(|tenant_value| tenant_value.to_owned())
http::HeaderValue to_str visible ASCII error
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mod.rs` around lines 120 - 123, get_tenant_id_from_request
currently calls HeaderValue::to_str().unwrap(), which can panic on
invalid/non-visible-ASCII header bytes; change the chain to handle the Result
from to_str() and return None on error instead of panicking. Locate
get_tenant_id_from_request (and TENANT_ID usage) and replace the unwrap with a
fallible mapping (e.g., use and_then/ok or map_err -> ok) so the function
returns None when tenant_value.to_str() returns Err, preserving safe behavior
for HttpRequest header parsing.

71-81: ⚠️ Potential issue | 🟠 Major

Avoid panic when BasicAuth password is missing.

basic.password() is optional; unwrap() will panic for passwordless Basic Auth. Use a safe default or pass the Option through.

🔧 Suggested fix
-    if let Ok(basic) = &creds {
-        Users.mutate_request_with_basic_user(basic.user_id(), basic.password().unwrap(), req);
+    if let Ok(basic) = &creds {
+        let password = basic.password().unwrap_or("");
+        Users.mutate_request_with_basic_user(basic.user_id(), password, req);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/mod.rs` around lines 71 - 81, The code in
mutate_request_with_tenant uses basic.password().unwrap() which can panic when
the BasicAuth password is missing; update mutate_request_with_tenant to safely
handle the Option by reading let password = basic.password().map(|p|
p.to_string()) (or let password = basic.password().unwrap_or_default() if an
empty-string default is acceptable) and then call
Users.mutate_request_with_basic_user with that safe value (or adapt
Users.mutate_request_with_basic_user to accept Option<String> and pass password
directly); ensure you update the call site in mutate_request_with_tenant that
currently references Users.mutate_request_with_basic_user(basic.user_id(),
basic.password().unwrap(), req) to use the non-panicking password variable.
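A minimal sketch of the safe-default pattern, with a hypothetical `authenticate` function in place of `Users.mutate_request_with_basic_user`:

```rust
// Treat a missing Basic Auth password as an empty string instead of
// unwrapping; passwordless credentials then fail authentication normally
// rather than panicking the handler.
fn authenticate(user_id: &str, password: Option<&str>) -> String {
    let password = password.unwrap_or(""); // safe default, no panic
    format!("{user_id}:{password}")
}
```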
src/handlers/http/modal/query/querier_rbac.rs (1)

193-200: ⚠️ Potential issue | 🟠 Major

Role existence validation still skips missing tenant role maps.

If roles().get(tenant) returns None, the current check doesn’t mark roles as non-existent, allowing invalid role assignments. Treat a missing tenant map as “no roles”.

🔧 Suggested fix (apply in both add/remove paths)
-    roles_to_add.iter().for_each(|r| {
-        if let Some(tenant_roles) = roles().get(tenant_id.as_deref().unwrap_or(DEFAULT_TENANT))
-            && tenant_roles.get(r).is_none()
-        {
-            non_existent_roles.push(r.clone());
-        }
-    });
+    let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
+    roles_to_add.iter().for_each(|r| {
+        let role_exists = roles()
+            .get(tenant)
+            .map(|tenant_roles| tenant_roles.contains_key(r))
+            .unwrap_or(false);
+        if !role_exists {
+            non_existent_roles.push(r.clone());
+        }
+    });

Also applies to: 255-262

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/query/querier_rbac.rs` around lines 193 - 200, The
current role-existence check inside the closure over roles_to_add uses
roles().get(...) && tenant_roles.get(r).is_none(), which skips cases where
roles().get(...) returns None (missing tenant map) and thus fails to mark those
roles as non-existent; update the checks in the add and remove paths (the
closure iterating roles_to_add / roles_to_remove where tenant_id,
DEFAULT_TENANT, roles(), and non_existent_roles are used) to treat a missing
tenant map as empty by first resolving tenant_roles =
roles().get(tenant_id.as_deref().unwrap_or(DEFAULT_TENANT)) and if
tenant_roles.is_none() then push r.clone() into non_existent_roles, otherwise if
tenant_roles.unwrap().get(r).is_none() push r.clone(); apply the same logic in
the other (255-262) block.
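The "missing tenant map means no roles" rule can be sketched with plain `HashMap`s standing in for the global `roles()` registry (types and names here are illustrative):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for roles(): tenant -> role name -> privileges.
type RoleMap = HashMap<String, HashMap<String, Vec<String>>>;

// A missing tenant map is treated the same as an empty one: every requested
// role is reported as non-existent.
fn non_existent_roles(roles: &RoleMap, tenant: &str, requested: &[&str]) -> Vec<String> {
    requested
        .iter()
        .filter(|r| {
            !roles
                .get(tenant)
                .map(|tenant_roles| tenant_roles.contains_key(**r))
                .unwrap_or(false) // None tenant map => role does not exist
        })
        .map(|r| r.to_string())
        .collect()
}
```

With this shape the add and remove paths share one predicate, so the two blocks cannot drift apart again.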
src/prism/home/mod.rs (1)

369-395: ⚠️ Potential issue | 🟠 Major

Dashboard search can leak other users’ dashboards due to operator precedence.

The current condition only enforces author == user_id for ID matches; title matches bypass the author check. Apply the author filter to both title and ID matches (or use a proper auth check).

🔧 Suggested fix
-            if dashboard.title.to_lowercase().contains(query_value)
-                || dashboard_id.to_lowercase().contains(query_value)
-                    && dashboard.author.eq(&user_id)
+            if (dashboard.title.to_lowercase().contains(query_value)
+                || dashboard_id.to_lowercase().contains(query_value))
+                && dashboard.author.eq(&user_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/prism/home/mod.rs` around lines 369 - 395, The filter in
get_dashboard_titles incorrectly applies dashboard.author.eq(&user_id) only to
the dashboard_id branch due to missing parentheses; update the condition in the
closure returned by DASHBOARDS.list_dashboards(...) so the author check applies
to both title and id matches — e.g., make the condition
(dashboard.title.to_lowercase().contains(query_value) ||
dashboard_id.to_lowercase().contains(query_value)) &&
dashboard.author.eq(&user_id), or alternatively check
dashboard.author.eq(&user_id) first and return None early for non-authors before
evaluating title/ID matches; ensure you reference the existing symbols
get_dashboard_titles, DASHBOARDS.list_dashboards, dashboard.title, dashboard_id,
dashboard.author and user_id when making the change.
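The root cause is pure operator precedence: in Rust `&&` binds tighter than `||`, so `a || b && c` parses as `a || (b && c)`. A stripped-down sketch of the buggy and fixed predicates makes the leak visible:

```rust
// Buggy: a title match bypasses the author check entirely.
fn buggy(title_match: bool, id_match: bool, is_author: bool) -> bool {
    title_match || id_match && is_author // parses as: title || (id && author)
}

// Fixed: the author check gates both kinds of match.
fn fixed(title_match: bool, id_match: bool, is_author: bool) -> bool {
    (title_match || id_match) && is_author
}
```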
src/rbac/mod.rs (1)

253-344: Duplicate: BasicAuth tenant selection remains ambiguous.

mutate_request_with_basic_user / get_user_tenant_from_basic still pick the first matching tenant across all tenants, so identical creds can route to the wrong tenant. This matches the previously reported issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/mod.rs` around lines 253 - 344, The current logic in
get_user_tenant_from_basic and mutate_request_with_basic_user can return the
first tenant that matches username+password, making tenant selection ambiguous
when identical credentials exist across tenants; change
get_user_tenant_from_basic to scan all tenants and only return Some(tenant) if
exactly one tenant matches (otherwise return None), and update
mutate_request_with_basic_user to rely on that uniqueness result (do not set the
TENANT_ID header if multiple matches or None); keep function signatures but
adjust the matching loops to count matches or collect tenant ids to ensure
deterministic/unique tenant selection.
src/rbac/user.rs (1)

336-386: ⚠️ Potential issue | 🟠 Major

Avoid warn‑level PII logs and normalize tenant comparison.

tracing::warn!(group_user=...) can leak user identifiers/tenant data in normal validation paths. Also, comparing user.tenant.eq(tenant_id) can incorrectly reject default‑tenant users when callers pass Some(DEFAULT_TENANT).

🛠️ Proposed fix
-                    tracing::warn!(group_user=?group_user);
-                    if let Some(user) = tenant_users.get(group_user.userid())
-                        && !user.protected
-                        && user.tenant.eq(tenant_id)
+                    if let Some(user) = tenant_users.get(group_user.userid())
+                        && !user.protected
+                        && user.tenant.as_deref().unwrap_or(DEFAULT_TENANT) == tenant
                     {
                         // user is alright
                     } else {
                         non_existent_users.push(group_user.userid().to_string());
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rbac/user.rs` around lines 336 - 386, Remove the PII-bearing warn log and
normalize tenant comparison in UserGroup::validate: replace
tracing::warn!(group_user=?group_user) with a non-PII debug/info log (for
example log only group_user.userid() at debug level or remove the log entirely)
and change the tenant check from user.tenant.eq(tenant_id) to comparing the
resolved tenant strings (e.g. compare
user.tenant.as_deref().unwrap_or(DEFAULT_TENANT) == tenant) so default-tenant
users are correctly recognized; update references in the validate method
(symbols: UserGroup::validate, group_user, user.tenant, tenant_id,
DEFAULT_TENANT, tracing::warn!) accordingly.
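The normalization this prompt asks for reduces to defaulting `None` to `DEFAULT_TENANT` on both sides before comparing; a minimal sketch:

```rust
const DEFAULT_TENANT: &str = "default"; // illustrative value

// Normalize both sides so None and Some(DEFAULT_TENANT) denote the same
// tenant; a default-tenant user is then accepted whichever form the caller
// passes.
fn same_tenant(user_tenant: Option<&str>, requested: Option<&str>) -> bool {
    user_tenant.unwrap_or(DEFAULT_TENANT) == requested.unwrap_or(DEFAULT_TENANT)
}
```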
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose-test-with-kafka.yaml`:
- Around line 97-99: The docker-compose service "kafka-0" uses the unmaintained,
mutable image "docker.io/bitnamilegacy/kafka:latest"; update the image reference
in the kafka-0 service to a maintained, pinned image (e.g.,
"bitnami/kafka:<specific-version>" or "apache/kafka:<specific-version>") or
temporarily pin to a specific bitnamilegacy tag (e.g.,
"bitnamilegacy/kafka:4.0") and add a brief comment documenting the migration
plan to a maintained image; ensure you replace the image value for the kafka-0
service and avoid using the "latest" tag.

In `@Dockerfile`:
- Line 17: The Dockerfile's base image tag "FROM rust:1.93.0-bookworm" is
invalid; update that FROM line to a valid tag such as "rust:1.93-bookworm" or
preferably "rust:1.93.1-bookworm" (or pin to a specific digest for
reproducibility) so the build uses an existing Rust Bookworm image.

---

Outside diff comments:
In `@src/rbac/map.rs`:
- Around line 256-315: track_new currently overwrites the user's session vector
and remove_expired_session keeps expired entries (expiry < now), which can
orphan entries in active_sessions; change track_new to append/push the new (key,
expiry) into the existing vector instead of replacing it, update
remove_expired_session to retain only non-expired entries (expiry > now or >=
now depending on desired semantics) and when dropping expired sessions also
remove their keys from active_sessions to avoid orphans, and ensure
remove_session/remove_user behavior remains consistent with these changes (use
functions track_new, remove_expired_session, remove_session, remove_user to
locate edits).
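The append-and-retain semantics described above can be sketched with a simplified session store (names are illustrative, not the actual map.rs types):

```rust
use std::collections::HashMap;

// Simplified store: user -> Vec<(session_key, expiry_epoch_secs)>.
struct Sessions {
    by_user: HashMap<String, Vec<(String, u64)>>,
}

impl Sessions {
    fn new() -> Self {
        Self { by_user: HashMap::new() }
    }

    // Append rather than overwrite, so concurrent sessions coexist.
    fn track_new(&mut self, user: &str, key: String, expiry: u64) {
        self.by_user.entry(user.to_string()).or_default().push((key, expiry));
    }

    // Retain only sessions that have not yet expired (expiry > now).
    fn remove_expired(&mut self, user: &str, now: u64) {
        if let Some(sessions) = self.by_user.get_mut(user) {
            sessions.retain(|(_, expiry)| *expiry > now);
        }
    }

    fn session_count(&self, user: &str) -> usize {
        self.by_user.get(user).map(Vec::len).unwrap_or(0)
    }
}
```

In the real fix, dropping an expired entry would also remove its key from `active_sessions`; that side of the bookkeeping is omitted here.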

---

Duplicate comments:
In `@Dockerfile.debug`:
- Line 17: The Dockerfile.debug change uses "FROM docker.io/rust:1.93.0-bookworm
AS builder" which must be validated; pull and inspect that image tag (docker
pull docker.io/rust:1.93.0-bookworm) and confirm it exists and contains Rust
1.93, then run a local debug build using the Dockerfile.debug to ensure no MSRV
breakage; if the image is missing or Rust version mismatches the project's MSRV
(check Cargo.toml/rust-toolchain), either update the Dockerfile.debug to a
supported tag or revert to the previously used builder image and document the
compatibility decision.

In `@Dockerfile.dev`:
- Line 17: The Dockerfile’s base image line "FROM rust:1.93.0-bookworm AS
builder" may reference a tag that isn’t available or may exceed our MSRV; verify
the rust:1.93.0-bookworm tag exists on Docker Hub and confirm Rust 1.93 is
compatible with the repo MSRV (check CI config and Cargo.toml/rust-toolchain);
if the tag is missing or incompatible, update Dockerfile.dev to a valid,
available tag that matches the project MSRV (or pin to a tested toolchain like
rust:1.93 or the MSRV exact tag), and add a brief inline comment documenting the
chosen toolchain and the verification step.

In `@Dockerfile.kafka`:
- Line 17: The Dockerfile uses the base image reference "FROM
rust:1.93.0-bookworm AS builder" which may not exist or may not match the
project's MSRV; verify the "rust:1.93.0-bookworm" tag is available on Docker Hub
and that Rust 1.93 is compatible with the project's MSRV/tooling (check
cargo.toml rust-version or project README), and then update the Dockerfile to a
confirmed tag (or fallback to the project MSRV tag) or add a comment documenting
the verified tag; ensure the unique symbol "FROM rust:1.93.0-bookworm AS
builder" is changed to the tested image name or annotated if verification has
been performed.

In `@src/handlers/http/middleware.rs`:
- Around line 259-271: The current middleware inserts the tenant header using
HeaderValue::from_str(tid).unwrap(), which can panic on invalid characters;
change the logic in the block that handles tenant insertion (the
get_user_and_tenant_from_request handling and request.headers_mut().insert call
for TENANT_ID) to validate HeaderValue::from_str(tid) and handle Err by
returning an InvalidTenantId error (or converting it into the existing error
type/path used by this middleware) instead of unwrapping, only inserting the
header when HeaderValue::from_str succeeds.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7e0e38 and 85dff72.

📒 Files selected for processing (20)
  • Dockerfile
  • Dockerfile.debug
  • Dockerfile.dev
  • Dockerfile.kafka
  • docker-compose-test-with-kafka.yaml
  • src/connectors/kafka/mod.rs
  • src/handlers/http/middleware.rs
  • src/handlers/http/modal/query/querier_rbac.rs
  • src/handlers/http/modal/query/querier_role.rs
  • src/handlers/http/modal/query_server.rs
  • src/handlers/http/role.rs
  • src/handlers/mod.rs
  • src/migration/mod.rs
  • src/parseable/mod.rs
  • src/prism/home/mod.rs
  • src/rbac/map.rs
  • src/rbac/mod.rs
  • src/rbac/role.rs
  • src/rbac/user.rs
  • src/utils/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/handlers/http/modal/query_server.rs
  • src/connectors/kafka/mod.rs

@parmesant force-pushed the multi-tenancy-review branch from 85dff72 to 3c29c98 on February 23, 2026 at 12:09