
ENG-2635: Reconcile set_schema with namespace_meta for Postgres connectors #7510

Draft
JadeCara wants to merge 19 commits into main from ENG-2635-set-schema-reconciliation

@JadeCara JadeCara commented Feb 26, 2026

Ticket ENG-2635

Description Of Changes

Reconcile set_schema() with namespace_meta for Postgres connectors (PR 3 of 3 for ENG-2635).

When namespace_meta is present, table names are already schema-qualified in the generated SQL (e.g. "billing"."customer"). In that case, set_schema() — which sets PostgreSQL's search_path — should be skipped to avoid conflicts. Previously both mechanisms could run simultaneously, which could cause unexpected behavior if db_schema was also configured.
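
The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `search_path_to_set` is a hypothetical helper, not a function in the fides codebase, and its two arguments stand in for the connector's stored namespace_meta and the db_schema connection secret.

```python
from typing import Any, Optional


def search_path_to_set(
    namespace_meta: Optional[Any], db_schema: Optional[str]
) -> Optional[str]:
    """Return the schema to place on search_path, or None to skip.

    Hypothetical helper for illustration; the real set_schema() operates
    on a connection rather than returning a value.
    """
    if namespace_meta is not None:
        # Table names are already schema-qualified in the generated SQL,
        # so setting search_path is skipped even if db_schema is configured.
        return None
    if db_schema:
        # Legacy path: rely on search_path to resolve unqualified names.
        return db_schema
    # Neither is configured: leave the session on its default schema.
    return None
```

Under this sketch, a dataset with namespace metadata yields None regardless of db_schema, which is the conflict the PR aims to avoid.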

This PR also consolidates the duplicated get_qualified_table_name() override from three Postgres connectors into the base SQLConnector, since all three had identical logic. The base implementation uses getattr to check for schema and database_name on the parsed namespace_meta, falling back to the plain collection name when no namespace is configured.
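
A minimal sketch of the getattr-based fallback might look like the following. The function name `qualified_table_name` and the use of `SimpleNamespace` as a stand-in for the parsed namespace_meta are assumptions for illustration. Identifier quoting is left out, and treating a namespace without a schema as unqualified is also an assumption, since the description does not cover that case.

```python
from types import SimpleNamespace
from typing import Any, Optional


def qualified_table_name(
    collection_name: str, namespace_meta: Optional[Any] = None
) -> str:
    """Build a dotted table name from optional namespace metadata.

    Hypothetical stand-in for the base-class method; quoting is omitted.
    """
    if namespace_meta is None:
        return collection_name

    # getattr tolerates namespace types that lack one of the attributes.
    schema = getattr(namespace_meta, "schema", None)
    database_name = getattr(namespace_meta, "database_name", None)

    if not schema:
        # Assumption: without a schema there is nothing to qualify with.
        return collection_name
    if database_name:
        return f"{database_name}.{schema}.{collection_name}"
    return f"{schema}.{collection_name}"


# Example usage with a stand-in namespace object.
meta = SimpleNamespace(schema="billing", database_name=None)
print(qualified_table_name("customer", meta))
```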

Stacks on top of:

Code Changes

  • src/fides/api/service/connectors/sql_connector.py - Added _current_namespace_meta instance variable in __init__; lifted get_qualified_table_name() from subclasses into the base with namespace-aware logic
  • src/fides/api/service/connectors/postgres_connector.py - set_schema() skips when _current_namespace_meta is set; query_config() stores namespace_meta on the instance; removed duplicate get_qualified_table_name()
  • src/fides/api/service/connectors/google_cloud_postgres_connector.py - Same set_schema reconciliation pattern; removed duplicate get_qualified_table_name()
  • src/fides/api/service/connectors/rds_postgres_connector.py - Removed duplicate get_qualified_table_name() (inherits from base)
  • tests/ops/service/connectors/test_postgres_connector.py - Added TestPostgreSQLConnectorSetSchema with 5 unit tests
  • tests/ops/service/connectors/test_google_cloud_postgres_connector.py - Added TestGoogleCloudSQLPostgresConnectorSetSchema with 5 unit tests

Steps to Confirm

These scenarios verify the three-way logic in set_schema() and the call flow through query_config() → set_schema() during DSR execution.

Scenario 1: namespace_meta present → set_schema skipped, SQL uses qualified table names

  1. Create or update a Postgres dataset with namespace metadata:
    {
      "fides_key": "my_postgres_dataset",
      "fides_meta": {
        "namespace": {
          "schema": "billing"
        }
      },
      "collections": [...]
    }
  2. Run an access DSR against this dataset.
  3. Expected: The generated SQL uses schema-qualified table names like SELECT ... FROM "billing"."customer" WHERE .... The SET search_path statement is NOT executed (check logs — no Setting PostgreSQL search_path before retrieving data message). This is because query_config() stores the namespace_meta on the connector instance, and set_schema() sees it and returns early.

Scenario 2: No namespace_meta, db_schema in connection secrets → set_schema runs

  1. Use a Postgres dataset without fides_meta.namespace.
  2. Configure the Postgres connection with db_schema in secrets:
    {
      "host": "...",
      "dbname": "...",
      "db_schema": "billing",
      ...
    }
  3. Run an access DSR against this dataset.
  4. Expected: The SQL uses unqualified table names like SELECT ... FROM "customer" WHERE .... The SET search_path to 'billing' statement IS executed before the query (check logs for Setting PostgreSQL search_path before retrieving data). This is the legacy path — _current_namespace_meta is None, so set_schema() proceeds to check db_schema.

Scenario 3: Neither namespace_meta nor db_schema → no-op (backward compatible)

  1. Use a Postgres dataset without fides_meta.namespace.
  2. Use a Postgres connection without db_schema in secrets.
  3. Run an access DSR.
  4. Expected: SQL uses unqualified table names like SELECT ... FROM "customer" WHERE .... No SET search_path is executed. Queries run against the default public schema. This confirms backward compatibility — existing Postgres connections without any schema configuration continue to work unchanged.

Code path reference (for reviewers reading the code):

  • sql_connector.py:retrieve_data() calls self.query_config(node) (line 184), which stores _current_namespace_meta on the connector instance, then calls self.set_schema(connection) (line 197), which checks the stored state.
  • The same flow applies in mask_data() (lines 229, 243) and execute_standalone_retrieval_query() (lines 160, 171).
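
The ordering in the code path above (query_config() stores state, then set_schema() reads it) can be illustrated with a toy class. `SketchConnector` is hypothetical and is not the fides connector: it records statements in a list instead of executing them, and its method signatures are simplified.

```python
from typing import Any, List, Optional


class SketchConnector:
    """Hypothetical stand-in that mimics the call order, not the real class."""

    def __init__(self, db_schema: Optional[str] = None) -> None:
        self.db_schema = db_schema
        self._current_namespace_meta: Optional[Any] = None
        self.executed: List[str] = []

    def query_config(self, namespace_meta: Optional[Any]) -> None:
        # Overwrite the stored state on every call so a previous dataset's
        # namespace does not leak into the next query.
        self._current_namespace_meta = namespace_meta

    def set_schema(self) -> None:
        if self._current_namespace_meta is not None:
            return  # SQL is already schema-qualified
        if self.db_schema:
            self.executed.append(f"SET search_path to '{self.db_schema}'")

    def retrieve_data(self, namespace_meta: Optional[Any]) -> List[str]:
        self.executed = []
        self.query_config(namespace_meta)  # must run before set_schema()
        self.set_schema()
        self.executed.append("SELECT ...")
        return self.executed
```

Because the stored value is replaced on each query_config() call, reusing one connector instance for a namespaced dataset and then a non-namespaced one still takes the legacy path on the second run in this sketch.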

Manual verification completed against fidesplus-slim container with branch code loaded:

| # | Scenario | Connector | Result |
|---|----------|-----------|--------|
| 1 | namespace_meta present → set_schema skipped | PostgreSQL | ✅ PASS |
| 2 | No namespace_meta, db_schema set → set_schema runs | PostgreSQL | ✅ PASS |
| 3 | No namespace_meta, no db_schema → no-op | PostgreSQL | ✅ PASS |
| 4 | namespace_meta present → set_schema skipped | GCS Postgres | ✅ PASS |
| 5 | get_qualified_table_name no namespace → plain name | PostgreSQL | ✅ PASS |
| 6 | get_qualified_table_name schema only → schema.table | PostgreSQL | ✅ PASS |
| 7 | get_qualified_table_name schema+db → db.schema.table | PostgreSQL | ✅ PASS |
| 8 | Non-postgres connector unaffected by changes | MariaDB | ✅ PASS |

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

🤖 Generated with Claude Code

Jade Wibbels and others added 18 commits February 26, 2026 08:50
Add PostgresNamespaceMeta schema and wire namespace support through
PostgresQueryConfig, PostgreSQLConnector, and RDSPostgresConnector.
Remove stubbed retrieve_data/mask_data from RDSPostgresConnector so
it inherits SQLConnector's implementations, enabling DSR execution.
Update RDS Postgres admin UI tags to include "DSR Automation".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard against None secrets in NamespaceMetaValidationStep. Update
test_validate_unsupported_connection_type to use mariadb (postgres is
now a supported namespace type). Add Postgres-specific validation tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Postgres defaults to the public schema when neither namespace_meta nor
db_schema is configured. Return empty fallback fields so validation
doesn't reject existing postgres connections that lack both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove extra parentheses around bind parameters in expected query
strings to match current SQLAlchemy output format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dry-run query test passes db=None to TaskResources, which flows
into query_config() -> get_namespace_meta(). Return None early when
no db session is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a dataset's namespace_meta has no overlapping fields with the
connection type's namespace schema (e.g. BigQuery namespace_meta on a
Postgres connection), skip validation rather than raising an error.
This happens when datasets of mixed types are bulk-linked to a single
connection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add GoogleCloudSQLPostgresNamespaceMeta with database_name (optional)
and schema (required), following the same pattern as Snowflake/BigQuery.

Update GoogleCloudSQLPostgresQueryConfig with generate_table_name()
for schema-qualified SQL, and GoogleCloudSQLPostgresConnector to fetch
namespace_meta from DB and pass it to the query config.

Also includes shared fixes from PR #7500:
- Guard against None db session in get_namespace_meta
- Guard against None secrets in namespace validation
- Skip namespace validation for mismatched connection types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ort' into ENG-2635-set-schema-reconciliation
When namespace_meta is present, set_schema() is now skipped because
table names are already schema-qualified in the generated SQL. The
namespace_meta state is stored on the connector instance via
query_config() and checked in set_schema(). Also deduplicates
get_qualified_table_name into the base SQLConnector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JadeCara JadeCara requested a review from galvana February 26, 2026 22:13