fix(ingest/vertexai): pass credentials to project discovery client #18135

Open
eylulyr wants to merge 1 commit into datahub-project:master from eylulyr:fix/vertexai-project-discovery-credentials

eylulyr commented Jul 2, 2026

Contributor

Summary

VertexAI ingestion using project_id_pattern or project_labels fails at startup with DefaultCredentialsError, even when a service account credential block is configured in the recipe. Project discovery builds its Cloud Resource Manager client without credentials, so it falls back to Application Default Credentials, which are not present on the executor.

This PR threads the source's configured credentials into the discovery client,
matching the Dataplex source.

Root cause

VertexAISource._resolve_target_projects() called resolve_gcp_projects(filter_cfg, self.report) without a projects_client, so the helper constructed a bare ProjectsClient() that uses ADC. The configured credentials (already used for aiplatform.init() and MetadataServiceClient) were never applied to project discovery. The failure happens during source initialization, before any extraction.

Fix

Pass a credentialed ProjectsClient to resolve_gcp_projects, built only when discovery is actually needed (project_ids is empty).

resolved_projects = resolve_gcp_projects(
    filter_cfg,
    self.report,
    projects_client=(
        ProjectsClient(credentials=self._credentials)
        if not self.config.project_ids
        else None
    ),
)

This is what the Dataplex source already does in its _project_ids
resolution, and it is consistent with BigQuery passing explicit credentials to
its GCP clients. Only the VertexAI caller was missing it.
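
The pattern described above can be sketched in isolation. This is a minimal, self-contained illustration and not DataHub's code: `FakeProjectsClient`, `MissingCredentialsError`, `resolve_projects`, and `build_discovery_client` are hypothetical stand-ins chosen so the example runs without any Google Cloud libraries. The real `ProjectsClient` and `resolve_gcp_projects` have their own signatures and behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


class MissingCredentialsError(RuntimeError):
    """Hypothetical stand-in for the DefaultCredentialsError in the report."""


@dataclass
class FakeProjectsClient:
    """Hypothetical stand-in for a project discovery client."""

    credentials: Optional[object] = None

    def list_project_ids(self) -> List[str]:
        # In this sketch a bare client has nothing to authenticate with,
        # mimicking an executor without Application Default Credentials.
        if self.credentials is None:
            raise MissingCredentialsError("no ambient credentials available")
        return ["project-a", "project-b"]


def resolve_projects(
    explicit_ids: List[str],
    projects_client: Optional[FakeProjectsClient] = None,
) -> List[str]:
    """Return explicit IDs if given; otherwise discover via the client."""
    if explicit_ids:
        return list(explicit_ids)
    if projects_client is None:
        # Falling back to a bare client is the failure mode described above.
        projects_client = FakeProjectsClient()
    return projects_client.list_project_ids()


def build_discovery_client(
    explicit_ids: List[str], credentials: object
) -> Optional[FakeProjectsClient]:
    """Build a credentialed client only when discovery is actually needed."""
    if explicit_ids:
        return None
    return FakeProjectsClient(credentials=credentials)
```

In this sketch, `resolve_projects([], build_discovery_client([], creds))` discovers projects with the configured credentials, while `resolve_projects([])` alone raises because the helper falls back to a bare client.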

Testing

Two unit tests in tests/unit/vertexai/test_vertexai_source.py:

  • Discovery path: asserts ProjectsClient is built with the source credentials and passed to resolve_gcp_projects.
  • Explicit project_ids: asserts no client is built and None is forwarded.

Full VertexAI source suite passes (20 tests).
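
The two assertions described above could look roughly like the following. This is a sketch under stated assumptions, not the tests added in this PR: `SketchSource` is a hypothetical stand-in whose collaborators are injected as mocks, whereas the real tests exercise `VertexAISource` and may patch `ProjectsClient` and `resolve_gcp_projects` differently.

```python
from typing import Any, Callable, List
from unittest.mock import MagicMock, sentinel


class SketchSource:
    """Hypothetical stand-in for the source; collaborators are injected."""

    def __init__(
        self,
        project_ids: List[str],
        credentials: Any,
        client_factory: Callable[..., Any],
        resolver: Callable[..., List[str]],
    ) -> None:
        self._project_ids = project_ids
        self._credentials = credentials
        self._client_factory = client_factory
        self._resolver = resolver

    def resolve_target_projects(self) -> List[str]:
        # Mirror the fix: build a credentialed client only for discovery.
        client = (
            self._client_factory(credentials=self._credentials)
            if not self._project_ids
            else None
        )
        return self._resolver(self._project_ids, projects_client=client)


def test_discovery_path_uses_source_credentials() -> None:
    client_factory = MagicMock(name="ProjectsClient")
    resolver = MagicMock(name="resolve_gcp_projects", return_value=["p1"])
    source = SketchSource([], sentinel.creds, client_factory, resolver)

    assert source.resolve_target_projects() == ["p1"]
    # The client is built with the source credentials and handed on.
    client_factory.assert_called_once_with(credentials=sentinel.creds)
    resolver.assert_called_once_with(
        [], projects_client=client_factory.return_value
    )


def test_explicit_project_ids_skip_client() -> None:
    client_factory = MagicMock(name="ProjectsClient")
    resolver = MagicMock(name="resolve_gcp_projects", return_value=["p1"])
    source = SketchSource(["p1"], sentinel.creds, client_factory, resolver)

    assert source.resolve_target_projects() == ["p1"]
    # No client is built and None is forwarded.
    client_factory.assert_not_called()
    resolver.assert_called_once_with(["p1"], projects_client=None)
```

Injecting the factory and resolver keeps the sketch free of module patching; the same assertions apply if the collaborators are patched at import sites instead.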

Checklist

  • The PR conforms to DataHub's Contributing Guideline (Commit Message Format)
  • Tests for the changes have been added/updated
  • Docs related to the changes have been added/updated (no doc change needed —
    required IAM permissions are unchanged)

github-actions Bot commented Jul 2, 2026

Contributor

Linear: ING-2950

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

github-actions Bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels Jul 2, 2026
VertexAI project resolution built a credential-less ProjectsClient, so
project_id_pattern / project_labels discovery fell back to Application
Default Credentials. On executors without ambient ADC (e.g. the managed
remote executor) this raised DefaultCredentialsError and aborted
ingestion before any extraction, even when a service-account credential
block was configured in the recipe.

Thread self._credentials into ProjectsClient in _resolve_target_projects,
matching the Dataplex source. resolve_gcp_projects already accepts an
optional projects_client; VertexAI simply wasn't supplying one.

Add a regression test asserting the discovery client is built with the
source's credentials and handed to resolve_gcp_projects.

Co-authored-by: Claude <noreply@anthropic.com>
eylulyr force-pushed the fix/vertexai-project-discovery-credentials branch from 15d97a8 to 9c8b9b7 on July 2, 2026 12:29
maggiehays added the merge-pending-ci label (a PR that has passed review and should be merged once CI is green) Jul 2, 2026
Labels

  • community-contribution: PR or Issue raised by member(s) of DataHub Community
  • ingestion: PR or Issue related to the ingestion of metadata
  • merge-pending-ci: A PR that has passed review and should be merged once CI is green.