Providers/SemanticStore: hosted OpenAI vector stores behind a unified SemanticStore protocol #29
Merged
…ttributes

The vector-store CRUD, file, and batch methods were internal by accident (only createVectorStoreFile carried public), reachable solely via @testable. All of them are now public, with the supporting models made usable: ExpirationPolicy gets a public init, FileCounts public counts, VectorStoreFilesBatch goes public, LastError becomes Sendable. The stray progress print in waitUntilVectorStoreIsReady is gone and the 'Vectore' filename typo is fixed.

Two functional additions bring the client up to the current API:
- FileStatus gains cancelled/failed: decoding a failed vector-store file previously threw instead of reporting the failure.
- searchVectorStore(id:query:maxNumResults:filters:rewriteQuery:), the 2025 search endpoint the 2024-era client predated, with its wire types (VectorStoreAttribute, VectorStoreFilter, the result page with tolerant search_query decoding) and attributes on file attach, the hooks the SemanticStore unification builds on.

The filter operator case names (eq/ne/gt/lt/or) mirror the documented wire names and are excluded from the identifier_name lint rule like the other protocol names.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
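The FileStatus fix can be illustrated with a minimal sketch. Every type here is a hypothetical stand-in, not the client's declaration; the raw values assume the status strings documented for vector-store file objects (in_progress, completed, cancelled, failed).

```swift
import Foundation

// Hypothetical pre-fix enum: it only knows the two "happy path" statuses.
enum LegacyFileStatus: String, Decodable {
    case inProgress = "in_progress"
    case completed
}

// Hypothetical post-fix enum: cancelled and failed are now representable.
enum FileStatus: String, Decodable {
    case inProgress = "in_progress"
    case completed, cancelled, failed
}

// Trimmed-down stand-in for a vector-store file payload, generic over the
// status enum so both variants can be decoded from the same JSON.
struct FilePayload<Status: Decodable>: Decodable {
    let id: String
    let status: Status
}

func decodeFile<Status: Decodable>(
    _ statusType: Status.Type,
    from json: String
) throws -> FilePayload<Status> {
    try JSONDecoder().decode(FilePayload<Status>.self, from: Data(json.utf8))
}

// An illustrative payload for a file whose processing failed.
let failedFileJSON = #"{"id":"file-abc","status":"failed"}"#
```

With the legacy enum, decoding failedFileJSON throws because "failed" has no matching case; with the extended enum the same payload decodes and the caller can inspect the status.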
…protocol

A store-agnostic core (indexText / indexFile / sync / search / count) as the new SemanticStore protocol, with MemoryMatch, IndexOutcome, and SyncSummary moved out of the SQLiteVectorStore trait gate (Sendable, public inits) so they exist in every build. Citations render source:path when a match has no line span (0/0 = whole artifact). SQLiteVectorStore conforms as-is; keyword/hybrid/fused/expanded search and reranking remain its extras. The FNV-1a hash and relativePath helpers move to shared StoreSupport (fingerprints stay byte-identical; the existing suite proves it).

New: OpenAIVectorStore, a drop-in SemanticStore over OpenAI's hosted vector stores. Identity mirrors the local store via path/source/hash file attributes (re-indexing replaces, unchanged content is hash-skipped without re-uploading, sync prunes), and search maps the hosted results onto MemoryMatch (sources filter via attribute filter, topN capped at the endpoint's 50). delete() and pruning also delete the underlying File uploads so account storage doesn't leak. rewritesQueries opts into server-side query rewriting per search; lastSearchQueries exposes what the rewriter executed (the local analogue remains expandedSearch + QueryExpander). count() reports completed documents; chunking is server-side. Not trait-gated: the hosted store needs no SQLite engine.

LocalVectorStore and TextFragment go public: the guide always documented them as the zero-setup store, but they were internal.

Because the target now hosts three stores, FTS5 keyword search, RRF, query expansion, and reranking, and its old name collided with both the Providers VectorStore wire DTO and OpenAI's product term, the product/target is renamed VectorStore -> SemanticStore, after the protocol that is now its center. The SQLiteVectorStore trait and the concrete store class names are unchanged. Migration: .product(name: "SemanticStore", …) + import SemanticStore.

Tests: 9 offline (filter/page wire coverage, match mapping, citations, filenames, hash stability) plus a live hosted round-trip (index, search, rewrite, incremental skip, sync prune, delete) gated on OPENAI_API_KEY. Docs/SemanticStore.md covers the three stores, the protocol, hosted parity notes, and query rewriting on both engines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
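As a rough illustration of how a sources filter and the topN cap might map onto the hosted search request, here is a sketch. The types and the makeSearchRequest helper are hypothetical; the JSON field names assume the request body documented for POST /v1/vector_stores/{id}/search, and the 1...50 clamp reflects the endpoint limit mentioned above.

```swift
import Foundation

// Hypothetical filter type covering only the two operators this sketch needs.
enum FilterSketch: Encodable {
    case eq(key: String, value: String)
    case or([FilterSketch])

    private enum CodingKeys: String, CodingKey { case type, key, value, filters }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case let .eq(key, value):
            try container.encode("eq", forKey: .type)
            try container.encode(key, forKey: .key)
            try container.encode(value, forKey: .value)
        case let .or(filters):
            try container.encode("or", forKey: .type)
            try container.encode(filters, forKey: .filters)
        }
    }
}

// Hypothetical request body; a nil filter is omitted from the encoded JSON.
struct SearchRequestSketch: Encodable {
    let query: String
    let maxNumResults: Int
    let filters: FilterSketch?
    let rewriteQuery: Bool

    enum CodingKeys: String, CodingKey {
        case query, filters
        case maxNumResults = "max_num_results"
        case rewriteQuery = "rewrite_query"
    }
}

let hostedResultCap = 50

// Builds the body: one eq clause per source (OR-ed when there are several),
// and topN clamped into the range the endpoint accepts.
func makeSearchRequest(
    query: String, topN: Int, sources: [String], rewriteQuery: Bool
) -> SearchRequestSketch {
    let clauses = sources.map { FilterSketch.eq(key: "source", value: $0) }
    let filter: FilterSketch?
    switch clauses.count {
    case 0: filter = nil
    case 1: filter = clauses[0]
    default: filter = .or(clauses)
    }
    return SearchRequestSketch(
        query: query,
        maxNumResults: min(max(topN, 1), hostedResultCap),
        filters: filter,
        rewriteQuery: rewriteQuery
    )
}
```

Whether the real store combines multiple sources with an or filter is an assumption; the sketch only shows that the documented filter shapes are enough to express it.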
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a9faf5867
…d one

Review follow-ups (PR #29):
- Replacement order: upload, attach, and fully process the new document BEFORE removing the old one, so a transient failure leaves the prior content searchable instead of losing the identity. A failed or half-attached replacement is detached and deleted before the error propagates.
- Unchanged fast path: the inventory now carries each remote file's status, and a hash match only counts as 'unchanged' when it is a single, completed document (isCurrent). Previously a failed upload kept its hash attribute, so retrying the same content skipped re-indexing while the document stayed unsearchable. Stale duplicates from an interrupted replacement also fail the gate and are swept up by the next upsert.

Offline test covers the gate; the live round-trip exercises the new replacement ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
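The unchanged fast path can be sketched as a small pure function. This is an illustration under stated assumptions, not the store's code: RemoteFile, RemoteStatus, and fingerprint are hypothetical names, and the 64-bit FNV-1a width is assumed (the source only says the hash is FNV-1a).

```swift
// Hypothetical inventory record for one remote file attached under a path.
enum RemoteStatus { case inProgress, completed, cancelled, failed }

struct RemoteFile {
    let fileID: String
    let path: String
    let hash: String          // value of the hash attribute set at attach time
    let status: RemoteStatus
}

// 64-bit FNV-1a over the UTF-8 bytes (width is an assumption of this sketch).
func fnv1a64(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325      // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3      // FNV prime, wrapping multiply
    }
    return hash
}

// Hex string form, as it might be stored in a file attribute.
func fingerprint(_ text: String) -> String {
    String(fnv1a64(text), radix: 16)
}

// The gate: content is "unchanged" only when exactly one remote document
// exists for the identity, it finished processing, and its hash matches.
func isCurrent(_ entries: [RemoteFile], contentHash: String) -> Bool {
    guard entries.count == 1, let only = entries.first else { return false }
    return only.status == .completed && only.hash == contentHash
}
```

A failed upload or a leftover duplicate from an interrupted replacement both return false here, so the caller falls through to a normal upsert instead of skipping.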
What this does
Lets OpenAI's hosted vector stores be used exactly like the local SQLite one: one protocol, two backends.
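A minimal sketch of what "one protocol, two backends" could look like from the caller's side. All shapes are hypothetical: the method signatures, the MemoryMatch fields, and the line-span citation format are assumptions (only the span-less source:path form comes from this PR), and indexFile / sync are left out for brevity.

```swift
// Hypothetical match type; 0/0 for the line span means "whole artifact".
struct MemoryMatch: Sendable {
    let source: String
    let path: String
    let startLine: Int
    let endLine: Int
    let text: String
    let score: Double

    // Span-less matches cite as source:path; the "#L" span suffix is an
    // illustrative format, not necessarily the library's.
    var citation: String {
        if startLine == 0 && endLine == 0 { return "\(source):\(path)" }
        return "\(source):\(path)#L\(startLine)-L\(endLine)"
    }
}

// Hypothetical, trimmed-down protocol: parameter lists are assumptions.
protocol SemanticStore {
    func indexText(_ text: String, path: String, source: String) async throws
    func search(_ query: String, topN: Int) async throws -> [MemoryMatch]
    func count() async throws -> Int
}

// Caller code depends only on the protocol, so the same function would work
// against a local or a hosted conformer.
func citations<S: SemanticStore>(
    for query: String, in store: S, topN: Int = 5
) async throws -> [String] {
    try await store.search(query, topN: topN).map(\.citation)
}
```

The point of the design is that citations(for:in:) never names a concrete store, so swapping the SQLite-backed store for the hosted one would not touch call sites like it.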
Fixes
- Every vector-store method except createVectorStoreFile lacked public, so the rest were usable only via @testable. All CRUD/file/batch methods are now public; ExpirationPolicy / FileCounts / VectorStoreFilesBatch became constructible/readable; stray debug print removed; Vectore filename typo fixed.
- FileStatus couldn't decode failures: cancelled / failed were missing, so retrieving a failed file threw a decoding error instead of reporting it.

Features
- Vector-store search (POST /v1/vector_stores/{id}/search) with VectorStoreAttribute / VectorStoreFilter wire types, tolerant search_query decoding, and attributes on file attach.
- SemanticStore protocol: the store-agnostic core; MemoryMatch & co. un-gated and Sendable; span-less (0/0) citations render source:path.
- OpenAIVectorStore: hosted SemanticStore with local-parity identity via path / source / hash attributes (replace on re-index, hash-skip, sync-prune), storage-leak-free deletes, and server-side query rewriting (rewritesQueries, observable via lastSearchQueries). No trait needed.
- LocalVectorStore made public (the guide always advertised it).

Breaking: product rename VectorStore → SemanticStore
The target now hosts three stores plus FTS5/RRF/expansion/reranking, and the old name collided with both the Providers.VectorStore wire DTO and OpenAI's product term. Migration: .product(name: "SemanticStore", …) + import SemanticStore. The SQLiteVectorStore trait and concrete class names are unchanged. Known trade-off: module and protocol share a name (XCTest-style), so module-qualifying other symbols needs scoped imports.

Verification
- … --traits SQLiteVectorStore), swiftlint --strict clean.
- … OPENAI_API_KEY.
- rewrite_query validated against the live API (a bogus-param control confirms strict body validation).
- … swift package edit (7/7), both before and after the rename.

Follow-up
- … indexMedia, multimodal embeddings) per the openclaw mechanism; the 0/0 whole-artifact citation convention is already in place for it.

🤖 Generated with Claude Code