Commit cc00c14

Merge branch 'python-128bit-keys': 128-bit key (uuid) support for Python
2 parents 9cd517d + a32b708 commit cc00c14

8 files changed

Lines changed: 796 additions & 368 deletions

README.md

Lines changed: 49 additions & 3 deletions
@@ -49,9 +49,14 @@ Linux • macOS • Windows • iOS • Android • WebAssembly •
 
 > **ISCC Foundation Fork** -- This is a maintained fork of [USearch](https://github.com/unum-cloud/usearch)
 > by the [ISCC Foundation](https://iscc.io), published on PyPI as
-> [`usearch-iscc`](https://pypi.org/project/usearch-iscc/). It includes bug fixes and patches not yet
-> available upstream. The Python import name remains `usearch` for compatibility. Install with:
-> `pip install usearch-iscc`
+> [`usearch-iscc`](https://pypi.org/project/usearch-iscc/). The Python import name remains `usearch`
+> for compatibility. Install with: `pip install usearch-iscc`
+>
+> **Fork divergence from upstream:**
+> - 128-bit key support (Python): `Index(ndim=..., key_kind="uuid")` for packed 16-byte keys
+> - Bug fix: `Index.vectors` returns `np.ndarray` instead of broken list/tuple
+> - Bug fix: `self_recall()` wraps `index.get()` result with `np.vstack()` before search
+> - Build: published as `usearch-iscc` on PyPI with independent release cycle
 
 ---

@@ -154,6 +159,47 @@ index = Index(
 )
 ```
 
+## 128-bit Keys (UUID Mode)
+
+By default, USearch uses 64-bit unsigned integer keys. This fork adds support for 128-bit keys via `key_kind="uuid"`, allowing you to pack structured identifiers (e.g. content hashes, chunk pointers) directly into the key.
+
+```py
+import numpy as np
+from usearch.index import Index
+
+# Create an index with 128-bit keys
+index = Index(ndim=128, metric='cos', key_kind='uuid')
+
+# Keys are 16-byte values: single keys as bytes, batches as numpy V16 arrays
+batch_size = 1000
+keys = np.empty(batch_size, dtype='V16')
+vectors = np.random.randn(batch_size, 128).astype(np.float32)
+
+for i in range(batch_size):
+    body = i.to_bytes(8, 'big')           # 8 bytes: content identity
+    offset = (i * 16).to_bytes(4, 'big')  # 4 bytes: chunk offset
+    size = (1024 + i).to_bytes(4, 'big')  # 4 bytes: chunk size
+    keys[i] = body + offset + size        # 16 bytes total
+
+index.add(keys, vectors)
+matches = index.search(vectors[0], count=5)
+
+for match in matches:
+    print(match.key, match.distance)  # match.key is bytes(16)
+
+# Single-key operations use bytes(16)
+single_key = keys[0].tobytes()
+index.contains(single_key)  # bool
+index.get(single_key)       # np.ndarray or None
+index.remove(single_key)
+
+# Save/load preserves key kind; mismatched load raises ValueError
+index.save('index.usearch')
+restored = Index.restore('index.usearch')  # auto-detects uuid mode
+```
+
+> **Note:** Auto-generated keys are not supported in uuid mode — you must always pass explicit keys to `add()`.
+
 ## Serialization & Serving `Index` from Disk
 
 USearch supports multiple forms of serialization:
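
Note on the README example: each 16-byte key there is built by concatenating three big-endian fields (8-byte content identity, 4-byte chunk offset, 4-byte chunk size). As a minimal sketch of the same layout using only the standard library, the helpers below pack and unpack such keys with `struct`. The names `KEY_LAYOUT`, `pack_key`, and `unpack_key` are hypothetical and are not part of the `usearch` API; only the field widths are taken from the README example.

```python
import struct

# Hypothetical layout mirroring the README example (not a usearch API):
# 8-byte content identity, 4-byte chunk offset, 4-byte chunk size.
# ">" selects big-endian with no padding, so the struct is exactly 16 bytes.
KEY_LAYOUT = struct.Struct(">QII")


def pack_key(body: int, offset: int, size: int) -> bytes:
    """Pack the three fields into one 16-byte key."""
    return KEY_LAYOUT.pack(body, offset, size)


def unpack_key(key: bytes) -> tuple:
    """Recover (body, offset, size) from a 16-byte key."""
    return KEY_LAYOUT.unpack(key)


key = pack_key(7, 7 * 16, 1024 + 7)
print(len(key), unpack_key(key))  # prints: 16 (7, 112, 1031)

# Big-endian fields make byte-wise order agree with field-wise order.
assert pack_key(1, 0, 0) < pack_key(2, 0, 0)
```

A key produced this way should be usable wherever the README passes `bytes(16)`, though that is an assumption based on the example rather than something this sketch verifies against the library.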

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.23.3
+2.23.4

WHEELS.md

Lines changed: 8 additions & 8 deletions
@@ -23,7 +23,7 @@ uv pip install usearch-iscc
 ```toml
 [project]
 dependencies = [
-"usearch-iscc>=2.23.3",
+"usearch-iscc>=2.23.4",
 ]
 ```
 
@@ -62,8 +62,8 @@ The build and deployment process is fully automated:
 
 2. **Create and push a version tag:**
 ```bash
-git tag v2.23.3
-git push origin v2.23.3
+git tag v2.23.4
+git push origin v2.23.4
 ```
 
 3. **GitHub Actions automatically:**
@@ -108,15 +108,15 @@ python -c "import usearch; print(usearch.__version__)"
 If needed, you can retag to trigger a rebuild:
 
 ```bash
-git tag -d v2.23.3
-git push origin :refs/tags/v2.23.3
-git tag v2.23.3
-git push origin v2.23.3
+git tag -d v2.23.4
+git push origin :refs/tags/v2.23.4
+git tag v2.23.4
+git push origin v2.23.4
 ```
 
 ## Version Numbering
 
-This fork follows the upstream version scheme (e.g., `2.23.3`). Tags use the format `v2.23.3`.
+This fork follows the upstream version scheme (e.g., `2.23.4`). Tags use the format `v2.23.4`.
 
 ## Differences from Upstream

include/usearch/index_plugins.hpp

Lines changed: 20 additions & 0 deletions
@@ -88,6 +88,26 @@ struct uuid_t {
     std::uint8_t octets[16];
 };
 
+inline bool operator==(uuid_t const& a, uuid_t const& b) noexcept {
+    return std::memcmp(a.octets, b.octets, sizeof(a.octets)) == 0;
+}
+inline bool operator!=(uuid_t const& a, uuid_t const& b) noexcept { return !(a == b); }
+inline bool operator<(uuid_t const& a, uuid_t const& b) noexcept {
+    return std::memcmp(a.octets, b.octets, sizeof(a.octets)) < 0;
+}
+
+template <> struct hash_gt<uuid_t> {
+    std::size_t operator()(uuid_t const& element) const noexcept {
+        // 64-bit FNV-1a hash over all 16 octets.
+        std::uint64_t hash = 14695981039346656037ull;
+        for (std::size_t i = 0; i != sizeof(element.octets); ++i) {
+            hash ^= element.octets[i];
+            hash *= 1099511628211ull;
+        }
+        return static_cast<std::size_t>(hash);
+    }
+};
+
 class f16_bits_t;
 class bf16_bits_t;
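
Note on `hash_gt<uuid_t>`: the specialization in this diff computes a 64-bit FNV-1a hash over the 16 key octets. As an illustration only, the same arithmetic can be sketched in Python, where the 64-bit wraparound that `std::uint64_t` provides implicitly has to be applied with an explicit mask. The function name `fnv1a_64` and the constant names are hypothetical; the two constants are copied from the diff. The sketch assumes a 64-bit `std::size_t`, so the final `static_cast` does not truncate.

```python
# Constants copied from the hash_gt<uuid_t> specialization in the diff.
FNV_OFFSET_BASIS = 14695981039346656037
FNV_PRIME = 1099511628211
MASK_64 = (1 << 64) - 1  # emulates std::uint64_t overflow


def fnv1a_64(octets: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS
    for byte in octets:
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h


# Hash a 16-byte key, as the C++ code does for uuid_t::octets.
key = bytes(range(16))
print(hex(fnv1a_64(key)))

# Python compares bytes lexicographically by unsigned value,
# which is the same ordering memcmp gives operator< in the diff.
assert bytes([0, 255]) < bytes([1, 0])
```

With no input bytes the loop body never runs, so the function returns the offset basis unchanged, which is a quick sanity check for any port of this hash.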
