
Conversation

@SungJin1212
Member

This PR introduces cache warmup logic on HA tracker startup: all keys are fetched from the KV store and used to populate the local cache.

Previously, whenever the Distributor started, it suffered from initial cold-cache misses: because the local map was empty, the HA tracker treated every incoming request as a new entry, causing unnecessary CAS operations against the KV store even for existing, valid keys.
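
A rough sketch of the idea (not the exact code in this PR; the `kvClient` interface, `ReplicaDesc` fields, and `warmLocalCache` name are placeholders for illustration):

```go
// Hypothetical sketch of the warmup idea: on startup, list every key under
// the HA tracker prefix, fetch each entry, and seed the local map so the
// first requests do not trigger CAS calls for keys that already exist.
package ha

import (
	"context"
	"sync"
)

// kvClient is a stand-in for the real KV store client interface.
type kvClient interface {
	ListKeys(ctx context.Context) ([]string, error)
	Get(ctx context.Context, key string) (*ReplicaDesc, error)
}

// ReplicaDesc is a simplified placeholder for the elected-replica record.
type ReplicaDesc struct {
	Replica    string
	ReceivedAt int64
}

type tracker struct {
	client      kvClient
	electedLock sync.RWMutex
	elected     map[string]*ReplicaDesc // key: "user/cluster"
}

// warmLocalCache fetches all keys from the KV store and populates the local
// map before the tracker starts serving, avoiding cold-cache CAS operations.
func (t *tracker) warmLocalCache(ctx context.Context) error {
	keys, err := t.client.ListKeys(ctx)
	if err != nil {
		return err
	}
	for _, key := range keys {
		desc, err := t.client.Get(ctx, key)
		if err != nil || desc == nil {
			continue // skip entries that cannot be read; they are filled lazily later
		}
		t.electedLock.Lock()
		t.elected[key] = desc
		t.electedLock.Unlock()
	}
	return nil
}
```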

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@SungJin1212 force-pushed the sync-ha-tracker-on-start branch 3 times, most recently from 7b6fda1 to 3567f6f on January 14, 2026 10:16
@SungJin1212 force-pushed the sync-ha-tracker-on-start branch from 3567f6f to 7786687 on January 15, 2026 01:51
@friedrichg
Member

Do you see distributors in clusters with a large number of pairs using more memory after this change? Do we need a flag for this?

Signed-off-by: SungJin1212 <[email protected]>
@SungJin1212
Member Author

@friedrichg
I added a benchmark for syncKVStoreToLocalMap at various key counts.
The results indicate that the overhead is negligible even at a very large scale (10,000 keys, about 18 MB allocated per sync).

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/ha
cpu: Apple M4 Max
BenchmarkHATracker_syncKVStoreToLocalMap
BenchmarkHATracker_syncKVStoreToLocalMap/keys=100
BenchmarkHATracker_syncKVStoreToLocalMap/keys=100-14         	   10740	    113366 ns/op	  203197 B/op	    2532 allocs/op
BenchmarkHATracker_syncKVStoreToLocalMap/keys=1000
BenchmarkHATracker_syncKVStoreToLocalMap/keys=1000-14        	    1174	    973137 ns/op	 1834239 B/op	   24289 allocs/op
BenchmarkHATracker_syncKVStoreToLocalMap/keys=10000
BenchmarkHATracker_syncKVStoreToLocalMap/keys=10000-14       	     130	   9065987 ns/op	18199241 B/op	  240955 allocs/op
PASS
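
For context, a sub-benchmark per key count could be structured roughly like the sketch below. This is not the benchmark in the PR; it reuses the `tracker`, `kvClient`, `ReplicaDesc`, and `warmLocalCache` placeholders from the sketch above together with a trivial in-memory client.

```go
// Sketch of a per-key-count benchmark harness for the warmup path.
package ha

import (
	"context"
	"fmt"
	"testing"
)

// memClient is an in-memory stand-in for the KV store.
type memClient struct{ data map[string]*ReplicaDesc }

func (m *memClient) ListKeys(_ context.Context) ([]string, error) {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	return keys, nil
}

func (m *memClient) Get(_ context.Context, key string) (*ReplicaDesc, error) {
	return m.data[key], nil
}

func BenchmarkWarmLocalCache(b *testing.B) {
	for _, n := range []int{100, 1000, 10000} {
		b.Run(fmt.Sprintf("keys=%d", n), func(b *testing.B) {
			// Pre-populate the fake KV store with n user/cluster entries.
			client := &memClient{data: make(map[string]*ReplicaDesc, n)}
			for i := 0; i < n; i++ {
				client.data[fmt.Sprintf("user-%d/cluster", i)] = &ReplicaDesc{Replica: "r1"}
			}
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				tr := &tracker{client: client, elected: make(map[string]*ReplicaDesc, n)}
				if err := tr.warmLocalCache(context.Background()); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```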

@friedrichg friedrichg left a comment
Member

@SungJin1212 Thanks, this is great work.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 15, 2026
successCount++
}

c.electedLock.Lock()

I am not sure this is needed, as this is called before running the loop and starting the service.

c.replicaGroups[user][cluster] = struct{}{}
}
}
c.electedLock.Unlock()

Same here
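
To illustrate the point being raised (a minimal sketch, not the PR's actual code; the `starting`/`loop` hooks and their wiring are assumptions): if the warmup runs in the service's starting phase, the main loop goroutine does not exist yet, so nothing else reads or writes the map concurrently and the lock around those writes would not be strictly required.

```go
// Sketch: warmup during the starting phase, before any concurrent access.
package ha

import "context"

// starting is a hypothetical service start-up hook that runs before the loop.
func (t *tracker) starting(ctx context.Context) error {
	// Only this goroutine touches t.elected at this point.
	return t.warmLocalCache(ctx)
}

// loop is assumed to be started only after starting() returns, so concurrent
// access to t.elected begins after the warmup has already finished.
func (t *tracker) loop(ctx context.Context) error {
	<-ctx.Done()
	return nil
}
```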
