feat(ts): Defer geometry column decoding until geometry is requested #757
base: main
Conversation
Thanks for taking the time to open this PR! As a side note, I'm not sure I understand why this needs special handling instead of being part of the decoding by lazy loading the geometry only when needed, without introducing a new "special" column. I might be reading this wrong, so a diagram of the proposed changes might be helpful as well. Thanks!!
Thanks for the quick review @HarelM! Here are the answers to your questions:
1. MapLibre GL JS Integration
Good news: no changes are needed in MapLibre GL JS.
2. Why
Thanks! I would be happy to get a class diagram with the relevant changes, to better understand why another class is needed instead of using the current one. Also, I believe maplibre-gl-js uses getFeatures, which means it parses the geometry and won't benefit from this PR, but I might be wrong, so please double check whether this will require changes in maplibre-gl-js to benefit from this improvement. Generally speaking, parsing the geometry on demand is a good idea; it is how MVT also works.
Great work on implementing lazy geometry decoding! Lazy decoding is crucial for staying competitive with MVT, since MVT's row-oriented layout makes it more straightforward to implement, and the MVT JS library leverages this advantage extensively.
General thoughts
Approaches
For production use, I would now recommend the second approach, which aligns with your PR's direction.
Architecture considerations
Therefore, I recommend avoiding "column" terminology in the in-memory representation. To extend your solution, we could consider implementing a LazyVector abstraction inspired by Velox (VLDB paper, implementation). In this approach, FeatureTables would initially contain only LazyVectors, which materialize into concrete vectors (e.g., GeometryVector, GPUVector) on first access. Alternatively, we could handle lazy materialization internally within the existing vector classes, without introducing an explicit LazyVector type.
Path forward
What’s your thinking on the architectural direction here?
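For illustration only, here is a minimal TypeScript sketch of the LazyVector idea described above, assuming a loader closure that captures the decoding context; the names (Vector, LazyVector, LazyFeatureTable) are hypothetical and not part of the current codebase:

```typescript
// Hypothetical sketch of a LazyVector-style abstraction (names are illustrative,
// not the repository's actual classes). A loader closure captures whatever
// context is needed to decode the column later.
interface Vector {
  readonly name: string;
  readonly length: number;
}

class LazyVector<T extends Vector> {
  private materialized: T | undefined = undefined;

  constructor(private readonly loader: () => T) {}

  // Materialize on first access, then reuse the cached concrete vector.
  get(): T {
    if (this.materialized === undefined) {
      this.materialized = this.loader();
    }
    return this.materialized;
  }

  isMaterialized(): boolean {
    return this.materialized !== undefined;
  }
}

// A feature table could hold only lazy vectors and expose them through getters,
// so filtering on property columns never triggers the geometry loader.
class LazyFeatureTable {
  constructor(private readonly columns: Map<string, LazyVector<Vector>>) {}

  getColumn(name: string): Vector | undefined {
    return this.columns.get(name)?.get();
  }
}
```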
Thanks @HarelM for the insightful feedback! I fully align with the architectural vision you described regarding
Context & Motivation
To give you a bit of background: I am currently working on migrating a heavy production workload from MVT to MLT. My specific use case hits the "trifecta" of MVT bottlenecks:
I really appreciate the incredible work done on the MLT specifications to date. I am convinced MLT is the superior format for this, but the current TS decoder implementation forces us to pay the CPU/GC cost for data we often filter out. I would like to help unlock MLT's potential for these scenarios.
Roadmap: The Step-by-Step Plan
I opted to split the work into atomic, reviewable PRs to facilitate the review process. Here is the path I am following:
I think merging this PR as a foundational step would be a safe incremental approach. It introduces the concept of deferred decoding without breaking the existing object model, paving the way for the more advanced "Virtual Layer" in the next PR.
Thanks for the detailed proposal! Here's some background on MLT's architecture and where we're headed:
Background
MLT was designed from the ground up to address the performance limitations you mentioned by using an explicit in-memory format (as demonstrated, for example, in our ACM SIGSPATIAL paper and the MapLibre GL JS POC integration). Compared to MVT’s (feature-based) in-memory representation, this (column-oriented) design enables:
However, bringing the research-grade POC to production took longer than expected, so we adopted a pragmatic intermediate approach: translating MLT's in-memory format back to MVT's in-memory representation (e.g. via
Ongoing Work
@Salkin975 and @Turtelll are currently working on bringing the research-grade POC into a production-grade implementation that operates directly on MLT's native vector format for filtering, line/polygon tessellation, and other operations. This introduces a column-oriented layout in MapLibre GL JS through additional buckets, overcoming the record-oriented limitations of the current MVT-based approach. To get a first impression of the filtering implementation, see: https://github.com/mactrem/mlt-evaluation/tree/main/ts/src/vector. An early POC integration attempt is visible here (closed in favor of the unoptimized variant): maplibre/maplibre-gl-js#6567
Path Forward
Given this roadmap, I'd recommend not investing significant time optimizing the current
That said, if you've already completed the work for Steps 1 and 2, we'd be happy to merge it as an incremental improvement. However, for future efforts, I think it would make more sense to join forces on the long-term columnar implementation rather than further developing the intermediate abstraction layer.
Here's my current working branch: https://github.com/Salkin975/maplibre-tile-spec/tree/feature-reenable-columnar-buckets |
Thanks @mactrem and @Salkin975 for the transparent roadmap and the links! It clarifies a lot.
Understanding the Strategy
I now understand that the current
My Proposal: The "Pragmatic Bridge"
While the native columnar implementation is clearly the superior long-term solution, migrating the entire ecosystem (and my production app) to it might take some time to be fully stable and released. Since I have already completed the work for Step 2 (Virtual Layer) and it creates a more efficient "bridge" for anyone using MLT in the current MapLibre architecture:
Does this sound like a reasonable plan to you?
Can you please add a class diagram so I'll have an easier time reviewing this?
@DoFabien Sounds good to me, I'm aligned with this direction. Thanks for your work on this!
Description
This PR introduces a performance optimization that defers the decoding of the geometry column (vertex, index, and topology buffers) until the geometry is explicitly requested.
In many use cases (e.g., filtering features based on properties, or processing data where only properties are relevant), the geometry data is not needed. Previously,
decodeTile would always decode the geometry column, incurring unnecessary CPU cost. With this change, the geometry decoding is skipped initially using a lightweight skipStreamPayload mechanism and wrapped in a DeferredGeometryColumn. The actual decoding happens only when featureTable.geometryVector, featureTable.getFeatures(), or the feature iterator is accessed.
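As an illustration of this deferral pattern, the sketch below shows a wrapper that caches the decoded geometry after the first access; the class and field names are hypothetical and do not mirror the actual DeferredGeometryColumn implementation in this PR:

```typescript
// Illustrative sketch only (not this PR's actual code): a deferred geometry
// column that holds the raw tile bytes and offset, and decodes on first access.
interface GeometryVector {
  numGeometries: number;
  vertexBuffer: Int32Array;
}

class DeferredGeometryColumnSketch {
  private cached: GeometryVector | undefined = undefined;

  constructor(
    private readonly tile: Uint8Array, // raw tile bytes
    private readonly offset: number, // byte offset where the geometry column starts
    private readonly decode: (tile: Uint8Array, offset: number) => GeometryVector
  ) {}

  // Decoding runs only on the first call; later calls return the cached vector.
  resolve(): GeometryVector {
    if (this.cached === undefined) {
      this.cached = this.decode(this.tile, this.offset);
    }
    return this.cached;
  }
}
```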
Changes
- DeferredGeometryColumn, which holds the necessary context (tile, offset, metadata) to perform decoding later. It caches the result after the first decoding.
- decodeGeometryColumn logic in mltDecoder.ts to instantiate DeferredGeometryColumn and skip the stream using skipGeometryColumn.
- skipGeometryColumn in geometryDecoder.ts and skipStreamPayload in integerStreamDecoder.ts to efficiently advance the buffer offset without reading values (see the sketch after this list).
- FeatureTable to handle DeferredGeometryColumn and resolve it lazily via resolveGeometryVector.
- A test fixture (test/expected/tag0x01/no-id/no-id.mlt) to keep the numFeatures derivation test deterministic.
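To illustrate the general idea behind a skip helper, here is a minimal sketch that advances the buffer offset without decoding any values. It assumes, purely for the example, that the payload is prefixed by a 4-byte little-endian length; the real MLT stream layout and the actual skipStreamPayload differ:

```typescript
// Hypothetical sketch: skip an encoded stream payload by advancing the offset.
// The 4-byte little-endian length prefix is an assumption for this example and
// does not reflect the actual MLT stream encoding.
function skipStreamPayloadSketch(tile: Uint8Array, offset: number): number {
  const view = new DataView(tile.buffer, tile.byteOffset, tile.byteLength);
  const payloadByteLength = view.getUint32(offset, true); // assumed length prefix
  return offset + 4 + payloadByteLength; // new offset just past the skipped payload
}
```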
Impact
- FeatureTable remains identical.
- numFeatures is correctly derived even when geometry is skipped (specifically handling the case where no ID column exists); see the sketch below.
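As a rough illustration of deriving the feature count without touching geometry, the sketch below falls back from an ID column to the length of any decoded property column; the column shape and function name are assumptions for the example, not the PR's actual logic:

```typescript
// Illustrative only: derive the number of features without decoding geometry.
interface DecodedColumn {
  name: string;
  length: number; // one value per feature
}

function deriveNumFeaturesSketch(
  idColumn: DecodedColumn | undefined,
  propertyColumns: DecodedColumn[]
): number {
  if (idColumn !== undefined) {
    return idColumn.length;
  }
  // No ID column present: fall back to the length of any property column.
  return propertyColumns.length > 0 ? propertyColumns[0].length : 0;
}
```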
Performance Benchmark
Ran npm run bench on two tag0x01 OMT tiles:
- test/expected/tag0x01/omt/14_8298_10748.mlt
- test/expected/tag0x01/omt/11_1063_1367.mlt
Results:
Checklist