Conversation

@DoFabien

Description

This PR introduces a performance optimization that defers the decoding of the geometry column (vertex, index, and topology buffers) until the geometry is explicitly requested.

In many use cases (e.g., filtering features based on properties, or processing data where only properties are relevant), the geometry data is not needed. Previously, decodeTile would always decode the geometry column, incurring unnecessary CPU cost.

With this change, geometry decoding is initially skipped using a lightweight skipStreamPayload mechanism, and the undecoded geometry column is wrapped in a DeferredGeometryColumn. The actual decoding happens only when featureTable.geometryVector, featureTable.getFeatures(), or the feature iterator is accessed.
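For illustration, a minimal usage sketch is shown below. The declared signatures are assumptions made for the example; only the access points named above (geometryVector, getFeatures(), the feature iterator) come from this PR.

// Minimal usage sketch; the declared shapes are assumptions for illustration.
declare function decodeTile(tile: Uint8Array, metadata: unknown): FeatureTableLike[];
interface FeatureTableLike {
  readonly geometryVector: unknown; // first access resolves the DeferredGeometryColumn
  getFeatures(): unknown[];         // also triggers geometry decoding
}
declare const tileBuffer: Uint8Array;
declare const tileMetadata: unknown;

const tables = decodeTile(tileBuffer, tileMetadata); // geometry bytes are skipped, not decoded

// Properties-only work (filtering, statistics, ...) touches no geometry at all.

// Only when geometry is actually needed is it decoded, once, and then cached:
const geometry = tables[0].geometryVector;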

Changes

  • New Abstraction: Added DeferredGeometryColumn which holds the necessary context (tile, offset, metadata) to perform decoding later. It caches the result after the first decoding.
  • Decoder Update: Modified decodeGeometryColumn logic in mltDecoder.ts to instantiate DeferredGeometryColumn and skip the stream using skipGeometryColumn.
  • Skip Logic: Implemented skipGeometryColumn in geometryDecoder.ts and skipStreamPayload in integerStreamDecoder.ts to efficiently advance the buffer offset without reading values (a sketch of the idea follows below this list).
  • FeatureTable: Updated FeatureTable to handle DeferredGeometryColumn and resolve it lazily via resolveGeometryVector.
  • Tests: Added a minimal no-ID fixture (test/expected/tag0x01/no-id/no-id.mlt) to keep the numFeatures derivation test deterministic.
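The following sketch illustrates the skip idea only; the actual stream metadata layout and the signatures in integerStreamDecoder.ts and geometryDecoder.ts may differ.

// Illustrative sketch: advance the read cursor past a stream without
// materializing any values. The field name (byteLength) is an assumption.
interface StreamMetadataLike {
  byteLength: number; // size of the encoded stream payload in bytes
}

function skipStreamPayload(offset: number, metadata: StreamMetadataLike): number {
  return offset + metadata.byteLength;
}

// In this sketch, skipGeometryColumn applies the same step once per geometry
// stream (topology, offsets, vertex buffer, ...) and returns the offset of the
// first byte after the geometry column, which DeferredGeometryColumn remembers.
function skipGeometryColumn(offset: number, streams: StreamMetadataLike[]): number {
  return streams.reduce((cursor, stream) => skipStreamPayload(cursor, stream), offset);
}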

Impact

  • Performance: Decoding is roughly 14.8x faster for tiles whose geometry is never accessed (see the benchmark below).
  • API: No breaking changes. The public API of FeatureTable remains identical.
  • Correctness: Verified that numFeatures is correctly derived even when geometry is skipped (specifically handling the case where no ID column exists).

Performance Benchmark

Ran npm run bench on two tag0x01 OMT tiles:

  • test/expected/tag0x01/omt/14_8298_10748.mlt
  • test/expected/tag0x01/omt/11_1063_1367.mlt

Results:

Decode properties only (deferred geometry): 256.80 Hz, mean 3.8941 ms, rme ±2.09% (129 samples)
Decode full (geometry + properties): 17.3939 Hz, mean 57.4915 ms, rme ±3.20% (10 samples)
Summary: properties-only decode is 14.76x faster

Checklist

  • Code passes all existing tests.
  • New tests added for deferred decoding behavior.
  • No public API changes.

@HarelM
Collaborator

HarelM commented Dec 19, 2025

Thanks for taking the time to open this PR!
Is it possible to use it in maplibre-gl-js as-is, or are modifications needed there as well to benefit from this optimization?

As a side note, I'm not sure I understand why this needs special handling: couldn't the decoder simply lazy-load the geometry only when it's needed, without introducing a new "special" column?

I might be reading this wrong, so a diagram of the proposed changes might be helpful as well.

Thanks!!

@DoFabien
Author

Thanks for the quick review @HarelM!

Here are the answers to your questions:

1. MapLibre GL JS Integration

Good news: no changes are needed in MapLibre GL JS.
I tested this with maplibre-gl-js 5.15 and it works without any changes on that side.
The worker evaluates most filters based on feature properties before touching geometry. By updating the decoder dependency, geometry decoding is deferred and only happens if/when MapLibre actually requests geometry (e.g. for bucketing/rendering). If a tile/layer ends up not needing geometry (e.g. fully filtered out, or properties-only workflows), the geometry column is never decoded. Any code path that does access geometry behaves as before (decode-on-first-access, then cached).

2. Why DeferredGeometryColumn?

MLT decoding is stream-based. To move forward, we must consume bytes from the geometry stream.

  • Current: We consume bytes by decoding them into geometry objects, even if unused.
  • This PR: We consume bytes by skipping them (advance the offset).
  • Issue: Once skipped, the stream cannot be re-read unless we remember where the geometry lived.
  • Solution: DeferredGeometryColumn captures the offset/length and metadata so we can decode later, only if geometry is actually requested.

Note: DeferredGeometryColumn is an internal, in-memory abstraction. It does not change the MLT format or introduce any new column type. The FeatureTable public API stays the same.
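To make this concrete, here is a minimal sketch of the idea; the real class holds the same kind of context (tile, offset, metadata) and caches after the first decode, but its exact fields and decode wiring may differ.

// Minimal sketch of the DeferredGeometryColumn idea, not the actual code;
// the real class may wire up the decoder from geometryDecoder.ts differently.
type GeometryVectorLike = unknown; // stands in for the decoder's geometry vector type

class DeferredGeometryColumn {
  private cached: GeometryVectorLike | null = null;

  constructor(
    private readonly tile: Uint8Array,  // raw tile buffer
    private readonly offset: number,    // where the skipped geometry column starts
    private readonly metadata: unknown, // column/stream metadata needed to decode
    private readonly decode: (tile: Uint8Array, offset: number, metadata: unknown) => GeometryVectorLike
  ) {}

  // Decode on first access, then reuse the cached result.
  resolve(): GeometryVectorLike {
    if (this.cached === null) {
      this.cached = this.decode(this.tile, this.offset, this.metadata);
    }
    return this.cached;
  }
}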

3. Diagram

Current Behavior (Eager)

flowchart TB
    subgraph Decoder["MLT Decoder (decodeTile)"]
        A[Parse Tile] --> B[Decode ID + Properties]
        B --> C["Decode Geometry Column"]
        C --> D[Return FeatureTable]
    end

    subgraph Consumer["Consumer (e.g. maplibre-gl-js)"]
        D --> E{Need geometry?}
        E -->|Yes| F[Use geometry]
        E -->|No| G["Geometry wasted ❌"]
    end

    style C fill:#f99,stroke:#333,stroke-width:2px,color:black
    style G fill:#f99,stroke:#333,stroke-width:2px,color:black

New Behavior (Lazy - This PR)

flowchart TB
    subgraph Decoder["MLT Decoder (decodeTile)"]
        A[Parse Tile] --> B[Decode ID + Properties]
        B --> C["Skip Geometry Bytes<br/>+ Store offset in DeferredGeometryColumn"]
        C --> D[Return FeatureTable]
    end

    subgraph Consumer["Consumer (e.g. maplibre-gl-js)"]
        D --> E{"Access .geometryVector<br/>or iterate features?"}
        E -->|Yes| F["Decode Geometry<br/>(once, then cached)"]
        E -->|No| G["Geometry never decoded"]
    end

    style C fill:#bfb,stroke:#333,stroke-width:2px,color:black
    style G fill:#9f9,stroke:#333,stroke-width:2px,color:black

@HarelM
Collaborator

HarelM commented Dec 20, 2025

Thanks! I would be happy to get a class diagram with the relevant changes and better understand why another class is needed instead of using the current one.

Also, I believe maplibre-gl-js uses getFeatures, which means it parses the geometry and won't benefit from this PR, but I might be wrong, so please double-check whether this will require changes in maplibre-gl-js to benefit from this improvement.

Generally speaking, parsing the geometry on demand is a good idea; it is how MVT works as well.

@mactrem
Collaborator

mactrem commented Dec 21, 2025

Great work on implementing lazy geometry decoding!

Lazy decoding is crucial for staying competitive with MVT, since MVT's row-oriented layout makes it more straightforward to implement, and the MVT JS library leverages this advantage extensively.

General thoughts
Lazy decoding is actually often even more critical for attribute columns than geometry. For example, localized tiles (like Planetiler's default configuration) often contain 70+ name:* columns, but typically only 1-2 are needed for rendering. Therefore, we need a unified lazy decoding strategy that encompasses both geometry and attribute columns.

Approaches
While implementing the POC, I explored two main strategies:

  • Style-aware selective decoding: Evaluate the style upfront and decode only referenced FeatureTables and columns (via decodeTileLazy() in the POC). This allows decoding all relevant data in one pass but complicates style change handling.
  • Fully lazy decoding: Decode all columns lazily and materialize on-demand (similar to MVT JS library and your current geometry approach) via LazyVector.

For production use, I would now recommend the second approach, which aligns with your PR's direction.

Architecture considerations
We should maintain clear conceptual separation between:

  • Storage format: columns and streams (counter/length-based, sparse-encoded)
  • In-memory format/representation: vectors and buffers (offset-based, null-padded for random access)

Therefore, I recommend avoiding "column" terminology in the in-memory representation.

To extend your solution, we could consider implementing a LazyVector abstraction inspired by Velox (VLDB paper, implementation). In this approach, FeatureTables would initially contain only LazyVectors, which materialize into concrete vectors (e.g., GeometryVector, GPUVector) on first access. Alternatively, we could handle lazy materialization internally within the existing vector classes, without introducing an explicit LazyVector type.
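For illustration, one possible shape of such an abstraction (purely a sketch, not an API proposal; all names besides LazyVector are placeholders):

// Purely illustrative LazyVector-style wrapper; not an API proposal.
// The loader captures whatever storage-level context (buffer, offset,
// stream metadata) is needed to decode one column into an in-memory vector.
class LazyVector<T> {
  private materialized: T | null = null;

  constructor(private readonly loader: () => T) {}

  // Materialize the concrete vector (e.g. GeometryVector, GPUVector) on
  // first access and cache it for subsequent reads.
  get(): T {
    if (this.materialized === null) {
      this.materialized = this.loader();
    }
    return this.materialized;
  }
}

A FeatureTable would then initially hold only such wrappers and pay decoding cost per column that is actually touched.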

Path forward
I’m comfortable merging lazy geometry decoding in this iteration. However, I think we should document a clear plan for extending lazy decoding to attribute columns.

What’s your thinking on the architectural direction here?

@DoFabien
Author

Thanks @mactrem for the insightful feedback! I fully align with the architectural vision you described regarding LazyVector and separating storage from memory representation.

Context & Motivation

To give you a bit of background: I am currently working on migrating a heavy production workload from MVT to MLT. My specific use case hits the "trifecta" of MVT bottlenecks:

  1. High feature density (10k+ features per tile).
  2. Rich attributes (dozens of columns, similar to your "70+ name:*" example).
  3. Heavy client-side filtering (users filter data dynamically).

I really appreciate the incredible work done on the MLT specifications to date. I am convinced MLT is the superior format for this, but the current TS decoder implementation forces us to pay the CPU/GC cost for data we often filter out. I would like to help unlock MLT's potential for these scenarios.

Roadmap: The Step-by-Step Plan

I opted to split the work into atomic, reviewable PRs to facilitate the review process. Here is the path I am following:

  1. Step 1: Column-Level Lazy (This PR)

    • Goal: Stop decoding the geometry stream if it's not needed at all (e.g., filtered-out layers).
  2. Step 2: JS Lazy Materialization / Virtual Layer (Work In Progress)

    • Goal: Solve the JS allocation bottleneck. Currently, FeatureTable.getFeatures() creates 10,000 JS objects even if we only need one.
    • Implementation: I am working on a "Virtual Layer" approach (aligned with your LazyVector suggestion). It exposes a MapLibre-compatible interface (feature(index)) that materializes the Geometry object on-demand (Random Access) rather than eagerly iterating the whole table (a rough sketch follows after this list).
    • Analogy: The VirtualLayer I'm building is essentially a LazyVector (à la Velox) for MLT features: it holds references to the encoded buffers and only pays the materialization cost (decoding to JS objects) when the consumer explicitly requests the data at a specific index.
    • Status: I am actively working on this (it is my current local branch) and it is almost ready. I will open this PR immediately after this one is merged.
  3. Step 3: Lazy Attributes

    • Goal: Apply the same "on-demand materialization" pattern to Attribute columns.
    • Implementation: This will leverage the infrastructure built in Step 2 to allow accessing feature.properties.name without decoding the other 69 columns.
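For reference, here is a rough, provisional sketch of the Step 2 interface mentioned above; all names and types are placeholders and may change before that PR is opened.

// Rough, provisional sketch of the "Virtual Layer" idea from Step 2; the
// actual interface in the follow-up PR may differ. It roughly mirrors the
// shape MapLibre expects from a vector tile layer (length + feature(index)).
interface VirtualLayerLike {
  readonly length: number;             // number of features in the table
  feature(index: number): FeatureLike; // materialize a single feature on demand
}

interface FeatureLike {
  id?: number;
  properties: Record<string, unknown>;
  // Geometry is decoded lazily from the underlying MLT buffers when requested.
  loadGeometry(): Array<Array<{ x: number; y: number }>>;
}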

I think merging this PR as a foundational step would be a safe incremental approach. It introduces the concept of deferred decoding without breaking the existing object model, paving the way for the more advanced "Virtual Layer" in the next PR.

@mactrem
Collaborator

mactrem commented Dec 21, 2025

Thanks for the detailed proposal! Here's some background on MLT's architecture and where we're headed:

Background

MLT was designed from the ground up to address the performance limitations you mentioned by using an explicit in-memory format (as demonstrated, for example, in our ACM SIGSPATIAL paper and the MapLibre GL JS POC integration). Compared to MVT’s (feature based) in-memory representation, this (column-oriented) design enables:

  • Improved performance for memory-bound tasks such as decoding and filtering
  • Support for next-generation map renderers that offload computation to compute shaders (e.g., line/polygon tessellation and font shaping, which we are currently working on)

However, bringing the research-grade POC to production took longer than expected, so we adopted a pragmatic intermediate approach: translating MLT's in-memory format back to MVT's in-memory representation (e.g. via FeatureTable.getFeatures()). This approach was originally intended only for unit testing against MVT, not production use, which introduces several suboptimal performance characteristics—including the ones you've observed. This trade-off allowed us to introduce MLT to the community while we continued work on the production-grade implementation.

Ongoing Work

@Salkin975 and @Turtelll are currently working on bringing the research-grade POC into a production-grade implementation that operates directly on MLT's native vector format for filtering, line/polygon tessellation, and other operations. This introduces a column-oriented layout in MapLibre GL JS through additional buckets, overcoming the record-oriented limitations of the current MVT-based approach.

To get a first impression of the filtering implementation, see: https://github.com/mactrem/mlt-evaluation/tree/main/ts/src/vector. An early POC integration attempt is visible here (closed in favor of the unoptimized variant): maplibre/maplibre-gl-js#6567
@Salkin975 could you provide the branch with your current implementation state?

Path Forward
The proposed virtual layer concept could serve as an intermediate step to improve the current implementation. However, the mid/long-term direction is to transition fully to MLT's native column-oriented in-memory format, as this is essential for unlocking zero-copy techniques and other advanced optimizations.

Given this roadmap, I'd recommend not investing significant time optimizing the current FeatureTable.getFeatures() path or the Feature-based in-memory representation, as this approach will be replaced once the production-grade POC is ready to merge.

That said, if you've already completed the work for Steps 1 and 2, we'd be happy to merge it as an incremental improvement. However, for future efforts, I think it would make more sense to join forces on the long-term columnar implementation rather than further developing the intermediate abstraction layer.

@Salkin975
Contributor

Salkin975 commented Dec 21, 2025

Here's my current working branch: https://github.com/Salkin975/maplibre-tile-spec/tree/feature-reenable-columnar-buckets
I'm re-adding the columnar bucket compatibility that was removed from the first release to avoid dead code. Once I reach the POC stage, I'll create a merge request to restore this functionality to the source repository.
After that, I'll continue with implementing a refined/completed version of the POC columnar buckets in the maplibre-gl-js repository.
My test implementation for the columnar line bucket is available here:
https://github.com/Salkin975/maplibre-gl-js/tree/mlt-add-columnar-LineBucket

@DoFabien
Author

Thanks @mactrem and @Salkin975 for the transparent roadmap and the links! It clarifies a lot.

Understanding the Strategy

I now understand that the current FeatureTable / getFeatures() path is essentially a compatibility shim for the existing MVT-based pipeline, and that the endgame is Native Columnar Buckets.

My Proposal: The "Pragmatic Bridge"

While the native columnar implementation is clearly the superior long-term solution, migrating the entire ecosystem (and my production app) to it might take some time to be fully stable and released.

I have already completed the work for Step 2 (Virtual Layer), and it creates a more efficient "bridge" for anyone using MLT in the current MapLibre architecture. With that in mind:

  1. I propose we merge this PR (Step 1) and the upcoming Virtual Layer (Step 2).
    • Reason: It makes MLT more usable in production today by fixing the critical JS allocation bottleneck, serving as a robust stopgap until the native implementation lands.
  2. I will pivot on Step 3 (Lazy Properties).
    • Based on your advice, I will not invest additional time implementing Lazy Attributes on this intermediate layer.
    • That said, it's worth noting that the hardest work has been done in Step 1 and Step 2. The DeferredColumn pattern and the LazyGeometryCoordinatesResolver infrastructure established here would make implementing Lazy Properties relatively straightforward, essentially applying the same technique to string columns. The groundwork is in place if anyone wants to pick it up later.
    • Instead, once Step 2 is merged, I will shift my focus to testing and contributing to @Salkin975's feature-reenable-columnar-buckets branch. The work initiated there looks great, and I would be very happy to participate in the birth of this version which is going to be a real game changer for the ecosystem!

Does this sound like a reasonable plan to you?

@HarelM
Collaborator

HarelM commented Dec 22, 2025

Can you please add a class diagram so I'll have an easier time reviewing this?

@mactrem
Collaborator

mactrem commented Dec 22, 2025

@DoFabien Sounds good to me, I'm aligned with this direction. Thanks for your work on this!

