Zarr v2 Interoperability Guide

ExZarr implements the Zarr v2 specification for compatibility with other Zarr implementations, particularly Python's zarr-python library. This guide explains how to work with Zarr arrays across different languages and platforms.

Overview
Python Integration
Data Type Compatibility
Compression Compatibility
Metadata Format
File Structure
Examples
Testing Interoperability
Troubleshooting

Overview

The Zarr v2 specification defines a standard format for storing chunked, compressed, N-dimensional arrays. ExZarr follows this specification to ensure arrays can be shared between:

Python (zarr-python, dask, xarray)
Julia (Zarr.jl)
JavaScript (zarr.js)
Java (N5-Zarr)
C++ (xtensor-zarr)
Rust (zarr-rs)

This allows scientific workflows to span multiple languages while maintaining a single data format.

Python Integration

Creating Arrays with ExZarr for Python

# Create an array in Elixir
{:ok, array} = ExZarr.create(
  shape: {1000, 1000},
  chunks: {100, 100},
  dtype: :float64,
  compressor: :zlib,
  storage: :filesystem,
  path: "/shared/data/experiment_1"
)

:ok = ExZarr.save(array, path: "/shared/data/experiment_1")

# Read in Python
import zarr
import numpy as np

z = zarr.open_array('/shared/data/experiment_1', mode='r')
print(z.shape)   # (1000, 1000)
print(z.dtype)   # float64
data = z[:]      # Read entire array

Creating Arrays with Python for ExZarr

# Create an array in Python
import zarr
import numpy as np

z = zarr.open_array(
    '/shared/data/results',
    mode='w',
    shape=(500, 500),
    chunks=(50, 50),
    dtype='int32',
    compressor=zarr.Zlib(level=5)
)

# Fill with data
z[:, :] = np.random.randint(0, 100, size=(500, 500))

# Read in Elixir
{:ok, array} = ExZarr.open(path: "/shared/data/results")
IO.inspect(array.shape)      # {500, 500}
IO.inspect(array.dtype)      # :int32
IO.inspect(array.compressor) # :zlib

Data Type Compatibility

ExZarr supports all standard Zarr v2 data types with full bidirectional compatibility:

ExZarr Type	Python numpy	Bytes	Description
`:int8`	`int8`	1	8-bit signed integer
`:int16`	`int16`	2	16-bit signed integer
`:int32`	`int32`	4	32-bit signed integer
`:int64`	`int64`	8	64-bit signed integer
`:uint8`	`uint8`	1	8-bit unsigned integer
`:uint16`	`uint16`	2	16-bit unsigned integer
`:uint32`	`uint32`	4	32-bit unsigned integer
`:uint64`	`uint64`	8	64-bit unsigned integer
`:float32`	`float32`	4	32-bit floating point
`:float64`	`float64`	8	64-bit floating point

Byte Order

Zarr uses byte order prefixes in metadata:

< - Little-endian (most common)
> - Big-endian
| - Native/not applicable (for single-byte types)

ExZarr automatically handles all byte order formats when reading arrays.

Compression Compatibility

Zlib (Recommended)

The :zlib compressor is fully compatible across all implementations:

# ExZarr
{:ok, array} = ExZarr.create(
  shape: {1000},
  chunks: {100},
  dtype: :int32,
  compressor: :zlib,  # Fully compatible
  storage: :filesystem,
  path: "/data/compressed"
)

# Python
import zarr

z = zarr.open_array(
    '/data/compressed',
    mode='w',
    shape=(1000,),
    chunks=(100,),
    dtype='int32',
    compressor=zarr.Zlib(level=5)  # Compatible
)

No Compression

Arrays with :none compressor are also fully compatible:

# ExZarr - no compression
compressor: :none

# Python - no compression
compressor=None

Other Compressors

:zstd and :lz4 currently fall back to :zlib in ExZarr
For maximum compatibility, use :zlib or :none
Future versions will support native zstd and lz4

Metadata Format

Zarr arrays store metadata in a .zarray JSON file. ExZarr follows this format exactly:

{
  "zarr_format": 2,
  "shape": [1000, 1000],
  "chunks": [100, 100],
  "dtype": "<f8",
  "compressor": {
    "id": "zlib",
    "level": 5
  },
  "fill_value": 0.0,
  "order": "C",
  "filters": null
}

Dtype Encoding

Data types are encoded as strings with byte order prefix and size:

Zarr String	ExZarr Type	Description
`<i1` or `	i1`	`:int8`
`<i4`	`:int32`	32-bit signed int (little-endian)
`<u2`	`:uint16`	16-bit unsigned int (little-endian)
`<f8`	`:float64`	64-bit float (little-endian)

Compressor Encoding

Compressors are encoded as objects with codec ID:

{
  "id": "zlib",
  "level": 5
}

For no compression: "compressor": null

File Structure

Zarr arrays on disk follow a standard directory structure:

my_array/
├── .zarray          # Metadata JSON file
├── 0.0              # Chunk at index (0, 0)
├── 0.1              # Chunk at index (0, 1)
├── 1.0              # Chunk at index (1, 0)
└── 1.1              # Chunk at index (1, 1)

Groups

Hierarchical groups use .zgroup files:

my_group/
├── .zgroup          # Group metadata
├── array1/
│   ├── .zarray
│   ├── 0.0
│   └── 0.1
└── subgroup/
    ├── .zgroup
    └── array2/
        ├── .zarray
        └── 0.0

Chunk Naming

Chunks are named using dot notation:

0 - 1D chunk at index 0
0.0 - 2D chunk at index (0, 0)
0.0.0 - 3D chunk at index (0, 0, 0)

This is consistent across all Zarr implementations.

Examples

Example 1: Scientific Data Pipeline

Process data in Python, analyze in Elixir:

# Python: Generate and save experimental data
import zarr
import numpy as np

# Create large dataset
z = zarr.open_array(
    '/data/experiment/raw',
    mode='w',
    shape=(10000, 10000),
    chunks=(1000, 1000),
    dtype='float32',
    compressor=zarr.Zlib(level=5)
)

# Simulate experimental data
z[:, :] = np.random.normal(loc=0, scale=1, size=(10000, 10000))

# Elixir: Load and analyze
{:ok, array} = ExZarr.open(path: "/data/experiment/raw")

# Process in Elixir
IO.puts "Dataset shape: #{inspect(array.shape)}"
IO.puts "Data type: #{array.dtype}"
IO.puts "Total elements: #{ExZarr.Array.size(array)}"
IO.puts "Memory per element: #{ExZarr.Array.itemsize(array)} bytes"

# Could process chunks in parallel using Flow or Broadway

Example 2: Multi-Language Workflow

# Step 1: Data collection (Python)
import zarr
import numpy as np

data = zarr.open_array(
    '/shared/pipeline/step1',
    mode='w',
    shape=(5000, 100),
    chunks=(500, 100),
    dtype='int32'
)
data[:, :] = collect_sensor_data()

# Step 2: Data validation (Elixir)
{:ok, input} = ExZarr.open(path: "/shared/pipeline/step1")

# Validate and create cleaned output
{:ok, output} = ExZarr.create(
  shape: input.shape,
  chunks: input.chunks,
  dtype: input.dtype,
  compressor: :zlib,
  storage: :filesystem,
  path: "/shared/pipeline/step2"
)

# Process and save validated data
:ok = ExZarr.save(output, path: "/shared/pipeline/step2")

# Step 3: Machine learning (Python)
import zarr

cleaned = zarr.open_array('/shared/pipeline/step2', mode='r')
# Train model on cleaned data

Example 3: Interactive Demo

Run the included demo script:

elixir examples/python_interop_demo.exs

This demonstrates:

Creating a 10×10 array with ExZarr
Reading it with zarr-python
Creating a 20×20 array with zarr-python
Reading it with ExZarr

Testing Interoperability

Running Integration Tests

ExZarr includes comprehensive integration tests:

# Setup Python environment (one time)
./test/support/setup_python_tests.sh

# Run integration tests
mix test test/ex_zarr_python_integration_test.exs

What the Tests Verify

The integration tests ensure:

Bidirectional Compatibility
- ExZarr → Python: Arrays created by ExZarr are readable by Python
- Python → ExZarr: Arrays created by Python are readable by ExZarr
Data Type Coverage
- All 10 data types tested in both directions
- 1D, 2D, and 3D arrays
- Various chunk sizes
Metadata Correctness
- Shape, chunks, dtype preserved
- Fill values maintained
- Compressor settings retained
Compression
- Zlib compression/decompression works across implementations
- No compression mode compatible

Manual Testing

Create test arrays to verify compatibility:

# Create test array with ExZarr
{:ok, array} = ExZarr.create(
  shape: {100, 100},
  chunks: {10, 10},
  dtype: :float64,
  compressor: :zlib,
  storage: :filesystem,
  path: "/tmp/test_array"
)
:ok = ExZarr.save(array, path: "/tmp/test_array")

# Verify with Python
import zarr

z = zarr.open_array('/tmp/test_array', mode='r')
print(f"Shape: {z.shape}")
print(f"Dtype: {z.dtype}")
print(f"Chunks: {z.chunks}")
assert z.shape == (100, 100)

Troubleshooting

Common Issues

1. "Cannot open array" Error

Symptom: Python or ExZarr cannot open an array created by the other

Solutions:

Verify .zarray file exists
Check file permissions
Ensure Zarr v2 format (not v3)
Validate JSON in .zarray is well-formed

2. Dtype Mismatch

Symptom: Data appears corrupted or has wrong type

Solutions:

Ensure consistent byte order (little-endian is default)
Check dtype string in .zarray matches expected format
Verify both implementations use Zarr v2 specification

3. Compression Errors

Symptom: "Decompression failed" or "Unsupported codec"

Solutions:

Use :zlib or :none for maximum compatibility
Ensure zarr-python version is 2.x, not 3.x
Check compressor configuration in .zarray

4. Chunk Files Missing

Symptom: "Chunk not found" errors

Solutions:

Verify chunk files use dot notation (e.g., 0.0)
Check directory permissions
Ensure chunks were actually written
Confirm path is correct

Debugging Tips

Inspect Metadata
```
cat /path/to/array/.zarray | jq
```
List Chunk Files
```
ls -la /path/to/array/
```

Verify with Python

import zarr
z = zarr.open_array('/path/to/array', mode='r')
print(z.info)

Check ExZarr Metadata

{:ok, array} = ExZarr.open(path: "/path/to/array")
IO.inspect(array.metadata, pretty: true)

Reporting Issues

If you encounter compatibility issues:

Verify you're using Zarr v2 (not v3)
Check Python zarr version: python3 -c "import zarr; print(zarr.__version__)"
Run integration tests: mix test test/ex_zarr_python_integration_test.exs
Provide:
- ExZarr version
- Python zarr version
- .zarray file contents
- Error messages
- Minimal reproduction steps

Best Practices

1. Use Standard Compression

For maximum compatibility, use zlib:

compressor: :zlib  # Best compatibility

2. Document Metadata

Add attributes to groups for documentation:

# Python
import zarr
root = zarr.open_group('/data/experiment', mode='w')
root.attrs['description'] = 'Temperature measurements'
root.attrs['created'] = '2026-01-22'

3. Consistent Chunking

Choose chunk sizes that work well for both reading and writing:

# Good: Balanced chunks
chunks: {100, 100}  # 10,000 elements per chunk

# Avoid: Too small (too many files)
chunks: {10, 10}    # Only 100 elements per chunk

# Avoid: Too large (memory intensive)
chunks: {10000, 10000}  # 100 million elements per chunk

4. Test Compatibility

Always test that arrays can be read by both implementations:

# Create with ExZarr
iex> {:ok, array} = ExZarr.create(...)
iex> :ok = ExZarr.save(array, path: "/test")

# Verify with Python
$ python3 -c "import zarr; z = zarr.open_array('/test', mode='r'); print(z.info)"

5. Use Zarr v2

ExZarr implements Zarr v2. Ensure Python uses v2 format:

# Ensure Zarr v2
import zarr
assert zarr.__version__.startswith('2.')

Resources

Summary

ExZarr provides full Zarr v2 compatibility, enabling seamless data exchange with Python and other Zarr implementations. The integration tests verify this compatibility across all data types, compression methods, and array dimensions. By following the Zarr specification and best practices, you can build multi-language scientific computing pipelines with confidence.

FilesExpand file tree

INTEROPERABILITY.md

Latest commit

History

INTEROPERABILITY.md

File metadata and controls

Zarr v2 Interoperability Guide

Table of Contents

Overview

Python Integration

Creating Arrays with ExZarr for Python

Creating Arrays with Python for ExZarr

Data Type Compatibility

Byte Order

Compression Compatibility

Zlib (Recommended)

No Compression

Other Compressors

Metadata Format

Dtype Encoding

Compressor Encoding

File Structure

Groups

Chunk Naming

Examples

Example 1: Scientific Data Pipeline

Example 2: Multi-Language Workflow

Example 3: Interactive Demo

Testing Interoperability

Running Integration Tests

What the Tests Verify

Manual Testing

Troubleshooting

Common Issues

1. "Cannot open array" Error

2. Dtype Mismatch

3. Compression Errors

4. Chunk Files Missing

Debugging Tips

Reporting Issues

Best Practices

1. Use Standard Compression

2. Document Metadata

3. Consistent Chunking

4. Test Compatibility

5. Use Zarr v2

Resources

Summary