ExZarr implements the Zarr v2 specification for compatibility with other Zarr implementations, particularly Python's zarr-python library. This guide explains how to work with Zarr arrays across different languages and platforms.
- Overview
- Python Integration
- Data Type Compatibility
- Compression Compatibility
- Metadata Format
- File Structure
- Examples
- Testing Interoperability
- Troubleshooting
The Zarr v2 specification defines a standard format for storing chunked, compressed, N-dimensional arrays. ExZarr follows this specification to ensure arrays can be shared between:
- Python (zarr-python, dask, xarray)
- Julia (Zarr.jl)
- JavaScript (zarr.js)
- Java (N5-Zarr)
- C++ (xtensor-zarr)
- Rust (zarr-rs)
This allows scientific workflows to span multiple languages while maintaining a single data format.
# Create an array in Elixir
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
compressor: :zlib,
storage: :filesystem,
path: "/shared/data/experiment_1"
)
:ok = ExZarr.save(array, path: "/shared/data/experiment_1")

# Read in Python
import zarr
import numpy as np
z = zarr.open_array('/shared/data/experiment_1', mode='r')
print(z.shape) # (1000, 1000)
print(z.dtype) # float64
data = z[:] # Read entire array

# Create an array in Python
import zarr
import numpy as np
z = zarr.open_array(
'/shared/data/results',
mode='w',
shape=(500, 500),
chunks=(50, 50),
dtype='int32',
compressor=zarr.Zlib(level=5)
)
# Fill with data
z[:, :] = np.random.randint(0, 100, size=(500, 500))

# Read in Elixir
{:ok, array} = ExZarr.open(path: "/shared/data/results")
IO.inspect(array.shape) # {500, 500}
IO.inspect(array.dtype) # :int32
IO.inspect(array.compressor) # :zlib

ExZarr supports the following ten Zarr v2 numeric data types, each compatible in both directions with NumPy:
| ExZarr Type | Python numpy | Bytes | Description |
|---|---|---|---|
| :int8 | int8 | 1 | 8-bit signed integer |
| :int16 | int16 | 2 | 16-bit signed integer |
| :int32 | int32 | 4 | 32-bit signed integer |
| :int64 | int64 | 8 | 64-bit signed integer |
| :uint8 | uint8 | 1 | 8-bit unsigned integer |
| :uint16 | uint16 | 2 | 16-bit unsigned integer |
| :uint32 | uint32 | 4 | 32-bit unsigned integer |
| :uint64 | uint64 | 8 | 64-bit unsigned integer |
| :float32 | float32 | 4 | 32-bit floating point |
| :float64 | float64 | 8 | 64-bit floating point |
Zarr uses byte order prefixes in metadata:
- <: Little-endian (most common)
- >: Big-endian
- |: Native/not applicable (for single-byte types)
ExZarr automatically handles all byte order formats when reading arrays.
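
To make the byte order prefixes concrete, here is a minimal sketch using only Python's standard `struct` module (it does not use ExZarr or zarr-python). It packs the same 32-bit integer in both byte orders and shows what a mismatch looks like when big-endian bytes are read as little-endian.

```python
import struct

value = 1

# "<i4": 32-bit signed integer, little-endian (least significant byte first)
little = struct.pack("<i", value)
# ">i4": the same integer, big-endian (most significant byte first)
big = struct.pack(">i", value)

print(little.hex())  # 01000000
print(big.hex())     # 00000001

# Reading big-endian bytes as little-endian yields a wrong value,
# which is how a byte order mismatch shows up in practice.
misread = struct.unpack("<i", big)[0]
print(misread)  # 16777216
```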
The :zlib compressor is fully compatible across all implementations:
# ExZarr
{:ok, array} = ExZarr.create(
shape: {1000},
chunks: {100},
dtype: :int32,
compressor: :zlib, # Fully compatible
storage: :filesystem,
path: "/data/compressed"
)

# Python
import zarr
z = zarr.open_array(
'/data/compressed',
mode='w',
shape=(1000,),
chunks=(100,),
dtype='int32',
compressor=zarr.Zlib(level=5) # Compatible
)

Arrays with :none compressor are also fully compatible:
# ExZarr - no compression
compressor: :none

# Python - no compression
compressor=None

- :zstd and :lz4 currently fall back to :zlib in ExZarr
- For maximum compatibility, use :zlib or :none
- Future versions will support native zstd and lz4
Zarr arrays store metadata in a .zarray JSON file. ExZarr follows this format exactly:
{
"zarr_format": 2,
"shape": [1000, 1000],
"chunks": [100, 100],
"dtype": "<f8",
"compressor": {
"id": "zlib",
"level": 5
},
"fill_value": 0.0,
"order": "C",
"filters": null
}

Data types are encoded as strings with byte order prefix and size:
| Zarr String | ExZarr Type | Description |
|---|---|---|
| <i1 or \|i1 | :int8 | 8-bit signed int |
| <i4 | :int32 | 32-bit signed int (little-endian) |
| <u2 | :uint16 | 16-bit unsigned int (little-endian) |
| <f8 | :float64 | 64-bit float (little-endian) |
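
The encoding in the table can be read mechanically: one byte order character, one kind character, and a size in bytes. The following sketch splits a dtype string into those parts. `parse_dtype` and its return format are hypothetical names chosen for illustration, not part of ExZarr or zarr-python, and only the integer and float kinds from the table are handled.

```python
import re

# Kind codes from the table, mapped to the prefixes of the ExZarr type names.
KINDS = {"i": "int", "u": "uint", "f": "float"}
BYTE_ORDERS = {"<": "little", ">": "big", "|": "not applicable"}

def parse_dtype(dtype: str) -> dict:
    """Split a Zarr v2 dtype string such as '<f8' into its parts."""
    match = re.fullmatch(r"([<>|])([iuf])(\d+)", dtype)
    if match is None:
        raise ValueError(f"unsupported dtype string: {dtype!r}")
    order, kind, size = match.group(1), match.group(2), int(match.group(3))
    return {
        "byte_order": BYTE_ORDERS[order],
        "name": f"{KINDS[kind]}{size * 8}",  # size is in bytes, names use bits
        "itemsize": size,
    }

print(parse_dtype("<f8"))
print(parse_dtype("|i1"))
```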
Compressors are encoded as objects with codec ID:
{
"id": "zlib",
"level": 5
}

For no compression: "compressor": null
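
As an illustration of how a reader might act on the compressor entry, the sketch below decodes one chunk using only Python's standard library. It assumes the zlib codec stores a standard zlib stream, as produced by Python's `zlib` module. `decode_chunk` is a hypothetical helper, not an ExZarr or zarr-python function.

```python
import struct
import zlib

def decode_chunk(raw: bytes, compressor) -> bytes:
    """Undo the compression described by the "compressor" entry of .zarray."""
    if compressor is None:
        return raw  # "compressor": null means chunks are stored as-is
    if compressor["id"] == "zlib":
        return zlib.decompress(raw)
    raise ValueError(f"unsupported codec: {compressor['id']}")

# Build a small chunk of four little-endian int32 values ("<i4").
values = [1, 2, 3, 4]
chunk_bytes = struct.pack("<4i", *values)

# Encode it the way the metadata {"id": "zlib", "level": 5} describes.
metadata = {"id": "zlib", "level": 5}
stored = zlib.compress(chunk_bytes, metadata["level"])

decoded = decode_chunk(stored, metadata)
print(struct.unpack("<4i", decoded))  # (1, 2, 3, 4)
```

The final branch raises the same kind of "unsupported codec" error a reader reports when the metadata names a compressor it does not implement.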
Zarr arrays on disk follow a standard directory structure:
my_array/
├── .zarray # Metadata JSON file
├── 0.0 # Chunk at index (0, 0)
├── 0.1 # Chunk at index (0, 1)
├── 1.0 # Chunk at index (1, 0)
└── 1.1 # Chunk at index (1, 1)
Hierarchical groups use .zgroup files:
my_group/
├── .zgroup # Group metadata
├── array1/
│ ├── .zarray
│ ├── 0.0
│ └── 0.1
└── subgroup/
├── .zgroup
└── array2/
├── .zarray
└── 0.0
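
Because groups and arrays are distinguished only by their marker files, a hierarchy can be inspected with a plain directory walk. The sketch below rebuilds the layout above with empty marker files in a temporary directory and classifies each directory. `classify_nodes` is a hypothetical helper written for this guide.

```python
import os
import tempfile

def classify_nodes(root: str) -> dict:
    """Map each directory under root (relative path) to 'group' or 'array'."""
    nodes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if ".zarray" in filenames:
            nodes[rel] = "array"
        elif ".zgroup" in filenames:
            nodes[rel] = "group"
    return nodes

# Recreate the layout above with empty marker files (no real data).
with tempfile.TemporaryDirectory() as root:
    for rel, marker in [
        ("", ".zgroup"),
        ("array1", ".zarray"),
        ("subgroup", ".zgroup"),
        (os.path.join("subgroup", "array2"), ".zarray"),
    ]:
        directory = os.path.join(root, rel)
        os.makedirs(directory, exist_ok=True)
        open(os.path.join(directory, marker), "w").close()

    layout = classify_nodes(root)

# "." is the root group itself.
for path in sorted(layout):
    print(path, layout[path])
```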
Chunks are named using dot notation:
- 0: 1D chunk at index 0
- 0.0: 2D chunk at index (0, 0)
- 0.0.0: 3D chunk at index (0, 0, 0)
This is consistent across all Zarr implementations.
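
The naming rule can be sketched in a few lines: divide each dimension by its chunk length, rounding up, and join every index combination with dots. `chunk_keys` is a hypothetical helper for illustration and assumes the dot separator described above.

```python
import itertools
import math

def chunk_keys(shape, chunks):
    """List the chunk file names for an array, in row-major order."""
    # Number of chunks along each dimension, rounding up for partial chunks.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return [
        ".".join(str(i) for i in index)
        for index in itertools.product(*(range(n) for n in grid))
    ]

print(chunk_keys((200, 200), (100, 100)))  # ['0.0', '0.1', '1.0', '1.1']
print(chunk_keys((250,), (100,)))          # ['0', '1', '2']
```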
Process data in Python, analyze in Elixir:
# Python: Generate and save experimental data
import zarr
import numpy as np
# Create large dataset
z = zarr.open_array(
'/data/experiment/raw',
mode='w',
shape=(10000, 10000),
chunks=(1000, 1000),
dtype='float32',
compressor=zarr.Zlib(level=5)
)
# Simulate experimental data
z[:, :] = np.random.normal(loc=0, scale=1, size=(10000, 10000))

# Elixir: Load and analyze
{:ok, array} = ExZarr.open(path: "/data/experiment/raw")
# Process in Elixir
IO.puts "Dataset shape: #{inspect(array.shape)}"
IO.puts "Data type: #{array.dtype}"
IO.puts "Total elements: #{ExZarr.Array.size(array)}"
IO.puts "Memory per element: #{ExZarr.Array.itemsize(array)} bytes"
# Could process chunks in parallel using Flow or Broadway

# Step 1: Data collection (Python)
import zarr
import numpy as np
data = zarr.open_array(
'/shared/pipeline/step1',
mode='w',
shape=(5000, 100),
chunks=(500, 100),
dtype='int32'
)
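# collect_sensor_data() is a placeholder for your own acquisition code;
# it must return an array of shape (5000, 100)
data[:, :] = collect_sensor_data()

# Step 2: Data validation (Elixir)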
{:ok, input} = ExZarr.open(path: "/shared/pipeline/step1")
# Validate and create cleaned output
{:ok, output} = ExZarr.create(
shape: input.shape,
chunks: input.chunks,
dtype: input.dtype,
compressor: :zlib,
storage: :filesystem,
path: "/shared/pipeline/step2"
)
# Process and save validated data
:ok = ExZarr.save(output, path: "/shared/pipeline/step2")

# Step 3: Machine learning (Python)
import zarr
cleaned = zarr.open_array('/shared/pipeline/step2', mode='r')
# Train model on cleaned data

Run the included demo script:
elixir examples/python_interop_demo.exs

This demonstrates:
- Creating a 10×10 array with ExZarr
- Reading it with zarr-python
- Creating a 20×20 array with zarr-python
- Reading it with ExZarr
ExZarr includes comprehensive integration tests:
# Setup Python environment (one time)
./test/support/setup_python_tests.sh
# Run integration tests
mix test test/ex_zarr_python_integration_test.exs

The integration tests ensure:
- Bidirectional Compatibility
  - ExZarr → Python: Arrays created by ExZarr are readable by Python
  - Python → ExZarr: Arrays created by Python are readable by ExZarr
- Data Type Coverage
  - All 10 data types tested in both directions
  - 1D, 2D, and 3D arrays
  - Various chunk sizes
- Metadata Correctness
  - Shape, chunks, dtype preserved
  - Fill values maintained
  - Compressor settings retained
- Compression
  - Zlib compression/decompression works across implementations
  - No compression mode compatible
Create test arrays to verify compatibility:
# Create test array with ExZarr
{:ok, array} = ExZarr.create(
shape: {100, 100},
chunks: {10, 10},
dtype: :float64,
compressor: :zlib,
storage: :filesystem,
path: "/tmp/test_array"
)
:ok = ExZarr.save(array, path: "/tmp/test_array")

# Verify with Python
import zarr
z = zarr.open_array('/tmp/test_array', mode='r')
print(f"Shape: {z.shape}")
print(f"Dtype: {z.dtype}")
print(f"Chunks: {z.chunks}")
assert z.shape == (100, 100)

Symptom: Python or ExZarr cannot open an array created by the other
Solutions:
- Verify .zarray file exists
- Check file permissions
- Ensure Zarr v2 format (not v3)
- Validate JSON in .zarray is well-formed
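
The metadata checks above can be automated. The sketch below parses the text of a .zarray file and reports problems. `check_zarray` is a hypothetical helper, and the required keys are the eight fields shown in the metadata example earlier in this guide.

```python
import json

REQUIRED_KEYS = {
    "zarr_format", "shape", "chunks", "dtype",
    "compressor", "fill_value", "order", "filters",
}

def check_zarray(text: str) -> list:
    """Return a list of problems found in the contents of a .zarray file."""
    try:
        meta = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(meta, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if meta.get("zarr_format") != 2:
        problems.append(f"expected zarr_format 2, got {meta.get('zarr_format')!r}")
    if len(meta.get("shape", [])) != len(meta.get("chunks", [])):
        problems.append("shape and chunks have different lengths")
    return problems

good = json.dumps({
    "zarr_format": 2, "shape": [1000, 1000], "chunks": [100, 100],
    "dtype": "<f8", "compressor": {"id": "zlib", "level": 5},
    "fill_value": 0.0, "order": "C", "filters": None,
})
print(check_zarray(good))  # []
print(check_zarray('{"zarr_format": 3}'))
```

In practice the text would come from reading the .zarray file in the array directory.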
Symptom: Data appears corrupted or has wrong type
Solutions:
- Ensure consistent byte order (little-endian is default)
- Check dtype string in .zarray matches expected format
- Verify both implementations use Zarr v2 specification
Symptom: "Decompression failed" or "Unsupported codec"
Solutions:
- Use :zlib or :none for maximum compatibility
- Ensure zarr-python version is 2.x, not 3.x
- Check compressor configuration in .zarray
Symptom: "Chunk not found" errors
Solutions:
- Verify chunk files use dot notation (e.g., 0.0)
- Check directory permissions
- Ensure chunks were actually written
- Confirm path is correct
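
To see which chunks were actually written, the expected file names can be compared with the directory contents. The sketch below does this with the standard library only. `missing_chunks` is a hypothetical helper and assumes dot-separated chunk names. In the Zarr v2 format an absent chunk is not necessarily an error, since readers treat uninitialized chunks as filled with fill_value, so the result is a hint about where to look.

```python
import itertools
import json
import math
import os
import tempfile

def missing_chunks(array_dir: str) -> list:
    """Return expected chunk file names that are absent from array_dir."""
    with open(os.path.join(array_dir, ".zarray")) as f:
        meta = json.load(f)
    # Number of chunks along each dimension, rounding up for partial chunks.
    grid = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
    expected = (
        ".".join(str(i) for i in index)
        for index in itertools.product(*(range(n) for n in grid))
    )
    return [k for k in expected if not os.path.exists(os.path.join(array_dir, k))]

# Hypothetical 2x2 chunk grid in which only two chunks were written.
with tempfile.TemporaryDirectory() as array_dir:
    with open(os.path.join(array_dir, ".zarray"), "w") as f:
        json.dump({"shape": [200, 200], "chunks": [100, 100]}, f)
    for name in ("0.0", "1.1"):
        open(os.path.join(array_dir, name), "wb").close()
    missing = missing_chunks(array_dir)

print(missing)  # ['0.1', '1.0']
```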
- Inspect Metadata
  cat /path/to/array/.zarray | jq .
- List Chunk Files
  ls -la /path/to/array/
- Verify with Python
  import zarr
  z = zarr.open_array('/path/to/array', mode='r')
  print(z.info)
- Check ExZarr Metadata
  {:ok, array} = ExZarr.open(path: "/path/to/array")
  IO.inspect(array.metadata, pretty: true)
If you encounter compatibility issues:
- Verify you're using Zarr v2 (not v3)
- Check Python zarr version: python3 -c "import zarr; print(zarr.__version__)"
- Run integration tests: mix test test/ex_zarr_python_integration_test.exs
- Provide:
  - ExZarr version
  - Python zarr version
  - .zarray file contents
  - Error messages
  - Minimal reproduction steps
For maximum compatibility, use zlib:
compressor: :zlib # Best compatibility

Add attributes to groups for documentation:
# Python
import zarr
root = zarr.open_group('/data/experiment', mode='w')
root.attrs['description'] = 'Temperature measurements'
root.attrs['created'] = '2026-01-22'

Choose chunk sizes that work well for both reading and writing:
# Good: Balanced chunks
chunks: {100, 100} # 10,000 elements per chunk
# Avoid: Too small (too many files)
chunks: {10, 10} # Only 100 elements per chunk
# Avoid: Too large (memory intensive)
chunks: {10000, 10000} # 100 million elements per chunk

Always test that arrays can be read by both implementations:
# Create with ExZarr
iex> {:ok, array} = ExZarr.create(...)
iex> :ok = ExZarr.save(array, path: "/test")
# Verify with Python
$ python3 -c "import zarr; z = zarr.open_array('/test', mode='r'); print(z.info)"

ExZarr implements Zarr v2. Ensure Python uses v2 format:
# Ensure Zarr v2
import zarr
assert zarr.__version__.startswith('2.')

- Zarr Specification v2
- zarr-python Documentation
- ExZarr Documentation
- Integration Tests
- Python Helper Script
ExZarr provides Zarr v2 compatibility for the data types and compressors described above, enabling data exchange with Python and other Zarr implementations. The integration tests verify this across the ten supported data types, zlib and uncompressed storage, and 1D, 2D, and 3D arrays. By following the Zarr specification and the best practices above, you can build multi-language scientific computing pipelines on a single data format.