Important
pyopenxlsx uses OpenXLSX-NX (v1.0.0+), a specialized C++ fork that includes critical performance optimizations and functional enhancements (such as agile encryption, streaming I/O, vector shapes, threaded comments, and custom properties) not currently available in the upstream repository.
pyopenxlsx is a high-performance Python binding for the OpenXLSX-NX C++ library. It aims to provide significantly faster read/write speeds compared to pure Python libraries like openpyxl, while maintaining a Pythonic API design.
- High Performance: Powered by the modern C++17 OpenXLSX-NX library.
- Pythonic API: Intuitive interface with properties, iterators, and context managers.
- Streaming I/O: Bypass the DOM entirely with `XLStreamWriter` and `XLStreamReader` for memory-efficient bulk data processing.
- Security: Full support for ECMA-376 Standard and Agile Encryption (read/write password-protected files) and granular worksheet protection.
- Async Support: `async`/`await` support for key I/O operations.
- Rich Styling: Comprehensive support for fonts, fills, borders, alignments, and number formats.
- Extended Metadata: Support for both standard and custom document properties.
- Advanced Content: Support for images, vector shapes, hyperlinks (external/internal), and modern threaded comments.
- Memory Safety: Combines C++ efficiency with Python's automatic memory management.
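To make "bypass the DOM" concrete, here is a minimal, library-independent sketch of the streaming idea: each row is serialized and written out as soon as it is produced, so no in-memory tree of the whole sheet is ever built. The `stream_rows` helper and its simplified XML layout are assumptions made for illustration; this is not how `XLStreamWriter` is implemented.

```python
import io
from xml.sax.saxutils import escape


def stream_rows(out, rows):
    """Write rows as simplified SpreadsheetML-style XML, one row at a time.

    Hypothetical helper: only the current row is held in memory.
    """
    out.write("<sheetData>")
    for r, row in enumerate(rows, start=1):
        out.write(f'<row r="{r}">')
        for value in row:
            if isinstance(value, (int, float)):
                # Numeric cells carry their value directly.
                out.write(f"<c><v>{value}</v></c>")
            else:
                # Text cells are written inline and XML-escaped.
                out.write(f'<c t="inlineStr"><is><t>{escape(str(value))}</t></is></c>')
        out.write("</row>")
    out.write("</sheetData>")


# A generator feeds the writer, so the full dataset never exists as a list.
buffer = io.StringIO()
rows = ([i, f"item-{i}"] for i in range(3))
stream_rows(buffer, rows)
xml_text = buffer.getvalue()
```

In a real writer the `out` target would be a file or archive entry rather than an in-memory buffer.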
| Component | Technology |
|---|---|
| C++ Core | OpenXLSX-NX |
| Bindings | nanobind |
| Build System | scikit-build-core & CMake |
While openpyxl is a great pure-Python library, pyopenxlsx is designed to solve critical performance bottlenecks and add modern enterprise features by leveraging a C++ engine.
| Feature / Capability | pyopenxlsx (OpenXLSX-NX) | openpyxl | Notes |
|---|---|---|---|
| Underlying Engine | C++17 (nanobind wrapped) | Pure Python | pyopenxlsx is heavily optimized for low-level memory management. |
| Execution Speed | Extremely Fast (Up to 160x) | Slower | Pure Python loop overhead makes parsing large files sluggish. |
| Memory Footprint | Minimal (C++ Memory Mapping) | High | Parsing large files in openpyxl often leads to OOM errors. |
| Asyncio Support | ✅ Native (`await load_workbook_async`) | ❌ No | pyopenxlsx offloads heavy I/O to a threadpool, perfect for Web APIs (FastAPI/Django). |
| Agile Encryption (Passwords) | ✅ Native Read & Write | ❌ No | openpyxl cannot read/write password-protected .xlsx files without 3rd-party decryption tools. |
| Threaded Comments | ✅ Full Support (Conversations/Replies) | ❌ No / Can be lost | pyopenxlsx supports modern Excel conversational comments and resolution states. |
| Vector Shapes | ✅ Native Support (20+ Shapes) | ❌ No | Draw complex vector shapes (Arrows, Flowcharts, etc.) directly. |
| Formula Evaluation | ✅ Built-in C++ Engine | ❌ No | pyopenxlsx can statically evaluate simple formulas without Excel installed. |
| Streaming I/O | ✅ Direct to disk with Styles | | pyopenxlsx can stream styled data directly to the archive, bypassing the DOM. |
| Granular Sheet Protection | ✅ Deep Control (20+ specific flags) | ✅ Yes | pyopenxlsx exposes extensive ECMA-376 locking options. |
| Styles Architecture | ✅ Declarative (Index-based) | | pyopenxlsx reuses style indices, saving massive amounts of memory on huge datasets. |
| Charts | | ✅ Highly Advanced | openpyxl currently has more mature support for extremely complex/3D charts. |
| Environment | Pre-compiled Wheels required | Any Python env | pyopenxlsx provides wheels for major OS/Architectures via CI. |
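The "Styles Architecture" row describes index-based style reuse. As a rough illustration of why that saves memory, the sketch below interns each distinct style once and lets every cell store only a small integer. `StyleRegistry` and `intern` are hypothetical names invented for this example; they are not part of pyopenxlsx, and the real engine's bookkeeping may differ.

```python
class StyleRegistry:
    """Toy registry that hands out one index per distinct style definition."""

    def __init__(self):
        self._index_by_style = {}  # normalized style -> index
        self.styles = []           # index -> style attributes

    def intern(self, **attrs):
        # Sort the attributes so keyword order does not create duplicates.
        key = tuple(sorted(attrs.items()))
        if key not in self._index_by_style:
            self._index_by_style[key] = len(self.styles)
            self.styles.append(dict(attrs))
        return self._index_by_style[key]


registry = StyleRegistry()
bold_red = registry.intern(bold=True, color="FF0000")

# A thousand cells share one style record; each cell keeps only the index.
cells = {f"A{row}": bold_red for row in range(1, 1001)}

# Asking for the same style again returns the existing index.
again = registry.intern(color="FF0000", bold=True)
```

The same pattern is why the styling example in this README creates a style once and then assigns the returned index to cells.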
```bash
# Using pip
pip install pyopenxlsx

# Using uv
uv pip install pyopenxlsx
```

```bash
# Using uv
uv pip install .

# Or using pip
pip install .
```

```bash
# Editable install
uv pip install -e .
```

```python
from pyopenxlsx import Workbook

# Create a new workbook
with Workbook() as wb:
    ws = wb.active
    ws.title = "MySheet"

    # Write data
    ws["A1"].value = "Hello"
    ws["B1"].value = 42
    ws.cell(row=2, column=1).value = 3.14

    # Save
    wb.save("example.xlsx")
```

```python
from pyopenxlsx import Workbook

with Workbook() as wb:
    # Set custom document properties
    wb.custom_properties["Author"] = "Curry Tang"
    wb.custom_properties["Project"] = "PyOpenXLSX"
    wb.save("props.xlsx")
```

```python
from pyopenxlsx import Workbook

with Workbook() as wb:
    ws = wb.active
    ws["A1"].value = "Google"

    # External link
    ws.add_hyperlink("A1", "https://www.google.com", tooltip="Search")

    # Internal link to another sheet
    ws2 = wb.create_sheet("Data")
    ws["A2"].value = "See Data"
    ws.add_internal_hyperlink("A2", "Data!A1")

    wb.save("links.xlsx")
```

```python
from pyopenxlsx import load_workbook

wb = load_workbook("example.xlsx")
ws = wb["MySheet"]
print(ws["A1"].value)  # Output: Hello
wb.close()
```

pyopenxlsx provides `async`/`await` support for all I/O-intensive operations, ensuring your event loop remains responsive.
```python
import asyncio
from pyopenxlsx import Workbook, load_workbook_async, Font

async def main():
    # 1. Async context manager for automatic cleanup
    async with Workbook() as wb:
        ws = wb.active
        ws["A1"].value = "Async Data"

        # 2. Async stylesheet creation
        style_idx = await wb.add_style_async(font=Font(bold=True))
        ws["A1"].style_index = style_idx

        # 3. Async worksheet operations
        new_ws = await wb.create_sheet_async("AsyncSheet")
        await new_ws.append_async(["Dynamic", "Row", 123])

        # 4. Async range operations
        await new_ws.range("A1:C1").clear_async()

        # 5. Async save
        await wb.save_async("async_example.xlsx")

    # 6. Async load
    async with await load_workbook_async("async_example.xlsx") as wb:
        ws = wb.active
        print(ws["A1"].value)

        # 7. Async protection
        await ws.protect_async(password="secret")
        await ws.unprotect_async()

asyncio.run(main())
```

```python
from pyopenxlsx import Workbook, Font, Fill, Border, Side, Alignment

wb = Workbook()
ws = wb.active

# Define styles using hex colors (ARGB) or names
# Hex colors can be 6-digit (RRGGBB) or 8-digit (AARRGGBB)
font = Font(name="Arial", size=14, bold=True, color="FF0000")  # Red
fill = Fill(pattern_type="solid", color="FFFF00")  # Yellow
border = Border(
    left=Side(style="thin", color="000000"),
    right=Side(style="thin"),
    top=Side(style="thick"),
    bottom=Side(style="thin"),
)
alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)

# Apply style
style_idx = wb.add_style(font=font, fill=fill, border=border, alignment=alignment)
ws["A1"].value = "Styled Cell"
ws["A1"].style_index = style_idx

wb.save("styles.xlsx")
```

pyopenxlsx provides a robust, memory-safe Fluent Builder API for generating pivot tables directly from source data.
```python
from pyopenxlsx import Workbook
from pyopenxlsx._openxlsx import XLPivotTableOptions, XLPivotSubtotal

with Workbook() as wb:
    # 1. Write source data to a sheet
    ws_data = wb.active
    ws_data.name = "SalesData"
    ws_data.write_row(1, ["Region", "Product", "Sales"])
    ws_data.write_rows(2, [["North", "Apples", 100], ["South", "Bananas", 300]])

    # 2. Create a separate sheet for the Pivot Table
    ws_pivot = wb.create_sheet("PivotReport")

    # 3. Configure options using the Fluent Builder API
    options = XLPivotTableOptions("SalesPivot", "SalesData!A1:C3", "B3")
    (options
        .add_row_field("Region")
        .add_column_field("Product")
        .add_data_field("Sales", "Total Sales", XLPivotSubtotal.Sum)
        .set_pivot_table_style("PivotStyleMedium14")
    )

    # 4. Add the pivot table
    ws_pivot._sheet.add_pivot_table(options)

    wb.save("pivot_demo.xlsx")
```

For advanced configuration and Slicers, see the Pivot Tables API.
```python
from pyopenxlsx import Workbook

wb = Workbook()
ws = wb.active

# 1. Insert image at A1, automatically maintaining aspect ratio
# Requires Pillow: pip install pillow
ws.add_image("logo.png", anchor="A1", width=200)

# 2. Or specify exact dimensions
ws.add_image("banner.jpg", anchor="B5", width=400, height=100)

# 3. Add Native Vector Shapes
ws.add_shape(
    row=2, col=5, shape_type="Arrow",
    name="MyArrow", text="Point!",
    fill_color="FF0000", line_width=2.5,
    rotation=90,
)

wb.save("media.xlsx")
```

```python
from pyopenxlsx import Workbook

wb = Workbook()
ws = wb.active

# 1. Simple or multiline legacy comments
ws["A1"].comment = "Short comment"

# 2. Modern Threaded Comments (Conversations)
author_id = wb._doc.persons().add_person("Curry Tang")
threads = ws._sheet.threaded_comments()
root_comment = threads.add_comment("B2", author_id, "Please review this cell.")
threads.add_reply(root_comment.id(), author_id, "Fixed!")

wb.save("comments.xlsx")
```

Highlight specific data using visual rules like color scales and data bars.
```python
from pyopenxlsx import Workbook
from pyopenxlsx._openxlsx import XLColorScaleRule, XLDataBarRule, XLColor

wb = Workbook()
ws = wb.active
ws.write_rows(1, [[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# 1. Color Scale Rule (Red to Green)
scale_rule = XLColorScaleRule(XLColor(255, 0, 0), XLColor(0, 255, 0))
ws.add_conditional_formatting("A1:C1", scale_rule)

# 2. Data Bar Rule (Blue bars)
bar_rule = XLDataBarRule(XLColor(0, 0, 255), show_value=True)
ws.add_conditional_formatting("A2:C2", bar_rule)

wb.save("conditional_formatting.xlsx")
```

For writing massive datasets without consuming memory for Python objects, use the direct stream writer.
```python
from pyopenxlsx import Workbook

with Workbook() as wb:
    ws = wb.active

    # Open a direct XML stream writer
    writer = ws.stream_writer()
    writer.append_row(["ID", "Timestamp", "Value"])
    for i in range(1_000_000):
        # Writes directly to disk/archive; highly memory efficient
        writer.append_row([i, "2023-01-01", 99.9])
    writer.close()

    wb.save("massive_data.xlsx")
```

The full API documentation has been split into individual modules for easier reading. Please refer to the docs/ directory:
- Workbook API
- Worksheet API
- Cell & Range API
- Styles API
- Data Validation API
- Tables (ListObjects) API
- Pivot Tables API
- Rich Text API
- Async Operations API
- Conditional Formatting API
- Streams I/O API
- Charts API
- Page Setup & Printing API
- Images & Shapes API
- Formula Engine API
- Comments & Threaded Comments API
- Encryption & Protection API
- Pandas Integration API
pyopenxlsx is built for speed. By leveraging the C++ OpenXLSX-NX engine and providing optimized bulk operations, it significantly outperforms pure-Python alternatives.
Note: The following benchmarks were recorded on an Apple Silicon (arm64) M-series processor, comparing `pyopenxlsx` v1.3.1 against `openpyxl`.
| Scenario | pyopenxlsx | openpyxl | Speedup |
|---|---|---|---|
| Load File (20,000 cells) | ~2.5ms | ~169.0ms | 67x |
| Single Read (1 cell in large doc) | ~4.4ms | ~181.7ms | 41x |
| Bulk Read / Iterate (20,000 cells) | ~10.0ms | ~136.3ms* | 13.6x |
| Write Small (1,000 cells) | ~3.5ms | ~8.0ms | 2.2x |
| Write Large (50,000 cells) | ~95.1ms | ~316.9ms | 3.3x |
| Bulk Write Large (50,000 cells, numpy/range) | ~17.4ms | N/A | 18.2x |
| Extreme Write (1,000,000 cells) | ~567ms | ~6,172ms | 10.8x |
| Bulk Write Extreme (1,000,000 cells, numpy) | ~330ms | N/A | 18.7x |
* openpyxl bulk read timed using values_only=True.
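The Speedup column can be reproduced from the two timing columns as `openpyxl time / pyopenxlsx time`. The sketch below redoes that arithmetic with the figures from the table. For the two bulk-write rows, where openpyxl is listed as N/A, it assumes the baseline is openpyxl's standard write time for the same cell count; that is an inference from the published ratios, not something the table states.

```python
# (pyopenxlsx_ms, openpyxl_ms) taken from the benchmark table above.
# For the "Bulk Write" rows the openpyxl value is an assumed baseline:
# the standard write time for the same number of cells.
timings_ms = {
    "Load File (20,000 cells)": (2.5, 169.0),
    "Single Read (1 cell in large doc)": (4.4, 181.7),
    "Bulk Read / Iterate (20,000 cells)": (10.0, 136.3),
    "Write Small (1,000 cells)": (3.5, 8.0),
    "Write Large (50,000 cells)": (95.1, 316.9),
    "Bulk Write Large (50,000 cells, numpy/range)": (17.4, 316.9),
    "Extreme Write (1,000,000 cells)": (567.0, 6172.0),
    "Bulk Write Extreme (1,000,000 cells, numpy)": (330.0, 6172.0),
}


def speedup(fast_ms, slow_ms):
    """How many times faster the first timing is than the second."""
    return slow_ms / fast_ms


speedups = {name: speedup(*pair) for name, pair in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

Because the table's timings are approximate, the recomputed ratios can differ from the printed ones in the last digit.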
| Library | Execution Time | Memory Delta | CPU Load |
|---|---|---|---|
| pyopenxlsx (bulk write) | ~0.33s | ~200 MB | ~99% |
| openpyxl | ~6.17s | ~600 MB* | ~99% |
Note
*Memory delta for openpyxl can be misleading due to Python's garbage collection timing during the benchmark. However, pyopenxlsx consistently shows lower memory pressure for bulk operations as data is handled primarily in C++.
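One reason memory comparisons between a C++ binding and a pure-Python library are tricky is that common Python tools see only part of the picture. The sketch below uses the standard-library `tracemalloc` module to record the peak of allocations made through Python's allocator. Allocations made natively inside a C++ extension are not traced this way, so a process-level measure such as resident set size would be needed for a fair comparison. The `peak_python_memory` helper and the `build_rows` workload are stand-ins invented for this example, not the benchmark used above.

```python
import tracemalloc


def peak_python_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_rows(n):
    # Stand-in workload: one Python list per row, similar to what a
    # pure-Python writer might hold before serializing.
    return [[i, "2023-01-01", 99.9] for i in range(n)]


rows, peak = peak_python_memory(build_rows, 10_000)
print(f"{len(rows)} rows, peak Python-heap usage: {peak / 1024:.0f} KiB")
```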
- C++ Foundation: Core operations happen in highly optimized C++. Recent updates eliminated `shared_ptr` heap allocations and deep copies for zero-allocation performance during high-throughput tasks.
- Reduced Object Overhead: `pyopenxlsx` minimizes the creation of many Python `Cell` objects during bulk operations.
- Efficient Memory Mapping: Leverages the memory-efficient design of OpenXLSX-NX.
- Asynchronous I/O: Key operations are available as non-blocking coroutines to maximize throughput in concurrent applications.
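The last point, running blocking work off the event loop, is a general asyncio pattern. As a minimal sketch using only the standard library (Python 3.9+ for `asyncio.to_thread`), the example below hands a blocking call to a worker thread while another coroutine keeps running. `blocking_save` is a stand-in that just sleeps; it is not a pyopenxlsx function, and the library's own `*_async` methods may be implemented differently.

```python
import asyncio
import time


def blocking_save(path):
    # Stand-in for a blocking call such as saving a large workbook.
    time.sleep(0.2)
    return path


async def heartbeat(ticks):
    # Keeps making progress while the blocking call runs in a worker thread.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.perf_counter())


async def main():
    ticks = []
    saved, _ = await asyncio.gather(
        asyncio.to_thread(blocking_save, "report.xlsx"),
        heartbeat(ticks),
    )
    return saved, len(ticks)


saved, tick_count = asyncio.run(main())
```

Had `blocking_save` been called directly inside a coroutine, the heartbeat could not have run until the call returned.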
```bash
# Run all tests
uv run pytest

# With coverage
uv run pytest --cov=src/pyopenxlsx --cov-report=term-missing
```

BSD 3-Clause License. The underlying OpenXLSX-NX library is licensed under the MIT License, and nanobind under a BSD-style license.