Commit 2d5d0f6

improve intro
1 parent 7fce3e3 commit 2d5d0f6

1 file changed

+24
-2
lines changed

docs/index.md

Lines changed: 24 additions & 2 deletions
@@ -6,8 +6,30 @@
66

77
## Introduction
88

9-
PLAID (Physics-Learning AI Datamodel) is a flexible and extensible framework for representing and sharing datasets of physics simulations. PLAID defines a unified standard for describing simulation data and is accompanied by a library for creating, reading, and manipulating complex datasets across a wide range of physical use cases.
10-
The data model and library were initially developed at SafranTech, the research center of [Safran group](https://www.safran-group.com/).
9+
### PLAID (Physics-Learning AI Datamodel): The Missing Layer for Scientific ML
10+
11+
Keep your simulation data intact, query it intuitively, and transform it seamlessly for deep learning.
12+
13+
PLAID is an open framework for representing and sharing datasets from complex physics simulations. It introduces a common standard for describing simulation data and comes with a library to create, explore, and manipulate those datasets. PLAID was first developed at SafranTech, the research and innovation center of [Safran Group](https://www.safran-group.com/).
14+
15+
16+
### Why Another Data Model?
17+
18+
In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks — Hugging Face, PyTorch, TensorFlow — assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds:
19+
20+
- Simulations produce hierarchical, multi-zone data.
21+
- Fields have heterogeneous shapes, types, and metadata.
22+
- Implicit conventions vary from one simulation to another.
23+
24+
Traditional ML datasets are not designed to handle this complexity efficiently. Flattening, padding, or converting these structures into a standard tabular format can be error-prone, memory-intensive, and slow, and it often destroys critical information about the underlying physical structure.
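To make the cost of padding concrete, here is a minimal sketch in plain Python. It does not use PLAID's API; the sample names, the field values, and the helpers `pad_to_table` and `padding_fraction` are all hypothetical and chosen only for illustration. It pads three fields of different lengths into one rectangular table and measures how much of that table is filler.

```python
# Hypothetical samples (not PLAID's API): each simulation has a different
# number of mesh nodes, so its pressure field has a different length.
samples = {
    "sim_a": [101.3, 101.1, 100.9],
    "sim_b": [99.8, 100.2, 100.4, 100.1, 99.9, 100.0, 100.3, 100.2],
    "sim_c": [102.0],
}

PAD = float("nan")  # filler value used to make every row the same length


def pad_to_table(fields):
    """Force ragged per-sample fields into a rectangular table by padding."""
    width = max(len(values) for values in fields.values())
    return {
        name: values + [PAD] * (width - len(values))
        for name, values in fields.items()
    }


def padding_fraction(fields):
    """Return the share of cells in the padded table that hold no real data."""
    real_cells = sum(len(values) for values in fields.values())
    width = max(len(values) for values in fields.values())
    total_cells = width * len(fields)
    return (total_cells - real_cells) / total_cells


table = pad_to_table(samples)
waste = padding_fraction(samples)
print(f"{waste:.0%} of the padded table is filler")
```

In this toy case half of the table is padding, and the original node count of each sample is no longer stored anywhere: it has to be recovered by counting the non-NaN entries. Real simulation datasets with multiple zones and fields make both problems worse.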
25+
26+
PLAID fills this gap by sitting upstream in the ML pipeline, bridging raw scientific data and ML-ready formats:
27+
28+
1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata, defaults, and units.
29+
2. Simplify access: Intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees.
30+
3. Prepare for ML: When needed, PLAID can produce ML-ready datasets (PyTorch, PyG, or Hugging Face style), while keeping memory and computation efficient.
31+
32+
In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML is as important as the model itself.
1133

1234
## Open source
1335
