|
6 | 6 |
|
7 | 7 | ## Introduction |
8 | 8 |
|
9 | | -PLAID (Physics-Learning AI Datamodel) is a flexible and extensible framework for representing and sharing datasets of physics simulations. PLAID defines a unified standard for describing simulation data and is accompanied by a library for creating, reading, and manipulating complex datasets across a wide range of physical use cases. |
10 | | -The data model and library have initially been developped at SafranTech, the research center of [Safran group](https://www.safran-group.com/). |
| 9 | +### PLAID (Physics-Learning AI Datamodel): The Missing Layer for Scientific ML |
| 10 | + |
| 11 | +Keep your simulation data intact, query it intuitively, and transform it seamlessly for deep learning. |
| 12 | + |
| 13 | +PLAID is an open framework that makes it easy to represent and share datasets from complex physics simulations. It introduces a common standard for describing simulation data and comes with a library to create, explore, and manipulate these datasets. PLAID was first developed at SafranTech, the research and innovation center of [Safran Group](https://www.safran-group.com/). |
| 14 | + |
| 15 | + |
| 16 | +### Why Another Data Model? |
| 17 | + |
| 18 | +In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks such as Hugging Face, PyTorch, and TensorFlow assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds: |
| 19 | + |
| 20 | +- Simulations produce hierarchical, multi-zone data. |
| 21 | +- Fields have heterogeneous shapes, types, and metadata. |
| 22 | +- Implicit conventions vary from one simulation to another. |
| 23 | + |
| 24 | +Traditional ML datasets are not designed to handle this complexity efficiently. Flattening, padding, or converting these structures into a standard tabular format can be error-prone, memory-intensive, and slow, and it often destroys critical information about the underlying physical structure. |
| 25 | + |
| 26 | +PLAID fills this gap by sitting upstream in the ML pipeline, bridging raw scientific data and ML-ready formats: |
| 27 | + |
| 28 | +1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata, defaults, and units. |
| 29 | +2. Simplify access: Intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees. |
| 30 | +3. Prepare for ML: When needed, PLAID can produce ML-ready datasets (PyTorch, PyG, or Hugging Face style), while keeping memory and computation efficient. |
| 31 | + |
| 32 | +In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML is as important as the model itself. |
11 | 33 |
|
12 | 34 | ## Open source |
13 | 35 |
|
|