26 changes: 22 additions & 4 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ This directory holds the public documentation for the InQL project.
Use the docs tree like this:

- **Language reference:** current package/API contracts under [language/reference/][language-reference]
- **Language how-to guides:** task-oriented workflows under [language/how-to/][language-how-to]
- **Language explanation:** conceptual guidance and usage framing under [language/explanation/][language-explanation]
- **Architecture:** repository and system boundaries in [architecture.md][architecture]
- **RFCs:** design records and normative proposals in [rfcs/][rfcs]
Expand All @@ -19,9 +20,17 @@ Use the docs tree like this:
1. [Language overview][language-overview]
2. [Dataset carriers (Explanation)][dataset-explanation]
3. [Execution context (Explanation)][execution-explanation]
4. [Dataset carriers (Reference)][dataset-reference]
5. [Execution context (Reference)][execution-reference]
6. [Local inspection (Reference)][inspection-reference]
4. [Build deferred dataset transformations (How-to)][dataset-transformations-how-to]
5. [Normalize semi-structured fields (How-to)][normalize-semistructured-fields-how-to]
6. [Work with nested row values (How-to)][nested-row-values-how-to]
7. [Expand rows with generators (How-to)][generator-rows-how-to]
8. [Add window columns (How-to)][window-columns-how-to]
9. [Capture execution observations and adapter coverage (How-to)][execution-observations-how-to]
10. [Dataset carriers (Reference)][dataset-reference]
11. [Dataset methods (Reference)][dataset-methods-reference]
12. [Execution context (Reference)][execution-reference]
13. [Inspect a plan and lineage graph (How-to)][inspect-plan-lineage-how-to]
14. [Local inspection (Reference)][inspection-reference]

### Understand the system design

Expand All @@ -34,21 +43,30 @@ Use the docs tree like this:
1. [RFC index][rfcs-index]
2. [How to write an RFC][writing-rfcs]

> Note: When a standalone docs site is added, `docs/` remains the content root. The structure here should already follow the same content model used in Incan: reference, explanation, architecture/contributing, RFCs, and release notes.
> Note: When a standalone docs site is added, `docs/` remains the content root. The structure here should already follow the same content model used in Incan: reference, how-to guides, explanation, architecture/contributing, RFCs, and release notes.

<!-- References -->
[language-reference]: language/reference/
[language-how-to]: language/how-to/
[language-explanation]: language/explanation/
[architecture]: architecture.md
[rfcs]: rfcs/README.md
[whitepapers]: whitepapers/README.md
[release-notes]: release_notes/
[contributing]: contributing/
[language-overview]: language/README.md
[window-columns-how-to]: language/how-to/window_columns.md
[dataset-explanation]: language/explanation/dataset_carriers.md
[execution-explanation]: language/explanation/execution_context.md
[dataset-reference]: language/reference/dataset_carriers.md
[dataset-methods-reference]: language/reference/dataset_methods.md
[dataset-transformations-how-to]: language/how-to/dataset_transformations.md
[generator-rows-how-to]: language/how-to/generator_rows.md
[nested-row-values-how-to]: language/how-to/nested_row_values.md
[normalize-semistructured-fields-how-to]: language/how-to/normalize_semistructured_fields.md
[execution-reference]: language/reference/execution_context.md
[execution-observations-how-to]: language/how-to/execution_observations.md
[inspect-plan-lineage-how-to]: language/how-to/inspect_plan_lineage.md
[inspection-reference]: language/reference/inspection.md
[rfcs-index]: rfcs/README.md
[writing-rfcs]: contributing/writing_rfcs.md
27 changes: 27 additions & 0 deletions docs/language/README.md
Expand Up @@ -3,21 +3,35 @@
This section documents the current InQL package surface.

- Use [reference/][reference] for API shape, signatures, and current behavior contracts.
- Use [how-to/][how-to] for concrete task workflows.
- Use [explanation/][explanation] for mental models, usage framing, and tradeoffs.

## Current entry points

### Core carriers

- [Build deferred dataset transformations (How-to)][dataset-transformations-how-to]
- [Expand rows with generators (How-to)][generator-rows-how-to]
- [Normalize semi-structured fields (How-to)][normalize-semistructured-fields-how-to]
- [Work with nested row values (How-to)][nested-row-values-how-to]
- [Dataset carriers (Reference)][dataset-reference]
- [Dataset carriers (Explanation)][dataset-explanation]
- [Dataset methods (Reference)][dataset-methods-reference]
- [Query blocks (Reference)][query-blocks-reference]

### Execution and materialization

- [Capture execution observations and adapter coverage (How-to)][execution-observations-how-to]
- [Execution context (Reference)][execution-reference]
- [Execution context (Explanation)][execution-explanation]

### Analytical functions

- [Add window columns (How-to)][window-columns-how-to]
- [Estimate approximate metrics (How-to)][approximate-metrics-how-to]
- [Build typed HyperLogLog sketches (How-to)][typed-hll-sketches-how-to]
- [Inspect typed variant payloads (How-to)][variant-payloads-how-to]

### Substrait boundary

- [Substrait read-root and binding contract][substrait-read-root]
Expand All @@ -27,17 +41,30 @@ This section documents the current InQL package surface.

### Local evidence

- [Inspect a plan and lineage graph (How-to)][inspect-plan-lineage-how-to]
- [Local inspection][inspection-reference]

<!-- References -->
[reference]: reference/
[how-to]: how-to/
[explanation]: explanation/
[approximate-metrics-how-to]: how-to/approximate_metrics.md
[dataset-reference]: reference/dataset_carriers.md
[dataset-explanation]: explanation/dataset_carriers.md
[dataset-methods-reference]: reference/dataset_methods.md
[dataset-transformations-how-to]: how-to/dataset_transformations.md
[generator-rows-how-to]: how-to/generator_rows.md
[nested-row-values-how-to]: how-to/nested_row_values.md
[normalize-semistructured-fields-how-to]: how-to/normalize_semistructured_fields.md
[query-blocks-reference]: reference/query_blocks.md
[typed-hll-sketches-how-to]: how-to/typed_hll_sketches.md
[variant-payloads-how-to]: how-to/variant_payloads.md
[window-columns-how-to]: how-to/window_columns.md
[inspection-reference]: reference/inspection.md
[execution-reference]: reference/execution_context.md
[execution-explanation]: explanation/execution_context.md
[execution-observations-how-to]: how-to/execution_observations.md
[inspect-plan-lineage-how-to]: how-to/inspect_plan_lineage.md
[substrait-read-root]: reference/substrait/read_root_binding_contract.md
[substrait-conformance]: reference/substrait/conformance.md
[substrait-operator-catalog]: reference/substrait/operator_catalog.md
22 changes: 22 additions & 0 deletions docs/language/explanation/execution_context.md
Expand Up @@ -70,6 +70,26 @@ The ergonomic split is:

This keeps materialization convenient while leaving sink ownership explicit at the session boundary.

## Runtime evidence is separate from plan evidence

Plan inspection explains the relational work InQL has authored. Execution observations explain a concrete runtime attempt to run that work through a Session and backend adapter.

That split matters because the same plan can be attempted more than once, with different backends, bindings, diagnostics, timings, or trace IDs. The plan target remains the semantic anchor. The execution attempt target records what happened in one runtime lifecycle event.

Observed Session methods keep this separation explicit:

- `execute_observed(...)` records an execution checkpoint without local materialization.
- `collect_observed(...)` records a materialization attempt and can include row count evidence.
- `write_observed(...)` records a sink-write attempt.

The compact `execute(...)`, `collect(...)`, and `write(...)` methods still return `Result[...]` values for application code that does not need an evidence record.
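
As a rough sketch of that split, the snippet below sends the same deferred frame through the compact and the observed collect methods. It reuses names that appear in the how-to guide (`Session.default()`, `read_csv`, `collect_observed`, `row_count`, `observation_id`). The `Order` model and the file path are placeholders, and the single-argument `collect(orders)` call unwrapped with `?` is an assumption about the compact signature rather than a confirmed contract.

```incan
from pub::inql import LazyFrame, Session
from models import Order

session = Session.default()
orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?

# Compact form (assumed signature): a Result value, unwrapped with `?`.
# No evidence record is produced.
df = session.collect(orders)?
println(f"rows={df.row_count()}")

# Observed form: the same kind of attempt, plus an observation record.
observed = session.collect_observed(orders)
println(observed.observation.observation_id)
```

Application code that only needs rows can stay on the compact form; the observed form is the one to reach for when the attempt itself has to be recorded.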

## Adapter coverage is explicit evidence

Adapter coverage answers a different question from execution success. Execution success says the selected backend accepted and ran a plan attempt. Coverage says whether the selected adapter is known to provide a named capability or guarantee.

The current coverage API is deliberately explicit: callers pass `AdapterRequirement` records to `session.check_coverage(...)`. InQL does not yet infer all requirements from arbitrary plan shapes. Unknown coverage is therefore not a soft success; it means InQL does not have evidence that the adapter enforces that capability.

## Typical flow

```incan
Expand Down Expand Up @@ -112,3 +132,5 @@ The materialized carrier exposes structured collection metadata:
- preview text

For exact API shape, see [Execution context (Reference)](../reference/execution_context.md).

For a task-oriented workflow, see [Capture execution observations and adapter coverage](../how-to/execution_observations.md).
26 changes: 26 additions & 0 deletions docs/language/how-to/README.md
@@ -0,0 +1,26 @@
# InQL language how-to guides

How-to guides show concrete task workflows for the current InQL package surface. They complement the reference docs, which define API shape and behavior contracts.

- [Add window columns][window-columns]
- [Build typed HyperLogLog sketches][typed-hll-sketches]
- [Capture execution observations and adapter coverage][execution-observations]
- [Build deferred dataset transformations][dataset-transformations]
- [Estimate approximate metrics][approximate-metrics]
- [Expand rows with generators][generator-rows]
- [Inspect a plan and lineage graph][inspect-plan-lineage]
- [Normalize semi-structured fields][normalize-semistructured-fields]
- [Inspect typed variant payloads][variant-payloads]
- [Work with nested row values][nested-row-values]

<!-- References -->
[approximate-metrics]: approximate_metrics.md
[dataset-transformations]: dataset_transformations.md
[execution-observations]: execution_observations.md
[generator-rows]: generator_rows.md
[inspect-plan-lineage]: inspect_plan_lineage.md
[nested-row-values]: nested_row_values.md
[normalize-semistructured-fields]: normalize_semistructured_fields.md
[typed-hll-sketches]: typed_hll_sketches.md
[variant-payloads]: variant_payloads.md
[window-columns]: window_columns.md
24 changes: 24 additions & 0 deletions docs/language/how-to/approximate_metrics.md
@@ -0,0 +1,24 @@
# Estimate approximate metrics

This how-to shows how to opt in to approximate aggregate helpers when exact results are not required.

Use approximate helpers explicitly. InQL does not silently replace exact aggregates with approximate implementations merely because a backend supports them.

## Estimate distinct counts and percentiles

Group the relation normally, then use approximate aggregate measures inside `agg(...)`.

```incan
from pub::inql.functions import approx_count_distinct, approx_percentile, col

summary = (
    events
    .group_by([col("campaign_id")])
    .agg([
        approx_count_distinct(col("user_id")),
        approx_percentile(col("latency_ms"), 0.95),
    ])
)
```

`approx_percentile(...)` accepts a percentile from `0.0` through `1.0` and an optional positive accuracy value. For exact helper contracts, see [Approximate functions](../reference/functions/approximate.md).
58 changes: 58 additions & 0 deletions docs/language/how-to/dataset_transformations.md
@@ -0,0 +1,58 @@
# Build deferred dataset transformations

This how-to shows how to combine common carrier methods while keeping work deferred until a Session executes it.

## Add computed columns

Use `with_column(...)` to append a new computed column or replace an existing column by name.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import add, col, mul
from models import Order

def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return (
        orders
        .with_column("amount_x2", mul(col("amount"), 2))
        .with_column("amount_plus_one", add(col("amount"), 1))
    )
```
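
The example above only appends columns. A minimal sketch of the replace case, assuming `Order` already has an `amount` column (as the calls above imply), might look like this; the function name `double_amount` is hypothetical.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import col, mul
from models import Order

def double_amount(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    # "amount" already exists on the input, so this call replaces that
    # column by name instead of appending a second one.
    return orders.with_column("amount", mul(col("amount"), 2))
```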

## Filter, group, and aggregate

Use scalar helpers for row predicates and aggregate helpers for grouped measures.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import avg, col, count, eq, sum
from models import Order

def paid_spend_by_customer(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return (
        orders
        .filter(eq(col("status"), "paid"))
        .group_by([col("customer_id")])
        .agg([
            sum(col("amount")),
            avg(col("amount")),
            count(),
        ])
    )
```

## Sort and limit

Use ordering helpers inside `order_by(...)`, then cap rows with `limit(...)`.

```incan
from pub::inql.functions import col, desc

top_orders = (
    orders
    .order_by([desc(col("amount"))])
    .limit(10)
)
```

These transforms stay deferred for `LazyFrame[T]`. Use a `Session` to execute, collect, or write the result. For exact method signatures and schema behavior, see [Dataset methods (Reference)](../reference/dataset_methods.md).
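
To make the deferred boundary concrete, here is a minimal sketch that builds a plan from the methods shown above and only attempts it at the Session. It borrows `Session.default()`, `read_csv`, `collect_observed`, and `preview_text` from the execution how-to; the `Order` model, the file path, and the `top_paid` name are placeholders.

```incan
from pub::inql import LazyFrame, Session
from pub::inql.functions import col, desc, eq
from models import Order

session = Session.default()
orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?

# Still deferred: these calls only describe the work.
top_paid = (
    orders
    .filter(eq(col("status"), "paid"))
    .order_by([desc(col("amount"))])
    .limit(10)
)

# The Session boundary is where the plan is actually attempted.
observed = session.collect_observed(top_paid)

match observed.data:
    Some(df) => println(df.preview_text())
    None => println("collect failed")
```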
102 changes: 102 additions & 0 deletions docs/language/how-to/execution_observations.md
@@ -0,0 +1,102 @@
# Capture execution observations and adapter coverage

This how-to shows how to collect runtime evidence for a Session operation and how to ask the selected adapter whether it covers explicit requirements.

Use the observed Session methods when you need an auditable execution attempt record. Use `check_coverage(...)` when a tool, policy, or review step already knows which adapter capability needs to be checked.

## Collect with an observation

Use `collect_observed(...)` when you need materialized data and execution evidence from the same attempt.

```incan
from pub::inql import ExecutionObservationStatus, LazyFrame, Session
from models import Order

session = Session.default()
orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?

observed = session.collect_observed(orders)

match observed.data:
    Some(df) =>
        println(df.preview_text())
        println(f"rows={df.row_count()}")
    None =>
        println(observed.observation.diagnostics[0].message)

assert observed.observation.status == ExecutionObservationStatus.Success
```

The observed result always includes `observation`. On success, `data` contains the materialized `DataFrame[T]`. On failure, `data` is `None` and `error` contains the `SessionError`.

## Validate execution without materializing

Use `execute_observed(...)` when you want the same execution checkpoint as `execute(...)` but still need an observation record.

```incan
observed = session.execute_observed(orders)

match observed.error:
    Some(err) => println(err.error_message())
    None => println(observed.observation.observation_id)
```

`execute_observed(...)` returns the deferred `LazyFrame[T]` on success. It does not invent a row count because it does not materialize local rows.
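
One possible follow-on, sketched under an assumption: the observed execute result is taken to expose the returned frame through a `data` field, mirroring `collect_observed(...)`. That field name is not confirmed by this guide, so check the reference before relying on it. The `Order` model and paths are placeholders.

```incan
from pub::inql import LazyFrame, Session, csv_sink
from models import Order

session = Session.default()
orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?

checked = session.execute_observed(orders)

# Assumption: `data` holds the deferred LazyFrame[Order] on success.
match checked.data:
    Some(frame) =>
        # Still deferred; hand it to a later Session call.
        write_attempt = session.write_observed(frame, csv_sink("target/orders.csv"))
        println(write_attempt.observation.observation_id)
    None =>
        println(checked.observation.diagnostics[0].message)
```

Each observed call produces its own observation, so the checkpoint and the later write remain separate attempts in the evidence trail.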

## Write with an observation

Use `write_observed(...)` when the write itself is the operation you want to audit.

```incan
from pub::inql import csv_sink

write_attempt = session.write_observed(orders, csv_sink("target/orders.csv"))

match write_attempt.error:
    Some(err) => println(err.error_message())
    None => println(write_attempt.observation.observation_id)
```

The write result has no `data` field. The output artifact is the sink side effect; the returned value carries the observation and optional error.

## Check explicit adapter requirements

`check_coverage(...)` does not infer requirements from a plan yet. Build the requirements that matter to the policy or workflow, then ask the selected adapter for coverage records.

```incan
from pub::inql import (
    AdapterCoverageState,
    AdapterRequirement,
    AdapterRequirementCapability,
    AdapterRequirementGuarantee,
)

observed = session.collect_observed(orders)
requirement = AdapterRequirement(
    requirement_id="orders-row-filter",
    target=observed.observation.plan_target,
    capability=AdapterRequirementCapability.RowFilter,
    guarantee=AdapterRequirementGuarantee.Required,
    reason="filtered order review requires adapter-side row filtering",
    evidence_refs=[],
)

coverage = session.check_coverage([requirement])

match coverage[0].state:
    AdapterCoverageState.Covered => println("covered")
    AdapterCoverageState.PartiallyCovered => println(coverage[0].diagnostics[0].message)
    AdapterCoverageState.Uncovered => println(coverage[0].diagnostics[0].message)
    AdapterCoverageState.Unknown => println(coverage[0].diagnostics[0].message)
```

Treat `Unknown` as non-enforcing. It means InQL has not classified that adapter capability; it does not mean the adapter has proven support.

## Choose the right observed method

- Use `execute_observed(...)` for a validation/checkpoint boundary without local materialization.
- Use `collect_observed(...)` when a local `DataFrame[T]` and row count are part of the evidence you need.
- Use `write_observed(...)` when the sink write is the operation being audited.
- Use `check_coverage(...)` for explicit adapter requirements; do not use it as a plan-requirement discovery API.

For the complete field and enum reference, see [Execution context (Reference)](../reference/execution_context.md).