feat: Add GDS Design Spec #64

Open
shirly121 wants to merge 6 commits into alibaba:main from shirly121:gds_spec

Conversation

shirly121 (Collaborator) commented Mar 16, 2026

Committed-by: Xiaoli Zhou from Dev container

What do these changes do?

As titled.

Related issue number

Fixes

Greptile Summary

This PR adds a new Graph Data Science (GDS) design specification (specs/004-gds/spec.md) that covers the full stack from product requirements (8 core graph algorithms), user-facing Cypher API (INSTALL/LOAD EXTENSION, project_graph, CALL algo), C++ implementation structures (ProjectedSubgraph, GDSGraph, GDSAlgo physical plan), developer extension API, and a prioritised roadmap.

Key issues found during review:

  • The "AI/GraphRAG 刚需算法" ("must-have AI/GraphRAG algorithms") section header claims 3 algorithms but lists only 2 (Leiden and Label Propagation). Either correct the count to (2 个), i.e. 2, or add the missing third algorithm.
  • A first-person draft design note ("这里我需要对算法的表示提出修改。我认为...", roughly "Here I need to propose a change to how the algorithms are represented. I think...") was accidentally left inline in §1.4 and should be removed before publishing.
  • The platform support table in §2.2 is structurally invalid: 2 header columns but 3 separator columns, which will misrender in most Markdown viewers.
  • §1.4.3 states that shortest_path with weight_property: null is "equivalent to BFS", yet §1.4.5 defines bfs as a separate procedure with an additional max_depth parameter — the spec should clarify whether these are aliases or truly distinct procedures.

Confidence Score: 3/5

  • Documentation-only PR; safe to merge after fixing the content count error, removing the draft note, and correcting the malformed table.
  • No code changes are introduced — this is a spec document only, so there is no runtime risk. However, the spec contains a factual error (algorithm count mismatch), an editorial artefact (inline draft note), and a malformed Markdown table that would misrender for readers. These issues reduce the quality and trustworthiness of the spec as published documentation.
  • specs/004-gds/spec.md — algorithm count inconsistency, draft note, and malformed table all need to be fixed before this spec is used as a reference by implementers.

Important Files Changed

| Filename | Overview |
| --- | --- |
| specs/004-gds/spec.md | New GDS design specification (956 lines) covering 8 graph algorithms, Project Subgraph syntax, Extension lifecycle, C++ implementation details, and a developer API. Contains a content count mismatch (AI/GraphRAG section claims 3 algorithms but lists 2), an unfinished draft design note left inline, a malformed Markdown table in the platform support section, and ambiguous semantics between bfs and shortest_path with no weight. |

Sequence Diagram

sequenceDiagram
    participant User
    participant NeuG as NeuG (Cypher Engine)
    participant ExtReg as Extension Registry
    participant OSS as OSS / Local FS
    participant SubgraphCtx as Session Subgraph Context
    participant GDSAlgo as GDS Algorithm

    User->>NeuG: INSTALL EXTENSION 'gds'
    NeuG->>OSS: Download libjson.neug_extension for platform
    OSS-->>NeuG: .so file
    NeuG->>ExtReg: Register extension metadata

    User->>NeuG: LOAD EXTENSION 'gds'
    NeuG->>ExtReg: dlopen() → call Init()
    ExtReg-->>NeuG: Functions registered in catalog

    User->>NeuG: CALL project_graph('g', {Person:'true'}, {KNOWS:'true'})
    NeuG->>SubgraphCtx: Store ProjectedSubgraph (labels + predicates, no data copy)
    SubgraphCtx-->>NeuG: OK

    User->>NeuG: CALL k_core('g', {min_k:3}) YIELD node, core_number
    NeuG->>SubgraphCtx: Lookup ProjectedSubgraph by name 'g'
    SubgraphCtx-->>NeuG: VertexEntries + EdgeEntries
    NeuG->>NeuG: Compile to GDSAlgo physical plan (bind label_ids + Expression predicates)
    NeuG->>GDSAlgo: Execute with GDSGraph (scan full graph, apply predicates at runtime)
    GDSAlgo-->>NeuG: (node, core_number) tuples
    NeuG-->>User: Result set

Last reviewed commit: be1533e

Greptile also left 5 inline comments on this PR.

Committed-by: Xiaoli Zhou from Dev container
Committed-by: Xiaoli Zhou from Dev container
Comment on lines +36 to +40

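### 1.3 Graph Semantics
NeuG's underlying storage is a **directed graph** (CSR for outgoing edges, CSC for incoming edges). The algorithm layer wraps different semantics according to each algorithm's needs:

| Graph semantics | Implementation | Applicable algorithms |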
Contributor

AI/GraphRAG algorithm count mismatch

The section header reads "AI/GraphRAG 刚需算法(3 个)" ("Must-have AI/GraphRAG algorithms (3)"), but the table that follows contains only 2 algorithms: Leiden and Label Propagation. This contradicts both the header and the overall claim of "8 个核心算法" ("8 core algorithms") in §1.2 (6 classic + 2 AI = 8, so the AI section count should be 2, not 3).

Either a third algorithm is missing from the table, or the count in the header should be corrected to (2 个), i.e. 2.

Suggested change
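### 1.3 Graph Semantics
NeuG's underlying storage is a **directed graph** (CSR for outgoing edges, CSC for incoming edges). The algorithm layer wraps different semantics according to each algorithm's needs:
| Graph semantics | Implementation | Applicable algorithms |
#### Must-have AI/GraphRAG algorithms (2)
| Algorithm | Graph semantics | Description | Output | Parallelization |
| --- | --- | --- | --- | --- |
| **Leiden** | Undirected | High-quality community detection (better than Louvain) | `(node, community_id)` | Supported |
| **Label Propagation** | Undirected | Fast community detection based on label propagation | `(node, label)` | Supported |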


```cypher
CALL project_graph(
<GRAPH_NAME>,
Contributor

Draft design note left in published spec

This line reads as an unfinished internal design note written in the first person ("这里我需要对算法的表示提出修改。我认为还是修改为...", roughly "Here I need to propose a change to how the algorithms are represented. I think it should still be changed to..."): the author is reasoning through a design decision mid-document. It is not appropriate for a published specification and should be removed before merging.

Suggested change
<GRAPH_NAME>,

Comment on lines +380 to +385

> **Note**: Windows is not supported.
>

### 2.3 Extension Lifecycle
```plain
Contributor

Malformed platform support table

The table header declares only 2 columns (平台 | 架构, i.e. Platform | Architecture) but the separator row has 3 columns (|------|------|--------|), making this an invalid Markdown table. Most renderers will either fail to render it as a table or collapse the columns incorrectly.
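A mismatch like this can be caught mechanically before review. The following is a minimal Python sketch, not part of the PR: `count_cells` and `table_is_consistent` are hypothetical helper names, and the cell counting is deliberately simplified (it does not handle escaped pipes or pipes inside inline code).

```python
def count_cells(row: str) -> int:
    """Count the cells in a pipe-delimited Markdown table row."""
    stripped = row.strip()
    # Drop the optional leading and trailing pipes before splitting.
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return len(stripped.split("|"))


def table_is_consistent(header: str, separator: str) -> bool:
    """Return True when the header and separator rows have the same cell count."""
    return count_cells(header) == count_cells(separator)


# The header and separator rows quoted in the comment above.
broken = table_is_consistent("| 平台 | 架构 |", "|------|------|--------|")
# The same header with a two-column separator.
fixed = table_is_consistent("| 平台 | 架构 |", "|------|------|")
```

With the rows quoted above, `broken` evaluates to `False` and `fixed` to `True`.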

Suggested change
> **Note**: Windows is not supported.
>
### 2.3 Extension Lifecycle
```plain
| Platform | Architecture |
|------|------|
| Linux | x86_64 |
| Linux | aarch64 (ARM64) |
| macOS | arm64 (Apple Silicon) |
| macOS | x86_64 |

Comment on lines +178 to +188
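| Column | Type | Description |
| --- | --- | --- |
| `node` | Any | Target vertex identifier |
| `distance` | Float | Shortest distance from the source to this node |
| `path` | List | List of nodes along the shortest path (optional) |


**Cypher example**: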

```cypher
CALL project_graph('station_graph', {'Station': 'true'}, {'CONNECTED': 'true'});
Contributor

BFS and weightless Shortest Path are conflated

In §1.4.3 the spec says weight_property: null makes shortest_path "等价于 BFS" (equivalent to BFS). However, §1.4.5 defines bfs as a fully independent procedure with its own max_depth parameter and different semantics (hop count vs. distance). This creates ambiguity:

  • Are bfs and shortest_path with weight_property: null truly interchangeable?
  • If so, is bfs just a convenience alias, or does it add capability (max_depth) not present in shortest_path?

The spec should explicitly clarify the relationship — for example, whether shortest_path(..., {weight_property: null}) also supports max_depth, or whether the two procedures remain distinct despite the note.
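To make the distinction concrete, here is a small Python sketch written under assumptions: `bfs_hops` and `dijkstra_unit` are hypothetical helpers rather than NeuG procedures, and treating `weight_property: null` as a unit weight on every edge is an assumption taken from the "equivalent to BFS" note. It suggests that the two agree on distances, while a `max_depth` cut-off is an extra capability that only the BFS variant has here.

```python
import heapq
from collections import deque


def bfs_hops(adj, source, max_depth=None):
    """Hop counts from source; vertices at max_depth are not expanded further."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if max_depth is not None and dist[u] >= max_depth:
            continue  # depth limit reached, do not expand this vertex
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def dijkstra_unit(adj, source):
    """Dijkstra with every edge weight fixed at 1.0 (assumed null-weight case)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + 1.0
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# A small directed graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
full = bfs_hops(graph, "a")
unit = dijkstra_unit(graph, "a")
limited = bfs_hops(graph, "a", max_depth=2)
```

On this graph `full` and `unit` assign the same distance to every vertex, whereas `limited` omits `e`, which lies three hops from the source. If the spec intends `shortest_path` to cover this case too, it would need its own depth parameter.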

| Performance optimization | Algorithm-level optimization |


---
\ No newline at end of file
Contributor

Missing newline at end of file

The file is missing a trailing newline, as indicated by \ No newline at end of file in the diff. This is a POSIX requirement for text files and can cause issues with certain tools.
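A check of this kind is easy to automate. The sketch below is illustrative only: `ends_with_newline` and `ensure_trailing_newline` are hypothetical helper names, and the functions operate on raw bytes so that the caller decides how the file is read and written.

```python
def ends_with_newline(data: bytes) -> bool:
    """Return True when the content ends with a line feed."""
    return data.endswith(b"\n")


def ensure_trailing_newline(data: bytes) -> bytes:
    """Append a single line feed when non-empty content lacks one."""
    if data and not ends_with_newline(data):
        return data + b"\n"
    return data


# The last line of the spec as shown in the diff, without a final newline.
tail = "| 性能优化 | 算法级别优化 |\n\n\n---".encode("utf-8")
repaired = ensure_trailing_newline(tail)
```

Empty content is returned unchanged, since an empty file has no incomplete last line to terminate.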

Suggested change
---
| Performance optimization | Algorithm-level optimization |

Committed-by: Xiaoli Zhou from Dev container
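### 1.2 V1 Algorithm List
The first version supports **8 core algorithms** in two categories (aliases such as BFS and LCC are listed in the detailed descriptions):

#### Classic graph algorithms (6)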
longbinlai (Collaborator) commented Mar 17, 2026

Please also spell out how these map to the specific LDBC Graphalytics algorithms.

Collaborator

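The spec should also state that, once the implementation is complete, it must be benchmarked against competitors such as Kuzu, Ladybug DB, and Neo4j on an LDBC Graphalytics dataset, and that it needs to outperform them.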

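| **PageRank** | Directed | Computes an importance score for each node | `(node, rank)` | Supported |
| **Shortest Path (Dijkstra)** | Directed | Single-source shortest path | `(node, distance, path)` | Not supported |
| **Connected Components** | Undirected | Weakly connected component detection (alias WCC) | `(node, component_id)` | Supported |
| **Breadth-First Search (BFS)** | Directed | Breadth-first traversal from a source vertex, expanding level by level | `(node, distance)` | Not supported |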
Collaborator

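Why is parallelization marked as not supported for shortest path and BFS? We don't actually need to assume that shortest path must be Dijkstra's algorithm, do we? In principle we should choose whichever algorithm performs best.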
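For illustration, BFS can be organised level by level so that every vertex in the current frontier is expanded independently, which is the usual starting point for parallelising it. The sketch below is sequential Python under assumptions: `bfs_by_levels` is a hypothetical name, and nothing here reflects how NeuG actually implements or plans to implement the algorithm.

```python
def bfs_by_levels(adj, source):
    """Level-synchronous BFS: returns the hop distance of each reachable vertex."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Each frontier vertex can be expanded independently of the others;
        # a parallel engine could split this loop across workers and merge
        # the discovered vertices before the next level starts.
        discovered = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in dist:
                    discovered.add(v)
        for v in discovered:
            dist[v] = level
        frontier = sorted(discovered)  # sorted only to keep the order deterministic
    return dist


# A small directed graph used only for illustration.
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
levels = bfs_by_levels(example, "a")
```

The same frontier structure yields unweighted shortest-path distances, so a Dijkstra-specific restriction is not needed for that case.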

Committed-by: Xiaoli Zhou from Dev container
longbinlai requested a review from luoxiaojian on March 18, 2026, 03:41

```cypher
-- Project the subgraph first, then run the algorithm
CALL project_graph('my_graph', {'Person': 'n.name <> "Ira"'}, {'KNOWS': 'r.id < 3'});
Collaborator

Shouldn't the edge label be a triple?

Collaborator Author

> Shouldn't the edge label be a triple?

Indeed. It has been changed to a triple definition:

[image]

Committed-by: Xiaoli Zhou from Dev container
Committed-by: Xiaoli Zhou from Dev container
2 participants