Skip to content

feat: Support COPY FROM without prior DDL#134

Open
shirly121 wants to merge 14 commits intoalibaba:mainfrom
shirly121:load_no_schema
Open

feat: Support COPY FROM without prior DDL#134
shirly121 wants to merge 14 commits intoalibaba:mainfrom
shirly121:load_no_schema

Conversation

@shirly121
Copy link
Copy Markdown
Collaborator

What do these changes do?

Related issue number

Fixes #120

Resolve vertex and edge NameOrId against the live schema in
BatchInsertVertexOpr and BatchInsertEdgeOpr Eval instead of fixing
IDs at plan build time, so COPY can run when labels are resolved
later.

Extend bind_copy_from, gopt conversion, and plan_copy plumbing to
carry physical EdgeType / NameOrId through the pipeline. Add Python
tests for import/export paths that exercise the behavior.

Committed-by: Xiaoli Zhou from Dev container
Committed-by: Xiaoli Zhou from Dev container
…ring

- import_data.md: document auto_detect, node/edge inference rules,
  import order, limitations; anchor from intro; extend troubleshooting.
- spec.md: link Module 5 to import_data.md; fix P2 module/FM numbering
  (M6–M13, FR-601–1305) and dependency blurbs to match Priority Overview.

Committed-by: Xiaoli Zhou from Dev container
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Support COPY FROM without prior DDL via deferred label resolution

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Defer vertex/edge label resolution to execution time for schema-less COPY
• Support COPY FROM without prior DDL via auto-detect and schema inference
• Add DDLVertexInfo/DDLEdgeInfo structures for inferred table metadata
• Extend physical plan with NameOrId for deferred label resolution
• Add comprehensive Python tests for no-schema import/export workflows
Diagram
flowchart LR
  A["COPY Statement<br/>No Schema"] -->|Binder| B["DDLVertexInfo/<br/>DDLEdgeInfo"]
  B -->|Planner| C["LogicalCreateTable<br/>+ LogicalCopyFrom"]
  C -->|GOPT| D["CreateVertexSchema/<br/>CreateEdgeSchema<br/>+ BatchInsert*"]
  D -->|Execution| E["Resolve NameOrId<br/>at Runtime<br/>+ Insert Data"]
Loading

Grey Divider

File Changes

1. src/compiler/binder/bind/copy/bind_copy_from.cpp ✨ Enhancement +265/-6

Implement DDL info structures and no-schema COPY binding

src/compiler/binder/bind/copy/bind_copy_from.cpp


2. src/compiler/gopt/g_query_converter.cpp ✨ Enhancement +98/-13

Convert inferred DDL and deferred edge/vertex type resolution

src/compiler/gopt/g_query_converter.cpp


3. src/execution/execute/ops/batch/batch_insert_edge.cc ✨ Enhancement +93/-83

Defer edge label resolution to execution via NameOrId

src/execution/execute/ops/batch/batch_insert_edge.cc


View more (11)
4. src/execution/execute/ops/batch/batch_insert_vertex.cc ✨ Enhancement +39/-28

Defer vertex label resolution to execution via NameOrId

src/execution/execute/ops/batch/batch_insert_vertex.cc


5. src/compiler/planner/plan/plan_copy.cpp ✨ Enhancement +33/-2

Insert DDL operators before COPY in logical plan

src/compiler/planner/plan/plan_copy.cpp


6. src/compiler/planner/gopt_planner.cc Miscellaneous +4/-1

Add logging for logical and physical plan debugging

src/compiler/planner/gopt_planner.cc


7. src/compiler/binder/bind/bind_file_scan.cpp 🐞 Bug fix +1/-1

Allow empty column names for schema-less scan binding

src/compiler/binder/bind/bind_file_scan.cpp


8. src/compiler/planner/plan/append_table_function_call.cpp ✨ Enhancement +4/-0

Connect table function to existing plan operators

src/compiler/planner/plan/append_table_function_call.cpp


9. include/neug/compiler/binder/copy/bound_copy_from.h ✨ Enhancement +72/-7

Add DDLTableInfo, DDLVertexInfo, DDLEdgeInfo header definitions

include/neug/compiler/binder/copy/bound_copy_from.h


10. include/neug/compiler/binder/binder.h ✨ Enhancement +8/-0

Add bindCopyNodeFromNoSchema and bindCopyRelFromNoSchema declarations

include/neug/compiler/binder/binder.h


11. include/neug/compiler/gopt/g_query_converter.h ✨ Enhancement +2/-2

Update convertBatchInsert* signatures to accept BoundCopyFromInfo

include/neug/compiler/gopt/g_query_converter.h


12. specs/001-ap-temp-graph/spec.md 📝 Documentation +287/-61

Document Module 5 COPY no-schema, renumber M5-M12 to M6-M13

specs/001-ap-temp-graph/spec.md


13. doc/source/data_io/import_data.md 📝 Documentation +87/-8

Add user guide for COPY FROM without predefined schema

doc/source/data_io/import_data.md


14. tools/python_bind/tests/test_db_import_export.py 🧪 Tests +490/-0

Add 18 comprehensive tests for no-schema COPY workflows

tools/python_bind/tests/test_db_import_export.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Mar 26, 2026

Code Review by Qodo

🐞 Bugs (0)   📘 Rule violations (0)   📎 Requirement gaps (1)   🎨 UX Issues (0)
📎\ ≡ Correctness (1)

Grey Divider


Action required

1. BoundCopyFromInfo lacks ddl_required 📎
Description
BoundCopyFromInfo does not include the required ddl_required flag, so downstream
planning/execution cannot explicitly determine when schema-creating DDL must be prepended for
inferred COPY. This violates the binder contract required for the no-schema COPY flow.
Code

include/neug/compiler/binder/copy/bound_copy_from.h[R95-97]

struct NEUG_API BoundCopyFromInfo {
// Table entry to copy into.
catalog::TableCatalogEntry* tableEntry;
Evidence
PR Compliance ID 4 requires BoundCopyFromInfo to include a ddl_required flag. The updated
BoundCopyFromInfo adds ddlTableInfo but still has no ddl_required field, and the planner keys
off ddlTableInfo presence instead of an explicit flag.

Binder: BoundCopyFromInfo includes ddl_required and populates catalog entry from inference when needed
include/neug/compiler/binder/copy/bound_copy_from.h[95-131]
src/compiler/planner/plan/plan_copy.cpp[71-124]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`BoundCopyFromInfo` is required to expose an explicit `ddl_required` flag for the no-schema COPY flow, but the current implementation only adds `ddlTableInfo` and uses its presence as an implicit signal.
## Issue Context
Compliance requires `ddl_required` to be present and set appropriately so planner/execution can reliably decide when to prepend Create*Schema before BatchInsert*.
## Fix Focus Areas
- include/neug/compiler/binder/copy/bound_copy_from.h[95-156]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[168-186]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[293-313]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[382-477]
- src/compiler/planner/plan/plan_copy.cpp[71-155]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Vector init won't compile🐞
Description
Binder::bindCopyRelFrom and Binder::bindCopyRelFromNoSchema use brace-init to construct
evaluateTypes, which attempts to convert columns.size() into common::ColumnEvaluateType (an
enum class) and fails compilation.
Code

src/compiler/binder/bind/copy/bind_copy_from.cpp[R354-356]

+  expression_vector columns = boundSource->getColumns();
std::vector<ColumnEvaluateType> evaluateTypes{columns.size(),
                                      ColumnEvaluateType::REFERENCE};
Evidence
The code uses `std::vector<ColumnEvaluateType> evaluateTypes{columns.size(),
ColumnEvaluateType::REFERENCE};`, which is interpreted as an initializer-list of elements, not the
(count, value) fill constructor. Because ColumnEvaluateType is an enum class, columns.size()
cannot implicitly convert to it, causing a compile error. The same pattern appears in the no-schema
rel binder path as well.

src/compiler/binder/bind/copy/bind_copy_from.cpp[344-362]
src/compiler/binder/bind/copy/bind_copy_from.cpp[449-457]
include/neug/compiler/common/enums/column_evaluate_type.h[27-35]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`evaluateTypes` is constructed with brace-initialization, which tries to build a 2-element initializer-list (and fails because `columns.size()` cannot convert to `enum class ColumnEvaluateType`). This is a hard compile error.
### Issue Context
This occurs in both the existing-schema and no-schema relationship COPY binders.
### Fix Focus Areas
- src/compiler/binder/bind/copy/bind_copy_from.cpp[354-356]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[455-456]
### Proposed fix
Replace brace-init with the fill constructor:
- `std::vector<ColumnEvaluateType> evaluateTypes(columns.size(), ColumnEvaluateType::REFERENCE);`
Apply to both occurrences.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

3. Unconditional INFO debug logs🐞
Description
Several new LOG(INFO) statements print full logical/physical plans and other debug strings
unconditionally, increasing log volume and runtime overhead in normal use.
Code

src/compiler/planner/gopt_planner.cc[R47-48]

+    LOG(INFO) << "logical plan: " << statement->logicalPlan->toString();
+
Evidence
GOptPlanner::compilePlan now logs the full logical plan and protobuf DebugString physical plan at
INFO level on every compilation; these strings can be very large. Additional INFO logs were also
added in the binder no-schema path and COPY conversion path, suggesting leftover debug
instrumentation rather than intentional user-facing logging.

src/compiler/planner/gopt_planner.cc[44-64]
src/compiler/binder/bind/copy/bind_copy_from.cpp[399-405]
src/compiler/gopt/g_query_converter.cpp[1344-1353]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New logs emit verbose plan/debug output at INFO level unconditionally, which will spam logs and add overhead.
### Issue Context
These look like development-time traces (plan strings / DebugString) and are not guarded.
### Fix Focus Areas
- src/compiler/planner/gopt_planner.cc[44-64]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[399-405]
- src/compiler/gopt/g_query_converter.cpp[1344-1353]
### Proposed fix
- Convert `LOG(INFO)` to `VLOG(n)` (or guard behind a config/flag) for logical/physical plan dumps.
- Remove the binder/query-converter INFO logs, or convert them to `VLOG` with an appropriate verbosity.
- If any of these logs are meant to remain at INFO, shorten them (e.g., log only operator counts / high-level operator names rather than full `toString()` / `DebugString()`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@shirly121 shirly121 requested a review from longbinlai March 26, 2026 07:17
Comment on lines 95 to 97
struct NEUG_API BoundCopyFromInfo {
// Table entry to copy into.
catalog::TableCatalogEntry* tableEntry;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. boundcopyfrominfo lacks ddl_required 📎 Requirement gap ✓ Correctness

BoundCopyFromInfo does not include the required ddl_required flag, so downstream
planning/execution cannot explicitly determine when schema-creating DDL must be prepended for
inferred COPY. This violates the binder contract required for the no-schema COPY flow.
Agent Prompt
## Issue description
`BoundCopyFromInfo` is required to expose an explicit `ddl_required` flag for the no-schema COPY flow, but the current implementation only adds `ddlTableInfo` and uses its presence as an implicit signal.

## Issue Context
Compliance requires `ddl_required` to be present and set appropriately so planner/execution can reliably decide when to prepend Create*Schema before BatchInsert*.

## Fix Focus Areas
- include/neug/compiler/binder/copy/bound_copy_from.h[95-156]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[168-186]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[293-313]
- src/compiler/binder/bind/copy/bind_copy_from.cpp[382-477]
- src/compiler/planner/plan/plan_copy.cpp[71-155]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

liulx20
liulx20 previously approved these changes Mar 26, 2026
…ogging

- Change brace-init {} to parentheses () for std::vector<ColumnEvaluateType>
  in bindCopyRelFrom and bindCopyRelFromNoSchema to avoid compile error
  (brace-init selects initializer_list ctor, size_t cannot convert to enum class).
- Remove leftover LOG(INFO) debug statements in gopt_planner.cc and
  g_query_converter.cpp; restore VLOG(10) for physical plan output.
- Remove commented-out code in DDLVertexInfo constructor.
COPY LegacyUser FROM "users.csv" (header=true, auto_detect=false); -- require table to exist
```

Parser option names are matched case-insensitively; `AUTO_DETECT` is accepted as well.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个应该统一说一下就好。不需要在这里单独提吧

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修改好了,63ebe28


```cypher
COPY LegacyUser FROM "users.csv" (header=true, auto_detect=true);
COPY LegacyUser FROM "users.csv" (header=true, auto_detect=false); -- require table to exist
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥你这个例子要写成 LegacyUser ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修改好了,63ebe28

示例:

```cypher
COPY person FROM 'person.csv' (auto_detect=true);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现在 这个 sniff 只支持csv吗?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

csv/json/parquet都支持,我再加一些测试和文档

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

添加好了,2ba626f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] COPY FROM: auto-detect schema without prior DDL (spec 001 Module 13)

4 participants