feat: add interactive node configuration for dedicated DataMate nodes #498

Merged: Dallas98 merged 7 commits into main from develop/node-pool on Jun 3, 2026

MoeexT commented Jun 3, 2026

  • Add node-setup.sh script for interactive node selection with keyboard navigation
  • Add node-cleanup.sh script to remove labels/taints during uninstall
  • Add global.nodeSelector and global.tolerations to values.yaml
  • Add nodeSelector/tolerations placeholders to all deployments:
    • Helm charts: backend, backend-python, database, frontend, gateway, runtime
    • Ray cluster: head and worker nodes
    • NPU/GPU worker groups
    • Raw K8s YAMLs: data-juicer, mineru-310, mineru-910
  • Add Makefile targets: node-setup, node-cleanup
  • Integrate node setup into datamate-k8s-install workflow

Features:

  • Interactive keyboard navigation (↑/↓ or j/k)
  • Automatic label application: node-role.kubernetes.io/datamate=true
  • Optional taint application: node-role.kubernetes.io/datamate=true:NoSchedule
  • Automatic Helm argument generation
  • Safe defaults for development (skip option)
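The label and taint handling described above can be sketched roughly as follows. This is a minimal illustration, not the contents of node-setup.sh or node-cleanup.sh: the function names `datamate_apply_cmds` and `datamate_cleanup_cmds` and the node name `worker-1` are hypothetical, and the functions only print the `kubectl` commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: they print kubectl commands rather than executing them.
LABEL_KEY="node-role.kubernetes.io/datamate"

datamate_apply_cmds() {
  local node="$1" with_taint="${2:-no}"
  # The label lets a nodeSelector match the node.
  printf 'kubectl label node %s %s=true --overwrite\n' "$node" "$LABEL_KEY"
  # The taint is optional; it keeps pods without a matching toleration away.
  if [ "$with_taint" = "yes" ]; then
    printf 'kubectl taint node %s %s=true:NoSchedule --overwrite\n' "$node" "$LABEL_KEY"
  fi
}

datamate_cleanup_cmds() {
  local node="$1"
  # A trailing '-' asks kubectl to remove the label or taint.
  printf 'kubectl label node %s %s-\n' "$node" "$LABEL_KEY"
  printf 'kubectl taint node %s %s:NoSchedule-\n' "$node" "$LABEL_KEY"
}

datamate_apply_cmds worker-1 yes
datamate_cleanup_cmds worker-1
```

Printing the commands first makes it easy to review what would change on a node before anything is applied.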

Fixed a terminal-handling issue that occurred when the script runs from the Makefile by:

  • Detecting non-terminal environments
  • Writing the Helm args to a temp file instead of capturing stdout
  • Adding a fallback read mode for the Makefile context
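One way to picture the terminal fix is the sketch below. It assumes the script checks file descriptors with `[ -t ]` and hands the Helm arguments to the caller through a file created by `mktemp`. The helper names `is_interactive` and `write_helm_args` and the sample argument are hypothetical and are not taken from node-setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: status messages go to stderr, Helm args go to a temp file.

is_interactive() {
  # True only when both stdin and stdout are attached to a terminal.
  [ -t 0 ] && [ -t 1 ]
}

write_helm_args() {
  local args_file="$1"
  shift
  # One argument per line, so the caller can read them back without re-splitting.
  printf '%s\n' "$@" > "$args_file"
}

args_file="$(mktemp)"
trap 'rm -f "$args_file"' EXIT

if is_interactive; then
  echo "terminal detected: interactive menu can be used" >&2
else
  echo "no terminal: falling back to plain line input" >&2
fi

# Illustrative argument only; the real script decides what to write.
write_helm_args "$args_file" --set-string 'global.nodeSelector.node-role\.kubernetes\.io/datamate=true'
cat "$args_file"
```

Keeping the arguments out of stdout means prompts and menu redraws cannot leak into the value that the Makefile later passes to Helm.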

Installation

[3 screenshots]

Uninstallation

[1 screenshot]

Label and Taint

[1 screenshot]

close: #499

MoeexT added 7 commits June 2, 2026 19:43
The script was missing --namespace argument handling, causing
an 'Unknown option: --namespace' error during uninstallation.

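A minimal sketch of option parsing that accepts `--namespace` is shown below. Only the flag name and the error text come from the commit message; the function name `parse_args`, the `default` fallback, and the `--namespace=value` form are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical parser; only the --namespace flag is taken from the commit message.

parse_args() {
  NAMESPACE="default"   # assumed fallback when the flag is absent
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --namespace)
        if [ "$#" -lt 2 ]; then
          echo "--namespace requires a value" >&2
          return 1
        fi
        NAMESPACE="$2"
        shift 2
        ;;
      --namespace=*)
        NAMESPACE="${1#--namespace=}"
        shift
        ;;
      *)
        # Anything unrecognized is rejected, as in the reported error.
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}

parse_args --namespace datamate && echo "namespace: $NAMESPACE"
```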
In stty raw mode, the Enter key produces \r (carriage return, \x0d)
instead of \n (newline, \x0a). Added a \r-to-\n conversion so that
Enter key detection works in interactive node selection.

Previously, pressing Enter did nothing because the case
pattern matched only \x0a, while raw mode sends \x0d.

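The Enter-key fix can be illustrated with a small key classifier. This is a sketch under assumptions: the function name `classify_key` and the action names are hypothetical, the arrow keys are assumed to arrive as the usual ANSI escape sequences, and reading the key from the terminal is left out.

```shell
#!/usr/bin/env bash
# Hypothetical key classifier for a menu loop; reading the key is out of scope here.

classify_key() {
  local key="$1"
  # Raw mode delivers Enter as \r (0x0d); normalize it to \n (0x0a) first.
  if [ "$key" = $'\r' ]; then
    key=$'\n'
  fi
  case "$key" in
    $'\n')      echo "select" ;;
    k|$'\e[A')  echo "up" ;;      # k or the Up arrow
    j|$'\e[B')  echo "down" ;;    # j or the Down arrow
    *)          echo "ignore" ;;
  esac
}

classify_key $'\r'
classify_key j
```

Normalizing once, before the `case`, keeps a single Enter pattern instead of listing both byte values in every branch that needs them.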
Helm was interpreting 'true' as a boolean instead of a string, causing
Kubernetes validation errors:
- expected string, got &value.valueUnstructured{Value:true}

Fixed by adding quotes around all values in --set arguments:
- nodeSelector values: "true"
- tolerations values: "true", "Equal", "NoSchedule"

This ensures Helm passes strings to Kubernetes, not boolean types.

Helm --set interprets 'true' as boolean, causing Kubernetes validation
errors for nodeSelector and tolerations value fields.

Changes:
- Changed all --set flags to --set-string (forces string type)
- Added dot escaping for nodeSelector keys (node-role\.kubernetes\.io)
- Left the toleration key unescaped, since it appears only as a value, not in the --set path

Tested with:
  helm template test deployment/helm/datamate/ \
    --set-string 'backend-python.nodeSelector.node-role\.kubernetes\.io/datamate=true' \
    --set-string 'backend-python.tolerations[0].value=true'

Output shows correct string values:
  nodeSelector:
    node-role.kubernetes.io/datamate: "true"  # String, not boolean
  tolerations:
    value: "true"  # String, not boolean

This resolves the error:
  'expected string, got &value.valueUnstructured{Value:true}'
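The argument generation described in this commit might look roughly like the sketch below. It is an illustration only: the function name `helm_node_args` is hypothetical, and the exact set of toleration fields is assumed from the values mentioned in the commit messages ("true", "Equal", "NoSchedule").

```shell
#!/usr/bin/env bash
# Hypothetical generator; the component name is supplied by the caller.
LABEL_KEY="node-role.kubernetes.io/datamate"

helm_node_args() {
  local component="$1"
  local escaped_key
  # Escape dots so Helm reads the label key as one map key, not a nested path.
  escaped_key="$(printf '%s' "$LABEL_KEY" | sed 's/\./\\./g')"
  # --set-string keeps 'true' a string instead of a boolean.
  printf '%s\n' \
    "--set-string" "${component}.nodeSelector.${escaped_key}=true" \
    "--set-string" "${component}.tolerations[0].key=${LABEL_KEY}" \
    "--set-string" "${component}.tolerations[0].operator=Equal" \
    "--set-string" "${component}.tolerations[0].value=true" \
    "--set-string" "${component}.tolerations[0].effect=NoSchedule"
}

helm_node_args backend-python
```

The toleration key stays unescaped because it sits on the value side of `=`, where dots have no special meaning to Helm.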
Dallas98 merged commit eca3d0a into main on Jun 3, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

Support deploying DataMate Pods on specified nodes
