Azure virtual machine extension readiness#107
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves CTF deployment reliability—especially for Azure release-mode readiness—by moving Azure setup into a VM Custom Script Extension and strengthening the reboot + post-reboot test coordination used by the CTF testing harness.
Changes:
- Switched Azure release setup from SSH-based readiness polling to an
azurerm_virtual_machine_extension-driven setup flow (Terraform now waits on extension completion). - Refactored reboot test coordination so state survives VM reboots and added an explicit
--post-rebootverification phase. - Updated docs and troubleshooting guidance, including Azure’s minimum Terraform version and relevant VM-side logs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
CONTRIBUTING.md |
Documents provider differences in setup readiness and contributor vs release testing flow. |
azure/README.md |
Updates Azure troubleshooting guidance and Terraform minimum version. |
azure/main.tf |
Implements Azure release setup via Custom Script Extension and adds setup-script robustness (cloud-init wait, apt retries). |
.github/skills/ctf-testing/test_ctf_challenges.sh |
Persists reboot-test state across reboots and adds --post-reboot mode. |
.github/skills/ctf-testing/deploy_and_test.sh |
Improves reboot/test orchestration and refactors test-script copy logic for reuse. |
| mkdir -p "${TEST_STATE_DIR}" | ||
| sort -u /var/ctf/completed_challenges 2>/dev/null | wc -l > "$PROGRESS_SNAPSHOT" | ||
| touch "$REBOOT_MARKER" | ||
| echo "Reboot marker created. Re-run after reboot to verify services." |
| # TEST EXECUTION | ||
| # ============================================================================= | ||
|
|
||
| # Copy test script to VM and execute it |
|
|
||
| Setup readiness differs by provider: | ||
|
|
||
| - Azure release mode uses VM Custom Script Extension, so Terraform waits for extension success or failure. |
| echo " Restarting Azure VM..." | ||
| echo " Restarting Azure VM..." >&2 | ||
| az vm restart --resource-group ctf-resources --name ctf-vm | ||
| # az vm restart waits by default, but add explicit wait for running state |
…s and reboot verification
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
.github/skills/ctf-testing/deploy_and_test.sh:441
- Same stdout contract issue as Azure:
gcloud compute instances resetmay write to stdout, which would pollute_reboot_vm's stdout return value and breaknew_ip=$(_reboot_vm ...). Redirect stdout so only the IP is emitted on stdout.
gcp)
echo " Restarting GCP VM..." >&2
local zone
zone=$(cd "${REPO_ROOT}/${provider}" \
&& terraform output -raw zone 2>/dev/null \
|| echo "us-central1-a")
gcloud compute instances reset ctf-instance --zone="${zone}" --quiet
# Wait for VM to be running
| _copy_test_script() { | ||
| local provider="$1" | ||
| local ip="$2" | ||
|
|
||
| _log INFO "Copying test script to VM..." | ||
| # shellcheck disable=SC2086 | ||
| _sshpass_cmd scp ${SSH_OPTS} "${TEST_SCRIPT}" "${SSH_USER}@${ip}:/tmp/test_ctf_challenges.sh" | ||
| } |
| azure) | ||
| echo " Restarting Azure VM..." | ||
| echo " Restarting Azure VM..." >&2 | ||
| az vm restart --resource-group ctf-resources --name ctf-vm | ||
| # az vm restart waits by default, but add explicit wait for running state | ||
| az vm wait \ | ||
| --resource-group ctf-resources \ | ||
| --name ctf-vm \ | ||
| --created \ | ||
| --timeout 120 2>/dev/null || true | ||
| ;; |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
.github/skills/ctf-testing/deploy_and_test.sh:441
_reboot_vmis documented to return the VM IP via stdout, but the GCP reboot path runsgcloud compute instances reset ...without redirecting its output. Ifgcloudprints status text to stdout, it will corrupt the capturednew_ipvalue (new_ip=$(_reboot_vm ...)) and break the post-reboot flow.
echo " Restarting GCP VM..." >&2
local zone
zone=$(cd "${REPO_ROOT}/${provider}" \
&& terraform output -raw zone 2>/dev/null \
|| echo "us-central1-a")
gcloud compute instances reset ctf-instance --zone="${zone}" --quiet
# Wait for VM to be running
| # Parse arguments | ||
| WITH_REBOOT=false | ||
| POST_REBOOT=false | ||
| for arg in "$@"; do | ||
| case $arg in | ||
| --with-reboot) | ||
| WITH_REBOOT=true | ||
| shift | ||
| ;; | ||
| --post-reboot) | ||
| POST_REBOOT=true | ||
| shift | ||
| ;; | ||
| esac | ||
| done |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
.github/skills/ctf-testing/deploy_and_test.sh:440
_reboot_vmis intended to return only the VM IP via stdout (logs go to stderr), but the GCP reboot path callsgcloud compute instances resetwithout redirecting its stdout. Ifgcloudprints anything, it will corrupt the captured IP (new_ip=$(...)). Redirect stdout to stderr (or /dev/null) to enforce the function contract consistently across providers.
zone=$(cd "${REPO_ROOT}/${provider}" \
&& terraform output -raw zone 2>/dev/null \
|| echo "us-central1-a")
gcloud compute instances reset ctf-instance --zone="${zone}" --quiet
| WITH_REBOOT=true | ||
| shift | ||
| ;; | ||
| --post-reboot) | ||
| POST_REBOOT=true | ||
| ;; |
…clarity and prevent conflicting flags
| azure_release_extension_script = <<-EOF | ||
| #!/bin/sh | ||
| exec /bin/bash <<'LINUX_CTFS_SETUP' | ||
| ${local.release_setup_script} | ||
| LINUX_CTFS_SETUP | ||
| EOF |
| echo " Restarting Azure VM..." >&2 | ||
| az vm restart --resource-group ctf-resources --name ctf-vm >&2 | ||
| ;; |
…ript formatting in main.tf
| ;; | ||
| gcp) | ||
| echo " Restarting GCP VM..." | ||
| echo " Restarting GCP VM..." >&2 |
…ilure in _reboot_vm function
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This pull request makes several improvements to the CTF deployment and testing scripts, with a focus on more robust VM reboot and test coordination, improved Azure release setup, and better logging and error handling. The changes also clarify documentation and update the minimum Terraform version for Azure.
Key changes include:
Test Script and Reboot Coordination Improvements
${HOME}/.linux-ctfs-test) to ensure they survive VM reboots, and added a--post-rebootflag for explicit post-reboot verification. The test script now checks for the reboot marker and cleans up state after verification. [1] [2] [3] [4] [5]_copy_test_scriptfunction, and ensured it is called before both initial and post-reboot test runs. [1] [2] [3]Azure Release Mode and Setup Reliability
apt-get updateon failure, and clarify error messages to reference the correct logs. [1] [2]Documentation and Version Updates