Skip to content

Azure virtual machine extension readiness#107

Open
rishabkumar7 wants to merge 10 commits into
mainfrom
azure-virtual-machine-extension-readiness
Open

Azure virtual machine extension readiness#107
rishabkumar7 wants to merge 10 commits into
mainfrom
azure-virtual-machine-extension-readiness

Conversation

@rishabkumar7
Copy link
Copy Markdown
Collaborator

This pull request makes several improvements to the CTF deployment and testing scripts, with a focus on more robust VM reboot and test coordination, improved Azure release setup, and better logging and error handling. The changes also clarify documentation and update the minimum Terraform version for Azure.

Key changes include:

Test Script and Reboot Coordination Improvements

  • Refactored test script state files to use a persistent directory (${HOME}/.linux-ctfs-test) to ensure they survive VM reboots, and added a --post-reboot flag for explicit post-reboot verification. The test script now checks for the reboot marker and cleans up state after verification. [1] [2] [3] [4] [5]
  • Extracted the logic for copying the test script to the VM into a new _copy_test_script function, and ensured it is called before both initial and post-reboot test runs. [1] [2] [3]
  • Improved error handling and logging during VM reboot and post-reboot testing, including explicit checks for SSH availability and reboot success. [1] [2] [3] [4]

Azure Release Mode and Setup Reliability

  • Migrated Azure release setup to use the VM Custom Script Extension rather than SSH-based provisioning, improving reliability and surfacing errors more clearly in Azure logs. The readiness wait is now handled by Terraform waiting for the extension to complete. [1] [2] [3]
  • Improved the Azure setup script to wait for cloud-init, retry apt-get update on failure, and clarify error messages to reference the correct logs. [1] [2]

Documentation and Version Updates

  • Updated the minimum required Terraform version for Azure to v1.14.0 and clarified troubleshooting steps, including where to find logs if the setup fails.
  • Updated contributor documentation to explain the differences in setup readiness and test coordination between Azure, AWS, GCP, and contributor (local) mode.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves CTF deployment reliability—especially for Azure release-mode readiness—by moving Azure setup into a VM Custom Script Extension and strengthening the reboot + post-reboot test coordination used by the CTF testing harness.

Changes:

  • Switched Azure release setup from SSH-based readiness polling to an azurerm_virtual_machine_extension-driven setup flow (Terraform now waits on extension completion).
  • Refactored reboot test coordination so state survives VM reboots and added an explicit --post-reboot verification phase.
  • Updated docs and troubleshooting guidance, including Azure’s minimum Terraform version and relevant VM-side logs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
CONTRIBUTING.md Documents provider differences in setup readiness and contributor vs release testing flow.
azure/README.md Updates Azure troubleshooting guidance and Terraform minimum version.
azure/main.tf Implements Azure release setup via Custom Script Extension and adds setup-script robustness (cloud-init wait, apt retries).
.github/skills/ctf-testing/test_ctf_challenges.sh Persists reboot-test state across reboots and adds --post-reboot mode.
.github/skills/ctf-testing/deploy_and_test.sh Improves reboot/test orchestration and refactors test-script copy logic for reuse.

mkdir -p "${TEST_STATE_DIR}"
sort -u /var/ctf/completed_challenges 2>/dev/null | wc -l > "$PROGRESS_SNAPSHOT"
touch "$REBOOT_MARKER"
echo "Reboot marker created. Re-run after reboot to verify services."
# TEST EXECUTION
# =============================================================================

# Copy test script to VM and execute it
Comment thread CONTRIBUTING.md Outdated

Setup readiness differs by provider:

- Azure release mode uses VM Custom Script Extension, so Terraform waits for extension success or failure.
echo " Restarting Azure VM..."
echo " Restarting Azure VM..." >&2
az vm restart --resource-group ctf-resources --name ctf-vm
# az vm restart waits by default, but add explicit wait for running state
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

.github/skills/ctf-testing/deploy_and_test.sh:441

  • Same stdout contract issue as Azure: gcloud compute instances reset may write to stdout, which would pollute _reboot_vm's stdout return value and break new_ip=$(_reboot_vm ...). Redirect stdout so only the IP is emitted on stdout.
        gcp)
            echo "  Restarting GCP VM..." >&2
            local zone
            zone=$(cd "${REPO_ROOT}/${provider}" \
                && terraform output -raw zone 2>/dev/null \
                || echo "us-central1-a")
            gcloud compute instances reset ctf-instance --zone="${zone}" --quiet
            # Wait for VM to be running

Comment on lines +469 to +476
_copy_test_script() {
local provider="$1"
local ip="$2"

_log INFO "Copying test script to VM..."
# shellcheck disable=SC2086
_sshpass_cmd scp ${SSH_OPTS} "${TEST_SCRIPT}" "${SSH_USER}@${ip}:/tmp/test_ctf_challenges.sh"
}
Comment on lines 430 to 433
azure)
echo " Restarting Azure VM..."
echo " Restarting Azure VM..." >&2
az vm restart --resource-group ctf-resources --name ctf-vm
# az vm restart waits by default, but add explicit wait for running state
az vm wait \
--resource-group ctf-resources \
--name ctf-vm \
--created \
--timeout 120 2>/dev/null || true
;;
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

.github/skills/ctf-testing/deploy_and_test.sh:441

  • _reboot_vm is documented to return the VM IP via stdout, but the GCP reboot path runs gcloud compute instances reset ... without redirecting its output. If gcloud prints status text to stdout, it will corrupt the captured new_ip value (new_ip=$(_reboot_vm ...)) and break the post-reboot flow.
            echo "  Restarting GCP VM..." >&2
            local zone
            zone=$(cd "${REPO_ROOT}/${provider}" \
                && terraform output -raw zone 2>/dev/null \
                || echo "us-central1-a")
            gcloud compute instances reset ctf-instance --zone="${zone}" --quiet
            # Wait for VM to be running

Comment on lines 61 to 75
# Parse arguments
WITH_REBOOT=false
POST_REBOOT=false
for arg in "$@"; do
case $arg in
--with-reboot)
WITH_REBOOT=true
shift
;;
--post-reboot)
POST_REBOOT=true
shift
;;
esac
done
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

.github/skills/ctf-testing/deploy_and_test.sh:440

  • _reboot_vm is intended to return only the VM IP via stdout (logs go to stderr), but the GCP reboot path calls gcloud compute instances reset without redirecting its stdout. If gcloud prints anything, it will corrupt the captured IP (new_ip=$(...)). Redirect stdout to stderr (or /dev/null) to enforce the function contract consistently across providers.
            zone=$(cd "${REPO_ROOT}/${provider}" \
                && terraform output -raw zone 2>/dev/null \
                || echo "us-central1-a")
            gcloud compute instances reset ctf-instance --zone="${zone}" --quiet

Comment on lines 67 to +71
WITH_REBOOT=true
shift
;;
--post-reboot)
POST_REBOOT=true
;;
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread azure/main.tf
Comment on lines +134 to 139
azure_release_extension_script = <<-EOF
#!/bin/sh
exec /bin/bash <<'LINUX_CTFS_SETUP'
${local.release_setup_script}
LINUX_CTFS_SETUP
EOF
Comment on lines +431 to 433
echo " Restarting Azure VM..." >&2
az vm restart --resource-group ctf-resources --name ctf-vm >&2
;;
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

;;
gcp)
echo " Restarting GCP VM..."
echo " Restarting GCP VM..." >&2
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread .github/skills/ctf-testing/test_ctf_challenges.sh Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants