Restart Kubernetes from Azure Portal, then SpinApp couldn't run anymore #24

Open
thangchung opened this issue Sep 15, 2024 · 5 comments

Comments

@thangchung

I followed the guidance in the README file. It worked very well.

However, I ran into one issue: if I stop the AKS cluster and start it again, the SpinApp (deployment) stays in Pending status forever. See below

(screenshot not preserved)

The logs:
104s Normal Scheduled pod/simple-spinapp-84c9b4885b-bf682 Successfully assigned default/simple-spinapp-84c9b4885b-bf682 to aks-nodepool1-18815957-vmss000001
12s Warning FailedCreatePodSandBox pod/simple-spinapp-84c9b4885b-bf682 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured

I tried to delete it by using:

kubectl delete -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/simple.yaml

And

kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/simple.yaml

It was still not working.

The only way I found to make it work again is to run helm delete spinkube and then reinstall it on the AKS cluster.

@Mossaka
Member

Mossaka commented Sep 16, 2024

Interesting. Were you able to SSH into the cluster node and check whether the spin shim binary still exists in PATH, and whether containerd's config.toml still has the CRI config for the spin shim?
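
For anyone who wants to run that check on a node, here is a minimal sketch. It assumes the shim binary is named containerd-shim-spin-v2 and that containerd reads /etc/containerd/config.toml; neither detail is stated in this thread, so both can be overridden as arguments. The function name check_spin_shim is hypothetical.

```shell
# Hypothetical helper: reports whether the spin shim binary is on PATH and
# whether the containerd config mentions a "spin" runtime.
# Assumed defaults (not confirmed in this thread):
#   config path : /etc/containerd/config.toml
#   shim binary : containerd-shim-spin-v2
check_spin_shim() {
  config="${1:-/etc/containerd/config.toml}"
  shim="${2:-containerd-shim-spin-v2}"
  rc=0

  # 1. Is the shim binary still resolvable through PATH?
  if command -v "$shim" >/dev/null 2>&1; then
    echo "shim binary: found"
  else
    echo "shim binary: missing"
    rc=1
  fi

  # 2. Does the containerd config still contain a runtimes.spin section?
  if [ -f "$config" ] && grep -q 'runtimes\.spin' "$config"; then
    echo "containerd runtime config: found"
  else
    echo "containerd runtime config: missing"
    rc=1
  fi

  return "$rc"
}
```

Run on the node itself, `check_spin_shim` with no arguments uses the assumed defaults and returns non-zero if either piece is missing.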

@vdice
Collaborator

vdice commented Sep 16, 2024

I'm seeing the same behavior. Indeed, when the (new?) node(s) come back up after the AKS stop/restart, they are missing the spin shim CRI config -- thus the SpinApp pods are stuck in ContainerCreating with failed to get sandbox runtime: no runtime for "spin" is configured.

The current quick fix is to re-annotate the node(s), e.g. via kubectl annotate node --all kwasm.sh/kwasm-node=true. (You should not need to delete spinkube and re-install.) But the best resolution would be for AKS to preserve the containerd configuration through the stop/restart cycle.
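
The quick fix can be wrapped in a small script, sketched here under a few assumptions. The annotate command is the one quoted above; the KUBECTL override variable, the function names, the 120s timeout, and the deployment name simple-spinapp (inferred from the pod name in the original report) are illustrative and not confirmed by the thread.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the
# commands without touching a cluster. This variable is an assumption
# of this sketch, not a kubectl feature.
KUBECTL="${KUBECTL:-kubectl}"

# Re-annotate every node so the spin shim gets provisioned again.
reannotate_nodes() {
  "$KUBECTL" annotate node --all kwasm.sh/kwasm-node=true
}

# Wait for the SpinApp's deployment to become ready afterwards.
# $1 is the deployment name; the timeout value is arbitrary.
wait_for_spinapp() {
  "$KUBECTL" rollout status "deployment/$1" --timeout=120s
}
```

Typical use would be `reannotate_nodes && wait_for_spinapp simple-spinapp`. If the annotation is still present on a node, kubectl may refuse or skip the change, so removing the annotation first may be needed in that case.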

@Mossaka
Member

Mossaka commented Sep 16, 2024

I will reach out to the AKS team to find out the configuration issue

@thangchung
Author

thangchung commented Sep 19, 2024

> I will reach out to the AKS team to find out the configuration issue

Thanks, @Mossaka @vdice for acting on it. I'm waiting for #25.

@ThorstenHans

This issue also affects Kubernetes clusters outside of Azure that have capabilities like horizontal cluster auto-scaling or scheduled node upgrades.

As an intermediate solution, I created a small DaemonSet that starts a Job to annotate the current Kubernetes node.

Although the solution isn't ideal, it guarantees that new nodes will be annotated with kwasm.sh/kwasm-node=true.

@Mossaka I'm happy to polish my workaround and publish it on GitHub so that others will have a solution for this.
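
The workaround's implementation is not shown in this thread. As a rough sketch of what a per-node annotation step could look like, the script below annotates only the node it runs on. It assumes the pod receives its node name in a NODE_NAME environment variable (for example via the downward API field spec.nodeName) and that its service account is allowed to patch nodes. The function name and the KUBECTL override are hypothetical.

```shell
# KUBECTL can be overridden (for example with "echo") for a dry run.
# This variable is an assumption of this sketch.
KUBECTL="${KUBECTL:-kubectl}"

# Annotate the node this pod is scheduled on.
# NODE_NAME is assumed to be injected into the pod's environment.
annotate_current_node() {
  if [ -z "${NODE_NAME:-}" ]; then
    echo "NODE_NAME is not set; cannot tell which node to annotate" >&2
    return 1
  fi
  "$KUBECTL" annotate node "$NODE_NAME" kwasm.sh/kwasm-node=true
}
```

Targeting a single node keeps the required permissions and the blast radius smaller than annotating with --all, at the cost of needing one run per node.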
