What to do if your Kubernetes pod will not start

Created by Steve Place, Modified on Wed, Feb 12 at 5:24 PM by Steve Place

Throughout this article, we will assume the namespace of your Stardog cluster is stardog-ns and the name of your helm release is dev-sd. Replace these with your actual namespace and helm release name.


If you run kubectl get pods -n stardog-ns, you'll see an output in this format:

> kubectl get pods -n stardog-ns
NAME                                                READY   STATUS             RESTARTS   AGE
dev-sd-stardog-0                                    0/1     CrashLoopBackOff   5          10m


Let's break down this particular output:

  • Our Stardog pod is named dev-sd-stardog-0. This is the name of our release (dev-sd), the component of the helm chart (stardog), and the index of the pod (where pods are 0-indexed).
    • If you're running a cluster, you'll see a pod for each cluster node, e.g., dev-sd-stardog-0, dev-sd-stardog-1, and dev-sd-stardog-2.
  • The pod is not ready. The READY column shows how many of the pod's containers are ready out of the total. There is only one container in the dev-sd-stardog-0 pod here, but there could be more; if some but not all of them were ready, you would see something like 1/3.
  • The status is CrashLoopBackOff. Its meaning is described below.
  • The pod has restarted 5 times due to errors.
  • The pod has existed for 10 minutes.


Common pod statuses, what they mean, and next steps


CrashLoopBackOff


What it means: The pod is repeatedly crashing and restarting due to an issue with the Stardog process or its configuration.


Next steps:

  • Check logs: kubectl logs dev-sd-stardog-0 -n stardog-ns
  • Check events: kubectl describe pod dev-sd-stardog-0 -n stardog-ns
    • The entire output of kubectl describe pod is quite large, and you usually only need to worry about the events. The events are at the bottom of the output. An example of what these look like is provided in the next major section of this article.
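
Because a crashing pod's container is restarted repeatedly, the most useful output is often the log of the previous (crashed) container rather than the current one. You can view it with the --previous flag:

kubectl logs dev-sd-stardog-0 -n stardog-ns --previous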


ImagePullBackOff / ErrImagePull


What it means: Kubernetes is unable to pull the Stardog container image.


Next steps:

  • Check if the image exists: kubectl describe pod dev-sd-stardog-0 -n stardog-ns
  • Ensure you have access to the container registry you are trying to pull from.
  • Verify the image name and tag in values.yaml
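
To see exactly which image the pod is trying to pull (which makes a typo in the repository name or tag easy to spot), you can read it straight from the pod spec:

kubectl get pod dev-sd-stardog-0 -n stardog-ns -o jsonpath='{.spec.containers[*].image}'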


Pending


What it means: The pod is waiting for resources, such as a node with sufficient CPU/memory or a required volume.


Next Steps:

  • If you haven't been waiting long, wait longer; it may just take more time for the pod to start.
  • Check node status: kubectl get nodes
  • Ensure PVCs (Persistent Volume Claims) are correctly bound. (More on this in the section "Ensuring PVCs are correctly bound".)
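
If the pod stays Pending, it is often because no node has enough unreserved CPU or memory. The "Allocated resources" section of a node's description shows how much of its capacity is already committed, for example:

kubectl describe node <node-name> | grep -A 6 "Allocated resources"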


ContainerCreating


What it means: The container is still being created, possibly due to a slow storage mount, networking issue, or resource constraint.


Next Steps:

  • If you haven't been waiting long, wait longer; it may just take more time for the container to create.
  • Check storage volumes: kubectl get pvc -n stardog-ns
  • Look for node-level issues: kubectl describe node <node-name>
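
Recent events for the namespace, sorted by time, are also a quick way to tell whether the delay is a volume attach, an image pull, or something else:

kubectl get events -n stardog-ns --sort-by=.lastTimestamp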


OOMKilled


What it means: The pod exceeded its memory limit and was killed by Kubernetes.


Next steps:

  • Check memory limits in kubectl describe pod dev-sd-stardog-0 -n stardog-ns
  • Increase the memory request/limit in values.yaml
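
As a rough sketch, a memory request/limit in values.yaml typically looks like the block below. The exact location and key names depend on your chart version, so check the chart's default values.yaml before copying this, and make sure the limit leaves room for Stardog's configured heap and off-heap memory:

resources:
  requests:
    memory: "8Gi"
  limits:
    memory: "8Gi"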


Running


What it means: The pod has been successfully scheduled to a node, and at least one container inside it is running or is in the process of starting.


Next steps:

  • If the pod is ready, you can begin using Stardog.
  • If the pod is not ready, wait until it either becomes ready or changes to one of the above statuses.
    • If your pod enters a status that is not listed above, file a Support Ticket, and we'll add it to this article after we solve your issue.
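
If you want to block until the pod reports ready (for example, in a deployment script), kubectl can wait on the readiness condition:

kubectl wait --for=condition=Ready pod/dev-sd-stardog-0 -n stardog-ns --timeout=300s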


What the Events section of kubectl describe pod looks like


Example 1: A successfully running pod

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               3m     default-scheduler Successfully assigned stardog-ns/dev-sd-stardog-0 to node-1
  Normal   Pulling                 3m     kubelet            Pulling image "stardog/stardog:latest"
  Normal   Pulled                  2m58s  kubelet            Successfully pulled image "stardog/stardog:latest"
  Normal   Created                 2m57s  kubelet            Created container stardog
  Normal   Started                 2m57s  kubelet            Started container stardog

Key Takeaways:

  • Scheduled: The pod was assigned to a node.
  • Pulling/Pulled: The container image was successfully pulled.
  • Created/Started: The Stardog container started successfully.

Example 2: Pod Stuck in ContainerCreating Due to PVC Issues


If a Persistent Volume Claim (PVC) is not binding correctly, the event log might look like this:

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled                3m     default-scheduler        Successfully assigned stardog-ns/dev-sd-stardog-0 to node-2
  Warning  FailedAttachVolume       3m     attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1234abcd" : volume not found
  Warning  FailedMount              2m     kubelet                  Unable to mount volumes for pod "dev-sd-stardog-0": timeout waiting for volume
  Warning  FailedMount              1m     kubelet                  MountVolume.SetUp failed for volume "stardog-data": no persistent volumes available

Key Takeaways:

  • FailedAttachVolume: The volume was not found.
  • FailedMount: The pod cannot mount the requested storage.
  • Fix: Check whether the PVC is bound (kubectl get pvc -n stardog-ns) and whether a matching PV exists. More on this in the section "Ensuring PVCs are correctly bound" below.


Example 3: CrashLoopBackOff Due to a Startup Failure


If the pod keeps crashing and restarting, you might see:

Events:
  Type     Reason           Age   From               Message
  ----     ------           ----  ----               -------
  Normal   Scheduled        2m    default-scheduler Successfully assigned stardog-ns/dev-sd-stardog-0 to node-3
  Normal   Pulled           2m    kubelet            Successfully pulled image "stardog/stardog:latest"
  Normal   Created          2m    kubelet            Created container stardog
  Normal   Started          2m    kubelet            Started container stardog
  Warning  BackOff          1m    kubelet            Back-off restarting failed container
  Warning  Failed           30s   kubelet            Error: command failed with exit code 1


Key Takeaways:

  • BackOff: Kubernetes is delaying restarts because the container keeps failing.
  • Failed: The container exited with a non-zero exit code.
  • Fix: Check the logs (kubectl logs dev-sd-stardog-0 -n stardog-ns) to diagnose the failure.


Example 4: ImagePullBackOff Due to Missing or Unauthorized Image


If Kubernetes cannot pull the Stardog image, the events might look like this:

Events:
  Type     Reason             Age    From               Message
  ----     ------             ----   ----               -------
  Normal   Scheduled          5m     default-scheduler Successfully assigned stardog-ns/dev-sd-stardog-0 to node-4
  Warning  Failed             4m     kubelet            Failed to pull image "stardog/stardog:latest": image not found
  Warning  Failed             4m     kubelet            Error: ErrImagePull
  Normal   BackOff            3m     kubelet            Back-off pulling image "stardog/stardog:latest"
  Warning  Failed             2m     kubelet            Error: ImagePullBackOff

Key Takeaways:

  • ErrImagePull / ImagePullBackOff: The image could not be found or Kubernetes lacked permission to pull it.
  • Fix: Verify the image section of values.yaml. If you're using an image pull secret (recommended over hardcoding your credentials into values.yaml), see the next section on troubleshooting errors with them.


Troubleshooting Image Pull Secrets


If your Stardog deployment uses a private container registry, Kubernetes must authenticate using an image pull secret. If this secret is missing, misconfigured, or not referenced correctly, you might see an ImagePullBackOff error; example 4 in the previous section shows what this looks like. The steps below walk through troubleshooting it.


Step 1: Verify the Image Pull Secret Exists


First, check if the image pull secret is present in the correct namespace:

kubectl get secrets -n stardog-ns

Look for a secret that should be used for image pulling, such as:

NAME                   TYPE                                  DATA   AGE
my-registry-secret     kubernetes.io/dockerconfigjson       1      3h

If the secret is missing, it must be created.


Step 2: Recreate the Image Pull Secret (If Needed)


If the secret does not exist or is incorrect, create it using the following command:

kubectl create secret docker-registry my-registry-secret \
  --docker-server=my-registry.example.com \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --namespace=stardog-ns

Run the command from step 1 to verify the secret was created.
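
To confirm the secret actually contains the registry and username you expect, you can decode its .dockerconfigjson payload:

kubectl get secret my-registry-secret -n stardog-ns -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode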


Step 3: Ensure the Pod References the Image Pull Secret


Your values.yaml should specify the image pull secret under imagePullSecrets. Example:

image:
  repository: my-registry.example.com/stardog/stardog
  tag: latest
  pullPolicy: IfNotPresent

imagePullSecrets:
  - name: my-registry-secret

After you update values.yaml, you'll need to redeploy your helm release with:

helm upgrade --install dev-sd /path/to/stardog/chart/ --values /path/to/values.yaml -n stardog-ns
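
You can confirm the release was updated successfully with:

helm status dev-sd -n stardog-ns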


Step 4: Verify the Secret is Being Used by the Pod


Check if the pod is referencing the pull secret:

kubectl get pod dev-sd-stardog-0 -n stardog-ns -o jsonpath='{.spec.imagePullSecrets}'

Expected output:

[{"name":"my-registry-secret"}]


Step 5: Restart the Pod to Retry Image Pull


Once the pull secret is confirmed to exist and be correctly referenced, restart the pod:

kubectl delete pod dev-sd-stardog-0 -n stardog-ns

Kubernetes will automatically recreate the pod and attempt to pull the image again.
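
You can watch the replacement pod come up (and confirm the image pull succeeds) with:

kubectl get pods -n stardog-ns -w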


Ensuring PVCs are correctly bound


To check if Persistent Volume Claims (PVCs) are correctly bound in your Kubernetes cluster, follow these steps:


Step 1: List PVCs in the Namespace


Run the following command to check the status of all PVCs in the stardog-ns namespace:

kubectl get pvc -n stardog-ns

Example output:

NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
stardog-data-0       Bound     pvc-1234abcd-5678-efgh-ijkl-90mnopqrst   10Gi       RWO            standard       3h

Possible statuses:

  • Bound → PVC is correctly attached to a Persistent Volume (PV).
  • Pending → PVC is waiting for a PV to be provisioned.
  • Lost → The bound PV is missing or deleted.


Step 2: Describe the PVC for More Details


If a PVC is not bound, inspect it with:

kubectl describe pvc stardog-data-0 -n stardog-ns

Look for:

  • Events: Errors such as "Failed to provision volume" or "No persistent volumes available" indicate storage issues.
  • StorageClass: Ensure the requested storage class exists (kubectl get storageclass).
  • Requested vs. Bound Size: Ensure the requested storage amount matches an available PV.


Step 3: Check if the PVC is Mounted in the Pod


To confirm that the PVC is correctly used by the Stardog pod, check its volume mounts:

kubectl describe pod dev-sd-stardog-0 -n stardog-ns

Look for the Volumes section:

Volumes:
  stardog-storage:
    Type: PersistentVolumeClaim (a reference to a PVC)
    ClaimName: stardog-data-0
    ReadOnly: false

If the claim is missing, the pod may fail to start due to a missing storage mount.


Step 4: Verify the Bound Persistent Volume (PV)


If the PVC is in a Pending state, check available PVs:

kubectl get pv

Example output:

NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   AGE
pvc-1234abcd       10Gi       RWO            Retain           Bound    stardog-ns/stardog-data-0 standard       3h

  • If no PVs are available, ensure your cluster storage provider (AWS EBS, GCE PD, NFS, etc.) is configured correctly.
  • If a PV exists but is not bound, check its Reclaim Policy (kubectl describe pv <pv-name>) and ensure it matches the PVC request.

Step 5: Manually Delete and Recreate the PVC (If Necessary)


If the PVC is stuck in Pending or Lost status, you might need to delete and recreate it:

kubectl delete pvc stardog-data-0 -n stardog-ns

Then apply a new PVC manifest.
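
As a minimal sketch, a replacement PVC manifest might look like the following. The name, size, access mode, and storage class are taken from the earlier example output and are placeholders; they must match what your Stardog chart and storage provider expect:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stardog-data-0
  namespace: stardog-ns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

Apply it with kubectl apply -f pvc.yaml, then re-run kubectl get pvc -n stardog-ns to confirm it binds.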
