How to Rerun a Completed Kubernetes Job: Best Practices for Job Restarts



When working with Kubernetes Jobs, it's important to understand their fundamental behavior. Jobs are designed to create one or more Pods and ensure that a specified number of them successfully terminate. Once completed, Jobs maintain their status for historical reference but won't automatically restart.

In your case, examining the job status shows:

$ kubectl describe job dbload
Name:           dbload
Namespace:      default
Selector:       controller-uid=5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8
Labels:         controller-uid=5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8
                job-name=dbload
Annotations:    kubernetes.io/change-cause=kubectl create --filename=dbload-deployment.yml --record=true
Parallelism:    1
Completions:    1
Start Time:     Mon, 01 Jan 2023 10:00:00 +0000
Completed At:   Mon, 01 Jan 2023 10:30:00 +0000
Duration:       30m
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  ...
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  1h    job-controller  Created pod: dbload-0mk0d
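kubectl describe summarises the status for you. A script that needs the same check can read the Job as JSON (for example from kubectl get job dbload -o json) and look for a Complete or Failed condition whose status is "True", which is how the Job API marks a finished Job. This is a minimal sketch; the helper name job_finished and the trimmed sample object are hypothetical:

```python
from typing import Any, Mapping, Optional


def job_finished(job: Mapping[str, Any]) -> Optional[str]:
    """Return "Complete" or "Failed" if the Job has finished, otherwise None.

    `job` is a Job object as a plain dict, e.g. parsed from
    `kubectl get job dbload -o json`.
    """
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        # A finished Job carries a terminal condition with status "True"
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            return cond["type"]
    return None


# Hypothetical, trimmed-down Job object for a run that succeeded
dbload = {
    "metadata": {"name": "dbload"},
    "status": {
        "succeeded": 1,
        "conditions": [{"type": "Complete", "status": "True"}],
    },
}
print(job_finished(dbload))  # Complete
```

A finished Job stays in this state; nothing in the Job controller moves it back to running.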

There are several approaches to rerun a Kubernetes Job, each with different implications:

1. Delete and Recreate the Job

The most straightforward method is to delete the existing job and recreate it:

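kubectl delete job dbload
kubectl create -f dbload-deployment.yml

kubectl delete also removes the Job's Pods, so save any logs you need first (for example with kubectl logs job/dbload). The --record flag from the original command is deprecated in current kubectl releases and can be omitted.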

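2. Use Job TTL Controller (Kubernetes 1.21+; alpha since 1.12)

The TTL-after-finished controller deletes a finished Job, together with its Pods, a set number of seconds after it completes or fails. It first appeared in Kubernetes 1.12 as an alpha feature behind the TTLAfterFinished feature gate, has been enabled by default since 1.21, and became stable in 1.23. It does not rerun the Job; it only removes the finished one so that the same manifest can be created again under the same name: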

apiVersion: batch/v1
kind: Job
metadata:
  name: dbload
spec:
  ttlSecondsAfterFinished: 60  # Job will be deleted 60 seconds after completion
  template:
    # rest of your job spec
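The countdown starts when the Job finishes. As a worked example using the describe output above (Completed At 10:30:00 UTC) and ttlSecondsAfterFinished: 60, the Job becomes eligible for deletion at 10:31:00 UTC; the actual deletion may happen slightly later. A minimal sketch of that arithmetic, assuming the finish time is available as an RFC 3339 UTC timestamp such as status.completionTime; the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def ttl_deletion_time(finish_time: str, ttl_seconds: int) -> datetime:
    """Earliest time a finished Job becomes eligible for TTL cleanup.

    `finish_time` is an RFC 3339 UTC timestamp such as the Job's
    status.completionTime, e.g. "2023-01-01T10:30:00Z".
    """
    finished = datetime.strptime(finish_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return finished + timedelta(seconds=ttl_seconds)


print(ttl_deletion_time("2023-01-01T10:30:00Z", 60).isoformat())
# 2023-01-01T10:31:00+00:00
```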

3. Create a CronJob Instead

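If you need regular execution, consider using a CronJob. CronJob is served as batch/v1 since Kubernetes 1.21, and batch/v1beta1 was removed in 1.25. Besides running on its schedule, a CronJob can be triggered on demand with kubectl create job dbload-manual-1 --from=cronjob/dbload, where dbload-manual-1 is any unused Job name:

apiVersion: batch/v1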
kind: CronJob
metadata:
  name: dbload
spec:
  schedule: "0 * * * *"  # Run hourly
  jobTemplate:
    spec:
      template:
        # Your existing pod template (metadata and spec) here
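Conceptually, triggering a CronJob by hand means copying its jobTemplate into a standalone Job. The sketch below does that on plain dicts, assuming the manifests have already been parsed (for example with a YAML loader). It is only a rough client-side analogue of kubectl create job --from=cronjob/...; the real command also links the new Job back to its CronJob, which this sketch does not attempt. The function name and the container image in the example are hypothetical:

```python
import copy
from typing import Any, Dict


def job_from_cronjob(cronjob: Dict[str, Any], job_name: str) -> Dict[str, Any]:
    """Build a standalone Job manifest from a CronJob's jobTemplate."""
    template = cronjob["spec"]["jobTemplate"]
    metadata: Dict[str, Any] = {"name": job_name}
    namespace = cronjob.get("metadata", {}).get("namespace")
    if namespace:
        metadata["namespace"] = namespace
    # Carry over labels/annotations declared on the jobTemplate itself
    for key in ("labels", "annotations"):
        value = template.get("metadata", {}).get(key)
        if value:
            metadata[key] = copy.deepcopy(value)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": metadata,
        # Deep copy so later edits to the Job never touch the CronJob dict
        "spec": copy.deepcopy(template["spec"]),
    }


cron = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "dbload", "namespace": "default"},
    "spec": {
        "schedule": "0 * * * *",
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [{"name": "dbload", "image": "example/dbload:latest"}],
                    }
                }
            }
        },
    },
}
job = job_from_cronjob(cron, "dbload-manual-1")
print(job["metadata"])  # {'name': 'dbload-manual-1', 'namespace': 'default'}
```

The resulting dict can be used as the body argument of create_namespaced_job in the Kubernetes Python client.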

For more complex scenarios, you might consider these approaches:

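Using kubectl replace

If you still have the original manifest, kubectl replace --force -f dbload-deployment.yml deletes the Job and creates it again in one step. If you only have the live object, export it first. The export contains a selector and Pod labels generated by the Job controller, and the re-creation is rejected unless you remove them:

kubectl get job dbload -o yaml > job.yaml
# Edit job.yaml: delete spec.selector and the controller-uid / job-name labels
# under spec.template.metadata.labels (the status block is ignored on create)
kubectl replace --force -f job.yaml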
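The same clean-up can be scripted. The sketch below takes the exported Job as a dict (for example parsed from kubectl get job dbload -o json) and returns a copy without the status, the server-populated metadata, the generated selector (unless spec.manualSelector is set), and the controller-generated labels. It lists both the older label keys (controller-uid, job-name) and the batch.kubernetes.io/ prefixed keys used by newer clusters; treat that list as an assumption to check against your cluster version. The function name and the sample object's app label are hypothetical:

```python
import copy
from typing import Any, Dict

# Metadata the API server fills in; it should not be sent back on create
SERVER_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                   "generation", "managedFields", "selfLink")
# Labels added by the Job controller (older and newer key names)
GENERATED_LABELS = ("controller-uid", "job-name",
                    "batch.kubernetes.io/controller-uid",
                    "batch.kubernetes.io/job-name")


def clean_exported_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an exported Job that can be created again."""
    cleaned = copy.deepcopy(job)
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_METADATA:
        metadata.pop(field, None)
    spec = cleaned.get("spec", {})
    if not spec.get("manualSelector"):
        # Let the API server generate a fresh selector for the new Job
        spec.pop("selector", None)
    template_meta = spec.get("template", {}).get("metadata", {})
    for labels in (metadata.get("labels", {}), template_meta.get("labels", {})):
        for key in GENERATED_LABELS:
            labels.pop(key, None)
    return cleaned


uid = "5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8"
exported = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "dbload", "uid": uid,
                 "labels": {"controller-uid": uid, "job-name": "dbload"}},
    "spec": {
        "selector": {"matchLabels": {"controller-uid": uid}},
        "template": {"metadata": {"labels": {"controller-uid": uid, "job-name": "dbload",
                                             "app": "dbload"}}},
    },
    "status": {"succeeded": 1},
}
cleaned = clean_exported_job(exported)
print(cleaned["spec"])  # {'template': {'metadata': {'labels': {'app': 'dbload'}}}}
```

To use it from the command line, serialise the result with json.dumps and pipe it to kubectl replace --force -f -.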

Programmatic Job Restart

Here's a Python example using the Kubernetes Python client:

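import time

import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch_v1 = client.BatchV1Api()

# Delete the existing job. "Background" propagation also removes its Pods;
# the API default for Jobs would orphan them.
batch_v1.delete_namespaced_job(
    name="dbload",
    namespace="default",
    body=client.V1DeleteOptions(propagation_policy="Background"),
)

# Deletion is asynchronous, so wait until the old Job object is gone
while True:
    try:
        batch_v1.read_namespaced_job(name="dbload", namespace="default")
    except ApiException as e:
        if e.status == 404:
            break
        raise
    time.sleep(1)

# Create a new job from the original manifest (requires PyYAML)
with open("dbload-deployment.yml") as f:
    job_manifest = yaml.safe_load(f)
batch_v1.create_namespaced_job(namespace="default", body=job_manifest)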

Best Practices

  • Always include resource limits in your Job specifications
  • Consider adding proper labels for easier management
  • For production environments, implement proper logging and monitoring
  • Use ConfigMaps or Secrets for configuration rather than hardcoding in the Job spec
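The first recommendation is easy to check before a Job is submitted. A minimal sketch, assuming the manifest has already been loaded into a dict; the function name and the sample containers are hypothetical:

```python
from typing import Any, Dict, List


def containers_missing_limits(job: Dict[str, Any]) -> List[str]:
    """Names of containers (including init containers) without resources.limits."""
    pod_spec = job.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [
        c.get("name", "<unnamed>")
        for c in containers
        if not c.get("resources", {}).get("limits")
    ]


job = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "dbload", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
                    {"name": "sidecar"},
                ]
            }
        }
    }
}
print(containers_missing_limits(job))  # ['sidecar']
```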

Kubernetes Jobs are designed to run to completion and then persist in the cluster with a Completed status. This is different from Deployments, which are meant for long-running applications. When you check your job status:

$ kubectl get job dbload
NAME      DESIRED   SUCCESSFUL   AGE
dbload    1         1            1h

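The job remains in the cluster as a record of completed work, so kubectl create with the same name fails with an AlreadyExists error until the old Job is deleted. The pod template of an existing Job cannot be edited either, so you cannot update it in place to trigger another run.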

Besides deleting and recreating the Job or letting the TTL controller remove it, both described above, you can avoid the name clash altogether by giving every run its own name.

Create Jobs with Unique Names

A shell substitution such as $(date +%s) is not expanded inside a YAML manifest, so a timestamped name cannot be written that way. Use metadata.generateName instead and let the API server append a random suffix:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: dbload-  # API server appends a random suffix
spec:
  # ... rest of your spec

Create each run with kubectl create, since kubectl apply does not support generateName. Every run leaves a new Job behind, so combine this with ttlSecondsAfterFinished to keep them from piling up:

kubectl create -f dbload-deployment.yml
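A substitution like $(date +%s) is only evaluated by a shell, never inside a YAML file, so a timestamped name has to be computed by whatever submits the Job. A minimal sketch in Python; unique_job_name is a hypothetical helper that applies the DNS-label character rules and the 63-character limit Kubernetes documents for Job names:

```python
import re
import time
from typing import Optional

# DNS-label rules: lowercase alphanumerics or '-', alphanumeric at both ends
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_JOB_NAME = 63


def unique_job_name(base: str, now: Optional[float] = None) -> str:
    """Return "<base>-<unix seconds>", trimmed to a valid Job name."""
    suffix = str(int(time.time() if now is None else now))
    # Shorten the base so the full name stays within 63 characters
    base = base[: MAX_JOB_NAME - len(suffix) - 1].rstrip("-")
    name = f"{base}-{suffix}"
    if not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid Job name: {name!r}")
    return name


print(unique_job_name("dbload", now=1672569000))  # dbload-1672569000
```

Two runs started within the same second would still collide; the metadata.generateName field, which lets the API server pick the suffix, avoids that.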

If you need to run the job periodically, convert it to a CronJob as described above, using apiVersion batch/v1.

Things to keep in mind:

  • Completed jobs consume etcd storage space until they are deleted
  • Job history is valuable for auditing and debugging
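If you clean up by hand rather than with the TTL controller, a script can pick out finished Jobs older than a cutoff from the output of kubectl get jobs -o json. A minimal sketch; it assumes the finish time is the lastTransitionTime of the Complete or Failed condition, and the function names and sample data are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def finish_time(job: Dict[str, Any]) -> Optional[datetime]:
    """Time the Job reached Complete or Failed, or None if it has not finished."""
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            ts = datetime.strptime(cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ")
            return ts.replace(tzinfo=timezone.utc)
    return None


def stale_jobs(job_list: Dict[str, Any], max_age: timedelta, now: datetime) -> List[str]:
    """Names of finished Jobs whose finish time is more than max_age before now."""
    stale = []
    for job in job_list.get("items", []):
        finished = finish_time(job)
        if finished is not None and now - finished > max_age:
            stale.append(job["metadata"]["name"])
    return stale


jobs = {
    "items": [
        {"metadata": {"name": "dbload-old"},
         "status": {"conditions": [{"type": "Complete", "status": "True",
                                    "lastTransitionTime": "2023-01-01T10:30:00Z"}]}},
        {"metadata": {"name": "dbload-running"}, "status": {"active": 1}},
    ]
}
now = datetime(2023, 1, 2, 10, 30, tzinfo=timezone.utc)
print(stale_jobs(jobs, timedelta(hours=12), now))  # ['dbload-old']
```

The resulting names can then be passed to kubectl delete job; keep the cutoff long enough to preserve the history you need for auditing.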

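You may also come across suggestions to reset the job status with a patch:

kubectl patch job dbload --type=json -p='[{"op": "remove", "path": "/status"}]'

This does not rerun the Job. Status is served through the separate /status subresource, so the API server ignores status changes sent to the main Job resource, and the Job controller never restarts a Job that has already finished. Use one of the approaches above instead.

Kubernetes treats a finished Job as final and does not allow its pod template to change after creation, which keeps the execution history accurate.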