How to Rerun a Completed Kubernetes Job: Best Practices for Job Restarts


When working with Kubernetes Jobs, it's important to understand their fundamental behavior. Jobs are designed to create one or more Pods and ensure that a specified number of them successfully terminate. Once completed, Jobs maintain their status for historical reference but won't automatically restart.

In your case, examining the job status shows:

$ kubectl describe job dbload
Name:           dbload
Namespace:      default
Selector:       controller-uid=5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8
Labels:         controller-uid=5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8
                job-name=dbload
Annotations:    kubernetes.io/change-cause=kubectl create --filename=dbload-deployment.yml --record=true
Parallelism:    1
Completions:    1
Start Time:     Mon, 01 Jan 2023 10:00:00 +0000
Completed At:   Mon, 01 Jan 2023 10:30:00 +0000
Duration:       30m
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  ...
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  1h    job-controller  Created pod: dbload-0mk0d
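
The same information is available in machine-readable form from kubectl get job dbload -o json. As a minimal sketch, assuming that JSON has been loaded into a Python dict, the helper below classifies a Job from its status.conditions list. The name job_state and the example values are hypothetical, not part of any Kubernetes API.

```python
def job_state(job: dict) -> str:
    """Return 'complete', 'failed', or 'unfinished' for a Job object dict."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        # Only conditions whose status is the string "True" are in effect
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "complete"
        if cond.get("type") == "Failed":
            return "failed"
    return "unfinished"


# Status block shaped like the dbload Job above (values are illustrative)
example = {
    "status": {
        "succeeded": 1,
        "conditions": [{"type": "Complete", "status": "True"}],
    }
}
print(job_state(example))  # prints complete
```

A Job reported as complete or failed will not run again on its own, which is why the approaches below all involve creating a new Job object.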

There are several approaches to rerun a Kubernetes Job, each with different implications:

1. Delete and Recreate the Job

The most straightforward method is to delete the existing job and recreate it:

kubectl delete job dbload
kubectl create -f dbload-deployment.yml --record

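2. Use the TTL-after-finished Controller

The ttlSecondsAfterFinished field was introduced as an alpha feature in Kubernetes 1.12, has been enabled by default since 1.21, and is stable as of 1.23. It does not rerun anything by itself: it deletes the finished Job and its Pods automatically, so the name is free again when you create the next run: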

apiVersion: batch/v1
kind: Job
metadata:
  name: dbload
spec:
  ttlSecondsAfterFinished: 60  # Job will be deleted 60 seconds after completion
  template:
    # rest of your job spec
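
To make the timing concrete, the sketch below computes the earliest moment the controller may delete the Job, using the completion time from the describe output above and the 60-second TTL. The helper earliest_deletion is hypothetical, and the real controller may act somewhat later than this timestamp.

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion(completed_at: datetime, ttl_seconds: int) -> datetime:
    """Completion time plus TTL: the Job is eligible for deletion from here on."""
    return completed_at + timedelta(seconds=ttl_seconds)


# Completion time taken from the describe output above
completed = datetime(2023, 1, 1, 10, 30, 0, tzinfo=timezone.utc)
print(earliest_deletion(completed, 60).isoformat())
# prints 2023-01-01T10:31:00+00:00
```

Recreating the Job under the same name before that point still fails, so a script that relies on the TTL has to wait for the deletion or delete the Job explicitly.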

3. Create a CronJob Instead

If you need regular execution, consider using a CronJob:

apiVersion: batch/v1  # use batch/v1beta1 only on clusters older than 1.21
kind: CronJob
metadata:
  name: dbload
spec:
  schedule: "0 * * * *"  # Run hourly
  jobTemplate:
    spec:
      template:
        # Your existing job spec here
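
If the Job manifest is already available as a dict (for example after yaml.safe_load), the wrapping can be done mechanically. The following is a sketch under that assumption: job_to_cronjob is a hypothetical helper, the container image is a placeholder, and the output uses batch/v1, the CronJob API version on Kubernetes 1.21 and later.

```python
import copy


def job_to_cronjob(job: dict, schedule: str) -> dict:
    """Wrap an existing Job manifest (as a dict) in a CronJob manifest."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": job["metadata"]["name"]},
        "spec": {
            "schedule": schedule,
            # The whole Job spec becomes the jobTemplate's spec
            "jobTemplate": {"spec": copy.deepcopy(job["spec"])},
        },
    }


# Placeholder Job manifest; the image name is purely illustrative
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "dbload"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "dbload", "image": "example/dbload:latest"}],
                "restartPolicy": "Never",
            }
        }
    },
}
cronjob = job_to_cronjob(job, "0 * * * *")
print(cronjob["kind"], cronjob["spec"]["schedule"])
```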

For more complex scenarios, you might consider these approaches:

Using kubectl replace

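You can export the existing Job, strip the fields that the controller generated, and force-replace it, which deletes the old Job and creates a new one:

kubectl get job dbload -o yaml > job.yaml
# Edit job.yaml: remove spec.selector, the controller-uid labels under
# metadata.labels and spec.template.metadata.labels, and the status block
kubectl replace --force -f job.yaml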
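
The manual editing step can be scripted. The sketch below assumes the exported Job has been parsed into a dict and removes the server-generated parts before the manifest is resubmitted. The helper name strip_generated_fields and the exact list of fields are assumptions based on the describe output above, so check them against your own export.

```python
import copy


def strip_generated_fields(job: dict) -> dict:
    """Return a copy of an exported Job without server-generated fields."""
    cleaned = copy.deepcopy(job)
    cleaned.pop("status", None)

    meta = cleaned.get("metadata") or {}
    for key in ("uid", "resourceVersion", "creationTimestamp", "managedFields"):
        meta.pop(key, None)

    spec = cleaned.get("spec") or {}
    spec.pop("selector", None)  # regenerated for the new Job's UID

    template_meta = (spec.get("template") or {}).get("metadata") or {}
    for labels in (meta.get("labels"), template_meta.get("labels")):
        if not labels:
            continue
        # Drop controller-uid labels, which point at the old Job
        for name in [k for k in labels if k.endswith("controller-uid")]:
            del labels[name]
    return cleaned


uid = "5b9a5b5a-6c5d-4e7f-a1b2-c3d4e5f6a7b8"
exported = {
    "metadata": {"name": "dbload", "uid": uid,
                 "labels": {"controller-uid": uid, "job-name": "dbload"}},
    "spec": {
        "selector": {"matchLabels": {"controller-uid": uid}},
        "template": {"metadata": {"labels": {"controller-uid": uid,
                                             "job-name": "dbload"}}},
    },
    "status": {"succeeded": 1},
}
print(strip_generated_fields(exported))
```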

Programmatic Job Restart

Here's a Python example using the Kubernetes Python client:

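import time

import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch_v1 = client.BatchV1Api()

# Delete the existing Job; Foreground propagation also removes its Pods
batch_v1.delete_namespaced_job(
    name="dbload",
    namespace="default",
    body=client.V1DeleteOptions(propagation_policy="Foreground"),
)

# Deletion is asynchronous: poll until the old Job is gone (404),
# otherwise the create call below fails with 409 Conflict
while True:
    try:
        batch_v1.read_namespaced_job(name="dbload", namespace="default")
        time.sleep(1)
    except ApiException as e:
        if e.status != 404:
            raise
        break

# Create a new Job from the original manifest
with open("dbload-deployment.yml") as f:
    job_manifest = yaml.safe_load(f)
batch_v1.create_namespaced_job(namespace="default", body=job_manifest)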

General recommendations:

  • Always set resource requests and limits in your Job specifications
  • Add labels to make Jobs easier to select and manage
  • For production environments, set up logging and monitoring for Job runs
  • Use ConfigMaps or Secrets for configuration rather than hardcoding values in the Job spec

Kubernetes Jobs are designed to run to completion and then persist in the cluster with a Completed status. This is different from Deployments, which are meant for long-running applications. When you check your job status:

$ kubectl get job dbload
NAME      DESIRED   SUCCESSFUL   AGE
dbload    1         1            1h

The job remains in the cluster as a record of completed work. Because a Job named dbload still exists, running kubectl create again with the same manifest fails with an AlreadyExists error.

Here are three practical methods to rerun your job:

1. Delete and Recreate

The most straightforward approach:

kubectl delete job dbload
kubectl create -f dbload-deployment.yml --record

2. Use the TTL-after-finished Controller

Add a TTL so that completed Jobs are cleaned up automatically (the field is stable since Kubernetes 1.23; it was alpha in 1.12):

apiVersion: batch/v1
kind: Job
metadata:
  name: dbload
spec:
  ttlSecondsAfterFinished: 60  # Delete 60 seconds after completion
  template:
    # ... rest of your spec

3. Create Jobs Programmatically

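For frequent reruns, give every run a unique name. Shell substitutions such as $(date +%s) are not expanded inside a YAML manifest, so let the API server append a random suffix with generateName and submit the manifest with kubectl create (kubectl apply does not support generateName):

apiVersion: batch/v1
kind: Job
metadata:
  generateName: dbload-  # API server appends a random suffix
spec:
  # ... rest of your spec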
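
Where a timestamp-based name is preferred, it has to be computed by a script before the manifest is submitted. A minimal sketch follows; unique_job_name is a hypothetical helper, and the 63-character cap is chosen so the name also fits into the job-name label that Kubernetes attaches to the Job's Pods.

```python
import time
from typing import Optional

MAX_LEN = 63  # label values are limited to 63 characters


def unique_job_name(base: str, now: Optional[float] = None) -> str:
    """Append a Unix-timestamp suffix, trimming the base name if needed."""
    suffix = str(int(time.time() if now is None else now))
    trimmed = base[: MAX_LEN - len(suffix) - 1].rstrip("-")
    return f"{trimmed}-{suffix}"


print(unique_job_name("dbload", now=1672569000))  # prints dbload-1672569000
```

The resulting string would be written into metadata.name of the manifest before it is passed to the API. Two runs started within the same second get the same name, so this suits manually triggered reruns better than tight loops.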

If you need to run the job periodically, convert it to a CronJob:

apiVersion: batch/v1  # use batch/v1beta1 only on clusters older than 1.21
kind: CronJob
metadata:
  name: dbload
spec:
  schedule: "0 * * * *"  # Run hourly
  jobTemplate:
    spec:
      template:
        # ... your existing pod template

Keep these trade-offs in mind when choosing a cleanup strategy:

  • Completed Jobs and their Pods consume etcd storage space
  • Job history is valuable for auditing and debugging
  • Labels make it easier to find and clean up related Jobs

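You may also come across advice to reset the Job's status with a patch. A Job's status is served through a separate status subresource, so on current Kubernetes versions the API server ignores status changes sent to the Job itself, and the Job controller does not rerun a finished Job in any case. Treat the following as something to recognize rather than a working method, and do not rely on it in production:

kubectl patch job dbload --type=json -p='[{"op": "remove", "path": "/status"}]'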

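Remember that most of a Job's spec, including its Pod template, is immutable once the Job is created, and that a finished Job is never restarted in place. This is intentional: it keeps the execution history accurate, and it is why every reliable rerun method creates a new Job object.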