ArchiveTeam Warrior


ArchiveTeam Warrior is a virtual archiving appliance. It takes some of your spare CPU and bandwidth, collects data from various sites (ArchiveTeam projects), and sends it to the ArchiveTeam, where it is aggregated and added to the Internet Archive to help preserve our digital heritage.

Product: ArchiveTeam Warrior
Install Type: Manifest
Container Image: Docker

Installation Details

I have not yet created a Helm chart for this, so I have put together manifest files to install ArchiveTeam Warrior on Kubernetes. These manifests were adapted from these Docker instructions; you should probably read through that page first to understand what is being adapted here.

The following manifests assume you have an existing namespace named utility, an NGINX ingress class named nginx-int, and cert-manager configured to use an ACME provider. Because this control panel has no need to be public, I use my internal CA, served by Step CA, as the issuer. Please adjust for your particular needs.
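
If you want to confirm those prerequisites before applying anything, checks along these lines should do it (kubectl get clusterissuer only works once the cert-manager CRDs are installed, and your ingress class and issuer names may of course differ):

kubectl get namespace utility
kubectl get ingressclass
kubectl get clusterissuer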

00-utility-namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: utility
  labels:
    name: utility

This is optional, but I create the namespace in my builds in case I have to recover from scratch. If the namespace already exists, applying this manifest has no real negative effect.
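
If you would rather not keep the namespace in these manifests, creating it imperatively gives the same result:

kubectl create namespace utility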

01-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: archiveteam
  namespace: utility
  labels:
    app.kubernetes.io/name: archiveteam
spec:
  selector:
    matchLabels:
      app: archiveteam
  template:
    metadata:
      labels:
        app: archiveteam
        app.kubernetes.io/name: archiveteam
    spec:
      containers:
      - name: archiveteam
        image: atdr.meo.ws/archiveteam/warrior-dockerfile:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8001
        envFrom:
        - configMapRef:
            name: archiveteam
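
Once this is applied, something like the following should confirm that the image was pulled and the pod is running (the label selector matches the labels in the manifest above):

kubectl -n utility rollout status deployment/archiveteam
kubectl -n utility get pods -l app=archiveteam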

02-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: archiveteam
  namespace: utility
data:
  DOWNLOADER: <<Your_ID>>
  SELECTED_PROJECT: auto
  SHARED_RSYNC_THREADS: '40'
  WARRIOR_ID: <<Your_ID>>
  CONCURRENT_ITEMS: '6'

You should use your own unique ID here. Also, please adjust SHARED_RSYNC_THREADS and CONCURRENT_ITEMS to fit your needs; the values shown are the current maximums.
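
Keep in mind that values injected with envFrom are only read when a container starts, so if you change the ConfigMap later you will need to restart the deployment for the new settings to take effect, for example:

kubectl -n utility apply -f 02-configmap.yaml
kubectl -n utility rollout restart deployment/archiveteam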

03-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: archiveteam
  namespace: utility
spec:
  selector:
    app: archiveteam
  ports:
  - port: 8001
    targetPort: 8001
  type: ClusterIP
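
The service is ClusterIP only, so it is not reachable from outside the cluster by itself. If you want to check the web control panel before the ingress is in place, a temporary port-forward works; the panel should then answer on http://localhost:8001:

kubectl -n utility port-forward service/archiveteam 8001:8001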

04-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: archiveteam
  namespace: utility
  labels:
    name: archiveteam
  annotations:
    cert-manager.io/cluster-issuer: your-issuer
spec:
  ingressClassName: your-ingress
  rules:
  - host: your.host.name
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: archiveteam
            port: 
              number: 8001
  tls:
    - hosts:
      - your.host.name
      secretName: archiveteam-int-tls

As mentioned above, the ingress configuration assumes working ingress and cert-manager configurations.
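
After applying the ingress, you can check that cert-manager issued the certificate and that the ingress picked up an address; cert-manager's ingress-shim normally names the Certificate resource after the secretName in the TLS block, so something like this should work:

kubectl -n utility get certificate archiveteam-int-tls
kubectl -n utility describe ingress archiveteam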

Now, we can deploy this all together with:

kubectl apply -f 00-utility-namespace.yaml \
              -f 01-deployment.yaml \
              -f 02-configmap.yaml \
              -f 03-service.yaml \
              -f 04-ingress.yaml
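
Once everything is applied, a quick look at the resources in the utility namespace, plus a visit to the host name you configured in the ingress, should bring up the Warrior control panel where you can pick a project:

kubectl -n utility get deployments,services,ingresses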