
Developer Guide

This guide is for developers who deploy workloads and consume resources within team projects.

Responsibilities

As a developer, you are responsible for:

  • ✅ Deploying applications within your project
  • ✅ Monitoring resource usage
  • ✅ Staying within quota limits
  • ✅ Optimizing resource consumption

Getting Started

1. Get Kubeconfig

Request a kubeconfig file from your team leader or administrator.
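
Once you receive the file, point kubectl at it. A minimal sketch, assuming you saved it as ~/.kube/config-your-project (adjust the path to wherever you stored it):

# Use the received kubeconfig for the current shell session
export KUBECONFIG=~/.kube/config-your-project

# Confirm you can reach the cluster
kubectl cluster-info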

2. Set Context

# Set context to your project namespace
kubectl config set-context --current --namespace=your-project

# Verify
kubectl config view --minify | grep namespace

3. Check Quota

See your available resources:

kubectl describe quota
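
The output lists each resource with its used amount and its hard ceiling. An illustrative example (the quota name and values will differ per project):

Name:            compute-quota
Namespace:       your-project
Resource         Used  Hard
--------         ----  ----
cpu              2     10
memory           8Gi   64Gi
nvidia.com/gpu   1     4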

Deploying Workloads

Basic Pod Deployment

apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job
  namespace: your-project
spec:
  containers:
  - name: trainer
    image: your-ml-image:latest
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
        nvidia.com/gpu: "1"
      limits:
        cpu: "4"
        memory: "16Gi"
        nvidia.com/gpu: "1"
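
Save the manifest to a file (for example pod.yaml) and apply it:

# Create the pod and watch it start
kubectl apply -f pod.yaml
kubectl get pod gpu-training-job -w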

Using Deployments

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
  namespace: your-project
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-inference
  template:
    metadata:
      labels:
        app: ml-inference
    spec:
      containers:
      - name: inference
        image: your-inference-image:latest
        resources:
          requests:
            cpu: "2"
            memory: "8Gi"
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
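
Apply it the same way and wait for the rollout to finish (assuming the manifest is saved as deployment.yaml):

kubectl apply -f deployment.yaml
kubectl rollout status deployment/ml-inference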

Monitoring Usage

Check Pod Resource Usage

# View resource consumption
kubectl top pods

# Detailed pod information
kubectl describe pod <pod-name>

View Logs

# Stream logs
kubectl logs -f <pod-name>

# Previous logs (if pod restarted)
kubectl logs --previous <pod-name>

Best Practices

Resource Requests and Limits

Always specify both requests and limits. The scheduler reserves capacity based on requests, while limits cap what a container can actually consume:

resources:
  requests:
    cpu: "2"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "16Gi"

GPU Usage

  • Request GPUs only when needed (GPU requests behave differently from CPU and memory; see the note below)
  • Use GPU for compute-intensive tasks
  • Monitor GPU utilization
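
Note that nvidia.com/gpu is an extended resource: it cannot be overcommitted, so a GPU request must equal its limit. In practice you can set only the limit, as in this minimal snippet:

resources:
  limits:
    nvidia.com/gpu: "1"  # the request defaults to the limit for extended resources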

Clean Up

Delete resources when no longer needed:

# Delete pod
kubectl delete pod <pod-name>

# Delete deployment
kubectl delete deployment <deployment-name>

# Clean up completed jobs
kubectl delete job --field-selector status.successful=1
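
On Kubernetes 1.23 and later, finished Jobs can also clean themselves up via ttlSecondsAfterFinished. A minimal sketch (the Job name, image, and TTL value are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  namespace: your-project
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job one hour after it finishes
  template:
    spec:
      containers:
      - name: trainer
        image: your-ml-image:latest
      restartPolicy: Never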

Cost Optimization

  • Right-size your resource requests
  • Use horizontal pod autoscaling (see the sketch after this list)
  • Clean up idle resources
  • Share GPUs when possible (if supported)
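
As a sketch of the autoscaling point above, a HorizontalPodAutoscaler for the ml-inference Deployment could look like this (it assumes metrics-server is running in your cluster, and the 70% target is a placeholder):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-inference
  namespace: your-project
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-inference
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70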

Troubleshooting

Pod Pending (Insufficient Quota)

If your pod is stuck in the Pending state:

kubectl describe pod <pod-name>

Look for quota-related errors in the Events section, then either reduce your resource requests or ask your team leader for more quota.
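
Namespace events usually show the quota failure directly:

# Recent events, newest last
kubectl get events --sort-by=.lastTimestamp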

Out of Memory (OOM)

If pods are killed due to OOM (see below for how to confirm this):

  1. Check memory usage patterns
  2. Increase memory limits
  3. Optimize application memory usage
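
To confirm a container was OOM-killed, inspect its last termination state:

# Prints "OOMKilled" if the container exceeded its memory limit
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'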

GPU Not Available

Check that nodes have allocatable GPUs:

kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable."nvidia\.com/gpu"
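
You can also check how much of a node's capacity is already allocated, GPUs included:

# Show the node's allocated resources section
kubectl describe node <node-name> | grep -A 8 "Allocated resources"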

Next Steps