
Data Schema

ProcessedJob Model

The collected data follows this schema:

| Field | Type | Description |
| --- | --- | --- |
| job_id | str | SLURM job ID |
| user | str | Username |
| job_name | str | Job name (max 50 chars) |
| partition | str | SLURM partition |
| state | str | Final job state |
| submit_time | datetime.datetime or None | Job submission time |
| start_time | datetime.datetime or None | Job start time |
| end_time | datetime.datetime or None | Job end time |
| node_list | str | Nodes where the job ran |
| elapsed_seconds | int | Runtime in seconds |
| alloc_cpus | int | CPUs allocated |
| req_mem_mb | float | Memory requested (MB) |
| max_rss_mb | float | Peak memory used (MB) |
| total_cpu_seconds | float | Actual CPU time used |
| alloc_gpus | int | GPUs allocated |
| cpu_efficiency | float | CPU efficiency % (0-100) |
| memory_efficiency | float | Memory efficiency % (0-100) |
| cpu_hours_wasted | float | Wasted CPU hours |
| memory_gb_hours_wasted | float | Wasted memory GB-hours |
| cpu_hours_reserved | float | Total CPU hours reserved |
| memory_gb_hours_reserved | float | Total memory GB-hours reserved |
| gpu_hours_reserved | float | Total GPU hours reserved |
| is_complete | bool | Whether the job has reached a final state |
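
Collected records are written as Parquet files (see the Polars example in the next section), so you can confirm how this schema maps to on-disk dtypes directly. The snippet below is a minimal sketch; the date in the path is a placeholder, and the layout follows the data/processed/YYYY-MM-DD.parquet convention used below.

import polars as pl

# Placeholder date; collected files are written as data/processed/<YYYY-MM-DD>.parquet
df = pl.read_parquet("data/processed/2024-01-15.parquet")

print(df.schema)      # column names and dtypes
print(df.describe())  # quick per-column summary statistics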

Post-Processing with Polars

You can use Polars to analyze the collected data. The example below loads the last seven days of processed files, ranks users by average CPU efficiency, and totals wasted CPU hours per partition:

from datetime import datetime, timedelta
from pathlib import Path

import polars as pl

# Load processed data for the last 7 days
dfs = []
for i in range(7):
    date = (datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d")
    file = Path(f"data/processed/{date}.parquet")
    if file.exists():
        dfs.append(pl.read_parquet(file))

if dfs:
    df = pl.concat(dfs)

    # Find users with the worst average CPU efficiency (completed jobs only)
    worst_users = (
        df.filter(pl.col("state") == "COMPLETED")
        .group_by("user")
        .agg(pl.col("cpu_efficiency").mean())
        .sort("cpu_efficiency")
        .head(5)
    )

    print("## Users with Worst CPU Efficiency")
    print(worst_users)

    # Find the partitions with the most wasted CPU hours
    waste_by_partition = (
        df.group_by("partition")
        .agg(pl.col("cpu_hours_wasted").sum())
        .sort("cpu_hours_wasted", descending=True)
    )

    print("\n## CPU Hours Wasted by Partition")
    print(waste_by_partition)
else:
    print("No data files found. Run `./slurm_usage.py collect` first.")