## Data Schema

### ProcessedJob Model
The collected data follows this schema:
| Field | Type | Description |
|---|---|---|
| job_id | str | SLURM job ID |
| user | str | Username |
| job_name | str | Job name (max 50 chars) |
| partition | str | SLURM partition |
| state | str | Final job state |
| submit_time | datetime.datetime \| None | Job submission time |
| start_time | datetime.datetime \| None | Job start time |
| end_time | datetime.datetime \| None | Job end time |
| node_list | str | Nodes where job ran |
| elapsed_seconds | int | Runtime in seconds |
| alloc_cpus | int | CPUs allocated |
| req_mem_mb | float | Memory requested (MB) |
| max_rss_mb | float | Peak memory used (MB) |
| total_cpu_seconds | float | Actual CPU time used |
| alloc_gpus | int | GPUs allocated |
| cpu_efficiency | float | CPU efficiency % (0-100) |
| memory_efficiency | float | Memory efficiency % (0-100) |
| cpu_hours_wasted | float | Wasted CPU hours |
| memory_gb_hours_wasted | float | Wasted memory GB-hours |
| cpu_hours_reserved | float | Total CPU hours reserved |
| memory_gb_hours_reserved | float | Total memory GB-hours reserved |
| gpu_hours_reserved | float | Total GPU hours reserved |
| is_complete | bool | Whether job has reached final state |
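The derived efficiency fields can be understood in terms of the raw fields above. As a minimal sketch (a plausible reading of the schema, not necessarily the tool's exact formulas), `cpu_efficiency` compares actual CPU time against reserved CPU time, and `memory_efficiency` compares peak usage against the request:

```python
def cpu_efficiency(total_cpu_seconds: float, alloc_cpus: int, elapsed_seconds: int) -> float:
    """CPU efficiency % (0-100): actual CPU time over reserved CPU time."""
    reserved = alloc_cpus * elapsed_seconds
    return 100.0 * total_cpu_seconds / reserved if reserved else 0.0

def memory_efficiency(max_rss_mb: float, req_mem_mb: float) -> float:
    """Memory efficiency % (0-100): peak RSS over requested memory."""
    return 100.0 * max_rss_mb / req_mem_mb if req_mem_mb else 0.0

# A job that reserved 4 CPUs for 1 hour but used only 1 CPU-hour of work:
print(cpu_efficiency(3600.0, 4, 3600))    # 25.0
# A job that requested 8 GB but peaked at 2 GB:
print(memory_efficiency(2048.0, 8192.0))  # 25.0
```

Under this reading, `cpu_hours_wasted` is simply the reserved CPU hours minus the CPU hours actually used.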
## Post-Processing with Polars
You can use Polars to analyze the collected data. Here's an example:
```python
from datetime import datetime, timedelta
from pathlib import Path

import polars as pl

# Load processed data for the last 7 days
dfs = []
for i in range(7):
    date = (datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d")
    file = Path(f"data/processed/{date}.parquet")
    if file.exists():
        dfs.append(pl.read_parquet(file))

if dfs:
    df = pl.concat(dfs)

    # Find users with the worst average CPU efficiency among completed jobs
    worst_users = (
        df.filter(pl.col("state") == "COMPLETED")
        .group_by("user")
        .agg(pl.col("cpu_efficiency").mean())
        .sort("cpu_efficiency")
        .head(5)
    )
    print("## Users with Worst CPU Efficiency")
    print(worst_users)

    # Find the partitions with the most wasted CPU hours
    waste_by_partition = (
        df.group_by("partition")
        .agg(pl.col("cpu_hours_wasted").sum())
        .sort("cpu_hours_wasted", descending=True)
    )
    print("\n## CPU Hours Wasted by Partition")
    print(waste_by_partition)
else:
    print("No data files found. Run `./slurm_usage.py collect` first.")
```