
Comparing Models

Basic Comparison

from aime_loc import LOC

loc = LOC()
comp = loc.compare("meta-llama/Llama-4-Scout", "deepseek-ai/DeepSeek-R1")

print(comp.summary())
# Llama-4-Scout wins by +2.1pp overall (8 improved, 5 degraded)

Understanding the Comparison

loc.compare() returns a ModelComparison object, which contains:

  • model_a, model_b — Full CognitiveProfile for each model
  • per_function_delta — List of FunctionDelta objects, one per function (13 entries)
  • overall_delta — Difference in overall TC, computed as model_b - model_a (positive means model_b scores higher)
  • winner — The model with the higher overall TC
  • improved_functions — Functions where model_b > model_a
  • degraded_functions — Functions where model_b < model_a
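The rules listed above can be made concrete with a small standalone sketch. It does not call aime_loc: FunctionDelta here is a hypothetical stand-in for the library class, compare_scores is a hypothetical helper, and the scores are made-up numbers rather than real results. How the library reports a tie is not documented, so returning None for the winner is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FunctionDelta:
    """Hypothetical stand-in for one per-function entry (not the library class)."""
    function: str  # the library entry appears to use an enum; a plain string here
    tc_a: float    # model_a score, in %
    tc_b: float    # model_b score, in %

    @property
    def delta(self) -> float:
        # Signed difference in percentage points; positive means model_b is ahead
        return self.tc_b - self.tc_a


def compare_scores(name_a, overall_a, name_b, overall_b, per_function):
    """Apply the documented comparison rules to raw scores (hypothetical helper).

    per_function maps a function name to a (tc_a, tc_b) pair.
    """
    deltas = [FunctionDelta(fn, a, b) for fn, (a, b) in per_function.items()]
    overall_delta = overall_b - overall_a  # model_b - model_a, as documented
    if overall_delta > 0:
        winner = name_b
    elif overall_delta < 0:
        winner = name_a
    else:
        winner = None  # assumption: tie handling is not documented
    return {
        "per_function_delta": deltas,
        "overall_delta": overall_delta,
        "winner": winner,
        "improved_functions": [d.function for d in deltas if d.delta > 0],
        "degraded_functions": [d.function for d in deltas if d.delta < 0],
    }


# Made-up illustrative scores, not real results
result = compare_scores("model-a", 60.0, "model-b", 58.0,
                        {"f1": (50.0, 55.0), "f2": (70.0, 65.0), "f3": (40.0, 40.0)})
print(result["winner"], result["overall_delta"])  # model-a -2.0
print(result["improved_functions"], result["degraded_functions"])  # ['f1'] ['f2']
```

Two consequences of these rules are worth noting: a function with zero delta (f3) lands in neither list, and the winner is decided by overall TC alone, so it can be model_a even when model_b improves on more individual functions.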

Visualizations

# Per-function delta bar chart (green=improved, red=degraded)
comp.delta_chart()

# Overlaid radar charts
comp.side_by_side_radar()
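When no plotting backend is available, the information in the delta chart can be roughly approximated as text. The helper below is hypothetical and not part of aime_loc; it takes (function_name, delta_pp) pairs and uses "+" and "-" bars in place of green and red, scaled so the largest absolute delta fills the full width.

```python
def text_delta_chart(deltas, width=20):
    """Build a rough text version of a per-function delta bar chart (hypothetical helper).

    deltas: iterable of (function_name, delta_pp) pairs.
    '+' bars mark improvements (model_b ahead), '-' bars mark degradations.
    """
    deltas = list(deltas)
    if not deltas:
        return []
    max_abs = max(abs(d) for _, d in deltas) or 1.0  # avoid dividing by zero
    label_w = max(len(name) for name, _ in deltas)
    lines = []
    for name, d in deltas:
        n = round(abs(d) / max_abs * width)  # bar length proportional to |delta|
        bar = ("+" if d > 0 else "-") * n
        lines.append(f"{name:<{label_w}} {d:+6.2f}pp {bar}")
    return lines


# Illustrative values only
for line in text_delta_chart([("f1", 5.0), ("f2", -2.5)]):
    print(line)
```

With the library, the input pairs could be built from the fields used in the per-function loop, e.g. [(d.function.value, d.delta) for d in comp.per_function_delta].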

Export Report

# Markdown report
comp.save_report("comparison.md")

# JSON export
comp.save_report("comparison.json", fmt="json")

# Plain Python dict, for custom processing
data = comp.to_dict()
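If you need a spreadsheet-friendly format in addition to Markdown and JSON, a CSV can be written with the standard library alone. This is a sketch rather than a library feature: deltas_to_csv is a hypothetical helper that takes (function, tc_a, tc_b) rows and adds a signed delta column in percentage points.

```python
import csv
import io


def deltas_to_csv(rows):
    """Serialize (function, tc_a, tc_b) rows to CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["function", "tc_a", "tc_b", "delta_pp"])
    for fn, tc_a, tc_b in rows:
        # delta follows the documented convention: model_b - model_a
        writer.writerow([fn, f"{tc_a:.2f}", f"{tc_b:.2f}", f"{tc_b - tc_a:+.2f}"])
    return buf.getvalue()


# Illustrative values only
print(deltas_to_csv([("f1", 50.0, 55.0), ("f2", 70.0, 65.0)]), end="")
```

With the library, the rows could come from the same attributes used in the per-function loop: [(d.function.value, d.tc_a, d.tc_b) for d in comp.per_function_delta].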

Per-Function Analysis

for delta in comp.per_function_delta:
    # ↑ model_b higher, ↓ model_b lower, = no change
    direction = "↑" if delta.delta > 0 else ("↓" if delta.delta < 0 else "=")
    print(f"  {delta.function.value}: {delta.tc_a:.2f}% → {delta.tc_b:.2f}% "
          f"({direction} {delta.delta:+.2f}pp)")
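To focus on the biggest changes rather than the full list, the deltas can be sorted by their signed value. The sketch below relies only on a numeric .delta attribute, as used in the loop above; top_changes is a hypothetical helper, and SimpleNamespace objects stand in for real FunctionDelta entries.

```python
from types import SimpleNamespace


def top_changes(deltas, k=3):
    """Return the k largest improvements and the k largest degradations (hypothetical helper)."""
    gains = sorted((d for d in deltas if d.delta > 0), key=lambda d: d.delta, reverse=True)[:k]
    losses = sorted((d for d in deltas if d.delta < 0), key=lambda d: d.delta)[:k]
    return gains, losses


# Stand-in objects for illustration; with the library you would pass comp.per_function_delta
sample = [SimpleNamespace(name="f1", delta=5.0), SimpleNamespace(name="f2", delta=-2.5),
          SimpleNamespace(name="f3", delta=1.0), SimpleNamespace(name="f4", delta=-4.0)]
gains, losses = top_changes(sample, k=2)
print([d.name for d in gains], [d.name for d in losses])  # ['f1', 'f3'] ['f4', 'f2']
```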