Cross-Substrate Paper Example
A complete workflow for writing a paper comparing human EEG and AI model cognitive profiles using the same LOC framework.
Research Question
Do human brains and large language models exhibit similar patterns of cognitive coherence when assessed using the same 13-function framework?
Full Analysis Script
"""cross_substrate_analysis.py — Human vs AI cognitive coherence."""
from pathlib import Path
import pandas as pd
from aime_loc import LOC
from aime_loc.eeg import EEG
from aime_loc.eeg.viz import cognitive_radar
# ── Setup ──────────────────────────────────────────────
loc = LOC()
eeg = EEG(loc)
# ── Part 1: Score Human EEG ───────────────────────────
print("=== Human EEG Scoring ===")
session = eeg.session()
for f in sorted(Path("eeg_data/").glob("sub-*/eeg/rest.set")):
subject = f.parent.parent.name
rec = eeg.load(f)
rec.preprocess()
epochs = rec.extract_epochs(duration=2.0)
session.add(epochs, subject=subject, task="rest")
human_results = eeg.score_session(session)
human_results.export_csv("results/human_profiles.csv")
print(f"Scored {human_results.n_profiles} human recordings")
for p in human_results.profiles:
print(f" {p.subject_id}: TC={p.tc_score:.2f}%")
# ── Part 2: Score AI Models ───────────────────────────
print("\n=== AI Model Scoring ===")
models = [
"meta-llama/Llama-4-Scout",
"meta-llama/Llama-3.3-70B-Instruct",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"Qwen/Qwen3.5-35B-A3B",
"google/gemma-3-12b-it",
]
llm_results = loc.benchmark(models, questions="78q")
for p in llm_results.profiles:
print(f" {p.model_id}: TC={p.tc_score:.2f}%")
# ── Part 3: Cross-Substrate Comparison ────────────────
print("\n=== Cross-Substrate Analysis ===")
# Mean TC
human_tcs = [p.tc_score for p in human_results.profiles]
llm_tcs = [p.tc_score for p in llm_results.profiles]
human_mean = sum(human_tcs) / len(human_tcs)
llm_mean = sum(llm_tcs) / len(llm_tcs)
print(f"Human mean TC: {human_mean:.2f}% (n={len(human_tcs)})")
print(f"LLM mean TC: {llm_mean:.2f}% (n={len(llm_tcs)})")
print(f"Difference: {human_mean - llm_mean:+.2f}pp")
# Per-function comparison
from aime_loc.models.common import FUNCTION_ORDER
human_by_func = {f: 0.0 for f in FUNCTION_ORDER}
llm_by_func = {f: 0.0 for f in FUNCTION_ORDER}
for p in human_results.profiles:
for f, s in p.tc_by_function().items():
human_by_func[f] += s / len(human_results.profiles)
for p in llm_results.profiles:
for f, s in p.tc_by_function().items():
llm_by_func[f] += s / len(llm_results.profiles)
print(f"\n{'Function':<16} {'Human':>8} {'LLM':>8} {'Delta':>8}")
print("-" * 44)
for func in FUNCTION_ORDER:
h = human_by_func[func]
l = llm_by_func[func]
d = h - l
print(f"{func:<16} {h:>7.2f}% {l:>7.2f}% {d:>+7.2f}%")
# ── Part 4: Figures ───────────────────────────────────
fig_dir = Path("figures")
fig_dir.mkdir(exist_ok=True)
# Figure 1: Best human vs best LLM
best_human = max(human_results.profiles, key=lambda p: p.tc_score)
best_llm = max(llm_results.profiles, key=lambda p: p.tc_score)
cognitive_radar(
[best_human, best_llm],
title=f"Human ({best_human.subject_id}) vs AI ({best_llm.model_id.split('/')[-1]})",
show=False,
save=str(fig_dir / "fig1_best_vs_best.pdf"),
journal="nature",
dpi=600,
)
# Figure 2: All human profiles overlaid
cognitive_radar(
human_results.profiles,
title="Human EEG Cognitive Profiles (Resting State)",
show=False,
save=str(fig_dir / "fig2_all_humans.pdf"),
journal="nature",
dpi=600,
)
# ── Part 5: Export ────────────────────────────────────
# JSON for supplementary materials
for p in human_results.profiles:
p.to_json(f"results/{p.subject_id}_rest.json")
for p in llm_results.profiles:
name = p.model_id.replace("/", "_")
p.to_json(f"results/{name}.json")
# LaTeX tables
with open("results/table1_human.tex", "w") as f:
f.write(best_human.to_latex())
with open("results/table2_llm.tex", "w") as f:
f.write(best_llm.to_latex())
print("\nAnalysis complete!")
print(f"Figures: {fig_dir}/")
print("Results: results/")
Paper Outline
Abstract
We present the first direct comparison of cognitive coherence between human brains and large language models using the AIME LOC framework. Using the same 13-function cognitive model applied to both EEG frequency power and transformer layer activations, we found that [findings].
Introduction
- Cognitive coherence as a measurable property of information processing systems
- The LOC framework: 13 cognitive functions and True Coherence metric
- Cross-substrate application: same framework applied to fundamentally different substrates
Methods
Human Participants
N subjects underwent resting-state EEG recording. Data were collected using [device] at [sfreq] Hz with [n_channels] channels.
EEG Processing
EEG data were preprocessed using AIME LOC v0.2.0 (0.5–45 Hz bandpass, 50 Hz notch filter, average reference). Power spectral density (PSD) was computed using Welch's method (2-second epochs, 256-sample windows, 50% overlap). PSD arrays were submitted to the AIME API for server-side True Coherence scoring.
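The Welch parameters above can be reproduced with SciPy for readers who want to inspect the PSD arrays before submission; the sampling rate, channel count, and random signal here are placeholder assumptions (the actual preprocessing happens inside AIME LOC):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
sfreq = 256                                   # assumed sampling rate (Hz)
epoch = rng.standard_normal((32, 2 * sfreq))  # 32 channels, one 2-s epoch

# Welch PSD: 256-sample windows, 50% overlap — matching the Methods text
freqs, psd = welch(epoch, fs=sfreq, nperseg=256, noverlap=128)

# Restrict to the 0.5–45 Hz band used by the pipeline
band = (freqs >= 0.5) & (freqs <= 45)
print(psd[:, band].shape)  # channels x in-band frequency bins
```

With `nperseg=256` at 256 Hz, frequency resolution is 1 Hz, so the in-band PSD covers the bins from 1 to 45 Hz.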
AI Models
Five large language models were evaluated using the 78-question LOC evaluation set via the AIME API. Models ranged from 12B to 70B parameters.
Cross-Substrate Comparison
Human and AI cognitive profiles were compared on the same 13-function framework. Per-function TC scores were extracted and compared using independent-samples t-tests with Bonferroni correction.
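The statistical comparison described above can be sketched with SciPy; the group sizes and score distributions below are synthetic placeholders for the real per-subject and per-model TC values:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_functions = 13
alpha = 0.05 / n_functions  # Bonferroni-corrected significance threshold

# Synthetic per-function TC samples: rows = functions, columns = observations
human = rng.normal(65, 8, size=(n_functions, 20))  # 20 hypothetical subjects
llm = rng.normal(60, 8, size=(n_functions, 5))     # 5 models

for i in range(n_functions):
    # Independent-samples t-test per function, flagged at the corrected alpha
    t, p = ttest_ind(human[i], llm[i])
    flag = "*" if p < alpha else " "
    print(f"function {i:2d}: t={t:+.2f} p={p:.4f}{flag}")
```

Dividing alpha by the 13 tests keeps the family-wise error rate at 0.05 across the per-function comparisons.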
Results
- Table 1: Human TC scores by subject
- Table 2: AI model TC scores
- Table 3: Per-function cross-substrate comparison
- Figure 1: Human vs AI radar overlay
- Figure 2: All human profiles
- Figure 3: Per-function delta chart
Discussion
Key themes to address:
- Overall TC differences — Human brains vs LLMs
- Function-specific patterns — Which functions show convergence/divergence
- Diagnostic analysis — Which aspect of coherence differs most between substrates
- Implications — What cross-substrate similarity/difference means for consciousness research