Workshop
A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents
Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli
ICLR Workshop World Models, 2026
We propose a framework for evaluating goal-directedness in LLM agents, integrating behavioural evaluation with interpretability analyses of internal representations.
