Andrew M. Bean
Measuring what Matters: Construct Validity in Large Language Model Benchmarks