Andrew M. Bean
Measuring what Matters: Construct Validity in Large Language Model Benchmarks