AI-101

XpertBench: New Benchmark Reveals 'Expert Gap' in LLMs Across Professional Domains

Source: arXiv cs.AI
Published: 1mo ago

AI-generated

TLDR

Researchers have released XpertBench, a comprehensive benchmark with 1,346 tasks across 80 categories spanning finance, healthcare, legal services, education, and research.

Results reveal a significant 'expert gap' in current AI systems: even leading models reach peak success rates of only around 66%, and mean scores hover near 55%.

Key Takeaways

  • A new benchmark of 1,346 expert-curated tasks shows that leading LLMs reach peak success rates of only around 66%, with mean scores near 55%, on professional-level work across finance, healthcare, legal services, education, and research