XpertBench: New Benchmark Reveals 'Expert Gap' in LLMs Across Professional Domains
Source: arXiv cs.AI
Published: 1mo ago
AI-generated
TLDR
Researchers have released XpertBench, a comprehensive benchmark with 1,346 tasks across 80 categories spanning finance, healthcare, legal services, education, and research.
Results reveal a significant 'expert gap' in current AI systems: even leading models reach a peak success rate of only about 66%, and mean scores hover near 55%.
Key Takeaways
- A new benchmark with 1,346 expert-curated tasks shows that leading LLMs reach a peak success rate of only about 66%, with mean scores near 55%, on professional-level work in domains such as finance, healthcare, and legal services