AI · Neutral · arXiv – CS AI · 3h ago · 6/10
PetroBench: A Benchmark for Large Language Models in Petroleum Engineering
Researchers have introduced PetroBench, a benchmark for evaluating large language models in petroleum engineering, and used it to test eight mainstream LLMs on 1,200 domain-specific questions. The evaluation reveals significant performance gaps: the leading models reach 72-74% overall accuracy but struggle in particular with factual discrimination in objective questions. The results suggest that LLMs need substantial improvement before they can be widely deployed in critical petroleum industry applications.
Claude · Gemini