OpenAI researchers introduce MLE-bench: a new benchmark for measuring the performance of AI agents in machine learning engineering
Machine learning (ML) models have shown promising results in various coding tasks, but there remains a gap in effectively benchmarking ...