ToolHop: a new dataset designed to evaluate LLMs in multi-hop tool usage scenarios

Multi-hop queries have always been difficult for LLM agents to solve, as they require multiple steps of reasoning and information ...

This Amazon Machine Learning Research Presents a New Open Source High-Fidelity Dataset for Automotive Aerodynamics

by Technical Terrence Team

12/26/2024

One of the most critical challenges in computational fluid dynamics (CFD) and machine learning (ML) is that high-resolution 3D data ...

FineWeb-C: A community-created dataset to improve language models in ALL languages

by Technical Terrence Team

12/25/2024

FineWeb2 significantly advances multilingual pre-training datasets, covering over 1000 languages with high-quality data. The dataset uses approximately 8 terabytes of ...

Hugging Face Launches FineMath – Latest Open Math Pre-Training Dataset with Over 50 Billion Tokens

by Technical Terrence Team

12/20/2024

For educational research, access to high-quality educational resources is essential for students and educators. Mathematics, often perceived as one of ...

An introduction to preparing your own dataset for LLM training

by Technical Terrence Team

12/19/2024

... pdfplumber, pypdf, and pdfminer to help with the extraction of text and tabular data from the PDF. The ...

CloudFerro and ESA Φ-lab release first global embeddings dataset for Earth observations

by Technical Terrence Team

12/14/2024

CloudFerro and the European Space Agency's (ESA) Φ-lab have presented the first global embeddings dataset for Earth observations, a significant ...

SmolTalk Released: The Dataset Recipe Behind SmolLM2's Best-in-Class Performance

by Technical Terrence Team

11/21/2024

Recent advances in natural language processing (NLP) have introduced new models and training datasets aimed at addressing the increasing ...

Angler: Helping Machine Translation Professionals Prioritize Model Improvements

Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

by Technical Terrence Team

11/21/2024

Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are ...

This Machine Learning Paper Transforms the Efficiency of Embodied AI: New Scaling Laws to Optimize Model and Dataset Ratios in World Modeling and Behavior Cloning Tasks

by Technical Terrence Team

11/14/2024

Embodied artificial intelligence (AI) involves the creation of agents that operate within physical or simulated environments, autonomously executing tasks based ...

CHESTNUT: A QoS Dataset for Mobile Edge Environments

by Technical Terrence Team

11/01/2024

Quality of Service (QoS) is a key metric used to evaluate the performance of network services in mobile edge ...

Tag: Dataset