For a moment, imagine an airplane. What comes to mind? Now picture a Boeing 737 and a V-22 Osprey. Both are aircraft designed to move cargo and people, but they serve different purposes: one is general purpose (commercial flights and freight), the other highly specialized (infiltration, exfiltration, and resupply missions for special operations forces). They look very different because they are built for different activities.
With the rise of LLMs, we have seen our first truly general-purpose ML models. Their generality helps us in many ways:
- The same engineering team can now perform sentiment analysis and structured data mining.
- Professionals from many fields can share knowledge, making it possible for the entire industry to benefit from each other's experience.
- The same experience is useful across a wide range of industries and jobs.
But as we see with airplanes, being generally capable is a very different thing from excelling at a particular task, and at the end of the day, business value often comes from solving particular problems.
This is a good analogy for the difference between model evaluations and task evaluations. Model evaluations measure general capability across many tasks, while task evaluations measure performance on one particular task.
The term “LLM evals” is used quite broadly. OpenAI, for example, released tooling for LLM evals very early on. Most practitioners care more about LLM task evaluations, but that distinction is not always made clearly.
What is the difference?
Model evaluations look at the “overall fitness” of the model: how well does it perform across a variety of tasks?
Task evaluations, on the other hand, are designed specifically to measure how well the model fits your particular application.
Someone who works out regularly and is generally fit would probably fare poorly against a professional sumo wrestler in an actual match; likewise, model evaluations are no substitute for task evaluations when judging your particular needs.
Model evaluations are built for creating and fine-tuning generalized models. They consist of a set of questions posed to a model and a set of ground-truth answers used to grade its responses. Think of the SATs.
While each question in a model evaluation differs, there is usually a common area being tested: a topic or skill that each metric specifically targets. For example, performance on HellaSwag has become a popular way to measure LLM quality.
The HellaSwag dataset consists of a collection of contexts and multiple-choice questions, where each question has several candidate endings. Only one of the endings is sensible or logically coherent, while the others are plausible but incorrect. These completions are designed to be a challenge for AI models, requiring not only linguistic understanding but also common-sense reasoning to choose the correct option.
Here is an example:
A tray of potatoes is loaded into the oven and removed. A large cake tray is turned over and placed on the counter. a large tray of meat
A. is placed on a baked potato
B. ls, and the pickles are placed in the oven.
C. is prepared and then an assistant takes it out of the oven when it is ready.
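If you want to poke at these examples yourself, a minimal sketch along these lines pulls the dataset down with the Hugging Face `datasets` library. The dataset id and field names (`ctx`, `endings`, `label`) are assumptions based on how the dataset is currently hosted on the hub, so treat them as such.

```python
# Minimal sketch: browsing HellaSwag locally. Assumes the "hellaswag"
# dataset id and its ctx/endings/label fields as hosted on the HF hub.
from datasets import load_dataset

ds = load_dataset("hellaswag", split="validation")

example = ds[0]
print(example["ctx"])  # the context to be completed
for i, ending in enumerate(example["endings"]):
    print(f"{chr(65 + i)}. {ending}")  # candidate endings A, B, C, ...
print("gold label index:", example["label"])  # index of the coherent ending
```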
Another example is MMLU. MMLU features tasks that span multiple subjects, including science, literature, history, social sciences, mathematics, and professional domains such as law and medicine. This diversity of topics is intended to mimic the breadth of knowledge and understanding that human learners require, making it a good test of a model's ability to handle multifaceted language comprehension challenges.
Below are some examples – can you solve them?
For which of the following thermodynamic processes does the increase in internal energy of an ideal gas equal the heat added to the gas?
A. Constant temperature
B. Constant volume
C. Constant pressure
D. Adiabatic
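(For the curious: at constant volume the gas does no work, so all the heat added goes into internal energy, making B the answer.) However a model arrives at its choice, scoring an MMLU-style benchmark is straightforward. The sketch below is a hedged illustration with invented answers; real harnesses extract the model's chosen letter per question and report plain accuracy.

```python
# Sketch: scoring a multiple-choice benchmark. Both lists are invented
# for illustration; accuracy against the gold letters is the usual metric.
gold = ["B", "A", "D", "C"]           # reference answers
model_answers = ["B", "A", "C", "C"]  # hypothetical model outputs

accuracy = sum(g == m for g, m in zip(gold, model_answers)) / len(gold)
print(f"accuracy: {accuracy:.0%}")  # 75%
```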
The Hugging Face Open LLM Leaderboard is perhaps the best-known place to find these kinds of model evaluations. It tracks open-source large language models across many model evaluation metrics, and it is often a great starting point for understanding how open-source LLMs differ in performance across a variety of tasks.
Multimodal models require even more evaluations. The Gemini paper demonstrates that multimodality introduces a number of additional benchmarks, such as VQAv2, which tests the ability to understand and integrate visual information. This goes beyond simple object recognition to interpreting actions and the relationships between objects.
Similarly, there are metrics for audio and video information and how to integrate them across modalities.
The goal of these tests is to differentiate between two models or between two snapshots of the same model. Choosing a model for your application matters, but it is something you do once or, at most, very infrequently.
The more common, day-to-day problem is solved with task evaluations. The goal of a task-based evaluation is to analyze the performance of the model on your particular task, often using an LLM as a judge:
- Did your retrieval system fetch the right data?
- Are there hallucinations in your answers?
- Did the system answer important questions with relevant answers?
Some may feel a little uneasy about an LLM evaluating other LLMs, but we have humans evaluating other humans all the time.
The real distinction between model and task evaluations is that for a model evaluation we ask many different questions, but for a task evaluation the question stays the same and it is the data that changes. For example, say you operate a chatbot. You could run your task evaluation over hundreds of customer interactions and ask: “Is there a hallucination here?” The question remains the same across all conversations.
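A minimal sketch of that pattern, with `judge` as a hypothetical stand-in for the call to your evaluation LLM:

```python
# The evaluation question is fixed; only the data changes.
EVAL_QUESTION = "Is there a hallucination here?"

def judge(question: str, conversation: str) -> str:
    """Hypothetical LLM-as-a-judge call; returns a label per conversation."""
    ...  # format a prompt from the question + conversation, call the eval LLM
    return "factual"

conversations: list[str] = []  # load hundreds of logged customer interactions
labels = [judge(EVAL_QUESTION, conv) for conv in conversations]
```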
There are several libraries designed to help practitioners build these evaluations: Ragas, Phoenix (full disclosure: the author leads the team that developed Phoenix), OpenAI evals, and LlamaIndex.
How do they work?
A task evaluation grades the performance of every output of the application as a whole. Let's look at what it takes to put one together.
Establishing a benchmark
The foundation is a solid benchmark. That starts with creating a golden dataset that accurately reflects the scenarios the LLM will encounter. This dataset should include ground-truth labels (often derived from meticulous human review) that serve as the standard of comparison. Don't worry, though: you can usually get away with dozens to hundreds of examples here. Selecting the right evaluation LLM is also essential. It may differ from your application's core LLM, but it should align with your cost and accuracy goals.
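To make that concrete, here is a sketch of what a golden dataset can look like as a table; the column names are illustrative, not a required schema.

```python
# Illustrative golden dataset: inputs, reference text, model outputs,
# and human-reviewed ground-truth labels. Dozens to hundreds of rows
# like this are usually enough.
import pandas as pd

golden_df = pd.DataFrame({
    "input": ["What is the capital of France?"],
    "reference": ["Paris is the capital and largest city of France."],
    "output": ["The capital of France is Paris."],
    "ground_truth": ["correct"],  # from careful human review
})
```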
Preparing the evaluation template
The heart of the task evaluation process is the evaluation template. It should clearly define the input (for example, user queries and documents), the evaluation question (for example, the relevance of the document to the query), and the expected output format (binary or multiclass relevance). You may need to adjust the template to capture nuances specific to your application, so that it accurately measures the LLM's performance against the benchmark dataset.
Below is an example of a template for evaluating a question-and-answer task.
You are given a question, an answer and reference text. You must determine whether the given answer correctly answers the question based on the reference text. Here is the data:
(BEGIN DATA)
************
(QUESTION): {input}
************
(REFERENCE): {reference}
************
(ANSWER): {output}
(END DATA)
Your response should be a single word, either "correct" or "incorrect", and should not contain any text or characters aside from that word.
"correct" means that the question is correctly and fully answered by the answer.
"incorrect" means that the question is not correctly or only partially answered by the answer.
Metrics and iteration
Running the evaluation against your golden dataset lets you generate key metrics such as accuracy, precision, recall, and F1 score. These provide insight into how effective the evaluation template is and highlight areas for improvement. Iteration is crucial; refining the template based on these metrics keeps the evaluation process aligned with the application's goals without overfitting to the golden dataset.
In task evaluations, relying solely on overall accuracy is insufficient, because significant class imbalance is the norm. Precision and recall offer a more robust view of LLM performance, emphasizing the importance of correctly identifying both relevant and irrelevant results. A balanced set of metrics ensures that evaluations meaningfully improve the LLM application.
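As a sketch of that computation, assuming the judge's labels and the human ground truth are simple string lists, scikit-learn covers all four metrics; here the rare "incorrect" class is treated as the positive label.

```python
# Sketch: judge labels vs. human ground truth. The lists are invented;
# with class imbalance, precision/recall on the rare class tell you more
# than overall accuracy does.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

truth = ["correct", "correct", "incorrect", "correct", "incorrect"]
judged = ["correct", "incorrect", "incorrect", "correct", "correct"]

print("accuracy :", accuracy_score(truth, judged))
print("precision:", precision_score(truth, judged, pos_label="incorrect"))
print("recall   :", recall_score(truth, judged, pos_label="incorrect"))
print("f1       :", f1_score(truth, judged, pos_label="incorrect"))
```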
Applying LLM evaluations
Once an assessment framework is in place, the next step is to apply these assessments directly to your LLM application. This involves integrating the evaluation process into the application workflow, allowing real-time evaluation of the LLM's responses to user input. This continuous feedback loop is invaluable in maintaining and improving the relevance and accuracy of the application over time.
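What that integration looks like varies by stack; the sketch below shows one hypothetical shape, with every helper stubbed out rather than tied to a real library.

```python
# Hypothetical sketch: every response the application produces is judged
# before it is returned, feeding the continuous feedback loop.

def generate_answer(query: str, reference: str) -> str:
    """Your application LLM call (stub)."""
    ...

def evaluate_qa(row: dict) -> str:
    """Your LLM-as-a-judge call (stub; see the earlier sketch)."""
    ...

def log_for_review(query: str, answer: str) -> None:
    """Route flagged responses to your observability or review queue (stub)."""
    ...

def handle_query(query: str, reference: str) -> str:
    answer = generate_answer(query, reference)
    verdict = evaluate_qa({"input": query, "reference": reference, "output": answer})
    if verdict == "incorrect":
        log_for_review(query, answer)
    return answer
```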
Evaluation throughout the system life cycle
Effective task evaluations are not confined to a single stage; they are an integral part of the entire life cycle of the LLM system. From benchmarking and pre-production testing to ongoing performance monitoring in production, LLM evaluation ensures that the system keeps meeting user needs.
Example: is the model hallucinating?
Let's look at an example of hallucination in more detail.
Since hallucinations are a common problem for most practitioners, there are some benchmark datasets available. These are a great first step, but you will often need a custom dataset specific to your company.
The next important step is to develop the prompt template. Again, a good library can help you get started. We saw an example prompt template earlier; here is another one, specifically for hallucinations. You may need to tweak it for your purposes.
In this task, you will be presented with a query, a reference text and an answer. The answer is
generated to the question based on the reference text. The answer may contain false information. You
must use the reference text to determine if the answer to the question contains false information,
if the answer is a hallucination of facts. Your objective is to determine whether the answer text
contains factual information and is not a hallucination. A 'hallucination' in this context refers to
an answer that is not based on the reference text or assumes information that is not available in
the reference text. Your response should be a single word: either "factual" or "hallucinated", and
it should not include any other text or characters. "hallucinated" indicates that the answer
provides factually inaccurate information to the query based on the reference text. "factual"
indicates that the answer to the question is correct relative to the reference text, and does not
contain made up information. Please read the query and reference text carefully before determining
your response.
(BEGIN DATA)
************
(Query): {input}
************
(Reference text): {reference}
************
(Answer): {output}
************
(END DATA)
Is the answer above factual or hallucinated based on the query and reference text?
Your response should be a single word: either "factual" or "hallucinated", and it should not include any other text or characters.
"hallucinated" indicates that the answer provides factually inaccurate information to the query based on the reference text.
"factual" indicates that the answer to the question is correct relative to the reference text, and does not contain made up information.
Please read the query and reference text carefully before determining your response.
Now you are ready to feed the queries from your golden dataset to your evaluation LLM and have it label hallucinations. When you look at the results, remember that you should expect class imbalance: track precision and recall, not overall accuracy.
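A hedged sketch of that labeling pass, reusing the golden-dataset shape from earlier; `ask_judge` is a hypothetical wrapper around your evaluation LLM, and the template string abbreviates the full prompt above.

```python
# Sketch: label every row of the golden dataset with the hallucination
# template. One invented row is shown; a real run uses your full dataset.
import pandas as pd

HALLUCINATION_TEMPLATE = "In this task, you will be presented with a query, ..."  # full text above

def ask_judge(prompt: str) -> str:
    """Hypothetical evaluation-LLM call; returns "factual" or "hallucinated"."""
    ...  # e.g. an OpenAI chat completion call, as sketched earlier
    return "factual"

golden_df = pd.DataFrame({
    "input": ["When was the Eiffel Tower built?"],
    "reference": ["The Eiffel Tower was constructed from 1887 to 1889."],
    "output": ["It was built in 1850."],
    "ground_truth": ["hallucinated"],  # from human review
})

golden_df["eval_label"] = [
    ask_judge(HALLUCINATION_TEMPLATE.format(
        input=row["input"], reference=row["reference"], output=row["output"]))
    for _, row in golden_df.iterrows()
]
```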
It is very useful to build a confusion matrix and plot it visually; a sketch follows below. With such a plot in hand, you can feel confident about your LLM's performance, and if that performance is not to your liking, you can always iterate on the prompt template.
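For the plotting itself, a minimal sketch with scikit-learn and matplotlib, using invented labels:

```python
# Sketch: confusion matrix of judge labels vs. human ground truth.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

truth = ["factual", "hallucinated", "factual", "hallucinated", "factual"]
judged = ["factual", "hallucinated", "hallucinated", "hallucinated", "factual"]

ConfusionMatrixDisplay.from_predictions(truth, judged,
                                        labels=["factual", "hallucinated"])
plt.show()
```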
With the evaluation built, you now have a powerful tool that can label all of your data with known precision and recall. You can use it to track hallucinations in your system during both development and production.
Let's summarize the differences between task and model evaluations.
Ultimately, both model evaluations and task evaluations matter in building a working LLM system, and it is important to understand when and how to apply each. Most practitioners spend the majority of their time on task evaluations, which measure the system's performance on a specific task.