Euler AI Benchmark
Euler AI Benchmark supports a variety of AI models across different domains, predominantly large language models, allowing users to compare AI performance in various tasks and contexts.
There are 20 LLMs in total, with roughly 1.288 trillion parameters combined. Parameter counts are not listed for some models, such as DeepSeek-R1, DeepSeek-V3, and Phi-4, so the actual total is likely higher. (A tally sketch follows the model list below.)
DeepSeek Models
DeepSeek-R1
DeepSeek-R1-Distill-Llama-70B
DeepSeek-V3
DeepSeek-R1-Distill-Qwen-32B
Meta Llama Models
Llama-3.3-70B-Instruct-Turbo
Llama-3.3-70B-Instruct
Meta-Llama-3.1-70B-Instruct
Meta-Llama-3.1-8B-Instruct
Meta-Llama-3.1-405B-Instruct
Meta-Llama-3.1-8B-Instruct-Turbo
Meta-Llama-3.1-70B-Instruct-Turbo
Llama-3.2-90B-Vision-Instruct
Llama-3.2-11B-Vision-Instruct
Mistral Models
Mistral-Small-24B-Instruct-2501
Microsoft Models
Phi-4
WizardLM-2-8x22B
Qwen Models
QwQ-32B-Preview
Qwen2.5-Coder-32B-Instruct
Qwen2.5-72B-Instruct
NVIDIA Model
Llama-3.1-Nemotron-70B-Instruct
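The headline total can be reproduced approximately by parsing the size suffix embedded in each model name (e.g. 70B, 8x22B). Below is a minimal Python sketch, assuming the names above are the only available metadata; models without a size in the name (DeepSeek-R1, DeepSeek-V3, Phi-4) are skipped, which is why the quoted total is a lower bound. Note that this naive count treats 8x22B as 176B, so it lands near, but not exactly at, the ~1.288T figure; the exact number depends on how mixture-of-experts weights are counted.

```python
import re

# Model names as listed above; DeepSeek-R1, DeepSeek-V3, and Phi-4 carry
# no size suffix and are therefore not counted (hence "likely higher").
MODELS = [
    "DeepSeek-R1-Distill-Llama-70B",
    "DeepSeek-R1-Distill-Qwen-32B",
    "Llama-3.3-70B-Instruct-Turbo",
    "Llama-3.3-70B-Instruct",
    "Meta-Llama-3.1-70B-Instruct",
    "Meta-Llama-3.1-8B-Instruct",
    "Meta-Llama-3.1-405B-Instruct",
    "Meta-Llama-3.1-8B-Instruct-Turbo",
    "Meta-Llama-3.1-70B-Instruct-Turbo",
    "Llama-3.2-90B-Vision-Instruct",
    "Llama-3.2-11B-Vision-Instruct",
    "Mistral-Small-24B-Instruct-2501",
    "WizardLM-2-8x22B",
    "QwQ-32B-Preview",
    "Qwen2.5-Coder-32B-Instruct",
    "Qwen2.5-72B-Instruct",
    "Llama-3.1-Nemotron-70B-Instruct",
]

def params_from_name(name: str) -> float:
    """Return billions of parameters parsed from a '70B' or '8x22B' suffix."""
    match = re.search(r"(?:(\d+)x)?(\d+)B", name)
    if not match:
        return 0.0
    experts = int(match.group(1)) if match.group(1) else 1
    return experts * int(match.group(2))

total_b = sum(params_from_name(m) for m in MODELS)
print(f"Counted parameters: {total_b:.0f}B (~{total_b / 1000:.2f}T)")
```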
Simply follow the steps below 👇
1. Go to
2. Click Connect Wallet and follow the steps to log in.
3. You will see a concise interface where you can input and test any prompt against the large language models we have integrated.
4. At the bottom of the conversation box, you will see four feedback buttons for recording your evaluation of each model.
5. Once a vote (human evaluation) is made, you will see which model won the round.
6. Click New Round to continue, or go to the leaderboard to see which LLM performs best; a sketch of how pairwise votes can feed such a leaderboard follows below.
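The documentation does not specify how votes are aggregated, but a common way to turn pairwise human votes into a leaderboard is an Elo-style rating update. Here is a minimal Python sketch under that assumption; the K-factor, starting rating, and example model pairings are illustrative, not Euler AI Benchmark's actual parameters:

```python
from collections import defaultdict

K = 32          # illustrative K-factor; not a documented platform value
START = 1000.0  # illustrative starting rating

ratings = defaultdict(lambda: START)

def record_vote(winner: str, loser: str) -> None:
    """Update Elo-style ratings after one human vote between two models."""
    # Probability the winner was expected to win, given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1.0 - expected_win)
    ratings[loser] -= K * (1.0 - expected_win)

# Example: three rounds of votes between models from the list above.
record_vote("DeepSeek-R1", "Phi-4")
record_vote("Qwen2.5-72B-Instruct", "DeepSeek-R1")
record_vote("DeepSeek-R1", "Qwen2.5-72B-Instruct")

# Sort by rating to produce a leaderboard.
for model, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:30s} {rating:7.1f}")
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected result, which is why this scheme converges toward a stable ranking as more human votes accumulate.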
Each time you contribute to the Euler AI Benchmark, it becomes more refined through your human interaction, which will be documented and rewarded by the ecosystem.