Euler AI Benchmark
Euler AI Benchmark supports a variety of AI models across different domains, predominantly large language models, allowing users to compare AI performance in various tasks and contexts.
There are 20 LLMs in total, with roughly 1.288 trillion parameters combined. Parameter counts are not listed for some models, such as DeepSeek-R1, DeepSeek-V3, and Phi-4, so the actual total is likely higher. (A tally sketch follows the model list below.)
DeepSeek Models
DeepSeek-R1
DeepSeek-R1-Distill-Llama-70B
DeepSeek-V3
DeepSeek-R1-Distill-Qwen-32B
Meta Llama Models
Llama-3.3-70B-Instruct-Turbo
Llama-3.3-70B-Instruct
Meta-Llama-3.1-70B-Instruct
Meta-Llama-3.1-8B-Instruct
Meta-Llama-3.1-405B-Instruct
Meta-Llama-3.1-8B-Instruct-Turbo
Meta-Llama-3.1-70B-Instruct-Turbo
Llama-3.2-90B-Vision-Instruct
Llama-3.2-11B-Vision-Instruct
Mistral Models
Mistral-Small-24B-Instruct-2501
Microsoft Models
Phi-4
WizardLM-2-8x22B
Qwen Models
QwQ-32B-Preview
Qwen2.5-Coder-32B-Instruct
Qwen2.5-72B-Instruct
NVIDIA Model
Llama-3.1-Nemotron-70B-Instruct
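The headline total can be reproduced approximately by parsing the size suffix embedded in each model name (e.g. 70B, 8x22B). Below is a minimal Python sketch, assuming the names above are the only available metadata; models without a size in the name (DeepSeek-R1, DeepSeek-V3, Phi-4) are skipped, which is why the quoted total is a lower bound. Note that this naive count treats 8x22B as 176B, so it lands near, but not exactly at, the ~1.288T figure; the exact number depends on how mixture-of-experts weights are counted.

```python
import re

# Model names as listed above; DeepSeek-R1, DeepSeek-V3, and Phi-4 carry
# no size suffix and are therefore not counted (hence "likely higher").
MODELS = [
    "DeepSeek-R1-Distill-Llama-70B",
    "DeepSeek-R1-Distill-Qwen-32B",
    "Llama-3.3-70B-Instruct-Turbo",
    "Llama-3.3-70B-Instruct",
    "Meta-Llama-3.1-70B-Instruct",
    "Meta-Llama-3.1-8B-Instruct",
    "Meta-Llama-3.1-405B-Instruct",
    "Meta-Llama-3.1-8B-Instruct-Turbo",
    "Meta-Llama-3.1-70B-Instruct-Turbo",
    "Llama-3.2-90B-Vision-Instruct",
    "Llama-3.2-11B-Vision-Instruct",
    "Mistral-Small-24B-Instruct-2501",
    "WizardLM-2-8x22B",
    "QwQ-32B-Preview",
    "Qwen2.5-Coder-32B-Instruct",
    "Qwen2.5-72B-Instruct",
    "Llama-3.1-Nemotron-70B-Instruct",
]

def params_from_name(name: str) -> float:
    """Return billions of parameters parsed from a '70B' or '8x22B' suffix."""
    match = re.search(r"(?:(\d+)x)?(\d+)B", name)
    if not match:
        return 0.0
    experts = int(match.group(1)) if match.group(1) else 1
    return experts * int(match.group(2))

total_b = sum(params_from_name(m) for m in MODELS)
print(f"Counted parameters: {total_b:.0f}B (~{total_b / 1000:.2f}T)")
```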
Simply follow the steps below 👇
1. Go to
2. Click Connect Wallet and follow the steps to log in.
3. You will see a concise interface where you can input and test any prompt against the large language models we have integrated.
4. At the bottom of the conversation box, you will see four feedback buttons for recording your evaluation of each model.
5. Once a vote (human evaluation) is made, you will see which model won the round.
6. Click New Round to continue, or go to the leaderboard to see which LLM performs best; a sketch of how pairwise votes can feed such a leaderboard follows below.
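The documentation does not specify how votes are aggregated, but a common way to turn pairwise human votes into a leaderboard is an Elo-style rating update. Here is a minimal Python sketch under that assumption; the K-factor, starting rating, and example model pairings are illustrative, not Euler AI Benchmark's actual parameters:

```python
from collections import defaultdict

K = 32          # illustrative K-factor; not a documented platform value
START = 1000.0  # illustrative starting rating

ratings = defaultdict(lambda: START)

def record_vote(winner: str, loser: str) -> None:
    """Update Elo-style ratings after one human vote between two models."""
    # Probability the winner was expected to win, given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1.0 - expected_win)
    ratings[loser] -= K * (1.0 - expected_win)

# Example: three rounds of votes between models from the list above.
record_vote("DeepSeek-R1", "Phi-4")
record_vote("Qwen2.5-72B-Instruct", "DeepSeek-R1")
record_vote("DeepSeek-R1", "Qwen2.5-72B-Instruct")

# Sort by rating to produce a leaderboard.
for model, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:30s} {rating:7.1f}")
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected result, which is why this scheme converges toward a stable ranking as more human votes accumulate.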
Each time you contribute to the Euler AI Benchmark, it becomes more refined through your human interaction, which will be documented and rewarded by the ecosystem.