Mike Notes
Gary Marcus mentioned this project in his newsletter today.
Resources
- https://www.wolfram.com/llm-benchmarking-project/
- https://www.wolfram.com/language/elementary-introduction/3rd-ed/
- https://datarepository.wolframcloud.com/resources/LLMBenchmarks-Data/
Wolfram LLM Benchmarking Project
Using Wolfram Language to benchmark the performance of major LLMs.
As major users and analyzers of large language model (LLM) technology, we've been continually tracking the performance of LLMs. This project involves releasing our ongoing results, initially for a specific well-characterized code generation task.
The task consists of going from English-language specifications to Wolfram Language code. The test cases are exercises from Stephen Wolfram's An Elementary Introduction to the Wolfram Language. These exercises have been done online by millions of humans, and we've developed effective tools for determining functional correctness of code, which we're now applying to LLMs.
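To make "functional correctness" concrete, here is a minimal sketch in Python rather than Wolfram Language. It is not the project's actual tooling, which the description above does not detail. It assumes a submission counts as correct when it returns the same outputs as a reference solution on a set of test inputs, regardless of how the code is written. The exercise, function names, and test inputs are all hypothetical.

```python
from typing import Any, Callable, Iterable, Sequence


def functionally_correct(
    candidate: Callable[..., Any],
    reference: Callable[..., Any],
    test_inputs: Iterable[Sequence[Any]],
) -> bool:
    """Return True if candidate matches reference on every test input.

    Assumption: correctness is judged by behavior (outputs), not by
    comparing the source text of the two solutions.
    """
    for args in test_inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            # Code that raises an error on a test input counts as incorrect.
            return False
        if actual != expected:
            return False
    return True


def pass_rate(results: Iterable[bool]) -> float:
    """Fraction of exercises solved correctly (0.0 for an empty run)."""
    outcomes = list(results)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical exercise: "Make a list of the squares of the numbers 1 to n."
def reference_squares(n: int) -> list:
    return [k ** 2 for k in range(1, n + 1)]


def candidate_a(n: int) -> list:
    # Written differently from the reference, but behaves the same.
    return [k * k for k in range(1, n + 1)]


def candidate_b(n: int) -> list:
    # Off-by-one bug: starts at 0 and stops at n - 1.
    return [k * k for k in range(n)]


TEST_INPUTS = [(1,), (5,), (10,)]
```

Under these assumptions, `candidate_a` passes even though its text differs from the reference, while `candidate_b` fails because its outputs differ. A benchmark score could then be the `pass_rate` over all exercises.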