DIY LLM Evaluation, a Case Study of Rhyming in ABBA Schema
Xebia
MAY 8, 2024
It’s becoming common knowledge: you should not choose your LLMs based on static benchmarks. Especially when combined with the auto-regressive architecture of most LLMs. So far I had been eyeballing this… but now it was time for a more structured approach.