Evaluation

In classical MLOps, a model's performance is evaluated on a hold-out validation set [5] with an evaluation metric. Because the performance of LLMs is more difficult to evaluate, currently organizations seem to be using A/B testing.

Last updated