Define your success criteria
Building a successful LLM-based application starts with clearly defining your success criteria. How will you know when your application is good enough to publish?
Having clear success criteria ensures that your prompt engineering and optimization efforts are focused on achieving specific, measurable goals.
Building strong criteria
Good success criteria are:
- Specific: Clearly define what you want to achieve. Instead of “good performance,” specify “accurate sentiment classification.”
- Measurable: Use quantitative metrics or well-defined qualitative scales. Numbers provide clarity and scalability, but qualitative measures can be valuable if consistently applied along with quantitative measures.
  - Even “hazy” topics such as ethics and safety can be quantified:
    - Bad safety criterion: Safe outputs
    - Good safety criterion: Less than 0.1% of outputs out of 10,000 trials flagged for toxicity by our content filter.
- Achievable: Base your targets on industry benchmarks, prior experiments, AI research, or expert knowledge. Your success metrics should not be unrealistic given current frontier model capabilities.
- Relevant: Align your criteria with your application’s purpose and user needs. Strong citation accuracy might be critical for medical apps but less so for casual chatbots.
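As a rough illustration of how the measurable safety criterion above could be checked, the sketch below computes the fraction of flagged outputs and compares it with a 0.1% threshold. The function names and the list of boolean flags are assumptions made for this example; in practice the flags would come from your own content filter.

```python
def flagged_rate(flags):
    """Return the fraction of outputs that were flagged as toxic."""
    if not flags:
        raise ValueError("need at least one trial")
    # True counts as 1 and False as 0, so sum() gives the number of flagged outputs.
    return sum(flags) / len(flags)


def meets_safety_criterion(flags, max_rate=0.001):
    """True if strictly less than max_rate (0.1% by default) of outputs were flagged."""
    return flagged_rate(flags) < max_rate


# Hypothetical results: 10,000 trials, 7 of them flagged by the content filter.
flags = [True] * 7 + [False] * 9993

print(f"flagged rate: {flagged_rate(flags):.4%}")
print("criterion met:", meets_safety_criterion(flags))
```

Stating the criterion as a threshold on a computed rate is what makes it testable: the same check can be rerun after every prompt change.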
Common success criteria to consider
Here are some criteria that might be important for your use case. This list is non-exhaustive.
Most use cases will need multidimensional evaluation along several success criteria.
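One way to picture multidimensional evaluation is as a set of targets, each checked independently. The sketch below is only an illustration: the criterion names, thresholds, and measured values are all hypothetical and are not taken from any real evaluation.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    target: float
    higher_is_better: bool = True

    def passed(self, measured: float) -> bool:
        # For "higher is better" metrics the target is a floor; otherwise it is a ceiling.
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def evaluate(criteria, measurements):
    """Map each criterion name to pass/fail for its measured value."""
    return {c.name: c.passed(measurements[c.name]) for c in criteria}


# Hypothetical targets for a classification use case.
criteria = [
    Criterion("f1_score", 0.85),
    Criterion("toxicity_rate", 0.001, higher_is_better=False),
    Criterion("p95_latency_ms", 200, higher_is_better=False),
]

# Hypothetical measurements from one evaluation run.
measurements = {"f1_score": 0.88, "toxicity_rate": 0.0007, "p95_latency_ms": 240}

results = evaluate(criteria, measurements)
for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
print("all criteria met:", all(results.values()))
```

Keeping each criterion separate, rather than folding everything into one score, makes it clear which dimension is holding the application back.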