How Hostinger evaluates AI applications with Braintrust

Albert Zhang

27 February 2024

Hostinger is a leading provider of web hosting solutions, serving over a million creators from 150+ countries. At Braintrust, we've had the opportunity to work closely with Hostinger as they build exciting AI applications, such as an AI customer support chatbot that now handles over 40% of support chat conversations.

In the video below, Liucija (Senior Data Scientist on the AI team at Hostinger) explains how she approaches AI evaluations.

Hostinger's Approach to Evaluations (0:10)

As Liucija explains in the video, the Hostinger team has three main goals when running LLM evaluations:

Evaluate offline how changes impact application behavior during development, including running evaluations as part of the CI/CD process
Continually assess the performance of live AI applications
Identify and prioritize emerging issues to further improve the application
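
To make the first goal concrete, here is a minimal sketch of how an evaluation result could gate a CI/CD run. It is not Hostinger's pipeline or a Braintrust API: the function name `ci_gate`, the `baseline` value, and the `tolerance` parameter are all assumptions made for illustration.

```python
from statistics import mean


def ci_gate(scores, baseline, tolerance=0.02):
    """Return True when the mean score has not regressed past the tolerance.

    `scores` are per-example scores in [0, 1] from the current change;
    `baseline` is the mean score of the previously accepted version.
    Both the names and the default tolerance are hypothetical.
    """
    if not scores:
        raise ValueError("no evaluation scores to check")
    return mean(scores) >= baseline - tolerance


if __name__ == "__main__":
    # Hypothetical scores from an offline evaluation run.
    current_scores = [0.9, 0.8, 1.0]
    passed = ci_gate(current_scores, baseline=0.9)
    # A CI job would typically exit non-zero when the gate fails.
    raise SystemExit(0 if passed else 1)
```

A small tolerance keeps ordinary run-to-run noise from failing the build while still catching clear regressions.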

What Evaluations Does Hostinger Run? (1:47)

The Hostinger team runs a variety of evaluations. Some examples include:

Component checks (URLs, correct doc)
Automated comparison vs. expected answers (factuality, truthfulness)
Automated evaluations without expected answers (sentiment analysis, safety)
Internal human review (thumbs up/down, comments)
Live user feedback (thumbs up/down, comments)
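
As a rough illustration of the first two kinds of evaluation in the list above, the sketch below shows a component check on URLs and a simple comparison against an expected answer. It is an assumption-laden example, not Hostinger's code: the function names, the allowed-host list, and the example domains are hypothetical, and the normalized string match is a deliberately simple stand-in for a factuality check, which in practice is often model-graded.

```python
import re
from urllib.parse import urlparse

# Matches http(s) URLs up to the next whitespace or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def extract_urls(text):
    """Find URLs in a response and strip trailing sentence punctuation."""
    return [url.rstrip(".,;:!?") for url in URL_PATTERN.findall(text)]


def url_check(answer, allowed_hosts):
    """Score the fraction of URLs in `answer` whose host is allowed.

    An answer without URLs scores 1.0 because there is nothing to violate.
    `allowed_hosts` is a hypothetical allowlist of trusted hostnames.
    """
    urls = extract_urls(answer)
    if not urls:
        return 1.0
    ok = sum(1 for url in urls if urlparse(url).hostname in allowed_hosts)
    return ok / len(urls)


def normalized_match(answer, expected):
    """Score 1.0 when the answers match after case and whitespace cleanup."""
    def normalize(s):
        return " ".join(s.lower().split())
    return 1.0 if normalize(answer) == normalize(expected) else 0.0


if __name__ == "__main__":
    reply = "See https://support.example.com/a and http://evil.test/x."
    print(url_check(reply, {"support.example.com"}))
    print(normalized_match("  Hello   World ", "hello world"))
```

Returning a score between 0 and 1 from each check makes it straightforward to average results across a dataset and compare runs.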

How Braintrust Supports Hostinger (2:40)

After a quick setup process (Liucija got started by herself!), the Hostinger team now uses Braintrust to run evaluations, track progress on a visual dashboard, drill down into specific improvements and regressions, manage and version datasets, and log production data. Braintrust saves the Hostinger team hours of manual evaluation a day, allowing Hostinger to work on 3x more AI features while maintaining a high level of quality assurance.

If you are building AI applications and want to see how Braintrust can help, check out our docs or get in touch!