Using Python functions to extract text from images
From digitizing and archiving images of your handwritten notes, to automating invoice processing, there are a multitude of reasons you’d want to extract text from an image. You could use an LLM for image processing, but doing so can sometimes be inaccurate, expensive, and slow. Optical character recognition, or OCR, is a great pre-processing step that allows you to convert raw image data into text that can then be processed or summarized by an LLM.
Maybe you find the perfect recipe on the internet, but it’s surrounded by ads and people’s life stories, or you want to digitize an old recipe written by your grandmother.
You also want the recipes to be in a specific format so you can quickly shop for groceries. You build an app to accomplish this, but you're getting mixed results. In Braintrust, you can create the OCR text extraction tool and experiment with different LLM prompts side-by-side in the playground. This way, your formatting will be just right, and you can deploy the perfect version. Let’s walk through this workflow step by step.
Getting started
To get started, you'll need a Braintrust account and an OpenAI account, as well as python and pip installed locally. If you'd like to follow along in code, the tool-ocr project contains a working example with all the code snippets we'll use.
Clone the repo
To start, clone the repo and install the dependencies:
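A sketch of the setup commands, assuming the example lives in the Braintrust cookbook repository (the exact URL and path to the tool-ocr example may differ; check the repo):

```shell
# Clone the cookbook repo (URL and directory are assumptions) and
# install the example's dependencies.
git clone https://github.com/braintrustdata/braintrust-cookbook.git
cd braintrust-cookbook/examples/ToolOCR
pip install -r requirements.txt
```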
Next, create a `.env` file with your Braintrust API key:
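The file needs just one line (the value below is a placeholder; use your own key):

```shell
# .env
BRAINTRUST_API_KEY=<your-api-key>
```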
Finally, make sure to set your `OPENAI_API_KEY` environment variable in the AI providers section of your Braintrust account.
Creating an OCR tool
Optical character recognition (OCR) is any technology that converts images of typed, handwritten, or printed text into machine-encoded text. There are many well-known OCR libraries and services; in this cookbook, we'll use OCR.Space, a free API you can use for testing without creating an account.
For this cookbook, we're using the free version of OCR.Space that limits the number of requests. You may exceed rate limits and need to upgrade your account to experiment further with this application.
In Braintrust, you can create tools and then run them from the UI, through the API, and, of course, via prompts. Encapsulating the OCR step as a tool makes it easier to iterate on your prompt without having to worry about the OCR logic.
The OCR tool is defined in `ocr.py`:
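The tool boils down to a single HTTP call. Here's a hedged sketch of the core logic, assuming OCR.Space's documented response shape (`ParsedResults[0].ParsedText`) and their public `helloworld` demo key; the actual `ocr.py` in the repo also registers this handler with Braintrust, and its exact code may differ:

```python
import json
import urllib.parse
import urllib.request


def parse_ocr_response(body: dict) -> str:
    """Pull the extracted text out of an OCR.Space JSON response."""
    results = body.get("ParsedResults") or []
    return results[0].get("ParsedText", "") if results else ""


def extract_text(image_url: str) -> str:
    """Send an image URL to OCR.Space and return the text it contains."""
    payload = urllib.parse.urlencode(
        {"url": image_url, "apikey": "helloworld"}  # free public demo key
    ).encode()
    request = urllib.request.Request(
        "https://api.ocr.space/parse/image", data=payload
    )
    with urllib.request.urlopen(request) as response:
        return parse_ocr_response(json.load(response))
```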
In just a few lines of code, it takes an image URL, sends it to the OCR API, and returns the text contained in the image.
To push the tool to Braintrust along with all its dependencies, run:
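From the project directory:

```shell
braintrust push ocr.py --requirements requirements.txt
```

This uploads the function along with the packages listed in requirements.txt.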
Try out the tool
To try out the tool, visit the toolOCR project in Braintrust, and navigate to the Tools section of your Library. Here, you can test different images and see what kinds of outputs you're getting from the tool.
This is helpful information for deciding if you'd like to do any additional post-processing of the text output. For example, you may notice that your output contains `\n` characters to indicate new lines in the parsed text, and you could add processing to your tool to strip or replace them. If you change your code, just run `braintrust push ocr.py --requirements requirements.txt` again to sync the tool with Braintrust.
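As an illustration, a small helper (hypothetical, not part of the repo's code) could collapse those line breaks before the tool returns its output:

```python
def clean_text(raw: str) -> str:
    """Collapse OCR line breaks and runs of whitespace into single spaces."""
    normalized = raw.replace("\r\n", "\n")  # OCR.Space often emits \r\n
    return " ".join(normalized.split())
```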
Try out the prompt
When we pushed the tool to Braintrust, we also included an initial definition of the prompt:
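A prompt is just a list of chat messages telling the model how to format the text the tool returns. A hypothetical sketch of what such a definition might contain (the actual wording lives in the repo's `ocr.py` and may differ):

```python
# Illustrative only: the system instructions and template variable below
# are assumptions, not the exact prompt shipped in the repo.
RECIPE_PROMPT_MESSAGES = [
    {
        "role": "system",
        "content": (
            "You format recipes. Given the raw text extracted from a recipe "
            "image, return a title, an ingredient list with quantities, and "
            "numbered steps."
        ),
    },
    {"role": "user", "content": "{{input}}"},  # filled in per dataset row
]
```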
Just like the tool, you can run it in the UI and even try it out on some examples:
If you visit the Logs tab, you can check out detailed logs for each call:
We recommend using code-based prompts to initialize projects, but we'll show how convenient it is to tweak your prompts in the UI in a moment.
Create a playground
To try out the prompt together with some data, we'll create a playground. Scroll to the bottom of your prompt modal and select Create playground with prompt.
In the tool-ocr project, we set up a script for you that will upload a sample dataset of recipe images. To upload the dataset to Braintrust, run:
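Under the hood, the script inserts each image URL as a row in a Braintrust dataset. A hedged sketch of that logic, assuming the SDK's `init_dataset` helper; the image URLs below are placeholders, not the actual sample data:

```python
# Placeholder URLs -- the repo's script uses its own sample recipe images.
SAMPLE_IMAGE_URLS = [
    "https://example.com/recipes/pancakes.jpg",
    "https://example.com/recipes/chili.jpg",
]


def upload_recipes(urls):
    """Insert each image URL as an input row in the Recipes dataset."""
    import braintrust  # deferred so the sketch is readable without the SDK

    dataset = braintrust.init_dataset(project="toolOCR", name="Recipes")
    for url in urls:
        dataset.insert(input=url)
    print(dataset.summarize())
```

Running the repo's script (check the repo for its actual filename) requires your `BRAINTRUST_API_KEY` to be set.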
Then, navigate to Dataset in your playground and select the Recipes dataset.
Your playground is now set up with a prompt, model choice, dataset, and the tool we created. Hit Run to run the prompt and tool on the images in the dataset.
Iterating on the prompt
Now that we have an interactive environment to test out our prompt and tool call, we can tweak the prompt and model until we get the desired results.
Hit the copy icon to duplicate your prompt and start tweaking. You can also tweak the original prompt and save your changes there if you'd like. For example, you can try instructing the model to always list the quantity of each ingredient you need to purchase.
Once you're satisfied with the prompt, hit Update to save the changes. Each time you save the prompt, you create a new version. To learn more about how to use a prompt in your code, check out the prompts guide.
Next steps
Now that you've defined a tool and a prompt as Python functions in Braintrust, you can:
- Deploy the prompt in your app
- Conduct more detailed evaluations
- Learn about logging LLM calls to create a data flywheel