How To Use Hugging Face Agents For NLP Tasks

Hugging Face, the open-source AI community for Machine Learning practitioners, recently integrated the concept of tools and agents into its popular Transformers library. If you have already used Hugging Face for Natural Language Processing (NLP), computer vision or audio/speech processing tasks, you may be wondering what value tools and agents add to the ecosystem. Agents add an arguably major level of convenience for users – let me explain.
Let's say I wanted to use a model from the Hugging Face Hub to translate from English to French. In that case, I would need to do some research to find a good model, figure out how to actually use it, and finally write the code to generate the translation. But what if I had a Hugging Face expert at my disposal who already knew all of that? I would simply tell the expert that I want a sentence translated from English to French, and the expert would take care of finding a good model, writing the code and returning the results – much faster than you or I could. That is exactly what agents do! We describe what we want to the agent in plain English, the agent looks at the tools available in its toolbox, and it executes the task. This is very similar to asking ChatGPT to translate a sentence and letting ChatGPT take care of the rest, except that instead of being limited to the handful of models that ChatGPT uses (i.e. OpenAI models such as GPT-3.5 or GPT-4), agents have access to the many models available on Hugging Face.
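To make the contrast concrete, here is a minimal sketch of that manual route, under the assumption that we picked Helsinki-NLP/opus-mt-en-fr as our English-to-French model (it is just one of many options on the Hub):
from transformers import pipeline
# Manual route: pick a translation model from the Hub yourself and wire it up.
# Helsinki-NLP/opus-mt-en-fr is only an example checkpoint.
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("I would like to translate this sentence.")
print(result[0]['translation_text'])  # prints the French translation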
Now that we understand what agents and tools do, let's look at how this can be implemented.
Transformers Agent – Implementation
For this section, I have mainly relied on the Hugging Face documentation about agents and have implemented the concepts using my own examples.
Step 1 – Requirements
Let's start with importing a few libraries that we will be using for this exercise. Note that I have included the versions of these libraries in the results, in case you want to create an identical environment.
import transformers, huggingface_hub, diffusers, torch
from platform import python_version
print(f'python: {python_version()}')
print(f'transformers: {transformers.__version__}')
print(f'huggingface_hub: {huggingface_hub.__version__}')
print(f'diffusers: {diffusers.__version__}')
print(f'torch: {torch.__version__}')
Results:

If your environment is missing any of these libraries, the code block above will raise an error. If so, you can run the cell below to install them. I assume that if you are reading this post your environment already has Python, so it is not included below.
Pro Tip: If you need to run a single command, such as installing transformers in a Jupyter notebook or similar, you can prefix it with !, for example !pip install transformers==4.29.0. Instead of adding ! to the beginning of each command line in the code block below, I have used the %%sh magic command, which indicates that all the contents of the cell are to be run as shell commands.
%%sh
pip install transformers==4.29.0
pip install huggingface_hub==0.14.1
pip install diffusers==0.16.1
pip install --upgrade torch torchvision
pip install openai==0.27.6
Step 2 – Hugging Face Login
Now that our environment is ready, we need to log in to Hugging Face to access its Inference API. This step requires a free Hugging Face token. If you do not have one, you can follow the instructions in this link (it took me less than 5 minutes) to create one for yourself.
Let's log in.
import huggingface_hub
from huggingface_hub import login
my_hf_token = 'ADD_YOUR_TOKEN_HERE'
login(my_hf_token)
Results:

Results indicate that login was successful.
In the next step, we will instantiate the agent. Note that Hugging Face supports various agents (each of which is essentially a large language model, or LLM). Some are behind a paywall, such as OpenAI's, and some are open-source, such as BigCode and OpenAssistant. For this post, I have selected one of the free and open-source options from BigCode called Starcoder, since it is more convenient for those just getting started experimenting with such models. If you are interested in using other agents, Hugging Face has an easy-to-read tutorial linked here.
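For reference, swapping in a paid OpenAI-backed agent is only a one-line change. Below is a minimal sketch, assuming you have an OpenAI API key and the openai package installed; the model name is just an example:
from transformers import OpenAiAgent
# Paid alternative: an agent backed by an OpenAI model (requires an API key).
# The model name below is only an example.
agent_openai = OpenAiAgent(model="text-davinci-003", api_key="ADD_YOUR_OPENAI_KEY_HERE")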
Let's continue to the next step and instantiate our agent.
Step 3 – Instantiate Agent
The code block below instantiates the Starcoder agent from BigCode.
import transformers
from transformers import HfAgent
# Starcoder
agent_starcoder = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
Now that our agent is ready, let's see what it can do!
Step 4 – Run Tasks
As the very first task, we are going to ask the agent to generate a picture. We simply communicate what we want to the agent, and the agent makes it happen. Let's look at an example.
task = "Draw me a picture of the ocean"
picture_ocean = agent_starcoder.run(task)
picture_ocean
Results:


That is quite interesting! Looking at the results, we can see the agent explains some of the steps it takes. For example, the agent uses the image_generator tool to generate the picture that we asked for.
We mentioned that our agent is an LLM, and we know that LLM outputs are stochastic, so the results are expected to change when we run the model again. Let's see what we get if we run the same task another time.
picture_ocean = agent_starcoder.run(task)
picture_ocean
Results:


As expected, the pictures are different. But what if, after seeing the picture, we want to make a change to it? For example, it would be nice to see a ship in this picture. Let's ask our agent to add a ship to the same picture.
picture_ocean_updated = agent_starcoder.run("Transform the image in `picture_ocean` to add a ship to it.", picture=picture_ocean)
picture_ocean_updated
Results:


As we see, the agent used a different tool this time, called image_transform, since it no longer needs to generate an entire image; instead, it transforms the provided image by adding a ship to it. We can see the small ship in the top-left quadrant of the picture. Not bad at all!
But at this point you may be asking what the agent is actually doing. We will answer that question next.
Step 5 – What Do Agents Actually Do?
As we learned earlier, agents are LLMs that perform tasks based on the prompts we provide. In other words, the agent receives our prompt and, based on what is being asked, selects the tools it believes will be helpful, generates the corresponding code, and runs it. Hugging Face provides a way of looking at that code by adding return_code=True to our run command. In other words, we can ask the agent to just return the code block, and then we can modify and/or run the code ourselves.
Let's re-run our command as follows and look at the results:
task = "Draw me a picture of the ocean"
agent_starcoder.run(task, return_code=True)
Results:

I have cleaned up and re-written what the agent returned as follows:
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="ocean")
Let's run this code and look at the results.
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="ocean")
image
Results:


That worked as expected. In other words, we no longer need to rely on the agent to create that code for us; we can directly use the huggingface-tools/text-to-image tool to generate pictures.
Now that we understand how agents pick their tools and work, let's look at some additional tools.
Step 6 – Additional Tools
In this section, we will cover a few examples of other tools that agents can use, as follows:
- Image captioning
- Question answering
- Translation
- Mixed Requests
6.1. Image Captioning
This is a fun exercise. First, we will use the text-to-image tool to generate a picture of a toy car. Then we will save that picture and ask the agent to caption it.
Let's start with creating a picture of a toy car.
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image_generator(prompt="toy car")
Results:


Next, I saved the picture to my local drive (a quick sketch of that step follows) and will read it back using the Python Imaging Library, or PIL. After that, we will prompt the agent to caption the image.
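In case you want to do the save step in code rather than by hand, a minimal sketch could look like this, assuming the generated image is first assigned to a variable (the path simply matches the one we load below):
# Hypothetical save step: assign the generated image and write it to disk.
toy_car_image = image_generator(prompt="toy car")
toy_car_image.save('/content/picture.png')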
Below is the code block:
from PIL import Image
task = 'Caption the following image'
image = Image.open('/content/picture.png')
agent_starcoder.run(task, image=image)
Results:

Look at the very last sentence at the bottom – that is a good caption! The agent is describing the car, which seems pretty accurate. In case you are curious about what is going on under the hood, let's have the agent return the code to see which tools are being used and how, as follows:
agent_starcoder.run(task, image=image, return_code=True)
Results:

Let's clean up the code block as follows:
from transformers import load_tool
image_captioner = load_tool("image-captioning")
caption = image_captioner(image)
What the agent did was load the image-captioning tool and then caption the image that we provided. Sounds straightforward! Let's move on to the next example.
6.2. Question Answering
Question answering is self-explanatory but let's make it more interesting. Instead of providing a paragraph to the agent and asking questions about the provided information, let's provide an image and ask the agent about the content of the image.
I wrote a few lines in a Word document and then saved it as a *.jpg image on my local drive. Let's first use PIL to view the image, as follows:
from PIL import Image
image = Image.open('/content/jack.jpg')
image.show()
Results:

As you can see, the image contains a few sentences about my imaginary friend Jack in Seattle. Next, let's ask our agent a question and see how it responds. I would like to ask the agent about Jack's favorite color, as follows:
task = "in the following 'document', what is Jack's favorite color?"
agent_starcoder.run(task, document=image)
Results:

Once again, let's look at the very last sentence at the bottom – that is pretty good! We can see that the text extraction is not perfect (for example, it extracted 'Tesla' as 'tesia'), but the agent still returned the relevant portion of the image, which answered our question.
Let's see what tools exactly the agent used:
task = "in the following 'document', what is Jack's favorite color?"
agent_starcoder.run(task, document=image, return_code=True)
Results:

Let's clean up the code as follows:
from transformers import load_tool
document_qa = load_tool("document-question-answering")
# 'document' is the PIL image we passed to the agent
answer = document_qa(document, question="What is Jack's favorite color?")
print(f"The answer is {answer}.")
We can see that the agent used the document-question-answering tool and then asked our question against the document (image) we provided.
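If you would rather bypass the agent here as well, a similar result can be obtained directly with a document question answering pipeline. The sketch below is one option under a couple of assumptions: pytesseract is installed for the OCR step, and impira/layoutlm-document-qa is used as an example checkpoint (not necessarily the one the agent's tool uses):
from transformers import pipeline
# Direct document QA, bypassing the agent.
# Assumes pytesseract is available for OCR; the checkpoint is just an example.
doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
result = doc_qa(image=image, question="What is Jack's favorite color?")
print(result)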
Next, let's see if the agent can perform translation.
6.3. Translation
This one is pretty straightforward. Let's ask the agent to translate a sentence and then see which tools it uses.
text = 'this is a sentence in English'
task = "translate the following 'document' to French"
agent_starcoder.run(task, document=text)
Results:

Excellent! We see the result, which is the French translation of the provided sentence. Let's look at what the agent used:
agent_starcoder.run(task, document=text, return_code=True)
Results:

Let me clean up the code first:
from transformers import load_tool
translator = load_tool("translation")
# 'document' is the English text we passed to the agent
translated_document = translator(document, src_lang="English", tgt_lang="French")
That looks pretty straightforward. The agent used the translation tool and recognized the source (src_lang) and target (tgt_lang) languages correctly, based on what we asked it to do!
At this point, I was wondering whether the agent can handle more complicated tasks. We will look at that next.
6.4. Mixed Requests
What if we combine a question answering and translation? Let's ask the agent what Jack's favorite color is while asking that the answer must be returned in French.
task = "in the following 'document', what is Jack's favorite color? After you answer the question, translate the answer to French"
agent_starcoder.run(task, document=image)
Results:

Look at the last sentence – that is great! We see that the agent first returned the English answer and then translated the response to French, as we asked it to!
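We did not ask for return_code=True this time, but based on the tools we have already seen, the generated code plausibly chains document question answering with translation, along the lines of the sketch below (tool names and signatures as used earlier; this is not the agent's literal output):
from transformers import load_tool
# Hypothetical reconstruction of the chained steps, based on the tools seen earlier.
document_qa = load_tool("document-question-answering")
translator = load_tool("translation")
answer = document_qa(document, question="What is Jack's favorite color?")
translated_answer = translator(answer, src_lang="English", tgt_lang="French")
print(f"The answer is {translated_answer}.")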
Conclusion
Agents and tools are a powerful combination. I see agents benefiting both technical users (i.e. machine learning and AI practitioners) and non-technical users. For technical users, agents speed up the process – they can select among many tools and return the code for the technical user to adapt to their needs. Non-technical users, on the other hand, who are not familiar with machine learning, can simply ask for what they want in plain English and the agent will take care of the rest.
I hope you enjoyed this brief tutorial on agents and tools! If you are interested in learning more about implementing Natural Language Processing (NLP) tasks in Hugging Face, check out the post below.
Thanks for Reading!
If you found this post helpful, please follow me on Medium and subscribe to receive my latest posts!
(All images, unless otherwise noted, are by the author.)