How To Use Hugging Face Agents For NLP Tasks

Hugging Face, the open-source AI community for Machine Learning practitioners, recently integrated the concept of tools and agents into its popular Transformers library. If you have already used Hugging Face for Natural Language Processing (NLP), computer vision or audio/speech processing tasks, you may be wondering what value tools and agents add to the ecosystem. Agents add an arguably major level of convenience for users – let me explain.
Let's say I wanted to use a model from the Hugging Face Hub to translate from English to French. In that case, I would need to do some research to find a good model, figure out how to actually use it, and finally write the code to generate the translation. But what if I had a Hugging Face expert at my disposal who already knew all of that? I would simply tell the expert that I want a sentence translated from English to French, and the expert would take care of finding a good model, writing the code and returning the results – much faster than you or I could. That is exactly what agents do! We describe what we want to the agent in plain English, the agent looks at the tools available in its toolbox, and it executes the task. This is very similar to asking ChatGPT to translate a sentence and letting ChatGPT take care of the rest, except that instead of being limited to the handful of models that ChatGPT uses (i.e. OpenAI models such as GPT-3.5 or GPT-4), agents have access to the many models available on Hugging Face.
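To make the contrast concrete, here is a minimal sketch of that manual route, under the assumption that we picked Helsinki-NLP/opus-mt-en-fr as our English-to-French model (it is just one of many options on the Hub):
from transformers import pipeline
# Manual route: pick a translation model from the Hub yourself and wire it up.
# Helsinki-NLP/opus-mt-en-fr is only an example checkpoint.
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("I would like to translate this sentence.")
print(result[0]['translation_text'])  # prints the French translation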
Now that we understand what agents and tools do, let's look at how this can be implemented.
Transformers Agent – Implementation
For this section, I have mainly relied on the Hugging Face documentation about agents and have implemented the concepts using my own examples.
Step 1 – Requirements
Let's start with importing a few libraries that we will be using for this exercise. Note that I have included the versions of these libraries in the results, in case you want to create an identical environment.
import transformers, huggingface_hub, diffusers, torch
from platform import python_version
print(f'python: {python_version()}')
print(f'transformers: {transformers.__version__}')
print(f'huggingface_hub: {huggingface_hub.__version__}')
print(f'diffusers: {diffusers.__version__}')
print(f'torch: {torch.__version__}')
Results:

If your environment is missing any of these libraries, the code block above will raise an error. If so, you can run the cell below to install them. I assume that if you are reading this post your environment already has Python, so it is not included below.
Pro Tip: If you need to run a single command, such as installing transformers in a Jupyter notebook or similar, you can prefix it with !, for example !pip install transformers==4.29.0. Instead of adding ! to the beginning of each command line in the code block below, I have used the %%sh magic command, which indicates that all the contents of the cell are to be run as shell commands.
%%sh
pip install transformers==4.29.0
pip install huggingface_hub==0.14.1
pip install diffusers==0.16.1
pip install --upgrade torch torchvision
pip install openai==0.27.6
Step 2 – Hugging Face Login
Now that our environment is ready, we need to log in to Hugging Face to access its Inference API. This step requires a free Hugging Face token. If you do not have one, you can follow the instructions in this link (it took me less than 5 minutes) to create one for yourself.
Let's log in.
import huggingface_hub
from huggingface_hub import login
my_hf_token = 'ADD_YOUR_TOKEN_HERE'
login(my_hf_token)
Results:

Results indicate that login was successful.
In the next step, we will instantiate the agent. Note that Hugging Face supports various agents (each of which is essentially a large language model, or LLM). Some are behind a paywall, such as OpenAI's, and some are open-source, such as BigCode and OpenAssistant. For this post, I have selected one of the free and open-source options from BigCode called Starcoder, since it is more convenient for those just getting started experimenting with such models. If you are interested in using other agents, Hugging Face has an easy-to-read tutorial linked here.
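For reference, swapping in a paid OpenAI-backed agent is only a one-line change. Below is a minimal sketch, assuming you have an OpenAI API key and the openai package installed; the model name is just an example:
from transformers import OpenAiAgent
# Paid alternative: an agent backed by an OpenAI model (requires an API key).
# The model name below is only an example.
agent_openai = OpenAiAgent(model="text-davinci-003", api_key="ADD_YOUR_OPENAI_KEY_HERE")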
Let's continue to the next step and instantiate our agent.
Step 3 – Instantiate Agent
The code block below instantiates the Starcoder agent from BigCode.
import transformers
from transformers import HfAgent
# Starcoder
agent_starcoder = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
Now that our agent is ready, let's see what it can do!
Step 4 – Run Tasks
As the very first task, we are going to ask the agent to generate a picture. We simply communicate what we want to the agent, and the agent makes it happen. Let's look at an example.
task = "Draw me a picture of the ocean"
picture_ocean = agent_starcoder.run(task)
picture_ocean
Results:


That is quite interesting! Looking at the results, we can see the agent explains some of the steps it takes. For example, the agent uses the image_generator tool to generate the picture that we asked for.
We mentioned that our agent is an LLM, and we know that LLM outputs are stochastic, so the results are expected to change when we run the model again. Let's see what we get if we run the same task another time.
picture_ocean = agent_starcoder.run(task)
picture_ocean
Results:


As expected, the pictures are different. But what if, after seeing the picture, we want to make a change to it? For example, it would be nice to see a ship in this picture. Let's ask our agent to add a ship to the same picture.
picture_ocean_updated = agent_starcoder.run("Transform the image in `picture_ocean` to add a ship to it.", picture=picture_ocean)
picture_ocean_updated
Results:


As we see, the agent used a different tool this time, called image_transform, since it no longer needs to generate an entire image; instead, it transforms the provided image by adding a ship to it. We can see the small ship in the top-left quadrant of the picture. Not bad at all!
But at this point you may be asking what the agent is actually doing. We will answer that question next.
Step 5 – What Do Agents Actually Do?
As we learned earlier, agents are LLMs that perform tasks based on the prompts we provide. In other words, the agent receives our prompt and, based on what is being asked, selects the tools it believes will be helpful, generates the corresponding code, and runs it. Hugging Face provides a way of looking at that code by adding return_code=True to our run command. In other words, we can ask the agent to just return the code block, and then we can modify and/or run the code ourselves.
Let's re-run our command as follows and look at the results:
task = "Draw me a picture of the ocean"
agent_starcoder.run(task, return_code=True)
Results:

I have cleaned up and re-written what the agent returned as follows:
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="ocean")
Let's run this code and look at the results.
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="ocean")
image
Results:


That worked as expected. In other words, we no longer need to rely on the agent to create that code for us; we can directly use the huggingface-tools/text-to-image tool to generate pictures.
Now that we understand how agents pick their tools and work, let's look at some additional tools.
Step 6 – Additional Tools
In this section, we will cover a few examples of other tools that agents can use, as follows:
- Image captioning
- Question answering
- Translation
- Mixed Requests
6.1. Image Captioning
This is a fun exercise. First, we will use the text-to-image tool to generate a picture of a toy car. Then we will save that picture and ask the agent to caption it.
Let's start with creating a picture of a toy car.
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image_generator(prompt="toy car")
Results:


Next, I saved the picture to my local drive (a quick sketch of that step follows) and will read it back using the Python Imaging Library, or PIL. After that, we will prompt the agent to caption the image.
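In case you want to do the save step in code rather than by hand, a minimal sketch could look like this, assuming the generated image is first assigned to a variable (the path simply matches the one we load below):
# Hypothetical save step: assign the generated image and write it to disk.
toy_car_image = image_generator(prompt="toy car")
toy_car_image.save('/content/picture.png')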
Below is the code block:
from PIL import Image
task = 'Caption the following image'
image = Image.open('/content/picture.png')
agent_starcoder.run(task, image=image)
Results:

Look at the very last sentence at the bottom – that is a good caption! The agent is describing the car, which seems pretty accurate. In case you are curious about what is going on under the hood, let's have the agent return the code to see which tools are being used and how, as follows:
agent_starcoder.run(task, image=image, return_code=True)
Results:

Let's clean up the code block as follows:
from transformers import load_tool
image_captioner = load_tool("image-captioning")
caption = image_captioner(image)
What the agent did was load the image-captioning tool and then caption the image that we provided. Sounds straightforward! Let's move on to the next example.
6.2. Question Answering
Question answering is self-explanatory but let's make it more interesting. Instead of providing a paragraph to the agent and asking questions about the provided information, let's provide an image and ask the agent about the content of the image.
I wrote a few lines in a Word document and then saved it as a *.jpg image on my local drive. Let's first use PIL to view the image, as follows:
from PIL import Image
image = Image.open('/content/jack.jpg')
image.show()
Results:

As you can see, the image contains a few sentences about my imaginary friend Jack in Seattle. Next, let's ask our agent a question and see how it responds. I would like to ask the agent about Jack's favorite color, as follows:
task = "in the following 'document', what is Jack's favorite color?"
agent_starcoder.run(task, document=image)
Results:

Once again, let's look at the very last sentence at the bottom – that is pretty good! We can see that the text extraction is not perfect (for example, it extracted 'Tesla' as 'tesia'), but the agent still returned the relevant portion of the image, which answered our question.
Let's see what tools exactly the agent used:
task = "in the following 'document', what is Jack's favorite color?"
agent_starcoder.run(task, document=image, return_code=True)
Results:

Let's clean up the code as follows:
from transformers import load_tool
document_qa = load_tool("document-question-answering")
# 'document' is the PIL image we passed to the agent
answer = document_qa(document, question="What is Jack's favorite color?")
print(f"The answer is {answer}.")
We can see that the agent used the document-question-answering tool and then asked our question against the document (image) we provided.
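If you would rather bypass the agent here as well, a similar result can be obtained directly with a document question answering pipeline. The sketch below is one option under a couple of assumptions: pytesseract is installed for the OCR step, and impira/layoutlm-document-qa is used as an example checkpoint (not necessarily the one the agent's tool uses):
from transformers import pipeline
# Direct document QA, bypassing the agent.
# Assumes pytesseract is available for OCR; the checkpoint is just an example.
doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
result = doc_qa(image=image, question="What is Jack's favorite color?")
print(result)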
Next, let's see if the agent can perform translation.
6.3. Translation
This one is pretty straightforward. Let's ask the agent to translate a sentence and then see which tools it uses.
text = 'this is a sentence in English'
task = "translate the following 'document' to French"
agent_starcoder.run(task, document=text)
Results:

Excellent! We see the result, which is the French translation of the provided sentence. Let's look at what the agent used:
agent_starcoder.run(task, document=text, return_code=True)
Results:

Let me clean up the code first:
from transformers import load_tool
translator = load_tool("translation")
# 'document' is the English text we passed to the agent
translated_document = translator(document, src_lang="English", tgt_lang="French")
That looks pretty straightforward. The agent used the translation tool and recognized the source (src_lang) and target (tgt_lang) languages correctly, based on what we asked it to do!
At this point, I was wondering whether the agent can handle more complicated tasks. We will look at that next.
6.4. Mixed Requests
What if we combine a question answering and translation? Let's ask the agent what Jack's favorite color is while asking that the answer must be returned in French.
task = "in the following 'document', what is Jack's favorite color? After you answer the question, translate the answer to French"
agent_starcoder.run(task, document=image)
Results:

Look at the last sentence – that is great! We see that the agent first returned the English answer and then translated the response to French, as we asked it to!
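We did not ask for return_code=True this time, but based on the tools we have already seen, the generated code plausibly chains document question answering with translation, along the lines of the sketch below (tool names and signatures as used earlier; this is not the agent's literal output):
from transformers import load_tool
# Hypothetical reconstruction of the chained steps, based on the tools seen earlier.
document_qa = load_tool("document-question-answering")
translator = load_tool("translation")
answer = document_qa(document, question="What is Jack's favorite color?")
translated_answer = translator(answer, src_lang="English", tgt_lang="French")
print(f"The answer is {translated_answer}.")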
Conclusion
Agents and tools are a powerful combination. I see agents benefiting both technical users (i.e. machine learning and AI practitioners) and non-technical users. For technical users, agents speed up the process – they can select among many tools and return the code for the technical user to adapt to their needs. Non-technical users, on the other hand, who are not familiar with machine learning, can simply ask for what they want in plain English and the agent will take care of the rest.
I hope you enjoyed this brief tutorial on agents and tools! If you are interested in learning more about implementing Natural Language Processing (NLP) tasks in Hugging Face, check out the post below.
Thanks for Reading!
If you found this post helpful, please follow me on Medium and subscribe to receive my latest posts!
(All images, unless otherwise noted, are by the author.)