Unleashing the ChatGPT Tokenizer
Author: Murphy | View: 20727 | Time: 2025-03-23 18:10:40

Have you ever wondered what the key components behind ChatGPT are?
We have all been told the same thing: ChatGPT predicts the next word. But that statement is not quite true. ChatGPT does not predict the next word; it predicts the next token.
Token? Yes, a token is the basic unit of text that Large Language Models (LLMs) read and generate. A token can be a whole word, a piece of a word, or a single character.
Indeed, one of the first things ChatGPT does when processing a prompt is split the user input into tokens. That is the job of the so-called tokenizer.
In this article, we will uncover how the ChatGPT tokenizer works through hands-on practice with tiktoken, the original library used by OpenAI.
TikTok-en… Funny enough
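Before touching the real library, it may help to see the word-versus-token distinction in a toy example. The sketch below is not the ChatGPT tokenizer and does not use tiktoken: the vocabulary `TOY_VOCAB` and the helpers `toy_tokenize`, `toy_encode`, and `toy_decode` are all invented for illustration, and the splitting rule is a simple greedy longest match. It only shows the general idea that text is cut into pieces that do not have to line up with words, and that each piece is mapped to an integer ID.

```python
# Toy illustration only: this vocabulary is invented for the example and is
# unrelated to the vocabulary that ChatGPT actually uses.
TOY_VOCAB = {
    "Chat": 0, "G": 1, "PT": 2, " predict": 3, "s": 4,
    " the": 5, " next": 6, " token": 7,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into tokens by greedy longest match against the vocab."""
    longest = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shrink it.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No piece in the toy vocabulary matches at this position.
            raise ValueError(f"no token covers {text[i]!r} at position {i}")
    return tokens


def toy_encode(text, vocab=TOY_VOCAB):
    """Map text to a list of integer token IDs."""
    return [vocab[piece] for piece in toy_tokenize(text, vocab)]


def toy_decode(ids, vocab=TOY_VOCAB):
    """Map a list of token IDs back to the original text."""
    id_to_piece = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(id_to_piece[token_id] for token_id in ids)


text = "ChatGPT predicts the next token"
print(toy_tokenize(text))
# ['Chat', 'G', 'PT', ' predict', 's', ' the', ' next', ' token']
print(toy_encode(text))
# [0, 1, 2, 3, 4, 5, 6, 7]
print(toy_decode(toy_encode(text)) == text)
# True
```

Notice that the five words of the sentence become eight tokens in this toy setup: "ChatGPT" is split into three pieces and "predicts" into two, while the leading space travels with the token that follows it. The real tokenizer makes its own splits with a much larger vocabulary, so the exact pieces will differ.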