Tokens in NLP: Text Tokenization Guide


Disclaimer: This post has been translated to English using a machine translation model. Please let me know if you find any mistakes.

Now that LLMs are on the rise, we keep hearing about the number of tokens each model supports, but what are tokens? They are the smallest units into which a tokenizer splits words so that a model can work with them.

To explain what tokens are, let's first look at a practical example using the OpenAI tokenizer, called tiktoken.

So, first we install the package:

pip install tiktoken

Once installed, we create a tokenizer using the cl100k_base encoding which, as the example notebook How to count tokens with tiktoken explains, is the one used by the gpt-4, gpt-3.5-turbo and text-embedding-ada-002 models:

import tiktoken
encoder = tiktoken.get_encoding("cl100k_base")
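
If we don't want to remember the encoding name, tiktoken can also look it up from a model name. A minimal sketch, assuming a tiktoken version that knows about these model names:

# Sketch: get the encoding directly from the model name
encoder_gpt4 = tiktoken.encoding_for_model("gpt-4")
print(encoder_gpt4.name)  # we would expect this to print "cl100k_base"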

Now we create an example word to tokenize:

example_word = "breakdown"

And we tokenize it:

tokens = encoder.encode(example_word)
tokens
[9137, 2996]

The word has been split into 2 tokens: 9137 and 2996. Let's see which words they correspond to:

word1 = encoder.decode([tokens[0]])
word2 = encoder.decode([tokens[1]])
word1, word2
('break', 'down')

The OpenAI tokenizer has split the word breakdown into the words break and down. That is, it has divided the word into 2 simpler ones.

This is important because when it is said that an LLM supports x tokens, it does not mean that it supports x words, but rather that it supports x of these minimal units of word representation.
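
To make the difference concrete, here is a small sketch (the example sentence is my own) that compares the number of words with the number of tokens of the same text using the encoder created above:

# Sketch: word count vs. token count for the same text
text = "Tokenizers split long or rare words into smaller pieces."
num_words = len(text.split())
num_tokens = len(encoder.encode(text))
print(f"Words: {num_words}, tokens: {num_tokens}")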

If you have a text and want to see the number of tokens it has for the OpenAI tokenizer, you can check it on the Tokenizer page, which displays each token in a different color.


We have seen OpenAI's tokenizer, but each LLM may use a different one.
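
For example, open models published on Hugging Face ship their own tokenizer, which can be loaded with the transformers library. A minimal sketch, assuming transformers is installed and using bert-base-uncased purely as an illustration:

# Sketch: load another model's tokenizer with Hugging Face transformers
# (assumes `pip install transformers`; the model name is just an example)
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(hf_tokenizer.tokenize("breakdown"))  # may split the word differently than tiktoken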

As we have said, tokens are the minimal units of representation of words, so let's see how many distinct tokens the cl100k_base encoding has:

n_vocab = encoder.n_vocab
print(f"Vocab size: {n_vocab}")
Vocab size: 100277
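
Each of those IDs corresponds to a fixed sequence of bytes in the vocabulary. As a small sketch, assuming the decode_single_token_bytes method of the encoding, we can inspect the raw bytes behind the two tokens of breakdown obtained earlier:

# Sketch: inspect the raw bytes behind the token IDs of "breakdown"
for token_id in tokens:  # tokens = [9137, 2996] from the earlier example
    print(token_id, encoder.decode_single_token_bytes(token_id))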

Let's see how it tokenizes other kinds of words. To make this easier, we define a helper function that encodes a word and then decodes each of its tokens:

def encode_decode(word):
    """Encode a word and decode each of its tokens individually."""
    tokens = encoder.encode(word)
    decode_tokens = []
    for token in tokens:
        decode_tokens.append(encoder.decode([token]))
    return tokens, decode_tokens
word = "dog"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "tomorrow..."
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "artificial intelligence"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "Python"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "12/25/2023"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "😊"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
Word: dog ==> tokens: [18964], decode_tokens: ['dog']
Word: tomorrow... ==> tokens: [38501, 7924, 1131], decode_tokens: ['tom', 'orrow', '...']
Word: artificial intelligence ==> tokens: [472, 16895, 11478], decode_tokens: ['art', 'ificial', ' intelligence']
Word: Python ==> tokens: [31380], decode_tokens: ['Python']
Word: 12/25/2023 ==> tokens: [717, 14, 914, 14, 2366, 18], decode_tokens: ['12', '/', '25', '/', '202', '3']
Word: 😊 ==> tokens: [76460, 232], decode_tokens: ['�', '�']
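
The emoji case deserves a note: cl100k_base works on UTF-8 bytes, and the emoji's bytes end up split across two tokens, so each token decoded on its own is not valid text (hence the '�'). Decoding both tokens together recovers the character, as this small sketch shows:

# Sketch: decoding the emoji's tokens together recovers the original character
emoji_tokens = encoder.encode("😊")
print(encoder.decode(emoji_tokens))  # should print the emoji again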

Finally, let's see how it tokenizes words in another language (Spanish):

word = "perro"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "perra"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "mañana..."
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "inteligencia artificial"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "Python"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "12/25/2023"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
word = "😊"
tokens, decode_tokens = encode_decode(word)
print(f"Word: {word} ==> tokens: {tokens}, decode_tokens: {decode_tokens}")
Word: perro ==> tokens: [716, 299], decode_tokens: ['per', 'ro']
Word: perra ==> tokens: [79, 14210], decode_tokens: ['p', 'erra']
Word: mañana... ==> tokens: [1764, 88184, 1131], decode_tokens: ['ma', 'ñana', '...']
Word: inteligencia artificial ==> tokens: [396, 39567, 8968, 21075], decode_tokens: ['int', 'elig', 'encia', ' artificial']
Word: Python ==> tokens: [31380], decode_tokens: ['Python']
Word: 12/25/2023 ==> tokens: [717, 14, 914, 14, 2366, 18], decode_tokens: ['12', '/', '25', '/', '202', '3']
Word: 😊 ==> tokens: [76460, 232], decode_tokens: ['�', '�']

We can see that, for similar words, Spanish generates more tokens than English, so for the same text, with a similar number of words, the number of tokens will be greater in Spanish than in English.
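
As a quick check, here is a sketch that counts tokens for the same sentence in English and in Spanish (both sentences are my own examples, and the exact counts depend on the encoding):

# Sketch: compare token counts for an English sentence and its Spanish translation
english = "Artificial intelligence is going to change the world."
spanish = "La inteligencia artificial va a cambiar el mundo."
print(f"English: {len(encoder.encode(english))} tokens")
print(f"Spanish: {len(encoder.encode(spanish))} tokens")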
