Wan2.1-T2V-14B: Video Generation with HF

Disclaimer: This post has been translated into English using a machine translation model. Please let me know if you find any mistakes.

The largest hub of AI models is clearly Hugging Face, and they now offer the possibility of running inference on some of their models through serverless GPU providers.

One of those models is Wan-AI/Wan2.1-T2V-14B which, as of writing this post, is the best open-source video generation model, as can be seen in the Artificial Analysis Video Generation Arena Leaderboard.

video generation arena leaderboard

If we look at its model card, we can see on the right a button that says Replicate.

Wan2.1-T2V-14B model card

Inference providers

If we go to the Inference providers settings page, we will see something like this

Inference Providers

There we can press the button with the key to enter the API KEY of the provider we want to use, or leave the option with the two dots selected. If we choose the first option, the provider charges us for the inference, while with the second option, Hugging Face charges us for the inference. So choose whichever suits you best.

Inference with Replicate

In my case, I obtained an API KEY from Replicate and added it to a file called .env, which is where I will store the API KEYS. You should not upload this file to GitHub, GitLab, or your project repository.

The .env file must have this format:

HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS="hf_aL...AY"
REPLICATE_API_KEY="r8_Sh...UD"

Where HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS is a token you need to obtain from Hugging Face, and REPLICATE_API_KEY is the API KEY you can obtain from Replicate.
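Libraries like python-dotenv can load this file for us, but the format is simple enough to parse with the standard library alone. Here is a minimal sketch (the load_env helper and the file contents are illustrative, not part of the original post; it is not a full .env parser):

```python
import os
import tempfile

def load_env(path):
    """Parse simple KEY="value" lines from a .env-style file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            # Strip whitespace and optional surrounding double quotes
            os.environ[key.strip()] = value.strip().strip('"')

# Hypothetical .env contents written to a temporary file for the example
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write('REPLICATE_API_KEY="r8_example"\n')
    env_path = tmp.name

load_env(env_path)
print(os.environ["REPLICATE_API_KEY"])  # r8_example
os.unlink(env_path)
```

In practice python-dotenv handles edge cases (multiline values, export prefixes, escaping) that this sketch does not, which is why the post uses it below.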

Reading the API Keys

The first thing we need to do is read the API KEYS from the .env file:

import os
import dotenv

dotenv.load_dotenv()

REPLICATE_API_KEY = os.getenv("REPLICATE_API_KEY")
HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS = os.getenv("HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS")

Logging in to the Hugging Face Hub

To be able to use the Wan-AI/Wan2.1-T2V-14B model, as it is on the Hugging Face Hub, we need to log in.

from huggingface_hub import login

login(HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS)

Inference Client

Now we create an inference client. We have to specify the provider and the API KEY, and in this case we also set a timeout of 1000 seconds, because the default is 60 seconds and the model takes quite a while to generate the video.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="replicate",
    api_key=REPLICATE_API_KEY,
    timeout=1000,
)

Video Generation

We now have everything we need to generate our video. We use the client's text_to_video method, pass it the prompt, and tell it which model from the Hub we want to use; otherwise, it will use the default one.

video = client.text_to_video(
    "Funky dancer, dancing in a rehearsal room. She wears long hair that moves to the rhythm of her dance.",
    model="Wan-AI/Wan2.1-T2V-14B",
)

Saving the video

Finally, we save the video, which is returned as bytes, to a file on our disk.

output_path = "output_video.mp4"
with open(output_path, "wb") as f:
    f.write(video)
print(f"Video saved to: {output_path}")

Output:

Video saved to: output_video.mp4
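Since the client returns the video as raw bytes, saving it is just a binary write. The same pattern can be wrapped in a small helper with a size sanity check; this is an illustrative sketch (the save_video helper and the dummy bytes are mine, standing in for the real model output):

```python
from pathlib import Path

def save_video(video: bytes, output_path: str) -> int:
    """Write raw video bytes to disk and return the number of bytes written."""
    path = Path(output_path)
    path.write_bytes(video)
    return path.stat().st_size

# Dummy bytes standing in for the real model output
fake_video = b"\x00\x00\x00\x18ftypmp42"  # typical start of an MP4 file
size = save_video(fake_video, "output_video.mp4")
print(f"Video saved to: output_video.mp4 ({size} bytes)")
```

Checking the written size against len(video) is a cheap way to catch a truncated or empty response before trying to play the file.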

Generated video

This is the video generated by the model.
