Accelerate: Saving, Mixed Precision and Inference (2/2)

Accelerate: Saving, Mixed Precision and Inference (2/2)

**Hugging Face Accelerate Series**

  1. Installation, configuration, and training2. 👉 Saving, mixed precision, and inference (you are here)

Disclaimer: This post has been translated to English using a machine translation model. Please, let me know if you find any mistakes.

This is the **second part** of the series on Hugging Face Accelerate. In the first part we saw the installation, configuration, and how to adapt a training loop to run on multiple GPUs/TPUs, as well as execution control between processes. Here we continue with saving and loading models, mixed precision training (FP16/BF16/FP8), and inference.

Save and load the state dictlink image 15

When we train, sometimes we save the state so we can continue at another time

To save the state, we will have to use the save_state() and load_state() methods

	
< > Input
Python
%%writefile accelerate_scripts/10_accelerate_save_and_load_checkpoints.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 128
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
@accelerator.on_local_main_process
def print_something(something):
print(something)
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
# Guardamos los pesos
accelerator.save_state("accelerate_scripts/checkpoints")
print_something(f"Accuracy = {accuracy['accuracy']}")
# Cargamos los pesos
accelerator.load_state("accelerate_scripts/checkpoints")
Copied
>_ Output
			
Overwriting accelerate_scripts/09_accelerate_save_and_load_checkpoints.py

We execute it

	
< > Input
Python
!accelerate launch accelerate_scripts/10_accelerate_save_and_load_checkpoints.py
Copied
>_ Output
			
100%|█████████████████████████████████████████| 176/176 [01:58&lt;00:00, 1.48it/s]
100%|███████████████████████████████████████████| 20/20 [00:05&lt;00:00, 3.40it/s]
Accuracy = 0.2142

Save the modellink image 16

When the prepare method was used, the model was wrapped so it could be saved on the necessary devices. Therefore, when saving it, we have to use the save_model method, which first unwraps it and then saves it. Also, if we use the safe_serialization=True parameter, the model will be saved as a safe tensor

	
< > Input
Python
%%writefile accelerate_scripts/11_accelerate_save_model.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 128
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
@accelerator.on_local_main_process
def print_something(something):
print(something)
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
# Guardamos el modelo
accelerator.wait_for_everyone()
accelerator.save_model(model, "accelerate_scripts/model", safe_serialization=True)
print_something(f"Accuracy = {accuracy['accuracy']}")
Copied
>_ Output
			
Writing accelerate_scripts/11_accelerate_save_model.py

We run it

	
< > Input
Python
!accelerate launch accelerate_scripts/11_accelerate_save_model.py
Copied
>_ Output
			
100%|█████████████████████████████████████████| 176/176 [01:58&lt;00:00, 1.48it/s]
100%|███████████████████████████████████████████| 20/20 [00:05&lt;00:00, 3.35it/s]
Accuracy = 0.214

Save the pretrained modellink image 17

In models that use the transformers library, we must save the model with the save_pretrained method in order to load it with the from_pretrained method. Before saving it, it must be unwrapped with the unwrap_model method.

	
< > Input
Python
%%writefile accelerate_scripts/12_accelerate_save_pretrained.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 128
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
@accelerator.on_local_main_process
def print_something(something):
print(something)
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
# Guardamos el modelo pretrained
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
"accelerate_scripts/model_pretrained",
is_main_process=accelerator.is_main_process,
save_function=accelerator.save,
)
print_something(f"Accuracy = {accuracy['accuracy']}")
Copied
>_ Output
			
Writing accelerate_scripts/11_accelerate_save_pretrained.py

Let's run it

	
< > Input
Python
!accelerate launch accelerate_scripts/12_accelerate_save_pretrained.py
Copied
>_ Output
			
Map: 100%|██████████████████████| 45000/45000 [00:02&lt;00:00, 15152.47 examples/s]
Map: 100%|██████████████████████| 45000/45000 [00:02&lt;00:00, 15119.13 examples/s]
Map: 100%|████████████████████████| 5000/5000 [00:00&lt;00:00, 12724.70 examples/s]
Map: 100%|████████████████████████| 5000/5000 [00:00&lt;00:00, 12397.49 examples/s]
Map: 100%|██████████████████████| 50000/50000 [00:03&lt;00:00, 15247.21 examples/s]
Map: 100%|██████████████████████| 50000/50000 [00:03&lt;00:00, 15138.03 examples/s]
100%|█████████████████████████████████████████| 176/176 [01:59&lt;00:00, 1.48it/s]
100%|███████████████████████████████████████████| 20/20 [00:05&lt;00:00, 3.37it/s]
Accuracy = 0.21

Now we could load it.

	
< > Input
Python
from transformers import AutoModel
checkpoints = "accelerate_scripts/model_pretrained"
tokenizer = AutoModel.from_pretrained(checkpoints)
Copied
>_ Output
			
Some weights of RobertaModel were not initialized from the model checkpoint at accelerate_scripts/model_pretrained and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Notebook traininglink image 18

So far we have seen how to run scripts, but if you want to run the code in a notebook, we can write the same code as before, but encapsulated in a function

First, we import the libraries

	
< > Input
Python
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
import time
# from accelerate import Accelerator
Copied

Now we create the function

	
< > Input
Python
def train_code(batch_size: int = 64):
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = batch_size
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
accelerator.print(f"Accuracy = {accuracy['accuracy']}")
Copied

To be able to run the training in the notebook, we use the notebook_launcher function, to which we pass the function we want to execute, the arguments of that function, and the number of GPUs on which we are going to train with the num_processes variable

	
< > Input
Python
from accelerate import notebook_launcher
args = (128,)
notebook_launcher(train_code, args, num_processes=2)
Copied
>_ Output
			
Launching training on 2 GPUs.
>_ Output
			
100%|██████████| 176/176 [02:01&lt;00:00, 1.45it/s]
100%|██████████| 20/20 [00:06&lt;00:00, 3.31it/s]
>_ Output
			
Accuracy = 0.2112

FP16 Traininglink image 19

When we initially configured accelerate, it asked us Do you wish to use FP16 or BF16 (mixed precision)? and we said no, so now we’re going to tell it yes, that we want FP16

So far we have trained in FP32, which means that each model weight is a 32-bit floating-point number, and now we are going to use a 16-bit floating-point number, meaning the model will take up less space. As a result, two things will happen: we will be able to use a larger batch size, and it will also be faster

First, we run accelerate config again and tell it that we want FP16

	
< > Input
Python
!accelerate config
Copied
>_ Output
			
--------------------------------------------------------------------------------
In which compute environment are you running?
This machine
--------------------------------------------------------------------------------
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: no
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
Do you want to use FullyShardedDataParallel? [yes/NO]: no
Do you want to use Megatron-LM ? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:0,1
--------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
fp16
accelerate configuration saved at ~/.cache/huggingface/accelerate/default_config.yaml

Now we create a script for training, with the same batch size as before, to see if it takes less time to train

	
< > Input
Python
%%writefile accelerate_scripts/13_accelerate_base_code_fp16_bs128.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 128
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
accelerator.print(f"Accuracy = {accuracy['accuracy']}")
Copied
>_ Output
			
Overwriting accelerate_scripts/12_accelerate_base_code_fp16_bs128.py

Let's run it and see how long it takes.

	
< > Input
Python
%%time
!accelerate launch accelerate_scripts/13_accelerate_base_code_fp16_bs128.py
Copied
>_ Output
			
Map: 100%|████████████████████████| 5000/5000 [00:00&lt;00:00, 14983.76 examples/s]
Map: 100%|██████████████████████| 50000/50000 [00:03&lt;00:00, 14315.47 examples/s]
100%|█████████████████████████████████████████| 176/176 [01:01&lt;00:00, 2.88it/s]
100%|███████████████████████████████████████████| 20/20 [00:02&lt;00:00, 6.84it/s]
Accuracy = 0.2094
CPU times: user 812 ms, sys: 163 ms, total: 976 ms
Wall time: 1min 27s

When we ran this training in FP32, it took about 2 and a half minutes, and now about 1 and a half minutes. Let’s see if now, instead of training with a batch size of 128, we do it with one of 256

	
< > Input
Python
%%writefile accelerate_scripts/14_accelerate_base_code_fp16_bs256.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 256
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
accelerator.print(f"Accuracy = {accuracy['accuracy']}")
Copied
>_ Output
			
Overwriting accelerate_scripts/13_accelerate_base_code_fp16_bs256.py

We ran it

	
< > Input
Python
%%time
!accelerate launch accelerate_scripts/14_accelerate_base_code_fp16_bs256.py
Copied
>_ Output
			
Map: 100%|████████████████████████| 5000/5000 [00:00&lt;00:00, 15390.30 examples/s]
Map: 100%|████████████████████████| 5000/5000 [00:00&lt;00:00, 14990.92 examples/s]
100%|███████████████████████████████████████████| 88/88 [00:54&lt;00:00, 1.62it/s]
100%|███████████████████████████████████████████| 10/10 [00:02&lt;00:00, 3.45it/s]
Accuracy = 0.2236
CPU times: user 670 ms, sys: 91.6 ms, total: 761 ms
Wall time: 1min 12s

It has only gone down by about 15 seconds

BF16 traininglink image 20

Before we trained in FP16 and now we are going to do it in BF16, what is the difference?

FP32_FP16_BF16

As we can see in the image, while FP16 compared to FP32 has fewer bits in the mantissa and the exponent, which makes its range much smaller, BF16 compared to FP32 has the same number of exponent bits but fewer in the mantissa, which makes BF16 have the same range of numbers as FP32, but it is less precise

This is beneficial because in FP16 some calculations could produce very large numbers, which in FP16 format could not be represented. In addition, there are certain hardware devices that are optimized for this format

As before, we run accelerate config and indicate that we want BF16

	
< > Input
Python
!accelerate config
Copied
>_ Output
			
--------------------------------------------------------------------------------
In which compute environment are you running?
This machine
--------------------------------------------------------------------------------
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: no
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
Do you want to use FullyShardedDataParallel? [yes/NO]: no
Do you want to use Megatron-LM ? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:0,1
--------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
accelerate configuration saved at ~/.cache/huggingface/accelerate/default_config.yaml

Now we run the last script we had created, that is, with a batch size of 256

	
< > Input
Python
%%time
!accelerate launch accelerate_scripts/14_accelerate_base_code_fp16_bs256.py
Copied
>_ Output
			
Map: 100%|██████████████████████| 50000/50000 [00:03&lt;00:00, 14814.95 examples/s]
Map: 100%|██████████████████████| 50000/50000 [00:03&lt;00:00, 14506.83 examples/s]
100%|███████████████████████████████████████████| 88/88 [00:51&lt;00:00, 1.70it/s]
100%|███████████████████████████████████████████| 10/10 [00:03&lt;00:00, 3.21it/s]
Accuracy = 0.2112
CPU times: user 688 ms, sys: 144 ms, total: 832 ms
Wall time: 1min 17s

It took a time similar to what it took before, which is normal, since we have trained a model with 16-bit weights, just like before

FP8 Traininglink image 21

Now we are going to train in FP8 format, which, as its name suggests, is a floating-point format where each weight has 8 bits, so we run accelerate config to tell it that we want FP8

	
< > Input
Python
!accelerate config
Copied
>_ Output
			
--------------------------------------------------------------------------------
In which compute environment are you running?
This machine
--------------------------------------------------------------------------------
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: no
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
Do you want to use FullyShardedDataParallel? [yes/NO]: no
Do you want to use Megatron-LM ? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:0,1
--------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
fp8
accelerate configuration saved at ~/.cache/huggingface/accelerate/default_config.yaml

Now we run the last script, the one with a batch size of 256

	
< > Input
Python
%%time
!accelerate launch accelerate_scripts/14_accelerate_base_code_fp16_bs256.py
Copied
>_ Output
			
Traceback (most recent call last):
File "/home/wallabot/Documentos/web/portafolio/posts/accelerate_scripts/13_accelerate_base_code_fp16_bs256.py", line 12, in &lt;module&gt;
accelerator = Accelerator()
^^^^^^^^^^^^^
File "/home/wallabot/miniconda3/envs/nlp/lib/python3.11/site-packages/accelerate/accelerator.py", line 371, in __init__
self.state = AcceleratorState(
^^^^^^^^^^^^^^^^^
File "/home/wallabot/miniconda3/envs/nlp/lib/python3.11/site-packages/accelerate/state.py", line 790, in __init__
raise ValueError(
ValueError: Using `fp8` precision requires `transformer_engine` or `MS-AMP` to be installed.
Traceback (most recent call last):
File "/home/wallabot/Documentos/web/portafolio/posts/accelerate_scripts/13_accelerate_base_code_fp16_bs256.py", line 12, in &lt;module&gt;
accelerator = Accelerator()
^^^^^^^^^^^^^
File "/home/wallabot/miniconda3/envs/nlp/lib/python3.11/site-packages/accelerate/accelerator.py", line 371, in __init__
self.state = AcceleratorState(
^^^^^^^^^^^^^^^^^
File "/home/wallabot/miniconda3/envs/nlp/lib/python3.11/site-packages/accelerate/state.py", line 790, in __init__
raise ValueError(
ValueError: Using `fp8` precision requires `transformer_engine` or `MS-AMP` to be installed.
...
[0]:
time : 2024-05-13_21:40:56
host : wallabot
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 501480)
error_file: &lt;N/A&gt;
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
CPU times: user 65.1 ms, sys: 14.5 ms, total: 79.6 ms
Wall time: 7.24 s

Since the weights are now 8-bit and take up half the memory, we’re going to increase the batch size to 512

	
< > Input
Python
%%writefile accelerate_scripts/15_accelerate_base_code_fp8_bs512.py
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import evaluate
import tqdm
# Importamos e inicializamos Accelerator
from accelerate import Accelerator
accelerator = Accelerator()
dataset = load_dataset("tweet_eval", "emoji")
num_classes = len(dataset["train"].info.features["label"].names)
max_len = 130
checkpoints = "cardiffnlp/twitter-roberta-base-irony"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)
def tokenize_function(dataset):
return tokenizer(dataset["text"], max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
tokenized_dataset = {
"train": dataset["train"].map(tokenize_function, batched=True, remove_columns=["text"]),
"validation": dataset["validation"].map(tokenize_function, batched=True, remove_columns=["text"]),
"test": dataset["test"].map(tokenize_function, batched=True, remove_columns=["text"]),
}
tokenized_dataset["train"].set_format(type="torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_dataset["validation"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format(type="torch", columns=['label', 'input_ids', 'attention_mask'])
BS = 512
dataloader = {
"train": DataLoader(tokenized_dataset["train"], batch_size=BS, shuffle=True),
"validation": DataLoader(tokenized_dataset["validation"], batch_size=BS, shuffle=True),
"test": DataLoader(tokenized_dataset["test"], batch_size=BS, shuffle=True),
}
model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
model.classifier.out_proj = torch.nn.Linear(in_features=model.classifier.out_proj.in_features, out_features=num_classes, bias=True)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=5e-4)
metric = evaluate.load("accuracy")
EPOCHS = 1
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = accelerator.device
# model.to(device)
model, optimizer, dataloader["train"], dataloader["validation"] = accelerator.prepare(model, optimizer, dataloader["train"], dataloader["validation"])
for i in range(EPOCHS):
model.train()
progress_bar_train = tqdm.tqdm(dataloader["train"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_train:
optimizer.zero_grad()
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = loss_function(outputs['logits'], labels)
# loss.backward()
accelerator.backward(loss)
optimizer.step()
model.eval()
progress_bar_validation = tqdm.tqdm(dataloader["validation"], disable=not accelerator.is_local_main_process)
for batch in progress_bar_validation:
input_ids = batch["input_ids"]#.to(device)
attention_mask = batch["attention_mask"]#.to(device)
labels = batch["label"]#.to(device)
with torch.no_grad():
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
predictions = torch.argmax(outputs['logits'], axis=-1)
# Recopilamos las predicciones de todos los dispositivos
predictions = accelerator.gather_for_metrics(predictions)
labels = accelerator.gather_for_metrics(labels)
accuracy = metric.add_batch(predictions=predictions, references=labels)
accuracy = metric.compute()
accelerator.print(f"Accuracy = {accuracy['accuracy']}")
Copied
>_ Output
			
Writing accelerate_scripts/15_accelerate_base_code_fp8_bs512.py

We run it

	
< > Input
Python
%%time
!accelerate launch accelerate_scripts/15_accelerate_base_code_fp8_bs512.py
Copied

Model inferencelink image 22

Using the Hugging Face ecosystemlink image 23

Let's see how to perform inference with large models using the transformers library from Hugging Face.

Inference with pipelinelink image 24

If we use the Hugging Face ecosystem, it is very simple, since everything happens under the hood without us having to do much. In the case of using pipeline, which is the easiest way to do inference with the transformers library, we simply have to tell it which model we want to use and, very importantly, pass device_map="auto". This will make accelerate distribute the model across the different GPUs, CPU RAM, or hard disk if necessary under the hood.

There are more possible values for device_map, which we will see later, but for now stick with "auto".

We are going to use the Llama3 8B model, which, as its name suggests, is a model with around 8 billion parameters. Since each parameter, by default, is in FP32 format, which corresponds to 4 bytes (32 bits), that means that if we multiply 8 billion parameters by 4 bytes, we get that it would need a GPU with about 32 GB of VRAM.

In my case, I have 2 GPUs with 24 GB of VRAM each, so it wouldn’t fit on a single GPU. But thanks to setting device_map="auto", accelerate will distribute the model across the two GPUs and I’ll be able to run inference

	
< > Input
Python
%%writefile accelerate_scripts/16_inference_with_pipeline.py
from transformers import pipeline
checkpoints = "meta-llama/Meta-Llama-3-8B-Instruct"
generator = pipeline(model=checkpoints, device_map="auto")
prompt = "Conoces accelerate de hugging face?"
output = generator(prompt)
print(output)
Copied
>_ Output
			
Overwriting accelerate_scripts/09_inference_with_pipeline.py

Now we run it; only since it uses accelerate underneath as a pipeline, we don’t need to execute it with accelerate launch script.pypython script.py is enough

	
< > Input
Python
!python accelerate_scripts/16_inference_with_pipeline.py
Copied
>_ Output
			
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:09&lt;00:00, 2.27s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
[{'generated_text': 'Conoces accelerate de hugging face? ¿Qué es el modelo de lenguaje de transformers y cómo se utiliza en el marco de hugging face? ¿Cómo puedo utilizar modelos de lenguaje de transformers en mi aplicación? ¿Qué son los tokenizers y cómo se utilizan en el marco de hugging face? ¿Cómo puedo crear un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los datasets y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar datasets para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar finetuning para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los checkpoints y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar checkpoints para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los evaluadores y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar evaluadores para evaluar el rendimiento de un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los pre-trainados y cómo se utilizan en el marco de hugging face? ¿Cómo puedo utilizar pre-trainados para entrenar un modelo de lenguaje personalizado utilizando transformers y hugging face? ¿Qué son los finetuning'}]

As you can see, it has not responded; instead, it has kept asking questions. This is because Llama3 is a language model that predicts the next token, so with the prompt I gave it, it considered that the best next tokens were ones corresponding to more questions. Which makes sense, because there are times when people have doubts about a topic and ask many questions, so in order to get it to answer the question, we have to condition it a bit.

	
< > Input
Python
%%writefile accelerate_scripts/17_inference_with_pipeline_condition.py
from transformers import pipeline
checkpoints = "meta-llama/Meta-Llama-3-8B-Instruct"
generator = pipeline(model=checkpoints, device_map="auto")
prompt = "Conoces accelerate de hugging face?"
messages = [
{
"role": "system",
"content": "Eres un chatbot amigable que siempre intenta solucionar las dudas",
},
{"role": "user", "content": f"{prompt}"},
]
output = generator(messages)
print(output[0]['generated_text'][-1])
Copied
>_ Output
			
Overwriting accelerate_scripts/10_inference_with_pipeline_condition.py

As you can see, a message has been generated with roles, conditioning the model and with the prompt

	
< > Input
Python
!python accelerate_scripts/17_inference_with_pipeline_condition.py
Copied
>_ Output
			
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:09&lt;00:00, 2.41s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
{'role': 'assistant', 'content': '¡Hola! Sí, conozco Accelerate de Hugging Face. Accelerate es una biblioteca de Python desarrollada por Hugging Face que se enfoca en simplificar y acelerar el entrenamiento y la evaluación de modelos de lenguaje en diferentes dispositivos y entornos. Con Accelerate, puedes entrenar modelos de lenguaje en diferentes plataformas y dispositivos, como GPUs, TPUs, CPUs y servidores, sin necesidad de cambiar el código de tu modelo. Esto te permite aprovechar al máximo la potencia de cálculo de tus dispositivos y reducir el tiempo de entrenamiento. Accelerate también ofrece varias características adicionales, como: * Soporte para diferentes frameworks de machine learning, como TensorFlow, PyTorch y JAX. * Integración con diferentes sistemas de almacenamiento y procesamiento de datos, como Amazon S3 y Google Cloud Storage. * Soporte para diferentes protocolos de comunicación, como HTTP y gRPC. * Herramientas para monitorear y depurar tus modelos en tiempo real. En resumen, Accelerate es una herramienta muy útil para desarrolladores de modelos de lenguaje que buscan simplificar y acelerar el proceso de entrenamiento y evaluación de sus modelos. ¿Tienes alguna pregunta específica sobre Accelerate o necesitas ayuda para implementarlo en tu proyecto?'}

Now the response does answer our prompt

Inference with AutoClasslink image 25

Lastly, let’s see how to do inference using only AutoClass.

	
< > Input
Python
%%writefile accelerate_scripts/18_inference_with_autoclass.py
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
checkpoints = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoints, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoints, device_map="auto")
streamer = TextStreamer(tokenizer)
prompt = "Conoces accelerate de hugging face?"
tokens_input = tokenizer([prompt], return_tensors="pt").to(model.device)
_ = model.generate(**tokens_input, streamer=streamer, max_new_tokens=500, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
Copied
>_ Output
			
Overwriting accelerate_scripts/11_inference_with_autoclass.py

As can be seen, the streamer object has been created, which is then passed to the model's generate method. This is useful so that each word is printed as it is generated and there is no need to wait for the entire output to be generated before printing it.

	
< > Input
Python
!python accelerate_scripts/18_inference_with_autoclass.py
Copied
>_ Output
			
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:09&lt;00:00, 2.28s/it]
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
&lt;|begin_of_text|&gt;Conoces accelerate de hugging face? Si es así, puedes utilizar la biblioteca `transformers` de Hugging Face para crear un modelo de lenguaje que pueda predecir la siguiente palabra en una secuencia de texto.
Aquí te muestro un ejemplo de cómo hacerlo:
```
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Cargar el modelo y el tokenizador
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Cargar el conjunto de datos
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
# Preprocesar los datos
train_texts = train_df["text"]
train_labels = train_df["label"]
test_texts = test_df["text"]
...
input_ids = batch[0].to(device)
attention_mask = batch[1].to(device)
labels = batch[2].to(device)
optimizer.zero_grad()
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
total_loss += loss.item()
print(f"Epoch {epoch+1}, Loss: {total

Using PyTorchlink image 26

Normally, the way to make inferences with PyTorch is to create a model with randomly initialized weights and then load a state dict with the pretrained model weights, so to obtain that state dict we’re going to do a little trick first and download it.

	
< > Input
Python
import torch
import torchvision.models as models
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
torch.save(model.state_dict(), 'accelerate_scripts/resnet152_pretrained.pth')
Copied
>_ Output
			
Downloading: "https://download.pytorch.org/models/resnet152-394f9c45.pth" to /home/maximo.fernandez/.cache/torch/hub/checkpoints/resnet152-394f9c45.pth
100%|██████████| 230M/230M [02:48&lt;00:00, 1.43MB/s]

Now that we have the state dict, we are going to do inference as is normally done in PyTorch

	
< > Input
Python
import torch
import torchvision.models as models
device = "cuda" if torch.cuda.is_available() else "cpu" # Set device
resnet152 = models.resnet152().to(device) # Create model with random weights and move to device
state_dict = torch.load('accelerate_scripts/resnet152_pretrained.pth', map_location=device) # Load pretrained weights into device memory
resnet152.load_state_dict(state_dict) # Load this weights into the model
input = torch.rand(1, 3, 224, 224).to(device) # Random image with batch size 1
output = resnet152(input)
output.shape
Copied
>_ Output
			
torch.Size([1, 1000])

Let's explain what has happened

  • When we have done resnet152 = models.resnet152().to(device), a ResNet152 with random weights has been loaded into GPU memory
  • When we have done state_dict = torch.load('accelerate_scripts/resnet152_pretrained.pth', map_location=device) a dictionary with the trained weights has been loaded into GPU memory
  • When we have done resnet152.load_state_dict(state_dict), those pretrained weights have been assigned to the model

That is, the model has been loaded twice into the GPU memory

You may be wondering why we first made

model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
torch.save(model.state_dict(), 'accelerate_scripts/resnet152_pretrained.pth')

To do later

resnet152 = models.resnet152().to(device)
state_dict = torch.load('accelerate_scripts/resnet152_pretrained.pth', map_location=device)
resnet152.load_state_dict(state_dict)

And why don’t we use directly

model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)

And we stop saving the state dict to load it later. Well, because PyTorch, underneath, does the same thing we've done. So, in order to see the whole process, we've done in several lines what PyTorch does in one

This way of working has worked well so far, as long as models were manageable in size for consumer GPUs. But since the arrival of LLMs, this approach no longer makes sense

For example, a 6B-parameter model would occupy 24 GB in memory, and since it is loaded twice with this way of working, a 48 GB GPU would be required.

So to fix this, the way to load a pre-trained PyTorch model is:

  • Create an empty model with init_empty_weights that will not use RAM memory* Then load the weights with load_checkpoint_and_dispatch, which will load a checkpoint into the empty model and distribute the weights for each layer across all available devices (GPU, CPU, RAM, and hard disk), thanks to setting device_map="auto"
	
< > Input
Python
import torch
import torchvision.models as models
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
with init_empty_weights():
resnet152 = models.resnet152()
resnet152 = load_checkpoint_and_dispatch(resnet152, checkpoint='accelerate_scripts/resnet152_pretrained.pth', device_map="auto")
device = "cuda" if torch.cuda.is_available() else "cpu"
input = torch.rand(1, 3, 224, 224).to(device) # Random image with batch size 1
output = resnet152(input)
output.shape
Copied
>_ Output
			
torch.Size([1, 1000])

How accelerate works under the hoodlink image 27

In this video, you can visually see how accelerate works under the hood

Initialization of an empty modellink image 28

Accelerate creates the skeleton of an empty model using init_empty_weights so that it uses as little memory as possible

For example, let's see how much RAM I currently have available on my computer

	
< > Input
Python
import psutil
def get_ram_info():
ram = dict(psutil.virtual_memory()._asdict())
print(f"Total RAM: {(ram['total']/1024/1024/1024):.2f} GB, Available RAM: {(ram['available']/1024/1024/1024):.2f} GB, Used RAM: {(ram['used']/1024/1024/1024):.2f} GB")
get_ram_info()
Copied
>_ Output
			
Total RAM: 31.24 GB, Available RAM: 22.62 GB, Used RAM: 7.82 GB

I have about 22 GB of RAM available

Now let’s try to create a 5000x1000x1000 parameter model, that is, a 5B-parameter model; if each parameter is in FP32, it requires 20 GB of RAM

	
< > Input
Python
import torch
from torch import nn
model = nn.Sequential(*[nn.Linear(5000, 1000) for _ in range(1000)])
Copied

If we take another look at the RAM

	
< > Input
Python
get_ram_info()
Copied
>_ Output
			
Total RAM: 31.24 GB, Available RAM: 3.77 GB, Used RAM: 26.70 GB

As we can see, we now only have 3 GB of RAM available

Now we are going to delete the model to free up RAM

	
< > Input
Python
del model
get_ram_info()
Copied
>_ Output
			
Total RAM: 31.24 GB, Available RAM: 22.44 GB, Used RAM: 8.03 GB

We have around 22 GB of RAM available again

Let's now use init_empty_weights from accelerate and then check the RAM.

	
< > Input
Python
from accelerate import init_empty_weights
with init_empty_weights():
model = nn.Sequential(*[nn.Linear(5000, 1000) for _ in range(1000)])
get_ram_info()
Copied
>_ Output
			
Total RAM: 31.24 GB, Available RAM: 22.32 GB, Used RAM: 8.16 GB

Before, we had exactly 22.44 GB free, and after creating the model with init_empty_weights we have 22.32 GB. The RAM savings are huge. Almost no RAM was used to create the model.

This is based on the meta device introduced in PyTorch 1.9, so it is important that to use accelerate we have a later version of PyTorch.

Loading the weightslink image 29

Once we have initialized the model, we need to load the weights, which we do using load_checkpoint_and_dispatch, which, as its name suggests, loads the weights and sends them to the device or devices that are necessary.

Continue reading

Last posts -->

Have you seen these projects?

Gymnasia

Gymnasia Gymnasia
React Native
Expo
TypeScript
FastAPI
Next.js
OpenAI
Anthropic

Mobile personal training app with AI assistant, exercise library, workout tracking, diet and body measurements

Horeca chatbot

Horeca chatbot Horeca chatbot
Python
LangChain
PostgreSQL
PGVector
React
Kubernetes
Docker
GitHub Actions

Chatbot conversational for cooks of hotels and restaurants. A cook, kitchen manager or room service of a hotel or restaurant can talk to the chatbot to get information about recipes and menus. But it also implements agents, with which it can edit or create new recipes or menus

View all projects -->
>_ Available for projects

Do you have an AI project?

Let's talk.

maximofn@gmail.com

Machine Learning and AI specialist. I develop solutions with generative AI, intelligent agents and custom models.

Do you want to watch any talk?

Last talks -->

Do you want to improve with these tips?

Last tips -->

Use this locally

Hugging Face spaces allow us to run models with very simple demos, but what if the demo breaks? Or if the user deletes it? That's why I've created docker containers with some interesting spaces, to be able to use them locally, whatever happens. In fact, if you click on any project view button, it may take you to a space that doesn't work.

Flow edit

Flow edit Flow edit

FLUX.1-RealismLora

FLUX.1-RealismLora FLUX.1-RealismLora
View all containers -->
>_ Available for projects

Do you have an AI project?

Let's talk.

maximofn@gmail.com

Machine Learning and AI specialist. I develop solutions with generative AI, intelligent agents and custom models.

Do you want to train your model with these datasets?

short-jokes-dataset

HuggingFace

Dataset with jokes in English

Use: Fine-tuning text generation models for humor

231K rows 2 columns 45 MB
View on HuggingFace →

opus100

HuggingFace

Dataset with translations from English to Spanish

Use: Training English-Spanish translation models

1M rows 2 columns 210 MB
View on HuggingFace →

netflix_titles

HuggingFace

Dataset with Netflix movies and series

Use: Netflix catalog analysis and recommendation systems

8.8K rows 12 columns 3.5 MB
View on HuggingFace →
View more datasets -->