Blip2

Disclaimer: This post has been translated to English using a machine translation model. Please let me know if you find any mistakes.

Introduction

Blip2 is an artificial intelligence model that takes an image or video as input and can hold a conversation about it, answering questions or providing context about what the input shows with great accuracy 🤯

GitHub

Paper

Installation

To install this tool, it's best to create a new Anaconda environment.

!$ conda create -n blip2 python=3.9

Now we activate the environment

!$ conda activate blip2

We install all the necessary packages

!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter

Finally, we install Blip2, which is distributed as part of Salesforce's LAVIS library

!$ pip install salesforce-lavis
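If the installation went well, the import we'll use below should work. A quick check from the terminal:

!$ python -c "from lavis.models import load_model_and_preprocess"

If this exits without printing an error, salesforce-lavis is installed correctly.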

Usage

We load the necessary libraries

import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess

We load an example image

img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg/800px-12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
display(raw_image.resize((500, 500)))

<PIL.Image.Image image mode=RGB size=500x500>

We select the GPU if one is available

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

We choose a model. In my case, with a computer that has 32 GB of RAM and a 3060 GPU with 12 GB of VRAM, I can't use all of them, so I've added an ok comment next to the models I was able to use and the error I received for those I couldn't. If you have a computer with the same amount of RAM and VRAM, you'll know which ones you can use; if not, you'll have to try them yourself. A quick way to check your GPU's memory is shown after the loading code.

# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok
# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM
# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM
# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM
# name = "blip2"; model_type = "pretrain" # FAIL type error
# name = "blip2"; model_type = "coco" # ok
name = "blip2_t5"; model_type = "pretrain_flant5xl" # ok
# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM
# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAIL
model, vis_processors, _ = load_model_and_preprocess(
    name=name, model_type=model_type, is_eval=True, device=device
)
vis_processors.keys()

Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]

dict_keys(['train', 'eval'])
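Before committing to one of the larger checkpoints, it can help to query how much VRAM your GPU has and how much is already in use. A minimal sketch using PyTorch's built-in functions (device index 0 is assumed):

# Check total and currently allocated VRAM on the first GPU
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    allocated_gb = torch.cuda.memory_allocated(0) / 1024**3
    print(f"{props.name}: {total_gb:.1f} GB total, {allocated_gb:.1f} GB allocated")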

We prepare the image to feed it into the model

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
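The eval processor resizes and normalizes the image, and unsqueeze(0) adds a batch dimension, so the model receives a 4D tensor. A quick sanity check (the exact spatial size depends on the model's processor; 224×224 is what I'd expect for these checkpoints):

print(image.shape)  # e.g. torch.Size([1, 3, 224, 224]): batch, channels, height, width
print(image.dtype)  # torch.float32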

We analyze the image without asking anything

model.generate({"image": image})

['a black and white snake']
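generate also accepts decoding options. Following the LAVIS examples, you can request several diverse captions with nucleus sampling instead of the default beam search (parameter names as in LAVIS's generate; check them against your installed version):

# Generate three candidate captions with nucleus sampling
model.generate(
    {"image": image},
    use_nucleus_sampling=True,
    num_captions=3,
)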

We analyze the image by asking questions

prompt = None
def prepare_prompt(prompt, question):
    # First question: start a fresh prompt; otherwise append to the running conversation
    if prompt is None:
        prompt = question + " Answer:"
    else:
        prompt = prompt + " " + question + " Answer:"
    return prompt

def get_answer(prompt, question, model):
    # Build the prompt, query the model, and fold the answer back in as context
    prompt = prepare_prompt(prompt, question)
    answer = model.generate(
        {
            "image": image,
            "prompt": prompt
        }
    )
    answer = answer[0]
    prompt = prompt + " " + answer + "."
    return prompt, answer
question = "What's in the picture?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: What's in the picture?
Answer: a snake
	
question = "What kind of snake?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: What kind of snake?
Answer: cobra
	
question = "Is it poisonous?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: Is it poisonous?
Answer: yes
	
question = "If it bites me, can I die?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: If it bites me, can I die?
Answer: yes
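Since get_answer returns the updated prompt on every turn, the whole exchange can be driven by a simple loop. A minimal sketch reusing the helpers defined above (the question list is just an example):

# Chain several questions; the accumulated prompt acts as the conversation history
questions = [
    "What's in the picture?",
    "What kind of snake?",
    "Is it poisonous?",
]

prompt = None
for question in questions:
    prompt, answer = get_answer(prompt, question, model)
    print(f"Question: {question}")
    print(f"Answer: {answer}")

# prompt now holds the full dialogue, e.g.
# "What's in the picture? Answer: a snake. What kind of snake? Answer: cobra. ..."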
