Agents patterns
Are your agents falling short? Elevate your AI projects with advanced patterns: ReAct, planning, multi-agents, and more. Practical guide with code!
Naviground is a navigation system for manned and unmanned ground vehicles. It supports navigation in both structured and unstructured environments. I took part in developing the perception system, mainly the camera-based detection of the environment.
Although the navigation system had LIDAR and RADAR sensors, for several reasons we wanted a perception system that relied only on cameras.
To detect the environment, we used three types of neural networks:
Semantic segmentation networks
They assign a class to every pixel of the image, producing a segmentation mask.
Object detection networks
They detect the objects in the image. We used YOLO for this.
Depth estimation
A neural network estimates the depth of every pixel of the image, which gives us the distance to each object.
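As a rough illustration of how a depth map and a detection can be combined, the sketch below takes the median depth inside a bounding box. It assumes the depth network outputs metric depth and that boxes are given in pixel coordinates; the function name `object_distance` and the toy data are made up for this example and are not the project's code.

```python
from statistics import median

def object_distance(depth_map, box):
    """Return the median depth (in the depth map's units) inside a bounding box.

    depth_map: list of rows, each row a list of per-pixel depth values.
    box: (x_min, y_min, x_max, y_max) in pixels, with the max edges exclusive.
    """
    x_min, y_min, x_max, y_max = box
    values = [
        depth_map[y][x]
        for y in range(y_min, y_max)
        for x in range(x_min, x_max)
    ]
    if not values:
        raise ValueError("empty bounding box")
    # The median is less sensitive than the mean to background pixels
    # that fall inside the box but do not belong to the object.
    return median(values)

# Toy 4x4 depth map in metres: a near object (2 m) in front of a far background (9 m).
depth = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 2.0, 2.0, 9.0],
    [9.0, 2.0, 2.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]
print(object_distance(depth, (1, 1, 3, 3)))  # 2.0
```

If a segmentation mask is available for the object, restricting the pixels to that mask instead of the whole box would likely give a cleaner estimate.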
Our problem was that, because the vehicle had to work in both structured and unstructured environments, the pre-trained networks did not suit us, so we had to train the segmentation and object detection networks ourselves.
We had hours of video recorded during tests in environments like this, so we built a dataset from it.
We created an algorithm that used unsupervised clustering to group the images into several clusters, so that the images within each cluster were similar to each other. We then kept only a few images from each cluster, which gave us a dataset of heterogeneous images.
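The selection idea can be sketched as follows. The text does not say which clustering algorithm or image features were used, so this example assumes plain k-means over precomputed feature vectors (for instance, image embeddings); `kmeans`, `select_diverse` and the file names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(features, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per feature vector."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    assignments = [0] * len(features)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

def select_diverse(names, features, k, per_cluster=1, seed=0):
    """Keep at most `per_cluster` images from each cluster, in input order."""
    assignments = kmeans(features, k, seed=seed)
    kept, counts = [], {}
    for name, cluster in zip(names, assignments):
        if counts.get(cluster, 0) < per_cluster:
            kept.append(name)
            counts[cluster] = counts.get(cluster, 0) + 1
    return kept

# Five frames from two visually distinct scenes (toy 2-D features).
names = ["road_1.jpg", "road_2.jpg", "road_3.jpg", "forest_1.jpg", "forest_2.jpg"]
features = [[0.10, 0.10], [0.12, 0.09], [0.11, 0.10], [0.90, 0.90], [0.88, 0.92]]
print(select_diverse(names, features, k=2))
```

Consecutive video frames tend to land in the same cluster, so keeping a few per cluster removes most near-duplicates while still covering every kind of scene.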
Labeling objects for YOLO is tedious, but it is a relatively fast and easy process.
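Part of why box labeling is quick is that each object reduces to a single short line of text. A minimal sketch, assuming the common YOLO text format (`class x_center y_center width height`, normalized by the image size); the helper name `to_yolo_label` is made up for this example:

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # YOLO stores the box centre and size as fractions of the image size.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

print(to_yolo_label(0, (100, 200, 300, 400), 640, 480))
# 0 0.312500 0.625000 0.312500 0.416667
```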
However, labeling images for semantic segmentation, where every pixel has to be labeled, is a slow and tedious process. As none of the existing segmentation labeling tools convinced us, we built our own. It worked so well that it was reused in other projects, and there was even talk of commercializing it.
One of the problems we had was that all the training images had been taken during the day, in sunny weather, without rain, and so on. To make the networks more robust we needed more varied images. But that means someone has to go out at night, wait for rain to capture rainy images, wait for snow, which is even harder, and so on.
At that time there were already many good image generation networks, so we could generate images with new environmental conditions. The problem was that those images then had to be labeled, and for segmentation that takes a lot of time.
So I built a pipeline that used generative AI to modify the environmental conditions of the images we had already labeled. This gave us images in different environmental conditions without having to spend time labeling them.
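The key point is that each generated image can reuse the existing labels, as long as the generative model changes only the appearance and not the scene layout. The sketch below shows that bookkeeping only; `restyle` is a hypothetical hook around whatever image-to-image model is used, and the `Sample` structure and file names are assumptions for illustration, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Sample:
    image_path: str  # image on disk
    mask_path: str   # per-pixel segmentation labels for that image
    condition: str   # e.g. "day", "night", "rain"

def augment_conditions(
    samples: Iterable[Sample],
    conditions: List[str],
    restyle: Callable[[str, str], str],
) -> List[Sample]:
    """Create one new sample per (labeled image, target condition).

    `restyle(image_path, condition)` must return the path of the generated
    image and must keep the scene layout unchanged, so that the original
    mask still matches pixel for pixel.
    """
    augmented = []
    for sample in samples:
        for condition in conditions:
            new_image = restyle(sample.image_path, condition)
            # The mask is reused as-is: no new labeling is needed.
            augmented.append(Sample(new_image, sample.mask_path, condition))
    return augmented

def fake_restyle(image_path: str, condition: str) -> str:
    """Stand-in for the generative model: only builds an output file name."""
    stem, ext = image_path.rsplit(".", 1)
    return f"{stem}_{condition}.{ext}"

labeled = [Sample("frame_001.png", "frame_001_mask.png", "day")]
for s in augment_conditions(labeled, ["night", "rain", "snow"], fake_restyle):
    print(s)
```

In practice it would be worth checking a few generated images against their masks, since a model that moves or invents objects would silently corrupt the labels.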
Since this had to run in a vehicle, it could not use a powerful computer, so an embedded device, a Jetson Orin, was used. It was therefore important to optimize the neural networks to make inference as fast as possible.
I optimized them with TensorRT, making them run up to 40% faster in some cases.
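A speed-up like this can be measured with a small timing harness such as the one below. It is a generic sketch, not the benchmark used in the project: `infer` is any zero-argument callable that runs one inference and blocks until the result is ready, and the percentage is computed here as a reduction in latency, which is only one possible reading of "faster".

```python
import time

def mean_latency_ms(infer, runs=50, warmup=5):
    """Average wall-clock time of `infer()` in milliseconds, after warm-up runs."""
    # Warm-up runs absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) * 1000 / runs

def latency_reduction_percent(baseline_ms, optimized_ms):
    """How much shorter the optimized latency is, as a percentage of the baseline."""
    return 100 * (baseline_ms - optimized_ms) / baseline_ms

# Example with made-up numbers: 10 ms before optimization, 6 ms after.
print(latency_reduction_percent(10.0, 6.0))  # 40.0
```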