Language + Pixels to Action Space
GitHub · LLaMA Vision · LangGraph · Redis · Streamlit · Robotics · Natural Language Processing
TLDR
A system that converts natural language commands and camera input into robot actions, using an LLM for planning and a vision model for perception.
Detailed
Tech Stack:
LLaMA 3.2 Vision, LangGraph, Redis, Streamlit, Custom API
Goal:
Translate language commands and visual input into robot actions.
What I did:
- Used an LLM to parse natural language instructions into multi-step plans
- Integrated LLaMA 3.2 Vision for real-time visual understanding (objects, spatial relationships, obstacles)
- Built a custom API to convert plans into movement commands for a differential drive robot (plan-to-command sketch after this list)
- Used LangGraph and Redis for short-term and long-term memory (memory sketch below)
- Created a Streamlit interface for user interaction (UI sketch below)
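
The planner's output format and the movement API are project-specific; the sketch below assumes a simple JSON plan step format (the `action`, `distance`, and `angle_rad` fields are hypothetical) and shows how such steps could map to wheel speeds for a differential drive base.

```python
# Sketch only: assumes the LLM planner emits steps like
# {"action": "move_forward", "distance": 0.5} or {"action": "turn", "angle_rad": 1.57}.
# Field names and wheel geometry are illustrative, not the project's actual API.
from dataclasses import dataclass

WHEEL_BASE_M = 0.20  # assumed distance between the two drive wheels


@dataclass
class WheelCommand:
    left_mps: float    # left wheel linear speed (m/s)
    right_mps: float   # right wheel linear speed (m/s)
    duration_s: float  # how long to hold these speeds


def step_to_command(step: dict) -> WheelCommand:
    """Map one plan step to wheel speeds for a differential drive robot."""
    if step["action"] == "move_forward":
        v = step.get("speed", 0.2)  # m/s
        return WheelCommand(v, v, step["distance"] / v)
    if step["action"] == "turn":
        w = step.get("angular_speed", 0.5)               # rad/s, rotate in place
        v = w * WHEEL_BASE_M / 2                         # wheel speed for that rotation
        sign = 1.0 if step["angle_rad"] >= 0 else -1.0   # +angle = counter-clockwise
        return WheelCommand(-sign * v, sign * v, abs(step["angle_rad"]) / w)
    raise ValueError(f"unknown action: {step['action']}")


# Example: a two-step plan parsed from "turn left, then go to the table"
plan = [{"action": "turn", "angle_rad": 1.57}, {"action": "move_forward", "distance": 0.5}]
commands = [step_to_command(s) for s in plan]
```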
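
How memory is split between LangGraph state and Redis is likewise project-specific; this is a minimal redis-py sketch of one plausible layout, with recent states as short-term memory and persistent facts as long-term memory (key names are assumptions).

```python
# Minimal sketch (assumed schema): Redis-backed short-term memory (recent states)
# and long-term memory (persistent facts). Key names are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def remember_state(state: dict, max_recent: int = 20) -> None:
    """Push the latest perception/plan state onto a capped recent-history list."""
    r.lpush("robot:recent_states", json.dumps(state))
    r.ltrim("robot:recent_states", 0, max_recent - 1)


def recall_recent(n: int = 5) -> list[dict]:
    """Fetch the n most recent states to include as context in the next plan."""
    return [json.loads(s) for s in r.lrange("robot:recent_states", 0, n - 1)]


def store_fact(key: str, value: str) -> None:
    """Persist a long-term fact, e.g. 'charging_dock_location' -> 'kitchen corner'."""
    r.hset("robot:facts", key, value)


# Example: record a state after executing a step, then retrieve context
remember_state({"command": "go to the table", "detected": ["table", "chair"]})
context = recall_recent()
```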
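
The interface itself isn't described beyond "Streamlit", so the sketch below only illustrates the shape of such a UI: a command box plus a camera frame, with `plan_and_execute` as a hypothetical stand-in for the planning and control pipeline.

```python
# Minimal Streamlit sketch: a text box for the language command and a slot for
# the latest camera frame. `plan_and_execute` is a hypothetical placeholder.
import streamlit as st


def plan_and_execute(command: str) -> list[str]:
    """Placeholder: call the LLM planner, run the steps, return a log."""
    return [f"(pretend) executed plan for: {command}"]


st.title("Language + Pixels to Action Space")

camera_frame = st.camera_input("Camera view")  # or an uploaded/streamed frame
command = st.text_input("What should the robot do?")

if st.button("Run") and command:
    if camera_frame is not None:
        st.image(camera_frame, caption="Frame sent to the vision model")
    for line in plan_and_execute(command):
        st.write(line)
```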
What was achieved:
The robot navigates and interacts with its environment based on language prompts and visual perception. The system retrieves past states from memory and incorporates that context into new plans.