Semantic Memory for Embodied Agent
ROS2-based embodied AI agent with semantic mapping and long-term memory capabilities.
This project centers on an embodied AI agent that builds and maintains a semantic map of its surroundings, complete with a "memory" capable of retaining contextual details over time. The system is architected around ROS2, with LangGraph providing the backbone for a stateful knowledge graph in which each node represents an element of the environment (such as a landmark, obstacle, or point of interest) and each edge encodes a relationship or adjacency between elements. A vector store complements this graph by storing text and embedding data, allowing the system to retrieve prior observations or instructions based on embedding similarity. The sketch below illustrates this memory layout.
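The following is a minimal sketch of the two memory structures just described: graph nodes and edges for environment elements, and a vector store queried by embedding similarity. Names such as `MapNode`, `MapEdge`, and `VectorStore` are illustrative, not the project's actual classes, and the cosine-similarity retrieval stands in for whatever vector backend the system uses.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MapNode:
    """One element of the environment (landmark, obstacle, point of interest)."""
    node_id: str
    label: str                     # e.g. "kitchen table"
    position: tuple                # (x, y) in the map frame
    attributes: dict = field(default_factory=dict)

@dataclass
class MapEdge:
    """Relationship or adjacency between two environment elements."""
    source: str                    # node_id of the first element
    target: str                    # node_id of the second element
    relation: str                  # e.g. "adjacent_to", "on_top_of"

class VectorStore:
    """Stores text snippets with embeddings; retrieves by cosine similarity."""
    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text: str, embedding: np.ndarray) -> None:
        self.texts.append(text)
        self.vectors.append(embedding / np.linalg.norm(embedding))

    def query(self, embedding: np.ndarray, k: int = 3) -> list:
        # Rank stored snippets by cosine similarity to the query embedding.
        q = embedding / np.linalg.norm(embedding)
        scores = [float(q @ v) for v in self.vectors]
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]
```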
On the perception side, GPT-4 Vision ingests scene images from the agent's camera feed, generating high-level semantic information—like object identification or spatial relationships—which is then fed into the knowledge graph. Text-based interaction and advanced reasoning come from LLaMA 70B, which is integrated into a ROS2 node to parse user commands, consult the knowledge graph, and produce navigation goals.
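A hedged sketch of how such a perception node could be wired up in ROS2 is shown below: it subscribes to camera frames and republishes the extracted semantics for the knowledge-graph updater to consume. The topic names and the `describe_scene()` stub (standing in for the GPT-4 Vision call) are assumptions for illustration, not the project's actual interfaces.

```python
import json
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String

def describe_scene(image_msg: Image) -> dict:
    """Placeholder for the GPT-4 Vision call; would return something like
    {"objects": ["table", "chair"], "relations": [["chair", "next_to", "table"]]}."""
    return {"objects": [], "relations": []}

class PerceptionNode(Node):
    """Subscribes to camera frames and republishes semantic scene descriptions."""
    def __init__(self):
        super().__init__('perception_node')
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
        self.pub = self.create_publisher(String, '/scene_semantics', 10)

    def on_image(self, msg: Image) -> None:
        scene = describe_scene(msg)                      # high-level semantics
        self.pub.publish(String(data=json.dumps(scene))) # consumed by graph updater

def main():
    rclpy.init()
    rclpy.spin(PerceptionNode())
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```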
These goals feed into the Nav2 stack, which controls a differential-drive robot in Gazebo. The agent is thus able to make informed decisions, referencing both past experiences (stored in the vector store) and current visual context (through GPT-4 Vision) to navigate toward a goal or interact with the environment. This cohesive framework provides the robot with a functional, long-term memory, enabling more nuanced task execution and streamlined user communication.
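As a rough illustration of the last hop in that pipeline, the sketch below hands a pose produced by the reasoning layer to Nav2 via the `nav2_simple_commander` helper. The frame, coordinates, and `send_goal` wrapper are assumptions for illustration; the actual project may use a Nav2 action client directly instead.

```python
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator

def send_goal(x: float, y: float) -> None:
    """Send one navigation goal (in the map frame) to Nav2 and wait for the result."""
    navigator = BasicNavigator()
    navigator.waitUntilNav2Active()          # block until the Nav2 stack is ready

    goal = PoseStamped()
    goal.header.frame_id = 'map'
    goal.header.stamp = navigator.get_clock().now().to_msg()
    goal.pose.position.x = x
    goal.pose.position.y = y
    goal.pose.orientation.w = 1.0            # identity orientation

    navigator.goToPose(goal)
    while not navigator.isTaskComplete():
        pass                                 # could inspect navigator.getFeedback() here
    print('Navigation result:', navigator.getResult())

if __name__ == '__main__':
    rclpy.init()
    send_goal(2.0, 1.5)                      # example target in the map frame
    rclpy.shutdown()
```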