Vision-Language-Action Deployment

GitHub
OpenVLA · MuJoCo · Robosuite · Computer Vision · Robotics · Foundation Models

TLDR

Integrated the OpenVLA model with MuJoCo and Robosuite. The robot interprets text instructions and camera images and outputs actions directly in the simulator's action space.

Detailed

Tech Stack:

OpenVLA, MuJoCo, Robosuite, Custom Position Control API

Goal:

Deploy a vision-language-action (VLA) model for robot manipulation tasks in simulation.

What I did:

  • Integrated OpenVLA with the MuJoCo and Robosuite simulators
  • Built a custom position-control API to decode OpenVLA outputs into simulator commands (sketched after this list)
  • Adjusted end-effector coordinates, gripper strength, and orientation based on the model's predicted actions
  • Handled manipulation tasks such as picking and placing objects

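As an illustration of the decoding step, here is a minimal sketch of a position-control mapping, assuming OpenVLA's usual 7-dimensional output (end-effector position delta, orientation delta, gripper command) and Robosuite's OSC_POSE controller. The function name, scale factors, and gripper convention below are illustrative and would need tuning for a specific setup.

```python
import numpy as np

def decode_openvla_action(vla_action, pos_scale=1.0, rot_scale=1.0):
    """Map a 7-D OpenVLA action to a Robosuite OSC_POSE command.

    Assumes vla_action = [dx, dy, dz, droll, dpitch, dyaw, gripper] with
    the gripper value in [0, 1] (1 = open). Robosuite's OSC_POSE controller
    takes the same 6-D end-effector pose delta plus a gripper command in
    [-1, 1], where +1 closes and -1 opens the gripper.
    """
    vla_action = np.asarray(vla_action, dtype=np.float64)
    delta_pos = pos_scale * vla_action[:3]      # end-effector position delta
    delta_rot = rot_scale * vla_action[3:6]     # end-effector orientation delta
    grip_open = float(vla_action[6])            # 1.0 = open, 0.0 = closed
    gripper = 1.0 - 2.0 * grip_open             # remap [0, 1] -> [+1, -1]
    return np.concatenate([delta_pos, delta_rot, [gripper]])
```
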
What was achieved:

The system outputs actions directly in the environment's action space without a separate low-level control API, and handles a range of manipulation scenarios conditioned on language instructions and camera images.
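
For context, here is a sketch of the end-to-end rollout loop under the same assumptions. The model loading and predict_action call follow the usage shown in the OpenVLA README (it requires trust_remote_code and may differ across versions); the task name "Lift", robot "Panda", camera "agentview", instruction text, and episode length are placeholders, and the loop reuses the hypothetical decode_openvla_action helper sketched above.

```python
import robosuite as suite
import torch
from PIL import Image
from robosuite.controllers import load_controller_config  # robosuite <= 1.4 API
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load OpenVLA as in the OpenVLA README (custom code on the Hugging Face Hub).
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")

# Robosuite environment with an operational-space pose controller, so the
# model's end-effector deltas map directly onto the action space.
env = suite.make(
    "Lift",                                   # placeholder task
    robots="Panda",                           # placeholder robot
    controller_configs=load_controller_config(default_controller="OSC_POSE"),
    has_renderer=False,
    use_camera_obs=True,
    camera_names="agentview",
)

instruction = "pick up the cube"              # placeholder language instruction
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

obs = env.reset()
for _ in range(200):                          # placeholder episode length
    # Robosuite renders camera images vertically flipped, so flip before use.
    image = Image.fromarray(obs["agentview_image"][::-1].copy())
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    # predict_action returns an unnormalized 7-D action; unnorm_key must match
    # the dataset statistics the checkpoint (or fine-tune) was trained with.
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
    obs, reward, done, info = env.step(decode_openvla_action(action))
    if done:
        break
```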