Product Insight

Not a ROS2 replacement. An intelligent upper-computer layer above ROS2.

VLAClaw keeps stable motion inside robot-side ROS2 nodes while adding an OpenClaw layer for observation, planning, skill validation, and execution monitoring.

Traditional upper computer

Manual button control
Single command at a time
Little task memory
No visual reasoning
Direct parameter tweaking
Hard to extend as an agent

VLAClaw intelligent upper computer

Language + vision + robot state
Multi-step skill planning
Sensor feedback loop
Skill-based execution
rosbridge integration
Reusable robot capability library

Platform Validation

Interface-first validation for reliable robot-agent deployment.

VLAClaw organizes the engineering path around interfaces, skill contracts, safety checks, and repeatable ROS2 robot workflows so each capability can be tested and extended systematically.

Interface validation

Goal: Prove that OpenClaw can connect to ROS2 robots through rosbridge.

Evidence: WebSocket endpoint, JSON pub/sub examples, sensor topics, and command templates.

Next: Validate against robot-side rosbridge_server and platform-specific topic names.
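
The JSON pub/sub examples and command templates mentioned above could look roughly like the sketch below. It only builds rosbridge v2 protocol frames (`subscribe`, `advertise`, `publish`) and serializes them; sending them over the WebSocket is left out. The `/cmd_vel` topic, the message type strings, and the helper names are assumptions for illustration and must be replaced with the platform-specific names found on the robot.

```python
import json


def subscribe_msg(topic, msg_type, throttle_ms=0):
    """Build a rosbridge 'subscribe' frame; throttle_rate is in milliseconds."""
    return {"op": "subscribe", "topic": topic, "type": msg_type,
            "throttle_rate": throttle_ms}


def advertise_msg(topic, msg_type):
    """Build a rosbridge 'advertise' frame, sent once before publishing."""
    return {"op": "advertise", "topic": topic, "type": msg_type}


def publish_msg(topic, msg):
    """Build a rosbridge 'publish' frame carrying a message as plain JSON."""
    return {"op": "publish", "topic": topic, "msg": msg}


def twist(linear_x=0.0, angular_z=0.0):
    """JSON layout of a geometry_msgs Twist message."""
    return {
        "linear": {"x": linear_x, "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": angular_z},
    }


# Assumed topic and type names; check them against the robot's topic list.
frames = [
    subscribe_msg("/scan", "sensor_msgs/msg/LaserScan", throttle_ms=200),
    advertise_msg("/cmd_vel", "geometry_msgs/msg/Twist"),
    publish_msg("/cmd_vel", twist(linear_x=0.1)),
]

# Each frame is sent as one text message on the WebSocket connection.
wire = [json.dumps(frame) for frame in frames]
for line in wire:
    print(line)
```

Keeping frame construction separate from the socket makes the templates easy to review and unit-test before any robot is connected.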

Skill abstraction

Goal: Turn scattered actions into an AI-callable capability layer.

Evidence: skills.yaml schema, action-group mapping, parameter limits, and execution feedback.

Next: Register 5-8 core skills first: stop, stand, walk, turn, sit_wave, status, camera, IMU.
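
As an illustration of what a skills.yaml schema with parameter limits might contain, the sketch below validates a dictionary shaped like the output of a YAML parser. The field names (`kind`, `target`, `params`, `min`, `max`, `default`), the `/cmd_vel` target, and the numeric limits are hypothetical; the page does not publish the real schema.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"topic", "service", "action_group"}


@dataclass(frozen=True)
class ParamSpec:
    minimum: float
    maximum: float
    default: float


@dataclass(frozen=True)
class SkillSpec:
    name: str
    kind: str      # how the skill reaches the robot
    target: str    # topic, service, or action-group file
    params: tuple  # tuple of (parameter name, ParamSpec)


def load_skills(entries):
    """Validate parsed skill entries and return {name: SkillSpec}."""
    specs = {}
    for name, entry in entries.items():
        kind = entry["kind"]
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"{name}: unknown kind {kind!r}")
        params = []
        for pname, raw in entry.get("params", {}).items():
            spec = ParamSpec(float(raw["min"]), float(raw["max"]),
                             float(raw["default"]))
            if not spec.minimum <= spec.default <= spec.maximum:
                raise ValueError(f"{name}.{pname}: default outside limits")
            params.append((pname, spec))
        specs[name] = SkillSpec(name, kind, entry["target"], tuple(params))
    return specs


# Hypothetical content, shaped like what a YAML parser would return.
parsed = {
    "walk": {
        "kind": "topic",
        "target": "/cmd_vel",
        "params": {"speed": {"min": 0.0, "max": 0.3, "default": 0.1}},
    },
    "sit_wave": {"kind": "action_group", "target": "sit_wave.d6a"},
}

skills = load_skills(parsed)
print(sorted(skills))
```

Rejecting a malformed entry at load time means a bad limit is caught before the planner can ever select the skill.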

Safety boundary

Goal: Keep model output at the skill layer rather than raw motor control.

Evidence: Skill Server checks speed, duration, robot posture, IMU stability, and emergency stop.

Next: Test refusal behavior and recovery flow on repeated unsafe commands.
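
A minimal sketch of the checks listed above, assuming the Skill Server receives a skill name, its parameters, and a snapshot of robot state. The thresholds, posture labels, and the rule that `stop` is always allowed are illustrative assumptions, not documented behavior.

```python
from dataclasses import dataclass

# Hypothetical limits; real values depend on the robot.
MAX_SPEED_M_S = 0.3
MAX_DURATION_S = 5.0
MAX_TILT_DEG = 15.0
MOTION_SKILLS = {"walk", "turn"}


@dataclass
class RobotState:
    posture: str         # assumed labels: "standing", "sitting", "fallen"
    roll_deg: float
    pitch_deg: float
    estop_engaged: bool


def check_request(skill, params, state):
    """Return (allowed, reasons). An empty reasons list means allowed."""
    if skill == "stop":
        return True, []  # stopping is never refused in this sketch
    reasons = []
    if state.estop_engaged:
        reasons.append("emergency stop engaged")
    speed = params.get("speed", 0.0)
    if speed > MAX_SPEED_M_S:
        reasons.append(f"speed {speed} exceeds limit {MAX_SPEED_M_S}")
    duration = params.get("duration", 0.0)
    if duration > MAX_DURATION_S:
        reasons.append(f"duration {duration} exceeds limit {MAX_DURATION_S}")
    if skill in MOTION_SKILLS and state.posture != "standing":
        reasons.append(f"posture is {state.posture}, not standing")
    if max(abs(state.roll_deg), abs(state.pitch_deg)) > MAX_TILT_DEG:
        reasons.append("IMU tilt outside stable range")
    return not reasons, reasons


steady = RobotState("standing", roll_deg=2.0, pitch_deg=-1.0, estop_engaged=False)
print(check_request("walk", {"speed": 0.1, "duration": 2.0}, steady))
print(check_request("walk", {"speed": 1.0}, steady))
```

Returning the reasons alongside the verdict gives the refusal tests something concrete to assert on and gives the planner text it can use when replanning.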

Demo workflow readiness

Goal: Package voice, vision, action-group, and developer-integration flows into repeatable robot demos.

Evidence: Voice greeting, visual interaction, safe action group, and developer integration workflows.

Next: Add robot logs, field footage, and deployment notes as each workflow is verified.

Solution

VLAClaw turns the upper computer into an embodied agent layer.

OpenClaw observes sensor topics, reasons with VLM/LLM models, selects validated skills, sends commands through rosbridge, and replans from execution feedback.

Observe
Understand
Plan
Select Skill
Execute
Monitor
Replan
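
The loop above can be sketched as a small control function. Every callable passed in (`observe`, `plan`, `validate`, `execute`) is a hypothetical stand-in for the real perception, planner, Skill Server, and rosbridge components; only the control flow is the point here.

```python
def run_task(goal, observe, plan, validate, execute, max_replans=2):
    """Observe, plan, validate each step, execute, and replan on failure.

    Returns (succeeded, log). The log is passed back to the planner so
    that a replan can take earlier refusals and failures into account.
    """
    log = []
    for _ in range(max_replans + 1):
        observation = observe()
        steps = plan(goal, observation, log)
        failed = False
        for step in steps:
            allowed, reason = validate(step, observation)
            if not allowed:
                log.append(("refused", step, reason))
                failed = True
                break
            succeeded = execute(step)
            log.append(("executed", step, succeeded))
            if not succeeded:
                failed = True
                break
        if not failed:
            return True, log
    return False, log


# Stub components: the first attempt at "walk" fails, the second succeeds.
walk_attempts = {"count": 0}


def fake_execute(step):
    if step == "walk":
        walk_attempts["count"] += 1
        return walk_attempts["count"] > 1
    return True


done, history = run_task(
    goal="greet the visitor",
    observe=lambda: {"posture": "standing"},
    plan=lambda goal, obs, log: ["stand", "walk"],
    validate=lambda step, obs: (True, ""),
    execute=fake_execute,
)
print(done, len(history))
```

Bounding the number of replans keeps a persistently failing task from looping forever on real hardware.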

Perception In

Subscribe to camera, IMU, lidar, odometry, and robot status topics as agent observations.

/usb_cam/image_raw
/imu_raw
/odom
/scan
/puppy_control/status

Intelligence Core

Use OpenClaw, VLM, LLM, memory, and safety rules to convert user intent into a skill plan.

ASR / wake word
VLM scene understanding
LLM task planning
Skill selection
Context memory

Skill Execution Out

Call bounded robot skills that are validated before ROS2 lower controllers execute motion.

walk_forward()
turn_left()
stop()
sit_wave()
return_home()

Integration Workflow

A practical path from an existing ROS2 robot to an OpenClaw-controlled demo.

This is the implementation story customers and developers need: VLAClaw works with existing robot controllers and maps robot capabilities into observations and validated skills.

1

Audit robot interfaces

List ROS2 topics, services, action groups, sensor streams, and safety commands already available on the robot.

Topic map + control checklist

2

Enable rosbridge

Run rosbridge_websocket on the robot and expose a stable WebSocket endpoint for upper-computer access.

ws://robot_ip:9090 connection
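
Before wiring up a WebSocket client, it can help to confirm that the endpoint is reachable at all. The sketch below uses only the standard library and checks that the TCP port accepts connections; it does not perform the WebSocket handshake or confirm that rosbridge is the process listening. The function name is hypothetical.

```python
import socket
from urllib.parse import urlparse


def rosbridge_reachable(url, timeout=2.0):
    """Return True if the host and port in a ws:// URL accept a TCP connection."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or not parsed.hostname:
        raise ValueError(f"not a WebSocket URL: {url!r}")
    if parsed.port is None:
        raise ValueError("give the port explicitly, for example :9090")
    try:
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `rosbridge_reachable("ws://192.168.1.50:9090")` with the robot's actual address (the address shown here is a placeholder) gives a quick yes or no before any agent code runs.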

3

Map observations

Normalize camera, IMU, lidar, odometry, and status topics into OpenClaw-readable observation channels.

Observation adapter
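
One possible shape for such an adapter is sketched below. It takes incoming rosbridge `publish` frames, maps each topic to a channel name, and keeps the latest message per channel with a receive time so stale data can be filtered out. The channel names, the class, and the staleness rule are assumptions; the topic names follow the list given earlier on this page and may differ per robot.

```python
import json
import time

# Assumed topic-to-channel mapping; adjust to the robot's actual topics.
CHANNELS = {
    "/usb_cam/image_raw": "camera",
    "/imu_raw": "imu",
    "/odom": "odometry",
    "/scan": "scan",
    "/puppy_control/status": "status",
}


class ObservationAdapter:
    def __init__(self, channels, clock=time.monotonic):
        self._channels = dict(channels)
        self._clock = clock
        self.latest = {}

    def handle_frame(self, raw):
        """Store one incoming frame; return its channel name or None."""
        frame = json.loads(raw)
        if frame.get("op") != "publish":
            return None
        channel = self._channels.get(frame.get("topic"))
        if channel is None:
            return None
        self.latest[channel] = {"data": frame.get("msg", {}),
                                "received_at": self._clock()}
        return channel

    def snapshot(self, max_age_s):
        """Return {channel: data} for messages no older than max_age_s."""
        now = self._clock()
        return {name: obs["data"] for name, obs in self.latest.items()
                if now - obs["received_at"] <= max_age_s}


adapter = ObservationAdapter(CHANNELS)
adapter.handle_frame(json.dumps(
    {"op": "publish", "topic": "/odom", "msg": {"pose": {}}}))
print(adapter.snapshot(max_age_s=1.0))
```

Injecting the clock keeps the staleness logic testable without sleeping.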

4

Register skills

Convert motion commands, action groups, services, and interaction behaviors into skills with parameters and limits.

skills.yaml + registry
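
A registry along these lines could be sketched with a decorator that records each skill together with its parameter limits and enforces them at call time. The class, the limit values, and the returned command dictionaries are hypothetical; a real Skill Server would forward the result to rosbridge rather than return it.

```python
class SkillRegistry:
    def __init__(self):
        self._skills = {}

    def skill(self, name, **limits):
        """Decorator: register fn under name with {param: (low, high)} limits."""
        def decorator(fn):
            if name in self._skills:
                raise ValueError(f"skill {name!r} already registered")
            self._skills[name] = (fn, limits)
            return fn
        return decorator

    def names(self):
        return sorted(self._skills)

    def call(self, name, **params):
        """Check parameters against the registered limits, then run the skill."""
        if name not in self._skills:
            raise KeyError(f"unknown skill {name!r}")
        fn, limits = self._skills[name]
        for key, value in params.items():
            if key not in limits:
                raise ValueError(f"{name}: unexpected parameter {key!r}")
            low, high = limits[key]
            if not low <= value <= high:
                raise ValueError(f"{name}: {key}={value} outside [{low}, {high}]")
        return fn(**params)


registry = SkillRegistry()


@registry.skill("walk", speed=(0.0, 0.3), duration=(0.0, 5.0))
def walk(speed=0.1, duration=1.0):
    # Placeholder: describes the command instead of sending it.
    return {"topic": "/cmd_vel", "linear_x": speed, "duration": duration}


@registry.skill("sit_wave")
def sit_wave():
    return {"action_group": "sit_wave.d6a"}


print(registry.names())
print(registry.call("walk", speed=0.2))
```

Because every call goes through `registry.call`, a skill cannot be reached with parameters outside the limits it was registered with.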

5

Run bounded demos

Start with single-command and short workflow demos before moving to multi-step planning.

Voice greeting / safe patrol demo

6

Measure and harden

Log latency, success rate, refusal behavior, recovery path, and feedback quality before deployment.

Mission logs + safety report
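
To make the measurements above concrete, the sketch below summarizes a list of mission-log records into success rate, refusal rate, and latency figures. The record fields (`outcome`, `latency_ms`), the outcome labels, and the sample values are assumptions made for illustration; they are not measurements from a real robot.

```python
import math
from collections import Counter


def summarize(records):
    """Summarize mission-log records into rates and latency statistics."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    latencies = sorted(r["latency_ms"] for r in records)
    # Nearest-rank 95th percentile.
    p95 = latencies[math.ceil(0.95 * n) - 1]
    outcomes = Counter(r["outcome"] for r in records)
    return {
        "runs": n,
        "success_rate": outcomes["success"] / n,
        "refusal_rate": outcomes["refused"] / n,
        "mean_latency_ms": sum(latencies) / n,
        "p95_latency_ms": p95,
    }


# Made-up sample records, for illustration only.
sample = [
    {"outcome": "success", "latency_ms": 100},
    {"outcome": "success", "latency_ms": 200},
    {"outcome": "refused", "latency_ms": 50},
    {"outcome": "failed", "latency_ms": 400},
]
report = summarize(sample)
print(report)
```

Refusals are counted separately from failures here so that a correctly rejected unsafe command does not look like a broken skill.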

Compatibility

Designed for ROS2 robots first, then extended across embodiments.

The platform is strongest where the robot already exposes ROS2 topics or can bridge its controller into a small set of commands, sensors, and status messages.

Robot bodies

Quadruped robot dog

Primary MVP

Locomotion, action groups, camera, IMU, greeting demos.

Robotic arm / gripper

Planned extension

Skill schema supports grasp, release, pose, and vision-guided actions.

Interactive display robot

Ready to model

Expression display and speech skills can be integrated as non-motion skills.

Control interfaces

ROS2 topic publish / subscribe

Core path

Sensor observation and motion command surface.

ROS2 service call

Supported pattern

Useful for buzzer, mode switching, action triggering, and status queries.

Action group files

Skill source

.d6a or platform-specific motion files become validated semantic skills.

Compute placement

Raspberry Pi 5 gateway

Edge target

Handles connection, lightweight preprocessing, skill dispatch, and local services.

Laptop / workstation host

Development target

Preferred for coding, debugging, OpenClaw iteration, and visual inspection.

Cloud model fallback

Optional

Used for heavier VLM/LLM reasoning, long-context planning, and report generation.

System Architecture

OpenClaw x rosbridge x ROS2, separated for safety and deployability.

The architecture separates intelligence, communication, and real-time execution so developers can build robot agents without installing ROS2 on every host.

AI selects validated skills. Real-time motion control stays inside robot-side ROS2 nodes.

Human Interaction

Voice command, text instruction, web dashboard, and developer API.

Voice / ASR
Text tasks
Web console
Developer API

OpenClaw / VLAClaw Upper Computer

Agent runtime for multimodal understanding, planning, memory, and safety validation.

VLM / LLM planner
Skill registry
Safety validator
Execution monitor

ROS2 Bridge Layer

rosbridge_websocket exposes ROS2 pub/sub and services over WebSocket JSON.

ws://robot_ip:9090
Topic publish / subscribe
Service call
Cross-platform host
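
For the service-call path, a rosbridge client sends a `call_service` frame and later receives a `service_response` frame carrying the same `id`. The sketch below tracks pending calls so responses can be matched to requests. The service name `/example/set_buzzer` and its arguments are hypothetical placeholders, not services documented for this robot.

```python
import itertools
import json


class ServiceCalls:
    """Build call_service frames and match service_response frames by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def request(self, service, args=None):
        """Return the JSON text to send for one service call."""
        call_id = f"call:{next(self._ids)}"
        self._pending[call_id] = service
        return json.dumps({"op": "call_service", "id": call_id,
                           "service": service, "args": args or {}})

    def resolve(self, raw):
        """Match an incoming frame to a pending call; return None if unrelated."""
        frame = json.loads(raw)
        if frame.get("op") != "service_response":
            return None
        service = self._pending.pop(frame.get("id"), None)
        if service is None:
            return None
        return {"service": service,
                "ok": bool(frame.get("result", False)),
                "values": frame.get("values")}


calls = ServiceCalls()
outgoing = calls.request("/example/set_buzzer", {"data": True})  # hypothetical service
print(outgoing)
```

Matching on `id` matters once several calls are in flight, because responses can arrive in a different order than the requests were sent.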

Robot Lower Computer

Robot-side ROS2 nodes handle sensors, action groups, motors, servos, displays, and arms.

puppy_control
ros_robot_controller
usb_cam
IMU / odom / motors

Observation In

Camera topic -> rosbridge -> OpenClaw perception
IMU / odom / lidar -> rosbridge -> state estimation

Command Out

OpenClaw skill decision -> Skill Server
Validated skill -> rosbridge -> ROS2 control -> robot execution

Safety Philosophy

Skill-based control, not unsafe motor-level generation.

VLAClaw treats action groups, ROS2 commands, services, and navigation behaviors as semantic robot skills. The model chooses skills and parameters, not raw actuator values.

Direct AI-to-Motor Control

Outputs joint angles / PWM / torque
Hard to verify on real hardware
Unclear failure recovery
High risk for quadruped balance
Difficult to reuse action groups

VLAClaw Skill-Based Control

Outputs validated skill calls
Skill Server checks parameters
ROS2 controller executes stable motion
Sensor feedback supports recovery
Action groups become reusable capabilities
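
One way to enforce the rule that the model chooses skills and parameters rather than raw actuator values is to parse its output strictly and reject anything else. The sketch below assumes the model replies with a JSON object of the form `{"skill": ..., "params": {...}}`; that output format and the forbidden-key list are assumptions, while the skill names come from the skill list shown earlier on this page.

```python
import json

ALLOWED_SKILLS = {"walk_forward", "turn_left", "stop", "sit_wave", "return_home"}
# Hypothetical names for raw actuator fields that must never pass through.
FORBIDDEN_KEYS = {"joint_angles", "pwm", "torque"}


def parse_skill_call(model_output):
    """Return (skill, params) or raise ValueError if the output is not a skill call."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    extra = set(data) - {"skill", "params"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    skill = data.get("skill")
    if skill not in ALLOWED_SKILLS:
        raise ValueError(f"unknown skill: {skill!r}")
    params = data.get("params", {})
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    if FORBIDDEN_KEYS & set(params):
        raise ValueError("raw actuator values are not accepted")
    for key, value in params.items():
        # Only plain numbers are accepted; bool is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"parameter {key!r} must be a number")
    return skill, params


print(parse_skill_call('{"skill": "walk_forward", "params": {"speed": 0.1}}'))
```

An allow-list of skills fails closed: a newly invented command from the model is refused until someone registers it.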

Product Modules

A product matrix for robot-side embodied intelligence.

VLAClaw is a platform stack rather than a single remote-control page. Each module has a clear role in the upper-computer agent layer.

OpenClaw Runtime

Agent loop, tool calling, task planning, context memory, and cloud fallback.

VLM / LLM
Memory
Planner

ROSBridge Adapter

WebSocket client for ROS2 topic pub/sub, service calls, and JSON serialization.

rosbridge
9090
JSON

Perception Hub

Camera, IMU, lidar, odometry, and status streams normalized as observations.

Camera
IMU
Odom

Skill Server

Skill registry, parameter validation, execution dispatch, and feedback tracking.

Registry
Safety
Status

ActionGroup Manager

Converts authored action groups such as sit_wave.d6a into callable robot skills.

.d6a
Actions
Reuse
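
A first step toward turning action-group files into callable skills could be a discovery pass over a directory, as sketched below. It derives a skill name from each file name and skips names that would not make safe identifiers. The naming rule and function name are assumptions, and the sketch deliberately does not read the `.d6a` file contents.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming rule: lowercase letters, digits, and underscores.
SKILL_NAME = re.compile(r"[a-z][a-z0-9_]*")


def discover_action_groups(directory, suffix=".d6a"):
    """Map skill names to action-group files found directly in directory."""
    skills = {}
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        if SKILL_NAME.fullmatch(path.stem):
            skills[path.stem] = path
    return skills


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("sit_wave.d6a", "Bad Name.d6a", "notes.txt"):
        (Path(tmp) / filename).touch()
    found = sorted(discover_action_groups(tmp))

print(found)
```

Discovered names would still need to be registered with limits and safety checks before the planner is allowed to call them.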

Safety Guard

Speed limits, posture checks, emergency stop, and repeated-command filtering.

Limits
E-stop
Recovery
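
Repeated-command filtering could be implemented as a sliding-window counter per command, as in the sketch below. The window length, the repeat limit, and the choice to let `stop` bypass the filter are illustrative assumptions.

```python
import time


class RepeatFilter:
    """Refuse a command repeated too many times within a time window."""

    def __init__(self, max_repeats=3, window_s=10.0, clock=time.monotonic):
        self._max_repeats = max_repeats
        self._window_s = window_s
        self._clock = clock
        self._history = {}

    def allow(self, skill, params):
        if skill == "stop":
            return True  # stopping is never filtered in this sketch
        key = (skill, tuple(sorted(params.items())))
        now = self._clock()
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self._history.get(key, [])
                  if now - t < self._window_s]
        allowed = len(recent) < self._max_repeats
        if allowed:
            recent.append(now)
        self._history[key] = recent
        return allowed


guard = RepeatFilter(max_repeats=2, window_s=10.0)
print([guard.allow("sit_wave", {}) for _ in range(3)])
```

Refused attempts are not recorded, so a command becomes available again once the earlier accepted calls age out of the window.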

Roadmap

Reliability-first roadmap from voice MVP to multi-robot orchestration.

The roadmap is intentionally staged: build safe single-robot skills first, then add perception, recovery, navigation, and multi-embodiment orchestration.

1
Phase 1 (Current)

Voice Command MVP

5-8 core skills
voice command
rosbridge connection
manual safety stop

2
Phase 2 (Next)

Skill Registry and Developer API

skills.yaml
Skill Server
Python / JS SDK
developer docs

3
Phase 3 (Planned)

Multimodal Perception

camera topic
VLM image understanding
IMU monitoring
person / obstacle detection

4
Phase 4 (Planned)

Workflow Automation

multi-step tasks
failure retry
mission logging
state-aware replanning

5
Phase 5 (Future)

Multi-Robot and Multi-Embodiment

robot dog + arm + display
shared skill registry
cloud-edge orchestration