SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation

Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model

ActiveVLA: Injecting Active Perception into Vision-Language-Action Models for Precise 3D Robotic Manipulation

GLaD: Geometric Latent Distillation for Vision-Language-Action Models

From Spatial to Actions: Grounding VLA in Spatial Foundation Priors

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

4D Visual Pre-training for Robot Learning

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for Spatiotemporally Coherent Robotic Manipulation

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Real-Time Execution of Action Chunking Flow Policies

π*0.6: a VLA That Learns From Experience