SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model
ActiveVLA: Injecting Active Perception into Vision-Language-Action Models for Precise 3D Robotic Manipulation
GLaD: Geometric Latent Distillation for Vision-Language-Action Models
From Spatial to Actions: Grounding VLA in Spatial Foundation Priors
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
4D Visual Pre-training for Robot Learning
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for Spatiotemporally Coherent Robotic Manipulation
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Real-Time Execution of Action Chunking Flow Policies
π*0.6: a VLA That Learns From Experience