RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models

Yuxuan Chen1, Yixin Han1, Yize Huang1, Xiao Li1
1Shanghai Jiao Tong University

Abstract

Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and strong potential in complex robotic manipulation. However, their large parameter counts and high inference latency hinder real-world deployment, especially on resource-constrained platforms. To address this, we conduct a systematic empirical study of model compression for VLAs. Building on the resulting insights, we present RLRC, a three-stage compression and recovery pipeline consisting of structured pruning, performance recovery via supervised fine-tuning (SFT) and reinforcement learning (RL), and subsequent quantization. The RL stage incorporates a critic warm-up strategy and behavior cloning (BC) loss regularization to stabilize training and preserve policy behavior. RLRC achieves up to an 8× memory reduction and a 2.3× inference speedup while maintaining the original task success rate. Extensive experiments across multiple VLA backbones show that RLRC consistently outperforms existing compression baselines, highlighting its effectiveness for on-device deployment.

Empirical Study: Model Compression Applied to VLA

  • Quantization has minimal impact on performance, while significantly reducing memory requirements and slightly improving inference speed.
  • Unstructured pruning degrades performance less than structured pruning, whereas structured pruning offers greater acceleration benefits.
  • Combining quantization with pruning compresses the model substantially further than either technique alone.
  • The speedup gains from quantization diminish as the sparsity ratio increases.
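The memory effect of combining the two techniques can be illustrated with a back-of-the-envelope calculation. The sketch below is only an approximation: the function name `compression_ratio`, the 16-bit baseline, the 4-bit target, and the 50% keep fraction are illustrative assumptions, not settings reported here. It also ignores quantization metadata (scales, zero points) and any components that are left unpruned.

```python
def compression_ratio(keep_fraction: float, orig_bits: int = 16, quant_bits: int = 4) -> float:
    """Approximate weight-memory reduction factor after pruning, then quantizing.

    keep_fraction: fraction of weights remaining after pruning (1.0 = no pruning).
    orig_bits / quant_bits: bits per weight before and after quantization (assumed values).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    return orig_bits / (keep_fraction * quant_bits)


print(compression_ratio(1.0))  # quantization only -> 4.0
print(compression_ratio(0.5))  # 50% of weights kept, then quantized -> 8.0
```

Under these assumed numbers the two factors simply multiply, which is why stacking pruning on top of quantization shrinks the model further than either alone.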

Method

We propose RLRC, a three-stage method: (1) we apply structured pruning to the VLA model, specifically targeting the LLM component, to remove redundant structures in a hardware-friendly manner; (2) we employ a performance recovery stage that combines SFT with RL to restore the model's effectiveness on downstream tasks; (3) we introduce optional quantization to further reduce the memory footprint, enabling efficient deployment on resource-constrained robotic platforms.
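To make the first stage more concrete, here is a minimal sketch of structured pruning on a single two-layer MLP block. It removes whole hidden units, so the result is a smaller dense block rather than a sparse one. The importance score (product of incoming and outgoing weight norms), the helper names, and the toy dimensions are assumptions chosen for illustration. The pruning criterion actually used by RLRC is not specified in this text.

```python
import math
import random


def l2(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def prune_hidden_units(w_in, w_out, keep_fraction):
    """Structured pruning of a two-layer MLP block.

    w_in:  list of `hidden` rows, each of length d_in  (input -> hidden)
    w_out: list of d_out rows, each of length `hidden` (hidden -> output)
    Returns the pruned matrices and the indices of the kept hidden units.
    """
    hidden = len(w_in)
    n_keep = max(1, round(hidden * keep_fraction))
    # Assumed importance score: norm of incoming row times norm of outgoing column.
    scores = [l2(w_in[j]) * l2([row[j] for row in w_out]) for j in range(hidden)]
    ranked = sorted(range(hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    pruned_in = [w_in[j] for j in keep]                     # drop whole rows
    pruned_out = [[row[j] for j in keep] for row in w_out]  # drop matching columns
    return pruned_in, pruned_out, keep


random.seed(0)
w_in = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w_out = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
p_in, p_out, kept = prune_hidden_units(w_in, w_out, keep_fraction=0.5)
print(len(p_in), len(p_out[0]))  # 4 4
```

Because entire rows and columns are removed, the pruned block runs on standard dense kernels, which is the sense in which structured pruning is hardware-friendly.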

Method Overview

Experiments

Main Results

Main Results. Comparison of task success rates and efficiency metrics.

Training curves for SFT and RL fine-tuning (RLFT).

Ablation Studies

Post-SFT RLFT vs. Scratch RLFT.

Ablation studies on LIBERO.

RLRC Rollout

Rollout examples from real-world experiments. Video frames are sampled at equal time intervals, with OpenVLA shown on the top and OpenVLA-RLRC on the bottom.

Real-world Results

  • RLRC demonstrates competitive performance compared to other methods.
  • Training RLRC incurs extra cost, but its benefits exceed those of training-free acceleration methods and inherently small VLAs.
  • RLRC generalizes across diverse architectures, including autoregressive and diffusion-based VLAs, but has structural limitations.
  • Each RLRC component plays a complementary role: SFT provides strong initialization for effective RL optimization, while critic warm-up and BC loss regularization stabilize training and improve performance.
  • RLRC demonstrates strong performance and effective compression in real-world manipulation tasks.
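The interaction between critic warm-up and BC loss regularization can be sketched as a training objective. The following is one plausible reading, not the authors' implementation: during an assumed warm-up window only the critic (value) loss is optimized, and afterwards a BC term is added to the policy and value losses. The function names, coefficients, and warm-up length are all hypothetical.

```python
import math


def bc_loss(demo_log_probs):
    """Negative mean log-likelihood of demonstration actions under the current policy."""
    return -sum(demo_log_probs) / len(demo_log_probs)


def rl_recovery_loss(policy_loss, value_loss, demo_log_probs, step,
                     warmup_steps=100, bc_coef=0.1, value_coef=0.5):
    """Scalar objective for one update (all hyperparameters are assumed values).

    step < warmup_steps: critic warm-up, only the value loss contributes,
                         so the policy receives no update signal.
    otherwise:           policy loss + value loss + BC regularizer.
    """
    if step < warmup_steps:
        return value_coef * value_loss
    return policy_loss + value_coef * value_loss + bc_coef * bc_loss(demo_log_probs)


# Toy numbers: the policy assigns probabilities 0.5 and 0.25 to two demonstrated actions.
demo_log_probs = [math.log(0.5), math.log(0.25)]
print(rl_recovery_loss(0.3, 2.0, demo_log_probs, step=0))    # warm-up phase -> 1.0
print(rl_recovery_loss(0.3, 2.0, demo_log_probs, step=100))  # full objective
```

The BC term pulls the policy toward the demonstrated actions while the RL term optimizes task reward, which matches the stated goal of stabilizing training and preserving policy behavior.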

Citation

@article{chen2025rlrc,
    title={RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models},
    author={Yuxuan Chen and Xiao Li},
    journal={arXiv preprint arXiv:2506.17639},
    year={2025}
}