Block-wise Adaptive Caching for Accelerating Diffusion Policy

ICLR 2026

Abstract

Diffusion Policy has revolutionized robotic control with its ability to model complex multimodal distributions, yet its slow inference speed remains a critical bottleneck for real-time deployment.

We introduce Block-wise Adaptive Caching (BAC), a training-free acceleration framework designed specifically for Transformer-based Diffusion Policies. Unlike generic acceleration methods, BAC leverages the temporal redundancy of action features in robotic tasks. By adaptively caching and reusing features at the block level, BAC achieves lossless acceleration, so your robot reacts faster and more smoothly with no retraining.

  • 🚀 Real-world Speedup
    Boosts inference frequency from 8 Hz → 45 Hz for smooth control.
  • 🎯 Lossless Performance
    Maintains high success rates via fine-grained adaptive caching.
  • 🔌 Plug-and-Play Design
    Just plug in one line of code without requiring any re-training.

Methodology

BAC achieves a finer-grained cache schedule by first applying the Adaptive Caching Scheduler (ACS) to compute optimal update timesteps for each block and then employing the Bubbling Union Algorithm (BUA) to truncate inter-block error propagation.
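The idea of per-block update schedules can be illustrated with a toy sketch. This is not the authors' implementation: the blocks are scalar functions, the schedules are hand-written rather than computed by ACS, and `union_upstream` is a hypothetical, deliberately conservative stand-in for the inter-block fix-up (the actual BUA may be more selective). All names here (`union_upstream`, `run_with_cache`, `per_block`) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

def union_upstream(schedules: List[Set[int]]) -> List[Set[int]]:
    """Hypothetical fix-up: whenever a downstream block is recomputed,
    also recompute every block before it, so a fresh block never reads
    stale upstream features. Update steps "bubble" toward block 0."""
    merged = [set(s) for s in schedules]
    for i in range(len(merged) - 2, -1, -1):
        merged[i] |= merged[i + 1]
    return merged

def run_with_cache(
    blocks: List[Callable[[float], float]],
    schedules: List[Set[int]],
    num_steps: int,
    x0: float,
) -> Tuple[float, int]:
    """Run a toy residual stack for num_steps denoising steps.
    Block i is recomputed only at timesteps in schedules[i]; at all
    other steps its cached output is reused."""
    cache: Dict[int, float] = {}
    recomputed = 0
    x = x0
    for t in range(num_steps):
        h = x
        for i, block in enumerate(blocks):
            if t in schedules[i] or i not in cache:
                cache[i] = block(h)  # fresh residual-branch output
                recomputed += 1
            h = h + cache[i]         # residual add (fresh or cached)
        x = h                        # toy stand-in for the sampler update
    return x, recomputed

# Two toy blocks with hand-picked (assumed) update timesteps per block.
blocks = [lambda h: 0.1 * h, lambda h: -0.05 * h]
per_block = [{0, 5}, {0, 2, 5}]
merged = union_upstream(per_block)

_, n_cached = run_with_cache(blocks, merged, num_steps=10, x0=1.0)
_, n_full = run_with_cache(blocks, [set(range(10))] * 2, num_steps=10, x0=1.0)
print(f"block evaluations: cached={n_cached}, full={n_full}")
```

The point of the sketch is only the control flow: each block carries its own set of update timesteps, and a post-processing pass adjusts those sets so that caching errors in one block are not silently inherited by the blocks after it.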

Interactive Demo

[Interactive figure: cache schedule plotted over Blocks and Timesteps, with views for ACS + BUA and the Bubbling Union Algorithm. Legend: Updated Features, Cached Features, Error Surge, Error Correction.]

Experimental Setup

Front view

Side view

Task Description

We evaluate our method on the Pick-and-Release task in a real-world setting. In this task, the robot must grasp a soft bag whose diameter is approximately 80% of the gripper's jaw width. Because the bag is prone to toppling during execution, the task tests the robot's real-time manipulation capabilities: it must maintain a stable operating posture and precisely time the gripper's opening and closing at high speed.

Hardware Setup

  • Robot Arm: Franka Research 3
  • Camera: UGREEN CM717
  • GPU: NVIDIA GeForce RTX 4090D

Qualitative Comparisons

Comparisons of different acceleration methods on the Pick-and-Release task. BAC achieves high inference frequency with low end-to-end latency.

Method           Frequency   Latency   Notes
DDPM 100 steps   7.8 Hz      180 s     Failure
DDPM 30 steps    19.4 Hz     54 s
DDIM 50 steps    15.6 Hz     25 s
Uniform S=20     28.1 Hz     52 s
BAC S=7          39.2 Hz     15 s      Ours
BAC S=5          45.1 Hz     13 s      Ours (Fastest)
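
The relative gains implied by the reported frequencies can be worked out with a few lines of arithmetic. The sketch below uses only the frequency values listed in this comparison; the names `freq_hz`, `speedup`, and `period_ms` are illustrative, and the control period computed here is simply 1/frequency, which is a different quantity from the latency figures reported alongside each method.

```python
# Inference frequencies (Hz) as reported in the comparison.
freq_hz = {
    "DDPM 100 steps": 7.8,
    "DDPM 30 steps": 19.4,
    "DDIM 50 steps": 15.6,
    "Uniform S=20": 28.1,
    "BAC S=7": 39.2,
    "BAC S=5": 45.1,
}

BASELINE = "DDPM 100 steps"

def speedup(method: str) -> float:
    """Frequency of `method` relative to the DDPM 100-step baseline."""
    return freq_hz[method] / freq_hz[BASELINE]

def period_ms(method: str) -> float:
    """Time between consecutive policy outputs, in milliseconds."""
    return 1000.0 / freq_hz[method]

for name in freq_hz:
    print(f"{name:15s} {speedup(name):5.2f}x  {period_ms(name):6.1f} ms per inference")
```

Read this way, the headline "8 Hz → 45 Hz" corresponds to a bit under a 6x increase in inference frequency over the 100-step DDPM baseline.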