daca-grpo-denoising-credit-assignment


Denoising-Aware Credit Assignment for GRPO in Diffusion Language Models. Introduces Denoising Progress Scores and Stratified Masking Likelihood to improve GRPO-style training for diffusion LLMs, achieving gains up to 5.6pp on math reasoning, 7.4pp on code generation, and 36.3pp on constraint satisfaction.

By hiyenwong. Updated 6/4/2026


name: daca-grpo-denoising-credit-assignment
description: Denoising-Aware Credit Assignment for GRPO in Diffusion Language Models. Introduces Denoising Progress Scores and Stratified Masking Likelihood to improve GRPO-style training for diffusion LLMs, achieving gains up to 5.6pp on math reasoning, 7.4pp on code generation, and 36.3pp on constraint satisfaction.

DACA-GRPO: Denoising-Aware Credit Assignment for RL in Diffusion Language Models

Core Methodology

DACA-GRPO addresses two fundamental weaknesses in RL for diffusion language models:

Absence of temporal credit assignment across the denoising trajectory
Systematic bias of mean-field likelihood estimates used for policy optimization

Key Mechanisms

Denoising Progress Scores (DPS)

Extracts per-token importance weights from intermediate predictions, with no additional forward passes
Tokens that contribute more to denoising progress receive higher importance weights
Computed from the change in prediction confidence across denoising steps
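
The DPS idea above can be illustrated with a small sketch. The source only says the scores come from the change in prediction confidence across denoising steps, so the exact scoring rule here (summing positive confidence gains per token and normalizing weights to mean 1) is an assumption, as are the function name `denoising_progress_scores` and the `confidences[t][i]` layout. It reuses values a sampler would already have, so it needs no extra model calls.

```python
from typing import List


def denoising_progress_scores(confidences: List[List[float]]) -> List[float]:
    """Hypothetical DPS sketch.

    confidences[t][i] is the model's confidence in the finally chosen
    token at position i, recorded at denoising step t.
    Returns one importance weight per token, normalized to mean 1.
    """
    num_steps = len(confidences)
    seq_len = len(confidences[0])
    raw = []
    for i in range(seq_len):
        # Assumed rule: accumulate only the positive confidence gains
        # between consecutive denoising steps.
        gain = sum(
            max(confidences[t + 1][i] - confidences[t][i], 0.0)
            for t in range(num_steps - 1)
        )
        raw.append(gain)
    total = sum(raw)
    if total == 0.0:
        # No measurable progress anywhere: fall back to uniform weights.
        return [1.0] * seq_len
    return [g * seq_len / total for g in raw]


# Three denoising steps over a three-token sequence (made-up numbers).
example_weights = denoising_progress_scores(
    [[0.1, 0.5, 0.2], [0.4, 0.5, 0.3], [0.9, 0.5, 0.9]]
)
```

In this toy input the middle token never gains confidence, so it receives zero weight, while the two tokens that sharpen over the trajectory share the remaining mass.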

Stratified Masking Likelihood (SML)

Partitions token positions into strata so each token is predicted with most of the sequence as context
Reduces the mean-field bias inherent in independent token likelihood estimation
Each token sees a different masking pattern, providing better context coverage
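
A minimal sketch of the stratified masking idea follows. It assumes a random partition into `num_strata` groups and a hypothetical `score_fn(masked_tokens, targets)` wrapper around the model that returns a log-probability for each masked position; neither the partitioning rule nor this interface is taken from the source. With K strata, each token is scored while roughly (K-1)/K of the sequence stays visible, at a cost of one `score_fn` call per stratum in this sketch.

```python
import random
from typing import Callable, Dict, List

MASK = -1  # hypothetical mask token id, chosen only for illustration


def make_strata(seq_len: int, num_strata: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition positions 0..seq_len-1 into disjoint strata."""
    positions = list(range(seq_len))
    random.Random(seed).shuffle(positions)
    return [sorted(positions[k::num_strata]) for k in range(num_strata)]


def stratified_log_likelihood(
    tokens: List[int],
    num_strata: int,
    score_fn: Callable[[List[int], Dict[int, int]], Dict[int, float]],
    seed: int = 0,
) -> float:
    """Sum per-token log-probs, masking one stratum at a time."""
    total = 0.0
    for stratum in make_strata(len(tokens), num_strata, seed):
        if not stratum:
            continue
        masked = list(tokens)
        for pos in stratum:
            masked[pos] = MASK  # hide only this stratum; the rest is context
        targets = {pos: tokens[pos] for pos in stratum}
        log_probs = score_fn(masked, targets)
        total += sum(log_probs[pos] for pos in stratum)
    return total


def toy_score_fn(masked: List[int], targets: Dict[int, int]) -> Dict[int, float]:
    """Stand-in for a real model: every masked token gets log-prob -1.0."""
    return {pos: -1.0 for pos in targets}


estimate = stratified_log_likelihood(list(range(100, 110)), 3, toy_score_fn)
```

In practice `score_fn` would run the diffusion model on the masked sequence and read off the log-probability of each true token at its masked position.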

Implementation Notes

Plug-and-play: DACA-GRPO is a lightweight enhancement for any GRPO-style trainer
No additional forward passes: DPS extraction uses existing intermediate predictions
Compatible with existing GRPO variants: Works on top of GRPO, GSPO, REINFORCE++, etc.
Stratified masking: Implement token partitioning to reduce mean-field bias
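
As a rough picture of how per-token weights could plug into a GRPO-style trainer, the sketch below computes the standard group-relative advantage (reward minus group mean, divided by group standard deviation) and then scales it by a per-token weight. The multiplication step and the names `group_relative_advantages` and `token_level_advantages` are assumptions for illustration; the source does not specify how DPS weights enter the objective.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def token_level_advantages(
    rewards: List[float], token_weights: List[List[float]]
) -> List[List[float]]:
    """Assumed combination rule: sequence advantage times per-token weight.

    token_weights[j][i] is the importance weight of token i in completion j,
    for example a Denoising Progress Score normalized to mean 1.
    """
    advantages = group_relative_advantages(rewards)
    return [
        [adv * w for w in weights]
        for adv, weights in zip(advantages, token_weights)
    ]


# Group of four completions with binary rewards and two tokens each.
per_token = token_level_advantages(
    [1.0, 0.0, 1.0, 0.0],
    [[2.0, 0.0], [1.0, 1.0], [0.5, 1.5], [1.0, 1.0]],
)
```

Keeping the weights normalized to mean 1 per sequence leaves the average advantage of each completion unchanged, which is one reason this form is convenient for dropping into an existing trainer.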

Performance Gains

Mathematical reasoning: up to +5.6pp
Code generation: up to +7.4pp
Constraint satisfaction: up to +36.3pp
JSON schema adherence: up to +5.9pp

Applications

RL training for diffusion language models
Improving GRPO-style policy optimization
Diffusion-based code generation and reasoning
Constrained generation with structural requirements

Activation Keywords

daca-grpo, denoising credit assignment, diffusion language model, GRPO enhancement, stratified masking, denoising progress scores, diffusion LLM RL

Install via CLI

npx skills add https://github.com/hiyenwong/ai_collection --skill daca-grpo-denoising-credit-assignment

Repository Details

Stars: 2
Forks: 0
Branch: main
Path: SKILL.md
