TR2-D2

TR2-D2 is out! Read our paper on off-policy RL for discrete diffusion fine-tuning with multi-objective rewards!