Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing
Researchers propose a deep learning architecture for text-based 3D human motion editing that combines cross-axis feature fusion with joint-wise motion difference prediction, so that the model can better identify which body joints should be modified and when. Two specialized transformers process the temporal and spatial dimensions independently before their features are fused, and the method achieves state-of-the-art results on the MotionFix dataset.
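To make the two-branch idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes motion features are stored as a nested list `x[t][j]` holding a D-dimensional vector for frame `t` and joint `j`. It uses parameter-free scaled dot-product self-attention in place of full transformer layers and fuses the two branches by concatenation. All function names, the fusion rule, and the reading of "motion difference" as a per-joint, per-frame difference between the edited and source motion are assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention (no learned weights).

    tokens: list of D-dimensional vectors; returns a list of the same shape.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def temporal_branch(x):
    """Attend across frames, separately for each joint (hypothetical helper)."""
    n_frames, n_joints = len(x), len(x[0])
    out = [[None] * n_joints for _ in range(n_frames)]
    for j in range(n_joints):
        attended = self_attention([x[t][j] for t in range(n_frames)])
        for t in range(n_frames):
            out[t][j] = attended[t]
    return out


def spatial_branch(x):
    """Attend across joints, separately for each frame (hypothetical helper)."""
    return [self_attention(frame) for frame in x]


def fuse(temporal, spatial):
    """Fuse the branches by concatenating feature vectors (an assumed rule)."""
    # `ft + fs` is list concatenation here, not element-wise addition.
    return [[ft + fs for ft, fs in zip(row_t, row_s)]
            for row_t, row_s in zip(temporal, spatial)]


def joint_wise_difference(source, edited):
    """Per-frame, per-joint difference (edited minus source)."""
    return [[[e - s for s, e in zip(sj, ej)] for sj, ej in zip(sf, ef)]
            for sf, ef in zip(source, edited)]


# Toy input: 3 frames, 2 joints, 2 features per joint.
x = [[[float(t + j), float(t * j)] for j in range(2)] for t in range(3)]
fused = fuse(temporal_branch(x), spatial_branch(x))
print((len(fused), len(fused[0]), len(fused[0][0])))  # (3, 2, 4)
```

In this sketch each branch sees the same input but mixes information along a different axis, so the fused feature for a given frame and joint carries both "when" context (from the temporal branch) and "which joint" context (from the spatial branch). A joint-wise difference target is zero wherever the edit leaves a joint unchanged, which is one plausible way such a signal could indicate the joints and frames an edit affects.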