Selected Publications
Published / accepted workExploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
Analyzes the implicit positional role of convolutional zero padding in high-resolution U-Net diffusion inference and introduces a training-free boundary-complement strategy for resolution extrapolation.
ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers
Studies high-resolution extrapolation in Diffusion Transformers and FLUX-like models, focusing on positional encoding, attention receptive fields, and frequency-aware detail preservation.
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
Proposes image-only domain adaptation for large-scale text-to-image diffusion models through guidance-decoupled prior preservation, reducing damage to the original controllability of the model.
Lifting by Image - Leveraging Image Cues for Accurate 3D Human Pose Estimation
Introduces pose-guided attention and adaptive feature selection to use image cues for 2D-to-3D human pose lifting while suppressing background overfitting.
OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation
Extracts object-level meshes from large 3D scenes by combining 2D Gaussian Splatting segmentation with personalized diffusion priors for occluded and invisible regions.
Controllable Generation with Text-to-Image Diffusion Models: A Survey
A systematic survey of controllable text-to-image diffusion, covering condition injection, structural control, editing control, personalization, and the evolution from prompt-level to spatial- and instance-level control.