Feng Zhou - Publications

AAAI 2026 Oral


Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Feng Zhou*, Pu Cao*, Yiyang Ma, Lu Yang, Yonghao Dang, and Jianqin Yin

AAAI Conference on Artificial Intelligence, 2026. Oral presentation.

Analyzes the implicit positional role that convolutional zero padding plays in diffusion U-Nets during high-resolution inference, and introduces a training-free boundary-complement strategy for resolution extrapolation.

Paper

CVPR 2026

ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers

Yiyang Ma*, Feng Zhou*, Pu Cao, Yonghao Dang, and Jianqin Yin

IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026.

Studies high-resolution extrapolation in Diffusion Transformers and FLUX-like models, focusing on positional encoding, attention receptive fields, and frequency-aware detail preservation.

Paper

CVPR 2025

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Pu Cao*, Feng Zhou*, Lu Yang, Tianrui Huang, and Qing Song

IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.

Proposes image-only domain adaptation for large-scale text-to-image diffusion models through guidance-decoupled prior preservation, limiting degradation of the model's original controllability.

Paper

AAAI 2024

Lifting by Image - Leveraging Image Cues for Accurate 3D Human Pose Estimation

Feng Zhou, Jianqin Yin, and Peiyang Li

AAAI Conference on Artificial Intelligence, 2024.

Introduces pose-guided attention and adaptive feature selection to exploit image cues in 2D-to-3D human pose lifting while suppressing overfitting to the background.

Paper

TCSVT 2025

OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation

Lizhi Wang*, Feng Zhou*, Bo Yu, Pu Cao, and Jianqin Yin

IEEE Transactions on Circuits and Systems for Video Technology, 2025.

Extracts object-level meshes from large 3D scenes by combining 2D Gaussian Splatting segmentation with personalized diffusion priors for occluded and invisible regions.

Paper

TPAMI 2025

Survey

Controllable Generation with Text-to-Image Diffusion Models: A Survey

Pu Cao, Feng Zhou, Qing Song, and Lu Yang

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

A systematic survey of controllable text-to-image diffusion, covering condition injection, structural control, editing control, personalization, and the evolution from prompt-level to spatial- and instance-level control.
