Feng Zhou

Full Name: Feng Zhou

Email: zhoufeng@bupt.edu.cn

Location: Beijing, China

Ph.D. student in Control Science and Engineering, Beijing University of Posts and Telecommunications, expected June 2027
- Advisor: Prof. Jianqin Yin, BUPT-COST Lab
- Research direction: controllable visual generation and visual understanding of the 3D world
B.Eng. in Internet of Things Engineering, Beijing University of Posts and Telecommunications, June 2022
- GPA: 3.7 / 4.0
- Recommended for direct Ph.D. admission

Feed-forward 3D reconstruction and 3D reconstruction foundation models
Sparse-view reconstruction, NeRF/3DGS scene representations, and 3D scene understanding
Diffusion models, controllable visual generation, and resolution extrapolation
AI-assisted research and engineering workflows for model development and experimentation

Nov 2025 - Present: Talent Program Intern, Horizon Robotics

Work on 3D reconstruction foundation models, with a focus on feed-forward multi-view geometry reasoning, cross-view communication, and high-resolution reconstruction.
Developed ATSR, a block-level top-k sparse attention design that reduces redundant cross-view communication and improves Pose AUC@30 from 0.879 to 0.891 with 8.45x acceleration under 300-view evaluation.
Developed GeoWeave, a selective cross-view communication mechanism for robust pose and point-cloud reconstruction under weak overlap and distracting views.
Explored Hybrid-VGGT, a high-resolution architecture with a low-resolution global backbone and a high-resolution HDE branch for depth and point-cloud quality.
Built mixed training data from 17 public datasets, unified evaluation scripts, business-data adaptation pipelines, and synthetic-data support for detailed object reconstruction.

3D Vision: Reconstruction and Scene Understanding

Sparse-view 3D reconstruction: studied initialization quality as a dominant factor for sparse-view 3DGS; proposed low-frequency-aware SfM, 3DGS self-initialization, and point-cloud consistency filtering.
Object-level understanding in large 3D scenes: proposed Gaussian-segmentation-guided object extraction and personalized diffusion priors for occluded or invisible regions.
Monocular 3D human pose estimation: designed pose-guided attention and adaptive feature selection to leverage image cues for 2D-to-3D pose lifting.

Controllable Visual Generation and Resolution Extrapolation

Analyzed implicit positional encoding in U-Net diffusion and proposed a training-free strategy for high-resolution image generation.
Studied resolution scalability in Diffusion Transformers and FLUX-like models through attention receptive fields and frequency-aware detail preservation.
Worked on image-only in-domain generation adaptation, controllable-generation evaluation, and a survey of text-to-image diffusion control.

Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation. AAAI 2026 Oral.
Lifting by Image - Leveraging Image Cues for Accurate 3D Human Pose Estimation. AAAI 2024.
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation. CVPR 2025.
ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers. CVPR 2026.
OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. TCSVT 2025.

Computer vision foundations: image formation, matching, optical flow, detection, segmentation, and pose estimation.
3D vision: camera models, epipolar geometry, triangulation, PnP/BA, SfM/MVS/SLAM, NeRF, 3DGS, and feed-forward 3D foundation models.
Generative modeling: DDPM/DDIM, SDE/ODE samplers, CFG, rectified flow / flow matching, latent diffusion, and Transformer-based diffusion / flow models.
General-purpose AI workflows: LLM/VLM applications, RAG/agent patterns, tool use, and AI-assisted research and engineering.