General Information
Full Name: Feng Zhou
Email: zhoufeng@bupt.edu.cn
Location: Beijing, China
Education
- Ph.D. student in Control Science and Engineering, Beijing University of Posts and Telecommunications, expected June 2027
- Advisor: Prof. Jianqin Yin, BUPT-COST Lab
- Research direction: controllable visual generation and visual understanding of the 3D world
- B.Eng. in Internet of Things Engineering, Beijing University of Posts and Telecommunications, June 2022
- GPA: 3.7 / 4.0
- Recommended for direct admission to the Ph.D. program
Research Interests
- Feed-forward 3D reconstruction and 3D reconstruction foundation models
- Sparse-view reconstruction, NeRF/3DGS scene representations, and 3D scene understanding
- Diffusion models, controllable visual generation, and resolution extrapolation
- AI-assisted research and engineering workflows for model development and experimentation
Experience
Nov 2025 - Present: Talent Program Intern, Horizon Robotics
- Working on 3D reconstruction foundation models, with a focus on feed-forward multi-view geometry reasoning, cross-view communication, and high-resolution reconstruction.
- Developed ATSR, a block-level top-k sparse attention design that reduces redundant cross-view communication, improving Pose AUC@30 from 0.879 to 0.891 with an 8.45x speedup in 300-view evaluation.
- Developed GeoWeave, a selective cross-view communication mechanism for robust pose and point-cloud reconstruction under weak overlap and distracting views.
- Explored Hybrid-VGGT, a high-resolution architecture that pairs a low-resolution global backbone with a high-resolution HDE branch to improve depth and point-cloud quality.
- Built a mixed training corpus from 17 public datasets, along with unified evaluation scripts, business-data adaptation pipelines, and synthetic-data support for detailed object reconstruction.
Research Experience
3D Vision: Reconstruction and Scene Understanding
- Sparse-view 3D reconstruction: studied initialization quality as a dominant factor in sparse-view 3DGS; proposed low-frequency-aware SfM, 3DGS self-initialization, and point-cloud consistency filtering.
- Object-level understanding in large 3D scenes: proposed Gaussian-segmentation-guided object extraction and personalized diffusion priors for occluded or invisible regions.
- Monocular 3D human pose estimation: designed pose-guided attention and adaptive feature selection to leverage image cues for 2D-to-3D pose lifting.
Controllable Visual Generation and Resolution Extrapolation
- Analyzed implicit positional encoding in U-Net diffusion and proposed a training-free strategy for high-resolution image generation.
- Studied resolution scalability in Diffusion Transformers and FLUX-like models through analysis of attention receptive fields and frequency-aware detail preservation.
- Worked on image-only in-domain generation adaptation, controllable-generation evaluation, and a survey of text-to-image diffusion control.
Selected Publications
- Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation. AAAI 2026 Oral.
- Lifting by Image - Leveraging Image Cues for Accurate 3D Human Pose Estimation. AAAI 2024.
- Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation. CVPR 2025.
- ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers. CVPR 2026.
- OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. TCSVT 2025.
Technical Scope
- Computer vision foundations: image formation, matching, optical flow, detection, segmentation, and pose estimation.
- 3D vision: camera models, epipolar geometry, triangulation, PnP/BA, SfM/MVS/SLAM, NeRF, 3DGS, and feed-forward 3D foundation models.
- Generative modeling: DDPM/DDIM, SDE/ODE samplers, CFG, rectified flow / flow matching, latent diffusion, and Transformer-based diffusion / flow models.
- General-purpose AI workflow: LLM/VLM applications, RAG/Agent patterns, tool use, and AI-assisted research and engineering workflows.