General Information

Full Name: Feng Zhou
Email: zhoufeng@bupt.edu.cn
Location: Beijing, China

Education

  • Ph.D. student in Control Science and Engineering, Beijing University of Posts and Telecommunications, expected June 2027
    • Advisor: Prof. Jianqin Yin, BUPT-COST Lab
    • Research direction: controllable visual generation and visual understanding of the 3D world
  • B.Eng. in Internet of Things Engineering, Beijing University of Posts and Telecommunications, June 2022
    • GPA: 3.7 / 4.0
    • Recommended for direct Ph.D. admission

Research Interests

  • Feed-forward 3D reconstruction and 3D reconstruction foundation models
  • Sparse-view reconstruction, NeRF/3DGS scene representations, and 3D scene understanding
  • Diffusion models, controllable visual generation, and resolution extrapolation
  • AI-assisted research and engineering workflows for model development and experimentation

Experience

Nov 2025 - Present: Talent Program Intern, Horizon Robotics

  • Work on 3D reconstruction foundation models, with a focus on feed-forward multi-view geometry reasoning, cross-view communication, and high-resolution reconstruction.
  • Developed ATSR, a block-level top-k sparse attention design that reduces redundant cross-view communication, improving Pose AUC@30 from 0.879 to 0.891 with an 8.45x speedup in 300-view evaluation.
  • Developed GeoWeave, a selective cross-view communication mechanism for robust pose and point-cloud reconstruction under weak overlap and distracting views.
  • Explored Hybrid-VGGT, a high-resolution architecture that pairs a low-resolution global backbone with a high-resolution HDE branch to improve depth and point-cloud quality.
  • Built a mixed training set from 17 public datasets, along with unified evaluation scripts, business-data adaptation pipelines, and synthetic-data support for detailed object reconstruction.

Research Experience

3D Vision: Reconstruction and Scene Understanding

  • Sparse-view 3D reconstruction: studied initialization quality as a dominant factor for sparse-view 3DGS; proposed low-frequency-aware SfM, 3DGS self-initialization, and point-cloud consistency filtering.
  • Object-level understanding in large 3D scenes: proposed Gaussian-segmentation-guided object extraction and personalized diffusion priors for occluded or invisible regions.
  • Monocular 3D human pose estimation: designed pose-guided attention and adaptive feature selection to leverage image cues for 2D-to-3D pose lifting.

Controllable Visual Generation and Resolution Extrapolation

  • Analyzed implicit positional encoding in diffusion U-Nets and proposed a training-free strategy for high-resolution image generation.
  • Studied resolution scalability in Diffusion Transformers and FLUX-like models through attention receptive fields and frequency-aware detail preservation.
  • Worked on image-only in-domain generation adaptation, controllable-generation evaluation, and a survey of text-to-image diffusion control.

Selected Publications

  • Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation. AAAI 2026 Oral.
  • Lifting by Image - Leveraging Image Cues for Accurate 3D Human Pose Estimation. AAAI 2024.
  • Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation. CVPR 2025.
  • ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers. CVPR 2026.
  • OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. TCSVT 2025.

Technical Scope

  • Computer vision foundations: image formation, matching, optical flow, detection, segmentation, and pose estimation.
  • 3D vision: camera models, epipolar geometry, triangulation, PnP/BA, SfM/MVS/SLAM, NeRF, 3DGS, and feed-forward 3D foundation models.
  • Generative modeling: DDPM/DDIM, SDE/ODE samplers, CFG, rectified flow / flow matching, latent diffusion, and Transformer-based diffusion / flow models.
  • General-purpose AI workflow: LLM/VLM applications, RAG/Agent patterns, tool use, and AI-assisted research and engineering workflows.