🪄 DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting

ICCV 2025
¹DSA, HKUST(GZ)   ²CSE, HKUST
*Corresponding authors

DiGA3D is a versatile 3D inpainting framework guided by text prompts, supporting multiple inpainting tasks such as object replacement, removal, and re-texturing.


Abstract

Developing a unified pipeline that enables users to remove, re-texture, or replace objects in a versatile manner is crucial for text-guided 3D inpainting. However, performing multiple 3D inpainting tasks within a unified framework still poses challenges: 1) single-reference inpainting methods lack robustness for views far from the reference view; 2) appearance inconsistency arises when multi-view images are inpainted independently with 2D diffusion priors; and 3) geometry inconsistency limits performance when the inpainting regions undergo significant geometric changes. To tackle these challenges, we introduce DiGA3D, a novel and versatile 3D inpainting pipeline that leverages diffusion models to propagate consistent appearance and geometry in a coarse-to-fine manner. First, DiGA3D develops a robust strategy for selecting multiple reference views to reduce errors during propagation. Next, DiGA3D designs an Attention Feature Propagation (AFP) mechanism that propagates attention features from the selected reference views to other views via diffusion models to maintain appearance consistency. Furthermore, DiGA3D introduces a Texture-Geometry Score Distillation Sampling (TG-SDS) loss to further improve the geometric consistency of inpainted 3D scenes. Extensive experiments on multiple 3D inpainting tasks demonstrate the effectiveness of our method.


Method



Our proposed framework. Before performing 3D inpainting, we first calculate camera poses using COLMAP [1] and extract masks from mask prompts Tm. We then apply k-means clustering to group the views by their camera centers, and select the views closest to the cluster centers as reference views. In the coarse stage, we employ DDIM Inversion [2] to generate deterministic latents, which are then used to produce coarsely consistent inpainting results with a 2D inpainter equipped with the AFP module. In the fine stage, we utilize ControlNet [3], with texture and depth images as conditions, to further refine the 3D inpainting results via the TG-SDS loss. In this scene, we set Tp to "a cake" and Tn to "watering can" to replace the watering can with a cake.


Object Replacement

Our DiGA3D allows replacing one object with another using text prompts. The first three scenes are from the SPIn-NeRF [4] dataset, trained at a resolution of 1008 × 567; the fourth is from the statue dataset, trained at a resolution of 504 × 378.

Original Views "watering can" -> "bonsai" Original Views "box" -> "basketball"
Original Views "bag" -> "Van Gogh portrait" Original Views "statue" -> "a vase"

Object Re-Texturing

Our DiGA3D also enables object re-texturing (e.g., changing colors, materials, or styles) using text prompts. We present examples from the SPIn-NeRF [4] and LLFF [5] datasets.

Original Views "watering can" -> "bronze watering can" Original Views "box" -> "brown wooden box"
Original Views "red flowers" -> "yellow flowers" Original Views "fortress" -> "origami fortress"

Object Removal

Our DiGA3D enables the removal of specific objects using text prompts.

References

    [1] Schönberger J L, Frahm J M. Structure-from-motion revisited. CVPR, 2016.
    [2] Song J, Meng C, Ermon S. Denoising diffusion implicit models. ICLR, 2021.
    [3] Zhang L, Rao A, Agrawala M. Adding conditional control to text-to-image diffusion models. ICCV, 2023.
    [4] Mirzaei A, Aumentado-Armstrong T, Derpanis K G, et al. SPIn-NeRF: Multiview segmentation and perceptual inpainting with neural radiance fields. CVPR, 2023.
    [5] Mildenhall B, Srinivasan P P, Ortiz-Cayon R, et al. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. TOG, 2019.
    [6] Wang Y, Wu Q, Zhang G, Xu D. Learning 3D geometry and feature consistent Gaussian splatting for object removal. ECCV, 2024.

BibTeX


  @inproceedings{pan2025diga3d,
    title={DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting},
    author={Pan, Jingyi and Xu, Dan and Luo, Qiong},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year={2025}
  }