Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training shapes. Understanding their memorization could help prevent training data leakage and improve the diversity of generated results. In this paper, we design an evaluation framework to quantify memorization in 3D generative models and study how different data and modeling design choices influence it. We first apply our framework to quantify memorization in existing methods. Next, through controlled experiments with a latent vector-set (Vecset) diffusion model, we find that, on the data side, memorization depends on the data modality and grows with data diversity and finer conditioning; on the modeling side, it peaks at moderate guidance and can be mitigated by longer Vecsets and simple rotation augmentation. Together, our framework and analysis provide an empirical understanding of memorization in 3D generative models and suggest simple yet effective strategies to reduce it without degrading generation quality.
Memorization in 3D generative models is still underexplored. We propose a simple evaluation framework to quantify memorization in generated 3D shapes.
We benchmark distance metrics for memorization detection using 133 generated shapes. A retrieval is counted as correct if the generated shape is visually near-identical to the retrieved training neighbor.
As shown below, Light Field Distance (LFD) achieves the highest accuracy among all seven metrics. Therefore, we adopt LFD as our primary retrieval metric.
| Metric | LFD | Uni3D | ULIP-2 | CD | DinoV2 | PointNet++ | SSCD |
|---|---|---|---|---|---|---|---|
| Acc. (%) | 78.4 | 74.8 | 66.2 | 46.8 | 20.1 | 18.7 | 7.2 |
Top-1 retrieval accuracy (%) of distance metrics on 133 generated shapes from four ShapeNet categories. LFD achieves the highest accuracy.
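For concreteness, below is a minimal sketch of the retrieval step under an arbitrary distance function. The `dist` function is a placeholder (our primary metric is LFD), and the correctness judgments come from human visual inspection rather than code.

```python
# Minimal sketch of the retrieval benchmark, assuming a generic shape distance
# `dist(a, b)`; "correct" means a human judged the retrieved training neighbor
# visually near-identical to the generated shape.
import numpy as np

def retrieve_nearest(generated, train_set, dist):
    """Index of the nearest training shape for each generated shape."""
    return [int(np.argmin([dist(g, t) for t in train_set])) for g in generated]

def top1_accuracy(judgments):
    """`judgments[i]` is True if the i-th retrieval was judged near-identical."""
    return 100.0 * float(np.mean(judgments))
```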
With LFD as our metric, we define a memorization score using the Mann-Whitney U test. The test yields a standardized z-score, ZU, computed from the nearest-neighbor distances of generated samples (Q) and held-out test samples (Ptest) to the training set (T).
When ZU < 0, the generated set is, on average, closer to the training data than the test set is. Thus, we treat ZU < 0 as evidence of memorization, with the strength of memorization increasing as ZU decreases.
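As a rough sketch of how such a score can be computed (not necessarily our exact implementation), the snippet below takes nearest-neighbor distances from Q and Ptest to T and standardizes the Mann-Whitney U statistic with the usual normal approximation; `dist` is again a placeholder for LFD.

```python
# Minimal sketch of the ZU memorization score: a standardized Mann-Whitney U
# statistic over nearest-neighbor distances to the training set. `dist(a, b)`
# is a placeholder for a shape distance (LFD in the paper).
import numpy as np
from scipy.stats import rankdata

def nn_distances(samples, train_set, dist):
    """Distance from each sample to its nearest neighbor in the training set T."""
    return np.array([min(dist(s, t) for t in train_set) for s in samples])

def memorization_score(gen_set, test_set, train_set, dist):
    d_gen = nn_distances(gen_set, train_set, dist)     # NN distances for generated set Q
    d_test = nn_distances(test_set, train_set, dist)   # NN distances for held-out set P_test
    n1, n2 = len(d_gen), len(d_test)
    ranks = rankdata(np.concatenate([d_gen, d_test]))  # ranks over the pooled distances
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0         # Mann-Whitney U for the generated set
    mu = n1 * n2 / 2.0                                  # mean of U under the null
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)    # std of U (no tie correction)
    return (u - mu) / sigma                            # ZU < 0: generated shapes closer to T
```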
Our evaluation framework also incorporates a quality indicator based on Fréchet Distance (FD). We compare ZU only among models with similar FD values, which decouples memorization from generation quality.
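A minimal sketch of this quality indicator is the standard Fréchet distance between Gaussians fitted to feature embeddings of generated and reference shapes (the same formula as FID); the choice of feature extractor is an assumption here, not specified by this snippet.

```python
# Minimal sketch of the Frechet Distance (FD) quality indicator between two
# sets of shape feature embeddings (e.g., from a pretrained 3D encoder).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_gen, feats_ref):
    """FD between Gaussians fitted to generated and reference features (N x D arrays)."""
    mu_g, mu_r = feats_gen.mean(axis=0), feats_ref.mean(axis=0)
    cov_g = np.cov(feats_gen, rowvar=False)
    cov_r = np.cov(feats_ref, rowvar=False)
    covmean = sqrtm(cov_g @ cov_r)
    if np.iscomplexobj(covmean):   # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_g - mu_r
    return float(diff @ diff + np.trace(cov_g + cov_r - 2.0 * covmean))
```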
| Method | LAS-Diffusion (uncond.) | LAS-Diffusion (class) | Wavelet Generation | 3DShape2VecSet | Michelangelo |
|---|---|---|---|---|---|
| ZU | -7.02 | -4.93 | -1.35 | 4.56 | 9.25 |
| Gen & Retrieved | (image) | (image) | (image) | (image) | (image) |
ZU scores on ShapeNet chairs. Lower scores indicate stronger memorization. The images show generated shapes (left) vs. their nearest training neighbors (right).
| Method | LAS-Diffusion (class) | 3DShape2VecSet | Michelangelo | Trellis-small | Trellis-large | Trellis-xlarge |
|---|---|---|---|---|---|---|
| Split | IM-NET | 3DILG | 3DILG | Trellis500K | Trellis500K | Trellis500K |
| ZU | 0.46 | 1.07 | -0.33 | -0.67 | -1.57 | -2.19 |
ZU scores evaluated on full datasets. Scores near zero indicate the model generalizes well rather than memorizing.
@article{pu2025memorization,
title={Memorization in 3D Shape Generation: An Empirical Study},
author={Pu, Shu and Zeng, Boya and Zhou, Kaichen and Wang, Mengyu and Liu, Zhuang},
journal={arXiv preprint arXiv:2512.23628},
year={2025}
}