Memorization in 3D Shape Generation: An Empirical Study

1Princeton University, 2Harvard University

Abstract

Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training shapes. Understanding their memorization could help prevent training data leakage and improve the diversity of generated results. In this paper, we design an evaluation framework to quantify memorization in 3D generative models and study the influence of different data and modeling designs on memorization. We first apply our framework to quantify memorization in existing methods. Next, through controlled experiments with a latent vector-set (Vecset) diffusion model, we find that, on the data side, memorization depends on data modality, grows with data diversity and finer conditioning; on the modeling side, it peaks at moderate guidance and can be mitigated by longer Vecsets and simple rotation augmentation. Together, our framework and analysis provide an empirical understanding of memorization in 3D generative models and suggest simple yet effective strategies to reduce it without degrading generation quality.


Methodology

Memorization in 3D generative models is still underexplored. We propose a simple evaluation framework to quantify memorization in generated 3D shapes.

Distance Metrics

We benchmark seven candidate distance metrics for memorization detection on 133 generated shapes from four ShapeNet categories. A retrieval is counted as correct if the generated shape is visually near-identical to its retrieved nearest training neighbor.

As shown below, Light Field Distance (LFD) achieves the highest accuracy among all seven metrics. Therefore, we adopt LFD as our primary retrieval metric.

Metric     LFD    Uni3D   ULIP-2   CD     DinoV2   PointNet++   SSCD
Acc. (%)   78.4   74.8    66.2     46.8   20.1     18.7         7.2

Top-1 retrieval accuracy (%) of distance metrics on 133 generated shapes from four ShapeNet categories. LFD achieves the highest accuracy.
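
To make the retrieval protocol concrete, below is a minimal sketch of top-1 nearest-neighbor retrieval. It uses Chamfer distance on sampled point clouds, one of the benchmarked metrics, only because it fits in a few lines; LFD, the metric we actually adopt, requires rendering light-field descriptors and is not reproduced here. The function names and the brute-force loop are illustrative, not the paper's implementation.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def top1_retrieve(generated, training, metric=chamfer_distance):
    """For each generated shape, return the index of its nearest training shape."""
    return [int(np.argmin([metric(g, t) for t in training])) for g in generated]
```

Top-1 accuracy is then the fraction of retrieved neighbors that are judged visually near-identical to the corresponding generated shapes.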


Memorization Metric

With LFD as our retrieval metric, we define a memorization score using the Mann-Whitney U test. The test yields a standardized z-score ZU computed from the nearest-neighbor distances of generated samples (Q) and held-out test samples (P_test) to the training set (T).

When ZU < 0, the generated set is, on average, closer to the training data than the test set is. Thus, we treat ZU < 0 as evidence of memorization, with the strength of memorization increasing as ZU decreases.
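
Below is a minimal sketch of this score, assuming d_gen and d_test hold the nearest-neighbor LFD distances of generated and held-out test shapes to the training set. It uses the standard normal approximation of the U statistic; the exact normalization in the paper (e.g., tie handling and sign convention) may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def memorization_score(d_gen, d_test):
    """Standardized Mann-Whitney U z-score.

    d_gen:  nearest-neighbor distances of generated samples to the training set.
    d_test: nearest-neighbor distances of held-out test samples to the training set.
    Negative values indicate the generated set is, on average, closer to the
    training data than the test set, i.e. evidence of memorization.
    """
    u, _ = mannwhitneyu(d_gen, d_test, alternative="two-sided")  # U statistic for d_gen
    n1, n2 = len(d_gen), len(d_test)
    mu = n1 * n2 / 2.0                                  # E[U] under the null
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)     # Std[U] under the null (no tie correction)
    return (u - mu) / sigma
```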


Evaluation Framework

Our evaluation framework also includes a generation-quality indicator based on the Fréchet Distance (FD). We compare ZU only among models with similar FD values, so that memorization is decoupled from generation quality.
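
A minimal sketch of the FD computation is given below, assuming each shape has already been encoded into a fixed-length feature vector (the specific shape encoder is an assumption here, not stated in this section). FD is the Fréchet distance between Gaussians fitted to the generated and reference feature sets.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_gen, feat_ref):
    """Fréchet distance between Gaussians fitted to generated and reference
    shape features (rows are shapes, columns are feature dimensions)."""
    mu_g, mu_r = feat_gen.mean(axis=0), feat_ref.mean(axis=0)
    cov_g = np.cov(feat_gen, rowvar=False)
    cov_r = np.cov(feat_ref, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_g @ cov_r, disp=False)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard small imaginary parts from numerical error
    diff = mu_g - mu_r
    return float(diff @ diff + np.trace(cov_g + cov_r - 2.0 * covmean))
```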


Evaluating Existing Methods

Memorization on Single Category (ShapeNet Chairs)

Method                     ZU
LAS-Diffusion (uncond.)   -7.02
LAS-Diffusion (class)     -4.93
Wavelet Generation        -1.35
3DShape2VecSet             4.56
Michelangelo               9.25

Gen & Retrieved: image pairs of each generated shape (Gen) and its retrieved nearest training neighbor (Ori).

ZU scores on ShapeNet chairs. Lower scores indicate stronger memorization. The images show generated shapes (left) vs. their nearest training neighbors (right).


Memorization on Entire Training Sets

Method                   Split          ZU
LAS-Diffusion (class)    IM-NET          0.46
3DShape2VecSet           3DILG           1.07
Michelangelo             3DILG          -0.33
Trellis-small            Trellis500K    -0.67
Trellis-large            Trellis500K    -1.57
Trellis-xlarge           Trellis500K    -2.19

ZU scores evaluated on full datasets. Scores near zero indicate the model generalizes well rather than memorizing.

Qualitative Retrieval Results

Qualitative retrieval on ShapeNet Chairs. Earlier models (left) show strong memorization even at high percentiles (60th), while modern models (right) generate novel geometries.
Qualitative retrieval on Entire Datasets. Across all large-scale models, retrieved training shapes become visually distinct from generated samples at moderate percentiles (20th–60th).

Controlled Experiments

BibTeX


@article{pu2025memorization,
  title={Memorization in 3D Shape Generation: An Empirical Study},
  author={Pu, Shu and Zeng, Boya and Zhou, Kaichen and Wang, Mengyu and Liu, Zhuang},
  journal={arXiv preprint arXiv:2512.23628},
  year={2025}
}