Recent advances in diffusion models have established them as powerful generative tools for vision tasks, particularly high-fidelity image synthesis. While architectural design and training strategies have received considerable attention, the role of the initial noise, the starting point of the generation process, remains relatively underexplored. Emerging evidence nonetheless suggests that the structure, type, and optimization of the initial noise significantly influence the diversity, quality, and controllability of outputs.
This thesis aims to investigate the impact of noise initialization strategies on the performance and behavior of diffusion models, particularly in text-to-image generation and rare concept synthesis. Through a combination of empirical evaluation and theoretical analysis, the study will examine both structured noise (e.g., Perlin or multi-resolution noise) and learned noise initialization approaches.
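To make the notion of structured noise concrete, the following is a minimal sketch of one possible multi-resolution initialization, not the specific method this thesis will evaluate. It sums independent Gaussian noise drawn at progressively coarser resolutions, upsampled by nearest-neighbour repetition. The function name `multires_noise`, the latent shape, the number of levels, and the `discount` weighting are all illustrative assumptions.

```python
import numpy as np


def multires_noise(shape=(4, 64, 64), levels=4, discount=0.5, seed=0):
    """Hypothetical multi-resolution noise: a weighted sum of Gaussian
    noise sampled at several spatial resolutions (illustrative only)."""
    rng = np.random.default_rng(seed)
    c, h, w = shape
    # Level 0: ordinary i.i.d. Gaussian noise at full resolution.
    noise = rng.standard_normal(shape)
    for i in range(1, levels):
        f = 2 ** i  # downsampling factor for this level
        if h % f or w % f:
            break  # stop if the grid no longer divides evenly
        # Sample at the coarse resolution, then upsample by repetition
        # so that each coarse value covers an f x f block.
        coarse = rng.standard_normal((c, h // f, w // f))
        up = coarse.repeat(f, axis=1).repeat(f, axis=2)
        # Coarser levels get geometrically smaller weights (assumed scheme).
        noise += (discount ** i) * up
    # Rescale to unit standard deviation, matching the scale of the
    # standard Gaussian initialization.
    return noise / noise.std()


z = multires_noise()
print(z.shape, float(z.std()))
```

Unlike i.i.d. Gaussian noise, the result is spatially correlated: neighbouring positions share the coarse components, which is the property that structured initializations of this kind are intended to exploit.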
Literature