Universität Bielefeld

[BA/MA]

Clothing Image Generation with Diffusion Models

Contact: Riza Velioglu

The Virtual Try-Off (VTOFF) task aims to generate realistic garment imagery from an input image of a clothed person. While significant progress has been made, challenges persist in preserving intricate garment details—such as textures, patterns, and fine structural features—to produce high-fidelity outputs. A critical component influencing the quality of these generated images is the image encoder used within the diffusion model pipeline.

This thesis investigates the impact of various image encoders, including CLIP, SigLIP2, DINOv2, OpenCLIP, and MambaVision, on realism, structural consistency, and detail preservation in fashion image generation. Through systematic comparisons, the study aims to identify the encoder configurations that most improve the performance of diffusion-based clothing generation systems.
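As a rough illustration of what a systematic comparison involves, the sketch below holds the data, the generator, and the metric fixed and varies only the encoder. Everything in it is an assumption made for illustration: the names `compare_encoders`, `identity_encoder`, `coarse_encoder`, and `toy_generator` are hypothetical, images are flattened lists of floats, and the toy encoders stand in for real models such as CLIP or DINOv2. It is not the thesis's actual pipeline.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-ins (assumptions): an image is a flattened grayscale list in [0, 1].
Image = List[float]
Encoder = Callable[[Image], List[float]]    # person image -> conditioning features
Generator = Callable[[List[float]], Image]  # conditioning features -> garment image
Metric = Callable[[Image, Image], float]


def mse(generated: Image, reference: Image) -> float:
    """Mean squared error between two equally sized images (lower is better)."""
    if len(generated) != len(reference):
        raise ValueError("images must have the same size")
    return mean((g - r) ** 2 for g, r in zip(generated, reference))


def compare_encoders(
    encoders: Dict[str, Encoder],
    generator: Generator,
    pairs: Sequence[Tuple[Image, Image]],
    metric: Metric = mse,
) -> Dict[str, float]:
    """Score every encoder on the same (person, garment) pairs.

    Only the encoder changes between runs, so score differences can be
    attributed to the encoder rather than to the data or the generator.
    """
    results: Dict[str, float] = {}
    for name, encode in encoders.items():
        scores = [metric(generator(encode(person)), garment) for person, garment in pairs]
        results[name] = mean(scores)
    return results


def identity_encoder(image: Image) -> List[float]:
    """Hypothetical encoder that keeps every pixel (full detail)."""
    return list(image)


def coarse_encoder(image: Image) -> List[float]:
    """Hypothetical encoder that averages neighbouring pixels (loses fine detail)."""
    out: List[float] = []
    for i in range(0, len(image), 2):
        block = image[i:i + 2]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def toy_generator(features: List[float]) -> Image:
    """Placeholder for the diffusion model: returns the features as the image."""
    return list(features)


if __name__ == "__main__":
    # Toy data: the target garment equals the input, so detail loss shows up directly.
    pairs = [
        ([0.0, 1.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5]),
        ([0.2, 0.2, 0.8, 0.8], [0.2, 0.2, 0.8, 0.8]),
    ]
    encoders = {"identity": identity_encoder, "coarse": coarse_encoder}
    for name, score in compare_encoders(encoders, toy_generator, pairs).items():
        print(f"{name}: {score:.4f}")
```

In a real study, the toy encoders would presumably be replaced by pretrained models behind the same callable interface, and `mse` by perceptual or distribution-level image metrics. The structure of the loop would stay the same.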

In addition to encoder analysis, this research addresses three key areas:

By integrating these elements, the thesis aims to advance the field of clothing image generation, delivering solutions that are not only high-performing but also efficient, sustainable, and adaptable to real-world scenarios.

Literature