Hammer Lab, Bielefeld University

The Virtual Try-Off (VTOFF) task aims to generate realistic garment imagery from an input image of a clothed person. While significant progress has been made, challenges persist in preserving intricate garment details, such as textures, patterns, and fine structural features, to produce high-fidelity outputs. A critical component influencing the quality of these generated images is the image encoder used within the diffusion model pipeline.

This thesis investigates the impact of various image encoders, including CLIP, SigLIP2, DINOv2, OpenCLIP, and MambaVision, on realism, structural consistency, and detail preservation in fashion image generation. Through systematic comparisons, the study aims to determine which encoder configurations best improve the performance of diffusion-based clothing generation systems.

In addition to encoder analysis, this research addresses three key areas:

Real-Time Optimization: To enable practical deployment in applications like virtual fitting rooms, the study will explore optimization techniques to reduce the computational complexity of diffusion models. Methods such as model distillation, latent diffusion, and the use of lightweight encoders will be evaluated to achieve real-time garment generation without compromising image quality.
Energy Efficiency: Sustainability is a growing concern in fashion technology. This thesis will investigate energy-efficient training and inference methods to minimize the environmental footprint of diffusion models, ensuring that high-quality garment generation aligns with eco-friendly practices.
Domain Adaptation: Models trained on synthetic garment datasets often struggle to generalize to other domains due to domain gaps and overfitting. This research will examine domain adaptation strategies to improve model performance on unseen target domains.
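
The real-time and energy goals above both come down to measurements per generated image, so a small timing harness may help make them concrete. The following is a minimal sketch, not part of the thesis description: the names `benchmark` and `dummy_generate` are hypothetical, the workload is a stand-in for one call to a generation pipeline, and the energy figure is only a rough estimate that assumes a constant average power draw supplied by the user.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(generate: Callable[[], object],
              avg_power_watts: float,
              warmup: int = 2,
              runs: int = 10) -> Dict[str, float]:
    """Time `generate` and estimate energy per call from an assumed power draw."""
    # Warm-up calls are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        generate()

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        latencies.append(time.perf_counter() - start)

    mean_s = statistics.mean(latencies)
    return {
        "mean_latency_s": mean_s,
        "throughput_per_s": 1.0 / mean_s if mean_s > 0 else float("inf"),
        # Energy (J) = power (W) * time (s); 1 Wh = 3600 J.
        "energy_wh_per_call": avg_power_watts * mean_s / 3600.0,
    }


def dummy_generate() -> int:
    # Hypothetical stand-in workload; replace with one call to the
    # actual garment generation pipeline.
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    # 250 W is an arbitrary placeholder, not a measured value.
    stats = benchmark(dummy_generate, avg_power_watts=250.0)
    for name, value in stats.items():
        print(f"{name}: {value:.6g}")
```

When the timed call runs on a GPU, the work is typically launched asynchronously, so the device would need to be synchronized before each timer reading for the latencies to be meaningful. A measured power trace would also give a better energy estimate than the constant assumed here.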

By integrating these elements, the thesis aims to advance the field of clothing image generation, delivering solutions that are not only high-performing but also efficient, sustainable, and adaptable to real-world scenarios.

Literature

[1] Virtual Try-Off: https://rizavelioglu.github.io/tryoffdiff/
[2] CLIP: https://arxiv.org/abs/2103.00020
[3] SigLIP 2: https://arxiv.org/abs/2502.14786
[4] DINOv2: https://arxiv.org/abs/2304.07193
[5] OpenCLIP: https://ieeexplore.ieee.org/document/10205297
[6] MambaVision: https://arxiv.org/abs/2407.08083

[BA/MA]

Clothing Image Generation with Diffusion Models