CM3leon is designed for both text-to-image and image-to-text generation, showcasing capabilities in coherent image creation, editing, and vision-language tasks.
CM3leon achieves state-of-the-art performance in text-to-image generation, outperforming models such as Google's Parti while using significantly fewer computational resources.
CM3leon handles tasks such as text-to-image generation, text-guided image editing, image captioning, and visual question answering.
CM3leon's training recipe, adapted from text-only language models and including a retrieval-augmented pre-training stage, substantially reduces computational requirements, making the model more cost-effective to train.
While CM3leon reflects biases present in its training data, efforts in transparency and diverse data sourcing aim to mitigate bias and enhance fairness in model outputs.