Two tricks we used to enhance our Coloring Book AI Generator
We recently launched the Coloring Book Page AI Generator, a tool designed to generate unique coloring visuals from your creative prompts. This feature provides access to an unlimited catalogue of drawings, powered by AI and your own imagination. At MWM, we regularly update our models, each iteration an incremental step towards optimizing the creative experience.
Although the initial version was capable of generating various cool graphics, we identified two primary areas for improvement:
- The drawing lines frequently appeared too rough, leading to a less polished and smooth appearance.
- The generated drawings often included large black areas that are impractical for coloring.
In this article, we will explore two main approaches we employed to address these issues.
1. Gathering a High-Quality Dataset
The simplest yet most critical lever for improving our model is to revisit its training dataset.
We are fine-tuning Stable Diffusion for coloring book visuals, an operation that is quite sensitive to the overall quality of the image samples used during training.
In this second iteration, our primary focus was to gather a high-quality dataset of images. We searched for images with varied subjects but a consistent style. Emphasis was placed on coloring pages that provided large areas for coloring, avoiding overly complex designs.
Additionally, we observed that our users often had a clear idea of the subject they wanted to draw in their prompts, and were not inclined towards visuals that were overly abstract.
This observation led us to reduce the use of excessively complex drawing line schemes, thus limiting the “Mandala effect” — a bias in the first version of the model that tended to create round, intricate shapes even when they were not desired.
Finally, we also ensured that the images were of a sufficiently high resolution. This enabled the AI model to learn to produce clear, non-pixelated images.
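As a concrete illustration, a resolution filter of this kind can be a few lines of code. The 768-pixel threshold and the candidate sizes below are illustrative assumptions, not our actual pipeline values:

```python
# Hypothetical sketch of a resolution filter for dataset curation.
# The 768px minimum side is an illustrative threshold, not our real value.
MIN_SIDE = 768

def keep_image(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """Keep only samples whose shortest side meets the resolution bar,
    so the model never trains on pixelated, low-detail pages."""
    return min(width, height) >= min_side

# Example candidate sizes (width, height):
candidates = [(512, 512), (1024, 1024), (768, 1152), (640, 960)]
kept = [size for size in candidates if keep_image(*size)]
```

Filtering on the shortest side (rather than total pixel count) guards against very elongated images that would still look blurry after resizing to the training resolution.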
By using a high-quality dataset, we were on the right path toward training an AI model capable of generating coloring book pages that are both aesthetically pleasing and practical to color on a tablet or mobile phone screen.
2. The Significance of the “Offset Noise Trick”
A few months ago, there was a lot of excitement around a technique called “offset noise” for Stable Diffusion model training. This simple modification allows the model to generate images with significantly brighter or darker tones.
This method was discussed in an article by Crosslabs, based on one of their observations: without this offset modification, the images generated by Stable Diffusion tend to have an average pixel value around 0.5. On the pixel value scale, where 0 represents black and 1 represents white, this means the model almost always produces images with a balanced mix of dark and light, and struggles to generate images that are predominantly dark or bright.
This phenomenon is linked to the image-noising process used during Stable Diffusion training.
Without delving too deeply into the complexities, high-frequency components — with frequency referring to the rate of change in intensity (pixel value) along the X/Y dimensions — are destroyed significantly faster by the noising process. In comparison, the underlying low-frequency components of the image require many more noising steps to be fully destroyed.
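To make this intuition concrete, here is a toy back-of-the-envelope computation (not taken from the Crosslabs article; the amplitudes are illustrative). In the forward step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise, white noise contributes equal power to every frequency, while a natural image concentrates far more power in its low frequencies, so the per-frequency signal-to-noise ratio collapses much sooner for high-frequency detail:

```python
# Toy illustration: after the forward noising step
#   x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,  eps ~ N(0, I),
# unit-variance white noise adds the same power at every frequency, so a
# component's SNR is proportional to its own signal power. The amplitudes
# below mimic a natural-image-like spectrum and are assumptions.
alpha_bar = 0.5               # cumulative noise schedule at some mid timestep
low_amp, high_amp = 2.0, 0.2  # strong low-frequency, weak high-frequency

def component_snr(amplitude: float, alpha_bar: float) -> float:
    signal_power = alpha_bar * amplitude**2 / 2  # mean power of a sinusoid
    noise_power = 1.0 - alpha_bar                # flat across frequencies
    return signal_power / noise_power

snr_low = component_snr(low_amp, alpha_bar)    # low frequency survives
snr_high = component_snr(high_amp, alpha_bar)  # high frequency is drowned
```

With these (assumed) amplitudes, the low-frequency component keeps a 100x higher SNR at the same timestep, which is why many more noising steps are needed to fully destroy it.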
The AI model strives to reverse the noising process, learning from the data distribution of our coloring book dataset.
However, since the low-frequency components are never completely eradicated, the training process naturally favors the preservation of these components in the images during the inference phase.
The offset noise trick involves introducing a stronger low-frequency variation to the noise during training by adding a small offset to all the pixel values. This is a shift (or offset) of the noise at the lowest frequency, 0 — a constant applied across the entire image channel.
By introducing more diversity in these low-frequency noise components during training, the network’s capacity to denoise dark or light images is no longer limited (values can potentially vary more freely towards 0 or 1).
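In practice the trick is usually a one-line change in the training loop (in PyTorch, roughly `noise = torch.randn_like(latents) + 0.1 * torch.randn(b, c, 1, 1)`). Below is a self-contained NumPy sketch of the idea; the 0.1 strength follows the value suggested in the Crosslabs write-up, and the latent shapes are illustrative:

```python
import numpy as np

def offset_noise(shape, strength=0.1, rng=None):
    """White Gaussian noise plus one constant offset per (sample, channel),
    i.e. extra noise energy at frequency 0. strength=0.1 is the value from
    the offset-noise write-up; tuning it for a given model is an assumption."""
    if rng is None:
        rng = np.random.default_rng()
    b, c, h, w = shape
    base = rng.standard_normal(shape)           # the usual per-pixel noise
    offset = rng.standard_normal((b, c, 1, 1))  # one scalar per image channel
    return base + strength * offset             # broadcast over height x width

# Illustrative latent shape: 4 samples, 4 channels, 64x64 spatial grid.
noise = offset_noise((4, 4, 64, 64), rng=np.random.default_rng(0))
```

Because the offset is a single scalar broadcast over the whole channel, it only perturbs the zero-frequency component, leaving the rest of the noise spectrum untouched.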
This lighter/darker contrast mirrors the contrast between the white areas to color and the black lines of the drawing. Ideally, we aim for a model that generates mostly white images, yet retains the contrast range needed to trace crisp black lines around the shapes to be colored.
While the model trained without the offset trick already produces visually appealing results with our new dataset, its large black areas remain impractical for coloring.
We discovered that training with this offset trick minimizes the occurrence of large black areas or undesirable background artifacts in the image composition. This method is a promising avenue for our goals with Color Pop.
Stable Diffusion image generation is a powerful tool that we’ve fine-tuned for the creation of coloring book pages. By prioritizing consistency with high-quality datasets and implementing new research ideas around diffusion models, we are steadily progressing towards better results. This enables our creative users to easily generate visually captivating images for coloring and sharing.
The overall improvement of the model is particularly well-received, as we are simultaneously introducing numerous community features in our Color Pop application. These features include the ability to share drawings with others, view the prompts that inspired an AI creation, and color pages created by other users.
Thank you for reading. Stay tuned for more exciting AI improvements in the near future!