
Amazing! Google AI can generate customized 3D mythical game creatures in one click, and you can try it online!


  Just how impressive is an AI that can paint?

  Have you ever imagined what a magical combination of ants and pigs, crabs and whales, or any two of 100 creatures would look like?

  Now, AI can turn all these wild imaginations into reality!

  And all we need to do is click the mouse and doodle casually, like this:

  A rhinoceros' horn, an eagle's wings, a dinosaur's tail, combined to look like this:

  The result looks like a proper piece of professional creative work, and it couldn't be friendlier to painting novices.

  More importantly, it may also spark your creativity, which is one of the Google research team's goals in launching this tool.

  This AI painting tool is called Chimera Painter. It is a web tool that generates highly realistic "monsters" based on animal sketches.

  After completing the doodle, just click the "Transform" button, and it will automatically generate a 3D "monster".

  Interestingly, the Google research team also used the monsters created by Chimera Painter to build a digital card game.

  The attack value of each card in the picture is determined by the monster on it, and each monster's skills are determined by the two species it combines.

  The inspiration for this AI tool came from the "monsters" we often see in games. Google researchers note that creating these monsters usually demands a high degree of artistic creativity and technical knowledge from game artists, while AI can act as a paintbrush that saves them time on artistic creation, such as one-click 3D rendering, and can even enhance their creativity.

  If a game has 100 animals and each can be fused with any other, that is an enormous workload for any artist, but it is easy work for machine learning.
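  That workload claim is easy to make concrete with a quick back-of-the-envelope count (the numbers below are illustrative, not from the article):

```python
from math import comb

def fusion_count(n_species: int, ordered: bool = False) -> int:
    """Number of distinct two-species chimeras from n_species animals."""
    pairs = comb(n_species, 2)          # unordered pairs: ant+pig == pig+ant
    return pairs * 2 if ordered else pairs

print(fusion_count(100))        # 4950 unordered pairs
print(fusion_count(100, True))  # 9900 if "ant body + pig head" differs from the reverse
```

  Even at one illustration per pair, an artist drawing every combination by hand would face thousands of pieces; a trained model generates them on demand.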

  So, how does it do it?

  Generative model based on GAN

  Chimera Painter is built on a machine learning (ML) model. To generate high-quality images of arbitrarily combined monsters, the research team trained the model on thousands of creature images, with special body parts such as claws, legs, and eyes labeled.

  The model was trained as a Generative Adversarial Network (GAN), an architecture we are by now very familiar with. A GAN can generate high-definition, realistic new images using two convolutional neural networks: a generator, which creates new images, and a discriminator, which judges whether an image comes from the training dataset.
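  The adversarial objective behind that generator/discriminator tug-of-war can be sketched with plain numbers. This is a generic illustration of the classic GAN loss, not Google's actual training code; the discriminator scores below are made up:

```python
import math

def bce(probs, labels):
    """Binary cross-entropy, the loss behind the classic GAN objective."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(probs)

# Hypothetical discriminator outputs: D(x) on real images, D(G(z)) on fakes.
d_real = [0.9, 0.8, 0.95]   # discriminator is fairly confident these are real
d_fake = [0.2, 0.1, 0.3]    # and fairly confident these are fake

# Discriminator update: push real images toward label 1 and fakes toward label 0.
d_loss = bce(d_real, [1, 1, 1]) + bce(d_fake, [0, 0, 0])

# Generator update (non-saturating form): push D(G(z)) toward label 1.
g_loss = bce(d_fake, [1, 1, 1])

print(round(d_loss, 3), round(g_loss, 3))  # 0.355 1.705
```

  The high generator loss here reflects a discriminator that is currently winning; training alternates the two updates until the generator's images become hard to tell from real ones.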

  Here, however, the researchers used a variant called a conditional GAN, in which the generator takes a separate input to guide the image generation process. Interestingly, this sets the work apart from many other GANs: those usually aim at photorealism for its own sake, while the purpose of this tool is to fuse different species into a chimera.
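  The usual way to feed a conditioning signal like a body-part map into a GAN is to encode it as extra image channels alongside the color channels. The tiny 2x2 "image", the part list, and the channel layout below are all invented for illustration:

```python
# Hypothetical setup: a 2x2 color image plus a per-pixel body-part label map.
PARTS = ["background", "head", "leg", "eye"]   # one channel per labeled part

rgb = [[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)],
       [(0.7, 0.8, 0.9), (0.2, 0.2, 0.2)]]     # H x W pixels, 3 color values each
part_ids = [[1, 1],
            [2, 3]]                             # per-pixel body-part label index

def one_hot(idx, n):
    """One channel per body part, so the network sees structure explicitly."""
    return [1.0 if i == idx else 0.0 for i in range(n)]

# Conditional-GAN style input: concatenate the color channels with the one-hot
# segmentation channels along the channel axis, pixel by pixel.
disc_input = [[list(rgb[y][x]) + one_hot(part_ids[y][x], len(PARTS))
               for x in range(2)] for y in range(2)]

print(len(disc_input[0][0]))  # 7 channels: 3 color + 4 body-part channels
```

  Real implementations do the same concatenation with tensors, but the principle is identical: the segmentation map rides along as extra channels that condition what the generator draws at each pixel.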

  To train the GAN, the research team created a dataset of full-color images paired with single-species outlines, adapted from 3D creature models. Each outline describes a creature's shape and size and provides a segmentation map identifying its various body parts.

  After training, the model can generate convincing multi-species chimeras from an outline provided by an artist; this trained model is what is embedded in Chimera Painter.

  Creating a structured biological dataset

  When using a GAN to generate new species, spatial coherence can be lost in fine details or low-contrast regions, such as the boundaries between eyes and fingers, or between overlapping body parts with similar textures.

  This places real demands on the training dataset. Existing illustration libraries are unsuitable for training the ML model because they mix conflicting styles or lack variety. A dataset for generating chimeras needs consistency, for example in perspective, composition, and lighting.

  To solve this problem, the researchers developed a user-guided, semi-automatic method for creating an ML training dataset from 3D creature models. In this process, users first create or obtain a set of 3D creature models.

  Specifically, they used Unreal Engine to produce two sets of textures overlaid on each 3D model: one with full-color textures (left image), and one showing each part of the body (head, ears, neck, and so on), called a segmentation map (right image).

  The body segmentation maps in Figure 2 are fed to the model during training to ensure that the GAN learns the specific structure, shape, texture, and proportions of the various creature body parts.

  The 3D creature models were all placed in a 3D scene, again in Unreal Engine. A set of automated scripts then took this scene and interpolated between different poses, viewpoints, and zoom levels for each model, producing the full-color images and segmentation maps that make up the GAN's training dataset.
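  The scripted sweep over poses, viewpoints, and zoom levels is essentially a Cartesian product of render settings. The grid below is a coarse made-up example (the actual scripts interpolate continuously), but it shows how the combinations multiply per creature:

```python
import itertools

# Hypothetical sampling grid for one 3D creature model.
poses      = ["stand", "walk", "rear"]
viewpoints = [0, 45, 90, 135, 180]      # camera yaw in degrees
zooms      = [0.8, 1.0, 1.2]

renders = list(itertools.product(poses, viewpoints, zooms))
print(len(renders))  # 45 image + segmentation-map pairs from this coarse grid
```

  With finer interpolation steps along each axis, the same loop easily yields the 10,000+ pairs per creature mentioned below.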

  Using this method, the researchers generated more than 10,000 image + segmentation-map pairs per 3D creature model, saving roughly 20 minutes per image compared with creating the data by hand.

  Generating high-fidelity images

  The GAN's hyperparameters affect the quality of the images the model outputs. To determine which version of the model performed best, the research team collected and analyzed samples of the creature types each version generated, examining salient features such as depth of field, the style of the creature textures, and the realism of faces and eyes.

  This information was used not only to train new versions of the model, but also to pick the best images from each creature category (gazelle, lynx, gorilla, and so on) once a model had generated thousands of creature images.

  Specifically, the research team optimized the GAN with a perceptual loss. This loss component uses features extracted by a separate convolutional neural network (CNN), pretrained on millions of photos from the ImageNet dataset, to measure the difference between two images.

  Features are extracted from several layers of the CNN, and a weight is applied to each, controlling that feature's contribution to the final loss value. These weights are crucial in determining what the final generated images look like.
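  A minimal sketch of that weighted per-layer loss, with invented feature vectors standing in for real CNN activations (the article does not specify the distance metric; mean squared error is a common choice):

```python
def perceptual_loss(feats_a, feats_b, weights):
    """Weighted sum of per-layer feature distances (a common perceptual-loss form).

    feats_a, feats_b: per-layer feature vectors extracted by the same CNN;
    weights: one scalar per layer, controlling that layer's contribution.
    """
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, weights):
        mse = sum((a - b) ** 2 for a, b in zip(fa, fb)) / len(fa)
        total += w * mse
    return total

# Hypothetical features from three CNN layers for a generated vs. a real image.
feats_gen  = [[0.2, 0.4], [1.0, 0.0, 0.5], [0.3]]
feats_real = [[0.1, 0.5], [0.8, 0.1, 0.5], [0.9]]

# Up-weighting deeper layers emphasizes semantic structure (faces, eyes)
# over low-level texture, which is what the researchers tuned by inspection.
print(perceptual_loss(feats_gen, feats_real, [1.0, 1.0, 1.0]))
print(perceptual_loss(feats_gen, feats_real, [0.5, 1.0, 2.0]))
```

  Changing the weight vector changes which kinds of differences the generator is penalized for, which is why the weight settings visibly alter the style of the output creatures below.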

  Below are the results of GANs trained with different perceptual loss weights.

  The color variations in the image are primarily due to the dataset. This is because a single organism in the dataset often contains multiple textures (e.g., red and gray versions of a bat). However, ignoring color variations, many differences are directly related to changes in the perceptual loss value.

  Researchers found that specific values produce clearer facial features, making the generated organisms more realistic.

  Below are some creatures generated by GANs trained with different perceptual loss weights, showcasing a small subset of the creature types and poses the model can handle.

  Online Experience

  In short, for artists and painting enthusiasts, Chimera Painter makes it easy to create many images by simply adjusting a creature's local shape, type, or part positions, instead of drawing dozens of similar creatures from scratch. The tool also accepts creature outlines created in external programs such as Photoshop.