Jurassic World Remake through Image-to-Image Translation!
Can we now imagine fossils more accurately?
"We can make animal(dinosaur) visuals from skulls(fossils) better than ever!"
Researchers from the University of Rochester set out to evaluate the task of translating an input image from a given source domain into a given target domain. Being able to perform image translation across large domain gaps has a wide variety of real-world applications in criminology, astronomy, environmental conservation, and paleontology.
In this paper, they introduced a new task, Skull2Animal, for translating between skulls and living animals.
For background, current image translation models work on translating between domains of images with small domain gaps, e.g., translating photos to paintings or translating between different types of animals (zebras to horses, cats to dogs). Because these translation tasks are fictitious, with no potential ground truth, a model can learn any mapping at all as long as its results resemble the target domain!
The real value lies in long image-to-image (longI2I) translation: translating across large domain gaps, typically with models guided by text instructions. That guidance constrains the translation and keeps results faithful to the target domain. A model that can do longI2I tasks could be used by law enforcement or conservationists to show how cities, ecosystems, and habitats are impacted by climate change, or to illustrate the dangers wildfires pose to towns and nature. Additionally, paleontologists could use it to imagine how dinosaurs and other ancient animals looked in life.
Specifically, the researchers performed the diffusion process in latent space with Stable Diffusion. This removes the need for a trained classifier for guidance and avoids running the full forward process.
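To make the "not the full forward process" point concrete, here is a minimal NumPy sketch of the forward (noising) step stopped partway: the latent is only partially noised, so source structure survives into the prompt-guided denoising. All names, the schedule, and the shapes are illustrative assumptions, not details from the paper.

```python
import numpy as np

def q_sample(z0, t, alpha_bar, rng):
    """Noise a clean latent z0 to step t: z_t = sqrt(a_bar_t)*z0 + sqrt(1-a_bar_t)*eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # a common linear beta schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 8, 8))   # stand-in for an encoded image latent
# Stop halfway instead of noising to t = T-1: z_t still correlates with z0,
# so the denoiser has source context (pose, layout) to work from.
z_t, _ = q_sample(z0, t=500, alpha_bar=alpha_bar, rng=rng)
```

At small `t` the noised latent stays close to `z0`; near `t = T` it is almost pure noise, which is why running the full forward process would throw away the skull's structure.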
The dataset and code are available at https://tinyurl.com/skull2animal.
Here are some of the evaluation results from the paper:
The researchers had some interesting takeaways:
They showed that traditional I2I methods using GANs are not adequate for translation across large domain gaps.
By providing a classifier or text prompt, they can encode more information about the target domain into the diffusion process.
There is room to improve at encoding enough information about the target domain while retaining intrinsic source context, like head direction, background, and structural details.
Prompting currently provides the best and most scalable information about the target domain. Classifier-guided diffusion models require retraining for specific use cases and lack stronger constraints on the target domain because of the wide variety of images they are trained on.
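The prompt-guidance idea the takeaways point to can be sketched as classifier-free guidance: the denoiser's text-conditioned noise estimate is pushed away from its unconditional estimate, with no separately trained classifier. The arrays below stand in for real U-Net outputs; the function name and scale value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def guided_eps(eps_uncond, eps_text, guidance_scale):
    """Classifier-free guidance: eps_uncond + s * (eps_text - eps_uncond).

    s = 0 ignores the prompt, s = 1 uses the conditional estimate as-is,
    and s > 1 amplifies the direction the text prompt pulls in.
    """
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)

rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 8, 8))  # stand-in: unconditional U-Net output
eps_text = rng.standard_normal((4, 8, 8))    # stand-in: output conditioned on e.g. "a photo of a dog"
eps = guided_eps(eps_uncond, eps_text, guidance_scale=7.5)
```

This is what makes prompting scalable: one pretrained text-conditioned model covers many target domains by swapping the prompt, whereas a guidance classifier must be retrained per use case.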