Phased Consistency Model
So essentially,
Phased Consistency Models ensure AI-generated images stay consistent for characters and objects across sampling steps and don't have irregular features
Paper: Phased Consistency Model (49 Pages)
Researchers from multiple organizations, including The Chinese University of Hong Kong, Avolution AI, Hedra, Shanghai AI Laboratory, SenseTime Research and Stanford University, are working to ensure AI-generated images are free of the uncanny, malformed features that still plague diffusion models.
Hmm..What’s the background?
The consistency model (CM) has brought significant advancements in accelerating diffusion models, specifically by reducing the number of iterative steps needed to generate samples. However, applying CMs, and in particular latent consistency models (LCMs), to high-resolution, text-conditioned image generation has not been fully successful. Generated images still sometimes show seven fingers on one hand or holes in odd places.
Ok, So what is proposed in the research paper?
Three main limitations of LCMs are identified:
Consistency: Due to the inherent properties of CMs and LCMs, they can only use a stochastic multi-step sampling algorithm, which injects fresh noise at every step. As a result, different inference-step settings produce varying degrees of noise and inconsistent results, even with the same seed.
Controllability: While Stable Diffusion models can handle classifier-free guidance (CFG) across a wide range of scales, LCMs, with their current design, are limited to CFG scales of roughly 1-2 before exposure problems appear. This limitation hinders hyper-parameter selection and reduces controllability.
Efficiency: LCMs produce inferior results in few-step settings, particularly below 4 steps, limiting their efficiency.
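The consistency issue above comes from how multi-step CM sampling works: the model jumps to a clean prediction, then re-noises it to a lower noise level, injecting fresh noise at every step. Here is a minimal numpy sketch (the consistency function and noise schedule are toy stand-ins, not the paper's actual model) showing why different step counts diverge even from the same seed:

```python
import numpy as np

def stochastic_multistep_sample(f, x_T, timesteps, rng):
    """Stochastic multi-step CM sampling (toy sketch).

    At each step the consistency function f predicts a clean sample,
    which is then re-noised to the next (lower) noise level. The fresh
    noise injected at every step is why different step counts yield
    different results even from the same starting point and seed.
    """
    x = x_T
    for t in timesteps[1:]:          # descending noise levels
        x0_pred = f(x)               # jump toward the clean manifold
        noise = rng.standard_normal(x.shape)
        x = x0_pred + t * noise      # re-noise to level t
    return f(x)                      # final clean prediction

# toy consistency function: shrink toward zero (stand-in for a network)
f = lambda x: 0.5 * x
x_T = np.random.default_rng(0).standard_normal(4)

two_step = stochastic_multistep_sample(
    f, x_T.copy(), [80.0, 20.0], np.random.default_rng(1))
four_step = stochastic_multistep_sample(
    f, x_T.copy(), [80.0, 40.0, 20.0, 5.0], np.random.default_rng(1))
# same seed and start, different step counts -> different outputs
```

The phased design in the paper removes this re-noising between sub-trajectories, which is what restores consistency across step counts.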
The paper introduces a novel adversarial loss to improve generation quality, particularly in low-step regimes. It involves a discriminator (built from the pre-trained U-Net backbone) that operates in the latent space and aligns the distributions of generated and target data. Additionally, the paper analyzes the impact of classifier-free guidance (CFG) on consistency distillation, revealing that CFG-augmented ODE solvers can limit the effectiveness of CFG during inference.
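To make the two ingredients concrete, here is a hedged numpy sketch: a hinge-style GAN loss on latent-space discriminator scores (the paper's exact loss form and discriminator wiring are not specified here, so the hinge formulation is an assumption), plus the standard CFG extrapolation that consistency distillation has to contend with:

```python
import numpy as np

def hinge_d_loss(d_real, d_fake):
    """Hinge discriminator loss on latent features (assumed form).
    d_real / d_fake are discriminator scores on target vs. generated latents."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def hinge_g_loss(d_fake):
    """Generator (student) adversarial term: push fake scores up."""
    return -np.mean(d_fake)

def cfg_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate the conditional noise
    prediction away from the unconditional one with scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In training, the adversarial term would be added to the consistency objective with some weighting, e.g. `total = cm_loss + lambda_adv * hinge_g_loss(d_fake)` (the weighting scheme is an assumption, not taken from the paper).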
What’s next?
The authors acknowledge some potential avenues for further research:
Exploring Alternative Discriminator Architectures: The paper uses a latent-space discriminator built on the pre-trained U-Net. Investigating alternative discriminator architectures, particularly pixel-based discriminators for high-resolution generation, could be beneficial.
Optimizing Randomness in Sampling: While the paper introduces a parameter ('r') to control randomness during sampling, determining the optimal degree of randomness for different scenarios and exploring adaptive approaches to adjust randomness during generation could be valuable.
Evaluating Different ODE Solvers: The paper primarily uses DDIM as the ODE solver for both training and evaluation. Experimenting with different ODE solvers with varying orders of approximation accuracy and computational costs could lead to further improvements in generation quality or efficiency.
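The last two avenues can be sketched together: a DDIM-style update in which a single knob interpolates between fully deterministic and stochastic sampling. The step below is the standard DDIM update; treating the paper's parameter r as playing the role of DDIM's eta is an assumption for illustration, and the paper's exact parameterisation may differ:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_t, alpha_s, r=0.0, rng=None):
    """One DDIM-style update from noise level t to a lower level s (sketch).

    r=0 gives the deterministic DDIM update; r=1 recovers fully
    stochastic (DDPM-like) sampling. Here r stands in for DDIM's eta.
    """
    # predicted clean sample from the current noisy state
    x0_pred = (x_t - np.sqrt(1 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    # randomness injected at this step, scaled by r
    sigma = (r * np.sqrt((1 - alpha_s) / (1 - alpha_t))
               * np.sqrt(1 - alpha_t / alpha_s))
    # direction pointing back toward x_t
    dir_xt = np.sqrt(1 - alpha_s - sigma**2) * eps_pred
    noise = sigma * rng.standard_normal(x_t.shape) if r > 0 else 0.0
    return np.sqrt(alpha_s) * x0_pred + dir_xt + noise

x = np.ones(3)
eps = np.zeros(3)
deterministic = ddim_step(x, eps, alpha_t=0.5, alpha_s=0.9)          # r=0
stochastic = ddim_step(x, eps, 0.5, 0.9, r=1.0,
                       rng=np.random.default_rng(0))                 # r=1
```

Swapping this solver for a higher-order one, or adapting r per step, corresponds to the two open directions the authors mention.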
So essentially,
Phased Consistency Models ensure AI-generated images stay consistent for characters and objects across sampling steps and don't have irregular features