How to make Vision GPTs more human-aligned?
Paper: GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation (23 Pages)
Researchers from the Chinese University of Hong Kong, Stanford University, Adobe Research, Nanyang Technological University, and the Shanghai Artificial Intelligence Laboratory aim to improve text-to-3D generative models through better evaluation metrics.
Hmm.. What’s the background?
Existing evaluation metrics for text-to-3D generative models are limited, focusing on a single criterion and lacking the flexibility to generalize to different evaluation criteria.
User preference studies can provide adaptable and human-aligned results, but they are expensive to scale.
Ok, so what is proposed in the research paper?
This paper presents an automatic, versatile, and human-aligned evaluation metric for text-to-3D generative models. The proposed method consists of a prompt generator that uses GPT-4V to produce evaluation prompts, and a comparison method that instructs GPT-4V to compare two 3D assets according to user-defined criteria. The pairwise comparison results are then used to automatically rank a set of text-to-3D models by assigning each model an Elo rating, as sketched below.
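To make the ranking step concrete, here is a minimal Python sketch of turning pairwise win/loss/tie judgments into Elo ratings. It uses the standard iterative Elo update averaged over shuffled comparison orders; the model names, K-factor, and averaging scheme are illustrative assumptions, not the paper's exact fitting procedure.

```python
import random
from collections import defaultdict

def elo_ratings(comparisons, k=32, base=1000, shuffles=10, seed=0):
    """Assign Elo ratings to models from pairwise comparison results.

    comparisons: list of (model_a, model_b, outcome) tuples, where outcome is
    1.0 if model_a wins, 0.0 if model_b wins, and 0.5 for a tie. Ratings are
    averaged over several random orderings to reduce order sensitivity.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)

    for _ in range(shuffles):
        ratings = defaultdict(lambda: float(base))
        order = comparisons[:]
        rng.shuffle(order)
        for a, b, outcome in order:
            # Expected score of `a` under the logistic Elo model.
            expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
            ratings[a] += k * (outcome - expected_a)
            ratings[b] += k * ((1.0 - outcome) - (1.0 - expected_a))
        for model, rating in ratings.items():
            totals[model] += rating

    return {model: total / shuffles for model, total in totals.items()}

# Hypothetical GPT-4V pairwise judgments: (model_a, model_b, outcome).
judgments = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
print(elo_ratings(judgments))
```

Averaging over shuffled orderings is just one simple way to make the iterative update less dependent on comparison order; a maximum-likelihood fit over all comparisons would be another option.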
And what’s next?
The researchers discuss several unresolved challenges. The study needs to be scaled up to further verify the hypothesis. GPT-4V's responses are not always accurate; they exhibit hallucinations and systematic errors. According to the paper, ensembling multiple responses helps mitigate these issues (roughly as sketched below), but solving them fundamentally remains an interesting direction.
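As a rough illustration of the ensembling idea, the sketch below majority-votes several independent GPT-4V judgments for the same pairwise comparison. `query_gpt4v` is a hypothetical caller-supplied function, and vote-based aggregation is an assumption about how such ensembling could work, not necessarily the paper's exact procedure.

```python
from collections import Counter

def ensembled_judgment(query_gpt4v, prompt, n_samples=5):
    """Aggregate several independent GPT-4V comparison calls by majority vote.

    query_gpt4v: a caller-supplied function that sends `prompt` to GPT-4V and
    returns one of "A", "B", or "tie" for a single pairwise comparison.
    Sampling the judgment several times and taking the majority reduces the
    impact of occasional hallucinated or inconsistent answers.
    """
    votes = Counter(query_gpt4v(prompt) for _ in range(n_samples))
    verdict, count = votes.most_common(1)[0]
    return verdict, count / n_samples  # verdict plus its empirical agreement rate
```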
So essentially,
GPT-4V can serve as a reliable, human-aligned evaluation metric for text-to-3D generation ✅