ComfyGen: Comfortably generate high quality visuals

Oct 04, 2024

ComfyGen: LLM + ComfyUI gives better results than ComfyUI nodes alone

Paper: ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation (27 Pages)

Researchers from NVIDIA and Tel Aviv University propose a novel approach that uses large language models (LLMs) to improve text-to-image generation quality.

Hmm... What’s the background?

The standard approach to text-to-image generation relies on a single, monolithic model to convert a text prompt into an image. However, users often build complex multi-model workflows from ComfyUI nodes (chaining components such as fine-tuned checkpoints, LoRAs, and upscalers), which can produce higher-quality images. The researchers propose using an LLM to automatically generate these workflows, tailored to each individual user prompt.

OK, so what is proposed in the research paper?

The paper presents two LLM-based approaches for prompt-adaptive workflow generation:

ComfyGen-IC (In-Context): This method uses a closed-source LLM and a table of pre-computed workflow scores for different image categories. Given a new prompt, the LLM classifies the prompt into relevant categories and selects the workflow that scored best for those categories.
ComfyGen-FT (Fine-Tuning): This method fine-tunes an open-source LLM to predict the workflow that achieved a given score for a given prompt. During inference, the LLM is given a new prompt and a high target score, and it predicts a workflow expected to reach that score.
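To make the ComfyGen-IC selection step concrete, here is a minimal sketch in Python. It is not the paper's code: the workflow names, categories, and scores are hypothetical placeholders, a simple keyword matcher stands in for the LLM classifier, and averaging scores over the matched categories is just one plausible way to combine them.

```python
# Hypothetical score table: workflow -> {category: mean score}.
# Names and numbers are placeholders, not results from the paper.
SCORE_TABLE = {
    "workflow_a": {"people": 0.71, "landscape": 0.64, "animals": 0.60},
    "workflow_b": {"people": 0.62, "landscape": 0.75, "animals": 0.66},
    "workflow_c": {"people": 0.68, "landscape": 0.61, "animals": 0.73},
}

# Hypothetical keyword lists; in ComfyGen-IC an LLM does this classification.
CATEGORY_KEYWORDS = {
    "people": {"man", "woman", "portrait", "person", "child"},
    "landscape": {"mountain", "forest", "lake", "valley", "sunset"},
    "animals": {"cat", "dog", "horse", "bird", "fox"},
}


def classify_prompt(prompt: str) -> list[str]:
    """Stand-in for the LLM classifier: return categories whose keywords appear."""
    words = set(prompt.lower().replace(",", " ").split())
    return [cat for cat, keys in CATEGORY_KEYWORDS.items() if words & keys]


def select_workflow(prompt: str) -> str:
    """Pick the workflow with the highest mean score over the prompt's categories."""
    categories = classify_prompt(prompt)
    if not categories:
        # Assumption: with no matched category, average over all categories.
        categories = list(CATEGORY_KEYWORDS)

    def mean_score(workflow: str) -> float:
        scores = SCORE_TABLE[workflow]
        return sum(scores[c] for c in categories) / len(categories)

    return max(SCORE_TABLE, key=mean_score)


print(select_workflow("a fox resting in a forest at sunset"))  # prints workflow_b
```

The example prompt matches both "landscape" and "animals", so the workflow that is strongest on average across the two wins, even though a different workflow is best for animals alone.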
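The score conditioning behind ComfyGen-FT can be illustrated by how training examples and inference queries might be formatted. This is a rough sketch under our own assumptions: the text template, field names, and toy workflow strings are hypothetical (they are not the real ComfyUI workflow schema or the paper's prompt format), the fine-tuning itself is omitted, and using the best score seen in training as the inference target is one natural choice rather than a confirmed detail.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One (prompt, workflow, score) triple; the workflow is stored as serialized text."""
    prompt: str
    workflow: str
    score: float


# Hypothetical template: the model sees a prompt and a score, and must emit a workflow.
TEMPLATE = "Prompt: {prompt}\nScore: {score:.2f}\nWorkflow:"


def to_training_example(rec: Record) -> dict[str, str]:
    """Input = prompt + the score it achieved; target = the workflow that achieved it."""
    return {"input": TEMPLATE.format(prompt=rec.prompt, score=rec.score),
            "target": rec.workflow}


def inference_query(prompt: str, records: list[Record]) -> str:
    """At inference, condition on a high target score: here, the best score in training."""
    target = max(r.score for r in records)
    return TEMPLATE.format(prompt=prompt, score=target)


# Toy placeholder workflows, not real ComfyUI graphs.
records = [
    Record("a red fox in the snow", '{"sampler": "euler"}', 0.58),
    Record("a red fox in the snow", '{"sampler": "dpmpp_2m"}', 0.81),
]
examples = [to_training_example(r) for r in records]
print(inference_query("a lighthouse at dusk", records))
```

Because the model is trained on both low- and high-scoring workflows with their scores attached, asking for a high score at inference steers it toward the kind of workflow that scored well in training.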

In the paper's experiments, ComfyGen consistently outperforms the baseline SDXL model and popular fine-tuned variations (JuggernautXL, DreamshaperXL, and DPO-SDXL). ComfyGen also outperforms the two most popular generic workflows from the training corpus.

What’s next?

In future work, ComfyGen could be extended to handle image-to-image or video generation tasks. Further research is needed to enable ComfyGen to generate truly novel workflows, potentially through collaborative agent approaches.

So essentially,

ComfyGen: LLM + ComfyUI gives better results than ComfyUI nodes alone


