The study involves 25 participants, who annotate a total of 8236 images from the zero-shot MS-COCO 2014 validation human subset. The participants take 2-3 days to complete all user study tasks, followed by a final review to examine the validity of the human preferences. Specifically, we conduct side-by-side comparisons between our generated results and each baseline model’s results. The question asked is “Considering both the image aesthetics and text-image alignment, which image is better? Prompt: .” The labelers are unaware of which image corresponds to which model, i.e., the positions of the two compared images are shuffled to achieve a fair comparison without bias.
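As an illustration of the blinded side-by-side setup described above, the following is a minimal sketch of how one comparison trial could be assembled with shuffled image positions. The function name `make_trial`, the dictionary fields, and the model labels `"ours"` and `"baseline"` are hypothetical and do not come from the paper; the actual annotation tool is not described in the source.

```python
import random


def make_trial(prompt, ours_image, baseline_image, rng):
    """Build one side-by-side trial with randomized left/right placement.

    All names here are illustrative assumptions, not the authors' tooling.
    The labeler would only see the prompt and the two images; the "key"
    mapping stays hidden and is used later to attribute the vote.
    """
    pair = [("ours", ours_image), ("baseline", baseline_image)]
    rng.shuffle(pair)  # randomize which model appears on the left
    (left_model, left_image), (right_model, right_image) = pair
    return {
        "prompt": prompt,
        "left_image": left_image,
        "right_image": right_image,
        "key": {"left": left_model, "right": right_model},
    }


# Example usage with a seeded generator for reproducibility.
rng = random.Random(0)
trial = make_trial("a person riding a bike", "ours_0001.png", "base_0001.png", rng)
```

Keeping the hidden key separate from what the labeler sees is one simple way to preserve blinding while still allowing votes to be mapped back to models afterwards.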

Additional Ablation Results for Structure-Guided Refiner. Due to resource limits and the resolution discrepancy, we experiment at 512×512 resolution to illustrate the efficacy of our design.

We note that all the labelers are well trained for such text-to-image generation comparison tasks: they have passed an examination on a test set and have performed this kind of comparison more than 50 times. Below, we include the user study rating details for our method vs. baseline models. Each labeler can click one of four options: a) The left image is better, in which case the corresponding model gets a grade of +1. b) The right image is better, scored in the same way. c) NSFW, which means the prompt or image contains NSFW content, in which case both models get a grade of 0. d) Hard Case, where the labeler finds it hard to tell which image has better quality, in which case both models get a grade of +0.5. The detailed comparison statistics are shown in Table 8, where we report the grades of HyperHuman vs. baseline methods. Our proposed framework is clearly superior to all the existing models, with better image quality, realism, aesthetics, and text-image alignment.
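The grading rule above can be sketched as a small tally function. This is only an illustrative sketch: the choice labels (`"left"`, `"right"`, `"nsfw"`, `"hard"`), the function name `tally_grades`, and the vote tuple layout are hypothetical, and it assumes that a vote for the right image gives the right-hand model +1, symmetric to the left-image case.

```python
from collections import defaultdict


def tally_grades(votes):
    """Sum grades per model from a list of labeler votes.

    votes: iterable of (choice, left_model, right_model) tuples, where
    choice is one of "left", "right", "nsfw", "hard" (hypothetical labels).
    """
    grades = defaultdict(float)
    for choice, left_model, right_model in votes:
        # Touch both keys so a model with only zero-grade votes still appears.
        grades[left_model] += 0.0
        grades[right_model] += 0.0
        if choice == "left":
            grades[left_model] += 1.0
        elif choice == "right":
            # Assumed symmetric to the left-image case.
            grades[right_model] += 1.0
        elif choice == "hard":
            grades[left_model] += 0.5
            grades[right_model] += 0.5
        elif choice == "nsfw":
            pass  # both models get 0 for this comparison
        else:
            raise ValueError(f"unknown choice: {choice!r}")
    return dict(grades)


# Example usage: four votes with shuffled placement.
example_votes = [
    ("left", "ours", "baseline"),
    ("right", "baseline", "ours"),
    ("hard", "ours", "baseline"),
    ("nsfw", "ours", "baseline"),
]
example_grades = tally_grades(example_votes)
```

Because placement is shuffled, each vote carries the model shown on each side, so the tally attributes grades to models and not to screen positions.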

Detailed Comparison Statistics in User Study. We conduct a comprehensive user study on the zero-shot MS-COCO 2014 validation human subset with well-trained participants.

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
