Datasets. During training, we randomly sample a 30% subset of the SA-1B dataset [34]; the resulting subset contains ∼3 million images and ∼0.3 billion masks. Although this supervision provides diverse binary masks, it lacks a semantic class label for each mask. In addition, following Chen et al. [8], we collect about 1.3 million image-text pairs and use a large vision-language model to refine them. Afterward, we use a ChatGPT-based parser to extract entities with descriptive words from these text descriptions.

Evaluation & metrics. We evaluate our model mainly on three tasks: open-vocabulary semantic segmentation, open-vocabulary panoptic segmentation, and promptable segmentation. Following previous work [76], we adopt prompt engineering from [21, 66] and prompt templates from [22, 37]. For open-vocabulary semantic segmentation, we evaluate the model zero-shot on the COCO [40], ADE20K [84], and PASCAL [18] datasets and report the mean Intersection-over-Union (mIoU). For open-vocabulary panoptic segmentation, we evaluate the model on the COCO, ADE20K, and Cityscapes [15] datasets and report the panoptic quality (PQ), segmentation quality (SQ), and recognition quality (RQ). For promptable segmentation, we report the 1-Point and 1-Box IoU (Oracle) on a wide range of datasets. Oracle denotes that, among the output masks, we select the one with the highest IoU against the target mask. More details can be found in Appendix B.
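
The metrics named above can be illustrated with a minimal NumPy sketch. It is not the evaluation code used in the paper: the function names, the `ignore_index=255` default, and the choice to score two empty masks as 0 are assumptions made for illustration. `mean_iou` works on a single label array, whereas benchmark toolkits usually accumulate intersections and unions over the whole dataset before averaging. `panoptic_quality` follows the standard definition in which PQ is the product of SQ and RQ, and it assumes the matching of predicted to ground-truth segments has already been done.

```python
import numpy as np


def mask_iou(pred, target):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 0.0  # assumption: two empty masks score 0
    return float(np.logical_and(pred, target).sum() / union)


def oracle_iou(candidates, target):
    """Pick the candidate mask with the highest IoU against the target.

    Returns (index of the best candidate, its IoU).
    """
    ious = [mask_iou(c, target) for c in candidates]
    best = int(np.argmax(ious))
    return best, ious[best]


def mean_iou(pred_labels, gt_labels, num_classes, ignore_index=255):
    """Mean IoU over classes present in the prediction or the ground truth."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    valid = gt_labels != ignore_index  # pixels excluded from scoring
    ious = []
    for c in range(num_classes):
        p = (pred_labels == c) & valid
        g = (gt_labels == c) & valid
        union = (p | g).sum()
        if union > 0:
            ious.append((p & g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0


def panoptic_quality(tp_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) from matched-segment IoUs and unmatched counts.

    tp_ious: IoUs of matched (true-positive) segment pairs.
    num_fp:  number of unmatched predicted segments.
    num_fn:  number of unmatched ground-truth segments.
    """
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(tp_ious) / tp if tp else 0.0  # average IoU of matches
    rq = tp / denom                        # F1-style recognition term
    return sq * rq, sq, rq
```

For a single point or box prompt that yields several candidate masks, `oracle_iou(candidates, target)` gives the per-prompt score; averaging it over all prompts in a dataset would give a dataset-level Oracle IoU under these assumptions.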

Table 1. Open-vocabulary semantic segmentation performance. We mainly compare with fully supervised and weakly supervised methods. “COCO S.”, “COCO P.”, and “COCO C.” denote the COCO stuff, panoptic, and caption datasets. “O365” denotes the Object 365 dataset. “M. 41M” denotes the merged 41M image dataset. We report mIoU for all datasets.

:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
