Scores of various methods on the PEARL benchmark.
# | Tagger | Filter | Selector | Locator | In Mask | Score | MTurk | Expert |
- | Natural Placement* | | | | 0.907 | 17.987 | 1.000 | 1.000 |
1 | RAM++ | G-DINO | GPT-4 | G-SAM (Center) | 0.702 | 7.634 | 0.527 | 0.690 |
2 | GPT-4V(ision) | | | CLIPSeg (Max) | 0.692 | -4.492 | 0.582 | 0.620 |
3 | GPT-4V(ision) | | | G-SAM (Center) | 0.686 | 4.317 | 0.580 | - |
4 | RAM++ | G-DINO | GPT-4 | CLIPSeg (Max) | 0.671 | -4.185 | 0.547 | - |
5 | LLaVa-v1.5-13B | | | CLIPSeg (Max) | 0.649 | -13.17 | - | - |
6 | SCP | G-DINO | GPT-4 | G-SAM (Center) | 0.615 | -6.464 | - | - |
7 | SCP | CLIPSeg | GPT-4 | G-SAM (Center) | 0.613 | -10.783 | - | - |
8 | SCP | G-DINO | GPT-4 | CLIPSeg (Max) | 0.596 | -13.005 | - | - |
9 | SCP | ViLT | GPT-4 | CLIPSeg (Max) | 0.588 | -15.300 | 0.514 | 0.570 |
10 | SCP | CLIPSeg | GPT-4 | CLIPSeg (Max) | 0.572 | -20.730 | - | - |
11 | GPT-4V (Pixel Location) | | | | 0.321 | -34.282 | - | - |
12 | InstructPix2Pix | | | G-SAM (Bottom) | 0.283 | -60.852 | - | - |
13 | Random Placement* | | | | 0.161 | -106.113 | 0.467 | 0.040 |
14 | Unnatural Placement* | | | | 0.010 | -176.375 | 0.167 | 0.020 |