referring-expression-comprehension

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal pretraining image-captioning text-to-image-synthesis visual-question-answering referring-expression-comprehension vision-language pretrained-models prompt prompt-tuning chinese

Python 2.53 k

1 year ago

MasterBin-IIAU / UNINEXT

[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval

instance-segmentation object-detection object-tracking perception referring-expression-comprehension referring-expression-segmentation unified-model multi-object-tracking-segmentation multiple-object-tracking referring-video-object-segmentation video-instance-segmentation single-object-tracking video-object-segmentation

Python 1.28 k

2 years ago

FoundationVision / GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

foundation-model object-detection open-world tracking open-vocabulary-detection open-vocabulary-segmentation open-vocabulary-video-segmentation referring-expression-comprehension referring-expression-segmentation video-instance-segmentation video-object-segmentation zero-shot-object-detection referring-video-object-segmentation interactive-segmentation segment-anything

Python 1.15 k

1 year ago

henghuiding / ReLA

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

multimodal-learning referring-expression-comprehension referring-expression-segmentation vision-language-transformer cvpr2023

Python 690

2 years ago

shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

image-segmentation object-detection open-world referring-expression-comprehension vision-language-transformer

Python 588

1 year ago

henghuiding / MeViS

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

multimodal-learning referring-expression-comprehension referring-expression-segmentation referring-video-object-segmentation video-understanding

Python 519

1 month ago

Charles-Xie / awesome-described-object-detection

A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull requests w...

Awesome Lists open-vocabulary-detection referring-expression-comprehension

313

2 months ago

henghuiding / gRefCOCO

A benchmark dataset for GRES and GREC [CVPR2023 Highlight]

dataset referring-expression-comprehension referring-expression-segmentation

Python 238

2 years ago

luogen1996 / MCN

[CVPR2020 Oral] Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

cvpr2020 referring-expression-comprehension referring-expression-segmentation multi-task-learning

Python 139

3 years ago

shikras / d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

multi-modal-learning object-detection referring-expression-comprehension vision-language dataset open-vocabulary-detection

Python 137

1 year ago

IDEA-Research / Rex-Thinker

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

mllm object-detection referring-expression-comprehension grpo

Python 116

3 months ago

MILVLG / rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

vision-and-language vqa pre-training image-text-retrieval referring-expression-comprehension

Python 56

2 years ago

luogen1996 / SimREC

A lightweight codebase for referring expression comprehension and segmentation

referring-expression-comprehension referring-expression-segmentation

Python 55

3 years ago

xuyang-liu16 / VGDiffZero

[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

machine-vision vision-language-model zero-shot-learning stable-diffusion text-to-image-generation referring-expression-comprehension

Python 16

7 months ago

Disguiser15 / RefTeacher

RefTeacher is a strong baseline method for Semi-Supervised Referring Expression Comprehension.

referring-expression-comprehension semi-supervised-learning

Python 12

2 years ago

haoxiangzhao12138 / REIR

[ACMMM'25] Referring Expression Instance Retrieval and A Strong End-to-End Baseline

multimodal-deep-learning referring-expression-comprehension text-image-retrieval

2 months ago

willemsenbram / a-game-of-sorts

Repository for the paper "Collecting Visually-Grounded Dialogue with A Game Of Sorts"

dataset dialogue referring-expression-comprehension vision-and-language

Shell 4

2 years ago

lparolari / harlequin

Code and DataLoader for the Harlequin dataset 🎨 described in the paper "Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension", presented at ICPR'24

referring-expression-comprehension synthetic-data-generation

Python 3

10 months ago