#Natural Language Processing#EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
#Computer Science#Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
New generation of CLIP with fine-grained discrimination capability, ICML 2025
The back end of a cross-modal retrieval system, which will contain services such as semantic location, etc.
Toward Universal Multimodal Embedding
PyTorch implementation of 'CLIP' (Radford et al., 2021) from scratch, trained on Flickr8k + Flickr30k
[ACMMM'25] Referring Expression Instance Retrieval and A Strong End-to-End Baseline
#Computer Science#PIMA - A Novel Approach for Pill-Prescription Matching with GNN Assistance and Contrastive Learning
The LLM-Powered Video Search System is an advanced multimodal video search solution that leverages Large Language Models (LLMs) to enhance video retrieval through text, image, and metadata queries.
#Search#A search engine built on the OpenAI CLIP model that retrieves images matching textual queries.
#Computer Science#Digimon Dataset for MultiModal Machine Learning
VisAlign: Aligning Visual Representations with Textual Semantics for Image Similarity and Retrieval