GitHub Chinese Community
©2025 GitHub Chinese Community

#unstructured-data

iterative / dvc

#Productivity Tools Collection# 🦉 Data Versioning and ML Experiments

data science, machine learning, reproducibility, data-version-control, developer-tools, artificial intelligence, unstructured-data
Python 14.54 k
4 days ago
voxel51 / fiftyone

#Computer Science# Refine high-quality datasets and visual AI models

machine learning, artificial intelligence, deep learning, computer vision, developer-tools, data science, Python, active-learning, data-centric-ai, data-cleaning, data-curation, data-quality, image-classification, object-detection, unstructured-data, vector-search, visualization
Python 9.59 k
14 hours ago
Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

etl-pipeline, llm-platform, unstructured-data
Python 5.34 k
2 days ago
neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs

data-import, genai, graph, graph-rag, graph-search, graphdb, graphrag, knowledge-graph, langchain, Neo4j, rag, unstructured-data, vectordb
Jupyter Notebook 3.57 k
11 days ago
towhee-io / towhee

#Large Language Models# Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

machine learning, convolutional-networks, embedding-vectors, embeddings, computer vision, image processing, video-processing, feature-extraction, image-retrieval, unstructured-data, feature-vector, transformer, milvus, vision-transformer, vit, pipeline, large language models
Python 3.38 k
8 months ago
instill-ai / instill-core

#Large Language Models# 🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications

unstructured-data, low-code, developer-tools, etl, no-code, Open Source, Hacktoberfest, artificial intelligence, API, command-line interface, generative-ai, Go, gpt, large language models, pipeline, Python, stable-diffusion, TypeScript
Python 2.25 k
10 days ago
milvus-io / bootcamp

#Natural Language Processing# Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

milvus, unstructured-data, image-search, audio-search, question-answering, deep learning, natural language processing, image-classification, image-recognition, Python, embeddings, large language models, rag, semantic-search, vector-database
Jupyter Notebook 2.14 k
3 days ago
nomic-ai / nomic

Interact, analyze and structure massive text, image, embedding, audio and video datasets

Python, clustering, duplicate-detection, embeddings, text, topic-modeling, unstructured-data
Python 1.72 k
5 days ago
dingodb / dingo

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ...

serving, embedding-store, vector-database, mysql-compatibility, embedding-search, key-value-distributed-store, vector-ocean, unified-sql, structured-data, unstructured-data
Java 1.52 k
3 days ago
tstanislawek / awesome-document-understanding

#Natural Language Processing# A curated list of resources for Document Understanding (DU) topic

Awesome Lists, machine learning, information-extraction, key-information-extraction, document-understanding, robotic-process-automation, document-analysis, document-layout-analysis, OCR, natural language processing, deep learning, pdf, rpa, pdf-documents, document-intelligence, unstructured-data, document-ai
1.42 k
2 years ago
emcf / thepipe

#Web Crawlers# Get clean data from tricky documents, powered by vision-language models ⚡

multimodal, pdf, vision-transformer, large-language-models, Web, document, openai, Python, scraping, vision-language-model, structured-data, unstructured-data
Python 1.27 k
13 days ago
lotus-data / lotus

#Large Language Models# LOTUS: A semantic query engine for fast and easy LLM-powered data processing

data, large language models, pandas, Python, semantic-search, unstructured-data
Python 1.2 k
18 days ago
Renumics / spotlight

#Computer Science# Interactively explore unstructured datasets from your dataframe.

data-centric-ai, data-curation, data visualization, computer vision, machine learning, audio, exploratory-data-analysis, Image, timeseries, Video, meshes, unstructured-data, Hacktoberfest
TypeScript 1.18 k
4 days ago
shcherbak-ai / contextgem

#Natural Language Processing# ContextGem: Effortless LLM extraction from documents

artificial intelligence, data-extraction, document-intelligence, generative-ai, legaltech, large language models, llm-framework, natural language processing, prompt-engineering, text-analysis, unstructured-data, docx
Python 1.15 k
11 days ago
yobix-ai / extractous

#Natural Language Processing# Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

extraction, pdf, tika, unstructured, unstructured-data, data-pipelines, docx, etl, etl-pipelines, large language models, machine learning, natural language processing, OCR, pdf-parser, rag, Rust
Rust 1.14 k
6 months ago
amphi-ai / amphi-etl

Visual Data Preparation and Transformation. Low-Code Python-based ETL.

data, data-pipelines, etl, structured-data, unstructured-data, data analysis, data science, data-preparation
TypeScript 1.07 k
20 days ago
databricks / lilac

Curate better data for LLMs

artificial intelligence, data analysis, dataset-analysis, unstructured-data
Python 1.04 k
1 year ago
JSv4 / OpenContracts

#Large Language Models# Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

agent, agentic-ai, etl, etl-pipeline, large language models, unstructured-data, vector-database, prompt-engineering
Python 878
6 days ago
nuclia / nucliadb

#Search# NucliaDB, The AI Search database for RAG

database, language-model, machine learning, mlops, search, search engine, search-engines, semantic, semantic-search-engine, text-classification, unstructured-data, vector-search, vector-search-engine, vectors, Python, Rust
Python 696
4 days ago
NanoNets / docext

#Natural Language Processing# An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)

document, document-analysis, extraction, large language models, machine learning, natural language processing, OCR, rag, unstructured-data, vlms, table-extraction
Python 612
5 days ago