GitHub Chinese Community
extract-data

opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.

extract-data, layout-analysis, OCR, Parser, pdf, pdf-converter, Python, document-analysis, pdf-parser, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, ai4science
Python 35.05 k
1 day ago
pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

mupdf, xps, pdf-documents, epub, OCR, pdf, fonts, Python, data science, extract-data, table-extraction, tesseract, text-processing, text-shaping
Python 7.38 k
2 days ago
bda-research / node-crawler

#Web crawler# Web Crawler/Spider for NodeJS + server-side jQuery ;-)

crawler, JavaScript, spider, extract-data, cheerio, jQuery, Node.js
TypeScript 6.76 k
19 days ago
meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

DataOps, elt, Open Source, data, pipelines, extract-data, connectors, integration, tap, loaders, data-pipelines, data-engineering
Python 2.1 k
5 days ago
DocumindHQ / documind

Open-source platform for extracting structured data from documents using AI.

artificial intelligence, large language models, Open Source, pdf-extractor, developer-tools, OCR, document-analysis, extract-data, Parser, pdf, pdf-converter, pdf-extractor-llm
JavaScript 1.33 k
1 month ago
markummitchell / engauge-digitizer

Extracts data points from images of graphs

image-analysis, extract-data, Utility Software
C++ 1.27 k
3 years ago
elixir-crawly / crawly

#Web crawler# Crawly, a high-level web crawling & scraping framework for Elixir.

Elixir, Erlang, scraper, scraping, scraping-websites, extract-data, spider, crawler, crawling
Elixir 1.03 k
9 months ago
slotix / dataflowkit

#Web crawler# Extract structured data from websites. Website scraping.

Go, golang-library, extract-data, scraping-websites, crawling, scraper, scraping, cdp, headless
Go 686
2 years ago
OmkarPathak / ResumeParser

A simple resume parser used for extracting information from resumes

resume-parser, extract-data, GUI, Python, Parser
Python 304
1 year ago
danschultzer / receipt-scanner

Receipt scanner extracts information from your PDF or image receipts - built in NodeJS

OCR, optical-character-recognition, extract-data, extract-information
JavaScript 299
7 years ago
Qusic / TraceUtility

Extract data from .trace documents generated by Instruments

instruments, extract-data, reverse engineering, profiling, Xcode
Objective-C 225
5 years ago
m92vyas / llm-reader

#Web crawler# Turn a webpage into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.

extract-data, large language models, llm-agent, scraper, scraping, scraping-websites, webscraping, ai-agent-tools, ai-agents, firecrawl, rag
Python 196
16 days ago
yuanxu-li / html-table-extractor

#Web crawler# Extract data from an HTML table.

extract-data, beautifulsoup, HTML, table, scraping, crawler
Python 86
5 years ago
ropensci / smapr

An R package for acquisition and processing of NASA SMAP data

NASA, raster, extract-data, peer-reviewed, R, r-package, rstats
R 85
2 years ago
msoap / html2data

Library and cli for extracting data from HTML via CSS selectors

Go, extract-data, Homebrew, HTML, css-selector, command-line interface, Library, scrapping, Parser
Go 69
9 months ago
CairX / extract-colors-py

Extract colors from an image. Colors are grouped based on visual similarities using the CIE76 formula.

extract-data
Python 68
5 years ago
isaacmg / fb_scraper

FBLYZE is a Facebook scraping system and analysis system.

facebook-scraper, Apache Spark, kafka, extract-data, flink
Jupyter Notebook 64
4 years ago
Techcatchers / PyLyrics-Extractor

Get lyrics for any song in less than 2 seconds by passing in the song name (correctly spelled or misspelled) with this Python library.

extract-data, search-algorithm, python-library
Python 58
1 year ago
fivesmallq / web-data-extractor

#Web crawler# Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web formats such as HTML, XML and JSON.

spider, extract-data, jsonpath, xpath
Java 54
1 year ago
asad70 / Insider-Trading

This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.

extract-data, algotrading, trading, trading-strategies, insiders, data science
Python 53
3 years ago