GitHub Chinese Community
©2025 GitHub Chinese Community

#data-extraction

getmaxun / maxun

#Web Crawler# A visual scraping platform that collects data through mouse clicks.

Topics: automation, no-code, scraper, web-automation, web-scraper, web-scraping, API, browser, browser-automation, Playwright, self-hosted, website-to-api, robotic-process-automation, rpa, no-code-web-scraper, agents, web-agent, data-extraction, web-scraping-agent, webscraping
TypeScript · 13.03k stars
Updated 2 days ago
vi3k6i5 / flashtext

#Natural Language Processing# Extract keywords from sentences or replace keywords in sentences.

Topics: search-in-text, keyword-extraction, natural language processing, word2vec, data-extraction
Python · 5.66k stars
Updated 2 months ago
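Keyword extraction and replacement of the kind flashtext describes can be illustrated with a character trie that is walked once across the text. The sketch below is a minimal illustration under stated assumptions, not flashtext's API or its exact algorithm: `build_trie`, `extract_keywords`, and `replace_keywords` are hypothetical names, matching is case-insensitive, the longest keyword wins, and a match must start and end on a word boundary (letters, digits, and `_` count as word characters).

```python
from typing import Dict, List, Tuple

END = "__end__"  # hypothetical sentinel key marking the end of a complete keyword


def is_word_char(ch: str) -> bool:
    # Assumption: letters, digits and underscore form words.
    return ch.isalnum() or ch == "_"


def build_trie(keywords: Dict[str, str]) -> dict:
    """Store each keyword (lowercased, one char per level) with the clean name it maps to."""
    root: dict = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root


def extract_keywords(text: str, trie: dict) -> List[Tuple[str, int, int]]:
    """Return (clean_name, start, end) for the longest whole-word match at each position."""
    # Assumption: lowercasing keeps the length unchanged (true for ASCII text),
    # so indices into `lowered` are also valid indices into `text`.
    lowered = text.lower()
    n = len(lowered)
    found: List[Tuple[str, int, int]] = []
    i = 0
    while i < n:
        # Skip positions inside a word: a keyword may only start at a word boundary.
        if i > 0 and is_word_char(lowered[i - 1]) and is_word_char(lowered[i]):
            i += 1
            continue
        node, j, best = trie, i, None
        # Walk the trie as far as the text allows, remembering the longest
        # complete keyword that also ends on a word boundary.
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            if END in node and (j == n or not is_word_char(lowered[j])):
                best = (node[END], j)
        if best is not None:
            found.append((best[0], i, best[1]))
            i = best[1]  # resume after the match, so matches never overlap
        else:
            i += 1
    return found


def replace_keywords(text: str, trie: dict) -> str:
    """Replace every matched keyword with its clean name."""
    parts: List[str] = []
    last = 0
    for clean_name, start, end in extract_keywords(text, trie):
        parts.append(text[last:start])
        parts.append(clean_name)
        last = end
    parts.append(text[last:])
    return "".join(parts)


trie = build_trie({"big apple": "New York", "python": "Python"})
sentence = "I love Big Apple and python, not pythonic code."
print(extract_keywords(sentence, trie))  # [('New York', 7, 16), ('Python', 21, 27)]
print(replace_keywords(sentence, trie))  # I love New York and Python, not pythonic code.
```

Scanning the text once against a trie keeps the cost roughly proportional to the text length rather than to the number of keywords, which is the usual motivation for this design over running one regular expression per keyword.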
D4Vinci / Scrapling

#Web Crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library that makes web scraping as easy and effortless as it should be!

Topics: crawler, crawling, Hacktoberfest, Playwright, Python, scraping, selectors, stealth-game, web-scraper, web-scraping, web-scraping-python, webscraping, xpath, automation, AI, ai-scraping, data, data-extraction
Python · 5.4k stars
Updated 15 days ago
JonathanLink / PDFLayoutTextStripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...

Topics: layout, text, Java, pdf, extract, data-extraction, pdfbox
Java · 1.59k stars
Updated 1 year ago
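The general idea of keeping a PDF's layout in plain-text output can be sketched by snapping each positioned text fragment onto a monospaced character grid, so that table columns still line up. This illustrates the technique only and is not how PDFLayoutTextStripper or PDFBox is necessarily implemented; it assumes the fragments were already extracted as hypothetical `(x, y, text)` tuples in page points with `y` growing downward, and the `char_width` and `line_height` defaults are arbitrary choices.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (x, y, text), x and y in page points, y growing downward.
Fragment = Tuple[float, float, str]


def render_layout(fragments: List[Fragment], char_width: float = 6.0,
                  line_height: float = 12.0) -> str:
    """Place each fragment on a character grid so columns line up as on the page."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for x, y, text in fragments:
        row = round(y / line_height)  # text line the fragment belongs to
        col = round(x / char_width)   # character column it starts in
        rows.setdefault(row, []).append((col, text))
    if not rows:
        return ""
    lines: List[str] = []
    for row in range(min(rows), max(rows) + 1):  # rows with no text become blank lines
        line = ""
        for col, text in sorted(rows.get(row, [])):
            if len(line) < col:
                line += " " * (col - len(line))  # pad with spaces up to the target column
            elif line:
                line += " "  # fragment would touch or overlap the previous one
            line += text
        lines.append(line.rstrip())
    return "\n".join(lines)


table = [(0, 0, "Item"), (60, 0, "Qty"), (0, 12, "Apple"), (60, 12, "3")]
print(render_layout(table))
# Item      Qty
# Apple     3
```

A real extractor would also have to estimate the character width from the fonts on the page; a fixed `char_width` keeps the sketch short.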
hi-primus / optimus

#Computer Science# 🚚 Agile data preparation workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Topics: Apache Spark, pyspark, data-wrangling, bigdata, data science, data-cleansing, data-transformation, machine learning, data-profiling, data-extraction, data-exploration, data analysis, data-preparation, cudf, dask, data-cleaning
Python · 1.51k stars
Updated 6 months ago
shcherbak-ai / contextgem

#Natural Language Processing# ContextGem: Effortless LLM extraction from documents

Topics: AI, data-extraction, document-intelligence, generative-ai, legaltech, large language models, llm-framework, natural language processing, prompt-engineering, text-analysis, unstructured-data, docx
Python · 1.15k stars
Updated 11 days ago
raznem / parsera

#Web Crawler# Lightweight library for scraping websites with LLMs

Topics: data-extraction, large language models, scraping, Python, Open Source, webscraping, AI, ai-scraping, Playwright
Python · 1.11k stars
Updated 13 days ago
thinh-vu / vnstock

A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone

Topics: stock-market, data-extraction, quantitative-finance, quantitative-analysis, quantitative-trading
Python · 876 stars
Updated 7 days ago
polyrabbit / hacker-news-digest

#Web Crawler# 📰 Let ChatGPT Summarize Hacker News for You

Topics: hacker-news, Python, data-extraction, hacker-news-reader, RSS, spider, crawler, machine learning, news-aggregator, ChatGPT, ChatGPT API, openai, openai-api
Python · 718 stars
Updated 2 months ago
adrienjoly / npm-pdfreader

🚜 Parse text and tables from PDF files.

Topics: data-extraction, pdf-converter, Parsing, JavaScript, tabular-data
HTML · 675 stars
Updated 5 months ago
brightdata / brightdata-mcp

#Web Crawler# A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

Topics: large language models, mcp, modelcontextprotocol, scraping, ai-agents, browser-automation, data-collection, data-extraction, mcp-server, structured-data, web-crawling, web-scraping
JavaScript · 614 stars
Updated 10 days ago
a-maliarov / amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

Topics: captcha, captcha-solver, amazon, Python, pillow, training-data, data-extraction
Python · 481 stars
Updated 6 days ago
py-pdf / benchmarks

Benchmarking PDF libraries

Topics: benchmark, data-extraction, mupdf, pdf, pypdf2, text-extraction
Python · 286 stars
Updated 2 years ago
jpjacobpadilla / Stealth-Requests

Undetected web-scraping & seamless HTML parsing in Python!

Topics: Python, http-client, data, html-parsing, http-requests, requests, web-crawler, web-scraping, webscraping, xpath, data-extraction
Python · 256 stars
Updated 17 days ago
serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

Topics: automation, command-line interface, command-line-tool, data-extraction, data-extractor, email, email-extract-with-proxy, email-extraction, email-extractor, email-marketing, Open Source, Ruby, rubygem, web-crawling, webscraping
Ruby · 181 stars
Updated 1 year ago
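Pulling email addresses out of page text, one of the tasks clauneck describes, can be sketched with a regular expression. This is not clauneck's code (clauneck is written in Ruby, and fetching pages via Google Search results is out of scope here); the pattern is a deliberately simplified assumption that covers common addresses and is not a full RFC 5322 validator, and `extract_emails` is a hypothetical name.

```python
import re
from typing import List

# Simplified pattern (an assumption, not RFC 5322): local part, "@", dot-separated
# domain labels, and an alphabetic top-level domain of at least two letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")


def extract_emails(text: str) -> List[str]:
    """Return unique addresses in order of first appearance."""
    seen = set()
    emails: List[str] = []
    for match in EMAIL_RE.finditer(text):
        # Simplification: de-duplicate case-insensitively, although the
        # local part of an address is technically case-sensitive.
        email = match.group(0).lower()
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


page_text = "Contact sales@example.com or Sales@Example.com; press: press@news.example.co.uk."
print(extract_emails(page_text))  # ['sales@example.com', 'press@news.example.co.uk']
```

Because the top-level domain must be alphabetic and follow a dot, a sentence-ending period after an address is not swallowed into the match.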
molybdenum-99 / infoboxer

Wikipedia information extraction library

Topics: wikipedia, MediaWiki, data-extraction
Ruby · 175 stars
Updated 1 year ago
sypht-team / sypht-python-client

A Python client for the Sypht API

Topics: data-extraction, information-extraction, API, Python, python3-library, invoice, extract, extract-data-from-pdf, pdf-parser
Python · 162 stars
Updated 1 year ago
johnbumgarner / newspaper3_usage_overview

This repository provides usage examples for the Python module Newspaper3k.

Topics: news, scraping-websites, Python, data-extraction, beautifulsoup, python-requests, nlp-parsing
Python · 147 stars
Updated 1 year ago
dilawar / PlotDigitizer

A Python utility to digitize plots.

Topics: data-extraction, Python, image processing
Python · 142 stars
Updated 10 months ago
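Digitizing a plot ultimately requires mapping image pixel coordinates back to data coordinates. Assuming linear axes and two known calibration points per axis (for example, two labeled tick marks picked by the user), that mapping is a linear interpolation, sketched below. This illustrates the general technique and is not PlotDigitizer's API; `AxisCalibration` and `digitize` are hypothetical names, and log-scaled axes are not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AxisCalibration:
    """Two reference points on one axis: pixel position and the known data value there."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (assumes a linear, not logarithmic, axis).
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points: List[Tuple[float, float]], x_axis: AxisCalibration,
             y_axis: AxisCalibration) -> List[Tuple[float, float]]:
    """Convert (pixel_x, pixel_y) points traced on the image into data coordinates."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points]


# Image rows grow downward, so pixel row 400 is the bottom of the y axis (y = 0)
# and pixel row 100 is higher up (y = 30); the negative scale handles the flip.
x_axis = AxisCalibration(pixel_a=50, value_a=0.0, pixel_b=450, value_b=10.0)
y_axis = AxisCalibration(pixel_a=400, value_a=0.0, pixel_b=100, value_b=30.0)
print(digitize([(50, 400), (250, 250), (450, 100)], x_axis, y_axis))
```

With these calibrations the three traced points map to (0, 0), (5, 15), and (10, 30), up to floating-point rounding.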
MiniAiLive / ID-DocumentRecognition-SDK-Docker

MiniAiLive Intelligent ID OCR for reliable identity verification. From document verification to data entry, the MiniAiLive OCR solution can help transform your identity verification process.

Topics: Authentication, biometrics, data-extraction, document-classification, document-scanner, ekyc-verification, OCR, onboarding
Python · 126 stars
Updated 1 month ago