data-extraction

#Web Scraping# A visual scraping platform that collects data through point-and-click mouse interaction

automation no-code scraper web-automation web-scraper web-scraping API browser browser-automation Playwright self-hosted website-to-api robotic-process-automation rpa no-code-web-scraper agents data-extraction webscraping

TypeScript 13.62 k

2 days ago

D4Vinci / Scrapling

#Web Scraping# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation AI ai-scraping data data-extraction mcp mcp-server

Python 7.31 k

7 hours ago

vi3k6i5 / flashtext

#NLP# Extract Keywords from sentence or Replace keywords in sentences.

search-in-text keyword-extraction NLP word2vec data-extraction

Python 5.68 k

5 months ago

JonathanLink / PDFLayoutTextStripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...

layout text Java pdf extract data-extraction pdfbox

Java 1.6 k

2 years ago

hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark pyspark data-wrangling bigdata data-science data-cleansing data-transformation machine-learning data-profiling data-extraction data-exploration data-analysis data-preparation cudf dask data-cleaning

Python 1.52 k

10 months ago

shcherbak-ai / contextgem

#NLP# ContextGem: Effortless LLM extraction from documents

AI data-extraction document-intelligence generative-ai legaltech LLM llm-framework NLP prompt-engineering text-analysis unstructured-data docx

Python 1.5 k

9 days ago

brightdata / brightdata-mcp

#Web Scraping# A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

LLM mcp modelcontextprotocol scraping ai-agents browser-automation data-collection data-extraction mcp-server structured-data web-crawling web-scraping

JavaScript 1.32 k

3 days ago

raznem / parsera

#Web Scraping# Lightweight library for scraping websites with LLMs

data-extraction LLM scraping Python Open Source webscraping AI ai-scraping Playwright

Python 1.22 k

24 days ago

thinh-vu / vnstock

A beginner-friendly yet powerful Python toolkit for financial analysis and automation, built to make modern investing accessible to everyone

stock-market data-extraction quantitative-finance quantitative-analysis quantitative-trading

Python 965

1 day ago

polyrabbit / hacker-news-digest

#Web Scraping# 📰 Let ChatGPT Summarize Hacker News for You

hacker-news Python data-extraction hacker-news-reader RSS spider crawler machine-learning news-aggregator ChatGPT ChatGPT API openai openai-api

Python 727

6 days ago

adrienjoly / npm-pdfreader

🚜 Parse text and tables from PDF files.

data-extraction pdf-converter Parsing JavaScript tabular-data

HTML 691

8 months ago

trustgraph-ai / trustgraph

The agentic AI platform for enterprise. Built by data engineers for data engineers. Complete context engineering and LLM orchestration infrastructure. Run anywhere - local, cloud, or bare metal.

graphrag context context-engineering model-serving agentic-ai agentic-ai-development agentic-rag ai-native data data-engineering data-extraction etl-pipeline

Python 591

8 hours ago

a-maliarov / amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

captcha captcha-solver amazon Python pillow training-data data-extraction

Python 482

1 month ago

py-pdf / benchmarks

Benchmarking PDF libraries

benchmark data-extraction mupdf pdf pypdf2 text-extraction

Python 311

3 months ago

jpjacobpadilla / Stealth-Requests

Undetected web-scraping & seamless HTML parsing in Python!

Python http-client data html-parsing http-requests python-web-scraper requests web-crawler web-scraping webscraping xpath data-extraction

Python 290

2 months ago

serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

automation CLI command-line-tool data-extraction data-extractor email email-extract-with-proxy email-extraction email-extractor email-marketing Open Source Ruby rubygem web-crawling webscraping

Ruby 184

1 year ago

ScrapeGraphAI / scrapecraft

🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.

AI automation data-extraction Docker FastAPI langgraph Python React TypeScript web-scraping webscraping Hacktoberfest

Python 176

1 month ago