GitHub Chinese Community

©2025 GitHub Chinese Community

corpus-tools

adbar / trafilatura

#Web crawler# Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

web-scraping, text-extraction, natural language processing, text-mining, crawler, text-preprocessing, article-extractor, readability, scraping, html-to-markdown, corpus-tools, rss-feed, news-aggregator, rag, large language models
Python 4.36 k
17 days ago
BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

corpus, corpus-linguistics, corpus-tools, corpus-processing, literature, translation, Parsing, tagger, lemmatizer, dependency-parser
Python 725
7 days ago
flairNLP / fundus

#Web crawler# A very simple news crawler with a funny name

corpus, crawler, natural language processing, Python, RSS, scraper, sitemap, text-extraction, web-scraping, corpus-tools, dataset, image-classification
Python 388
4 days ago
bitextor / bitextor

#Web crawler# Bitextor generates translation memories from multilingual websites

dictionaries, crawler, wget, Parsing, warc, corpus-tools, corpus-processing, machine-translation, neural-machine-translation, statistical-machine-translation
Python 293
7 months ago
grammarly / ua-gec

#Natural language processing# UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

dataset, corpus, corpus-data, corpus-tools, natural language processing
Macaulay2 261
1 year ago
adbar / simplemma

#Natural language processing# Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

natural language processing, lemmatizer, tokenization, wordlist, morphological-analysis, corpus-tools, Parsing, language-detection, language-identification
Python 161
10 days ago
ynop / audiomate

Python library for handling audio datasets.

audio, speech-recognition, corpus-tools, data-loader, dataset-creation, speech, music, noise
Python 138
2 years ago
Helsinki-NLP / OpusFilter

#Natural language processing# OpusFilter - Parallel corpus processing toolkit

corpus-tools, corpus-processing, natural language processing, machine-translation
Python 104
12 days ago
NathanDuran / Switchboard-Corpus

Utilities for Processing the Switchboard Dialogue Act Corpus

corpus, corpus-processing, corpus-data, corpus-tools, dialogue
Python 70
4 years ago
czcorpus / kontext

An advanced, extensible web front-end for the Manatee-open corpus search engine

corpus-tools, corpus-linguistics, ui
TypeScript 65
9 days ago
koskenni / beta

An open source reimplementation of Benny Brodda's BETA in Python

beta, string-manipulation, Open Source, hyphenation, corpus-tools
Python 63
6 years ago
lennes / spect

SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/

speech, analysis, annotation, corpus-linguistics, corpus-tools, speech-analysis, transcription, transcript
HTML 57
2 years ago
johentsch / ms3

A parser for annotated MuseScore 3 files.

corpus, corpus-data, corpus-processing, corpus-tools, musescore, Parser, sheet-music, tsv, tsv-files, xml-parser, xml-parser-library, xml-parsing
Python 49
3 months ago
LanguageMachines / PICCL

#Natural language processing# A set of workflows for corpus building through OCR, post-correction and normalisation

natural language processing, workflow, OCR, corpus-tools, corpus-linguistics, computational-linguistics
Python 49
3 years ago
silenterus / deepspeech-cleaner

#Computer science# Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework

deepspeech, machine learning, Mozilla, speech-recognition, corpus-tools, dataset-creation, multilanguage
Python 47
2 years ago
nickduran / align-linguistic-alignment

Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.

Python, notebooks, nltk, word2vec, corpus-tools, text-analysis
Python 47
3 months ago
M4t1ss / parallel-corpora-tools

#Natural language processing# Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

nmt, neural, machine, translation, machine-translation, neural-machine-translation, corpus-tools, natural language processing, language, language-processing, natural-language, filtering, cleaning, data science, data-processing
PHP 41
1 year ago
uma-pi1 / OPIEC

#Natural language processing# Reading the data from OPIEC - an Open Information Extraction corpus

information-extraction, corpus, corpus-data, corpus-tools, natural language processing, natural-language-understanding, wikipedia, Wiki, corpus-processing, dataset
Java 37
6 years ago
johnwdubois / rezonator

Rezonator: Dynamics of human engagement

corpus-tools, text-analysis, game development, dialogue, corpus-linguistics, conversational-ai
Yacc 35
8 months ago
NathanDuran / MRDA-Corpus

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

corpus, corpus-data, corpus-processing, corpus-tools, dialogue
Python 32
4 years ago