An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
#Natural Language Processing#A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
#Natural Language Processing#Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
A web-based engine for creating and annotating textual corpora
#Web Crawler#Data resources for NLP in Bahasa Indonesia
#Web Crawler#🕷️ The pipeline for the OSCAR corpus
Kanji usage frequency data collected from various sources
#Search#Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
#Natural Language Processing#An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
#Natural Language Processing#Large silver-standard Russian corpus with NER, morphology, and syntax markup
An advanced, extensible web front-end for the Manatee-open corpus search engine
#Natural Language Processing#A textual corpus database for the digital humanities.
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
A large, high-quality corpus of Chinese synonyms.
#Natural Language Processing#My solutions to selected exercises from "Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper.