word-segmentation · GitHub Topics

google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

neural-machine-translation natural-language-processing word-segmentation

C++ 11.26 k

4 days ago

baidu / lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

word-segmentation part-of-speech-tagger named-entity-recognition chinese-word-segmentation chinese-nlp Parsing Python Java

C++ 3.97 k

4 years ago

wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

levenshtein fuzzy-search approximate-string-matching edit-distance spellcheck spell-check levenshtein-distance damerau-levenshtein spelling fuzzy-matching word-segmentation chinese-text-segmentation chinese-word-segmentation spelling-correction

C# 3.29 k

6 months ago

PyThaiNLP / pythainlp

Thai natural language processing in Python

Python nlp-library natural-language-processing word-segmentation thai Hacktoberfest computational-linguistics text-processing

Python 1.07 k

6 days ago

VKCOM / YouTokenToMe

Unsupervised text tokenizer focused on computational efficiency

natural-language-processing word-segmentation bpe tokenization

C++ 973

1 year ago

mammothb / symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Python spellcheck spell-check fuzzy-matching fuzzy-search spelling-correction damerau-levenshtein approximate-string-matching levenshtein edit-distance levenshtein-distance spelling word-segmentation chinese-text-segmentation chinese-word-segmentation

Python 843

3 days ago

ckiplab / ckip-transformers

CKIP Transformers

transformers language-model word-segmentation part-of-speech-tagging named-entity-recognition

Python 750

2 years ago

cbaziotis / ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...

natural-language-processing text-processing nlp-library spelling-correction Parsing tokenization word-segmentation

Python 671

3 months ago

vncorenlp / VnCoreNLP

A Vietnamese natural language processing toolkit (NAACL 2018)

dependency-parsing named-entity-recognition pos-tagging word-segmentation vietnamese-nlp natural-language-processing pos-tagger ner Parsing vietnamese Java Python

Java 638

3 years ago

bab2min / Kiwi

Kiwi (intelligent Korean morphological analyzer)

natural-language-processing korean morphological-analysis word-segmentation C++

C++ 622

3 days ago

JayYip / m3tl

BERT for Multitask Learning

bert named-entity-recognition natural-language-processing word-segmentation multitask-learning cws pretrained-models ner text-classification multi-task-learning transformer encoder-decoder

Jupyter Notebook 549

2 years ago

modelscope / AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

named-entity-recognition natural-language-processing natural-language-understanding PyTorch sequence-labeling word-segmentation ner relation-extraction bert chinese-nlp crf information-extraction

Python 448

2 years ago

taishi-i / nagisa

A Japanese tokenizer based on recurrent neural networks

dynet word-segmentation pos-tagging japanese nlp-library sequence-labeling natural-language-processing Parsing

Python 404

4 months ago

ku-nlp / jumanpp

Juman++ (a Morphological Analyzer Toolkit)

natural-language-processing japanese morphological-analysis pos-tagging pos-tagger part-of-speech-tagger word-segmentation cjk Parsing

C++ 397

2 years ago

jacksonllee / pycantonese

Cantonese Linguistics and NLP

cantonese computational-linguistics natural-language-processing Python word-segmentation part-of-speech-tagging

Python 392

1 year ago

yongzhuo / Pytorch-NLU

Chinese text classification and sequence labeling toolkit (PyTorch). Supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

Python PyTorch text-classification sequence-labeling named-entity-recognition word-segmentation pos-tagging chinese-text-segmentation transformers bert pretrained-models

Python 349

1 year ago

bab2min / kiwipiepy

Python API for Kiwi

natural-language-processing korean morphological-analysis word-segmentation python-library

Python 333

4 months ago