sentencepiece · GitHub Topics

#Android# Open source real-time translation app for Android that runs locally

translator bluetooth-le realtime-translator Android onnx onnxruntime sentencepiece transformers translation nllb Whisper mobile-app offline

C++ 9.15 k

6 days ago

OpenNMT / Tokenizer

#Natural Language Processing# Fast and customizable text tokenization library with BPE and SentencePiece support

Parsing sentencepiece natural-language-processing machine-translation bpe unicode tokenization icu Python C++

C++ 316

5 months ago

himkt / konoha

#Natural Language Processing# 🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.

natural-language-processing text-processing sentencepiece japanese

Python 254

5 months ago

taishan1994 / sentencepiece_chinese_bpe

Train a Chinese vocabulary with the BPE mode of sentencepiece, and use it in transformers.

sentencepiece tokenization

Python 120

2 years ago

lingvanex-mt / models

#Natural Language Processing# Free and open source pre-trained translation models, including Kurdish, Samoan, Xhosa, Lao, Corsican, Cebuano, Galician, Russian, Belarusian and Yoruba.

ctranslate2 machine-translation multilingual neural-networks natural-language-processing sentencepiece yoruba translate translation translator

1 month ago

dhpollack / huggingface_libtorch

#Natural Language Processing# Minimal example of using a traced huggingface transformers model with libtorch

PyTorch libtorch natural-language-processing C++ sentencepiece albert

C++ 35

5 years ago

eliben / go-sentencepiece

#Large Language Models# Go implementation of the SentencePiece tokenizer

encoding Go language-model large-language-models tokenization sentencepiece

Go 34

1 year ago

Systemcluster / kitoken

#Natural Language Processing# Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

bpe natural-language-processing sentencepiece Parsing unigram word-segmentation Node.js Python Rust Web

Rust 33

6 months ago