#Android#Open-source real-time translation app for Android that runs locally
#NLP#Fast and customizable text tokenization library with BPE and SentencePiece support
#NLP#🌿 An easy-to-use Japanese text processing tool that lets you switch tokenizers with minimal code changes.
Train a Chinese vocabulary with SentencePiece's BPE model and use it in transformers.
#NLP#Free and open-source pre-trained translation models for languages including Kurdish, Samoan, Xhosa, Lao, Corsican, Cebuano, Galician, Yiddish, Swahili, Russian, Belarusian and Yoruba.
#NLP#Minimal example of using a traced Hugging Face transformers model with libtorch
#NLP#A Robustly Optimized BERT Pretraining Approach for Vietnamese
#LLM#Go implementation of the SentencePiece tokenizer
#NLP#Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
#NLP#R package for Byte Pair Encoding / Unigram modelling based on SentencePiece
Extremely simple and understandable GPT2 implementation with minor tweaks
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Use SentencePiece in Swift for tokenization and detokenization.
SentencePiece port to WebAssembly with browser compatibility
#NLP#BERT implementation in PyTorch
#NLP#Investigates various DNN text classifiers, including MLP, CNN, RNN and BERT approaches.
Bengali language tokenizer (SentencePiece)