Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
#Natural Language Processing#Unsupervised text tokenizer focused on computational efficiency
Ready-made tokenizer library for working with GPT and tiktoken
#Natural Language Processing#Fast and customizable text tokenization library with BPE and SentencePiece support
#Natural Language Processing#Explains NLP building blocks in a simple manner.
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
#Computer Science#Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
#Large Language Models#GPT-3 encoder & decoder tool written in Swift
#Natural Language Processing#Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
High performance unsupervised text tokenization for Ruby
Learning BPE embeddings by first learning a segmentation model and then training word2vec