GitHub Chinese Community

tokenization

explosion / spaCy

#Natural Language Processing# Industrial-strength natural language processing (NLP) library in Python/Cython

natural language processing, data science, machine learning, Python, cython, artificial intelligence, spaCy, nlp-library, neural network, neural-networks, deep learning, named-entity-recognition, Entity resolution, text-classification, tokenization
Python 31.77k
18 days ago

AgentOps-AI / tokencost

#Large Language Models# Easy token price estimates for 400+ LLMs. TokenOps.

analytics, claude, large-language-models, large language models, observability, openai, token, tokenization
Python 1.71k
5 days ago

NVIDIA / Cosmos-Tokenizer

A suite of image and video neural tokenizers

diffusion, tokenization, transformers
Jupyter Notebook 1.64k
4 months ago

lunasec-io / lunasec

LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...

tokenization, web-security, compliance, security, soc2, pci-dss, gdpr, zero-trust, devsecops, log4shell, dependency-analysis, scanning, Cybersecurity, scanning-tool, cve-scanning, sbom, sbom-generator, Continuous Delivery (CD), software-composition-analysis
TypeScript 1.45k
1 year ago

securitybunker / databunker

#Security# Secure Vault for Customer PII/PHI/PCI/KYC Records

gdpr, pii, piidata, security, privacy, privacy-by-design, user-consent, ccpa, compliance, legaltech, anonymization, data-anonymization, database, encryption, vault, tokenization, application-server, passportjs, data-protection
Go 1.31k
16 days ago

RavenProject / Ravencoin

#Blockchain# Ravencoin Core integration/staging tree

raven, asset, token, tokenization, ravencoin, bitcoin, blockchain
C 1.1k
1 year ago

VKCOM / YouTokenToMe

#Natural Language Processing# Unsupervised text tokenizer focused on computational efficiency

natural language processing, word-segmentation, bpe, tokenization
C++ 968
1 year ago

explosion / spacy-streamlit

#Natural Language Processing# 👑 spaCy building blocks and visualizers for Streamlit apps

spaCy, natural language processing, visualizers, dependency-parsing, part-of-speech-tagging, named-entity-recognition, ner, Streamlit, visualizer, text-classification, tokenization, machine learning
Python 840
1 year ago

AmoDinho / datacamp-python-data-science-track

#Natural Language Processing# All the slides, accompanying code, and exercises are stored in this repo. 🎈

Python, machine learning, data science, datacamp, pandas, neural network, neural-networks, natural language processing, bokeh, scikit-learn, tokenization, datascience
Python 839
2 years ago

nlp-uoregon / trankit

#Natural Language Processing# Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

natural language processing, PyTorch, language-model, xlm-roberta, machine learning, deep learning, artificial intelligence, universal-dependencies, multilingual, adapters, tokenization, part-of-speech-tagging, dependency-parsing
Python 754
8 months ago

cbaziotis / ekphrasis

#Natural Language Processing# Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...

natural language processing, text-processing, nlp-library, spelling-correction, Parsing, tokenization, word-segmentation
Python 670
13 days ago

alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript

tokenization, tokenize, Parsing, vocabulary
Go 585
1 year ago

adobe / NLP-Cube

Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing

embeddings, Parsing, tokenization, part-of-speech-tagger, dependency-parser, dependency-parsing, universal-dependencies, machine-translation, information-extraction
HTML 558
7 months ago

yooper / php-text-analysis

#Natural Language Processing# PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

natural language processing, PHP, tokenization, text-analysis
PHP 531
6 months ago

macmade / ClangKit

ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.

clang, LLVM, Parsing, tokenization, diagnostics, sourceCode, Syntax Highlighting, static-analysis, C, C++, Objective-C
C 365
4 years ago

daac-tools / vibrato

#Natural Language Processing# 🎤 vibrato: Viterbi-based accelerated tokenizer

natural language processing, Rust, japanese, morphological-analysis, Parsing, segmentation, tokenization
Rust 362
1 month ago

WorksApplications / sudachi.rs

Sudachi in Rust 🦀 and new generation of SudachiPy

tokenization, morphological-analysis, segmentation, pos-tagging, Rust, Python
Rust 359
1 month ago

SaberaTalukder / TOTEM

#Time Series Database# The official code 👩‍💻 for TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

foundation-models, representation-learning, time-series, time-series-analysis, time-series-forecasting, tokenization
Python 324
4 months ago

OpenNMT / Tokenizer

#Natural Language Processing# Fast and customizable text tokenization library with BPE and SentencePiece support

Parsing, sentencepiece, natural language processing, machine-translation, bpe, unicode, tokenization, icu, Python, C++
C++ 308
2 months ago

FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

auto-regressive-model, image-generation, tokenization, vae, video-generation, vqvae
Python 296
1 year ago