GitHub Chinese Community
#data-processing

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

batch-processing, kafka, pathway, Python, streaming, machine learning, real-time, data-analytics, data-pipelines, data-processing, dataflow, etl, etl-framework, iot-analytics, Rust, stream-processing, time-series-analysis
Python 29.42 k
8 hours ago
onceupon / Bash-Oneliner

A collection of practical Linux Bash commands

oneliner-commands, Bash, data-processing, linux-administration, terminal, Linux, variables, grep, xargs, system, hardware, Shell, one-liners
10.51 k
4 months ago
johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

data-processing, data-cleaning, CSV, csv-format, streaming-data, streaming-algorithms, tsv, JSON, json-data, data-reduction, statistics, statistical-analysis, DevOps, devops-tools, tabular-data, command-line interface, command-line-tools
Go 9.39 k
1 day ago
TomWright / dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

JSON, YAML, configuration, selector, data structures, Parser, yaml-processor, json-processing, devops-tools, Go, command-line interface, toml, query, update, XML, data-processing, data-wrangling
Go 7.51 k
3 days ago
NVIDIA / DALI

#Computer Science# A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

fast-data-pipeline, image-augmentation, data-augmentation, image processing, data-processing, deep learning, machine learning, Python, neural networks, gpu, gpu-tensorflow, audio-processing, PyTorch, mxnet, paddle
C++ 5.48 k
1 day ago
modelscope / data-juicer

#Large Language Models# Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

data analysis, data science, large-language-models, large language models, data visualization, instruction-tuning, pre-training, multi-modal, synthetic-data, data, data-pipeline, data-processing, foundation-models
Python 4.9 k
4 hours ago
deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

data-processing, duckdb
Python 4.75 k
5 months ago
unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

pandas, validation, schema, dataframes, Testing, pandas-dataframe, data-validation, data-cleaning, assertions, hypothesis-testing, data-processing
Python 3.93 k
19 hours ago
dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir, data-ingestion, data-processing, concurrent
Elixir 2.56 k
2 months ago
microsoft / DialoGPT

#Computer Science# Large-scale pretraining for dialogue

dialogue, machine learning, PyTorch, transformer, text-generation, dialogpt, gpt-2, text-data, data-processing
Python 2.4 k
3 years ago
asyml / texar

#Natural Language Processing# Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

machine learning, natural language processing, Tensorflow, deep learning, text-generation, Python, machine-translation, dialog-systems, texar, bert, gpt-2, xlnet, text-data, data-processing
Python 2.39 k
4 years ago
cocoindex-io / cocoindex

#Large Language Models# Data transformation framework for AI. Ultra performant, with incremental processing.

artificial intelligence, change-data-capture, data, data-indexing, etl, indexing, pipeline, Python, rag, real-time, Rust, semantic-search, streaming, data-engineering, data-infrastructure, data-processing, dataflow, help-wanted, knowledge-graph, large language models
Rust 2.31 k
2 days ago
numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Kubernetes, stream-processing, data-processing, pipeline, map-reduce, Hacktoberfest
Go 1.9 k
4 days ago
bytewax / bytewax

#Computer Science# Python Stream Processing

Python, stream-processing, Rust, data-engineering, data-processing, data science, dataflow, machine learning, streaming-data
Python 1.78 k
4 months ago
python-bonobo / bonobo

Extract Transform Load for Python 3.5+

data-processing, bonobo, Python, automation, parallelization
Python 1.6 k
2 years ago
pyper-dev / pyper

Concurrent Python made simple

asyncio, concurrency, Python, threading, data-pipelines, data-processing, multiprocessing, parallel-computing, data, data-collection, data-engineering
Python 1.45 k
6 months ago
GoogleCloudPlatform / data-science-on-gcp

#Computer Science# Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data analysis, data visualization, cloud-computing, machine learning, data-pipeline, data-processing, data science, data-engineering
Jupyter Notebook 1.39 k
4 months ago
allenai / dolma

#Natural Language Processing# Data and tools for generating and inspecting OLMo pre-training data.

data-processing, large-language-models, large language models, machile-learning, natural language processing
Python 1.28 k
6 days ago
NVIDIA-NeMo / Curator

#Large Language Models# Scalable data preprocessing and curation toolkit for LLMs

data-curation, large language models, data, data-prep, data-preparation, data-processing, data-quality, datacuration, datarecipes, Entity resolution, fine-tuning, large-language-models, large-scale-data-processing, llmapps, Python
Python 1.05 k
2 days ago
OpenDCAI / DataFlow

Easy Data Preparation with latest LLMs-based Operators and Pipelines.

data, data-cleaning, data-pipelines, data-processing, data science, large language models, operators, gradio-interface
Python 1.03 k
5 days ago