GitHub Chinese Community
# data-profiling

ydataai / ydata-profiling

#Computer Science# 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

pandas-profiling, pandas-dataframe, statistics, Jupyter Notebook, exploration, data science, Python, pandas, machine learning, deep learning, exploratory-data-analysis, eda, data-quality, html-report, data-exploration, data analysis, big-data-analytics, data-profiling, Hacktoberfest
Python 13.13 k
5 days ago
cleanlab / cleanlab

#Data Warehouse# Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

weak-supervision, data-cleaning, data-quality, data science, noisy-labels, data-centric-ai, out-of-distribution-detection, outlier-detection, active-learning, data-labeling, data-profiling, data-validation, labeling, data-curation, annotation, DataOps, dataquality, large language models, datasets, exploratory-data-analysis
Python 10.89 k
9 days ago
great-expectations / great_expectations

Always know what to expect from your data.

pipeline-tests, dataquality, datacleaning, datacleaner, data science, data-profiling, pipeline, pipeline-testing, cleandata, dataunittest, data-unit-tests, eda, exploratory-data-analysis, exploratory-analysis, exploratorydataanalysis, data-quality, data-engineering, pipeline-debt, data-profilers, mlops
Python 10.73 k
2 days ago
open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...

metadata, datadiscovery, dataquality, data-profiling, metadata-management, dataengineering, data-catalog, data-observability, data-discovery, data-contracts, data-governance, data-lineage, data-validation, snowflake, data-quality, data-quality-checks, data-collaboration, mcp, mcp-server
TypeScript 7.52 k
1 day ago
fbdesignpro / sweetviz

#Computer Science# Visualize and compare datasets, target values and associations, with one line of code.

pandas-dataframe, eda, pandas, data-exploration, data analysis, data science, data visualization, machine learning, data-profiling, exploratory-data-analysis, statistics, exploration, Python
Python 3.04 k
1 year ago
sodadata / soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Python, data-engineering, data-governance, data-monitoring, data-observability, data-profiling, data-quality, data-quality-checks, data-quality-monitoring, data-reliability, data-testing, data-unit-tests, data-validation, dataquality, dbt, pipeline-testing, snowflake, data-contracts
Python 2.17 k
2 days ago
hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark, pyspark, data-wrangling, bigdata, data science, data-cleansing, data-transformation, machine learning, data-profiling, data-extraction, data-exploration, data analysis, data-preparation, cudf, dask, data-cleaning
Python 1.52 k
9 months ago
opendatadiscovery / odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Open Source, data-platform, metadata, metadata-management, data-pipelines, data-engineering, observability, data-catalog, data-discovery, data-lineage, bigdata, alerting, lineage, data-profiling, data-exploration, data-governance, data-quality, data science, data-observability
Java 1.35 k
7 months ago
cleanlab / cleanvision

#Computer Science# Automatically find issues in image datasets and practice data-centric computer vision.

computer vision, data-centric-ai, data-exploration, data-quality, data-validation, deep learning, exploratory-data-analysis, image-analysis, image-classification, image-generation, image-quality, image-segmentation, data-profiling, data science
Python 1.11 k
5 months ago
datavane / datavines

Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

dataquality, datascience, doris, Apache Spark, metadata, cleandata, data-engineering, data-profilers, data-profiling, data-quality, data-quality-checks, data-quality-monitoring, data science, flink
Java 665
13 hours ago
polyaxon / traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

pandas, dataframes, data science, Apache Spark, dask, plotly, statistics, matplotlib, data-profiling, data visualization, data-exploration, DataOps, mlops, data-quality, data-quality-checks, explainable-ai, PyTorch, Tensorflow, tracking
Python 520
3 months ago
ing-bank / popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎

monitoring, data science, Python, statistics, data-profiling, statistical-tests, pandas, Apache Spark, data analysis, Jupyter Notebook, IPython, mlops, Hacktoberfest
Python 504
10 days ago
InfuseAI / piperider

Code review for data in dbt

data-pipeline, data-profiling, data-quality, data science, data-exploration, eda, exploratory-data-analysis, data-testing, Python, data-observability, data-reliability, data visualization, dbt, code review, reporting, pull-requests, continuous integration
Python 490
8 months ago
polyaxon / haupt

#Computer Science# Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Tensorflow, deep learning, Jupyter Notebook, Python, PyTorch, machine learning, models, ui, visualization, matplotlib, plotly, bokeh, mlops, data science, data visualization, data-processing, data-profiling, tracking, lineage, serving
Python 452
1 day ago
Desbordante / desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows to run data cleaning scenarios using these algor...

data-analytics, data-cleaning, data-cleansing, data-engineering, data-exploration, data-mining, data-profiling, data science, data-wrangling, data-preprocessing, feature-selection, feature-engineering, feature-extraction, Spreadsheet, tabular-data, anomaly-detection, exploratory-data-analysis, knowledge-discovery
C++ 419
3 days ago
databrickslabs / dqx

Databricks framework to validate Data Quality of pySpark DataFrames

data-profiling, data-quality, data-quality-checks, data-quality-monitoring, databricks, Apache Spark, spark-streaming, dlt
Python 312
4 days ago
dqops / dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML f...

DataOps, data-quality, data-quality-checks, data-quality-measurement, data-quality-monitoring, monitoring, data-observability, data-profiling
Java 165
9 days ago
hi-primus / bumblebee

#Data Warehouse# 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

data-profiling, data-cleaning, GUI, data-preparation, Python, dask, optimus, gpu, cudf, datasets
Vue 141
2 years ago
DataKitchen / data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility acr...

data, data-engineering, data-observability, data-profiling, data-quality, data science, datacleaner, datacleaning, DataOps, dataquality, sql-server, pipeline-tests, PostgreSQL, redshift, self-hosted, snowflake, data-reliability
Python 124
20 days ago
SJTU-DMTai / awesome-ml-data-quality-papers

#Computer Science# Papers about training data quality management for ML models.

data-management, data-profiling, data-quality, machine learning, data-centric-ai
97
3 months ago