data-quality · GitHub Topics

GokuMohandas / Made-With-ML

#Natural Language Processing#Learn how to design, develop, deploy, and iterate on production-grade machine learning applications

machine-learning deep-learning PyTorch natural-language-processing data-science Python mlops data-engineering data-quality large-language-models ray distributed-training

Jupyter Notebook 42.94 k

1 year ago

eugeneyan / applied-ml

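#Natural Language Processing#Curated papers, technical blogs, and other resources shared by major companies on data science & machine learning in production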

applied-machine-learning production applied-data-science machine-learning data-science reinforcement-learning data-engineering recsys search deep-learning data-quality data-discovery computer-vision natural-language-processing

28.3 k

1 year ago

ydataai / ydata-profiling

#Computer Science#1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

pandas-profiling pandas-dataframe statistics Jupyter Notebook exploration data-science Python pandas machine-learning deep-learning exploratory-data-analysis eda data-quality html-report data-exploration data-analysis big-data-analytics data-profiling Hacktoberfest

Python 13.13 k

5 days ago

cleanlab / cleanlab

#Data Warehouse#Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 10.89 k

9 days ago

great-expectations / great_expectations

Always know what to expect from your data.

Python 10.73 k

2 days ago

voxel51 / fiftyone

#Computer Science#Refine high-quality datasets and visual AI models

machine-learning artificial-intelligence deep-learning computer-vision developer-tools data-science Python active-learning data-centric-ai data-cleaning data-curation data-quality image-classification object-detection unstructured-data vector-search visualization

Python 9.86 k

1 day ago

open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...

metadata datadiscovery dataquality data-profiling metadata-management dataengineering data-catalog data-observability data-discovery data-contracts data-governance data-lineage data-validation snowflake data-quality data-quality-checks data-collaboration mcp mcp-server

TypeScript 7.52 k

21 hours ago

evidentlyai / evidently

#Large Language Models#Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

data-drift Jupyter Notebook pandas-dataframe machine-learning model-monitoring html-report mlops data-science Hacktoberfest data-quality data-validation generative-ai large-language-models llmops

Jupyter Notebook 6.6 k

2 days ago

feast-dev / feast

#Computer Science#The Open Source Feature Store for AI/ML

machine-learning features big-data feature-store Python mlops data-engineering data-science data-quality

Python 6.33 k

1 day ago

treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

data-engineering data-versioning Go object-storage data-lake aws-s3 data-quality azure-blob-storage google-cloud-storage git-for-data Apache Spark hadoop-filesystem datalake data-version-control azure-storage

Go 4.88 k

2 days ago

GokuMohandas / mlops-course

#Natural Language Processing#Learn how to design, develop, deploy and iterate on production-grade ML applications.

machine-learning deep-learning PyTorch mlops data-engineering data-quality data-science large-language-models natural-language-processing Python ray

Jupyter Notebook 3.2 k

1 year ago

datafold / data-diff

Compare tables within or across databases

database MySQL PostgreSQL snowflake rdbms trino data-engineering data-quality data-science data-quality-monitoring dataengineering dataquality Oracle Database SQL dbt Python data

Python 2.99 k

1 year ago

whylabs / whylogs

#Computer Science#An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...