#Data warehouse#OpenRefine (formerly Google Refine) is a powerful tool for data cleaning and transformation
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
#Computer science#A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
#Computer science#Carefully curated resource links for data science in one place
#Data warehouse#Blazing-fast Data-Wrangling toolkit
#Large language models#ETL, Analytics, Versioning for Unstructured Data
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
#Time-series database#A Python toolbox for gaining geometric insights into high-dimensional data
#Computer science#🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
#Computer science#Machine learning with dataframes
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
#Computer science#Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
#Computer science#Materials for following along with Hands-On Data Analysis with Pandas.
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
An introductory workshop on pandas with notebooks and exercises for following along. Slides contain all solutions.
Data Analysis and Visualization in R for Ecologists
Package that processes and organizes data from the Brazilian National Registry of Legal Entities (Cadastro Nacional da Pessoa Jurídica, CNPJ)
Like awk, but with SQL and table joins