No-code LLM platform to launch APIs and ETL pipelines that structure unstructured documents
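#Editor#Build data pipelines, the easy way 🛠️
StreamX was created to make stream processing simpler. It aims to be a one-stop big data platform that unifies stream and batch processing, as well as the data lake and the data warehouse (lakehouse).
#Computer Science#Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.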
Implementing best practices for PySpark ETL jobs and applications.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
#Large Language Models#Enterprise-grade, API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, a prompt playground, and more!
#Computer Science#A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
#Computer Science#A high-performance data processing system written in Clojure
A blazingly fast, general-purpose blockchain analytics engine specialized in systematic MEV detection
#Large Language Models#Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.
A simplified, lightweight ETL Framework based on Apache Spark
#Large Language Models#The Supabase of the AI era. A modular, open-source backend for building AI-native software, designed for knowledge rather than static data.
Concurrent and fluent interface for (async) iterables
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache ZooKeeper, Apache Spark, and Cassandra. All compone...
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
#Computer Science#A simple Spark-powered ETL framework that just works 🍺
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
This is a template you can use for your next data engineering portfolio project.