
Ducklake: A journey to integrate DuckDB with Unity Catalog

Xebia

This summer, Databricks announced the open-sourcing of Unity Catalog. In this post, we’ll dive into how you can integrate DuckDB with the open-source Unity Catalog, walking you through our hands-on experience, sharing the setup process, and exploring both the opportunities and challenges of combining these two technologies.
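As a rough illustration of what such an integration can look like, here is a minimal sketch of querying a Unity Catalog table from DuckDB's Python API via the experimental uc_catalog extension. The endpoint, token, secret name, and table below follow the open-source Unity Catalog quickstart defaults and are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: read a Unity Catalog table from DuckDB using the
# experimental uc_catalog extension. Endpoint, token and table names
# are placeholder assumptions for a locally running open-source
# Unity Catalog server.
import duckdb

con = duckdb.connect()

# The uc_catalog and delta extensions are experimental and may need
# to be installed from the nightly repository.
con.sql("INSTALL uc_catalog FROM core_nightly; LOAD uc_catalog;")
con.sql("INSTALL delta; LOAD delta;")

# Register the Unity Catalog endpoint as a DuckDB secret (values assumed).
con.sql("""
    CREATE SECRET unity (
        TYPE UC,
        TOKEN 'not-used',
        ENDPOINT 'http://127.0.0.1:8080',
        AWS_REGION 'us-east-1'
    );
""")

# Attach the catalog and query a table through it.
con.sql("ATTACH 'unity' AS unity (TYPE UC_CATALOG);")
print(con.sql("SELECT * FROM unity.default.numbers LIMIT 5").df())
```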


No-code business intelligence service y42 raises $2.9M seed round

TechCrunch

Like similar startups, y42 extends the idea of the data warehouse, which was traditionally used for analytics, and helps businesses operationalize this data. At the core of the service is a lot of open source, and the company, for example, contributes to GitLab's Meltano platform for building data pipelines.



Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

Cloudera

In this last installment, we'll discuss a demo application that uses PySpark ML to build a classification model based on training data stored in both Cloudera's Operational Database (powered by Apache HBase) and Apache HDFS. To build this application, I decided to use the open-source Occupancy Detection Data Set.
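A hedged sketch of that kind of training flow, assuming the occupancy data sits as a CSV on HDFS and using logistic regression purely as an illustrative classifier; the post's actual model, file path, and HBase integration may differ.

```python
# Sketch: train a classifier on the UCI Occupancy Detection data with PySpark ML.
# The HDFS path and choice of logistic regression are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("occupancy-demo").getOrCreate()

# Occupancy Detection columns: Temperature, Humidity, Light, CO2,
# HumidityRatio, Occupancy (the 0/1 label).
df = spark.read.csv("hdfs:///data/occupancy/datatraining.csv",
                    header=True, inferSchema=True)

features = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]
assembler = VectorAssembler(inputCols=features, outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="Occupancy")

# Fit the pipeline and inspect a few predictions.
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("Occupancy", "prediction").show(5)
```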


Monitoring dbt model and test executions using Elementary Data

Xebia

However, this requires a lot of custom engineering work and is not an easy task. Besides that, you need to create a dashboard on top of this artifact data to get meaningful insights out of it. Luckily, there is an open-source solution for this called Elementary Data.


Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. For now, we'll focus on Kafka.
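As a rough sketch of that idea, here is how a data scientist might pull the same events the production consumers read, straight from a Kafka topic into a notebook for offline exploration. The broker address, topic name, and JSON message format are assumptions for illustration.

```python
# Sketch: consume a bounded sample of events from a Kafka topic into memory
# for notebook-style exploration. Broker, topic and payload format assumed.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "notebook-exploration",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-events"])

records = []
while len(records) < 1000:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        break                      # no more messages available right now
    if msg.error():
        continue                   # skip broker/partition events and bad messages
    records.append(json.loads(msg.value()))

consumer.close()
print(f"Pulled {len(records)} events for offline feature exploration")
```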


Forget the Rules, Listen to the Data

Hu's Place - HitachiVantara

A Big Data Analytics pipeline, from ingestion of data to embedded analytics, consists of three steps. Data Engineering: the first step is flexible data on-boarding that accelerates time to value; this is colloquially called data wrangling. It will also require another product for data governance.


The Good and the Bad of Databricks Lakehouse Platform

Altexsoft

What is Databricks? Databricks is an analytics platform with a unified set of tools for data engineering, data management, data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data.