This summer, Databricks announced the open-sourcing of Unity Catalog. In this post, we’ll dive into how you can integrate DuckDB with the open-source Unity Catalog, walking you through our hands-on experience, sharing the setup process, and exploring both the opportunities and challenges of combining these two technologies.
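To make the integration concrete, here is a minimal sketch of querying the open-source Unity Catalog from DuckDB's Python API. It assumes a local Unity Catalog server on its default quickstart port 8080 and DuckDB's experimental uc_catalog and delta extensions; the unity.default.numbers sample table ships with the Unity Catalog quickstart, and the exact extension syntax may change as the integration matures.

```python
# A minimal sketch, assuming a local open-source Unity Catalog server and
# DuckDB's experimental uc_catalog/delta extensions from the nightly repo.
import duckdb

con = duckdb.connect()
for stmt in [
    "INSTALL uc_catalog FROM core_nightly",
    "LOAD uc_catalog",
    "INSTALL delta FROM core_nightly",
    "LOAD delta",
]:
    con.execute(stmt)

# Point DuckDB at the catalog server; token handling depends on your setup.
con.execute("""
    CREATE SECRET (
        TYPE UC,
        TOKEN 'not-used',
        ENDPOINT 'http://127.0.0.1:8080',
        AWS_REGION 'us-east-2'
    )
""")
con.execute("ATTACH 'unity' AS unity (TYPE UC_CATALOG)")

# 'unity.default.numbers' is a sample table from the Unity Catalog quickstart.
print(con.sql("SELECT * FROM unity.default.numbers").fetchall())
```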
Like similar startups, y42 extends the idea of the data warehouse, which was traditionally used for analytics, and helps businesses operationalize this data. At the core of the service is a lot of open source, and the company, for example, contributes to GitLab's Meltano platform for building data pipelines.
In this last installment, we'll discuss a demo application that uses PySpark ML to build a classification model based on training data stored in both Cloudera's Operational Database (powered by Apache HBase) and Apache HDFS. As a result, I decided to use the open-source Occupancy Detection Data Set to build this application.
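As a rough outline of what such an application can look like, here is a minimal PySpark ML sketch. The HDFS path is a placeholder, the feature columns (Temperature, Humidity, Light, CO2, HumidityRatio) follow the public Occupancy Detection Data Set, and the demo's actual HBase integration is omitted.

```python
# A minimal sketch, assuming the Occupancy Detection CSV has been landed in
# HDFS; 'hdfs:///data/occupancy.csv' is a hypothetical path.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("occupancy-detection").getOrCreate()

df = spark.read.csv("hdfs:///data/occupancy.csv", header=True, inferSchema=True)

# Pack the sensor readings into a single feature vector column.
features = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]
assembler = VectorAssembler(inputCols=features, outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

# Binary classifier on the 0/1 Occupancy label.
model = LogisticRegression(featuresCol="features", labelCol="Occupancy").fit(train)
print("Test AUC:", model.evaluate(test).areaUnderROC)
```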
However, this requires a lot of custom engineering work and is not an easy task. Besides that, you need to create a dashboard on top of this artifact data to get meaningful insights out of it. Luckily, there is an open-source solution for this called Elementary Data.
This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. For now, we'll focus on Kafka.
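For a feel of the shared interface Kafka provides between those roles, here is a minimal producer/consumer sketch using the kafka-python client. The topic name, broker address, and event schema are illustrative.

```python
# A minimal sketch: one team writes events to a topic, another reads them.
# Broker address and 'sensor-events' topic are illustrative assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-events", {"sensor_id": 1, "temperature": 22.5})
producer.flush()

consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    # In production, this is where a deployed model could score the event.
    print(message.value)
    break
```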
A big data analytics pipeline, from ingestion of data to embedded analytics, consists of three steps. Data engineering: the first step is flexible data on-boarding that accelerates time to value; this is colloquially called data wrangling. This will require another product for data governance.
What is Databricks? Databricks is an analytics platform with a unified set of tools for data engineering, data management, data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake, used to host large amounts of raw data.
The prospect of taking on a costly data infrastructure project is daunting. If your company is starting out on this path, it's important to recognize that there are now widely available open-source tools and commercial platforms that can power this foundation for you. AI doesn't fit that model. How do you select what to work on?
The skills and resources required for open source don't match core ISP priorities. With the advent of open-source big data engines, the power of big data network analytics has seemed tantalizingly close. And that keeps generic open-source tools from being a fully viable path.
We see AI applications like chatbots being built on top of closed-source or open-source foundation models. Those models are trained or augmented with data from a data management platform. The data management platform, models, and end applications are powered by cloud infrastructure and/or specialized hardware.
Usually, data integration software is divided into on-premise, cloud-based, and open-source types. On-premise data integration tools, as the name suggests, aim at integrating data from different on-premise source systems. Open-source data integration tools are another option.
However, other query engines such as Hive and Spark can also benefit from this Iceberg improvement. Repeated metadata reads problem in Impala + Iceberg: Apache Impala is an open-source, distributed, massively parallel SQL query engine. It includes a live demo recording of Iceberg capabilities.
For example, there isn't much data you operate with: you maintain your reporting in Excel spreadsheets, store some data in a CRM, and also use a BI tool. In such a case, you can delegate integration work to a data engineer who will manually upload data into, say, a CSV file and move it to a BI system.
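That manual hand-off can be as simple as the following sketch; the file names and columns are illustrative stand-ins for a real CRM export.

```python
# A minimal sketch of the manual hand-off described above: pull records from a
# hypothetical CRM export and write a CSV a BI tool can ingest.
import pandas as pd

crm = pd.read_json("crm_export.json")  # raw CRM data dump (assumed file)
report = crm[["customer_id", "deal_value", "closed_at"]]
report.to_csv("bi_upload.csv", index=False)
```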
That's why network operations has for years involved deployment of a mix of different commercial, open-source, and home-grown tools. Another API-based option that we've developed for our customers is Kentik Connect Pro, a plug-in that we worked with Grafana to develop for their popular open-source data graphing software.
Gema Parreño Piqueras – Lead Data Scientist @Apiumhub. Gema Parreño is currently a Lead Data Scientist at Apiumhub, passionate about machine learning and video games, with three years of experience at BBVA and later at Google in ML Prototype. She started her own startup (Cubicus) in 2013. Twitter: [link] LinkedIn: [link].
But before you dive in, we recommend reviewing our more beginner-friendly articles on data transformation: Complete Guide to Business Intelligence and Analytics: Strategy, Steps, Processes, and Tools. What is Data Engineering: Explaining the Data Pipeline, Data Warehouse, and Data Engineer Role.
Kentik's founders, who ran large network operations at Akamai, Netflix, YouTube, and Cloudflare, well understand the challenges faced by teams working with siloed legacy tools and fragmented data sets. The time has come for them to put away their point solutions, spreadsheets, and open-source tools.
Of course, just opening one's mind to the dream isn't the same as having the solution. You could try to construct it yourself, for example by building it with open-source tools. Learn more by digging into our product, seeing what our customers think, or reading a white paper on the Kentik Data Engine.
Developed as a model for “processing and generating large data sets,” MapReduce was built around the core idea of using a map function to process a key/value pair into a set of intermediate key/value pairs, and then a reduce function to merge all intermediate values associated with a given intermediate key.
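A word count is the canonical illustration of this model. The sketch below mimics the three phases in plain Python: map emits intermediate key/value pairs, a shuffle groups values by key, and reduce merges each group.

```python
# A minimal word-count sketch of the MapReduce model described above.
from collections import defaultdict

def map_fn(document: str):
    for word in document.split():
        yield word, 1                    # emit intermediate (key, value) pairs

def reduce_fn(word: str, counts: list):
    return word, sum(counts)             # merge all values for one key

documents = ["the quick brown fox", "the lazy dog"]

# Shuffle phase: group intermediate values by key.
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_fn(doc):
        grouped[word].append(count)

print(dict(reduce_fn(w, c) for w, c in grouped.items()))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```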
Label Studio: Label Studio is an open-source data annotation tool for labeling multiple types of data. Its two important functions are performing different types of labeling with various data formats, and offering documentation and live demos for ease of use.
What was worth noting was that (anecdotally) even engineers from large organisations were not looking for full workload portability. There were also two patterns of adoption of HashiCorp tooling I observed from engineers that I chatted to: infrastructure-driven, in…
As the article is big enough, we suggest you navigate using this outline, if needed: What is data visualization: how it works, types of data to visualize, and visualization formats. Tools for data visualization: paid, free, and open-source instruments. Data visualization pitfalls: issues and challenges to consider.
A quick look at bigram usage (word pairs) doesn't really distinguish between "data science," "data engineering," "data analysis," and other terms; the most common word pair with "data" is "data governance," followed by "data science." It's worth looking at alternatives to Oracle, though.
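For reference, bigram counting of this kind takes only a few lines; the corpus below is a stand-in for the real survey text.

```python
# A minimal sketch of the bigram analysis described above: count adjacent word
# pairs and inspect which ones follow "data". The text is an invented sample.
from collections import Counter

text = "data governance and data science drive data engineering work"
tokens = text.lower().split()

bigrams = Counter(zip(tokens, tokens[1:]))
data_pairs = {pair: n for pair, n in bigrams.items() if pair[0] == "data"}
print(data_pairs)
# {('data', 'governance'): 1, ('data', 'science'): 1, ('data', 'engineering'): 1}
```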
You can hardly compare data engineering toil with something as easy as breathing or as fast as the wind. The platform went live in 2015 at Airbnb, the biggest home-sharing and vacation rental site, as an orchestrator for increasingly complex data pipelines. How data engineering works. Source: Apache Airflow.
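For a flavor of what orchestrating such pipelines looks like, here is a minimal Airflow DAG sketch; the task bodies are placeholders for real extract and load steps, and the schedule parameter name assumes Airflow 2.4 or later.

```python
# A minimal sketch of an Airflow DAG; dag_id and tasks are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from the source system")  # placeholder extract step

def load():
    print("loading data into the warehouse")  # placeholder load step

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # 'schedule_interval' on Airflow < 2.4
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```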
Drawing on more than a decade of experience in building and deploying massive scale data platforms on economical budgets, Cloudera has designed and delivered a cost-cutting cloud-native solution – Cloudera Data Warehouse (CDW), part of the new Cloudera Data Platform (CDP). Watch this video to get an overview of CDW.
As advanced analytics and AI continue to drive enterprise strategy, leaders are tasked with building flexible, resilient data pipelines that accelerate trusted insights. A New Level of Productivity with Remote Access: the new Cloudera Data Engineering 1.23… Why Cloudera Data Engineering?
In this post, we explore how CrewAI's open-source agentic framework, combined with Amazon Bedrock, enables the creation of sophisticated multi-agent systems that can transform how businesses operate. A US Army veteran, Tony brings a diverse background in healthcare, data engineering, and AI. … billion in 2024 to $47.1…
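As an illustration of the pattern, here is a minimal CrewAI sketch with a Bedrock-hosted model. The LiteLLM-style "bedrock/..." model string and the specific model id are assumptions; check which models your account can access and CrewAI's documentation for the exact wiring.

```python
# A minimal sketch of a single-agent CrewAI crew backed by an Amazon Bedrock
# model; the model id below is an assumed example, not a confirmed default.
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Data analyst",
    goal="Summarize sales trends for the leadership team",
    backstory="An analyst who turns raw figures into short briefings.",
    llm="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model id
)

summary = Task(
    description="Summarize last quarter's sales trends in three bullet points.",
    expected_output="Three concise bullet points.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[summary])
print(crew.kickoff())
```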