
Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

Cloudera

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. As a result, I decided to use an open-source Occupancy Detection Data Set to build this application.


Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers and production engineers. For now, we’ll focus on Kafka.


Trending Sources


Forget the Rules, Listen to the Data

Hu's Place - HitachiVantara

A Big Data Analytics pipeline, from ingestion of data to embedding analytics, consists of three steps. Data Engineering: The first step is flexible data on-boarding that accelerates time to value. This will require another product for data governance. This is colloquially called data wrangling.


Network Traffic Intelligence for ISPs

Kentik

The skills and resources required for open source don’t match core ISP priorities. With the advent of open source big data engines, the power of big data network analytics has seemed tantalizingly close. And that keeps generic open source tools from being a fully viable path.


Apiumhub among top IT industry leaders in Code Europe event

Apiumhub

Gema Parreño Piqueras, Lead Data Science @Apiumhub: Gema Parreño is currently a Lead Data Scientist at Apiumhub, passionate about machine learning and video games, with three years of experience at BBVA and later at Google in ML Prototype. She started her own startup (Cubicus) in 2013.


Kentik APIs Enable Multi-Solution Integration

Kentik

That’s why network operations has for years involved deployment of a mix of different commercial, open-source, and home-grown tools. Another API-based option that we’ve developed for our customers is Kentik Connect Pro, a plug-in that we worked with Grafana to develop for their popular open-source data graphing software.


ETL Testing: Importance, Process, and ETL Testing Tools

Altexsoft

But before you dive in, we recommend reviewing our more beginner-friendly articles on data transformation: Complete Guide to Business Intelligence and Analytics: Strategy, Steps, Processes, and Tools. What is Data Engineering: Explaining the Data Pipeline, Data Warehouse, and Data Engineer Role.
