Data Engineering, Open Source and Presentation

IDC chief research officer: GenAI, from experimentation to adoption

CIO

DECEMBER 19, 2024

The key areas we see are having an enterprise AI strategy, a unified governance model and managing the technology costs associated with genAI to present a compelling business case to the executive team. This involves grounding a commercially available or open-source LLM with your own data.

Artificial Intelligence

Artificial Intelligence Research Enterprise

Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker

AWS Machine Learning - AI

APRIL 23, 2025

This approach supports the broader goal of digital transformation, making sure that archival data can be effectively used for research, policy development, and institutional knowledge retention. In this post, we discuss how you can build an AI-powered document processing platform with open source NER and LLMs on SageMaker.

Artificial Intelligence

Artificial Intelligence Open Source AWS Serverless

What is data architecture? A framework to manage data

CIO

DECEMBER 20, 2024

Data streaming is data flowing continuously from a source to a destination for processing and analysis in real-time or near real-time. A container orchestration system, such as open-source Kubernetes, is often used to automate software deployment, scaling, and management. Container orchestration.

Architecture

Architecture Data Fractional CTO Technical Review

What is data science? Transforming data into value

CIO

APRIL 22, 2022

Organizations need data scientists and analysts with expertise in techniques for analyzing data. Data scientists are the core of most data science teams, but moving from data to analysis to production value requires a range of skills and roles. Data science processes and methodologies. Data science tools.

Data

Data Artificial Intelligence Machine Learning Analytics

What is data analytics? Analyzing and managing data for decisions

CIO

JUNE 7, 2022

More specifically: Descriptive analytics uses historical and current data from multiple sources to describe the present state, or a specified historical state, by identifying trends and patterns. Diagnostic analytics uses data (often generated via descriptive analytics) to discover the factors or reasons for past performance.

Analytics

Analytics Data Analysis Business Analytics

The top 15 big data and data analytics certifications

CIO

JUNE 14, 2023

The exam tests knowledge of Cloudera Data Visualization, Cloudera Machine Learning, Cloudera Data Science Workbench, and Cloudera Data Warehouse, as well as SQL, Apache Nifi, Apache Hive, and other open source technologies. The exam consists of 40 questions and the candidate has 120 minutes to complete it.

Big Data

Big Data Analytics Data eLearning

Principal Financial Group uses QnABot on AWS and Amazon Q Business to enhance workforce productivity with generative AI

AWS Machine Learning - AI

NOVEMBER 15, 2024

This data includes manuals, communications, documents, and other content across various systems like SharePoint, OneNote, and the company’s intranet. Principal sought to develop natural language processing (NLP) and question-answering capabilities to accurately query and summarize this unstructured data at scale.

Generative AI

Generative AI AWS Groups Artificial Intelligence

Why Reinvent the Wheel? The Challenges of DIY Open Source Analytics Platforms

Cloudera

JULY 24, 2023

In their effort to reduce their technology spend, some organizations that leverage open source projects for advanced analytics often consider either building and maintaining their own runtime with the required data processing engines or retaining older, now obsolete, versions of legacy Cloudera runtimes (CDH or HDP).

Open Source

Open Source Analytics Software Review Metrics

Inferencing holds the clues to AI puzzles

CIO

APRIL 10, 2024

As with many data-hungry workloads, the instinct is to offload LLM applications into a public cloud, whose strengths include speedy time-to-market and scalability. Data-obsessed individuals such as Sherlock Holmes knew full well the importance of inferencing in making predictions, or in his case, solving mysteries.

Artificial Intelligence

Artificial Intelligence Generative AI Storage

Behind the scenes: The daily impact of genAI at Hamburg’s largest gaming company

CIO

DECEMBER 10, 2024

The open-source database StarRocks, which is already integrated into InnoGames data infrastructure and has an interface to LangChain, is used for this purpose. Our second prototype, QueryMind, makes it possible to query this extensive data landscape using natural language. A glance at the results of the QueryMind query.

Games

Games Artificial Intelligence Company

12 data science certifications that will pay off

CIO

JANUARY 19, 2024

Whether you’re looking to earn a certification from an accredited university, gain experience as a new grad, hone vendor-specific skills, or demonstrate your knowledge of data analytics, the following certifications (presented in alphabetical order) will work for you. Not finding what you’re looking for?

Artificial Intelligence

Artificial Intelligence Data Machine Learning Azure

Capital Group invests big in talent development

CIO

JULY 29, 2022

For example, if a data team member wants to increase their skills or move to a data engineer position, they can embark on a curriculum for up to two years to gain the right skills and experience. The bootcamp broadened my understanding of key concepts in data engineering.

Groups

Groups Development Security Programming

A Talented Team, Innovative Technology, and The Opportunity to Grow. There Is No Place Like Cloudera

Cloudera

SEPTEMBER 13, 2023

Once I got to work with all the amazing open-source Apache tools I was hooked. The grass isn’t always greener: While the opportunity was exciting, I realized that I missed the old team, the open-source environment, innovative projects, and Cloudera overall. I found Apache NiFi especially interesting.

Innovation

Innovation Open Source Technology Data Engineering

Percona Live 2023 Event Recap

Datavail

JUNE 20, 2023

Percona Live 2023 was an exciting open-source database event that brought together industry experts, database administrators, data engineers, and IT leadership. Percona Live 2023 Session Highlights: The three days of the event were packed with interesting open-source database sessions!

Open Source

Open Source Database Administration Survey AWS

Core technologies and tools for AI, big data, and cloud computing

O'Reilly Media - Ideas

FEBRUARY 11, 2019

We’ve assembled sessions from leading companies, many of which will share case studies of applications of machine learning methods, including multiple presentations involving deep learning: Strata Business Summit. Temporal data and time-series analytics. AI and machine learning in the enterprise. Deep Learning.

Big Data

Big Data Technology Tools Cloud

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way.

Scalability

Scalability Data Technical Review Analytics

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning - AI

JUNE 21, 2024

To accomplish this, the application integrates with an open sourced eSentire LLM Gateway project to monitor the interactions with customer queries, backend agent actions, and application responses. The tool is able to correlate multiple datasets and present a response.

Artificial Intelligence

Artificial Intelligence Generative AI AWS Serverless

Assessing progress in automation technologies

O'Reilly Media - Ideas

DECEMBER 6, 2018

We presented an overview of the state of automation technologies: we tried to highlight the state of the key building block technologies and we described how these tools might evolve in the near future. Novices and non-experts have also benefited from easy-to-use, open source libraries for machine learning.

Technology

Technology Artificial Intelligence Machine Learning Hardware

One Big Cluster Stuck: The Right Tool for the Right Job

Cloudera

JUNE 26, 2023

Here are some tips and tricks of the trade to prevent well-intended yet inappropriate data engineering and data science activities from cluttering or crashing the cluster. For data engineering and data science teams, CDSW is highly effective as a comprehensive platform that trains, develops, and deploys machine learning models.

Tools

Tools Data Engineering Analytics Testing

Ultimate Guide to Citus Con: An Event for Postgres, 2023 edition

The Citus Data

MARCH 31, 2023

So this ultimate guide post is my gift to those of you who want to know more about the 37 talks that will be presented at this year’s 2nd annual Citus Con: An Event for Postgres 2023 —and who want to read about it in blog post form. And yes, Citus Con is virtual again this year! Lots to learn from here!

Azure

Azure Open Source Virtualization Software Engineering

Four Ways Telcos Can Realize Data-Driven Transformation

Cloudera

OCTOBER 19, 2023

Consolidation presents perhaps the biggest overall challenge, not only with respect to the complexity of integrating dissimilar IT systems and data platforms, but also that of merging and reconciling business processes and operations.

Data

Data Compliance Architecture Data Engineering

Netflix at AWS re:Invent 2019

Netflix Tech

NOVEMBER 22, 2019

4:45pm-5:45pm, NFX 209: File system as a service at Netflix. Kishore Kasi, Senior Software Engineer. Abstract: As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Open Source Linux Engineering Management

What is Streaming Analytics: Data Streaming, Stream Processing, and Real-time Analytics

Altexsoft

JANUARY 22, 2020

As a result, it became possible to provide real-time analytics by processing streamed data. Please note: this topic requires some general understanding of analytics and data engineering, so we suggest you read the following articles if you’re new to the topic: Data engineering overview.

Analytics

Analytics Data IoT Analysis

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

Cloudera, a leader in big data analytics, provides a unified Data Platform for data management, AI, and analytics. Our customers run some of the world’s most innovative, largest, and most demanding data science, data engineering, analytics, and AI use cases, including PB-size generative AI workloads.

Cloud

Cloud Artificial Intelligence Generative AI Analytics

What is OLAP: A Complete Guide to Online Analytical Processing

Altexsoft

APRIL 16, 2021

An overview of data warehouse types. Optionally, you may study some basic terminology on data engineering or watch our short video on the topic: What is data engineering. What is data pipeline. Creating a cube is a custom process each time, because data can’t be updated once it was modeled in a cube.

Analytics

Analytics Analysis Storage Business Intelligence

The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Cloudera

MAY 18, 2021

That is accomplished by delivering most technical use cases through primarily container-based CDP services (CDP services offer a distinct environment for separate technical use cases, e.g., data streaming, data engineering, data warehousing, etc.). Quantifiable improvements to Apache open source projects.

Cloud

Cloud Technical Review Storage Backup

Data Summit 2023 Event Recap

Datavail

JUNE 8, 2023

Data Summit 2023 was filled with thought-provoking sessions and presentations that explored the ever-evolving world of data. I’ll recap our presentations and everything else the Datavail team learned at Data Summit 2023. in order to ensure successful transitions from DBA roles into data engineering roles.

Database Administration

Database Administration Data Artificial Intelligence Analytics

The new challenges of scale: What it takes to go from PB to EB data scale

CIO

JUNE 14, 2023

Big data exploded onto the scene in the mid-2000s and has continued to grow ever since. Today, the data is even bigger, and managing these massive volumes of data presents a new challenge for many organizations. Even if you live and breathe tech every day, it’s difficult to conceptualize how big “big” really is.

Data

Data Scalability Storage Big Data

Apache Ozone and Dense Data Nodes

Cloudera

APRIL 22, 2021

Collects and aggregates metadata from components and presents cluster state. As a user/support engineer of Ozone, I may want to: This architecture allows for: Extremely fast data ingest, and data engineering done at the data lake. Apache Ozone handles both large and small size files.

Data

Data Storage Architecture Big Data

Certified technical partner solutions help customers succeed with Cloudera Data Platform

Cloudera

AUGUST 26, 2020

Informatica and Cloudera deliver a proven set of solutions for rapidly curating data into trusted information. Informatica’s comprehensive suite of Data Engineering solutions is designed to run natively on Cloudera Data Platform — taking full advantage of the scalable computing platform.

Data

Data Artificial Intelligence Machine Learning Disaster Recovery

How to build up a data team (everything I ever learned about recruiting)

Erik Bernhardsson

JUNE 7, 2014

Blog, talk at meetups, open source stuff, go to conferences. But over time if you do this right, you will get anecdotal feedback from candidates coming in saying they saw your presentation or read this cool story on Hacker News, or what not. Presenting the opportunity. Finding the people.

Recruiting

Recruiting Weak Development Team Data Software Review

Machine Learning basics: 10 Platforms to start learning and get awesome at it

UruIT

APRIL 27, 2020

If you haven’t already started, there’s no better time than the present and no better list than our Machine Learning basics selection. MathWorks focused on the development of these tools to become experts in high-end financial use and data engineering contexts. There’s no time like the present.

Artificial Intelligence

Artificial Intelligence Machine Learning Azure Software Review

Radar trends to watch: March 2022

O'Reilly Media - Ideas

MARCH 1, 2022

It is not open source, and is now entering private beta. The Information Battery: Pre-computing and caching data when energy costs are low to minimize energy use when power costs are high is a good way to save money and take advantage of renewable energy sources.

Trends

Trends Blockchain Serverless Malware

Our help documentation is now available in Portuguese

Github

OCTOBER 23, 2019

Learn about the future of technology, contribute to open source projects, build community connections, and listen to a keynote presentation by Lorena Mesa, a GitHub data engineer specializing in machine learning.

Artificial Intelligence

Artificial Intelligence Machine Learning Continuous Integration Open Source

Data Product Strategies: How Cloudera Helps Realize and Accelerate Successful Data Product Strategies

Cloudera

AUGUST 20, 2021

The Cloudera Data Platform comprises a number of ‘data experiences’ each delivering a distinct analytical capability using one or more purposely-built Apache open source projects such as Apache Spark for Data Engineering and Apache HBase for Operational Database workloads. Conclusion.

Strategy

Strategy Data Technical Review Weak Development Team

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

The cause is hybrid data – the massive amounts of data created everywhere businesses operate – in clouds, on-prem, and at the edge. Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020. The future is hybrid data, embrace it.

Data

Data Architecture Analytics Big Data

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Within the context of a data mesh architecture, I will present industry settings / use cases where the particular architecture is relevant and highlight the business value that it delivers against business and technology areas.

Architecture

Architecture Data Security Technical Review

Top Green Software Speakers

Apiumhub

NOVEMBER 20, 2023

With 16 years of professional experience in software engineering, including roles as CTO and CEO, he has become a prominent speaker at Green Software events in Germany. His primary responsibility is to integrate sustainability into the engineering roadmap and utilize the company’s portfolio to champion sustainability solutions.

Fractional CTO

Fractional CTO Software CTO Sustainability

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Netflix Tech

MARCH 5, 2019

While our engineering teams have and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing and abstracting our infrastructure. Give us a holler if you are interested in a thought exchange.

Infrastructure

Infrastructure Scalability Cloud Data

Airbyte vs Fivetran: Comparing Features, Costs, and Use Cases

Openxcell

DECEMBER 12, 2024

This comparison will help you make an informed decision and ensure that your data flows smoothly. Airbyte, a leading open-source data integration platform, boasts over 35,000 deployments across open-source users and Airbyte Cloud subscribers. Now, let’s explore the whole analogy of Airbyte vs Fivetran.

Open Source

Open Source Comparison Weak Development Team Scalability

What are model governance and model operations?

O'Reilly Media - Ideas

JUNE 19, 2019

First, the machine learning community has conducted groundbreaking research in many areas of interest to companies, and much of this research has been conducted out in the open via preprints and conference presentations. Discussions around machine learning tend to revolve around the work of data scientists and model building experts.

Government

Government Artificial Intelligence Machine Learning Testing

IBM InfoSphere vs Oracle Data Integrator vs Xplenty and Others: Data Integration Tools Compared

Altexsoft

OCTOBER 8, 2021

In this article, we’re comparing several data integration tools against key criteria to help companies looking for ways to merge and centralize data make an informed choice. Data integration in a nutshell. With them, it is much easier and faster to comb through numerous data repositories to get the needed information.

Tools

Tools Data Software Review Open Source

Big Data Analytics: How It Works, Tools, and Real-Life Applications

Altexsoft

MAY 14, 2021

as data is being generated, and any discoveries are presented almost instantaneously. Data generated from various sources including sensors, log files and social media, you name it, can be utilized both independently and as a supplement to existing transactional data many organizations already have at hand.

Big Data

Big Data Analytics Tools Applications
