Amazon Web Services (AWS) is the most widely used cloud platform today, central to cloud strategies across nearly every industry as organizations look to make the most of the platform's wide range of offerings. As such, AWS skills are perennially in demand.
This summer, Databricks announced the open-sourcing of Unity Catalog. In this post, we’ll dive into how you can integrate DuckDB with the open-source Unity Catalog, walking you through our hands-on experience, sharing the setup process, and exploring both the opportunities and challenges of combining these two technologies.
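As a taste of what that integration looks like, here is a minimal sketch in Python, assuming a local open-source Unity Catalog server running with its quickstart defaults and DuckDB's experimental uc_catalog extension; the endpoint, token, and the sample unity.default.numbers table all follow the project's demo setup rather than anything production-grade:

```python
import duckdb

con = duckdb.connect()

# The uc_catalog extension was experimental at the time of writing and
# ships through DuckDB's nightly repository rather than core.
con.execute("INSTALL uc_catalog FROM core_nightly;")
con.execute("LOAD uc_catalog;")
con.execute("INSTALL delta; LOAD delta;")  # Unity Catalog tables are typically Delta

# Endpoint and token follow the open-source Unity Catalog quickstart
# defaults for a local server; adjust for your deployment.
con.execute("""
    CREATE SECRET (
        TYPE UC,
        TOKEN 'not-used',
        ENDPOINT 'http://127.0.0.1:8080',
        AWS_REGION 'us-east-1'
    );
""")

# Attach the demo catalog named 'unity' and browse it like a database.
con.execute("ATTACH 'unity' AS unity (TYPE UC_CATALOG);")
con.sql("SHOW ALL TABLES;").show()
con.sql("SELECT * FROM unity.default.numbers LIMIT 5;").show()
```

Once attached, the catalog's tables can be queried like any other DuckDB database, which is what makes the pairing interesting for lightweight local analytics.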
Principal wanted to use its existing internal FAQs, documentation, and unstructured data to build an intelligent chatbot that could give different roles quick access to the right information. Principal also used the AWS open-source repository Lex Web UI to build a frontend chat interface with Principal branding.
CloudQuery CEO and co-founder Yevgeny Pats helped launch the startup because he needed a tool to give him visibility into his cloud infrastructure resources, and he couldn't find one on the open market. He built his own SQL-based tool to help understand exactly what resources he was using, based on data engineering best practices.
We discuss the unique challenges MaestroQA overcame and how they use AWS to build new features, drive customer insights, and eliminate operational inefficiencies. They were also able to use the familiar AWS SDK to quickly and effortlessly integrate Amazon Bedrock into their application, as sketched below. The best is yet to come.
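For context, calling Bedrock through the AWS SDK really is a small amount of code. Below is a minimal boto3 sketch; the region, model ID, and prompt are illustrative placeholders, not MaestroQA's actual setup:

```python
import json
import boto3

# Region, model ID, and prompt are illustrative placeholders.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [
        {"role": "user",
         "content": "Summarize the customer sentiment in this support ticket: ..."},
    ],
})

response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=body,
)

# The response body is a stream; decode it once.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```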
Whether it's structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. … aligned identity provider (IdP).
… that is not an awful lot. These days, data science is by no means a new domain. It has been more than a decade since Harvard Business Review declared the data scientist role the "Sexiest Job of the 21st Century" [1]. In 2019 alone, data scientist job postings on Indeed rose by 256% [2].
Systems use features to make their predictions. "But building data pipelines to generate these features is hard, requires significant data engineering manpower, and can add weeks or months to project delivery times," Del Balso told TechCrunch in an email interview. "We are still in the early innings of MLOps."
Union.ai, a startup emerging from stealth with a commercial version of the open-source AI orchestration platform Flyte, today announced that it raised $10 million in a round contributed by NEA and "select" angel investors. "This will lead to revenue growth in the near future."
If you would like to submit a big data certification to this directory, please email us. AWS Certified Data Analytics: The AWS Certified Data Analytics – Specialty certification is intended for candidates with experience and expertise working with AWS to design, build, secure, and maintain analytics solutions.
The US financial services industry has fully embraced a move to the cloud, driving demand for tech skills such as AWS and automation, as well as Python for data analytics, Java for developing consumer-facing apps, and SQL for database work. Data engineer.
by Shefali Vyas Dalal. AWS re:Invent is a couple of weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Technology advancements in content creation and consumption have also increased its data footprint. We've compiled our speaking events below so you know what we've been working on.
Moonfare, a private equity firm, is transitioning from a PostgreSQL-based data warehouse on AWS to a Dremio data lakehouse on AWS for business intelligence and predictive analytics. "Users coming from a data warehouse environment shouldn't care where the data resides," says Angelo Slawik, data engineer at Moonfare.
Building applications with RAG requires a portfolio of data (company financials, customer data, data purchased from other sources) that can be used to build queries, and data scientists know how to work with data at scale. Data engineers build the infrastructure to collect, store, and analyze data.
To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities. eSentire has over 2 TB of signal data stored in their Amazon Simple Storage Service (Amazon S3) data lake.
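A natural language front end like this typically translates a question into SQL and executes it against the data lake. As a hedged sketch of that execution step, here is how a generated query might run over S3-resident data with Amazon Athena via boto3; the database, table, and output bucket names are hypothetical, not eSentire's:

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Database, table, and output bucket are hypothetical placeholders.
execution = athena.start_query_execution(
    QueryString="""
        SELECT source_ip, COUNT(*) AS events
        FROM signal_events
        WHERE event_date >= date '2024-01-01'
        GROUP BY source_ip
        ORDER BY events DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "security_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    result = athena.get_query_results(QueryExecutionId=query_id)
    for row in result["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])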
For example, if a data team member wants to increase their skills or move into a data engineer position, they can embark on a curriculum for up to two years to gain the right skills and experience. The bootcamp broadened my understanding of key concepts in data engineering.
His role now encompasses responsibility for data engineering, analytics development, and the vehicle inventory and statistics & pricing teams. The company was born as a series of print buying guides in 1966 and began making its data available via CD-ROM in the 1990s.
Data science is generally not operationalized. Consider a data flow from a machine or process all the way to an end user. In general, the flow of data from the machine to the data engineer (1) is well operationalized. You could argue the same about the data engineering step (2), although this differs per company.
This retrieved data is used as context, combined with the original prompt, to create an expanded prompt that is passed to the LLM. Streamlit: this open-source Python library makes it straightforward to create and share beautiful, custom web apps for ML and data science. The following diagram illustrates the RAG framework.
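The prompt-expansion step at the heart of RAG is simple to express in code. Here is a minimal, library-agnostic sketch; the retrieved chunks would normally come from whatever vector store or search index you use, and the ones shown are invented:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context with the original question into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Chunks would normally come from a retriever; these are invented.
chunks = [
    "Streamlit apps are defined in a single Python script.",
    "RAG expands the user prompt with retrieved context.",
]
expanded = build_rag_prompt("How does RAG change the prompt?", chunks)
print(expanded)  # this expanded prompt is what gets sent to the LLM
```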
Another cloud service I'm asked about is AWS Amplify, from another popular cloud giant. Assuming you're able to choose the best tool for the job, let's contrast AWS Amplify with Kinvey, our serverless development platform for business apps. Where Does AWS Amplify Fit? When Should I Use Progress Kinvey?
If you know where to look, open-source learning is a great way to get familiar with different cloud service providers. With the combined knowledge from our previous blog posts on free training resources for AWS and Azure, you'll be well on your way to expanding your cloud expertise and finding your own niche.
Netflix, where this innovation was born, is perhaps the best example of a 100 PB scale S3 data lake that needed to be built into a data warehouse. The cloud-native table format was open-sourced into Apache Iceberg by its creators. At Cloudera, we are proud of our open-source roots and committed to enriching the community.
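To make the table format concrete, here is a hedged sketch of reading an Iceberg table from Python with PyIceberg; the catalog settings and table name are placeholders for whatever catalog (REST, Glue, Hive) you actually run:

```python
from pyiceberg.catalog import load_catalog

# Catalog settings and the table name are placeholders; PyIceberg also
# supports Glue and Hive catalogs via the same load_catalog call.
catalog = load_catalog(
    "default",
    **{"type": "rest", "uri": "http://localhost:8181"},
)
table = catalog.load_table("analytics.page_views")

# Iceberg prunes data files using table metadata, so this filter skips
# partitions outside the requested range without listing S3 objects.
df = table.scan(
    row_filter="event_date >= '2024-01-01'",
    selected_fields=("user_id", "event_date"),
).to_pandas()
print(df.head())
```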
Percona Live 2023 was an exciting open-source database event that brought together industry experts, database administrators, data engineers, and IT leadership. Percona Live 2023 Session Highlights: the three days of the event were packed with interesting open-source database sessions!
This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. Impedance mismatch between data scientists, data engineers, and production engineers. For now, we'll focus on Kafka.
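The core of the argument is that a Kafka topic becomes the shared contract between those three roles. A minimal sketch with the kafka-python client, where the broker address and topic name are placeholders:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# The data engineering side publishes events to a topic...
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("model-inputs", {"user_id": 42, "features": [0.1, 0.7, 0.3]})
producer.flush()

# ...and the production side consumes the same topic, so all three roles
# share a single contract: the topic and its schema.
consumer = KafkaConsumer(
    "model-inputs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating if no new messages arrive
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # hand off to the model for scoring
    break
```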
In the survey behind our upcoming report, "Evolving data infrastructure," we found that 85% of respondents had data infrastructure in at least one of the seven cloud providers we listed, with nearly two-thirds (63%) using Amazon Web Services (AWS) for some portion of their data infrastructure. Security and privacy.
Before that, cloud computing itself took off in roughly 2010 (AWS was founded in 2006); and Agile goes back more than two decades (the Agile Manifesto dates to 2001, Extreme Programming to 1999). This change is apparently not an error in the data. (If you want to run an open-source language model on your laptop, try llamafile.)
Components that are unique to data engineering and machine learning (red) surround the model, with more common elements (gray) in support of the entire infrastructure on the periphery. Before you can build a model, you need to ingest and verify data, after which you can extract the features that power the model.
Three types of data migration tools. Use cases: small projects, or specific source and target locations not supported by other solutions. Automation scripts can be written by data engineers or ETL developers in charge of your migration project, as sketched below. Phases of the data migration process. Data sources and destinations.
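As an illustration of the automation-script approach, here is a minimal sketch that copies rows in batches from a source database to a target, using sqlite3 as a stand-in for any DB-API-compatible pair; the table and file names are hypothetical:

```python
import sqlite3  # stand-in for any DB-API-compatible source and target

BATCH_SIZE = 1_000

source = sqlite3.connect("source.db")
target = sqlite3.connect("target.db")

# Seed demo data so the sketch runs end to end; in a real migration
# the source table already exists.
source.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
source.commit()

target.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

# Copy in batches and commit per batch so a failed run can resume.
cursor = source.execute("SELECT id, name FROM customers ORDER BY id")
while True:
    rows = cursor.fetchmany(BATCH_SIZE)
    if not rows:
        break
    target.executemany("INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows)
    target.commit()
```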
An overview of data warehouse types. Optionally, you may study some basic terminology on data engineering or watch our short video on the topic: What is data engineering. What is a data pipeline. Creating a cube is a custom process each time, because data can't be updated once it has been modeled into a cube.
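To see why cubes are rigid, consider a toy sketch with pandas: a pivot materializes aggregates along fixed dimensions, so new data means rebuilding the whole structure. The fact table below is invented for illustration:

```python
import pandas as pd

# Invented fact table: one row per sale.
sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "product": ["A",  "B",  "A",  "B"],
    "quarter": ["Q1", "Q1", "Q1", "Q2"],
    "revenue": [100,  150,  200,  120],
})

# A cube materializes aggregates along fixed dimensions up front.
# Appending new rows to `sales` does not update `cube`; the pivot must
# be rebuilt, which mirrors why cubes are hard to update in place.
cube = sales.pivot_table(
    index="region",
    columns=["product", "quarter"],
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(cube)
```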
Lessons learned: Migrating from AWS-Hosted PostgreSQL RDS to Self-Hosted Citus, by Matt Klein & Delaney Mackenzie of Jellyfish.co (Americas livestream, Citus open-source user, real-time analytics, JSONB). Checkpoint and WAL configs, by Samay Sharma on the Postgres open-source team at Microsoft (on-demand).
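On the checkpoint and WAL theme, here is a hedged sketch of inspecting and adjusting those settings from Python with psycopg2; the DSN and the chosen values are placeholders, not tuning advice:

```python
import psycopg2

# DSN and values are placeholders, not tuning recommendations.
conn = psycopg2.connect("dbname=app user=postgres host=localhost")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction
cur = conn.cursor()

# Inspect the current checkpoint/WAL settings.
for setting in ("checkpoint_timeout", "max_wal_size", "checkpoint_completion_target"):
    cur.execute("SHOW " + setting)
    print(setting, "=", cur.fetchone()[0])

# Allow more WAL between checkpoints and spread the checkpoint I/O out.
cur.execute("ALTER SYSTEM SET checkpoint_timeout = '15min'")
cur.execute("ALTER SYSTEM SET max_wal_size = '4GB'")
cur.execute("SELECT pg_reload_conf()")  # both settings reload without a restart
```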
… the order of the rows on your Netflix home page, issuing content licenses when you click play, finding the Open Connect cache closest to you with the content you requested, and many more. All these microservices are currently operated in AWS cloud infrastructure. Give us a holler if you are interested in a thought exchange.
At our recent Evolve Conference in New York, we were extremely excited to announce our founding AI ecosystem partners: Amazon Web Services (AWS), NVIDIA, and Pinecone. We see AI applications like chatbots being built on top of closed-source or open-source foundation models. We'll start with the enterprise AI stack.
The willingness to explore new tools such as large language models (LLMs), machine learning (ML) models, and natural language processing (NLP) is opening up once-unthinkable possibilities to improve processes, reduce operational costs, or simply innovate [2]. They can be proprietary, third-party, or open-source, and run either on-premises or in the cloud.
Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.
From DBA to Data Engineer: The Strategic Role of DBAs in the Cloud. Over the past few years, the IT landscape has experienced significant disruptions. Additionally, he highlighted the need for DBAs to have a deep understanding of cloud platforms like Amazon Web Services (AWS) and Microsoft Azure.
Sure, we can help you secure, manage, and analyze petabytes of structured and unstructured data. We do that on-prem with almost 1 ZB of data under management, nearly 20% of the global total. We can also do it with your preferred cloud: AWS, Azure, or GCP. The future is hybrid data; embrace it.
The first quantum computers are now available through cloud providers like IBM and Amazon Web Services (AWS). Data is another very broad category, encompassing everything from traditional business analytics to artificial intelligence. Data engineering was the dominant topic by far, growing 35% year over year.
By creating a lakehouse, a company gives every employee the ability to access and employ data and artificial intelligence to make better business decisions. Many organizations that implement a lakehouse as their key data strategy are seeing lightning-speed data insights with horizontally scalable data-engineering pipelines.
Solr is a standard, commonly adopted open-source text search engine with rich query APIs for performing analytics over text and other unstructured data. Navigate to the Management Console > Environments > click on an environment where you would like to create a cluster > click Create Data Hub.
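For a flavor of those query APIs, here is a small sketch that hits Solr's JSON select endpoint with requests; the host, collection, and field names are placeholders for your own cluster:

```python
import requests

# Host, collection, and field names are placeholders for your own cluster.
SOLR_SELECT = "http://localhost:8983/solr/logs/select"

params = {
    "q": 'message:"connection refused"',  # full-text query
    "fq": "level:ERROR",                  # filter query, cached independently
    "sort": "timestamp desc",
    "rows": 10,
    "wt": "json",
}

response = requests.get(SOLR_SELECT, params=params, timeout=10)
response.raise_for_status()

for doc in response.json()["response"]["docs"]:
    print(doc.get("timestamp"), doc.get("message"))
```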