AWS, Data Engineering and Open Source

10 most in-demand enterprise IT skills

CIO

DECEMBER 10, 2024

AWS Amazon Web Services (AWS) is the most widely used cloud platform today. Central to cloud strategies across nearly every industry, AWS skills are in high demand as organizations look to make the most of the platform's wide range of offerings. As such, Oracle skills are perennially in demand.

UI/UX

UI/UX Enterprise Artificial Intelligence Database Administration

The future of data: A 5-pillar approach to modern data management

CIO

DECEMBER 11, 2024

This approach is repeatable, minimizes dependence on manual controls, harnesses technology and AI for data management and integrates seamlessly into the digital product development process. They must also select the data processing frameworks such as Spark, Beam or SQL-based processing and choose tools for ML.

Data

Data Technical Review Software Review Weak Development Team

Ducklake: A journey to integrate DuckDB with Unity Catalog

Xebia

OCTOBER 18, 2024

This summer, Databricks announced the open-sourcing of Unity Catalog. In this post, we’ll dive into how you can integrate DuckDB with the open-source Unity Catalog, walking you through our hands-on experience, sharing the setup process, and exploring both the opportunities and challenges of combining these two technologies.

Open Source

Open Source AWS Government Technical Review

Principal Financial Group uses QnABot on AWS and Amazon Q Business to enhance workforce productivity with generative AI

AWS Machine Learning - AI

NOVEMBER 15, 2024

Principal wanted to use existing internal FAQs, documentation, and unstructured data and build an intelligent chatbot that could provide quick access to the right information for different roles. Principal also used the AWS open source repository Lex Web UI to build a frontend chat interface with Principal branding.

Generative AI

Generative AI AWS Groups Artificial Intelligence

CloudQuery raises $15M to demystify your cloud infrastructure setup

TechCrunch

JUNE 22, 2022

CloudQuery CEO and co-founder Yevgeny Pats helped launch the startup because he needed a tool to give him visibility into his cloud infrastructure resources, and he couldn’t find one on the open market. He built his own SQL-based tool to help understand exactly what resources he was using, based on data engineering best practices.

Infrastructure

Infrastructure Cloud Open Source Data Engineering

Revolutionizing customer service: MaestroQA’s integration with Amazon Bedrock for actionable insight

AWS Machine Learning - AI

MARCH 13, 2025

We discuss the unique challenges MaestroQA overcame and how they use AWS to build new features, drive customer insights, and reduce operational inefficiencies. They were also able to use the familiar AWS SDK to quickly and effortlessly integrate Amazon Bedrock into their application. The best is yet to come.

Generative AI

Generative AI CTO Coach AWS Artificial Intelligence

Unify structured data in Amazon Aurora and unstructured data in Amazon S3 for insights using Amazon Q

AWS Machine Learning - AI

NOVEMBER 20, 2024

Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. aligned identity provider (IdP).

Data

Data AWS Groups Knowledge Base

Highlights from JupyterCon in New York 2018

O'Reilly Media - Data

AUGUST 24, 2018

Watch keynotes covering Jupyter's role in business, data science, higher education, open source, journalism, and other domains, from JupyterCon in New York 2018. Luciano Resende explores some of the open source initiatives IBM is leading in the Jupyter ecosystem. Why contribute to open source?

Open Source

Open Source Journal Artificial Intelligence Machine Learning

Are you ready for MLOps? 🫵

Xebia

FEBRUARY 28, 2025

… that is not an awful lot. These days Data Science is no longer a new domain by any means. The time when Harvard Business Review declared the Data Scientist to be the “Sexiest Job of the 21st Century” is more than a decade ago [1]. In 2019 alone the Data Scientist job postings on Indeed rose by 256% [2].

Technical Review

Technical Review Weak Development Team Artificial Intelligence Machine Learning

Tecton raises $100M, proving that the MLOps market is still hot

TechCrunch

JULY 12, 2022

But building data pipelines to generate these features is hard, requires significant data engineering manpower, and can add weeks or months to project delivery times,” Del Balso told TechCrunch in an email interview. Systems use features to make their predictions. “We are still in the early innings of MLOps.

Artificial Intelligence

Artificial Intelligence Machine Learning Marketing Data Engineering

Union.ai raises $10M to simplify AI and ML workflow orchestration

TechCrunch

APRIL 12, 2022

Union.ai, a startup emerging from stealth with a commercial version of the open source AI orchestration platform Flyte, today announced that it raised $10 million in a round contributed by NEA and “select” angel investors. This will lead to revenue growth in the near future.”

Artificial Intelligence

Artificial Intelligence Machine Learning Open Source Biotech

The top 15 big data and data analytics certifications

CIO

JUNE 14, 2023

If you would like to submit a big data certification to this directory, please email us. AWS Certified Data Analytics The AWS Certified Data Analytics – Specialty certification is intended for candidates with experience and expertise working with AWS to design, build, secure, and maintain analytics solutions.

Big Data

Big Data Analytics Data eLearning

The 10 most in-demand IT jobs in finance

CIO

SEPTEMBER 2, 2022

The US financial services industry has fully embraced a move to the cloud, driving a demand for tech skills such as AWS and automation, as well as Python for data analytics, Java for developing consumer-facing apps, and SQL for database work. Data engineer.

Software Engineering

Software Engineering Data Engineering DevOps AWS

The 10 most in-demand IT jobs in finance

CIO

AUGUST 31, 2022

The US financial services industry has fully embraced a move to the cloud, driving a demand for tech skills such as AWS and automation, as well as Python for data analytics, Java for developing consumer-facing apps, and SQL for database work. Data engineer.

Software Engineering

Software Engineering Data Engineering DevOps AWS

Netflix at AWS re:Invent 2019

Netflix Tech

NOVEMBER 22, 2019

by Shefali Vyas Dalal AWS re:Invent is a couple weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Technology advancements in content creation and consumption have also increased its data footprint. We’ve compiled our speaking events below so you know what we’ve been working on.

AWS

AWS Open Source Linux Engineering Management

The rise of the data lakehouse: A new era of data value

CIO

AUGUST 18, 2022

Moonfare, a private equity firm, is transitioning from a PostgreSQL-based data warehouse on AWS to a Dremio data lakehouse on AWS for business intelligence and predictive analytics. Users coming from a data warehouse environment shouldn’t care where the data resides,” says Angelo Slawik, data engineer at Moonfare.

Data

Data Technical Advisors Technical Review Artificial Intelligence

Technology Trends for 2025

O'Reilly Media - Ideas

JANUARY 14, 2025

Building applications with RAG requires a portfolio of data (company financials, customer data, data purchased from other sources) that can be used to build queries, and data scientists know how to work with data at scale. Data engineers build the infrastructure to collect, store, and analyze data.

Trends

Trends Technology Security Artificial Intelligence

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning - AI

JUNE 21, 2024

To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities. eSentire has over 2 TB of signal data stored in their Amazon Simple Storage Service (Amazon S3) data lake.

Artificial Intelligence

Artificial Intelligence Generative AI AWS Serverless

Build your gen AI–based text-to-SQL application using RAG, powered by Amazon Bedrock (Claude 3 Sonnet and Amazon Titan for embedding)

AWS Machine Learning - AI

MARCH 18, 2025

This retrieved data is used as context, combined with the original prompt, to create an expanded prompt that is passed to the LLM. Streamlit This open source Python library makes it straightforward to create and share beautiful, custom web apps for ML and data science. The following diagram illustrates the RAG framework.

Artificial Intelligence

Artificial Intelligence Applications Generative AI Off-The-Shelf

Capital Group invests big in talent development

CIO

JULY 29, 2022

For example, if a data team member wants to increase their skills or move to a data engineer position, they can embark on a curriculum for up to two years to gain the right skills and experience. The bootcamp broadened my understanding of key concepts in data engineering.

Groups

Groups Security Development Programming

Edmunds sets stage for AI with data infrastructure consolidation

CIO

JULY 10, 2023

His role now encompasses responsibility for data engineering, analytics development, and the vehicle inventory and statistics & pricing teams. The company was born as a series of print buying guides in 1966 and began making its data available via CD-ROM in the 1990s.

Infrastructure

Infrastructure Artificial Intelligence Data Generative AI

Demystifying MLOps: From Notebook to ML Application

Xebia

FEBRUARY 25, 2024

Data science is generally not operationalized Consider a data flow from a machine or process, all the way to an end-user. 2 In general, the flow of data from machine to the data engineer (1) is well operationalized. You could argue the same about the data engineering step (2), although this differs per company.

Applications

Applications Technical Review Software Review Open Source

AWS Amplify or Kinvey for External Databases, Identity Providers and DevOps

Progress

DECEMBER 30, 2019

Another cloud service I’m asked about is AWS Amplify from another popular cloud giant. Assuming you’re able to choose the best tool for the job, let’s contrast AWS Amplify with Kinvey, our serverless development platform for business apps. Where Does AWS Amplify Fit? When Should I Use Progress Kinvey?

AWS

AWS DevOps Disaster Recovery Serverless

7 Free Google Cloud Training Resources

ParkMyCloud

DECEMBER 11, 2020

If you know where to look, open-source learning is a great way to get familiar with different cloud service providers. With the combined knowledge from our previous blog posts on free training resources for AWS and Azure, you’ll be well on your way to expanding your cloud expertise and finding your own niche.

Google Cloud

Google Cloud Training Resources Cloud

The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Cloudera

MAY 18, 2021

That is accomplished by delivering most technical use cases through primarily container-based CDP services (CDP services offer a distinct environment for separate technical use cases e.g., data streaming, data engineering, data warehousing etc.) Quantifiable improvements to Apache open source projects.

Cloud

Cloud Technical Review Storage Backup

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Netflix, where this innovation was born, is perhaps the best example of a 100 PB scale S3 data lake that needed to be built into a data warehouse. The cloud native table format was open sourced into Apache Iceberg by its creators. At Cloudera, we are proud of our open-source roots and committed to enriching the community.

Data

Data Analytics Open Source Architecture

Percona Live 2023 Event Recap

Datavail

JUNE 20, 2023

Percona Live 2023 was an exciting open-source database event that brought together industry experts, database administrators, data engineers, and IT leadership. Percona Live 2023 Session Highlights The three days of the event were packed with interesting open-source database sessions!

Open Source

Open Source Database Administration Survey AWS

Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

FEBRUARY 6, 2019

This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers and production engineers. Impedance mismatch between data scientists, data engineers and production engineers. For now, we’ll focus on Kafka.

Artificial Intelligence

Artificial Intelligence Machine Learning Scalability Data Engineering

Core technologies and tools for AI, big data, and cloud computing

O'Reilly Media - Ideas

FEBRUARY 11, 2019

In the survey behind our upcoming report, “Evolving data infrastructure,” we found 85% of respondents indicated they had data infrastructure in at least one of the seven cloud providers we listed, with two-thirds (63%) using Amazon Web Services (AWS) for some portion of their data infrastructure. Security and privacy.

Big Data

Big Data Technology Tools Cloud

Technology Trends for 2024

O'Reilly Media - Ideas

JANUARY 25, 2024

Before that, cloud computing itself took off in roughly 2010 (AWS was founded in 2006); and Agile goes back to 2000 (the Agile Manifesto dates back to 2001, Extreme Programming to 1999). This change is apparently not an error in the data. If you want to run an open source language model on your laptop, try llamafile.)

Trends

Trends Technical Review Technology Artificial Intelligence

AI Chihuahua! Part I: Why Machine Learning is Dogged by Failure and Delays

d2iq

FEBRUARY 19, 2021

Components that are unique to data engineering and machine learning (red) surround the model, with more common elements (gray) in support of the entire infrastructure on the periphery. Before you can build a model, you need to ingest and verify data, after which you can extract features that power the model.

Artificial Intelligence

Artificial Intelligence Machine Learning Technical Review Software Review

The Good and the Bad of Databricks Lakehouse Platform

Altexsoft

MARCH 30, 2023

What is Databricks Databricks is an analytics platform with a unified set of tools for data engineering, data management, data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data.

Weak Development Team

Weak Development Team Artificial Intelligence Machine Learning Software Review

Data Migration Software: Which Solution Fits Your Project Best

Altexsoft

DECEMBER 4, 2020

Three types of data migration tools. Use cases: small projects, specific source and target locations not supported by other solutions. Automation scripts can be written by data engineers or ETL developers in charge of your migration project. Phases of the data migration process. Data sources and destinations.

Software Review

Software Review Software Data Technical Review

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

Cloudera, a leader in big data analytics, provides a unified Data Platform for data management, AI, and analytics. Our customers run some of the world’s most innovative, largest, and most demanding data science, data engineering, analytics, and AI use cases, including PB-size generative AI workloads.

Cloud

Cloud Artificial Intelligence Generative AI Analytics

What is OLAP: A Complete Guide to Online Analytical Processing

Altexsoft

APRIL 16, 2021

An overview of data warehouse types. Optionally, you may study some basic terminology on data engineering or watch our short video on the topic: What is data engineering. What is data pipeline. Creating a cube is a custom process each time, because data can’t be updated once it was modeled in a cube.

Analytics

Analytics Analysis Storage Business Intelligence

Ultimate Guide to Citus Con: An Event for Postgres, 2023 edition

The Citus Data

MARCH 31, 2023

Americas livestream, Citus open source user, real-time analytics, JSONB) Lessons learned: Migrating from AWS-Hosted PostgreSQL RDS to Self-Hosted Citus, by Matt Klein & Delaney Mackenzie of Jellyfish.co. (on-demand Checkpoint and WAL configs, by Samay Sharma on the Postgres open source team at Microsoft.

Azure

Azure Open Source Virtualization Software Engineering

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Netflix Tech

MARCH 5, 2019

the order of the rows on your Netflix home page, issuing content licenses when you click play, finding the Open Connect cache closest to you with the content you requested, and many more). All these micro-services are currently operated in AWS cloud infrastructure. Give us a holler if you are interested in a thought exchange.

Infrastructure

Infrastructure Scalability Cloud Data

Announcing Cloudera’s Enterprise Artificial Intelligence Partnership Ecosystem

Cloudera

DECEMBER 20, 2023

At our recent Evolve Conference in New York we were extremely excited to announce our founding AI ecosystem partners: Amazon Web Services (“AWS“), NVIDIA, and Pinecone. We see AI applications like chatbots being built on top of closed-source or open source foundational models. We’ll start with the enterprise AI stack.

Artificial Intelligence

Artificial Intelligence Enterprise Machine Learning

10 Platforms for Getting Started with Machine Learning

UruIT

JULY 23, 2019

AWS has removed the barriers to machine learning that have traditionally slowed down developers and data scientists. Pricing: AWS offers a pay-as-you-go model. MathWorks focused on the development of these tools in order to become experts on high-end financial use and data engineering contexts. Amazon Web Services.

Artificial Intelligence

Artificial Intelligence Machine Learning Azure Software Review

Making AI Work in Legal Tech: Balancing Cost and Performance

Invid Group

AUGUST 28, 2024

The willingness to explore new tools like large language models (LLM), machine learning (ML) models, and natural language processing (NLP) is opening unthinkable possibilities to improve processes, reduce operational costs, or simply innovate [2]. They can be proprietary, third-party, open-source, and run either on-premises or in the cloud.

Technical Review

Technical Review Artificial Intelligence Performance Azure
