DAMA International's Data Management Body of Knowledge is a framework specifically for data management. It provides standard definitions for data management functions, deliverables, roles, and other terminology, and presents guiding principles for data management.
The following is a review of the book Fundamentals of Data Engineering by Joe Reis and Matt Housley, published by O'Reilly in June 2022, along with some takeaway lessons. This book is as valuable for a project manager or any other non-technical role as it is for a computer science student or a data engineer.
Expanding our approach to risk management: risk management is part of our DNA, but AI presents new types of risks that businesses haven't dealt with before. So, our goal is to meet them where they are, providing guidance that's both practical and easy to follow.
Last year presented business and organizational challenges that hadn't been seen in a century, and the troubling fact is that those challenges distributed pains and gains unequally across industry segments. Cloudera sees success in terms of two very simple outputs, or results: building enterprise agility and enterprise scalability.
In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way.
What topics do you think will be top-of-mind for attendees this year? "I'm especially interested in the intersection of data engineering and AI. I've been lucky to work on modern data teams where we've adopted CI/CD pipelines and scalable architectures. It won't always be easy, but it will be worth it."
As with many data-hungry workloads, the instinct is to offload LLM applications into a public cloud, whose strengths include speedy time-to-market and scalability. Without data, Holmes's argument goes, one begins to twist facts to suit theories rather than theories to suit facts. Inferencing and… Sherlock Holmes?
It allows data engineers, data scientists, and business analysts to query, manage, and use a wide range of tools and languages to gain insights. Benefits: Synapse's dedicated SQL pools provide robust data warehousing with MPP (massively parallel processing) for high-speed queries and reporting.
"Tomo Credit feels to me like it is tackling this in a hugely scalable, mainstream way." Looking ahead, Tomo plans to use its new capital to triple its headcount of 15, mostly with the goal of hiring full-stack and data engineers.
This data includes manuals, communications, documents, and other content across various systems like SharePoint, OneNote, and the company’s intranet. Principal sought to develop natural language processing (NLP) and question-answering capabilities to accurately query and summarize this unstructured data at scale.
Designed with a serverless, cost-optimized architecture, the platform provisions SageMaker endpoints dynamically, providing efficient resource utilization while maintaining scalability. Serverless on AWS AWS GovCloud (US) Generative AI on AWS About the Authors Nick Biso is a Machine Learning Engineer at AWS Professional Services.
When it comes to financial technology, data engineers are the most important architects. As fintech continues to change the way standard financial services are done, the data engineer's job becomes more and more important in shaping the future of the industry.
Technologies that have expanded Big Data possibilities even further are cloud computing and graph databases. The cloud offers excellent scalability, while graph databases offer the ability to display incredible amounts of data in a way that makes analytics efficient and effective. Who is a Big Data Engineer?
Ensuring compliant data deletion is a critical challenge for data engineering teams, especially in industries like healthcare, finance, and government. Deletion Vectors in Delta Live Tables offer an efficient and scalable way to handle record deletion without requiring expensive file rewrites. What Are Deletion Vectors?
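The core idea behind deletion vectors can be illustrated outside of Delta entirely: rather than rewriting an immutable data file to drop a record, keep a side structure marking which row positions are logically deleted and have readers filter them out. The sketch below is a hypothetical, simplified Python model of that idea, not Delta's actual implementation.

```python
# Minimal sketch of the deletion-vector idea: the data file's rows are
# never rewritten; a side set of row positions marks logical deletes,
# and reads filter against it.

class DataFile:
    def __init__(self, rows):
        self.rows = list(rows)        # stands in for immutable on-disk rows
        self.deletion_vector = set()  # positions marked as deleted

    def delete_where(self, predicate):
        """Mark matching rows as deleted without touching the file."""
        for i, row in enumerate(self.rows):
            if predicate(row):
                self.deletion_vector.add(i)

    def read(self):
        """Readers skip any position present in the deletion vector."""
        return [r for i, r in enumerate(self.rows)
                if i not in self.deletion_vector]

f = DataFile([{"id": 1}, {"id": 2}, {"id": 3}])
f.delete_where(lambda r: r["id"] == 2)
```

The deleted row disappears from query results while the underlying rows stay untouched, which is what makes the approach cheap compared with rewriting files.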
While our engineering teams have built, and continue to build, solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing, and abstracting our infrastructure. Give us a holler if you are interested in a thought exchange.
For example, if a data team member wants to increase their skills or move to a data engineer position, they can embark on a curriculum for up to two years to gain the right skills and experience. The bootcamp broadened my understanding of key concepts in data engineering.
MLEs are usually part of a data science team, which includes data engineers, data architects, data and business analysts, and data scientists. Who does what in a data science team? Machine learning engineers are relatively new to data-driven companies.
Through a series of virtual keynotes, technical sessions, and educational resources, learn about innovations for the next decade of AI, helping you deliver projects that generate the most powerful business results while ensuring your AI solutions are enterprise ready—secure, governed, scalable, and trusted.
Big data exploded onto the scene in the mid-2000s and has continued to grow ever since. Today, the data is even bigger, and managing these massive volumes of data presents a new challenge for many organizations. Focus on scalability. So, how do we achieve scalability?
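One common answer is horizontal scaling: split the data across workers so each handles only its shard, and add workers as volume grows. A minimal sketch of hash partitioning, using hypothetical names, might look like this:

```python
# Sketch of horizontal scaling via hash partitioning: a stable hash of
# each record's key decides which worker's shard it lands in, so the
# same key always routes to the same worker.

import hashlib

def partition_for(key, num_workers):
    """Stable, evenly distributed worker index for a key."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_workers

def shard(records, key_fn, num_workers):
    shards = [[] for _ in range(num_workers)]
    for rec in records:
        shards[partition_for(key_fn(rec), num_workers)].append(rec)
    return shards

records = [{"user": f"u{i}"} for i in range(100)]
shards = shard(records, key_fn=lambda r: r["user"], num_workers=4)
```

Because routing depends only on the key, workers can process their shards independently, which is the property that lets throughput grow roughly with the number of workers.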
In this blog post, we want to tell you about our recent effort to do metadata-driven data masking in a way that is scalable, consistent and reproducible. Using dbt to define and document data classifications and Databricks to enforce dynamic masking, we ensure that access is controlled automatically based on metadata.
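The metadata-driven pattern described above can be sketched in a few lines: column classifications (in practice sourced from dbt model metadata) drive which masking function is applied, instead of hard-coding masking per table. The names and classification labels below are hypothetical illustrations, not the post's actual dbt/Databricks configuration.

```python
# Sketch of metadata-driven masking: a classification map (stand-in for
# dbt meta tags) decides per column whether a value is hashed before it
# reaches a user without PII access.

import hashlib

CLASSIFICATIONS = {            # would normally be read from model metadata
    "email": "pii",
    "name": "pii",
    "country": "public",
}

def mask(value):
    """Deterministic one-way mask so joins on masked values still work."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_masking(row, user_can_see_pii):
    return {
        col: (mask(val)
              if CLASSIFICATIONS.get(col) == "pii" and not user_can_see_pii
              else val)
        for col, val in row.items()
    }

row = {"email": "a@b.com", "country": "NL"}
masked = apply_masking(row, user_can_see_pii=False)
```

Centralizing the classification map is what makes the approach consistent and reproducible: adding a new PII column is a metadata change, not a code change in every query.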
Often, it is aggregated or segmented in data marts, facilitating analysis and reporting as users can get information by units, sections, departments, etc. Data warehouse architecture. The architecture of a data warehouse is a system defining how data is presented and processed within a repository. Scalability.
The Cloudera Data Platform comprises a number of 'data experiences', each delivering a distinct analytical capability using one or more purpose-built Apache open source projects, such as Apache Spark for Data Engineering and Apache HBase for Operational Database workloads.
Platform and managed service vendors continue to roll out better solutions to the people shortage challenges presented above. Custom and off-the-shelf microservices cover the complexity of security, scalability, and data isolation and integrate into complex workflows through orchestration.
This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the air quality data integration problem of low-cost sensors. Having a human-in-the-loop to validate each data transformation step is optional.
Data Summit 2023 was filled with thought-provoking sessions and presentations that explored the ever-evolving world of data. I'll recap our presentations and everything else the Datavail team learned at Data Summit 2023, including how to ensure successful transitions from DBA roles into data engineering roles.
Infrastructure cost optimization by enabling container-based scalability for compute resources based on processing load, and by leveraging object storage that has a lower price point than compute-attached storage. Experience configuration / use case deployment: at the data lifecycle experience level (e.g., Flow Management).
Data architect and other data science roles compared. Data architect vs. data engineer: a data engineer is an IT specialist that develops, tests, and maintains data pipelines to bring together data from various sources and make it available for data scientists and other specialists.
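That pipeline-building job can be boiled down to an extract-transform-load loop: pull records from several sources, normalize them into one schema, and land them where analysts can use them. The sketch below is a deliberately tiny, hypothetical illustration of that shape; real pipelines would read from databases or APIs rather than in-memory lists.

```python
# Toy ETL pipeline: two "sources" with different schemas are merged
# into one record per person and loaded into a stand-in warehouse.

def extract():
    crm = [{"customer": "Ada", "spend_usd": 120}]   # source A schema
    web = [{"user": "Ada", "page_views": 7}]        # source B schema
    return crm, web

def transform(crm, web):
    """Normalize both sources into a single unified schema."""
    merged = {r["customer"]: {"name": r["customer"],
                              "spend_usd": r["spend_usd"],
                              "page_views": 0}
              for r in crm}
    for r in web:
        merged.setdefault(r["user"], {"name": r["user"],
                                      "spend_usd": 0,
                                      "page_views": 0})
        merged[r["user"]]["page_views"] = r["page_views"]
    return list(merged.values())

def load(rows, warehouse):
    warehouse.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
```

The data engineer's value is mostly in the `transform` step: reconciling mismatched keys and schemas so downstream users see one coherent table.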
Scalability and performance – The EMR Serverless integration automatically scales the compute resources up or down based on your workload’s demands, making sure you always have the necessary processing power to handle your big data tasks.
Storage plays one of the most important roles in a data platform strategy; it provides the basis for all compute engines and applications built on top of it. Businesses are also looking to move to a scale-out storage model that provides dense storage along with reliability, scalability, and performance.
But over time if you do this right, you will get anecdotal feedback from candidates coming in saying they saw your presentation or read this cool story on Hacker News, or what not. Presenting the opportunity. Finding the people. I think most people in the industry are fed up with bad bulk messages over email/LinkedIn.
It builds on a foundation of technologies from CDH (Cloudera's Distribution including Apache Hadoop) and HDP (Hortonworks Data Platform) and delivers a holistic, integrated data platform from Edge to AI, helping clients accelerate complex data pipelines and democratize data assets. Business value acceleration.
More than 25 speakers will be present at the conference to share their knowledge and opinions on a variety of topics in the tech industry. Francesco Cesarini – Founder & Technical Director at Erlang Solutions, co-author of "Erlang Programming" and "Designing for Scalability with Erlang/OTP". Meet the speakers.
While there are clear reasons SVB collapsed, which can be reviewed here , my purpose in this post isn’t to rehash the past but to present some of the regulatory and compliance challenges financial (and to some degree insurance) institutions face and how data plays a role in mitigating and managing risk.
"There are still a ton of challenges associated with getting machine learning and AI to scale… as the portfolio of deployed models has expanded, we're facing all these new questions about how to best create and manage reliable, scalable, and cost-effective infrastructure to support the model life cycle. Deliver use cases to market.
Example ingestion process using ADF: ADF provides a GUI allowing users to easily create pipelines connecting various data sources with their targets. This click-based development approach may seem accessible compared to high-code alternatives. However, it also presents the risk of inefficiently consuming significant development time.
Informatica and Cloudera deliver a proven set of solutions for rapidly curating data into trusted information. Informatica's comprehensive suite of Data Engineering solutions is designed to run natively on Cloudera Data Platform — taking full advantage of the scalable computing platform.
Fast moving data and real time analysis present us with some amazing opportunities. Every organization has some data that happens in real time, whether it is understanding what our users are doing on our websites or watching our systems and equipment as they perform mission critical tasks for us. Don’t blink — or you’ll miss it!
Giving a Powerful Presentation, July 25. How to Give Great Presentations, August 13. Programming with Data: Advanced Python and Pandas, July 9. Understanding Data Science Algorithms in R: Regression, July 12. Cleaning Data at Scale, July 15. Scalable Data Science with Apache Hadoop and Spark, July 16.
The cause is hybrid data – the massive amounts of data created everywhere businesses operate – in clouds, on-prem, and at the edge. Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020.
As a result, it became possible to provide real-time analytics by processing streamed data. Please note: this topic requires some general understanding of analytics and data engineering, so we suggest you read the following articles if you're new to the topic: Data engineering overview.
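The essence of that real-time approach is maintaining a running aggregate as each event arrives, rather than waiting for a batch job over the full dataset. A minimal, hypothetical Python sketch of per-key streaming counts:

```python
# Sketch of stream processing: consume events one at a time and keep a
# live per-type count, yielding a snapshot after every event so results
# are available immediately.

from collections import defaultdict

def stream_counts(events):
    """Generator yielding the current count per event type after each event."""
    counts = defaultdict(int)
    for e in events:
        counts[e["type"]] += 1
        yield dict(counts)  # copy, so earlier snapshots stay intact

events = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
snapshots = list(stream_counts(events))
```

In a production system the event source would be a broker such as Kafka and the state would live in a stream processor, but the incremental-update pattern is the same.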
Percona Live 2023 was an exciting open-source database event that brought together industry experts, database administrators, dataengineers, and IT leadership. Keynotes, breakout sessions, workshops, and panel discussions kept the database conversations going throughout the event. Check out our events calendar for 2023.
1pm-2pm, NFX 207: Benchmarking stateful services in the cloud. Vinay Chella, Data Platform Engineering Manager. Abstract: AWS cloud services make it possible to achieve millions of operations per second in a scalable fashion across multiple regions. We explore all the systems necessary to make and stream content from Netflix.
as data is being generated, and any discoveries are presented almost instantaneously. Data generated from various sources, including sensors, log files, and social media (you name it), can be utilized both independently and as a supplement to the existing transactional data many organizations already have at hand.