You can use these agents through a process called chaining, where you break down complex tasks into manageable tasks that agents can perform as part of an automated workflow. It’s important to break it down this way so you can see beyond the hype and understand what is specifically being referred to. Do you see any issues?
Because of the adoption of containers, microservices architectures, and CI/CD pipelines, these environments are increasingly complex and noisy. These changes can cause many more unexpected performance and availability issues.
With this in mind, we embarked on a digital transformation that enables us to better meet customer needs now and in the future by adopting a lightweight, microservices architecture. We found that being architecturally led elevates the customer and their needs so we can design the right solution for the right problem.
Tech roles are rarely performed in isolation. For example, a candidate might perform well in a calm, structured interview environment but struggle to collaborate effectively in high-pressure, real-world scenarios like product launches or tight deadlines. Why do interpersonal skills matter in tech hiring?
By implementing this architectural pattern, organizations that use Google Workspace can empower their workforce to access groundbreaking AI solutions powered by Amazon Web Services (AWS) and make informed decisions without leaving their collaboration tool. In the following sections, we explain how to deploy this architecture.
Infinidat added cyber resilience on its InfiniGuard® secondary storage system during the past year and, at the end of April 2022, across its primary storage platforms with the InfiniSafe Reference Architecture, encompassing Infinidat’s complete portfolio.
Shared components refer to the functionality and features shared by all tenants. You can also bring your own customized models and deploy them to Amazon Bedrock for supported architectures. If it leads to better performance, your existing default prompt in the application is overridden with the new one.
Private cloud architecture is an increasingly popular approach to cloud computing that offers organizations greater control, security, and customization over their cloud infrastructure. What is Private Cloud Architecture? Why is Private Cloud Architecture important for Businesses?
To achieve these goals, the AWS Well-Architected Framework provides comprehensive guidance for building and improving cloud architectures. The solution incorporates the following key features: Using a Retrieval Augmented Generation (RAG) architecture, the system generates a context-aware detailed assessment.
Refer to Supported Regions and models for batch inference for current supporting AWS Regions and models. For instructions on how to start your Amazon Bedrock batch inference job, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock.
To achieve optimal performance for specific use cases, customers are adopting and adapting these FMs to their unique domain requirements. Tuning model architecture requires technical expertise, along with configuring training and fine-tuning parameters and managing distributed training infrastructure, among other tasks.
It arrives alongside the announcement of SAP’s Open Reference Architecture project as part of the EU’s IPCEI-CIS initiative. Organizations are choosing these platforms based on effective cost, performance, and scalability.”
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Increasingly, as Moore’s law rears its ugly head, computer chip developers are adopting “chiplet” architectures to scale their hardware’s processing power. “Process” in chip lingo refers to an architectural platform; TSMC began mass-producing 5 nm chips in 2020.
Security and compliance regulations require that security teams audit the actions performed by systems administrators using privileged credentials. Video recordings can’t be easily parsed like log files, requiring security team members to play back the recordings to review the actions performed in them.
However, enabling external users to access raw data while maintaining security and lineage integrity requires a well-thought-out architecture. This blog outlines a reference architecture to achieve this balance. Recommended Architecture: 1. Allow external users to access raw data without compromising governance.
While multi-cloud generally refers to the use of multiple cloud providers, hybrid encompasses both cloud and on-premises integrations, as well as multi-cloud setups. A leading meal kit provider migrated its data architecture to Cloudera on AWS, utilizing Cloudera’s Open Data Lakehouse capabilities.
For more on MuleSoft’s journey to cloud computing, refer to Why a Cloud Operating Model? The following diagram shows the reference architecture for various personas, including developers, support engineers, DevOps, and FinOps to connect with internal databases and the web using Amazon Q Business.
Observability refers to the ability to understand the internal state and behavior of a system by analyzing its outputs, logs, and metrics. Evaluation, on the other hand, involves assessing the quality and relevance of the generated outputs, enabling continual improvement.
Seamlessly integrate with APIs – Interact with existing business APIs to perform real-time actions such as transaction processing or customer data updates directly through email. Solution overview This section outlines the architecture designed for an email support system using generative AI.
To maximize performance and optimize training, organizations frequently need to employ advanced distributed training strategies. In a transformer architecture, such layers are the embedding layers and the multilayer perceptron (MLP) layers. Llama (and prior Llama models) and Mistral model architectures are supported for context parallelism.
Model Variants: The current DeepSeek model collection consists of the following models: DeepSeek-V3, an LLM that uses a Mixture-of-Experts (MoE) architecture. These models retain their existing architecture while gaining additional reasoning capabilities through a distillation process.
To address this, customers often begin by enhancing generative AI accuracy through vector-based retrieval systems and the Retrieval Augmented Generation (RAG) architectural pattern, which integrates dense embeddings to ground AI outputs in relevant context.
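The RAG pattern described above can be sketched in a few lines. This is not any vendor's implementation; it is a minimal, self-contained illustration in which a toy bag-of-words function stands in for a dense embedding model, and the `docs` list and function names are hypothetical:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query in embedding space.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the model's answer in retrieved context, the core of RAG.
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Amazon Bedrock is a managed service for foundation models.",
    "The cafeteria opens at 8 a.m.",
]
print(build_prompt("What is Amazon Bedrock?", docs))
```

In production the toy `embed` function would be replaced by a real embedding model and a vector database, but the grounding flow (embed, retrieve, assemble prompt) is the same.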
In this post, we evaluate different generative AI operating model architectures that could be adopted. Governance in the context of generative AI refers to the frameworks, policies, and processes that streamline the responsible development, deployment, and use of these technologies.
CBRE, in parallel, completed UAT to confirm it performed as expected. The following figure illustrates the core architecture for the NLQ capability. Following steps 5 and 6 in the architecture, the relevant tables’ schema is sent as input context to the model to generate a SQL query according to the input NLQ.
Event-driven operations management Operational events refer to occurrences within your organization’s cloud environment that might impact the performance, resilience, security, or cost of your workloads. The following diagram illustrates the solution architecture.
For some content, additional screening is performed to generate subtitles and captions. The general architecture of the metadata pipeline consists of two primary steps: Generate transcriptions of audio tracks: use speech recognition models to generate accurate transcripts of the audio content.
The following diagram illustrates the architecture of the application. Authentication is performed against the Amazon Cognito user pool. For more details about the authentication and authorization flows, refer to Accessing AWS services using an identity pool after sign-in.
When possible, refer all matters to committees for “further study and consideration.” Attempt to make committees as large as possible — never less than five. Refer back to matters decided upon at the last meeting and attempt to re-open the question of the advisability of that decision.
Most CPU manufacturers are working hard to improve processor performance. To raise performance further, they apply various techniques and enhance the technology they use. Hyper-threading is one such technology, used to increase the speed and throughput of the CPU.
As more enterprises migrate to cloud-based architectures, they are also taking on more applications (because they can) and, as a result of that, more complex workloads and storage needs. Machine learning and other artificial intelligence applications add even more complexity.
Response latency refers to the time between the user finishing their speech and beginning to hear the AI assistant’s response. For a full list of available Local Zones, refer to the Local Zones locations page. To determine the storage types that are supported, refer to the Compute and storage section in AWS Local Zones features.
The visual reference tool is probably handy for set directors who want to visualize props for certain scenes. This includes animal handling, children’s safety, copyright law objects, stunt performance requirements and even COVID safety measures. For instance, showing what the feather will look like at the beginning of “Forrest Gump.”
DeepSeek-R1 uses a Mixture of Experts (MoE) architecture and is 671 billion parameters in size. The MoE architecture allows activation of 37 billion parameters, enabling efficient inference by routing queries to the most relevant expert clusters. For details, refer to Create an AWS account.
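The sparse-activation idea behind MoE (a router sends each input to only a few of many experts, so most parameters stay inactive per query) can be shown with a toy sketch. This is not DeepSeek's implementation; the tiny scalar "experts" and gate weights below are made up for illustration:

```python
import math

def softmax(xs):
    # Numerically stable softmax over gate logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    # Router: score each expert for this input, keep only the top-k.
    scores = softmax([sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights])
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in top)
    # Only the selected experts run, so most parameters stay inactive.
    return sum(scores[i] / norm * experts[i](x) for i in top)

# Four tiny "experts"; a real MoE layer would have many large MLP experts.
experts = [lambda x, s=s: s * sum(x) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.4], [0.2, 0.8]]

print(moe_forward([1.0, 1.0], experts, gate_weights, top_k=2))
```

The output is a gate-weighted blend of the two selected experts' outputs, which is why a 671B-parameter MoE model can activate only a 37B-parameter subset per token.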
Overview of Pixtral 12B Pixtral 12B, Mistral’s inaugural VLM, delivers robust performance across a range of benchmarks, surpassing other open models and rivaling larger counterparts, according to Mistral’s evaluation. Mistral developed a novel architecture for Pixtral 12B, optimized for both computational efficiency and performance.
These assistants can be powered by various backend architectures including Retrieval Augmented Generation (RAG), agentic workflows, fine-tuned large language models (LLMs), or a combination of these techniques. However, building and deploying trustworthy AI assistants requires a robust ground truth and evaluation framework.
Architecture Overview The accompanying diagram visually represents our infrastructure’s architecture, highlighting the relationships between key components. ClouDNS Documentation: Refer to the official ClouDNS documentation for detailed insights into their DNS hosting services and configurations.
These are the four reasons one would adopt a feature store: prevent repeated feature development work, fetch features that are not provided through customer input, prevent repeated computations, and solve train-serve skew. These are the issues addressed by what we will refer to as the Offline and Online Feature Store.
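The offline/online split can be sketched minimally. This is a hypothetical illustration, not the post's actual system: the entity IDs, `compute_features` logic, and class names are made up. The key point is that one shared transformation feeds both stores, which is what prevents train-serve skew:

```python
from datetime import datetime, timezone

def compute_features(raw: dict) -> dict:
    # One shared transformation, used for both training (offline) and
    # serving (online), so the two paths cannot drift apart.
    return {"order_count": len(raw["orders"]),
            "avg_order_value": sum(raw["orders"]) / len(raw["orders"])}

class FeatureStore:
    def __init__(self):
        self.offline = []   # append-only history for building training sets
        self.online = {}    # latest value per entity for low-latency lookup

    def ingest(self, entity_id: str, raw: dict):
        feats = compute_features(raw)
        self.offline.append((entity_id, datetime.now(timezone.utc), feats))
        self.online[entity_id] = feats  # precomputed: no repeated work at serve time

    def get_online(self, entity_id: str) -> dict:
        # Serving path: fetch features the customer request doesn't carry.
        return self.online[entity_id]

store = FeatureStore()
store.ingest("customer-42", {"orders": [20.0, 40.0]})
print(store.get_online("customer-42"))
```

Each of the four motivations maps to a piece of this sketch: shared `compute_features` avoids repeated development and repeated computation, `get_online` fetches features absent from customer input, and using the same function in both paths addresses train-serve skew.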
Microservices are frequently referred to as a variant or derivative of service-oriented architecture (SOA), if not essentially the same thing. Microservices architecture […]. While there are similarities and both are designed around the concept of services, that’s where the similarities end.
This capability enables Anthropic’s Claude models to identify what’s on a screen, understand the context of UI elements, and recognize actions that should be performed such as clicking buttons, typing text, scrolling, and navigating between applications. The following diagram illustrates the solution architecture.
“We think Capsule’s value will lie in its exceptional user experience, quality, performance, ease of use and high quality engineering that draws on advanced technologies such as TIC and IPFS without saddling bloat,” he says. Kobeissi’s original concept for Capsule, meanwhile, was to create self-hosting microservices.
This counting service, built on top of the TimeSeries Abstraction, enables distributed counting at scale while maintaining similarly low-latency performance. In this context, these terms refer to a count that is very close to accurate, presented with minimal delay. Today, we’re excited to present the Distributed Counter Abstraction.
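One standard way to get "very close to accurate" counts across nodes is a grow-only counter (G-Counter) CRDT: each node increments only its own slot, and merging takes the per-node maximum, so totals converge as state is exchanged. This is a generic sketch of that idea, not the counting service described above:

```python
class GCounter:
    """Grow-only distributed counter: each node increments its own slot;
    the merged total converges once nodes have exchanged state."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        # Max per node: safe under repeated or out-of-order delivery.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)
b.increment(2)
a.merge(b)   # until merges propagate, each node's read is only near-accurate
print(a.value())
```

Between merges, a node's local `value()` undercounts by whatever the other nodes have not yet shipped, which is exactly the "close to accurate, with minimal delay" trade-off.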
Always monitor cost, performance and quality. Leverage this knowledge in your architecture and designs. The landing phase The virtual team will be at a high level of performance with the new, emerging tech. It’s time to publish architectural patterns, best practices and promote sensible adoption. Again: Start small.
Amazon Bedrock’s cross-Region inference capability provides organizations with the flexibility to access foundation models (FMs) across AWS Regions while maintaining optimal performance and availability. Instead, the system dynamically routes traffic across multiple Regions, maintaining optimal resource utilization and performance.