Artificial Inteligence, AWS and Scalability

Extend large language models powered by Amazon SageMaker AI using Model Context Protocol

AWS Machine Learning - AI

MAY 1, 2025

For MCP implementation, you need a scalable infrastructure to host these servers and an infrastructure to host the large language model (LLM), which will perform actions with the tools implemented by the MCP server. We will deep dive into the MCP architecture later in this post.

Artificial Inteligence

Artificial Inteligence AWS Architecture Generative AI

Multi-LLM routing strategies for generative AI applications on AWS

AWS Machine Learning - AI

APRIL 9, 2025

Organizations are increasingly using multiple large language models (LLMs) when building generative AI applications. Although an individual LLM can be highly capable, it might not optimally address a wide range of use cases or meet diverse performance requirements.

Artificial Inteligence

Artificial Inteligence Generative AI AWS Applications

Build and deploy a UI for your generative AI applications with AWS and Python

AWS Machine Learning - AI

NOVEMBER 6, 2024

Traditionally, building frontend and backend applications has required knowledge of web development frameworks and infrastructure management, which can be daunting for those with expertise primarily in data science and machine learning. Access to Amazon Bedrock foundation models is not granted by default.

Generative AI

Generative AI AWS Artificial Inteligence Applications

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Building a Scalable ML Pipeline and API in AWS

Dzone - DevOps

MARCH 28, 2025

With rapid progress in the fields of machine learning (ML) and artificial intelligence (AI), it is important to deploy the AI/ML model efficiently in production environments. The architecture downstream ensures scalability, cost efficiency, and real-time access to applications.

Scalability

Scalability Artificial Inteligence AWS Artificial Intelligence

AI in action: Stories of how enterprises are transforming and modernizing

CIO

MARCH 20, 2025

Generative and agentic artificial intelligence (AI) are paving the way for this evolution. AI practitioners and industry leaders discussed these trends, shared best practices, and provided real-world use cases during EXLs recent virtual event, AI in Action: Driving the Shift to Scalable AI. The EXLerate.AI

Artificial Inteligence

Artificial Inteligence Enterprise Insurance Artificial Intelligence

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips

AWS Machine Learning - AI

NOVEMBER 26, 2024

The use of large language models (LLMs) and generative AI has exploded over the last year. With the release of powerful publicly available foundation models, tools for training, fine tuning and hosting your own LLM have also become democratized. xlarge instances are only available in these AWS Regions.

Artificial Inteligence

Artificial Inteligence AWS Artificial Intelligence Generative AI

Accelerate AWS Well-Architected reviews with Generative AI

AWS Machine Learning - AI

MARCH 4, 2025

To achieve these goals, the AWS Well-Architected Framework provides comprehensive guidance for building and improving cloud architectures. This allows teams to focus more on implementing improvements and optimizing AWS infrastructure. This scalability allows for more frequent and comprehensive reviews.

Generative AI

Generative AI Technical Review Software Review Systems Review

Amazon Bedrock Marketplace now includes NVIDIA models: Introducing NVIDIA Nemotron-4 NIM microservices

AWS Machine Learning - AI

DECEMBER 4, 2024

At AWS re:Invent 2024, we are excited to introduce Amazon Bedrock Marketplace. This a revolutionary new capability within Amazon Bedrock that serves as a centralized hub for discovering, testing, and implementing foundation models (FMs).

Artificial Inteligence

Artificial Inteligence Microservices Generative AI AWS

Techniques and approaches for monitoring large language models on AWS

AWS Machine Learning - AI

FEBRUARY 26, 2024

Large Language Models (LLMs) have revolutionized the field of natural language processing (NLP), improving tasks such as language translation, text summarization, and sentiment analysis. Monitoring the performance and behavior of LLMs is a critical task for ensuring their safety and effectiveness.

Artificial Inteligence

Artificial Inteligence AWS Lambda Metrics

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

AWS Machine Learning - AI

OCTOBER 29, 2024

Refer to Supported Regions and models for batch inference for current supporting AWS Regions and models. Although batch inference offers numerous benefits, it’s limited to 10 batch inference jobs submitted per model per Region. Amazon S3 invokes the {stack_name}-create-batch-queue-{AWS-Region} Lambda function.

Scalability

Scalability Lambda Generative AI AWS

Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference

AWS Machine Learning - AI

DECEMBER 2, 2024

Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. It supports a wide range of popular open source LLMs, making it a popular choice for diverse AI applications.

Generative AI

Generative AI Artificial Inteligence Machine Learning AWS

Build a multi-tenant generative AI environment for your enterprise on AWS

AWS Machine Learning - AI

NOVEMBER 7, 2024

It also uses a number of other AWS services such as Amazon API Gateway , AWS Lambda , and Amazon SageMaker. You can use AWS services such as Application Load Balancer to implement this approach. It consists of one or more components depending on the number of FM providers and number and types of custom models used.

Generative AI

Generative AI AWS Artificial Inteligence Enterprise

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI

AWS Machine Learning - AI

FEBRUARY 12, 2025

Large language models (LLMs) have revolutionized the field of natural language processing with their ability to understand and generate humanlike text. Researchers developed Medusa , a framework to speed up LLM inference by adding extra heads to predict multiple tokens simultaneously.

Artificial Inteligence

Artificial Inteligence Training AWS Machine Learning

Orchestrate generative AI workflows with Amazon Bedrock and AWS Step Functions

AWS Machine Learning - AI

NOVEMBER 22, 2024

This post discusses how to use AWS Step Functions to efficiently coordinate multi-step generative AI workflows, such as parallelizing API calls to Amazon Bedrock to quickly gather answers to lists of submitted questions.

Generative AI

Generative AI AWS Technical Review Backup

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service

AWS Machine Learning - AI

NOVEMBER 14, 2024

This is where AWS and generative AI can revolutionize the way we plan and prepare for our next adventure. With the significant developments in the field of generative AI , intelligent applications powered by foundation models (FMs) can help users map out an itinerary through an intuitive natural conversation interface.

Artificial Inteligence

Artificial Inteligence Generative AI Travel AWS

Introducing AWS MCP Servers for code assistants (Part 1)

AWS Machine Learning - AI

APRIL 1, 2025

Were excited to announce the open source release of AWS MCP Servers for code assistants a suite of specialized Model Context Protocol (MCP) servers that bring Amazon Web Services (AWS) best practices directly to your development workflow. This post is the first in a series covering AWS MCP Servers.

AWS

AWS Software Review Knowledge Base Artificial Inteligence

Model customization, RAG, or both: A case study with Amazon Nova

AWS Machine Learning - AI

APRIL 10, 2025

The introduction of Amazon Nova models represent a significant advancement in the field of AI, offering new opportunities for large language model (LLM) optimization. In this post, we demonstrate how to effectively perform model customization and RAG with Amazon Nova models as a baseline.

Case Study

Case Study Artificial Inteligence Study Generative AI

Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group

AWS Machine Learning - AI

APRIL 21, 2025

In this blog post, we discuss how Prompt Optimization improves the performance of large language models (LLMs) for intelligent text processing task in Yuewen Group. Evolution from Traditional NLP to LLM in Intelligent Text Processing Yuewen Group leverages AI for intelligent analysis of extensive web novel texts.

Artificial Inteligence

Artificial Inteligence Groups Applications Innovation

9 IT skills where expertise pays the most

CIO

APRIL 25, 2025

Artificial Intelligence Average salary: $130,277 Expertise premium: $23,525 (15%) AI tops the list as the skill that can earn you the highest pay bump, earning tech professionals nearly an 18% premium over other tech skills. Read on to find out how such expertise can make you stand out in any industry.

Artificial Inteligence

Artificial Inteligence DevOps Virtualization Industry

EBSCOlearning scales assessment generation for their online learning content with generative AI

AWS Machine Learning - AI

DECEMBER 11, 2024

In this post, we illustrate how EBSCOlearning partnered with AWS Generative AI Innovation Center (GenAIIC) to use the power of generative AI in revolutionizing their learning assessment process. The evaluation process includes three phases: LLM-based guideline evaluation, rule-based checks, and a final evaluation.

Generative AI

Generative AI Artificial Inteligence Guidelines Education

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

AWS Machine Learning - AI

OCTOBER 16, 2024

As DPG Media grows, they need a more scalable way of capturing metadata that enhances the consumer experience on online video services and aids in understanding key content characteristics. The following were some initial challenges in automation: Language diversity – The services host both Dutch and English shows.

Media

Media Video Artificial Inteligence Generative AI

Principal Financial Group uses QnABot on AWS and Amazon Q Business to enhance workforce productivity with generative AI

AWS Machine Learning - AI

NOVEMBER 15, 2024

Principal wanted to use existing internal FAQs, documentation, and unstructured data and build an intelligent chatbot that could provide quick access to the right information for different roles. Principal also used the AWS open source repository Lex Web UI to build a frontend chat interface with Principal branding.

Generative AI

Generative AI AWS Groups Artificial Inteligence

How AWS sales uses Amazon Q Business for customer engagement

AWS Machine Learning - AI

DECEMBER 11, 2024

Earlier this year, we published the first in a series of posts about how AWS is transforming our seller and customer journeys using generative AI. Field Advisor serves four primary use cases: AWS-specific knowledge search With Amazon Q Business, weve made internal data sources as well as public AWS content available in Field Advisors index.

AWS

AWS Generative AI Technical Review Artificial Inteligence

Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM

AWS Machine Learning - AI

NOVEMBER 26, 2024

With the rise of large language models (LLMs) like Meta Llama 3.1, there is an increasing need for scalable, reliable, and cost-effective solutions to deploy and serve these models. Adjust the following configuration to suit your needs, such as the Amazon EKS version, cluster name, and AWS Region.

AWS

AWS Load Balancer Software Review Artificial Inteligence

Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker

AWS Machine Learning - AI

APRIL 23, 2025

National Laboratory has implemented an AI-driven document processing platform that integrates named entity recognition (NER) and large language models (LLMs) on Amazon SageMaker AI. In this post, we discuss how you can build an AI-powered document processing platform with open source NER and LLMs on SageMaker.

Artificial Inteligence

Artificial Inteligence Open Source AWS Serverless

FloQast builds an AI-powered accounting transformation solution with Anthropic’s Claude 3 on Amazon Bedrock

AWS Machine Learning - AI

APRIL 30, 2025

With advancement in AI technology, the time is right to address such complexities with large language models (LLMs). Amazon Bedrock has helped democratize access to LLMs, which have been challenging to host and manage. The following diagram illustrates the architecture using AWS services.

Artificial Inteligence

Artificial Inteligence Technical Review Software Review Generative AI

Host concurrent LLMs with LoRAX

AWS Machine Learning - AI

APRIL 16, 2025

Out-of-the-box models often lack the specific knowledge required for certain domains or organizational terminologies. To address this, businesses are turning to custom fine-tuned models, also known as domain-specific large language models (LLMs). Why LoRAX for LoRA deployment on AWS?

Artificial Inteligence

Artificial Inteligence Generative AI AWS Storage

Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock

AWS Machine Learning - AI

NOVEMBER 1, 2024

This visibility is essential for setting accurate pricing for generative AI offerings, implementing chargebacks, and establishing usage-based billing models. Without a scalable approach to controlling costs, organizations risk unbudgeted usage and cost overruns. anthropic.claude-3-sonnet-20240229-v1:0", "inferenceProfileId": "us-1.anthropic.claude-3-sonnet-20240229-v1:0",

Generative AI

Generative AI AWS Artificial Inteligence Budget

Use zero-shot large language models on Amazon Bedrock for custom named entity recognition

AWS Machine Learning - AI

JUNE 18, 2024

Some examples include extracting players and positions in an NFL game summary, products mentioned in an AWS keynote transcript, or key names from an article on a favorite tech company. This process must be repeated for every new document and entity type, making it impractical for processing large volumes of documents at scale.

Artificial Inteligence

Artificial Inteligence Generative AI AWS Machine Learning

Optimize hosting DeepSeek-R1 distilled models with Hugging Face TGI on Amazon SageMaker AI

AWS Machine Learning - AI

MARCH 13, 2025

DeepSeek-R1 , developed by AI startup DeepSeek AI , is an advanced large language model (LLM) distinguished by its innovative, multi-stage training process. Instead of relying solely on traditional pre-training and fine-tuning, DeepSeek-R1 integrates reinforcement learning to achieve more refined outputs.

Artificial Inteligence

Artificial Inteligence AWS Machine Learning Load Balancer

Unlocking Japanese LLMs with AWS Trainium: Innovators Showcase from the AWS LLM Development Support Program

AWS Machine Learning - AI

JULY 31, 2024

Amazon Web Services (AWS) is committed to supporting the development of cutting-edge generative artificial intelligence (AI) technologies by companies and organizations across the globe. Let’s dive in and explore how these organizations are transforming what’s possible with generative AI on AWS.

Artificial Inteligence

Artificial Inteligence AWS Programming Innovation

Build a video insights and summarization engine using generative AI with Amazon Bedrock

AWS Machine Learning - AI

OCTOBER 29, 2024

This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services.

Generative AI

Generative AI Video Engineering Artificial Inteligence

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning - AI

FEBRUARY 6, 2024

To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate the adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.

Artificial Inteligence

Artificial Inteligence Pharmaceuticals Healthcare AWS

WordFinder app: Harnessing generative AI on AWS for aphasia communication

AWS Machine Learning - AI

MAY 2, 2025

David Copland, from QARC, and Scott Harding, a person living with aphasia, used AWS services to develop WordFinder, a mobile, cloud-based solution that helps individuals with aphasia increase their independence through the use of AWS generative AI technology. The following diagram illustrates the solution architecture on AWS.

Generative AI

Generative AI AWS Lambda Authentication

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

AWS Machine Learning - AI

APRIL 30, 2025

Prerequisites For a successful implementation of Amazon Bedrock Model Distillation, youll need to meet several requirements. We recommend referring to the Submit a model distillation job in Amazon Bedrock in the official AWS documentation for the most up-to-date and comprehensive information. 70B and Llama 3.1

Artificial Inteligence

Artificial Inteligence Generative AI AWS Training

Discover, Protect and Respond with AWS and Prisma Cloud

Prisma Clud

NOVEMBER 22, 2024

Organizations are increasingly turning to cloud providers, like Amazon Web Services (AWS), to address these challenges and power their digital transformation initiatives. However, the vastness of AWS environments and the ease of spinning up new resources and services can lead to cloud sprawl and ongoing security risks.

AWS

AWS Cloud Network Compliance

Enable Amazon Bedrock cross-Region inference in multi-account environments

AWS Machine Learning - AI

MARCH 27, 2025

Amazon Bedrock cross-Region inference capability that provides organizations with flexibility to access foundation models (FMs) across AWS Regions while maintaining optimal performance and availability. We provide practical examples for both SCP modifications and AWS Control Tower implementations.

Artificial Inteligence

Artificial Inteligence AWS Technical Review Policies

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

AWS Machine Learning - AI

OCTOBER 31, 2024

AWS offers powerful generative AI services , including Amazon Bedrock , which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. The following figure illustrates the high-level design of the solution.

Generative AI

Generative AI Lambda Applications AWS

Insights in implementing production-ready solutions with generative AI

AWS Machine Learning - AI

APRIL 30, 2025

This post explores key insights and lessons learned from AWS customers in Europe, Middle East, and Africa (EMEA) who have successfully navigated this transition, providing a roadmap for others looking to follow suit. For more information, you can watch the AWS Summit Milan 2024 presentation.

Generative AI

Generative AI Artificial Inteligence AWS Machine Learning

Responsible AI in action: How Data Reply red teaming supports generative AI safety on AWS

AWS Machine Learning - AI

APRIL 29, 2025

At Data Reply and AWS, we are committed to helping organizations embrace the transformative opportunities generative AI presents, while fostering the safe, responsible, and trustworthy development of AI systems. Post-authentication, users access the UI Layer, a gateway to the Red Teaming Playground built on AWS Amplify and React.

Generative AI

Generative AI Weak Development Team AWS Artificial Inteligence

Google’s AI innovations at Cloud Next 2025: What CIOs need to know

CIO

APRIL 15, 2025

During his one hour forty minute-keynote, Thomas Kurian, CEO of Google Cloud showcased updates around most of the companys offerings, including new large language models (LLMs) , a new AI accelerator chip, new open source frameworks around agents, and updates to its data analytics, databases, and productivity tools and services among others.

Cloud

Cloud Innovation Artificial Inteligence Google Cloud

Creating asynchronous AI agents with Amazon Bedrock

AWS Machine Learning - AI

MARCH 13, 2025

Advancements in multimodal artificial intelligence (AI), where agents can understand and generate not just text but also images, audio, and video, will further broaden their applications. Conversely, asynchronous event-driven systems offer greater flexibility and scalability through their distributed nature.

Artificial Inteligence

Artificial Inteligence Lambda Generative AI Travel

Generative AI operating models in enterprise organizations with Amazon Bedrock

AWS Machine Learning - AI

JANUARY 29, 2025

Intelligent document processing , translation and summarization, flexible and insightful responses for customer support agents, personalized marketing content, and image and code generation are a few use cases using generative AI that organizations are rolling out in production.

Generative AI

Generative AI Organization Enterprise Artificial Inteligence

Deploy DeepSeek-R1 Distilled Llama models in Amazon Bedrock

AWS Machine Learning - AI

JANUARY 29, 2025

Their DeepSeek-R1 models represent a family of large language models (LLMs) designed to handle a wide range of tasks, from code generation to general reasoning, while maintaining competitive performance and efficiency. For more information, see Create a service role for model import.

Generative AI

Generative AI Artificial Inteligence AWS Serverless

Extend large language models powered by Amazon SageMaker AI using Model Context Protocol

Multi-LLM routing strategies for generative AI applications on AWS

Webinars

Trending Sources

Build and deploy a UI for your generative AI applications with AWS and Python

Webinars

Building a Scalable ML Pipeline and API in AWS

AI in action: Stories of how enterprises are transforming and modernizing

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips

Accelerate AWS Well-Architected reviews with Generative AI

Amazon Bedrock Marketplace now includes NVIDIA models: Introducing NVIDIA Nemotron-4 NIM microservices

Techniques and approaches for monitoring large language models on AWS

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference

Build a multi-tenant generative AI environment for your enterprise on AWS

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI

Orchestrate generative AI workflows with Amazon Bedrock and AWS Step Functions

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service

Introducing AWS MCP Servers for code assistants (Part 1)

Model customization, RAG, or both: A case study with Amazon Nova

Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group

9 IT skills where expertise pays the most

EBSCOlearning scales assessment generation for their online learning content with generative AI

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

Principal Financial Group uses QnABot on AWS and Amazon Q Business to enhance workforce productivity with generative AI

How AWS sales uses Amazon Q Business for customer engagement

Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM

Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker

FloQast builds an AI-powered accounting transformation solution with Anthropic’s Claude 3 on Amazon Bedrock

Host concurrent LLMs with LoRAX

Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock

Use zero-shot large language models on Amazon Bedrock for custom named entity recognition

Optimize hosting DeepSeek-R1 distilled models with Hugging Face TGI on Amazon SageMaker AI

Unlocking Japanese LLMs with AWS Trainium: Innovators Showcase from the AWS LLM Development Support Program

Build a video insights and summarization engine using generative AI with Amazon Bedrock

Deploy large language models for a healthtech use case on Amazon SageMaker

WordFinder app: Harnessing generative AI on AWS for aphasia communication

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

Discover, Protect and Respond with AWS and Prisma Cloud

Enable Amazon Bedrock cross-Region inference in multi-account environments

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

Insights in implementing production-ready solutions with generative AI

Responsible AI in action: How Data Reply red teaming supports generative AI safety on AWS

Google’s AI innovations at Cloud Next 2025: What CIOs need to know

Creating asynchronous AI agents with Amazon Bedrock

Generative AI operating models in enterprise organizations with Amazon Bedrock

Deploy DeepSeek-R1 Distilled Llama models in Amazon Bedrock

Stay Connected