Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM
NOVEMBER 26, 2024
Startup probe – Gives the application time to start up, allowing up to 25 minutes before the pod is considered failed. These probes assume that your vLLM application exposes a /health endpoint; if it doesn't, the probes will fail and traffic won't be balanced across all replicas of your deployment. A sketch of such a probe follows.
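As a minimal sketch, a startup probe matching this description might look like the following in the container spec of the deployment manifest. The /health path comes from the text above; port 8000 (vLLM's default serving port) and the periodSeconds/failureThreshold values are illustrative assumptions chosen so that 10 s × 150 attempts equals the 25-minute window.

```yaml
# Hypothetical startup probe for the vLLM container.
# /health is stated in the article; the port and timing values are assumptions.
startupProbe:
  httpGet:
    path: /health
    port: 8000        # vLLM's default serving port (assumed)
  periodSeconds: 10   # probe every 10 seconds
  failureThreshold: 150  # 150 x 10 s = 1,500 s (25 minutes) before failure
```

While the startup probe is running, Kubernetes suppresses liveness and readiness checks, which is what gives a large model like Llama 3.1-8B time to load onto the Inferentia accelerators without the pod being restarted prematurely.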