@falconlee236
Contributor

@falconlee236 falconlee236 commented Jun 28, 2025

Summary

Adds an Azure Kubernetes Service (AKS) deployment option alongside the existing GKE support, giving users a choice between cloud platforms for GPU-accelerated ML inference workloads.

What's New

Infrastructure

AKS Cluster: Complete Terraform configuration with GPU node pools
GPU Support: NVIDIA T4 GPUs on Standard_NC4as_T4_v3 VMs
Monitoring: Prometheus stack with GPU metrics
Automation: SSH key generation and kubeconfig management

File Structure

├── azure-infrastructure/     # New: AKS Terraform configs
├── production-stack/         # Updated: Helm charts for AKS
├── azurek8s                  # New: Generated kubeconfig
├── Makefile                  # Updated: Azure commands
└── README.md                 # Updated: Azure documentation

Key Components

  • Default Node Pool: Standard_D4_v4 for management workloads
  • GPU Node Pool: Standard_NC4as_T4_v3 with taints and labels for ML workloads
  • NVIDIA GPU Operator: Enhanced GPU management vs Device Plugin
  • Prometheus Monitoring: Complete observability stack

Benefits

  • Multi-cloud Choice: Azure alternative to GCP deployment
  • Cost Options: Competitive Azure GPU pricing
  • Azure Integration: Native ecosystem support
  • Enhanced Monitoring: Built-in Prometheus stack

Impact

  • No Breaking Changes: Existing GKE deployments unaffected
  • Additive Feature: Users can choose GCP or Azure
  • Same API: Identical vLLM endpoints and configuration

Considerations for Reviewers

  • Please check the spell-check configuration: the spell check run on commit fails because of the AKS keyword.
  • Please review my README.md. It is a guide to setting up the Azure production-stack infrastructure.
  • I will start on the AWS infrastructure for production-stack if this issue is not merged, or not assigned to someone else, promptly by @YuhanLiu11 or @Hanchenli.

FIX #271

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
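
As a rough illustration of the prefix convention above, a small shell function can check whether a PR title starts with one of the listed prefixes. The function name `has_valid_prefix` and the sample titles are hypothetical, and this is only a sketch: nothing in this thread says the project's CI runs such a check.

```shell
# Returns 0 when the title starts with one of the documented prefixes.
# The bracketed prefixes are quoted so they match literally; the
# trailing * matches the rest of the title.
has_valid_prefix() {
  case "$1" in
    "[Bugfix]"*|"[CI/Build]"*|"[Doc]"*|"[Feat]"*|"[Router]"*|"[Misc]"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical titles used only to exercise the function.
for title in "[Feat] Add AKS Terraform tutorial" "Add AKS Terraform tutorial"; do
  if has_valid_prefix "$title"; then
    echo "ok: $title"
  else
    echo "missing prefix: $title"
  fi
done
```

A title that carries several prefixes, such as one beginning with `[Feat][Doc]`, also passes, because only the leading prefix is inspected.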

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
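
The effect of `-s` can be seen in a throwaway repository. The sketch below assumes `git` is installed; the identity, file name, and commit message are placeholders chosen for illustration and are not taken from this PR.

```shell
#!/bin/sh
set -eu

# Work in a temporary repository so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity; git builds the sign-off line from these values.
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt

# -s appends a "Signed-off-by: Name <email>" line to the commit message.
git commit -q -s -m "[Doc] add example file"

# Print the full commit message, including the sign-off line.
msg=$(git log -1 --format=%B)
printf '%s\n' "$msg"
```

The printed message ends with a `Signed-off-by:` line built from the configured `user.name` and `user.email`, which is the line the DCO requirement above refers to.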

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng or ApostaC.

Signed-off-by: falconlee236 <falconlee236@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @falconlee236, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the cloud deployment options for the vLLM production stack by adding full support for Microsoft Azure Kubernetes Service (AKS). It provides all necessary Terraform configurations and detailed documentation to enable users to deploy GPU-accelerated ML inference workloads on Azure, offering a robust alternative to existing GKE deployments without introducing breaking changes.

Highlights

  • Azure AKS Integration: This pull request introduces comprehensive Terraform configurations to enable the deployment of a GPU-accelerated vLLM production stack on Microsoft Azure Kubernetes Service (AKS). It provides a complete infrastructure setup, offering users a multi-cloud choice alongside existing GKE support.
  • GPU-Accelerated Infrastructure: The new configurations provision an AKS cluster with specialized GPU node pools (using NVIDIA T4 GPUs on Standard_NC4as_T4_v3 VMs) and integrate the NVIDIA GPU Operator for efficient GPU management and allocation for ML inference workloads.
  • Comprehensive Monitoring and Observability: A full Prometheus monitoring stack is deployed, including GPU metrics and a Prometheus Adapter for custom vLLM metrics, ensuring enhanced observability of the deployed ML workloads.
  • Automated Deployment Workflow: A new Makefile is added to streamline the entire deployment process, allowing for automated creation and destruction of both the Azure infrastructure and the vLLM Helm charts with simple commands.
  • Detailed User Documentation: An extensive README.md guide is included, providing step-by-step instructions for setting up, testing, and troubleshooting the vLLM production stack on Azure, covering prerequisites, component details, GPU selection, and cleanup procedures.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@falconlee236 falconlee236 marked this pull request as ready for review June 28, 2025 07:59
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a Terraform-based deployment guide for vLLM on Azure Kubernetes Service (AKS). The review focuses on improving documentation accuracy and fixing minor issues in automation scripts and Terraform configurations to enhance clarity and user experience.

Signed-off-by: falconlee236 <falconlee236@gmail.com>
@falconlee236 falconlee236 force-pushed the feat/terraform-tutorial-azure branch from 4473083 to e695da8 Compare June 28, 2025 08:12
Signed-off-by: falconlee236 <falconlee236@gmail.com>
Signed-off-by: falconlee236 <falconlee236@gmail.com>
Signed-off-by: falconlee236 <falconlee236@gmail.com>
Contributor

@kobe0938 kobe0938 left a comment

looks great!

@YuhanLiu11 YuhanLiu11 enabled auto-merge (squash) July 3, 2025 18:47
@YuhanLiu11
Collaborator

Hey @falconlee236, thanks for your contribution! Can you merge the latest changes?

@YuhanLiu11 YuhanLiu11 merged commit 8c712c5 into vllm-project:main Jul 4, 2025
6 of 7 checks passed
@falconlee236
Contributor Author

Hi @YuhanLiu11,
I updated the branch. Please check the status.

Also, can I work on the production-stack infrastructure for AWS EKS?

I'd like to know your thoughts.

@falconlee236 falconlee236 deleted the feat/terraform-tutorial-azure branch July 4, 2025 03:09
@kobe0938
Contributor

kobe0938 commented Jul 7, 2025

> Hi @YuhanLiu11, I updated the branch. Please check the status.
>
> Also, can I work on the production-stack infrastructure for AWS EKS?
>
> I'd like to know your thoughts.

Definitely! That seems to be the last remaining piece for Terraform. Feel free to contribute. Thanks.

@falconlee236
Contributor Author

Thanks, @kobe0938 and @YuhanLiu11.
I will work on a PR implementing AWS EKS with Terraform. After that, let's talk about a blog post on production-stack with Terraform.

Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
* add aks terraform feature

Signed-off-by: falconlee236 <falconlee236@gmail.com>

* apply gemini code review

Signed-off-by: falconlee236 <falconlee236@gmail.com>

* apply gemini code review 2

Signed-off-by: falconlee236 <falconlee236@gmail.com>

* add blank line

Signed-off-by: falconlee236 <falconlee236@gmail.com>

* add new line

Signed-off-by: falconlee236 <falconlee236@gmail.com>

---------

Signed-off-by: falconlee236 <falconlee236@gmail.com>
Signed-off-by: senne.mennes@capgemini.com <senne.mennes@capgemini.com>

Development

Successfully merging this pull request may close these issues.

feature: Terraform tutorial for MS Azure

3 participants