Introducing the AMD Xilinx Inference Server

Bingqing Guo

Published: Sep 22, 2022


Overview

The Xilinx Inference Server is a new, streamlined way to deploy the XModels from your Vitis™ AI environment for inference. You no longer need to write custom logic with the Vitis AI Runtime libraries for each XModel. Instead, you can use the Vitis AI tools to compile and prepare your XModel (or grab a trained one from the Vitis AI Model Zoo), and then use the Inference Server to make the XModel available for servicing inference requests. You can make these requests with the included Python API, which provides methods to load your XModel and run inference directly without writing any C++. Beyond ease of use, the Inference Server provides a high-performance, scalable solution that can use all the FPGAs on your machine, or across your cluster with Kubernetes and KServe. In the future, we plan to support other machine learning frameworks and even GPUs, creating an all-in-one solution for heterogeneous machine learning inference.

The AMD Xilinx Inference Server is open source on GitHub and under active development. Clone the repository and try it out! The documentation explains how to set up the environment and walks through several examples.

How to Start

Say you want to run inference with a trained ResNet50 model on your Alveo™ U250 data center accelerator card. You're in luck: a trained XModel for this platform is already available in the Vitis AI Model Zoo. Before you can use the Inference Server, however, you need to prepare your host and board. Follow the instructions in the Vitis AI repository to install the Xilinx Runtime (XRT), the AMD Xilinx Resource Manager (XRM), and the target platform on the Alveo card. Once your host and card are set up, you're ready to use the server. Note that the following example and instructions are adapted from the documentation, which has the most up-to-date version of these instructions.

$ git clone https://github.com/Xilinx/inference-server.git
$ cd inference-server
$ ./proteus dockerize

First, we clone the repository and build the Docker image used to run the server. The resulting image contains all the dependencies needed to build, test, and run the Inference Server. Using containers makes it easy to run the server and deploy it onto clusters.

$ ./proteus run --dev

Once the image is built, we can start a container from it with this command. The command starts the container, mounts our local directory into it for development, passes through any FPGAs on the host, and drops us into a terminal inside the container. The rest of these instructions are run inside the container.

$ proteus build --all

In the container, we can build the server executable. Once it is built, we're ready to use it for inference. One easy way to do this is with a Python script, which we break down next.

import proteus

To simplify interacting with the server from Python, we provide a Python library that we can import into our script.

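# Create a server handle and a REST client pointed at the server's address
server = proteus.Server()
client = proteus.RestClient("127.0.0.1:8998")
# Start the server, then wait until it reports that it is live
server.start()
client.wait_until_live()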

Next, we create our server and client. We point the client to the address where the server is running (by default, the server runs on localhost at port 8998). Then, we start the server and have the client wait until the server is live.
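`client.wait_until_live()` hides how liveness is checked. If you want to probe the server yourself, for example from a script that does not import `proteus`, a minimal standard-library sketch might look like the following. It assumes the server exposes a KServe v2-style liveness route, `GET /v2/health/live`, on the default address `127.0.0.1:8998` and answers with HTTP 200 when it is up. Both that route and the `is_server_live` helper are assumptions for illustration, so confirm the endpoint in the Inference Server documentation before relying on it.

```python
import urllib.error
import urllib.request


def is_server_live(address: str = "127.0.0.1:8998", timeout: float = 1.0) -> bool:
    """Return True if the server answers the (assumed) liveness route with HTTP 200.

    ASSUMPTION: the route /v2/health/live follows the KServe v2 REST protocol;
    check the Inference Server docs for the actual endpoint.
    """
    url = f"http://{address}/v2/health/live"
    # Talk to the server directly, ignoring any HTTP proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx reply: treat as not live.
        return False
```

Unlike the Python client's wait, this check makes a single request and returns immediately, so you can combine it with whatever retry policy suits your deployment.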

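# path_to_xmodel: path to the ResNet50 XModel downloaded from the Model Zoo
parameters = {"xmodel": path_to_xmodel}
response = client.load("Xmodel", parameters)
# The response carries the endpoint to use for this worker
worker_name = response.html
# Wait until the worker reports that it is ready
while not client.model_ready(worker_name):
    pass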

Since we want to run the ResNet50 XModel, we load the XModel worker and pass it the path to the XModel we downloaded from the Vitis AI Model Zoo. The server responds with an endpoint that we use for subsequent interactions with this worker. We then wait until the worker is ready.
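The `while not client.model_ready(worker_name): pass` loop keeps a CPU core busy and never gives up if the worker fails to load. One alternative is a small polling helper that sleeps between checks and stops after a timeout. The sketch below is a generic, hypothetical helper (`wait_until` is not part of the `proteus` library); it only assumes that `client.model_ready(worker_name)` returns a truthy value once the worker is ready, as in the original loop.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.1) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true, False on timeout. Sleeping
    between polls avoids spinning a CPU core like `while ...: pass` does.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

With the client from the example, usage would be along the lines of `if not wait_until(lambda: client.model_ready(worker_name), timeout=60): raise TimeoutError(worker_name)`. A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the timeout.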

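# Build a batch by repeating one image path batch_size times
images = []
for _ in range(batch_size):
    images.append(path_to_image)
# preprocess() is your own custom logic, not part of the proteus library
images = preprocess(images)
request = proteus.ImageInferenceRequest(images, True)
response = client.infer(worker_name, request)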

Now, we’re ready to make an inference. We can prepare a batch of images to send to the server and preprocess them in Python using custom logic. Finally, we can prepare the request using the preprocessed images and send it to the server for inference. The response can then be parsed, postprocessed, and evaluated.
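The article leaves `preprocess` and the postprocessing step to the user. As a hypothetical sketch of what they might contain for an image classifier, the functions below center-crop and mean-subtract a batch of images, then turn a matrix of class scores into top-k predictions. Several assumptions apply: images are already decoded as HxWx3 BGR `uint8` NumPy arrays at least 224x224 (a real pipeline usually resizes first), the per-channel means are the common Caffe-style ResNet50 values rather than values confirmed for the Model Zoo XModel, and you have already extracted an `(N, num_classes)` score array from the inference response, which is not shown here. `preprocess_batch` and `top_k` are illustrative names only.

```python
from typing import List, Tuple

import numpy as np

# ASSUMPTION: Caffe-style BGR channel means; check the Model Zoo entry for
# the preprocessing your specific XModel expects.
MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)
CROP = 224


def preprocess_batch(images: List[np.ndarray]) -> np.ndarray:
    """Center-crop each HxWx3 image to 224x224, subtract the mean, and stack."""
    batch = []
    for image in images:
        h, w = image.shape[:2]
        if h < CROP or w < CROP:
            raise ValueError(f"image {h}x{w} is smaller than {CROP}x{CROP}")
        top, left = (h - CROP) // 2, (w - CROP) // 2
        crop = image[top:top + CROP, left:left + CROP].astype(np.float32)
        batch.append(crop - MEAN_BGR)
    return np.stack(batch)  # shape: (N, 224, 224, 3)


def top_k(logits: np.ndarray, k: int = 5) -> List[List[Tuple[int, float]]]:
    """Softmax each row of an (N, num_classes) array; return k best (index, prob)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    order = np.argsort(-probs, axis=1)[:, :k]
    return [[(int(i), float(row[i])) for i in idx] for row, idx in zip(probs, order)]
```

Subtracting the row maximum before exponentiating does not change the softmax result but prevents overflow for large scores.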

Next Steps

The example above shows the basic method of interacting with the AMD Xilinx Inference Server. Check out the documentation to learn more about automatic batching, the C++ API, deploying on a cluster, user-defined parallelism, and running end-to-end inferences. Stay tuned to the AMD Xilinx Inference Server repository for future updates!

About Bingqing Guo

Bingqing Guo is SW & AI Product Marketing Manager at CPG AMD. Bingqing has worked in marketing AI acceleration solutions for years. Her understanding of the market and effective promotion strategies have helped more users adopt AMD Vitis AI in their product development and recognize the performance improvements that Vitis AI brings.
