{"id":22821,"date":"2025-04-19T06:31:30","date_gmt":"2025-04-18T22:31:30","guid":{"rendered":"https:\/\/www.wsisp.com\/helps\/22821.html"},"modified":"2025-04-19T06:31:30","modified_gmt":"2025-04-18T22:31:30","slug":"%e5%9c%a8-amd-gpu-%e4%b8%8a%e4%bd%bf%e7%94%a8-vllm-%e7%9a%84-triton-%e6%8e%a8%e7%90%86%e6%9c%8d%e5%8a%a1%e5%99%a8","status":"publish","type":"post","link":"https:\/\/www.wsisp.com\/helps\/22821.html","title":{"rendered":"Triton Inference Server with vLLM on AMD GPUs"},"content":{"rendered":"<p>Triton Inference Server with vLLM on AMD GPUs - ROCm Blogs<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" alt=\"\" height=\"580\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/04\/20250418223128-6802d2c09d49e.png\" width=\"580\" \/><\/p>\n<p>January 8, 2025, by Fabricio Flores, Tiffany Mintz, Eliot Li, Yao Liu, Ted Themistokleous, Brian Pickrell, Vish Vadlamani<\/p>\n<p>Triton Inference Server is an open-source platform designed to streamline AI inference. It deploys, scales, and serves trained AI models from a range of machine learning and deep learning frameworks, including TensorFlow, PyTorch, and vLLM, which makes it suitable for a wide variety of AI workloads. It is designed to run in multiple environments, including the cloud, data centers, and edge devices.<\/p>\n<p>Some of the features of Triton Inference Server include:<\/p>\n<ul>\n<li>\n<p>Framework flexibility: It allows models from different frameworks to be deployed (see the Triton 
Inference Server backends), regardless of the underlying infrastructure. This flexibility makes it possible to run multiple models, or multiple instances of one model, on the same hardware, which improves resource utilization.<\/p>\n<\/li>\n<li>\n<p>Hardware and deployment versatility: It is optimized for both GPU and CPU environments, so it can be deployed on a wide range of hardware. Triton Inference Server can be used in the cloud, in data centers, or on edge devices.<\/p>\n<\/li>\n<li>\n<p>Performance optimization: It improves inference performance through dynamic batching, which aggregates smaller inference requests into larger batches, and through concurrent model execution. Concurrent execution allows multiple models to run at the same time, which is essential for real-time applications that need minimal latency.<\/p>\n<\/li>\n<\/ul>\n<p>In this blog, we show you step by step how to set up Triton Inference Server with the vLLM backend on AMD GPUs using ROCm. We begin with a brief overview of the key aspects of using vLLM as a Triton Inference Server backend. We then provide a detailed walkthrough of setting up Triton Inference Server with the vLLM backend and test inference on three 
LLMs (microsoft\/phi-2, mistral-7b-instruct, and meta-llama\/Meta-Llama-3-8B-Instruct).<\/p>\n<h3>Requirements<\/h3>\n<ul>\n<li>\n<p>AMD GPU: See the ROCm documentation page for supported hardware and operating systems. This blog was tested on a machine with 8 AMD Instinct MI210 GPUs.<\/p>\n<\/li>\n<li>\n<p>ROCm 6.1&#043;: See the ROCm installation guide for Linux for installation instructions.<\/p>\n<\/li>\n<li>\n<p>Docker: See Install Docker Engine on Ubuntu for installation instructions.<\/p>\n<\/li>\n<li>\n<p>Hugging Face access token: This blog requires a Hugging Face account and a generated user access token.<\/p>\n<\/li>\n<li>\n<p>Access to the mistral and Llama-3 models on Hugging Face. 
These are gated models on Hugging Face. To request access, see mistralai\/Mistral-7B-Instruct-v0.2 and meta-llama\/Meta-Llama-3-8B-Instruct.<\/p>\n<\/li>\n<\/ul>\n<p>You can find the files related to this blog in this GitHub folder.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" alt=\"\" height=\"580\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/04\/20250418223128-6802d2c0cb4dd.png\" width=\"580\" \/><\/p>\n<h3>Triton Inference Server: vLLM backend<\/h3>\n<p>A Triton Inference Server backend is the component that executes an AI model during inference. A backend is a wrapper around a specific machine learning framework such as PyTorch, TensorFlow, vLLM, or others. Each backend is implemented as a shared library, and a model is configured to use a particular backend. For example, a model that uses PyTorch is configured with the backend that interacts with the PyTorch library.<\/p>\n<p>The Triton Inference Server project provides a set of supported backends that are tested and updated with each release. For the list of supported backends, see Where can I find all the backends that are available for Triton? This blog focuses on the vLLM backend.<\/p>\n<p>Using vLLM 
as the backend enables inference serving for large language models (LLMs) with high throughput and low latency. vLLM is an engine optimized specifically for LLM inference, particularly in scenarios where continuous batching and memory efficiency are critical.<\/p>\n<p>The following are some key aspects of vLLM in Triton Inference Server:<\/p>\n<ul>\n<li>\n<p>vLLM integration: vLLM has been integrated into Triton Inference Server since release 23.10. The integration is available through a prebuilt container that includes the vLLM backend, or by building a custom container. It allows models such as Facebook's OPT family and the LLaMA models to be served through Triton Inference Server's flexible and scalable architecture.<\/p>\n<\/li>\n<li>\n<p>Configuration and deployment: Setting up vLLM as the backend requires configuring a model repository. The repository includes model.json and config.pbtxt files, which define model parameters such as memory utilization, batch size, and model-specific settings.<\/p>\n<\/li>\n<li>\n<p>Performance features: In Triton Inference Server, the vLLM 
backend supports asynchronous inference, which is essential for tasks such as large-scale text generation and processing. Features such as tensor parallelism and PagedAttention improve multi-GPU performance, making vLLM suitable for serving large models across distributed systems.<\/p>\n<\/li>\n<li>\n<p>Deployment options: Models that use the vLLM backend can be deployed on a variety of platforms, including cloud environments. Containerized deployment lets models scale horizontally according to performance needs and works with Kubernetes and other orchestration systems.<\/p>\n<\/li>\n<\/ul>\n<p>Using vLLM as the backend for Triton Inference Server provides a highly optimized serving engine tailored to the specific needs of LLMs, while also taking advantage of Triton Inference Server's infrastructure for scalable inference serving.<\/p>\n<h3 style=\"background-color:transparent\">Setting up Triton Inference Server with the vLLM backend<\/h3>\n<p>To run inference on large language models with Triton Inference Server and the vLLM backend, follow these steps:<\/p>\n<ul>\n<li>\n<p>Set up Triton Inference Server with the vLLM backend: We configure a docker compose file that includes a container for the vLLM-enabled Triton 
Inference Server. The docker compose file references a Docker image with Triton Inference Server preinstalled (the image can be built from source or pulled from a registry), defines GPU access, sets the model repository path, and exposes the necessary ports.<\/p>\n<\/li>\n<li>\n<p>Prepare the model repository: The model repository is a directory, or a set of directories, containing the models to be used for inference. Each model is organized in the repository with a specific structure, which Triton Inference Server scans and loads every time it starts.<\/p>\n<p>The structure of the model repository is:<\/p>\n<p> model_repository\/<br \/>\n    \u251c\u2500\u2500 &lt;model_name_1&gt;\/<br \/>\n    \u2502   \u251c\u2500\u2500 config.pbtxt  # Configuration file describing the model<br \/>\n    \u2502   \u251c\u2500\u2500 1\/  # Version directory (Triton Inference Server supports versioning)<br \/>\n    \u2502   \u2502   \u2514\u2500\u2500 model.onnx  # Actual model file (for example, ONNX, PyTorch, vLLM)<br \/>\n    \u2502   \u2514\u2500\u2500 2\/<br \/>\n    \u2502       \u2514\u2500\u2500 model.onnx<br \/>\n    \u251c\u2500\u2500 &lt;model_name_2&gt;\/<br \/>\n    \u2502   \u251c\u2500\u2500 config.pbtxt<br \/>\n    \u2502   \u251c\u2500\u2500 1\/<br \/>\n    \u2502   \u2502   \u2514\u2500\u2500 model.json<br \/>\n    \u2502   \u2514\u2500\u2500 2\/<br \/>\n    \u2502       \u2514\u2500\u2500 
model.json<\/p>\n<p>model_repository is the root directory, which contains one or more subdirectories, each representing a model. Each model lives in its own model directory (&lt;model_name&gt;), whose name corresponds to the name of the model. Inside the model directory, version directories (1\/, 2\/) allow multiple versions of the same model to coexist, so that Triton Inference Server can identify and serve the correct version. Each version directory contains the actual model files (model.onnx, model.json, and so on), which store the model's architecture and inference parameters. Finally, the configuration file (config.pbtxt) defines the names, shapes, and data types of the input and output tensors, along with other settings.<\/p>\n<\/li>\n<li>\n<p>Define the model configuration and model files: With the vLLM backend, the model configuration file must specify the backend type in addition to the data types and shapes. A simplified version of config.pbtxt is:<\/p>\n<p> backend: &#034;vllm&#034;<\/p>\n<p>input [<br \/>\n{<br \/>\n    name: &#034;text_input&#034;<br \/>\n    data_type: TYPE_STRING<br \/>\n    dims: [ 1 ]<br \/>\n}<br \/>\n]<\/p>\n<p>output [<br \/>\n{<br \/>\n    name: &#034;text_output&#034;<br \/>\n    data_type: TYPE_STRING<br \/>\n    dims: [ -1 ]<br \/>\n}<br 
\/>\n]<\/p>\n<p>The model file model.json, which specifies the model initialization and inference parameters, is:<\/p>\n<p> {<br \/>\n    &#034;model&#034;:&#034;meta-llama\/Meta-Llama-3-8B-Instruct&#034;,<br \/>\n    &#034;gpu_memory_utilization&#034;: 0.8,<br \/>\n    &#034;tensor_parallel_size&#034;: 2,<br \/>\n    &#034;trust_remote_code&#034;: true,<br \/>\n    &#034;disable_log_requests&#034;: true,<br \/>\n    &#034;enforce_eager&#034;: true,<br \/>\n    &#034;max_model_len&#034;: 2048<br \/>\n}<\/p>\n<p>Among these parameters, model specifies the name of the model, gpu_memory_utilization caps the fraction of GPU memory the model may use (0.8 means 80%), and tensor_parallel_size sets the number of GPUs the model uses for parallel processing. For more details on the parameters and configuration files, see the Triton Inference Server vLLM backend documentation: Start Triton Inference Server.<\/p>\n<\/li>\n<\/ul>\n<p>We created a Docker Compose configuration that automates the entire setup of Triton Inference Server with the vLLM backend. The setup builds the Docker image, configures AMD GPU access through the docker-compose.yaml file, and provides a model repository (.\/triton_server_vllm\/src\/model_repository) containing three different LLMs for testing. With this setup, you run the docker compose build and docker compose 
up commands to start Triton Inference Server without performing the preceding steps by hand.<\/p>\n<p>Let's start by building the Triton Inference Server Docker image from source. Clone the repository that contains the AMD ROCm version of Triton Inference Server:<\/p>\n<p>git clone https:\/\/github.com\/ROCm\/tritoninferenceserver-vllm.git<\/p>\n<p>Next, change to the tritoninferenceserver-vllm directory and run the build-vllm-docker.py Python script to build the Docker image:<\/p>\n<p>cd tritoninferenceserver-vllm<\/p>\n<p>python3 build-vllm-docker.py &#045;&#045;no-container-pull &#045;&#045;enable-logging &#045;&#045;enable-stats \\<br \/>\n  &#045;&#045;enable-tracing &#045;&#045;enable-rocm  &#045;&#045;endpoint&#061;grpc \\<br \/>\n  &#045;&#045;image gpu-base,rocm\/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2 \\<br \/>\n  &#045;&#045;endpoint&#061;http &#045;&#045;backend&#061;python &#045;&#045;backend&#061;vllm<\/p>\n<p>The newly built Docker image is named tritonserver. To verify that it exists, use the following command:<\/p>\n<p>docker images | grep tritonserver<\/p>\n<p>The output will be similar to:<\/p>\n<p>REPOSITORY                TAG            IMAGE ID       CREATED         SIZE<br \/>\ntritonserver              latest         fffefb8a8258   22 hours ago    62.8GB<\/p>\n<p>With the tritonserver Docker image built, return to the original directory and clone the repository for this blog:<\/p>\n<p>cd ..<br \/>\ngit clone https:\/\/github.com\/ROCm\/rocm-blogs.git<br \/>\ncd 
rocm-blogs\/blogs\/artificial-intelligence\/triton_server_vllm\/docker<\/p>\n<p>Then edit the environment file .\/triton_server_vllm\/docker\/.env and provide your Hugging Face token:<\/p>\n<p>HUGGING_FACE_HUB_TOKEN&#061;&lt;YOUR_HUGGING_FACE_ACCESS_TOKEN&gt;<\/p>\n<p>Next, run the following command to make the start_services.sh script executable:<\/p>\n<p>chmod &#043;x start_services.sh<\/p>\n<p>Finally, build and start the Docker container:<\/p>\n<p>docker compose build<br \/>\ndocker compose up<\/p>\n<p>Note: Starting the container and services takes some time, because the Mistral-7B-Instruct-v0.1, microsoft\/phi-2, and meta-llama\/Meta-Llama-3-8B-Instruct models are downloaded from the Hugging Face Hub before they are served.<\/p>\n<p>After you run the docker compose up command, the terminal displays output similar to the following:<\/p>\n<p>[&#043;] Running 2\/1<br \/>\n \u2714 Network docker_default                 Created  0.1s<br \/>\n \u2714 Container docker-triton_server_vllm-1  Created  0.0s<br \/>\nAttaching to triton_server_vllm-1<br \/>\n&#8230;<br \/>\ntriton_server_vllm-1  | [I 2024-08-27 15:33:39.976 ServerApp] Jupyter Server 2.14.2 is running at:<br \/>\ntriton_server_vllm-1  | [I 2024-08-27 15:33:39.976 ServerApp] http:\/\/3dd761dca9b9:8888\/lab<br \/>\ntriton_server_vllm-1  | [I 2024-08-27 15:33:39.976 ServerApp]     http:\/\/127.0.0.1:8888\/lab<br \/>\ntriton_server_vllm-1  | [I 2024-08-27 15:33:39.976 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).<br \/>\n&#8230;<\/p>\n<p>triton_server_vllm-1  | INFO 08-27 22:22:12 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: 
model&#061;&#039;mistralai\/Mistral-7B-Instruct-v0.1&#039;, tokenizer&#061;&#039;mistralai\/Mistral-7B-Instruct-v0.1&#039;, tokenizer_mode&#061;auto, revision&#061;None, tokenizer_revision&#061;None, trust_remote_code&#061;True, dtype&#061;torch.bfloat16, max_seq_len&#061;32768, download_dir&#061;None, load_format&#061;auto, tensor_parallel_size&#061;2, disable_custom_all_reduce&#061;True, quantization&#061;None, enforce_eager&#061;True, kv_cache_dtype&#061;auto, device_config&#061;cuda, seed&#061;0)<\/p>\n<p>triton_server_vllm-1  | INFO 08-27 22:22:13 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model&#061;&#039;microsoft\/phi-2&#039;, tokenizer&#061;&#039;microsoft\/phi-2&#039;, tokenizer_mode&#061;auto, revision&#061;None, tokenizer_revision&#061;None, trust_remote_code&#061;True, dtype&#061;torch.float16, max_seq_len&#061;2048, download_dir&#061;None, load_format&#061;auto, tensor_parallel_size&#061;2, disable_custom_all_reduce&#061;True, quantization&#061;None, enforce_eager&#061;True, kv_cache_dtype&#061;auto, device_config&#061;cuda, seed&#061;0)<\/p>\n<p>triton_server_vllm-1  | INFO 08-27 22:22:13 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model&#061;&#039;meta-llama\/Meta-Llama-3-8B-Instruct&#039;, tokenizer&#061;&#039;meta-llama\/Meta-Llama-3-8B-Instruct&#039;, tokenizer_mode&#061;auto, revision&#061;None, tokenizer_revision&#061;None, trust_remote_code&#061;True, dtype&#061;torch.bfloat16, max_seq_len&#061;8192, download_dir&#061;None, load_format&#061;auto, tensor_parallel_size&#061;2, disable_custom_all_reduce&#061;True, quantization&#061;None, enforce_eager&#061;True, kv_cache_dtype&#061;auto, device_config&#061;cuda, seed&#061;0)<\/p>\n<p>When Triton Inference Server is ready, the console displays the following:<\/p>\n<p>triton_server_vllm-1  | I0827 22:27:53.490967 15 grpc_server.cc:2513] Started GRPCInferenceService at 0.0.0.0:8001<br 
\/>\ntriton_server_vllm-1  | I0827 22:27:53.491185 15 http_server.cc:4497] Started HTTPService at 0.0.0.0:8000<\/p>\n<p>The console output shows that:<\/p>\n<ul>\n<li>\n<p>The Jupyter server is running at http:\/\/127.0.0.1:8888\/lab<\/p>\n<\/li>\n<li>\n<p>The mistralai\/Mistral-7B-Instruct-v0.1 model is being initialized. Once it is ready, requests can be sent to http:\/\/localhost:8000\/v2\/models\/mistral-7b-instruct\/generate<\/p>\n<\/li>\n<li>\n<p>The microsoft\/phi-2 model is being initialized. Once it is ready, requests can be sent to http:\/\/localhost:8000\/v2\/models\/phi2\/generate<\/p>\n<\/li>\n<li>\n<p>The meta-llama\/Meta-Llama-3-8B-Instruct model is being initialized. Once it is ready, requests can be sent to http:\/\/localhost:8000\/v2\/models\/llama3-8b-instruct\/generate<\/p>\n<\/li>\n<\/ul>\n<p>With the models ready for inference, we can run some tests.<\/p>\n<h3 style=\"background-color:transparent\">Understanding the model repository structure and configuration<\/h3>\n<p>The docker-compose.yaml file contains the configuration needed to create a Docker container that serves the phi-2, Mistral-7B-Instruct-v0.1, and Meta-Llama-3-8B-Instruct models. The specific configuration for serving and running inference with each model is located in the .\/triton_server_vllm\/src\/model_repository folder. In our case, the model_repository folder has the following structure:<\/p>\n<p>model_repository\/<br \/>\n    \u251c\u2500\u2500 
llama3-8b-instruct\/<br \/>\n    \u2502   \u251c\u2500\u2500 config.pbtxt    # Configuration file describing the model<br \/>\n    \u2502   \u251c\u2500\u2500 1\/              # Version directory<br \/>\n    \u2502       \u2514\u2500\u2500 model.json  # Actual model file<br \/>\n    \u251c\u2500\u2500 mistral-7b-instruct\/<br \/>\n    \u2502   \u251c\u2500\u2500 config.pbtxt<br \/>\n    \u2502   \u251c\u2500\u2500 1\/<br \/>\n    \u2502       \u2514\u2500\u2500 model.json<br \/>\n    \u251c\u2500\u2500 phi2\/<br \/>\n    \u2502   \u251c\u2500\u2500 config.pbtxt<br \/>\n    \u2502   \u251c\u2500\u2500 1\/<br \/>\n    \u2502       \u2514\u2500\u2500 model.json<\/p>\n<p>Each model has its own model.json file with its own configuration. For the llama3-8b-instruct model, the model.json file is:<\/p>\n<p>{<br \/>\n    &#034;model&#034;:&#034;meta-llama\/Meta-Llama-3-8B-Instruct&#034;,<br \/>\n    &#034;gpu_memory_utilization&#034;: 0.8,<br \/>\n    &#034;tensor_parallel_size&#034;: 2,<br \/>\n    &#034;trust_remote_code&#034;: true,<br \/>\n    &#034;disable_log_requests&#034;: true,<br \/>\n    &#034;enforce_eager&#034;: true,<br \/>\n    &#034;max_model_len&#034;: 2048<br \/>\n}<\/p>\n<p>For the Mistral-7B-Instruct-v0.1 model, the model.json file is:<\/p>\n<p>{<br \/>\n    &#034;model&#034;:&#034;mistralai\/Mistral-7B-Instruct-v0.1&#034;,<br \/>\n    &#034;gpu_memory_utilization&#034;: 0.8,<br \/>\n    &#034;tensor_parallel_size&#034;: 2,<br \/>\n    &#034;trust_remote_code&#034;: true,<br \/>\n    &#034;disable_log_requests&#034;: true,<br \/>\n    &#034;enforce_eager&#034;: true,<br \/>\n    &#034;max_model_len&#034;: 2048<br \/>\n}<\/p>\n<p>For the microsoft\/phi-2 model, the model.json file is:<\/p>\n<p>{<br \/>\n    
&#034;model&#034;:&#034;microsoft\/phi-2&#034;,<br \/>\n    &#034;gpu_memory_utilization&#034;: 0.8,<br \/>\n    &#034;tensor_parallel_size&#034;: 1,<br \/>\n    &#034;trust_remote_code&#034;: true,<br \/>\n    &#034;disable_log_requests&#034;: true,<br \/>\n    &#034;enforce_eager&#034;: true,<br \/>\n    &#034;max_model_len&#034;: 2048<br \/>\n}<\/p>\n<p>The tensor_parallel_size value in each model.json file specifies the number of GPUs used for that model's parallel computation. Because we want to run all three models at the same time on a machine with 8 AMD Instinct MI210 GPUs, Meta-Llama-3-8B-Instruct uses 2 of the 8 GPUs, Mistral-7B-Instruct-v0.1 uses 2 of the remaining 6, and phi-2 uses 1 of the remaining 4, for a total of 5 GPUs. If a model needs more GPUs, the tensor_parallel_size value of one or more models must be adjusted so that the total fits within the number of available GPUs.<\/p>\n<p>For more information about the parameters and configuration files, see the Triton Inference Server vLLM backend documentation: Start Triton Inference Server.<\/p>\n<h3>Inference with phi-2, Mistral-7B-Instruct-v0.1, and Meta-Llama-3-8B-Instruct<\/h3>\n<p>With Jupyter Lab and Triton Inference Server running, navigate to http:\/\/127.0.0.1:8888\/lab\/tree\/src\/triton_server_vllm.ipynb to run inference with these models.<\/p>\n<p>Let's start by testing microsoft\/phi-2:<\/p>\n<p>import json<br \/>\nimport requests<\/p>\n<p># 
Define the endpoint URL<br \/>\nurl &#061; &#034;http:\/\/localhost:8000\/v2\/models\/phi2\/generate&#034;<\/p>\n<p># Define the payload<br \/>\npayload &#061; {<br \/>\n    &#034;text_input&#034;: &#034;What is triton inference server?&#034;,<br \/>\n    &#034;parameters&#034;: {<br \/>\n        &#034;stream&#034;: False,<br \/>\n        &#034;temperature&#034;: 0,<br \/>\n        &#034;max_tokens&#034;: 100<br \/>\n    }<br \/>\n}<\/p>\n<p># Set the request headers (optional)<br \/>\nheaders &#061; {<br \/>\n    &#034;Content-Type&#034;: &#034;application\/json&#034;<br \/>\n}<\/p>\n<p># Send the POST request<br \/>\nresponse &#061; requests.post(url, data&#061;json.dumps(payload), headers&#061;headers)<\/p>\n<p># Print the response<br \/>\nprint(response.status_code)<br \/>\nprint(response.json())<\/p>\n<p>We send Triton Inference Server a payload containing the prompt &#034;What is triton inference server?&#034;. The output contains the response status 200 and a JSON object:<\/p>\n<p>200<br \/>\n{&#039;model_name&#039;: &#039;phi2&#039;, &#039;model_version&#039;: &#039;1&#039;, &#039;text_output&#039;: &#039;What is triton inference server?\\\\n\\\\nTriton inference server is a software that helps to run machine learning models on a computer. It is like a helper that makes sure the models work correctly and gives us the results we need.\\\\n\\\\nWhat is the purpose of triton inference server?\\\\n\\\\nThe purpose of triton inference server is to help us use machine learning models in our daily lives. 
It makes it easier for us to use these models and get the results we need.\\\\n\\\\nHow does triton inference server&#039;}<\/p>\n<p>For Mistral-7B-Instruct-v0.1, the code is:<\/p>\n<p># Define the endpoint URL<br \/>\nurl &#061; &#034;http:\/\/localhost:8000\/v2\/models\/mistral-7b-instruct\/generate&#034;<\/p>\n<p># Define the payload<br \/>\npayload &#061; {<br \/>\n    &#034;text_input&#034;: &#034;What is triton inference server?&#034;,<br \/>\n    &#034;parameters&#034;: {<br \/>\n        &#034;stream&#034;: False,<br \/>\n        &#034;temperature&#034;: 0,<br \/>\n        &#034;max_tokens&#034;: 100<br \/>\n    }<br \/>\n}<\/p>\n<p># Set the request headers (optional)<br \/>\nheaders &#061; {<br \/>\n    &#034;Content-Type&#034;: &#034;application\/json&#034;<br \/>\n}<\/p>\n<p># Send the POST request<br \/>\nresponse &#061; requests.post(url, data&#061;json.dumps(payload), headers&#061;headers)<\/p>\n<p># Print the response<br \/>\nprint(response.status_code)<br \/>\nprint(response.json())<\/p>\n<p>The output is:<\/p>\n<p>200<br \/>\n{&#039;model_name&#039;: &#039;mistral-7b-instruct&#039;, &#039;model_version&#039;: &#039;1&#039;, &#039;text_output&#039;: &#039;What is triton inference server?\\\\n\\\\nTriton Inference Server is an open-source, high-performance, and scalable inference engine for deep learning models. 
It supports a wide range of deep learning frameworks, including TensorFlow, PyTorch, and MXNet, and can be used to deploy deep learning models in various environments, such as edge devices, cloud services, and on-premises data centers.\\\\n\\\\nTriton Inference Server provides a unified API for accessing&#039;}<\/p>\n<p>Finally, for meta-llama\/Meta-Llama-3-8B-Instruct, the code is:<\/p>\n<p># Define the endpoint URL<br \/>\nurl &#061; &#034;http:\/\/localhost:8000\/v2\/models\/llama3-8b-instruct\/generate&#034;<\/p>\n<p># Define the payload<br \/>\npayload &#061; {<br \/>\n    &#034;text_input&#034;: &#034;What is triton inference server?&#034;,<br \/>\n    &#034;parameters&#034;: {<br \/>\n        &#034;stream&#034;: False,<br \/>\n        &#034;temperature&#034;: 0,<br \/>\n        &#034;max_tokens&#034;: 100<br \/>\n    }<br \/>\n}<\/p>\n<p># Set the request headers (optional)<br \/>\nheaders &#061; {<br \/>\n    &#034;Content-Type&#034;: &#034;application\/json&#034;<br \/>\n}<\/p>\n<p># Send the POST request<br \/>\nresponse &#061; requests.post(url, data&#061;json.dumps(payload), headers&#061;headers)<\/p>\n<p># Print the response<br \/>\nprint(response.status_code)<br \/>\nprint(response.json())<\/p>\n<p>The response to the POST request is:<\/p>\n<p>200<br \/>\n{&#039;model_name&#039;: &#039;llama3-8b-instruct&#039;, &#039;model_version&#039;: &#039;1&#039;, &#039;text_output&#039;: &#039;What is triton inference server?\u00b6\\\\n\\\\nTriton Inference Server is an open-source, high-performance, scalable, and extensible deep learning inference server developed by NVIDIA. 
It is designed to serve as a production-ready inference engine for deep learning models, allowing developers to deploy and manage their models in a scalable and efficient manner.\\\\n\\\\nTriton Inference Server provides a number of features that make it an attractive choice for deploying deep learning models in production environments, including:\\\\n\\\\n1. **Model serving**: Triton Inference Server can&#039;}<\/p>\n<p>Deploying all three models (&#096;microsoft\/phi-2&#096;, &#096;Mistral-7B-Instruct-v0.1&#096;, and &#096;meta-llama\/Meta-Llama-3-8B-Instruct&#096;) at once lets us serve multiple LLMs at the same time. Triton Inference Server with the vLLM backend manages the required resources and optimizes memory utilization so that the models can run concurrently.<\/p>\n<h3>Summary<\/h3>\n<p>In this blog, we showed how to deploy and serve three LLMs using Triton Inference Server with the vLLM backend, all powered by AMD GPUs and the ROCm software platform. We provided a step-by-step guide to handling multiple LLMs efficiently with Triton Inference Server, demonstrating the performance and reliability of AMD hardware in demanding AI applications.<\/p>\n<\/p>\n<\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u6587\u7ae0\u6d4f\u89c8\u9605\u8bfb1.9k\u6b21\uff0c\u70b9\u8d5e16\u6b21\uff0c\u6536\u85cf14\u6b21\u3002\u5728\u8fd9\u7bc7\u535a\u5ba2\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u4e0evLLM\u540e\u7aef\u90e8\u7f72\u548c\u670d\u52a1\u4e09\u4e2aLLM\u3002\u8fd9\u4e9b\u90fd\u7531AMD 
GPU\u548cROCm\u8f6f\u4ef6\u5e73\u53f0\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u6b65\u6b65\u6307\u5357\uff0c\u4ecb\u7ecd\u5982\u4f55\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u9ad8\u6548\u5904\u7406\u591a\u4e2aLLM\uff0c\u5c55\u793a\u4e86AMD\u786c\u4ef6\u5728\u9ad8\u9700\u6c42AI\u5e94\u7528\u4e2d\u7684\u5f3a\u5927\u6027\u80fd\u548c\u53ef\u9760\u6027\u3002_vllm triton<\/p>\n","protected":false},"author":2,"featured_media":22819,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[1753,50],"topic":[],"class_list":["post-22821","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-server","tag-rocm","tag-50"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.wsisp.com\/helps\/22821.html\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"og:description\" content=\"\u6587\u7ae0\u6d4f\u89c8\u9605\u8bfb1.9k\u6b21\uff0c\u70b9\u8d5e16\u6b21\uff0c\u6536\u85cf14\u6b21\u3002\u5728\u8fd9\u7bc7\u535a\u5ba2\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u4e0evLLM\u540e\u7aef\u90e8\u7f72\u548c\u670d\u52a1\u4e09\u4e2aLLM\u3002\u8fd9\u4e9b\u90fd\u7531AMD 
GPU\u548cROCm\u8f6f\u4ef6\u5e73\u53f0\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u6b65\u6b65\u6307\u5357\uff0c\u4ecb\u7ecd\u5982\u4f55\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u9ad8\u6548\u5904\u7406\u591a\u4e2aLLM\uff0c\u5c55\u793a\u4e86AMD\u786c\u4ef6\u5728\u9ad8\u9700\u6c42AI\u5e94\u7528\u4e2d\u7684\u5f3a\u5927\u6027\u80fd\u548c\u53ef\u9760\u6027\u3002_vllm triton\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.wsisp.com\/helps\/22821.html\" \/>\n<meta property=\"og:site_name\" content=\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"article:published_time\" content=\"2025-04-18T22:31:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/04\/20250418223128-6802d2c09d49e.png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/22821.html\",\"url\":\"https:\/\/www.wsisp.com\/helps\/22821.html\",\"name\":\"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - 
\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"isPartOf\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\"},\"datePublished\":\"2025-04-18T22:31:30+00:00\",\"dateModified\":\"2025-04-18T22:31:30+00:00\",\"author\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/22821.html#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.wsisp.com\/helps\/22821.html\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/22821.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/www.wsisp.com\/helps\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\",\"url\":\"https:\/\/www.wsisp.com\/helps\/\",\"name\":\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"description\":\"\u9999\u6e2f\u670d\u52a1\u5668_\u9999\u6e2f\u4e91\u670d\u52a1\u5668\u8d44\u8baf_\u670d\u52a1\u5668\u5e2e\u52a9\u6587\u6863_\u670d\u52a1\u5668\u6559\u7a0b\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"contentUrl\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"caption\":\"admin\"},\"sameAs\":[\"http:\/\/wp.wsisp.com\"],\"url\":\"https:\/\/www.wsisp.com\/helps\/author\/admin\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.wsisp.com\/helps\/22821.html","og_locale":"zh_CN","og_type":"article","og_title":"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","og_description":"\u6587\u7ae0\u6d4f\u89c8\u9605\u8bfb1.9k\u6b21\uff0c\u70b9\u8d5e16\u6b21\uff0c\u6536\u85cf14\u6b21\u3002\u5728\u8fd9\u7bc7\u535a\u5ba2\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u4e0evLLM\u540e\u7aef\u90e8\u7f72\u548c\u670d\u52a1\u4e09\u4e2aLLM\u3002\u8fd9\u4e9b\u90fd\u7531AMD GPU\u548cROCm\u8f6f\u4ef6\u5e73\u53f0\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u6b65\u6b65\u6307\u5357\uff0c\u4ecb\u7ecd\u5982\u4f55\u7528Triton\u63a8\u7406\u670d\u52a1\u5668\u9ad8\u6548\u5904\u7406\u591a\u4e2aLLM\uff0c\u5c55\u793a\u4e86AMD\u786c\u4ef6\u5728\u9ad8\u9700\u6c42AI\u5e94\u7528\u4e2d\u7684\u5f3a\u5927\u6027\u80fd\u548c\u53ef\u9760\u6027\u3002_vllm 
triton","og_url":"https:\/\/www.wsisp.com\/helps\/22821.html","og_site_name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","article_published_time":"2025-04-18T22:31:30+00:00","og_image":[{"url":"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/04\/20250418223128-6802d2c09d49e.png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"8 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.wsisp.com\/helps\/22821.html","url":"https:\/\/www.wsisp.com\/helps\/22821.html","name":"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton \u63a8\u7406\u670d\u52a1\u5668 - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","isPartOf":{"@id":"https:\/\/www.wsisp.com\/helps\/#website"},"datePublished":"2025-04-18T22:31:30+00:00","dateModified":"2025-04-18T22:31:30+00:00","author":{"@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41"},"breadcrumb":{"@id":"https:\/\/www.wsisp.com\/helps\/22821.html#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.wsisp.com\/helps\/22821.html"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.wsisp.com\/helps\/22821.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.wsisp.com\/helps"},{"@type":"ListItem","position":2,"name":"\u5728 AMD GPU \u4e0a\u4f7f\u7528 vLLM \u7684 Triton 
\u63a8\u7406\u670d\u52a1\u5668"}]},{"@type":"WebSite","@id":"https:\/\/www.wsisp.com\/helps\/#website","url":"https:\/\/www.wsisp.com\/helps\/","name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","description":"\u9999\u6e2f\u670d\u52a1\u5668_\u9999\u6e2f\u4e91\u670d\u52a1\u5668\u8d44\u8baf_\u670d\u52a1\u5668\u5e2e\u52a9\u6587\u6863_\u670d\u52a1\u5668\u6559\u7a0b","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41","name":"admin","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/","url":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","contentUrl":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","caption":"admin"},"sameAs":["http:\/\/wp.wsisp.com"],"url":"https:\/\/www.wsisp.com\/helps\/author\/admin"}]}},"_links":{"self":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/22821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/comments?post=22821"}],"version-history":[{"count":0,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/22821\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media\/22819"}],"wp:attachment":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media?parent=22821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wsisp.com\/
helps\/wp-json\/wp\/v2\/categories?post=22821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/tags?post=22821"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/topic?post=22821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}