{"id":48690,"date":"2025-07-30T13:03:01","date_gmt":"2025-07-30T05:03:01","guid":{"rendered":"https:\/\/www.wsisp.com\/helps\/48690.html"},"modified":"2025-07-30T13:03:01","modified_gmt":"2025-07-30T05:03:01","slug":"%e4%b8%80%e6%ad%a5%e5%88%b0%e4%bd%8d%ef%bc%81%e7%94%a8-modal-%e5%bf%ab%e9%80%9f%e9%83%a8%e7%bd%b2-vllm-%e6%8e%a8%e7%90%86%e6%9c%8d%e5%8a%a1%e5%99%a8%e5%85%a8%e6%b5%81%e7%a8%8b%e5%ae%9e%e6%88%98%e6%8c%87%e5%8d%97","status":"publish","type":"post","link":"https:\/\/www.wsisp.com\/helps\/48690.html","title":{"rendered":"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal"},"content":{"rendered":"<p>I first came across Modal while taking part in a Hugging Face hackathon, and I was genuinely impressed by how easy it is to use. The platform lets you build and deploy applications within minutes, with a smooth and efficient experience similar to BentoCloud. With Modal, you can configure the system environment of your Python application, including GPUs, the Docker image, and Python dependencies, and then deploy it to the cloud with a single command.<\/p>\n<p>In this tutorial, we will learn how to set up Modal, create a vLLM server, and deploy it securely to the cloud. We will also cover how to test your vLLM server with cURL and the OpenAI SDK.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" alt=\"\" height=\"720\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/07\/20250730050258-6889a7829d6f6.png\" width=\"1200\" \/><\/p>\n<hr \/>\n<h4>1. 
Set Up Modal<\/h4>\n<p>Modal is a serverless platform that lets you run arbitrary code remotely. With a single line, you can attach GPUs, serve functions as web endpoints, and deploy persistent scheduled jobs. It is ideal for beginners, data scientists, and users without a software engineering background who do not want to deal with cloud infrastructure.<\/p>\n<p>First, install the Modal Python client. This tool lets you build images, deploy applications, and manage cloud resources directly from your terminal.<\/p>\n<pre><code>pip install modal<\/code><\/pre>\n<p>Next, set up Modal on your local machine. Run the following command and follow the prompts to create an account and authenticate your device:<\/p>\n<pre><code>python -m modal setup<\/code><\/pre>\n<p>When the VLLM_API_KEY environment variable is set, vLLM serves a secured endpoint that only users with a valid API key can access. You can set up this authentication by adding the environment variable through a Modal Secret.<\/p>\n<p>Replace your_actual_api_key_here with your own API key:<\/p>\n<pre><code>modal secret create vllm-api VLLM_API_KEY=your_actual_api_key_here<\/code><\/pre>\n<p>This keeps your API key stored securely and exposes it only to the deployed apps that use the secret.<\/p>\n<hr \/>\n<h4>2. 
Create the vLLM Application with Modal<\/h4>\n<p>This section walks you through building a scalable vLLM inference server on Modal, with a custom Docker image, persistent storage, and GPU acceleration. We use the mistralai\/Magistral-Small-2506 model, which needs specific settings for its tokenizer and for tool-call parsing.<\/p>\n<p>Create a file named vllm_inference.py and add the code below. It does the following:<\/p>\n<ul>\n<li>Defines the vLLM image on top of Debian Slim with Python 3.12 and the required dependencies, and sets environment variables that speed up model downloads and inference.<\/li>\n<li>Creates two Modal Volumes, one for Hugging Face models and one for the vLLM cache, to avoid repeated downloads and speed up startup.<\/li>\n<li>Pins the model and its revision for reproducibility, and enables the vLLM V1 engine for better performance.<\/li>\n<li>Configures the Modal app, including GPU resources, scaling, timeouts, storage, and secrets. For stability, it caps the number of concurrent requests each replica handles.<\/li>\n<li>Creates a web server and launches the vLLM serve command with Python's subprocess module.<\/li>\n<\/ul>\n<pre><code>import modal\n\nvllm_image = (\n    modal.Image.debian_slim(python_version=&#034;3.12&#034;)\n    .pip_install(\n        &#034;vllm==0.9.1&#034;,\n        &#034;huggingface_hub[hf_transfer]==0.32.0&#034;,\n        &#034;flashinfer-python==0.2.6.post1&#034;,\n        extra_index_url=&#034;https:\/\/download.pytorch.org\/whl\/cu128&#034;,\n    )\n    .env(\n        {\n            &#034;HF_HUB_ENABLE_HF_TRANSFER&#034;: &#034;1&#034;,  # faster model downloads\n            &#034;NCCL_CUMEM_ENABLE&#034;: &#034;1&#034;,\n        }\n    )\n)\n\nMODEL_NAME = &#034;mistralai\/Magistral-Small-2506&#034;\nMODEL_REVISION = &#034;48c97929837c3189cb3cf74b1b5bc5824eef5fcc&#034;  # pinned commit for reproducibility\n\n# Persistent volumes so model weights and vLLM cache files survive container restarts\nhf_cache_vol = modal.Volume.from_name(&#034;huggingface-cache&#034;, create_if_missing=True)\nvllm_cache_vol = modal.Volume.from_name(&#034;vllm-cache&#034;, create_if_missing=True)\nvllm_image = vllm_image.env({&#034;VLLM_USE_V1&#034;: &#034;1&#034;})  # use the vLLM V1 engine\n\nFAST_BOOT = True  # True: skip CUDA graph capture for faster cold starts\n\napp = modal.App(&#034;magistral-small-vllm&#034;)\n\nN_GPU = 2\nMINUTES = 60  # seconds\nVLLM_PORT = 8000\n\n@app.function(\n    image=vllm_image,\n    gpu=f&#034;A100:{N_GPU}&#034;,\n    scaledown_window=15 * MINUTES,  # how long an idle replica stays up before scaling down\n    timeout=10 * MINUTES,           # maximum execution time of the function\n    volumes={\n        &#034;\/root\/.cache\/huggingface&#034;: hf_cache_vol,\n        &#034;\/root\/.cache\/vllm&#034;: vllm_cache_vol,\n    },\n    secrets=[modal.Secret.from_name(&#034;vllm-api&#034;)],  # injects VLLM_API_KEY\n)\n@modal.concurrent(  # max concurrent requests per replica; tune carefully\n    max_inputs=32\n)\n@modal.web_server(port=VLLM_PORT, startup_timeout=10 * MINUTES)\ndef serve():\n    import subprocess\n\n    cmd = [\n        &#034;vllm&#034;,\n        &#034;serve&#034;,\n        MODEL_NAME,\n        &#034;--tokenizer_mode&#034;,\n        &#034;mistral&#034;,\n        &#034;--config_format&#034;,\n        &#034;mistral&#034;,\n        &#034;--load_format&#034;,\n        &#034;mistral&#034;,\n        &#034;--tool-call-parser&#034;,\n        &#034;mistral&#034;,\n        &#034;--enable-auto-tool-choice&#034;,\n        &#034;--tensor-parallel-size&#034;,\n        str(N_GPU),  # shard the model across all attached GPUs\n        &#034;--revision&#034;,\n        MODEL_REVISION,\n        &#034;--served-model-name&#034;,\n        MODEL_NAME,\n        &#034;--host&#034;,\n        &#034;0.0.0.0&#034;,\n        &#034;--port&#034;,\n        str(VLLM_PORT),\n    ]\n\n    # --enforce-eager skips CUDA graph capture: faster startup, possibly slower inference\n    cmd += [&#034;--enforce-eager&#034; if FAST_BOOT else &#034;--no-enforce-eager&#034;]\n    print(cmd)\n    subprocess.Popen(&#034; &#034;.join(cmd), shell=True)<\/code><\/pre>\n<hr \/>\n<h4>3. 
Deploy the vLLM Server on Modal<\/h4>\n<p>With vllm_inference.py ready, you can deploy the vLLM server to Modal with a single command:<\/p>\n<pre><code>modal deploy vllm_inference.py<\/code><\/pre>\n<p>Modal builds the container image (if it has not been built yet) and deploys the app; once the image is cached, this takes only a few seconds. You should see output similar to the following:<\/p>\n<pre><code>\u2713 Created objects.\n\u251c\u2500\u2500 &#x1f528; Created mount C:\\Repository\\GitHub\\Deploying-the-Magistral-with-Modal\\vllm_inference.py\n\u2514\u2500\u2500 &#x1f528; Created web function serve =&gt; https:\/\/abidali899--magistral-small-vllm-serve.modal.run\n\u2713 App deployed in 6.671s! &#x1f389;\n\nView Deployment: https:\/\/modal.com\/apps\/abidali899\/main\/deployed\/magistral-small-vllm<\/code><\/pre>\n<p>After deployment, the server starts downloading the model weights and loading them onto the GPUs. This can take several minutes (around 5 minutes is typical for a large model), so give the model time to initialize.<\/p>\n<p>You can view the deployment and its logs in the Apps section of the Modal dashboard.<\/p>\n<hr \/>\n<h5>Deploying the Magistral vLLM server on Modal<\/h5>\n<p>Once the logs show that the server is up and ready, you can open the automatically generated API documentation page (served at the \/docs path of your endpoint URL) to browse all available endpoints and test them interactively in your browser.<\/p>\n<hr 
\/>\n<h5>Verify that the model is loaded and available<\/h5>\n<p>To confirm that the model is loaded and reachable, run the following cURL command in your terminal. Replace the URL with the one printed by modal deploy, and replace &lt;api-key&gt; with the actual API key you configured for the vLLM server:<\/p>\n<pre><code>curl -X &#039;GET&#039; \\\n  &#039;https:\/\/abidali899--magistral-small-vllm-serve.modal.run\/v1\/models&#039; \\\n  -H &#039;accept: application\/json&#039; \\\n  -H &#039;Authorization: Bearer &lt;api-key&gt;&#039;<\/code><\/pre>\n<p>If you get a response like the following, the model is ready for inference:<\/p>\n<pre><code>{&#034;object&#034;:&#034;list&#034;,&#034;data&#034;:[{&#034;id&#034;:&#034;mistralai\/Magistral-Small-2506&#034;,&#034;object&#034;:&#034;model&#034;,&#034;created&#034;:1750013321,&#034;owned_by&#034;:&#034;vllm&#034;,&#034;root&#034;:&#034;mistralai\/Magistral-Small-2506&#034;,&#034;parent&#034;:null,&#034;max_model_len&#034;:40960,&#034;permission&#034;:[{&#034;id&#034;:&#034;modelperm-33a33f8f600b4555b44cb42fca70b931&#034;,&#034;object&#034;:&#034;model_permission&#034;,&#034;created&#034;:1750013321,&#034;allow_create_engine&#034;:false,&#034;allow_sampling&#034;:true,&#034;allow_logprobs&#034;:true,&#034;allow_search_indices&#034;:false,&#034;allow_view&#034;:true,&#034;allow_fine_tuning&#034;:false,&#034;organization&#034;:&#034;*&#034;,&#034;group&#034;:null,&#034;is_blocking&#034;:false}]}]}<\/code><\/pre>\n<hr \/>\n<h4>4. 
Interact with the vLLM Server Using the OpenAI SDK<\/h4>\n<p>vLLM exposes OpenAI-compatible endpoints, so you can call your own vLLM service just as you would call the OpenAI API. Below, we use the OpenAI Python SDK to connect securely to the server and test it.<\/p>\n<p>First, create a .env file in your project directory and add the same vLLM API key you stored in the Modal secret:<\/p>\n<pre><code>VLLM_API_KEY=your-actual-api-key-here<\/code><\/pre>\n<p>Install the python-dotenv and openai libraries:<\/p>\n<pre><code>pip install python-dotenv openai<\/code><\/pre>\n<p>Create a client.py file that exercises several features of the vLLM server, including a simple chat completion and streaming responses. Set BASE_URL to the endpoint URL of your own deployment:<\/p>\n<pre><code>import asyncio\nimport os\n\nfrom dotenv import load_dotenv\nfrom openai import AsyncOpenAI, OpenAI\n\n# Load environment variables from the .env file\nload_dotenv()\n\n# Read the API key (must match the one stored in the Modal secret)\napi_key = os.getenv(&#034;VLLM_API_KEY&#034;)\n\n# Replace with your own endpoint URL (printed by modal deploy), followed by \/v1\nBASE_URL = &#034;https:\/\/abidali899--magistral-small-vllm-serve.modal.run\/v1&#034;\n\n# Point the OpenAI client at the vLLM server through a custom base_url\nclient = OpenAI(\n    api_key=api_key,\n    base_url=BASE_URL,\n)\n\nMODEL_NAME = &#034;mistralai\/Magistral-Small-2506&#034;\n\n# --- 1. Simple completion ---\ndef run_simple_completion():\n    print(&#034;\\n&#034; + &#034;=&#034; * 40)\n    print(&#034;[1] SIMPLE COMPLETION DEMO&#034;)\n    print(&#034;=&#034; * 40)\n    try:\n        messages = [\n            {&#034;role&#034;: &#034;system&#034;, &#034;content&#034;: &#034;You are a helpful assistant.&#034;},\n            {&#034;role&#034;: &#034;user&#034;, &#034;content&#034;: &#034;What is the capital of France?&#034;},\n        ]\n        response = client.chat.completions.create(\n            model=MODEL_NAME,\n            messages=messages,\n            max_tokens=32,\n        )\n        print(&#034;\\nResponse:\\n    &#034; + response.choices[0].message.content.strip())\n    except Exception as e:\n        print(f&#034;[ERROR] Simple completion failed: {e}&#034;)\n    print(&#034;\\n&#034; + &#034;=&#034; * 40 + &#034;\\n&#034;)\n\n# --- 2. Streaming response ---\ndef run_streaming():\n    print(&#034;\\n&#034; + &#034;=&#034; * 40)\n    print(&#034;[2] STREAMING DEMO&#034;)\n    print(&#034;=&#034; * 40)\n    try:\n        messages = [\n            {&#034;role&#034;: &#034;system&#034;, &#034;content&#034;: &#034;You are a helpful assistant.&#034;},\n            {&#034;role&#034;: &#034;user&#034;, &#034;content&#034;: &#034;Write a short poem about AI.&#034;},\n        ]\n        stream = client.chat.completions.create(\n            model=MODEL_NAME,\n            messages=messages,\n            max_tokens=64,\n            stream=True,\n        )\n        print(&#034;\\nStreaming response:&#034;)\n        print(&#034;    &#034;, end=&#034;&#034;)\n        for chunk in stream:\n            content = chunk.choices[0].delta.content\n            if content:\n                print(content, end=&#034;&#034;, flush=True)\n        print(&#034;\\n[END OF STREAM]&#034;)\n    except Exception as e:\n        print(f&#034;[ERROR] Streaming demo failed: {e}&#034;)\n    print(&#034;\\n&#034; + &#034;=&#034; * 40 + &#034;\\n&#034;)\n\n# --- 3. Async streaming response ---\nasync def run_async_streaming():\n    print(&#034;\\n&#034; + &#034;=&#034; * 40)\n    print(&#034;[3] ASYNC STREAMING DEMO&#034;)\n    print(&#034;=&#034; * 40)\n    try:\n        async_client = AsyncOpenAI(\n            api_key=api_key,\n            base_url=BASE_URL,\n        )\n        messages = [\n            {&#034;role&#034;: &#034;system&#034;, &#034;content&#034;: &#034;You are a helpful assistant.&#034;},\n            {&#034;role&#034;: &#034;user&#034;, &#034;content&#034;: &#034;Tell me a fun fact about space.&#034;},\n        ]\n        stream = await async_client.chat.completions.create(\n            model=MODEL_NAME,\n            messages=messages,\n            max_tokens=32,\n            stream=True,\n        )\n        print(&#034;\\nAsync streaming response:&#034;)\n        print(&#034;    &#034;, end=&#034;&#034;)\n        async for chunk in stream:\n            content = chunk.choices[0].delta.content\n            if content:\n                print(content, end=&#034;&#034;, flush=True)\n        print(&#034;\\n[END OF ASYNC STREAM]&#034;)\n    except Exception as e:\n        print(f&#034;[ERROR] Async streaming demo failed: {e}&#034;)\n    print(&#034;\\n&#034; + &#034;=&#034; * 40 + &#034;\\n&#034;)\n\nif __name__ == &#034;__main__&#034;:\n    run_simple_completion()\n    run_streaming()\n    asyncio.run(run_async_streaming())<\/code><\/pre>\n<hr \/>\n<p>Run the test script in your terminal:<\/p>\n<p>python 
client.py<\/p>\n<p>You should see output similar to the following, with responses arriving quickly and with low latency:<\/p>\n<pre><code>========================================\n[1] SIMPLE COMPLETION DEMO\n========================================\n\nResponse:\n    The capital of France is Paris. Is there anything else you&#039;d like to know about France?\n\n========================================\n[2] STREAMING DEMO\n========================================\n\nStreaming response:\n    In Silicon dreams, I&#039;m born, I learn,\nFrom data streams and human works.\nI grow, I calculate, I see,\nThe patterns that the humans leave.\n\nI write, I speak, I code, I play,\nWith logic sharp, and snappy pace.\nYet for all my smarts, this day\n[END OF STREAM]\n\n========================================\n[3] ASYNC STREAMING DEMO\n========================================\n\nAsync streaming response:\n    Sure, here&#039;s a fun fact about space: &#034;There&#039;s a planet that may be entirely made of diamond. Blast! In 2004,\n[END OF ASYNC STREAM]\n\n========================================<\/code><\/pre>\n<p>In the Modal dashboard, you can inspect every function call along with its timestamp, execution time, and status.<\/p>\n<hr \/>\n<p>If you run into problems running the code above, refer to the kingabzpro\/Deploying-the-Magistral-with-Modal GitHub repository and follow the troubleshooting guidance in its README.<\/p>\n<hr \/>\n<h4>Conclusion<\/h4>\n<p>Modal is a fascinating platform, and I keep learning more about it every day. It is a general-purpose platform that works for simple Python applications as well as for machine learning training and deployment. Beyond serving inference APIs, it can run training scripts remotely and fine-tune large models.<\/p>\n<p>Modal 
is designed with non-software-engineers in mind, so you do not have to think about the underlying infrastructure and can deploy applications quickly. There is no need to set up servers by hand, configure storage or networking, or wrestle with Kubernetes and Docker: just write your Python script and deploy it, and Modal's cloud takes care of the rest.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I first came across Modal while taking part in a Hugging Face hackathon, and I was genuinely impressed by how easy it is to use. The platform lets you build and deploy applications within minutes, with a smooth and efficient experience similar to BentoCloud. With Modal, you can configure the system environment of your Python application, including GPUs, the Docker image, and Python dependencies, and then deploy it to the cloud with a single command. In this tutorial, we will learn how to set up Modal, create a vLLM server, and deploy it securely to the cloud. We will also cover how to use cURL and the OpenAI SDK to test your 
vLLM<\/p>\n","protected":false},"author":2,"featured_media":48689,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[50,43,395,51,44],"topic":[],"class_list":["post-48690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-server","tag-50","tag-43","tag-395","tag-51","tag-44"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.wsisp.com\/helps\/48690.html\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"og:description\" content=\"I first came across Modal while taking part in a Hugging Face hackathon, and I was genuinely impressed by how easy it is to use. The platform lets you build and deploy applications within minutes, with a smooth and efficient experience similar to BentoCloud. With Modal, you can configure the system environment of your Python application, including GPUs, the Docker image, and Python 
dependencies, and then deploy it to the cloud with a single command. In this tutorial, we will learn how to set up Modal, create a vLLM server, and deploy it securely to the cloud. We will also cover how to use cURL and the OpenAI SDK to test your vLLM\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.wsisp.com\/helps\/48690.html\" \/>\n<meta property=\"og:site_name\" content=\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-30T05:03:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/07\/20250730050258-6889a7829d6f6.png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Author\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/48690.html\",\"url\":\"https:\/\/www.wsisp.com\/helps\/48690.html\",\"name\":\"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - 
\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"isPartOf\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\"},\"datePublished\":\"2025-07-30T05:03:01+00:00\",\"dateModified\":\"2025-07-30T05:03:01+00:00\",\"author\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/48690.html#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.wsisp.com\/helps\/48690.html\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/48690.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.wsisp.com\/helps\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\",\"url\":\"https:\/\/www.wsisp.com\/helps\/\",\"name\":\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"description\":\"Hong Kong Servers_Hong Kong Cloud Server News_Server Help Docs_Server Tutorials\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"contentUrl\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"caption\":\"admin\"},\"sameAs\":[\"http:\/\/wp.wsisp.com\"],\"url\":\"https:\/\/www.wsisp.com\/helps\/author\/admin\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.wsisp.com\/helps\/48690.html","og_locale":"zh_CN","og_type":"article","og_title":"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","og_description":"I first came across Modal while taking part in a Hugging Face hackathon, and I was genuinely impressed by how easy it is to use. The platform lets you build and deploy applications within minutes, with a smooth and efficient experience similar to BentoCloud. With Modal, you can configure the system environment of your Python application, including GPUs, the Docker 
image, and Python dependencies, and then deploy it to the cloud with a single command. In this tutorial, we will learn how to set up Modal, create a vLLM server, and deploy it securely to the cloud. We will also cover how to use cURL and the OpenAI SDK to test your vLLM","og_url":"https:\/\/www.wsisp.com\/helps\/48690.html","og_site_name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","article_published_time":"2025-07-30T05:03:01+00:00","og_image":[{"url":"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/07\/20250730050258-6889a7829d6f6.png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Author":"admin","Estimated reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.wsisp.com\/helps\/48690.html","url":"https:\/\/www.wsisp.com\/helps\/48690.html","name":"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal - 
\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","isPartOf":{"@id":"https:\/\/www.wsisp.com\/helps\/#website"},"datePublished":"2025-07-30T05:03:01+00:00","dateModified":"2025-07-30T05:03:01+00:00","author":{"@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41"},"breadcrumb":{"@id":"https:\/\/www.wsisp.com\/helps\/48690.html#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.wsisp.com\/helps\/48690.html"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.wsisp.com\/helps\/48690.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.wsisp.com\/helps"},{"@type":"ListItem","position":2,"name":"Done in One Step! A Complete Hands-On Guide to Quickly Deploying a vLLM Inference Server with Modal"}]},{"@type":"WebSite","@id":"https:\/\/www.wsisp.com\/helps\/#website","url":"https:\/\/www.wsisp.com\/helps\/","name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","description":"Hong Kong Servers_Hong Kong Cloud Server News_Server Help Docs_Server Tutorials","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41","name":"admin","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/","url":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","contentUrl":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","caption":"admin"},"sameAs":["http:\/\/wp.wsisp.com"],"url":"https:\/\/www.wsisp.com\/helps\/author\/admin"}]}},"_links":{"self":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/48690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/comments?post=48690"}],"version-history":[{"count":0,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/48690\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media\/48689"}],"wp:attachment":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media?parent=48690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/categories?post=48690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/tags?post=48690"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/topic?post=48690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}