云计算百科 (Cloud Computing Encyclopedia)
A knowledge platform for cloud computing expertise

TensorFlow Serving Study Notes 2: Model Serving

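This article examines the core architecture and implementation of TensorFlow Serving. It uses the source code to show how TensorFlow Serving provides highly available, dynamically updatable model serving for production.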

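1. TensorFlow Serving Core Architecture

1.1 Layered Design

TensorFlow Serving uses a modular, layered design in which each component has a single, well-defined responsibility: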

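Component | Responsibility | Source file (under tensorflow_serving/)
--- | --- | ---
Servables | The basic unit of serving, e.g. one version of a model | core/servable.h
Loaders | Manage the load/unload lifecycle of a servable | core/loader.h
Managers | Own the set of servables and route each request to the right version | core/manager.h
Sources | Discover servables and hand Loaders for new versions to the Manager | core/source.h
ServerCore | Central coordinator that wires the other components together | model_servers/server_core.h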
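The toy, single-threaded sketch below shows how the four roles fit together. Every class in it (ToyModel, ToyLoader, ToySource, ToyManager) is hypothetical and much simpler than the real interfaces in core/loader.h, core/source.h and core/manager.h. It only illustrates the direction of the calls: a Source hands a Loader for each new version to a Manager, which loads it and serves the resulting servable. It omits the version policy and concurrency of the real Manager.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical servable: a "model" that just scales its input.
struct ToyModel {
  double scale;
  double Predict(double x) const { return scale * x; }
};

// Hypothetical Loader: creates (Load) and releases (Unload) one servable version.
class ToyLoader {
 public:
  explicit ToyLoader(double scale) : scale_(scale) {}
  void Load() { model_ = std::make_unique<ToyModel>(ToyModel{scale_}); }
  void Unload() { model_.reset(); }
  const ToyModel* servable() const { return model_.get(); }

 private:
  double scale_;
  std::unique_ptr<ToyModel> model_;
};

// Hypothetical Manager: owns loaded versions and answers lookups.
class ToyManager {
 public:
  // Called (via the Source's callback) when a new version becomes available.
  void OnNewVersion(int version, std::unique_ptr<ToyLoader> loader) {
    loader->Load();  // the real Manager consults a version policy first
    versions_[version] = std::move(loader);
  }
  // Highest loaded version, or nullptr if nothing is loaded yet.
  const ToyModel* Latest() const {
    if (versions_.empty()) return nullptr;
    return versions_.rbegin()->second->servable();
  }

 private:
  std::map<int, std::unique_ptr<ToyLoader>> versions_;  // ordered by version
};

// Hypothetical Source: emits a Loader for every version it discovers.
class ToySource {
 public:
  using Callback = std::function<void(int, std::unique_ptr<ToyLoader>)>;
  explicit ToySource(Callback cb) : cb_(std::move(cb)) {}
  void Discover(int version, double scale) {
    cb_(version, std::make_unique<ToyLoader>(scale));
  }

 private:
  Callback cb_;
};
```

To wire them together, connect the Source's callback to the Manager: `ToySource source([&](int v, std::unique_ptr<ToyLoader> l) { manager.OnNewVersion(v, std::move(l)); });`. After that, each `source.Discover(...)` makes a new version available through `manager.Latest()`.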
1.2 End-to-End Request Flow

Request flow (Client → REST/gRPC → PredictionService → ServerCore → Session):

1. The client sends an HTTP or gRPC request.
2. The REST/gRPC layer routes the request to PredictionService.
3. PredictionService obtains the requested model from ServerCore.
4. session.run() is executed on the model's Session.
5. The Session returns the prediction result.
6. The result is wrapped into a response.
7. The prediction data is returned to the client.
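Step 3 has to decide which version serves the request. When a request's model spec names no version, TF Serving uses the latest (highest-numbered) available version. The sketch below shows that lookup rule on a plain std::map. `Registry` and `ResolveVersion` are hypothetical names, not TF Serving APIs.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical registry: model name -> (version number -> servable id).
using VersionMap = std::map<long long, std::string>;
using Registry = std::map<std::string, VersionMap>;

// Returns the servable that should handle a request, or std::nullopt if the
// model (or the explicitly requested version) is not loaded.
// An empty `version` means "latest", i.e. the highest loaded version number.
std::optional<std::string> ResolveVersion(const Registry& registry,
                                          const std::string& model,
                                          std::optional<long long> version) {
  auto model_it = registry.find(model);
  if (model_it == registry.end() || model_it->second.empty()) {
    return std::nullopt;
  }
  const VersionMap& versions = model_it->second;
  if (!version) return versions.rbegin()->second;  // highest version wins
  auto it = versions.find(*version);
  if (it == versions.end()) return std::nullopt;
  return it->second;
}
```

Because the map is ordered by version number, "latest" is just the last entry. An explicit version request either finds an exact match or fails.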

2. Core Mechanisms in Depth

2.1 Dynamic Model Loading

Core flow:

Detect a new model version → create a Loader → wrap it in a LoaderHarness → the harness state machine drives the load → kReady → serve requests
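The "detect a new model version" step works as follows with the default file-system source. Each numeric subdirectory under the model's base path is treated as a version, and the default version policy (latest, one version) serves the highest one. The minimal sketch below selects a version from a list of directory names. `IsVersionDir` and `LatestVersion` are hypothetical helpers; the real source also lists the file system and supports other version policies.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// True if `name` is a non-empty string of decimal digits.
bool IsVersionDir(const std::string& name) {
  if (name.empty()) return false;
  for (char c : name) {
    if (!std::isdigit(static_cast<unsigned char>(c))) return false;
  }
  return true;
}

// Highest numeric directory name, ignoring anything else (temporary or hidden
// directories). Assumes version numbers fit in a long long.
std::optional<long long> LatestVersion(const std::vector<std::string>& dirs) {
  std::optional<long long> best;
  for (const std::string& d : dirs) {
    if (!IsVersionDir(d)) continue;
    const long long v = std::stoll(d);
    if (!best || v > *best) best = v;
  }
  return best;
}
```

The names are compared as numbers, not strings, so "10" correctly beats "2".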

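LoaderHarness state machine (simplified; the upstream core/loader_harness.h also defines intermediate states such as kLoadRequested, kLoadApproved, kUnloadRequested, kQuiesced and kDisabled):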

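enum class State {
  kNew,        // newly created
  kLoading,    // load in progress
  kReady,      // loaded and serving
  kQuiescing,  // being drained: no new requests are routed to it
  kUnloading,  // unload in progress
  kError       // load or unload failed
};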
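The enum is easiest to read as a transition table. The sketch below encodes the simplified six-state lifecycle as an explicit whitelist of allowed moves. The exact set of edges is an assumption inferred from the state names. The upstream harness has more states and enforces its own transitions in TransitionState().

```cpp
#include <cassert>
#include <set>
#include <utility>

enum class State { kNew, kLoading, kReady, kQuiescing, kUnloading, kError };

// Allowed (from, to) pairs in the simplified lifecycle (assumed edges):
// a load either succeeds or fails; a ready servable is quiesced, then unloaded.
bool IsValidTransition(State from, State to) {
  static const std::set<std::pair<State, State>> kAllowed = {
      {State::kNew, State::kLoading},
      {State::kLoading, State::kReady},
      {State::kLoading, State::kError},
      {State::kReady, State::kQuiescing},
      {State::kQuiescing, State::kUnloading},
  };
  return kAllowed.count({from, to}) > 0;
}
```

Any move that is not listed, such as jumping straight from kNew to kReady, is rejected.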

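Key design points:

  • Thread-safe state transitions: the state is only changed while holding the harness mutex mu_ (simplified).
    Status LoaderHarness::Load() {
      {
        mutex_lock l(mu_);  // protects state_
        TF_RETURN_IF_ERROR(TransitionState(State::kLoading));
      }
      // ... perform the actual (possibly slow) load outside the lock
    }

  • Resource cleanup on destruction (illustrative):
    LoaderHarness::~LoaderHarness() {
      if (state_ == State::kReady) Unload();
    }
    Note: in the upstream loader_harness.cc the destructor does not call Unload(). It only DCHECKs that the harness is in kNew, kDisabled or kError, so the manager must already have driven the unload.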
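The thread-safety idea can be shown with standard C++ alone. In this hypothetical `ToyHarness`, a `std::mutex` plays the role of `mu_`. The lock is held only long enough to check and change the state. The load itself runs outside the lock, so other threads can still query the state while it is in progress. This mirrors the idea in the text, not the upstream class.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

enum class HarnessState { kNew, kLoading, kReady, kError };

class ToyHarness {
 public:
  // `load_fn` does the real work and reports success or failure.
  explicit ToyHarness(std::function<bool()> load_fn)
      : load_fn_(std::move(load_fn)) {}

  // Returns true if the servable ended up ready.
  bool Load() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (state_ != HarnessState::kNew) return false;  // only load once
      state_ = HarnessState::kLoading;
    }
    const bool ok = load_fn_();  // slow work, done without holding mu_
    std::lock_guard<std::mutex> lock(mu_);
    state_ = ok ? HarnessState::kReady : HarnessState::kError;
    return ok;
  }

  HarnessState state() const {
    std::lock_guard<std::mutex> lock(mu_);
    return state_;
  }

 private:
  std::function<bool()> load_fn_;
  mutable std::mutex mu_;
  HarnessState state_ = HarnessState::kNew;
};
```

Because the lock is released before `load_fn_` runs, even the load function itself can call `state()` without deadlocking.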

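2.2 Server Startup (Creating ServerCore)

Core logic of Server::BuildAndStart() in model_servers/server.cc (simplified; elided arguments are shown as ...):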

Status Server::BuildAndStart(const Options& opts) {
  // 1. Validate the configuration
  if (opts.grpc_port == 0) {
    return errors::InvalidArgument("gRPC port is not set");
  }

  // 2. Build the ServerCore options
  ServerCore::Options options;

  // 3. Load the model configuration: a single model built from the
  //    command-line flags, or a config parsed from model_config_file
  if (opts.model_config_file.empty()) {
    options.model_server_config = BuildSingleModelConfig(...);
  } else {
    TF_RETURN_IF_ERROR(ParseProtoTextFile(...));
  }

  // 4. Session resource settings
  SessionBundleConfig session_bundle_config;
  session_bundle_config.mutable_session_config()
      ->mutable_gpu_options()
      ->set_per_process_gpu_memory_fraction(0.8);  // example value; normally taken from --per_process_gpu_memory_fraction

  // 5. Create the ServerCore
  TF_RETURN_IF_ERROR(ServerCore::Create(std::move(options), &server_core_));

  // 6. Start the gRPC server
  ::grpc::ServerBuilder builder;
  builder.AddListeningPort(..., BuildServerCredentials(...));
  grpc_server_ = builder.BuildAndStart();

  // 7. Start the HTTP/REST server only if an HTTP port is configured
  if (opts.http_port != 0) {
    http_server_ = CreateAndStartHttpServer(...);
  }

  return Status::OK();
}
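Step 1 above checks only the port. The upstream server performs a few more consistency checks; for instance, it refuses to start when neither a model base path nor a model config file is given. Below is a hedged sketch of such validation, and the exact upstream conditions may differ. `ToyOptions` and `Validate` are hypothetical; their field names borrow the command-line flag names.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the server options; names follow the flags.
struct ToyOptions {
  int grpc_port = 8500;
  std::string model_base_path;
  std::string model_config_file;
  int tensorflow_session_parallelism = 0;
  int tensorflow_intra_op_parallelism = 0;
  int tensorflow_inter_op_parallelism = 0;
};

// Returns an empty string if the options are usable, otherwise an error.
std::string Validate(const ToyOptions& o) {
  if (o.grpc_port == 0) return "gRPC port is not set";
  if (o.model_base_path.empty() && o.model_config_file.empty()) {
    return "either model_base_path or model_config_file is required";
  }
  const bool per_pool = o.tensorflow_intra_op_parallelism > 0 ||
                        o.tensorflow_inter_op_parallelism > 0;
  if (per_pool && o.tensorflow_session_parallelism > 0) {
    return "set session parallelism or intra/inter-op parallelism, not both";
  }
  return "";
}
```

Returning the first problem found keeps the error message specific, which is how the `errors::InvalidArgument` early returns in BuildAndStart behave as well.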


3. Key Design Highlights

3.1 Dynamic Updates

Config file changes → PeriodicFunction polls the file → ReloadConfig is triggered → new models are loaded incrementally → traffic moves to the new versions without restarting the server

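Implementation (in Server::BuildAndStart):

// Started only when --fs_model_config_poll_wait_seconds > 0
fs_config_polling_thread_.reset(new PeriodicFunction(
    [this, config_file] {
      this->PollFilesystemAndReloadConfig(config_file);
    },
    poll_interval * 1000000  // PeriodicFunction expects microseconds; poll_interval is in seconds
));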
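PeriodicFunction is a TensorFlow utility class. The rough standard-library equivalent below, a hypothetical `PeriodicPoller`, shows the pattern. A background thread runs the callback, then sleeps on a condition variable, so the destructor can stop it promptly instead of waiting out a full interval. Using std::chrono durations also removes the manual seconds-to-microseconds conversion.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class PeriodicPoller {
 public:
  PeriodicPoller(std::function<void()> fn, std::chrono::milliseconds interval)
      : fn_(std::move(fn)), interval_(interval), thread_([this] { Run(); }) {}

  // Signals the thread to stop and waits for it; never waits a full interval.
  ~PeriodicPoller() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mu_);
    do {
      lock.unlock();
      fn_();  // e.g. re-read the config file and reload it if it changed
      lock.lock();
      // wait_for returns true as soon as stop_ is set, ending the loop.
    } while (!cv_.wait_for(lock, interval_, [this] { return stop_; }));
  }

  std::function<void()> fn_;
  std::chrono::milliseconds interval_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread thread_;  // declared last so it starts after the other members
};
```

Usage: `std::atomic<int> polls{0}; PeriodicPoller poller([&] { ++polls; }, std::chrono::seconds(30));`. In a real server the callback would hold your own reload logic. The callback runs at least once, even if the poller is destroyed immediately.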

3.2 Resource Controls

GPU memory limit (process-wide, not per model):

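// Cap the fraction of each GPU's memory that the serving process may allocate.
// The limit is shared by all models loaded in the process.
session_bundle_config.mutable_session_config()
    ->mutable_gpu_options()
    ->set_per_process_gpu_memory_fraction(0.6);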
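The fraction is applied to the total memory of each GPU the process uses, so the budget is simple arithmetic. For example, with the 0.6 setting above on a hypothetical 16 GiB card:

```latex
M_{\text{budget}} = f \times M_{\text{GPU}} = 0.6 \times 16\,\text{GiB} = 9.6\,\text{GiB}
```

Every model loaded by the server draws from this single 9.6 GiB budget; there is no separate per-model quota.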

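Thread-pool (parallelism) settings:

// intra_op, inter_op and session_parallel come from --tensorflow_intra_op_parallelism,
// --tensorflow_inter_op_parallelism and --tensorflow_session_parallelism
if (intra_op > 0 || inter_op > 0) {
  // Configure the intra-op and inter-op thread pools separately
  session_config->set_intra_op_parallelism_threads(intra_op);
  session_config->set_inter_op_parallelism_threads(inter_op);
} else {
  // Use one session-wide value for both pools
  session_config->set_intra_op_parallelism_threads(session_parallel);
  session_config->set_inter_op_parallelism_threads(session_parallel);
}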
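The branch above can be lifted into a small pure function, which makes the precedence explicit. If either per-pool value is positive, both pools take their own values, and a 0 leaves that pool to TensorFlow's default. Otherwise both pools use the session-wide value. `ThreadConfig` and `ResolveThreads` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result type mirroring the two ConfigProto thread fields.
struct ThreadConfig {
  int intra_op;
  int inter_op;
};

// Same precedence as the server code: explicit per-pool settings win over the
// single session-wide setting. 0 means "let TensorFlow choose".
ThreadConfig ResolveThreads(int intra_op, int inter_op, int session_parallel) {
  if (intra_op > 0 || inter_op > 0) {
    return {intra_op, inter_op};
  }
  return {session_parallel, session_parallel};
}
```

Note the consequence for mixed settings: with intra_op = 4, inter_op = 0 and session_parallel = 8, the session-wide value is ignored entirely and the inter-op pool falls back to TensorFlow's default.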

4. Production Features

4.1 High Availability
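Mechanism | How it is enabled | Effect
--- | --- | ---
Model warmup | enable_model_warmup option | Replays the warmup requests stored with the SavedModel at load time, so the first real requests avoid cold-start latency
Load retries | max_num_load_retries option | Retries failed loads, improving the chance that a model loads successfully
Availability-preserving version switches | AvailabilityPreservingPolicy (aspired-version policy) | Loads the new version before unloading the old one, so at least one version keeps serving during a switch; it does not roll back a bad version automatically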
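The way AvailabilityPreservingPolicy avoids downtime can be sketched as a decision function over per-version snapshots. This is a simplified sketch under assumed rules, not the policy's source. An old (no longer aspired) version that is still serving is unloaded only if some aspired version is ready, nothing is aspired at all, or several old versions are serving. Otherwise the highest new aspired version is loaded first. `VersionSnapshot`, `VState`, `Decision` and `NextAction` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

enum class VState { kNew, kLoading, kReady };
enum class Action { kLoad, kUnload };

struct VersionSnapshot {
  int version;
  bool is_aspired;  // still wanted by the Source
  VState state;
};

struct Decision {
  Action action;
  int version;
};

std::optional<Decision> NextAction(const std::vector<VersionSnapshot>& all) {
  bool has_aspired = false;
  bool has_aspired_ready = false;
  std::vector<int> old_ready;  // no longer aspired but still serving
  for (const auto& v : all) {
    if (v.is_aspired) {
      has_aspired = true;
      if (v.state == VState::kReady) has_aspired_ready = true;
    } else if (v.state == VState::kReady) {
      old_ready.push_back(v.version);
    }
  }
  // Unload an old version only when doing so cannot leave the model unavailable.
  if (!old_ready.empty() &&
      (!has_aspired || has_aspired_ready || old_ready.size() > 1)) {
    return Decision{Action::kUnload,
                    *std::min_element(old_ready.begin(), old_ready.end())};
  }
  // Otherwise load the highest aspired version that has not started loading.
  std::optional<int> highest_new;
  for (const auto& v : all) {
    if (v.is_aspired && v.state == VState::kNew &&
        (!highest_new || v.version > *highest_new)) {
      highest_new = v.version;
    }
  }
  if (highest_new) return Decision{Action::kLoad, *highest_new};
  return std::nullopt;
}
```

The trade-off is memory. During a switch, the old and new versions are briefly resident at the same time.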
4.2 Secure Communication

SSL/TLS configuration (used when --ssl_config_file is set):

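// Mutual TLS: request, require and verify client certificates
::grpc::SslServerCredentialsOptions ssl_ops(
    GRPC_SSL_REQUEST_AND_REQUIRE_CLIENT_CERTIFICATE_AND_VERIFY);
ssl_ops.pem_root_certs = custom_ca;  // custom CA used to verify client certificates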

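5. Key Parameters

Parameter | Type | Default | Purpose
--- | --- | --- | ---
grpc_port (flag --port) | int | 8500 | gRPC listening port
model_base_path | string | (empty) | Base path of a single model; required unless model_config_file is used
per_process_gpu_memory_fraction | float | 0.0 | Fraction of GPU memory the process may allocate; 0 lets TensorFlow choose
tensorflow_intra_op_parallelism | int | 0 | Threads for intra-op parallelism; 0 lets TensorFlow choose
fs_model_config_poll_wait_seconds | int | 0 | Seconds between polls of model_config_file; 0 means it is read only once at startup
enable_model_warmup | bool | true | Enables model warmup to reduce first-request latency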
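These options are set as tensorflow_model_server command-line flags. The hypothetical helper below assembles an argument list from a few of them, emitting a flag only when its value differs from the server default. The flag names are real tensorflow_model_server flags: --port, --rest_api_port, --model_name, --model_base_path, --model_config_file and --fs_model_config_poll_wait_seconds. `ServerFlags` and `BuildArgs` are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for a few tensorflow_model_server flags.
struct ServerFlags {
  int port = 8500;        // gRPC port (--port)
  int rest_api_port = 0;  // 0 disables the HTTP/REST endpoint
  std::string model_name;
  std::string model_base_path;
  std::string model_config_file;
  int fs_model_config_poll_wait_seconds = 0;  // 0: read the config only once
};

// Builds "--flag=value" arguments, skipping values left at their defaults.
std::vector<std::string> BuildArgs(const ServerFlags& f) {
  std::vector<std::string> args = {"tensorflow_model_server"};
  if (f.port != 8500) args.push_back("--port=" + std::to_string(f.port));
  if (f.rest_api_port != 0) {
    args.push_back("--rest_api_port=" + std::to_string(f.rest_api_port));
  }
  if (!f.model_config_file.empty()) {
    // Multi-model mode: models come from the config file.
    args.push_back("--model_config_file=" + f.model_config_file);
    if (f.fs_model_config_poll_wait_seconds > 0) {
      args.push_back("--fs_model_config_poll_wait_seconds=" +
                     std::to_string(f.fs_model_config_poll_wait_seconds));
    }
  } else {
    // Single-model mode.
    args.push_back("--model_name=" + f.model_name);
    args.push_back("--model_base_path=" + f.model_base_path);
  }
  return args;
}
```

The two branches reflect the choice made in step 3 of BuildAndStart: either a config file describes the models, or a single model is built from the name and base path flags.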


Reproduction without permission is prohibited: 网硕互联帮助中心 » TensorFlow Serving Study Notes 2: Model Serving