<h3>LLaMA Explained</h3>
<p>LLaMA (Large Language Model Meta AI) is a family of large language models developed by Meta (formerly Facebook), designed to improve performance on natural language processing (NLP) tasks. LLaMA is based on the Transformer architecture and is trained on large-scale data so that it performs well across a wide range of language tasks.</p>
<p>Meta AI's position is that, for a given compute budget, the best performance is achieved not by the largest model, but by a smaller model trained on more data.</p>
<h3>Model Architecture</h3>
<p>Like GPT and other generative models, LLaMA uses only the decoder of the Transformer, with three modifications on top of the standard architecture:</p>
<ul>
<li>Pre-normalization, as in GPT-3. To improve training stability, the input of each Transformer sub-layer is normalized instead of the output, using the RMSNorm normalization function.</li>
<li>The ReLU non-linearity is replaced with the SwiGLU activation function to improve performance, with a hidden dimension of \(\frac{2}{3} \cdot 4d\) instead of the \(4d\) used in PaLM.</li>
<li>As in GPT-Neo, absolute positional embeddings are removed and rotary position embeddings (RoPE) are used instead.</li>
</ul>
<p>The three modifications are introduced one by one below.</p>
<h4>RMSNorm</h4>
<p>RMSNorm (Root Mean Square Normalization) is a normalization technique used to stabilize and accelerate the training of neural networks. Unlike other normalization methods (such as BatchNorm and LayerNorm), RMSNorm normalizes using the root mean square (RMS) of the input tensor:</p>
<p>\[ \text{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2 + \epsilon}} \cdot \gamma \]</p>
<p>where \(x\) is the input vector, \(d\) is its dimension, \(\epsilon\) is a small constant that avoids division by zero, and \(\gamma\) is a learnable scaling parameter.</p>
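As a quick numerical check of the formula above (taking \(\gamma = 1\), so only the normalization itself is exercised), a minimal pure-Python sketch:

```python
import math

def rms_norm(x, eps=1e-6):
    # RMSNorm(x) = x / sqrt(mean(x_i^2) + eps); the learnable gamma is fixed to 1 here
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

v = [3.0, 4.0]            # mean square = (9 + 16) / 2 = 12.5, RMS ≈ 3.5355
print(rms_norm(v))        # ≈ [0.8485, 1.1314]; the output itself has RMS ≈ 1
```

Note that, unlike LayerNorm, no mean is subtracted: only the magnitude of the vector is rescaled, which is what makes RMSNorm cheaper to compute.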
<p>LLaMA's implementation is as follows:</p>
<pre><code>import torch
from torch import nn

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        # x / sqrt(mean(x^2) + eps), computed with rsqrt for efficiency
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        # normalize in float32, then cast back to the input dtype
        output = self._norm(x.float()).type_as(x)
        return output * self.weight
</code></pre>
<h4>SwiGLU Activation Function</h4>
<p>SwiGLU (Swish-Gated Linear Unit) is an activation function that combines the Swish activation with a gating mechanism, which effectively increases the expressive power and performance of the model. The formulas are:</p>
<p>\[ \text{SwiGLU}(x) = \text{Swish}(x) \cdot \text{Gated Linear Unit}(x) \]</p>
<p>\[ \text{Swish}(x) = x \cdot \sigma(x) \]</p>
<p>\[ \text{Gated Linear Unit}(x) = \text{Linear}_1(x) \cdot \sigma(\text{Linear}_2(x)) \]</p>
<p>\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]</p>
<p>where \(\text{Linear}_1\) and \(\text{Linear}_2\) are two separate linear transformations. In the LLaMA code, the Swish part of the SwiGLU activation is provided by <code>F.silu(x)</code>.</p>
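The gated product these formulas describe can be sketched in a few lines. The sketch below uses scalar weights <code>w1</code>, <code>w2</code> as stand-ins for the linear layers (an illustrative simplification, not the actual d-dimensional projections), mirroring the <code>silu(w1(x)) * w3(x)</code> pattern that appears in LLaMA's feed-forward code later in this article:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    # Swish / SiLU: x * sigmoid(x), which is what F.silu computes element-wise
    return x * sigmoid(x)

def swiglu_gate(x, w1, w2):
    # silu-gated product; scalars w1, w2 stand in for the two linear maps
    return silu(w1 * x) * (w2 * x)

print(round(silu(1.0), 4))   # 0.7311: smooth and non-monotone, unlike ReLU
```

Unlike ReLU, SiLU is differentiable everywhere and lets small negative inputs pass through attenuated rather than zeroed, which is part of why the swap helps.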
<h4>RoPE</h4>
<p>Rotary Position Embedding (RoPE) is a method of providing positional encoding for sequence models such as the Transformer. RoPE encodes the positional information of a sequence by rotating the input vectors in the complex domain. Compared with traditional positional encoding methods (such as sinusoidal position encoding), RoPE captures the relative positional information in a sequence more effectively, improving the expressiveness of the model. The concrete steps of RoPE are:</p>
<ol>
<li>Compute the frequency vector: \( f_i = \frac{1}{\theta^{\frac{2i}{d}}} \), where \(\theta\) is a constant (usually taken to be 10000) and \(i\) is the index over the vector dimensions.</li>
<li>Compute the rotation angle: \( \text{angle}(t) = t \cdot f_i \), where \(t\) is the position index.</li>
<li>Apply the rotation: the input vector \(x_t\) at each position \(t\) is rotated in the complex domain: \( x_t' = x_t \cdot e^{j \cdot \text{angle}(t)} \). By contrast, the conventional approach to positional encoding is to compute a position-encoding vector (also a \(d\)-dimensional vector), add it to the word embedding before the query, key and value vectors are computed, and only then multiply by the corresponding projection matrices.</li>
</ol>
<p>The self-attention flow with RoPE is: for each word embedding in the token sequence, first compute its query and key vectors; then compute the rotary positional encoding for each token position; next apply the rotation to the elements of each position's query and key vectors, two components at a time; finally take the inner product of query and key to obtain the self-attention result.</p>
<p>The figure below illustrates the rotation transform: <img decoding="async" src="https://www.wsisp.com/helps/wp-content/uploads/2025/05/20250507010932-681ab2ccd23ff.png" alt="image.png" /></p>
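The steps above can be sketched with Python complex numbers, and the key relative-position property — the query–key score depends only on the offset \(m - n\), not on the absolute positions — falls out directly. The toy vectors below are illustrative assumptions:

```python
import cmath

def rope(vec, pos, theta=10000.0):
    # vec: complex numbers, one per 2D component pair; pair i is rotated by
    # angle pos * theta^(-2i/d), matching the frequency formula above
    d = 2 * len(vec)
    return [v * cmath.exp(1j * pos * theta ** (-2.0 * i / d)) for i, v in enumerate(vec)]

def score(q, k):
    # attention-style inner product of rotated vectors: Re(sum q_i * conj(k_i))
    return sum((a * b.conjugate()).real for a, b in zip(q, k))

q = [1 + 2j, 3 - 1j]       # toy query, grouped into complex pairs
k = [0.5 + 0.5j, -1 + 1j]  # toy key

s1 = score(rope(q, 5), rope(k, 3))     # positions (5, 3): offset 2
s2 = score(rope(q, 12), rope(k, 10))   # positions (12, 10): offset 2
print(abs(s1 - s2) < 1e-9)             # True: only the relative offset matters
```

The reason is one line of algebra: \( q e^{jmf} \cdot \overline{k e^{jnf}} = q\bar{k}\, e^{j(m-n)f} \), so the rotation angles cancel except for their difference.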
<p>Rotary encoding (RoPE) effectively preserves the relative relationship between positions: encodings of nearby positions remain similar to one another, while encodings of distant positions differ, which strengthens the model's ability to perceive and exploit positional information. This is something absolute positional encoding schemes (sinusoidal encodings, learned encodings, and so on) cannot do, since they can only represent absolute positions, not relative ones.</p>
<p>Why is rotary position embedding effective?</p>
<ul>
<li>It captures relative positional information: traditional positional embeddings usually encode only absolute positions, which can perform poorly on long sequences or on tasks that require relative positions. RoPE introduces relative positional information naturally through the rotation, so the model can better understand the relative relationships between positions in a sequence.</li>
<li>Because RoPE encodes positions via rotations in the complex domain, it can capture richer positional information. Compared with a simple linear transformation, the rotation provides stronger non-linear expressive power, giving the model better performance on complex tasks.</li>
<li>RoPE is computationally simple and requires no heavy matrix operations. Precomputing the frequency vector and applying the rotation can be implemented efficiently, which suits large-scale deployment in practice.</li>
<li>RoPE integrates seamlessly into existing Transformer architectures without major changes to the model structure. This compatibility makes RoPE an easy positional encoding method to adopt and popularize.</li>
<li>In long-sequence tasks, traditional positional encoding methods may run into diluted positional information or increased computational complexity. By introducing the rotation, RoPE preserves positional information over long sequences, making the model more stable and efficient on long-sequence tasks.</li>
<li>(This point is my own conjecture.) In high-dimensional vectors, direction matters more than magnitude. Conventional positional encoding adds the encoding directly onto the word embedding, which effectively changes the magnitude; rotary positional encoding changes the direction, so in practice it may convey more information than conventional encoding does.</li>
</ul>
<p>In LLaMA, RoPE is implemented as follows:</p>
<pre><code>def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    # frequencies f_i = theta^(-2i/dim), one per pair of dimensions
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)  # type: ignore
    freqs = torch.outer(t, freqs).float()  # type: ignore
    # unit-magnitude complex numbers e^(j * t * f_i)
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)  # complex64
    return freqs_cis

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 &lt;= 1 &lt; ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)

def apply_rotary_emb(
    xq: torch.Tensor,
    xk: torch.Tensor,
    freqs_cis: torch.Tensor,
) -&gt; Tuple[torch.Tensor, torch.Tensor]:
    # view the last dimension as complex pairs, rotate, then flatten back
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    freqs_cis = reshape_for_broadcast(freqs_cis, xq_)
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)
</code></pre>
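A torch-free sketch of what <code>precompute_freqs_cis</code> produces may help: pure-Python complex numbers stand in for <code>torch.polar</code> (a simplification for illustration, not the library call itself):

```python
import cmath

def precompute_freqs_cis(dim, end, theta=10000.0):
    # one rotation factor per dimension pair:
    # cis[t][i] = e^(j * t * theta^(-i/dim)) for i = 0, 2, 4, ...
    freqs = [theta ** (-i / dim) for i in range(0, dim, 2)]
    return [[cmath.exp(1j * t * f) for f in freqs] for t in range(end)]

cis = precompute_freqs_cis(8, 4)     # 4 positions, 8 // 2 = 4 factors each
print(cis[0][0])                     # position 0 rotates by nothing: (1+0j)
```

Every factor has magnitude 1, so applying them in <code>apply_rotary_emb</code> rotates the query/key pairs without rescaling them.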
<p>The following code shows the attention mechanism with rotary position embeddings applied:</p>
<pre><code>class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()

        self.n_local_heads = args.n_heads // fs_init.get_model_parallel_world_size()
        self.head_dim = args.dim // args.n_heads

        self.wq = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wk = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wv = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wo = RowParallelLinear(
            args.n_heads * self.head_dim,
            args.dim,
            bias=False,
            input_is_parallel=True,
            init_method=lambda x: x,
        )

        # KV cache for incremental decoding
        self.cache_k = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()
        self.cache_v = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()

    def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor]):
        bsz, seqlen, _ = x.shape
        xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)

        xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xk = xk.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xv = xv.view(bsz, seqlen, self.n_local_heads, self.head_dim)

        # rotate queries and keys before attention
        xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis)

        self.cache_k = self.cache_k.to(xq)
        self.cache_v = self.cache_v.to(xq)

        self.cache_k[:bsz, start_pos : start_pos + seqlen] = xk
        self.cache_v[:bsz, start_pos : start_pos + seqlen] = xv

        keys = self.cache_k[:bsz, : start_pos + seqlen]
        values = self.cache_v[:bsz, : start_pos + seqlen]

        xq = xq.transpose(1, 2)
        keys = keys.transpose(1, 2)
        values = values.transpose(1, 2)
        scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(self.head_dim)
        if mask is not None:
            scores = scores + mask  # (bs, n_local_heads, slen, cache_len + slen)
        scores = F.softmax(scores.float(), dim=-1).type_as(xq)
        output = torch.matmul(scores, values)  # (bs, n_local_heads, slen, head_dim)
        output = output.transpose(
            1, 2
        ).contiguous().view(bsz, seqlen, -1)

        return self.wo(output)
</code></pre>
<p>The complete code of the LLaMA model implementation is given below:</p>
<pre><code># Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the GNU General Public License version 3.

from typing import Optional, Tuple
from dataclasses import dataclass
import math

import torch
from torch import nn
import torch.nn.functional as F

import fairscale.nn.model_parallel.initialize as fs_init
from fairscale.nn.model_parallel.layers import (
    ParallelEmbedding,
    RowParallelLinear,
    ColumnParallelLinear,
)

@dataclass
class ModelArgs:
    dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    vocab_size: int = -1  # defined later by tokenizer
    multiple_of: int = 256  # make SwiGLU hidden layer size multiple of large power of 2
    norm_eps: float = 1e-5

    max_batch_size: int = 32
    max_seq_len: int = 2048

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)  # type: ignore
    freqs = torch.outer(t, freqs).float()  # type: ignore
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)  # complex64
    return freqs_cis

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 &lt;= 1 &lt; ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)

def apply_rotary_emb(
    xq: torch.Tensor,
    xk: torch.Tensor,
    freqs_cis: torch.Tensor,
) -&gt; Tuple[torch.Tensor, torch.Tensor]:
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    freqs_cis = reshape_for_broadcast(freqs_cis, xq_)
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()

        self.n_local_heads = args.n_heads // fs_init.get_model_parallel_world_size()
        self.head_dim = args.dim // args.n_heads

        self.wq = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wk = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wv = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wo = RowParallelLinear(
            args.n_heads * self.head_dim,
            args.dim,
            bias=False,
            input_is_parallel=True,
            init_method=lambda x: x,
        )

        self.cache_k = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()
        self.cache_v = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()

    def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor]):
        bsz, seqlen, _ = x.shape
        xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)

        xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xk = xk.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xv = xv.view(bsz, seqlen, self.n_local_heads, self.head_dim)

        xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis)

        self.cache_k = self.cache_k.to(xq)
        self.cache_v = self.cache_v.to(xq)

        self.cache_k[:bsz, start_pos : start_pos + seqlen] = xk
        self.cache_v[:bsz, start_pos : start_pos + seqlen] = xv

        keys = self.cache_k[:bsz, : start_pos + seqlen]
        values = self.cache_v[:bsz, : start_pos + seqlen]

        xq = xq.transpose(1, 2)
        keys = keys.transpose(1, 2)
        values = values.transpose(1, 2)
        scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(self.head_dim)
        if mask is not None:
            scores = scores + mask  # (bs, n_local_heads, slen, cache_len + slen)
        scores = F.softmax(scores.float(), dim=-1).type_as(xq)
        output = torch.matmul(scores, values)  # (bs, n_local_heads, slen, head_dim)
        output = output.transpose(
            1, 2
        ).contiguous().view(bsz, seqlen, -1)

        return self.wo(output)

class FeedForward(nn.Module):
    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        multiple_of: int,
    ):
        super().__init__()
        # 2/3 * 4d hidden size, rounded up to a multiple of multiple_of
        hidden_dim = int(2 * hidden_dim / 3)
        hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

        self.w1 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )
        self.w2 = RowParallelLinear(
            hidden_dim, dim, bias=False, input_is_parallel=True, init_method=lambda x: x
        )
        self.w3 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )

    def forward(self, x):
        # SwiGLU: silu-gated product, then down-projection
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class TransformerBlock(nn.Module):
    def __init__(self, layer_id: int, args: ModelArgs):
        super().__init__()
        self.n_heads = args.n_heads
        self.dim = args.dim
        self.head_dim = args.dim // args.n_heads
        self.attention = Attention(args)
        self.feed_forward = FeedForward(
            dim=args.dim, hidden_dim=4 * args.dim, multiple_of=args.multiple_of
        )
        self.layer_id = layer_id
        self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
        self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)

    def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor]):
        # pre-normalization: RMSNorm is applied to the sub-layer inputs
        h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)
        out = h + self.feed_forward.forward(self.ffn_norm(h))
        return out

class Transformer(nn.Module):
    def __init__(self, params: ModelArgs):
        super().__init__()
        self.params = params
        self.vocab_size = params.vocab_size
        self.n_layers = params.n_layers

        self.tok_embeddings = ParallelEmbedding(
            params.vocab_size, params.dim, init_method=lambda x: x
        )

        self.layers = torch.nn.ModuleList()
        for layer_id in range(params.n_layers):
            self.layers.append(TransformerBlock(layer_id, params))

        self.norm = RMSNorm(params.dim, eps=params.norm_eps)
        self.output = ColumnParallelLinear(
            params.dim, params.vocab_size, bias=False, init_method=lambda x: x
        )

        self.freqs_cis = precompute_freqs_cis(
            self.params.dim // self.params.n_heads, self.params.max_seq_len * 2
        )

    @torch.inference_mode()
    def forward(self, tokens: torch.Tensor, start_pos: int):
        _bsz, seqlen = tokens.shape
        h = self.tok_embeddings(tokens)
        self.freqs_cis = self.freqs_cis.to(h.device)
        freqs_cis = self.freqs_cis[start_pos : start_pos + seqlen]

        mask = None
        if seqlen &gt; 1:
            mask = torch.full((1, 1, seqlen, seqlen), float("-inf"), device=tokens.device)
            mask = torch.triu(mask, diagonal=start_pos + 1).type_as(h)

        for layer in self.layers:
            h = layer(h, start_pos, freqs_cis, mask)
        h = self.norm(h)
        output = self.output(h[:, -1, :])  # only compute last logits
        return output.float()
</code></pre>
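One detail worth noting in <code>Transformer.forward</code> above is the causal mask: <code>torch.triu(full(-inf), diagonal=start_pos + 1)</code> places \(-\infty\) wherever a query would attend to a position beyond <code>start_pos + i</code>, so the softmax zeroes those scores out. A pure-Python sketch of the same <code>triu</code> logic (illustrative only):

```python
def causal_mask(seqlen, start_pos=0):
    # mirrors torch.triu(full(-inf), diagonal=start_pos + 1):
    # entry (i, j) is -inf when j > start_pos + i, i.e. a query may not
    # look at keys past its own (shifted) position
    neg_inf = float("-inf")
    return [[neg_inf if j > start_pos + i else 0.0 for j in range(seqlen)]
            for i in range(seqlen)]

m = causal_mask(3)
# row 0 sees only column 0; row 2 sees all three columns
```

Adding this mask to the raw scores before the softmax is equivalent to restricting each token's attention to itself and earlier tokens, which is what makes the decoder autoregressive.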
layer","og_url":"https:\/\/www.wsisp.com\/helps\/35908.html","og_site_name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","article_published_time":"2025-05-07T01:09:34+00:00","og_image":[{"url":"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2025\/05\/20250507010932-681ab2ccd23ff.png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"11 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.wsisp.com\/helps\/35908.html","url":"https:\/\/www.wsisp.com\/helps\/35908.html","name":"\u4e00\u6587\u8be6\u89e3LLaMa\u7cfb\u5217\u6a21\u578b\uff1a\u539f\u7406\u4ecb\u7ecd\u3001\u4ee3\u7801\u89e3\u8bfb - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","isPartOf":{"@id":"https:\/\/www.wsisp.com\/helps\/#website"},"datePublished":"2025-05-07T01:09:34+00:00","dateModified":"2025-05-07T01:09:34+00:00","author":{"@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41"},"breadcrumb":{"@id":"https:\/\/www.wsisp.com\/helps\/35908.html#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.wsisp.com\/helps\/35908.html"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.wsisp.com\/helps\/35908.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.wsisp.com\/helps"},{"@type":"ListItem","position":2,"name":"\u4e00\u6587\u8be6\u89e3LLaMa\u7cfb\u5217\u6a21\u578b\uff1a\u539f\u7406\u4ecb\u7ecd\u3001\u4ee3\u7801\u89e3\u8bfb"}]},{"@type":"WebSite","@id":"https:\/\/www.wsisp.com\/helps\/#website","url":"https:\/\/www.wsisp.com\/helps\/","name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","description":"\u9999\u6e2f\u670d\u52a1\u5668_\u9999\u6e2f\u4e91\u670d\u52a1\u5668\u8d44\u8baf_\u670d\u52a1\u5668\u5e2e\u52a9\u6587\u6863_\u670d\u52a1\u5668\u6559\u7a0b","potentialAction":[{"@type":"SearchAction","target":{"@t
ype":"EntryPoint","urlTemplate":"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41","name":"admin","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/","url":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","contentUrl":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","caption":"admin"},"sameAs":["http:\/\/wp.wsisp.com"],"url":"https:\/\/www.wsisp.com\/helps\/author\/admin"}]}},"_links":{"self":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/35908","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/comments?post=35908"}],"version-history":[{"count":0,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/35908\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media\/35906"}],"wp:attachment":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media?parent=35908"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/categories?post=35908"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/tags?post=35908"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/topic?post=35908"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}