{"id":60102,"date":"2026-01-14T22:50:19","date_gmt":"2026-01-14T14:50:19","guid":{"rendered":"https:\/\/www.wsisp.com\/helps\/60102.html"},"modified":"2026-01-14T22:50:19","modified_gmt":"2026-01-14T14:50:19","slug":"engram%ef%bc%81%e3%80%8aconditional-memory-via-scalable-lookup-a-new-axis-of-sparsity-for-large-language-models%e3%80%8b%e8%a7%a3%e8%af%bb%ef%bc%81","status":"publish","type":"post","link":"https:\/\/www.wsisp.com\/helps\/60102.html","title":{"rendered":"Engram! A Close Reading of \u201cConditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models\u201d"},"content":{"rendered":"<h3>0. What problem is this paper actually trying to solve?<\/h3>\n<p>The core tension: MoE uses conditional computation to scale up parameter count without a proportional increase in FLOPs, yet the Transformer has no native knowledge-lookup primitive. As a result, much of what is essentially static knowledge, fixed collocations, or local dependencies still has to be \u201ccomputed\u201d by attention + MLP across many layers, wasting depth and compute. The paper splits language modeling into two heterogeneous sub-tasks:<\/p>\n<ul>\n<li>Compositional reasoning: requires dynamic computation (MoE's strength)<\/li>\n<li>Knowledge retrieval: closer to a static table lookup (which a standard Transformer lacks)<\/li>\n<\/ul>\n<p>The authors therefore propose a new axis, Conditional Memory, which complements MoE's Conditional Computation, and realize it with an engineering-ready instance called Engram.<\/p>\n<hr \/>\n<h3>1. What Engram is: separating static patterns from dynamic computation<\/h3>\n<h4>1.1 Where the module sits and how it works (two stages: retrieval + fusion)<\/h4>\n<p>Engram is a module inserted into some Transformer layers (not every layer). For each token position t, it first uses the local N-gram as the key for an O(1) table lookup that returns a static vector, then fuses that vector with the current hidden state through context-aware gating, after which the standard Attention + MoE computation proceeds.<\/p>\n<p>This matches your reading: Engram does not replace attention. It front-loads and takes over the part of the work that can be looked up, so attention can concentrate on global context and reasoning.<\/p>\n<hr \/>\n<h3>2. 
Key techniques: the four contributions in the paper that really hold up<\/h3>\n<h4>2.1 Tokenizer Compression: first merge token IDs by meaning<\/h4>\n<p>For lossless reconstruction, a standard tokenizer often assigns different IDs to tokens that differ on the surface but are semantically equivalent (e.g. \u201cApple\u201d vs. \u201c\u2423apple\u201d), which needlessly dilutes the N-gram key space. The paper applies a vocabulary projection \\(P: V \\to V'\\) (a surjection) that uses rules such as NFKC normalization and lowercasing to fold equivalent text into a canonical ID, raising semantic density. The paper reports a 23% reduction in effective vocabulary size for a 128k tokenizer.<\/p>\n<p>This step matters: without it, the N-gram memory would waste capacity on case, whitespace, and normalization variants, making both collisions and sparsity worse.<\/p>\n<h4>2.2 Sparse Retrieval: O(1) N-gram lookup with multi-head hashing<\/h4>\n<p>Building a table entry for every possible N-gram is infeasible, so Engram uses hashing:<\/p>\n<ul>\n<li>For each N-gram order n (the paper focuses on 2-grams and 3-grams by default), it uses K independent hash heads, each mapping into an embedding table whose size is a prime number (to reduce collisions).<\/li>\n<li>The hash is a lightweight multiplicative-XOR function; the vectors from every order n and every head are then concatenated into the memory vector \\(e_t\\).<\/li>\n<\/ul>\n<p>Your intuition that multiple heads act like a Bloom filter or an ensemble that reduces collisions is correct. The paper is more explicit, though: multi-head hashing with prime-sized tables is an engineering choice for controllably suppressing collisions and noise while keeping retrieval cost constant.<\/p>\n<h4>2.3 Context-aware gating: using the hidden state as the query to decide whether to trust the memory<\/h4>\n<p>The retrieved \\(e_t\\) is a context-independent prior and carries two kinds of noise: hash collisions and polysemy. The paper's approach:<\/p>\n<ul>\n<li>\\(h_t\\) (which has already aggregated global context through the preceding attention layers) serves as the query.<\/li>\n<li>\\(e_t\\) is projected into a key and a value: \\(k_t = W_K e_t,\\quad v_t = W_V e_t\\).<\/li>\n<li>RMSNorm stabilizes training, and the gate is \\(\\alpha_t = \\sigma\\left(\\frac{\\text{RMSNorm}(h_t)^\\top \\text{RMSNorm}(k_t)}{\\sqrt{d}}\\right)\\).<\/li>\n<li>The output is \\(\\tilde{v}_t = \\alpha_t \\cdot v_t\\); when the memory conflicts with the context, the gate is pushed close to 0.<\/li>\n<\/ul>\n<p>Note: the paper's gate is the standard sigmoid-of-dot-product form (with RMSNorm). It does not use the special \u201csign*sqrt(|x|)\u201d nonlinearity from your write-up (that part looks like your own or someone else's adaptation).<\/p>\n<h4>2.4 Short depthwise causal 
conv: giving point lookups a bit of local mixing<\/h4>\n<p>On the gated value sequence \\(\\tilde{V}\\), the paper adds a short depthwise causal Conv1D (kernel size 4, dilation equal to the maximum N-gram order) with SiLU activation and a residual connection:<br \/>\n\\(Y = \\text{SiLU}(\\text{Conv1D}(\\text{RMSNorm}(\\tilde{V}))) + \\tilde{V}\\).<\/p>\n<p>The paper also notes that the convolution contributes less than multi-branch integration, gating, or tokenizer compression: removing it in the ablations causes only a small degradation.<\/p>\n<hr \/>\n<h3>3. Pairing with MoE: the paper's most important finding is really a sparsity allocation law<\/h3>\n<p>The paper formulates Sparsity Allocation: under iso-parameter and iso-FLOPs constraints (the same total parameters and the same activated parameters per token), how should the budget of non-activated parameters \\(P_{\\text{sparse}}\\) be split between MoE experts and Engram memory?<\/p>\n<p>They define an allocation ratio \\(\\rho\\):<\/p>\n<ul>\n<li>\\(P_{\\text{MoE}}^{(\\text{sparse})} = \\rho\\, P_{\\text{sparse}}\\)<\/li>\n<li>\\(P_{\\text{Engram}} = (1-\\rho)\\, P_{\\text{sparse}}\\)<\/li>\n<\/ul>\n<p>Result: a consistent U-shaped curve emerges:<\/p>\n<ul>\n<li>Pure MoE (\\(\\rho = 1\\)) is not optimal.<\/li>\n<li>The optimum typically lies at \\(\\rho\\) of roughly 75%\u201380%, i.e. it is better to shift about 20%\u201325% of the sparse budget to Engram, and this optimum stays stable across compute budgets.<\/li>\n<\/ul>\n<p>A counterintuitive point: even with the MoE share cut to \\(\\rho \\approx 40\\%\\), performance still stays close to the pure-MoE baseline.<\/p>\n<p>This matters more than \u201cEngram = faster knowledge QA\u201d: it shows that Engram is not a minor patch but a first-class modeling primitive that can compete with MoE for the sparse parameter budget.<\/p>\n<hr \/>\n<h3>4. Empirical results: why it improves not only knowledge tasks but also reasoning, code, math, and long context<\/h3>\n<h4>4.1 Main 27B results: knowledge tasks improve, and the gains in reasoning, code, and math are more \u201csurprising\u201d<\/h4>\n<p>Both the abstract and the main text emphasize that, compared with an iso-parameter and iso-FLOPs MoE-27B, Engram-27B improves not only on knowledge benchmarks such as MMLU and CMMLU but also on BBH, ARC, DROP, HumanEval, GSM8K, MATH, and others.<\/p>\n<p>The paper offers two explanations:<\/p>\n<ul>\n<li>Relieving the early layers: with Engram, the backbone no longer needs to reconstruct static knowledge in its early layers, which effectively deepens the part of the network available for complex reasoning.<\/li>\n<li>Freeing attention capacity: handing local dependencies to the lookup table lets attention focus on global context, which strengthens long-context retrieval and reasoning.<\/li>\n<\/ul>\n<h4>4.2 
Long-context results (LongPPL \/ RULER): evidence that attention is freed up<\/h4>\n<p>On LongPPL and RULER (32k), Engram-27B clearly outperforms MoE-27B on several retrieval and tracking tasks, such as Multi-Query NIAH and Variable Tracking (the paper gives side-by-side examples).<\/p>\n<p>This part is the most relevant to your later inference-serving and KV cache system work: Engram's gains come not only from computing less but also from making attention more valuable.<\/p>\n<hr \/>\n<h3>5. Mechanism and systems: two strongly engineering-oriented points in the paper (worth emphasizing in your write-up)<\/h3>\n<h4>5.1 The systems payoff of deterministic addressing: prefetching and tiered caching<\/h4>\n<p>Engram's indices depend only on the input token sequence, unlike MoE, whose routing depends on runtime hidden states. This enables prefetch-and-overlap: the embeddings to fetch are known in advance, so they can be pulled asynchronously from host memory over PCIe while the computation of the preceding layers serves as a latency-hiding window, avoiding GPU stalls. At the same time, the choice of which layers host Engram must balance modeling benefit (earlier is better) against latency hiding (deeper is better).<\/p>\n<p>In addition, N-gram accesses follow a Zipf distribution, which enables a multi-level cache hierarchy: hot embeddings in HBM\/DRAM and the long tail on NVMe, yielding a very large memory with almost no increase in effective latency.<\/p>\n<h4>5.2 Measured CPU-offload overhead: &lt;3% throughput loss (H800 experiments)<\/h4>\n<p>The paper ran a \u201c+100B Engram (CPU offload)\u201d throughput test on H800:<\/p>\n<ul>\n<li>4B dense: 9031 tok\/s \u2192 8858 tok\/s<\/li>\n<li>8B dense: 6315 tok\/s \u2192 6140 tok\/s<\/li>\n<\/ul>\n<p>The throughput loss is very small (about 1.9% and 2.8%, respectively); the paper also notes that a more optimized, locality-aware implementation should come even closer to zero overhead. The abstract likewise states that offloading a 100B table incurs &lt;3% overhead.<\/p>\n<p>This moves your idea of KV cache systems like Mooncake from conjecture to a chain of evidence from the paper: a large table does not have to reside in HBM, as long as addressing is deterministic and prefetching hides the latency.<\/p>\n<hr \/>\n<h3>6. 
Upgrading your reading into a more rigorous, actionable plan for reworking an inference system (Mooncake-style)<\/h3>\n<p>The overall direction of your original text is that static knowledge and fixed collocations should neither occupy the KV cache nor be reconstructed by attention. I suggest splitting the plan more explicitly into three layers (model, runtime, data\/training):<\/p>\n<h4>6.1 Model side: do not treat Engram as \u201canother attention\u201d<\/h4>\n<p>In the paper, Engram is integrated as \u201cretrieval \u2192 gating \u2192 short convolution \u2192 residual\u201d, followed by attention + MoE.<br \/>\nSo if you integrate it into an inference engine, it is best to follow the paper's intent:<\/p>\n<ul>\n<li>Engram is a memory primitive that mainly covers local static patterns (2\/3-grams, entities, idioms, formulaic phrases, and so on); the paper's gating visualizations also show strong activation at exactly these positions.<\/li>\n<li>Attention remains responsible for global dependencies and long-range reasoning.<\/li>\n<\/ul>\n<p><strong>Engineering implication:<\/strong> you should not linearly mix Engram and attention with a single global learnable \u03b1 (that would blur the boundary between lookup and reasoning). Instead, keep it as an independent residual branch that the gate can switch off. (This is the structure used in the paper.)<\/p>\n<h4>6.2 Runtime side: make deterministic indexing a first-class citizen (prefetching, batching, tiered caching)<\/h4>\n<p>Following Section 2.5 of the paper, you can structure the inference runtime as follows:<\/p>\n<p>(A) Prefetch-and-overlap<\/p>\n<ul>\n<li>Before entering an Engram layer, precompute the hash IDs from the input token sequence (this can be done entirely on the CPU).<\/li>\n<li>Fetch the embedding rows from host memory asynchronously, using the computation of the preceding Transformer blocks as the latency-hiding window.<\/li>\n<\/ul>\n<p>(B) Multi-level caching (HBM\/DRAM\/NVMe)<\/p>\n<ul>\n<li>Exploit the Zipf distribution: keep hot keys resident in HBM (or pinned DRAM) and push the long tail to NVMe. The scheduler's goal is not to make everything fast, but to make the hot path very fast and the long tail acceptable.<\/li>\n<\/ul>\n<p>(C) In-batch deduplication and transfer compression<\/p>\n<ul>\n<li>Many N-gram hashes repeat within a batch (especially for hot patterns), so you can do \u201cunique IDs \u2192 gather \u2192 scatter\u201d to reduce PCIe\/network traffic.<\/li>\n<\/ul>\n<p>These are more direct systems levers than \u201creducing KV cache transfer\u201d: the KV cache grows per request and per step, whereas Engram memory is cacheable, reusable, and deterministically indexed.<\/p>\n<h4>6.3 Data\/training side: do not overlook tokenizer compression and early-layer insertion<\/h4>\n<p>The paper's ablations show that tokenizer compression, context-aware gating, and multi-branch integration are the most critical components.<br \/>\nIn addition, the choice of Engram insertion layers must balance the modeling preference for early insertion against the systems preference for deeper insertion (a larger latency-hiding window).<\/p>\n<p>Engineering suggestions (closer to the paper's own framing):<\/p>\n<ul>\n<li>Start by inserting Engram in only a few layers (the paper stresses that it is not used in every layer).<\/li>\n<li>Choose layers so the runtime can reliably hide PCIe latency (for example, place the first Engram layer where enough preceding computation is available as a window, rather than blocking on host data at layer 1).<\/li>\n<\/ul>\n<hr \/>\n<h3>7. 
Which points in your reading should be downgraded to speculation or pending verification<\/h3>\n<p>To avoid gaps that could be picked apart when you publish externally or go through a design review, the following claims from your original text should be rewritten as speculation or examples rather than presented as conclusions of the paper:<\/p>\n<ul>\n<li>Task-specific latency numbers (e.g. MedQA 480 ms \u2192 310 ms): the paper's abstract and main text report benchmark improvements and system throughput\/overhead results, not the end-to-end latencies you listed. You can cite the paper's \u201cCPU offload &lt;3% overhead\u201d as evidence instead.<\/li>\n<li>The special gate nonlinearity (sign*sqrt): the paper's gate is a dot product after RMSNorm followed by a sigmoid.<\/li>\n<li>\u201cEngram directly reduces KV cache requirements by 70%\u201d: the paper gives no such percentage. A more rigorous statement is that Engram separates local dependencies and static patterns from attention, indirectly freeing attention capacity, and that the memory can be offloaded to the host.<\/li>\n<\/ul>\n<hr \/>\n","protected":false},"excerpt":{"rendered":"<p>0. 
What problem is this paper actually trying to solve?<br \/>\nThe core tension: MoE uses conditional computation to scale up parameter count without a proportional increase in FLOPs, yet the Transformer has no native knowledge-lookup primitive. As a result, much of what is essentially static knowledge, fixed collocations, or local dependencies still has to be \u201ccomputed\u201d by attention + MLP across many layers, wasting depth and compute. The paper splits language modeling into two heterogeneous sub-tasks:<br \/>\nCompositional reasoning [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[6336,74,4119,50,224,51],"topic":[],"class_list":["post-60102","post","type-post","status-publish","format-standard","hentry","category-server","tag-engram","tag-agent","tag-rag","tag-50","tag-224","tag-51"]}