{"id":77558,"date":"2026-02-24T22:57:53","date_gmt":"2026-02-24T14:57:53","guid":{"rendered":"https:\/\/www.wsisp.com\/helps\/77558.html"},"modified":"2026-02-24T22:57:53","modified_gmt":"2026-02-24T14:57:53","slug":"%e8%ae%a1%e7%ae%97%e6%9c%ba%e8%a7%86%e8%a7%89cv%e9%a2%86%e5%9f%9f-swin-transformer","status":"publish","type":"post","link":"https:\/\/www.wsisp.com\/helps\/77558.html","title":{"rendered":"Computer Vision (CV): Swin Transformer"},"content":{"rendered":"<h4>1. Core Concepts of Swin Transformer<\/h4>\n<p>Swin Transformer (Shifted Window Transformer) is a Transformer variant designed for vision tasks. It addresses the blow-up in computation that the original Transformer suffers on high-resolution images; its core innovations are a hierarchical structure and the shifted-window attention mechanism.<\/p>\n<p>Core concepts:<\/p>\n<ul>\n<li>Hierarchical feature extraction: mimicking the stage-wise structure of a CNN, Patch Merging progressively shrinks the feature map and increases the channel count, so the network captures visual features at multiple scales.<\/li>\n<li>Window Attention: the feature map is split into non-overlapping windows and self-attention is computed only inside each window, reducing the complexity from <img decoding=\"async\" alt=\"O\\\\left ( HW \\\\right )^{2}\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc70090d0.png\" \/> to <img decoding=\"async\" alt=\"O\\\\left (\\\\left ( HW\/M^{2} \\\\right ) M^{2} \\\\right 
)&#061;O\\\\left ( HW \\\\right )\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc701da46.png\" \/>, where M is the window size; for a fixed M the cost grows linearly with the number of tokens HW instead of quadratically.<\/li>\n<li>Shifted Window Attention: addresses the isolation of information between windows. A cyclic shift produces a new window layout, and a mask suppresses attention between positions that the cyclic shift brought together from non-adjacent regions, so attention inside each window stays correct.<\/li>\n<\/ul>\n<h4>2. Mathematical Formulation of Swin Transformer<\/h4>\n<h5>(1) Patch Partition<\/h5>\n<p>The input image (H\u00d7W\u00d73) is split into non-overlapping 4\u00d74 patches, and each patch is flattened into a one-dimensional vector:<\/p>\n<p><img decoding=\"async\" alt=\"Patch(i,j)&#061;Image\\\\left [ 4i:4i&#043;4,4j:4j&#043;4,: \\\\right ]\\\\rightarrow R^{4\\\\times 4\\\\times 3}\\\\rightarrow R ^{48}\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc702d044.png\" \/><\/p>\n<p>The result is a feature map of shape (H\/4\u00d7W\/4)\u00d748, denoted <img decoding=\"async\" alt=\"H_{0},W_{0},C_{0}\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc7040f4c.png\" \/>.<\/p>\n<h5>(2) Window Attention<\/h5>\n<p>Self-attention is computed inside each window. The core formula is the same as standard self-attention, but its scope is restricted to the window:<\/p>\n<p><img decoding=\"async\" 
alt=\"Attention(Q,K,V)&#061;Softmax\\\\left ( \\\\frac{QK^{T}}{\\\\sqrt{d_{k}}}&#043;M \\\\right )V\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc704d32b.png\" \/><\/p>\n<ul>\n<li>Q, K, V: the query, key and value matrices, obtained by linear projections of the input features; d_k is the dimension of Q and K.<\/li>\n<li>M: the mask matrix (not to be confused with the window size M used elsewhere). It is applied only in the shifted-window case, where it blocks attention between positions that share a window after the shift but come from different regions of the feature map.<\/li>\n<\/ul>\n<h5>(3) Shifted Window Operation<\/h5>\n<p>Let the window size be M, the feature map size be H\u00d7W, and the shift be \u230aM\/2\u230b. The coordinates after the shift are:<\/p>\n<p><img decoding=\"async\" alt=\"\\\\left ( x^{&#039;},y^{&#039;} \\\\right )&#061; \\\\left ( x-\\\\left \\\\lfloor M\/2 \\\\right \\\\rfloor mod H, y-\\\\left \\\\lfloor M\/2 \\\\right \\\\rfloor mod W\\\\right )\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc705d0cb.png\" \/><\/p>\n<h5>(4) Patch Merging (Hierarchical Downsampling)<\/h5>\n<p>Each 2\u00d72 group of neighboring patches is concatenated (C \u2192 4C channels) and then linearly projected to 2C, so the channel count doubles while the spatial size halves:<\/p>\n<p><img decoding=\"async\" alt=\"Out\\\\left [ i,j,: \\\\right ]&#061;In\\\\left [ 2i:2i&#043;2,2j:2j&#043;2,: \\\\right ]\\\\rightarrow R^{4C}\\\\rightarrow R^{2C}\" class=\"mathcode\" src=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc706c71c.png\" \/><\/p>\n<p>The final feature map has shape 
H\/2\u00d7W\/2\u00d72C.<\/p>\n<h4>3. Example Code Walkthrough<\/h4>\n<h5>Module 1: Core Imports<\/h5>\n<p>import torch<br \/>\nimport torch.nn as nn<br \/>\nimport torch.nn.functional as F<\/p>\n<h5>Module 2: Window Attention<\/h5>\n<p>class WindowAttention(nn.Module):<br \/>\n    def __init__(self, dim, window_size, num_heads):<br \/>\n        \"\"\"<br \/>\n        Initialize the window attention module.<br \/>\n        Args:<br \/>\n            dim: number of input feature channels (e.g. 96, 192)<br \/>\n            window_size: window size M (e.g. 7 for a 7\u00d77 window)<br \/>\n            num_heads: number of attention heads; requires dim % num_heads == 0<br \/>\n        \"\"\"<br \/>\n        super().__init__()<br \/>\n        # Store the basic parameters<br \/>\n        self.dim = dim                      # input channels<br \/>\n        self.window_size = window_size      # window size M<br \/>\n        self.num_heads = num_heads          # number of attention heads<br \/>\n        self.head_dim = dim \/\/ num_heads    # dimension of each head<br \/>\n        self.scale = self.head_dim ** -0.5  # scaling factor 1\/\u221ad_k, keeps dot products from growing too large<\/p>\n<p>        # Linear layer that maps the input to Q, K and V in one pass (more efficient than three separate layers)<br \/>\n        # input dim \u2192 output 3*dim (dim each for Q\/K\/V)<br \/>\n        self.qkv = nn.Linear(dim, dim * 
3)<br \/>\n        # Output projection: maps the attended features back to the original dimension<br \/>\n        self.proj = nn.Linear(dim, dim)<\/p>\n<p>        # Relative position bias table: avoids the limitations of absolute position encoding and captures positional relations inside a window<br \/>\n        # Shape: (2M-1)\u00d7(2M-1) rows \u00d7 num_heads columns \u2192 covers every possible relative offset inside a window<br \/>\n        self.relative_position_bias_table = nn.Parameter(<br \/>\n            torch.zeros((2 * window_size - 1) * (2 * window_size - 1), num_heads)<br \/>\n        )<\/p>\n<p>        # ---------------- Precompute the relative position index ----------------<br \/>\n        # Coordinates along one axis of the window: [0, 1, ..., M-1]<br \/>\n        coords = torch.arange(self.window_size)<br \/>\n        # Build a 2\u00d7M\u00d7M coordinate grid: coords_grid[0] holds row coordinates, coords_grid[1] holds column coordinates<br \/>\n        coords_grid = torch.stack(torch.meshgrid([coords, coords], indexing=\"ij\"))<br \/>\n        # Flatten the coordinates: 2 \u00d7 M\u00b2 (the M\u00d7M grid unrolled to one dimension)<br \/>\n        coords_flatten = torch.flatten(coords_grid, 1)<br \/>\n        # Relative coordinates for every pair of positions: (2, M\u00b2, M\u00b2) \u2192 offset of each position from every other<br \/>\n        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]<br \/>\n        # Permute dims: (M\u00b2, 
M\u00b2, 2) \u2192 [position i, position j, row\/column offset]<br \/>\n        relative_coords = relative_coords.permute(1, 2, 0).contiguous()<br \/>\n        # Shift the relative coordinates from [-M+1, M-1] to [0, 2M-2] (avoids negative indices)<br \/>\n        relative_coords[:, :, 0] += self.window_size - 1  # row offset correction<br \/>\n        relative_coords[:, :, 1] += self.window_size - 1  # column offset correction<br \/>\n        # Encode the pair as row offset \u00d7 (2M-1) + column offset \u2192 a unique id for each relative position<br \/>\n        relative_coords[:, :, 0] *= 2 * self.window_size - 1<br \/>\n        # Sum to get the final relative position index: (M\u00b2, M\u00b2)<br \/>\n        relative_position_index = relative_coords.sum(-1)<br \/>\n        # Register as a buffer (not updated by gradients)<br \/>\n        self.register_buffer(\"relative_position_index\", relative_position_index)<\/p>\n<p>    def forward(self, x, mask=None):<br \/>\n        \"\"\"<br \/>\n        Forward pass of window attention.<br \/>\n        Args:<br \/>\n            x: input features, shape [num_windows*B, M\u00b2, dim]<br \/>\n               - num_windows: total number of windows the feature map is split into<br \/>\n               - B: batch size<br \/>\n               - M\u00b2: number of positions in one window<br \/>\n               - dim: number of channels<br \/>\n            mask: mask matrix (used only for shifted windows), shape [num_windows, M\u00b2, M\u00b2]<br \/>\n        Returns:<br \/>\n            
output: features after attention, shape [num_windows*B, M\u00b2, dim]<br \/>\n        \"\"\"<br \/>\n        # Input dimensions: B_ = num_windows*B, N = M\u00b2, C = dim<br \/>\n        B_, N, C = x.shape<\/p>\n<p>        # ---------------- Build Q\/K\/V ----------------<br \/>\n        # 1. Linear projection: [B_, N, C] \u2192 [B_, N, 3*C]<br \/>\n        # 2. Reshape: [B_, N, 3, num_heads, head_dim] \u2192 the axis of size 3 separates Q\/K\/V<br \/>\n        # 3. Permute: [3, B_, num_heads, N, head_dim]<br \/>\n        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)<br \/>\n        # Split into Q\/K\/V, each of shape [B_, num_heads, N, head_dim]<br \/>\n        q, k, v = qkv[0], qkv[1], qkv[2]<\/p>\n<p>        # ---------------- Attention scores ----------------<br \/>\n        q = q * self.scale  # scale Q, the 1\/\u221ad_k term in the formula<br \/>\n        # Q @ K^T: [B_, num_heads, N, head_dim] \u00d7 [B_, num_heads, head_dim, N] \u2192 [B_, num_heads, N, N]<br \/>\n        attn = (q @ k.transpose(-2, -1))<\/p>\n<p>        # ---------------- Add the relative position bias ----------------<br \/>\n        # Look up the bias for each position pair: [M\u00b2*M\u00b2, num_heads] \u2192 [M\u00b2, M\u00b2, num_heads]<br \/>\n        relative_position_bias = 
self.relative_position_bias_table[self.relative_position_index.view(-1)].view(<br \/>\n            self.window_size * self.window_size, self.window_size * self.window_size, -1<br \/>\n        )<br \/>\n        # Permute: [num_heads, M\u00b2, M\u00b2]<br \/>\n        relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous()<br \/>\n        # Add the bias: [B_, num_heads, N, N] + [1, num_heads, N, N] \u2192 broadcast add<br \/>\n        attn = attn + relative_position_bias.unsqueeze(0)<\/p>\n<p>        # ---------------- Apply the mask (shifted windows only) ----------------<br \/>\n        if mask is not None:<br \/>\n            nW = mask.shape[0]  # number of windows<br \/>\n            # 1. Reshape attn to [B_\/\/nW, nW, num_heads, N, N]<br \/>\n            # 2. Broadcast the mask: [nW, N, N] \u2192 [1, nW, 1, N, N]<br \/>\n            # 3. 
Add: the mask value (-100) is added to the scores of cross-region position pairs<br \/>\n            attn = attn.view(B_ \/\/ nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)<br \/>\n            # Restore the shape: [B_, num_heads, N, N]<br \/>\n            attn = attn.view(-1, self.num_heads, N, N)<br \/>\n            # Softmax: positions masked with -100 get a weight close to 0 and effectively drop out<br \/>\n            attn = F.softmax(attn, dim=-1)<br \/>\n        else:<br \/>\n            # No mask: apply softmax directly<br \/>\n            attn = F.softmax(attn, dim=-1)<\/p>\n<p>        # ---------------- Attention-weighted sum of V ----------------<br \/>\n        # attn [B_, num_heads, N, N] \u00d7 v [B_, num_heads, N, head_dim] \u2192 [B_, num_heads, N, head_dim]<br \/>\n        # Transpose to [B_, N, num_heads, head_dim] \u2192 merge the heads \u2192 [B_, N, dim]<br \/>\n        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)<\/p>\n<p>        # ---------------- Output projection ----------------<br \/>\n        x = self.proj(x)  # linear layer, shape unchanged<br \/>\n        return x<\/p>\n<p># ========================== Helper functions: window partition and reverse ==========================<br 
\/>\ndef window_partition(x, window_size):<br \/>\n    \"\"\"<br \/>\n    Split a feature map into non-overlapping windows.<br \/>\n    Args:<br \/>\n        x: input features, shape [B, H, W, C] (H and W must be multiples of window_size)<br \/>\n        window_size: window size M<br \/>\n    Returns:<br \/>\n        windows: windowed features, shape [num_windows*B, M, M, C]<br \/>\n                 num_windows = (H\/M) \u00d7 (W\/M)<br \/>\n    \"\"\"<br \/>\n    B, H, W, C = x.shape<br \/>\n    # Split dims: [B, H, W, C] \u2192 [B, H\/\/M, M, W\/\/M, M, C]<br \/>\n    x = x.view(B, H \/\/ window_size, window_size, W \/\/ window_size, window_size, C)<br \/>\n    # Permute to [B, H\/\/M, W\/\/M, M, M, C] \u2192 merge the first three dims \u2192 [num_windows*B, M, M, C]<br \/>\n    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)<br \/>\n    return windows<\/p>\n<p>def window_reverse(windows, window_size, H, W):<br \/>\n    \"\"\"<br \/>\n    Reassemble windowed features into a full feature map (inverse of window_partition).<br \/>\n    Args:<br \/>\n        windows: windowed features, shape [num_windows*B, M, M, C]<br \/>\n        window_size: window size M<br \/>\n        H, W: height and width of the original feature map<br \/>\n    Returns:<br \/>\n        x: reassembled feature map, shape [B, H, W, C]<br \/>\n    \"\"\"<br \/>\n    # Batch size: num_windows = (H\/\/M) * (W\/\/M) \u2192 B = total_windows \/\/ num_windows<br \/>\n    B = windows.shape[0] \/\/ ((H \/\/ window_size) * (W \/\/ window_size))<br \/>\n    # Split dims: [num_windows*B, M, M, C] \u2192 [B, H\/\/M, W\/\/M, M, M, C]<br 
\/>\n    x = windows.view(B, H \/\/ window_size, W \/\/ window_size, window_size, window_size, -1)  # -1 infers the channel count C<br \/>\n    # Permute to [B, H\/\/M, M, W\/\/M, M, C] \u2192 merge dims \u2192 [B, H, W, C]<br \/>\n    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)<br \/>\n    return x<\/p>\n<h5>Module 3: Swin Transformer Block<\/h5>\n<p>class SwinTransformerBlock(nn.Module):<br \/>\n    def __init__(self, dim, num_heads, window_size=7, shift_size=0):<br \/>\n        \"\"\"<br \/>\n        Basic Swin Transformer block (window attention or shifted-window attention).<br \/>\n        This simplified version contains only the attention sub-layer.<br \/>\n        Args:<br \/>\n            dim: number of input channels<br \/>\n            num_heads: number of attention heads<br \/>\n            window_size: window size M<br \/>\n            shift_size: shift amount (0 = regular windows, M\/\/2 = shifted windows)<br \/>\n        \"\"\"<br \/>\n        super().__init__()<br \/>\n        self.dim = dim<br \/>\n        self.num_heads = num_heads<br \/>\n        self.window_size = window_size<br \/>\n        self.shift_size = shift_size  # shift amount, the key parameter<\/p>\n<p>        # Layer normalization (standard in Transformers, applied before attention)<br \/>\n        self.norm1 = nn.LayerNorm(dim)<br \/>\n        # Instantiate the window attention module<br \/>\n        self.attn = WindowAttention(dim, window_size, num_heads)<\/p>\n<p>    def forward(self, x):<br \/>\n        \"\"\"<br \/>\n        Forward pass of the Swin block.<br \/>\n        Args:<br \/>\n            x: input features, shape [B, H, W, C]<br \/>\n        Returns:<br \/>\n            x: 
output features, shape [B, H, W, C] (after the residual connection)<br \/>\n        \"\"\"<br \/>\n        B, H, W, C = x.shape<br \/>\n        shortcut = x  # keep the input for the residual connection<\/p>\n<p>        # 1. Layer normalization<br \/>\n        x = self.norm1(x)<\/p>\n<p>        # ---------------- Shift (Shifted Window) ----------------<br \/>\n        if self.shift_size &gt; 0:<br \/>\n            # Cyclic shift: move left and up by shift_size positions (negative values shift left\/up)<br \/>\n            # e.g. M=7, shift_size=3 \u2192 3 to the left, 3 up<br \/>\n            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))<\/p>\n<p>            # ---------------- Build the mask for shifted windows ----------------<br \/>\n            # 1. Initialize the label map: [1, H, W, 1], used to mark the different source regions<br \/>\n            img_mask = torch.zeros((1, H, W, 1), device=x.device)<br \/>\n            # 2. 
Slices for the regions after the shift (3\u00d73 = 9 regions in total)<br \/>\n            h_slices = (slice(0, -self.window_size),          # top: all rows except the last M<br \/>\n                        slice(-self.window_size, -self.shift_size),  # middle transition band<br \/>\n                        slice(-self.shift_size, None))        # bottom: last shift_size rows<br \/>\n            w_slices = (slice(0, -self.window_size),          # left: all columns except the last M<br \/>\n                        slice(-self.window_size, -self.shift_size),  # middle transition band<br \/>\n                        slice(-self.shift_size, None))        # right: last shift_size columns<br \/>\n            # 3. Give each region a unique label (0-8)<br \/>\n            cnt = 0<br \/>\n            for h in h_slices:<br \/>\n                for w in w_slices:<br \/>\n                    img_mask[:, h, w, :] = cnt<br \/>\n                    cnt += 1<br \/>\n            # 4. Partition the label map into windows: [num_windows, M, M, 1]<br \/>\n            mask_windows = window_partition(img_mask, self.window_size)<br \/>\n            # 5. Flatten: [num_windows, M\u00b2]<br \/>\n            mask_windows = mask_windows.view(-1, self.window_size * self.window_size)<br \/>\n            # 6. 
Attention mask: check whether two positions belong to the same source region<br \/>\n            #    - same region: mask=0 \u2192 attention is computed normally<br \/>\n            #    - different regions: mask=-100 \u2192 weight is close to 0 after softmax, so the pair is effectively ignored<br \/>\n            attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)<br \/>\n            attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))<br \/>\n        else:<br \/>\n            # Regular windows: no shift, no mask<br \/>\n            shifted_x = x<br \/>\n            attn_mask = None<\/p>\n<p>        # ---------------- Window attention ----------------<br \/>\n        # 1. Partition into windows: [B, H, W, C] \u2192 [num_windows*B, M, M, C]<br \/>\n        x_windows = window_partition(shifted_x, self.window_size)<br \/>\n        # 2. Flatten each window: [num_windows*B, M\u00b2, C] (the input layout WindowAttention expects)<br \/>\n        x_windows = x_windows.view(-1, self.window_size * self.window_size, C)<br \/>\n        # 3. Window attention forward pass<br \/>\n        attn_windows = self.attn(x_windows, mask=attn_mask)<\/p>\n<p>        # ---------------- Merge windows back into a feature map ----------------<br \/>\n        # 1. 
Restore the window shape: [num_windows*B, M\u00b2, C] \u2192 [num_windows*B, M, M, C]<br \/>\n        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)<br \/>\n        # 2. Reassemble the feature map: [num_windows*B, M, M, C] \u2192 [B, H, W, C]<br \/>\n        shifted_x = window_reverse(attn_windows, self.window_size, H, W)<\/p>\n<p>        # ---------------- Reverse shift (restore original positions) ----------------<br \/>\n        if self.shift_size &gt; 0:<br \/>\n            # Move right and down by shift_size positions to undo the shift<br \/>\n            x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))<br \/>\n        else:<br \/>\n            x = shifted_x<\/p>\n<p>        # ---------------- Residual connection ----------------<br \/>\n        x = shortcut + x  # residual add, helps gradient flow<\/p>\n<p>        return x<\/p>\n<h5>Module 4: Patch Merging (Hierarchical Downsampling)<\/h5>\n<p>class PatchMerging(nn.Module):<br \/>\n    def __init__(self, dim):<br \/>\n        \"\"\"<br \/>\n        Patch Merging module: merges each 2\u00d72 group of neighboring patches to downsample.<br \/>\n        Args:<br \/>\n            dim: number of input channels<br \/>\n        \"\"\"<br \/>\n        super().__init__()<br \/>\n        self.dim = dim<br \/>\n        # Linear layer: reduces 4*dim channels to 2*dim (twice the input channel count after downsampling)<br 
\/>\n        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)<br \/>\n        # Layer normalization: applied before the linear layer for stability<br \/>\n        self.norm = nn.LayerNorm(4 * dim)<\/p>\n<p>    def forward(self, x):<br \/>\n        \"\"\"<br \/>\n        Forward pass of Patch Merging.<br \/>\n        Args:<br \/>\n            x: input features, shape [B, H, W, C]<br \/>\n        Returns:<br \/>\n            x: downsampled features, shape [B, H\/2, W\/2, 2C]<br \/>\n        \"\"\"<br \/>\n        B, H, W, C = x.shape<\/p>\n<p>        # ---------------- Merge 2\u00d72 patches ----------------<br \/>\n        # 1. Split dims: [B, H, W, C] \u2192 [B, H\/\/2, 2, W\/\/2, 2, C]<br \/>\n        x = x.view(B, H \/\/ 2, 2, W \/\/ 2, 2, C)<br \/>\n        # 2. 
Permute to [B, H\/\/2, W\/\/2, 2, 2, C] \u2192 merge the last three dims \u2192 [B, H\/\/2, W\/\/2, 4C]<br \/>\n        x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H \/\/ 2, W \/\/ 2, -1)<\/p>\n<p>        # ---------------- Normalize + reduce channels ----------------<br \/>\n        x = self.norm(x)       # layer normalization<br \/>\n        x = self.reduction(x)  # 4C \u2192 2C, twice the input channel count<\/p>\n<p>        return x<\/p>\n<h5>Module 5: Test Code<\/h5>\n<p>if __name__ == \"__main__\":<br \/>\n    # Simulated input: batch_size=2, 56\u00d756 feature map, 96 channels (the first-stage features of Swin-T)<br \/>\n    x = torch.randn(2, 56, 56, 96)<\/p>\n<p>    # 1. Test the regular window attention block (no shift)<br \/>\n    block1 = SwinTransformerBlock(dim=96, num_heads=8, window_size=7, shift_size=0)<br \/>\n    out1 = block1(x)<br \/>\n    print(\"Window attention output shape:\", out1.shape)  # expected: torch.Size([2, 56, 56, 96])<\/p>\n<p>    # 2. Test the shifted-window attention block (shift = 3)<br \/>\n    block2 = SwinTransformerBlock(dim=96, num_heads=8, window_size=7, shift_size=3)<br \/>\n    out2 = block2(x)<br \/>\n    print(\"Shifted-window attention output shape:\", out2.shape)  # expected: torch.Size([2, 56, 56, 96])<\/p>\n<p>    # 3. 
Test Patch Merging downsampling<br \/>\n    patch_merge = PatchMerging(dim=96)<br \/>\n    out3 = patch_merge(x)<br \/>\n    print(\"Patch Merging output shape:\", out3.shape)  # expected: torch.Size([2, 28, 28, 192])<\/p>\n<p>In short: computing attention per window cuts the computation, shifting the windows and masking lets information flow between windows, and merging patches builds hierarchical features, so the Transformer can process images efficiently while still learning useful features.<\/p>\n<h5>Output:<\/h5>\n<p>Window attention output shape: torch.Size([2, 56, 56, 96])<\/p>\n<p>Shifted-window attention output shape: torch.Size([2, 56, 56, 96])<\/p>\n<p>Patch Merging output shape: torch.Size([2, 28, 28, 192])<\/p>\n<ul>\n<li>Feed in a real image (for example a photo of a cat or a dog);<\/li>\n<li>The model outputs high-level features of the image (not pixels, but features that can express \"this is a cat, that is a dog\");<\/li>\n<li>Combined with a simple classification or detection head, these features support vision tasks such as image classification, object detection and semantic segmentation (for example recognizing what is in the image or locating objects).<\/li>\n<\/ul>\n<h4>4. Summary<\/h4>\n<ul>\n<li>Core innovation: Swin Transformer 
uses window attention to reduce computational complexity and shifted-window attention to overcome the isolation of information between windows, which makes it suitable for high-resolution vision tasks;<\/li>\n<li>Mathematical core: the in-window self-attention formula, in which the mask M is the key to shifted windows;<\/li>\n<li>Code core:\n<ul>\n<li>window partition and reverse are the basis of window attention;<\/li>\n<li>shift plus mask is the core of the Shifted Window implementation;<\/li>\n<li>Patch Merging performs hierarchical downsampling, mimicking the stage-wise features of a CNN.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>1. Core Concepts of Swin Transformer Swin Transformer (Shifted Window Transformer) is a Transformer variant designed for vision tasks. It addresses the blow-up in computation that the original Transformer suffers on high-resolution images; its core innovations are a hierarchical structure and the shifted-window attention mechanism.<br \/>\nCore concepts:<br \/>\nHierarchical feature extraction: mimicking the stage-wise structure of a CNN, Patch 
Mer<\/p>\n","protected":false},"author":2,"featured_media":77551,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[841,50,523],"topic":[],"class_list":["post-77558","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-server","tag-transformer","tag-50","tag-523"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.wsisp.com\/helps\/77558.html\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"og:description\" content=\"\u4e00\u3001Swin Transformer\u6838\u5fc3\u6982\u5ff5 Swin Transformer&#xff08;Shifted Window Transformer&#xff09;\u662f\u4e13\u4e3a\u89c6\u89c9\u4efb\u52a1\u8bbe\u8ba1\u7684 Transformer \u53d8\u4f53&#xff0c;\u89e3\u51b3\u4e86\u539f\u59cb Transformer \u5728\u5904\u7406\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u65f6\u8ba1\u7b97\u91cf\u7206\u70b8\u7684\u95ee\u9898&#xff0c;\u6838\u5fc3\u521b\u65b0\u662f\u5206\u5c42\u7ed3\u6784\u548c\u79fb\u4f4d\u7a97\u53e3\u6ce8\u610f\u529b\u673a\u5236\u3002 \u6838\u5fc3\u6982\u5ff5&#xff1a; \u5206\u5c42\u7279\u5f81\u63d0\u53d6&#xff1a;\u6a21\u4eff CNN \u7684\u5c42\u7ea7\u7ed3\u6784&#xff0c;\u901a\u8fc7 Patch Mer\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.wsisp.com\/helps\/77558.html\" \/>\n<meta property=\"og:site_name\" 
content=\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-24T14:57:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc70090d0.png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/77558.html\",\"url\":\"https:\/\/www.wsisp.com\/helps\/77558.html\",\"name\":\"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"isPartOf\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\"},\"datePublished\":\"2026-02-24T14:57:53+00:00\",\"dateModified\":\"2026-02-24T14:57:53+00:00\",\"author\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.wsisp.com\/helps\/77558.html#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.wsisp.com\/helps\/77558.html\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/77558.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/www.wsisp.com\/helps\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin 
Transformer\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#website\",\"url\":\"https:\/\/www.wsisp.com\/helps\/\",\"name\":\"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3\",\"description\":\"\u9999\u6e2f\u670d\u52a1\u5668_\u9999\u6e2f\u4e91\u670d\u52a1\u5668\u8d44\u8baf_\u670d\u52a1\u5668\u5e2e\u52a9\u6587\u6863_\u670d\u52a1\u5668\u6559\u7a0b\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"contentUrl\":\"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery\",\"caption\":\"admin\"},\"sameAs\":[\"http:\/\/wp.wsisp.com\"],\"url\":\"https:\/\/www.wsisp.com\/helps\/author\/admin\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.wsisp.com\/helps\/77558.html","og_locale":"zh_CN","og_type":"article","og_title":"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - \u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","og_description":"\u4e00\u3001Swin Transformer\u6838\u5fc3\u6982\u5ff5 Swin Transformer&#xff08;Shifted Window Transformer&#xff09;\u662f\u4e13\u4e3a\u89c6\u89c9\u4efb\u52a1\u8bbe\u8ba1\u7684 Transformer \u53d8\u4f53&#xff0c;\u89e3\u51b3\u4e86\u539f\u59cb Transformer \u5728\u5904\u7406\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u65f6\u8ba1\u7b97\u91cf\u7206\u70b8\u7684\u95ee\u9898&#xff0c;\u6838\u5fc3\u521b\u65b0\u662f\u5206\u5c42\u7ed3\u6784\u548c\u79fb\u4f4d\u7a97\u53e3\u6ce8\u610f\u529b\u673a\u5236\u3002 \u6838\u5fc3\u6982\u5ff5&#xff1a; \u5206\u5c42\u7279\u5f81\u63d0\u53d6&#xff1a;\u6a21\u4eff CNN \u7684\u5c42\u7ea7\u7ed3\u6784&#xff0c;\u901a\u8fc7 Patch Mer","og_url":"https:\/\/www.wsisp.com\/helps\/77558.html","og_site_name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","article_published_time":"2026-02-24T14:57:53+00:00","og_image":[{"url":"https:\/\/www.wsisp.com\/helps\/wp-content\/uploads\/2026\/02\/20260224145752-699dbc70090d0.png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"9 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.wsisp.com\/helps\/77558.html","url":"https:\/\/www.wsisp.com\/helps\/77558.html","name":"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer - 
\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","isPartOf":{"@id":"https:\/\/www.wsisp.com\/helps\/#website"},"datePublished":"2026-02-24T14:57:53+00:00","dateModified":"2026-02-24T14:57:53+00:00","author":{"@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41"},"breadcrumb":{"@id":"https:\/\/www.wsisp.com\/helps\/77558.html#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.wsisp.com\/helps\/77558.html"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.wsisp.com\/helps\/77558.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.wsisp.com\/helps"},{"@type":"ListItem","position":2,"name":"\u8ba1\u7b97\u673a\u89c6\u89c9CV\u9886\u57df\u2014\u2014\u2014\u2014Swin Transformer"}]},{"@type":"WebSite","@id":"https:\/\/www.wsisp.com\/helps\/#website","url":"https:\/\/www.wsisp.com\/helps\/","name":"\u7f51\u7855\u4e92\u8054\u5e2e\u52a9\u4e2d\u5fc3","description":"\u9999\u6e2f\u670d\u52a1\u5668_\u9999\u6e2f\u4e91\u670d\u52a1\u5668\u8d44\u8baf_\u670d\u52a1\u5668\u5e2e\u52a9\u6587\u6863_\u670d\u52a1\u5668\u6559\u7a0b","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.wsisp.com\/helps\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/358e386c577a3ab51c4493330a20ad41","name":"admin","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.wsisp.com\/helps\/#\/schema\/person\/image\/","url":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","contentUrl":"https:\/\/gravatar.wp-china-yes.net\/avatar\/?s=96&d=mystery","caption":"admin"},"sameAs":["http:\/\/wp.wsisp.com"],"url":"https:\/\/www.wsisp.com\/helps\/author\/admin"}]}},"_links":{"self":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/77558","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/comments?post=77558"}],"version-history":[{"count":0,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/posts\/77558\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media\/77551"}],"wp:attachment":[{"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/media?parent=77558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/categories?post=77558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/tags?post=77558"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.wsisp.com\/helps\/wp-json\/wp\/v2\/topic?post=77558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}