<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Lim's Technology</title>
    <link>https://c0mputermaster.tistory.com/</link>
    <description>
&amp;quot;Hello, I am Lim Seungtaek, a computer engineering student. Nice to meet you!&amp;quot;</description>
    <language>ko</language>
    <pubDate>Sun, 5 Apr 2026 19:52:43 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>Lim Seungtaek</managingEditor>
    <image>
      <title>Lim's Technology</title>
      <url>https://tistory1.daumcdn.net/tistory/6999043/attach/46248d177b654a388cdbdcbbed34eea4</url>
      <link>https://c0mputermaster.tistory.com</link>
    </image>
    <item>
      <title>A Summary of Generative Models (MLE &amp;middot; VAE &amp;middot; GAN &amp;middot; Diffusion &amp;middot; Language Model)</title>
      <link>https://c0mputermaster.tistory.com/129</link>
      <description>&lt;h2 data-end=&quot;177&quot; data-start=&quot;122&quot; data-ke-size=&quot;size26&quot;&gt;1. Generation vs. Discrimination &amp;amp; Why the VAE Matters&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;337&quot; data-start=&quot;179&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;248&quot; data-start=&quot;179&quot;&gt;&lt;b&gt;Generation&lt;/b&gt;: learn the data distribution itself and build a &lt;b&gt;model that draws new samples&lt;/b&gt; from it.&lt;/li&gt;
&lt;li data-end=&quot;337&quot; data-start=&quot;249&quot;&gt;&lt;b&gt;Discrimination&lt;/b&gt;: predict the &lt;b&gt;label/class &lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/b&gt; for a given input &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; (classification, detection, segmentation, etc.).
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;509&quot; data-start=&quot;439&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;509&quot; data-start=&quot;439&quot;&gt;&lt;b&gt;Variational Autoencoder (VAE)&lt;/b&gt;, i.e., the &lt;b&gt;variational approach&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;715&quot; data-start=&quot;348&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;715&quot; data-start=&quot;510&quot;&gt;Understanding the VAE, and with it &lt;b&gt;Maximum Likelihood Estimation (MLE)&lt;/b&gt;,&lt;br /&gt;&lt;b&gt;Variational Inference&lt;/b&gt;, &lt;b&gt;KL Divergence&lt;/b&gt;, the &lt;b&gt;ELBO&lt;/b&gt;, and the &lt;b&gt;Reparameterization Trick&lt;/b&gt;, is&lt;br /&gt;the key step toward generative, probabilistic, Bayesian, and diffusion models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://wikidocs.net/228770&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://wikidocs.net/228770&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1774108316362&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;01. 생성모델 (Generative Models) 이란?&quot; data-og-description=&quot;세상에 실제로 존재하는 객체는 확률 분포 $p(x)$로 나타낼 수 없습니다. 하지만 어떤 객체, 예를 들어 강아지의 모습이 담긴 학습 데이터셋 $D = [x_1, x_2, &amp;hellip;&quot; data-og-host=&quot;wikidocs.net&quot; data-og-source-url=&quot;https://wikidocs.net/228770&quot; data-og-url=&quot;https://wikidocs.net/228770&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/cSkerO/dJMb8TB8SbY/ztdKq1FJx20wo2p9u3XeA0/img.png?width=100&amp;amp;height=130&amp;amp;face=0_0_100_130,https://scrap.kakaocdn.net/dn/dJEVt0/dJMb8YpUHsK/qmiBO8L1MFFjK292Ir0VM0/img.png?width=1198&amp;amp;height=650&amp;amp;face=0_0_1198_650&quot;&gt;&lt;a href=&quot;https://wikidocs.net/228770&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://wikidocs.net/228770&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/cSkerO/dJMb8TB8SbY/ztdKq1FJx20wo2p9u3XeA0/img.png?width=100&amp;amp;height=130&amp;amp;face=0_0_100_130,https://scrap.kakaocdn.net/dn/dJEVt0/dJMb8YpUHsK/qmiBO8L1MFFjK292Ir0VM0/img.png?width=1198&amp;amp;height=650&amp;amp;face=0_0_1198_650');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;01. What Are Generative Models?&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Objects that actually exist in the world cannot be written down as a probability distribution $p(x)$. But given a training dataset of some object, say images of dogs, $D = [x_1, x_2, &amp;hellip;&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;wikidocs.net&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;755&quot; data-start=&quot;722&quot; data-ke-size=&quot;size26&quot;&gt;2. Maximum Likelihood Estimation (MLE) and 'Likelihood'&lt;/h2&gt;
&lt;h2 data-end=&quot;790&quot; data-start=&quot;757&quot; data-ke-size=&quot;size26&quot;&gt;2.1 Bayes' Rule: Observed Variables vs. Unknowns&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;951&quot; data-start=&quot;813&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;870&quot; data-start=&quot;813&quot;&gt;&lt;b&gt;What we can observe&lt;/b&gt;: the data &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;870&quot; data-start=&quot;843&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;870&quot; data-start=&quot;843&quot;&gt;Values already saved on disk, values I can directly see.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;951&quot; data-start=&quot;873&quot;&gt;&lt;b&gt;What we want to find (the unknowns, i.e., the parameters)&lt;/b&gt;: &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;951&quot; data-start=&quot;915&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;951&quot; data-start=&quot;915&quot;&gt;Parameters to be learned, such as a neural network's weights and biases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;961&quot; data-start=&quot;792&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;961&quot; data-start=&quot;953&quot;&gt;&lt;b&gt;Bayes' Rule:&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;289&quot; data-origin-height=&quot;107&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ZZSxE/dJMb99SrkOT/GlmkdS3eBlEvqpYCFyKXf0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ZZSxE/dJMb99SrkOT/GlmkdS3eBlEvqpYCFyKXf0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ZZSxE/dJMb99SrkOT/GlmkdS3eBlEvqpYCFyKXf0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZZSxE%2FdJMb99SrkOT%2FGlmkdS3eBlEvqpYCFyKXf0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;289&quot; height=&quot;107&quot; data-origin-width=&quot;289&quot; data-origin-height=&quot;107&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1320&quot; data-start=&quot;1028&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1320&quot; data-start=&quot;1028&quot;&gt;where:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;578&quot; data-origin-height=&quot;109&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dcU8G4/dJMcadN0NeR/wSnJmJowRW7pK6XfTKTlPk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dcU8G4/dJMcadN0NeR/wSnJmJowRW7pK6XfTKTlPk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dcU8G4/dJMcadN0NeR/wSnJmJowRW7pK6XfTKTlPk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdcU8G4%2FdJMcadN0NeR%2FwSnJmJowRW7pK6XfTKTlPk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;578&quot; height=&quot;109&quot; data-origin-width=&quot;578&quot; data-origin-height=&quot;109&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
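The pieces above (prior, likelihood, evidence, posterior) can be checked with a minimal sketch in Python; the two candidate parameters and all probability values below are hypothetical:

```python
# Bayes' rule on a toy discrete example:
# P(theta | x) = P(x | theta) * P(theta) / P(x).
# All numbers are made up for illustration.

prior = {"theta_A": 0.7, "theta_B": 0.3}        # P(theta)
likelihood = {"theta_A": 0.2, "theta_B": 0.9}   # P(x | theta) for one observed x

# Evidence P(x): marginalize the joint over all candidate theta values.
evidence = sum(prior[t] * likelihood[t] for t in prior)

# Posterior P(theta | x) for each candidate theta.
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}

print(posterior)  # the posteriors sum to 1
```

Even though theta_A has the higher prior, the data favors theta_B strongly enough that its posterior ends up larger, which is exactly the likelihood term doing its job.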
&lt;h2 data-end=&quot;1360&quot; data-start=&quot;1322&quot; data-ke-size=&quot;size26&quot;&gt;2.2 The Generative View: Data Are Sampled from &amp;lsquo;Some Unknown Distribution&amp;rsquo;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1582&quot; data-start=&quot;1362&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1457&quot; data-start=&quot;1362&quot;&gt;Assume that the data &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; we observe in the real world&lt;br /&gt;were &lt;b&gt;sampled from a true distribution &lt;span&gt;&lt;span&gt;p(x)&lt;/span&gt;&lt;/span&gt; living in some unknown world&lt;/b&gt; and then stored on disk.&lt;/li&gt;
&lt;li data-end=&quot;1582&quot; data-start=&quot;1458&quot;&gt;We cannot know the &lt;b&gt;true distribution &lt;span&gt;&lt;span&gt;p(x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;, but
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1582&quot; data-start=&quot;1496&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1545&quot; data-start=&quot;1496&quot;&gt;we can build a model &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; with some parameters &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li data-end=&quot;1582&quot; data-start=&quot;1548&quot;&gt;and make it mimic the true distribution &lt;span&gt;&lt;span&gt;p(x)&lt;/span&gt;&lt;/span&gt; &lt;b&gt;as closely as possible.&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;a href=&quot;https://angeloyeo.github.io/2020/07/17/MLE.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://angeloyeo.github.io/2020/07/17/MLE.html&lt;/a&gt;&lt;/b&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1773554065041&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;최대우도법(MLE) - 공돌이의 수학정리노트 (Angelo's Math Notes)&quot; data-og-description=&quot;&quot; data-og-host=&quot;angeloyeo.github.io&quot; data-og-source-url=&quot;https://angeloyeo.github.io/2020/07/17/MLE.html&quot; data-og-url=&quot;https://angeloyeo.github.io/2020/07/17/MLE.html&quot; data-og-image=&quot;&quot;&gt;&lt;a href=&quot;https://angeloyeo.github.io/2020/07/17/MLE.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://angeloyeo.github.io/2020/07/17/MLE.html&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url();&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Maximum Likelihood Estimation (MLE) - Angelo's Math Notes&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;angeloyeo.github.io&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;1623&quot; data-start=&quot;1584&quot; data-ke-size=&quot;size26&quot;&gt;2.3 The Goal of MLE: Finding the Most &amp;lsquo;Plausible&amp;rsquo; &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;630&quot; data-origin-height=&quot;349&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xoEsP/dJMcacaycsj/TI56M1ucXfyjCAhNqe8H6k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xoEsP/dJMcacaycsj/TI56M1ucXfyjCAhNqe8H6k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xoEsP/dJMcacaycsj/TI56M1ucXfyjCAhNqe8H6k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxoEsP%2FdJMcacaycsj%2FTI56M1ucXfyjCAhNqe8H6k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;630&quot; height=&quot;349&quot; data-origin-width=&quot;630&quot; data-origin-height=&quot;349&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2053&quot; data-start=&quot;2041&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2172&quot; data-start=&quot;2054&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2172&quot; data-start=&quot;2054&quot;&gt;&lt;b&gt;Likelihood = &amp;ldquo;degree of plausibility&amp;rdquo;&lt;/b&gt;&lt;br /&gt;&amp;rarr; &amp;ldquo;Under this &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;, does the data I actually have look likely to occur?&amp;rdquo;&lt;br /&gt;&amp;rarr; MLE finds the &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt; that maximizes that &amp;lsquo;degree&amp;rsquo;.&lt;/li&gt;
&lt;/ul&gt;
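The "most plausible θ" search can be sketched numerically. Assuming a Gaussian model with σ fixed at 1 and a made-up sample, a simple grid search over μ recovers the sample mean, which is the standard Gaussian MLE result:

```python
import math

# Fixed observed data (a hypothetical sample).
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def log_likelihood(mu, xs, sigma=1.0):
    """Sum of log N(x; mu, sigma^2) over the sample: log p(D | theta)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

# Scan candidate mu values; MLE picks the one that makes the data most plausible.
candidates = [i / 100 for i in range(0, 301)]
mle_mu = max(candidates, key=lambda mu: log_likelihood(mu, data))

print(mle_mu)                  # lands on the sample mean
print(sum(data) / len(data))   # 1.1
```

In practice the grid search is replaced by gradient ascent on the log-likelihood, but the objective being climbed is the same.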
&lt;h2 data-end=&quot;2209&quot; data-start=&quot;2174&quot; data-ke-size=&quot;size26&quot;&gt;2.4 Likelihood Is a &amp;ldquo;Function&amp;rdquo;, Not a &amp;ldquo;Probability Distribution&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;337&quot; data-origin-height=&quot;66&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zS4OH/dJMcachjLwV/ZGODCx8fQL4pNxtLVNMF8k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zS4OH/dJMcachjLwV/ZGODCx8fQL4pNxtLVNMF8k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zS4OH/dJMcachjLwV/ZGODCx8fQL4pNxtLVNMF8k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzS4OH%2FdJMcachjLwV%2FZGODCx8fQL4pNxtLVNMF8k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;337&quot; height=&quot;66&quot; data-origin-width=&quot;337&quot; data-origin-height=&quot;66&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2751&quot; data-start=&quot;2211&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2405&quot; data-start=&quot;2339&quot;&gt;Ordinarily, &lt;span&gt;&lt;span&gt;&amp;mu;,&amp;sigma;&lt;/span&gt;&lt;/span&gt; are &lt;b&gt;given&lt;/b&gt; and &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; varies &amp;rarr; a &lt;b&gt;probability distribution function&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;2508&quot; data-start=&quot;2406&quot;&gt;In MLE, however, the situation is &lt;b&gt;reversed&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2508&quot; data-start=&quot;2434&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2458&quot; data-start=&quot;2434&quot;&gt;we know the observed &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; values, and&lt;/li&gt;
&lt;li data-end=&quot;2508&quot; data-start=&quot;2461&quot;&gt;&lt;b&gt;we want to find the unknowns &lt;span&gt;&lt;span&gt;&amp;mu;,&amp;sigma;&lt;/span&gt;&lt;/span&gt; (the &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;).&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2613&quot; data-start=&quot;2509&quot;&gt;That is:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2613&quot; data-start=&quot;2516&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2576&quot; data-start=&quot;2516&quot;&gt;&lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; is fixed and &lt;span&gt;&lt;span&gt;&amp;mu;,&amp;sigma;&lt;/span&gt;&lt;/span&gt; are the variables &amp;rarr; this is the &lt;b&gt;likelihood function&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;2613&quot; data-start=&quot;2579&quot;&gt;Its value need not equal 1 and can even exceed 1.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2751&quot; data-start=&quot;2614&quot;&gt;Therefore:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2751&quot; data-start=&quot;2623&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2690&quot; data-start=&quot;2623&quot;&gt;&lt;b&gt;We speak of a likelihood &amp;lsquo;function&amp;rsquo;, not a likelihood &amp;lsquo;probability&amp;rsquo;.&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2751&quot; data-start=&quot;2693&quot;&gt;It measures &amp;ldquo;how plausible the parameters &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt; are, given the data I have.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
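The claim that a likelihood value need not be 1 and can exceed 1 is easy to verify: a Gaussian density with a small σ evaluates above 1 at its mode, so L(μ, σ; x) is not itself a probability (the numbers below are made up):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; a density value, not a probability."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

x_observed = 0.5  # in the likelihood view, the data is fixed

# Treat (mu, sigma) as the variables: this is the likelihood L(mu, sigma; x).
print(gaussian_pdf(x_observed, mu=0.5, sigma=0.1))  # about 3.99, well above 1
```

What integrates to 1 is the density over x for fixed parameters; sliding the parameters around with x fixed gives a curve with no such normalization constraint.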
&lt;h2 data-end=&quot;2786&quot; data-start=&quot;2758&quot; data-ke-size=&quot;size26&quot;&gt;3. Generative Modeling = MLE on the Data Distribution&lt;/h2&gt;
&lt;h2 data-end=&quot;2805&quot; data-start=&quot;2788&quot; data-ke-size=&quot;size26&quot;&gt;3.1 The Objective of the Generative Problem&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2995&quot; data-start=&quot;2807&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2877&quot; data-start=&quot;2807&quot;&gt;The core objective of generative modeling:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;766&quot; data-origin-height=&quot;244&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bnSg2G/dJMcaaR6BgG/p4lccmKB0bKDFl8NbvtYK0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bnSg2G/dJMcaaR6BgG/p4lccmKB0bKDFl8NbvtYK0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bnSg2G/dJMcaaR6BgG/p4lccmKB0bKDFl8NbvtYK0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbnSg2G%2FdJMcaaR6BgG%2Fp4lccmKB0bKDFl8NbvtYK0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;766&quot; height=&quot;244&quot; data-origin-width=&quot;766&quot; data-origin-height=&quot;244&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2995&quot; data-start=&quot;2807&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2995&quot; data-start=&quot;2878&quot;&gt;Depending on what &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; is:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2995&quot; data-start=&quot;2914&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2934&quot; data-start=&quot;2914&quot;&gt;discriminative problems (classification, segmentation)&lt;/li&gt;
&lt;li data-end=&quot;2956&quot; data-start=&quot;2937&quot;&gt;and image generation, text generation, etc.&lt;/li&gt;
&lt;li data-end=&quot;2995&quot; data-start=&quot;2959&quot;&gt;can all be explained within &lt;b&gt;&amp;ldquo;the same framework from the MLE point of view.&amp;rdquo;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;3024&quot; data-start=&quot;2997&quot; data-ke-size=&quot;size26&quot;&gt;3.2 The Link to Discriminative Tasks (Classification)&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;721&quot; data-origin-height=&quot;247&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cmoGPG/dJMcagY3eSS/SfWoNyjkN5SCCY3UkqWWjK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cmoGPG/dJMcagY3eSS/SfWoNyjkN5SCCY3UkqWWjK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cmoGPG/dJMcagY3eSS/SfWoNyjkN5SCCY3UkqWWjK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcmoGPG%2FdJMcagY3eSS%2FSfWoNyjkN5SCCY3UkqWWjK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;721&quot; height=&quot;247&quot; data-origin-width=&quot;721&quot; data-origin-height=&quot;247&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3300&quot; data-start=&quot;3026&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3257&quot; data-start=&quot;3026&quot;&gt;The classification problem:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3257&quot; data-start=&quot;3037&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3112&quot; data-start=&quot;3037&quot;&gt;maximize the log of &lt;span&gt;&lt;span&gt;p&amp;theta;(y∣x)&lt;/span&gt;&lt;/span&gt;, the probability that the correct label &lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt; appears for input &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;.&amp;nbsp;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3257&quot; data-start=&quot;3037&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3257&quot; data-start=&quot;3115&quot;&gt;Here, modeling &lt;b&gt;&lt;span&gt;&lt;span&gt;p&amp;theta;(y∣x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt; as:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3257&quot; data-start=&quot;3149&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3198&quot; data-start=&quot;3149&quot;&gt;a Bernoulli distribution &amp;rarr; binary classification (Binary Cross-Entropy)&lt;/li&gt;
&lt;li data-end=&quot;3257&quot; data-start=&quot;3203&quot;&gt;a Multinomial distribution &amp;rarr; multi-class classification (Softmax + Cross-Entropy)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;3300&quot; data-start=&quot;3258&quot;&gt;In other words,
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3300&quot; data-start=&quot;3265&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3300&quot; data-start=&quot;3265&quot;&gt;&lt;b&gt;classification, too, is a special case of &amp;ldquo;log-likelihood maximization.&amp;rdquo;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
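The link to MLE can be made concrete: binary cross-entropy is exactly the average negative log-likelihood of a Bernoulli model pθ(y∣x). A minimal sketch with hypothetical labels and predictions, no framework assumed:

```python
import math

def bernoulli_nll(y_true, p_pred):
    """Average negative log-likelihood of a Bernoulli model.

    This is identical to the binary cross-entropy loss.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 1]            # observed y
probs = [0.9, 0.2, 0.8, 0.7]     # model outputs p_theta(y=1 | x), made up here

# Minimizing this loss is the same as maximizing the Bernoulli log-likelihood.
print(bernoulli_nll(labels, probs))
```

The multi-class case works the same way: softmax cross-entropy is the negative log-likelihood of a Multinomial model over the classes.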
&lt;h2 data-end=&quot;3315&quot; data-start=&quot;3302&quot; data-ke-size=&quot;size26&quot;&gt;3.3 The Generative Task&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3558&quot; data-start=&quot;3317&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3396&quot; data-start=&quot;3317&quot;&gt;In generation:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3396&quot; data-start=&quot;3328&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3345&quot; data-start=&quot;3328&quot;&gt;not the class &lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt;, but&lt;/li&gt;
&lt;li data-end=&quot;3396&quot; data-start=&quot;3348&quot;&gt;the &lt;b&gt;distribution &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; of the data &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; itself&lt;/b&gt;&amp;nbsp;is what we want to learn.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;3459&quot; data-start=&quot;3397&quot;&gt;So the objective of a generative model is the same: &lt;b&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;max&amp;theta; log p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;3558&quot; data-start=&quot;3460&quot;&gt;But depending on how this &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; is implemented, we get:&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;221&quot; data-start=&quot;178&quot; data-section-id=&quot;1el6gps&quot;&gt;&lt;b&gt;명시적 생성 모델 (Explicit Generative Model)&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;265&quot; data-start=&quot;222&quot; data-section-id=&quot;4xpctd&quot;&gt;&lt;b&gt;암묵적 생성 모델 (Implicit Generative Model)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1433&quot; data-origin-height=&quot;717&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btAknp/dJMcaio8UWD/YiGRRagD3q33Ed9jUV5AIK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btAknp/dJMcaio8UWD/YiGRRagD3q33Ed9jUV5AIK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btAknp/dJMcaio8UWD/YiGRRagD3q33Ed9jUV5AIK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbtAknp%2FdJMcaio8UWD%2FYiGRRagD3q33Ed9jUV5AIK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;628&quot; height=&quot;314&quot; data-origin-width=&quot;1433&quot; data-origin-height=&quot;717&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;a href=&quot;https://minsuksung-ai.tistory.com/12&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://minsuksung-ai.tistory.com/12&lt;/a&gt;&lt;/b&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1774108401089&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;생성모델(Generative model)이란 무엇일까?&quot; data-og-description=&quot;내일이 기말고사라서 간단하게 강의 정리도 해야해서, 오늘은 비지도학습(Unsupervised learning) 중에서 클러스터링(Clustering)과 함께 가장 대표적인 예시 중 하나인 생성모델(Generative model)에 관련해&quot; data-og-host=&quot;minsuksung-ai.tistory.com&quot; data-og-source-url=&quot;https://minsuksung-ai.tistory.com/12&quot; data-og-url=&quot;https://minsuksung-ai.tistory.com/12&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/bL88Jy/dJMb81fRSX2/HRaSVXcv6LdogqSvvFKDAk/img.png?width=800&amp;amp;height=403&amp;amp;face=0_0_800_403,https://scrap.kakaocdn.net/dn/btIhRc/dJMb87NVGWF/dY7UG5XTVA94u0nlkyVahK/img.png?width=800&amp;amp;height=403&amp;amp;face=0_0_800_403,https://scrap.kakaocdn.net/dn/mDJkG/dJMb85WSAs2/r1o9wTps4caKLLqMTfwskk/img.png?width=986&amp;amp;height=698&amp;amp;face=0_0_986_698&quot;&gt;&lt;a href=&quot;https://minsuksung-ai.tistory.com/12&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://minsuksung-ai.tistory.com/12&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/bL88Jy/dJMb81fRSX2/HRaSVXcv6LdogqSvvFKDAk/img.png?width=800&amp;amp;height=403&amp;amp;face=0_0_800_403,https://scrap.kakaocdn.net/dn/btIhRc/dJMb87NVGWF/dY7UG5XTVA94u0nlkyVahK/img.png?width=800&amp;amp;height=403&amp;amp;face=0_0_800_403,https://scrap.kakaocdn.net/dn/mDJkG/dJMb85WSAs2/r1o9wTps4caKLLqMTfwskk/img.png?width=986&amp;amp;height=698&amp;amp;face=0_0_986_698');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;What is a generative model?&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;My final exam is tomorrow and I need to do a quick lecture summary, so today I will cover generative models, one of the most representative examples of unsupervised learning alongside clustering&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;minsuksung-ai.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;3606&quot; data-start=&quot;3565&quot; data-ke-size=&quot;size26&quot;&gt;4. Explicit vs. Implicit Generative Models&lt;/h2&gt;
&lt;h2 data-end=&quot;3624&quot; data-start=&quot;3608&quot; data-ke-size=&quot;size26&quot;&gt;4.1 Explicit Generative Models&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4258&quot; data-start=&quot;3626&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3721&quot; data-start=&quot;3626&quot;&gt;A model that defines the probability distribution &lt;b&gt;&lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt; explicitly, as a formula&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;407&quot; data-start=&quot;365&quot; data-ke-size=&quot;size16&quot;&gt;That is, the model is &lt;b&gt;designed so that the likelihood of the data can be computed&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;414&quot; data-start=&quot;409&quot; data-ke-size=&quot;size16&quot;&gt;Example distributions:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;464&quot; data-start=&quot;415&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;425&quot; data-start=&quot;415&quot; data-section-id=&quot;9124yl&quot;&gt;Gaussian&lt;/li&gt;
&lt;li data-end=&quot;437&quot; data-start=&quot;426&quot; data-section-id=&quot;o0kcio&quot;&gt;Bernoulli&lt;/li&gt;
&lt;li data-end=&quot;451&quot; data-start=&quot;438&quot; data-section-id=&quot;rn7rt5&quot;&gt;Multinomial&lt;/li&gt;
&lt;li data-end=&quot;464&quot; data-start=&quot;452&quot; data-section-id=&quot;omlehv&quot;&gt;parts of the VAE&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4258&quot; data-start=&quot;3626&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4258&quot; data-start=&quot;3722&quot;&gt;Two subfamilies here:&lt;/li&gt;
&lt;li data-end=&quot;4258&quot; data-start=&quot;3722&quot;&gt;&lt;b&gt; (1) Tractable Explicit Models (exact computation possible)&lt;/b&gt; &lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3995&quot; data-start=&quot;3767&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3995&quot; data-start=&quot;3824&quot;&gt;models that can directly compute the &lt;b&gt;exact probability&lt;/b&gt; &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; of the data&amp;nbsp;&lt;/li&gt;
&lt;li data-end=&quot;3995&quot; data-start=&quot;3824&quot;&gt;Representative example: &lt;b&gt;Autoregressive Models&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;316&quot; data-origin-height=&quot;89&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/NG2ig/dJMcacI9zxy/8WZ6TW9WwhlwOPpiV3QZBk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/NG2ig/dJMcacI9zxy/8WZ6TW9WwhlwOPpiV3QZBk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/NG2ig/dJMcacI9zxy/8WZ6TW9WwhlwOPpiV3QZBk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FNG2ig%2FdJMcacI9zxy%2F8WZ6TW9WwhlwOPpiV3QZBk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;316&quot; height=&quot;89&quot; data-origin-width=&quot;316&quot; data-origin-height=&quot;89&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;761&quot; data-start=&quot;749&quot; data-section-id=&quot;3t1j2k&quot;&gt;The probability of a whole sentence&lt;/li&gt;
&lt;li data-end=&quot;788&quot; data-start=&quot;762&quot; data-section-id=&quot;1smrzop&quot;&gt;is decomposed into &lt;b&gt;a product of per-token conditional probabilities&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;319&quot; data-origin-height=&quot;52&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mhkmS/dJMcaiWQmQA/L4ZBG8jGCk69ziPrtuthoK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mhkmS/dJMcaiWQmQA/L4ZBG8jGCk69ziPrtuthoK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mhkmS/dJMcaiWQmQA/L4ZBG8jGCk69ziPrtuthoK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmhkmS%2FdJMcaiWQmQA%2FL4ZBG8jGCk69ziPrtuthoK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;319&quot; height=&quot;52&quot; data-origin-width=&quot;319&quot; data-origin-height=&quot;52&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;894&quot; data-start=&quot;854&quot; data-ke-size=&quot;size16&quot;&gt;The probability at each step can be trained directly with &lt;b&gt;Cross Entropy&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;901&quot; data-start=&quot;896&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Representative models&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;952&quot; data-start=&quot;902&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;935&quot; data-start=&quot;902&quot; data-section-id=&quot;ue9jny&quot;&gt;Language Model (Transformer LM)&lt;/li&gt;
&lt;li data-end=&quot;941&quot; data-start=&quot;936&quot; data-section-id=&quot;1o4hez&quot;&gt;GPT&lt;/li&gt;
&lt;li data-end=&quot;952&quot; data-start=&quot;942&quot; data-section-id=&quot;py34tf&quot;&gt;PixelCNN&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;956&quot; data-start=&quot;954&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Characteristics&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1019&quot; data-start=&quot;958&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;983&quot; data-start=&quot;958&quot; data-section-id=&quot;b2ja63&quot;&gt;The likelihood can be &lt;b&gt;computed exactly&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;992&quot; data-start=&quot;984&quot; data-section-id=&quot;b69uuw&quot;&gt;Stable training&lt;/li&gt;
&lt;li data-end=&quot;1019&quot; data-start=&quot;993&quot; data-section-id=&quot;83ljpx&quot;&gt;Sample generation is &lt;b&gt;sequential, so it can be slow&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
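As a sketch of the chain-rule factorization above: the sequence log-likelihood is just the sum of per-token conditional log-probabilities, and the training loss is its negative mean, i.e. the per-token cross entropy. The tiny conditional table below is a made-up stand-in for what a neural LM would actually predict.

```python
import math

# Toy autoregressive model over a 3-token vocabulary {0, 1, 2}.
# Each row of the table is p(x_t | x_{t-1}) and sums to 1.
# (Illustrative only; a real LM produces these rows with a neural network.)
cond = {
    None: [0.5, 0.3, 0.2],   # p(x_1): distribution of the first token
    0:    [0.1, 0.6, 0.3],
    1:    [0.3, 0.3, 0.4],
    2:    [0.2, 0.5, 0.3],
}

def sequence_log_prob(tokens):
    """log p(x) = sum_t log p(x_t | x_<t) -- the chain-rule factorization."""
    logp, prev = 0.0, None
    for t in tokens:
        logp += math.log(cond[prev][t])
        prev = t
    return logp

tokens = [0, 1, 2]
logp = sequence_log_prob(tokens)          # log(0.5) + log(0.6) + log(0.4)
# Training minimizes the per-token cross entropy, i.e. -log p(x) / len(x).
cross_entropy = -logp / len(tokens)
```

Because every factor is an explicit, normalized distribution, the likelihood is exact; the price is that sampling must also proceed one token at a time.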
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;(2) Intractable Explicit Model (cannot be computed exactly &amp;rarr; approximation needed) &lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1077&quot; data-ke-size=&quot;size16&quot;&gt;Here,&amp;nbsp;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;span&gt; is &lt;b&gt;defined by a formula&lt;/b&gt;, but actually evaluating it is &lt;b&gt;impossible because of the integral&lt;/b&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;244&quot; data-origin-height=&quot;67&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bUgBrz/dJMcaflyFjl/6TX2BptKK9uqxbUlqyzaF1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bUgBrz/dJMcaflyFjl/6TX2BptKK9uqxbUlqyzaF1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bUgBrz/dJMcaflyFjl/6TX2BptKK9uqxbUlqyzaF1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbUgBrz%2FdJMcaflyFjl%2F6TX2BptKK9uqxbUlqyzaF1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;244&quot; height=&quot;67&quot; data-origin-width=&quot;244&quot; data-origin-height=&quot;67&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1077&quot; data-ke-size=&quot;size16&quot;&gt;This integral is &lt;b&gt;high-dimensional, so it cannot be computed&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1077&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1266&quot; data-start=&quot;1250&quot; data-ke-size=&quot;size16&quot;&gt;So the following methods are used instead.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1329&quot; data-start=&quot;1268&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1301&quot; data-start=&quot;1268&quot; data-section-id=&quot;1e3eovj&quot;&gt;&lt;b&gt;ELBO (Evidence Lower Bound)&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1329&quot; data-start=&quot;1302&quot; data-section-id=&quot;1il6jn3&quot;&gt;&lt;b&gt;Variational Inference&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1332&quot; data-start=&quot;1331&quot; data-ke-size=&quot;size16&quot;&gt;That is,&lt;u&gt; &lt;span style=&quot;letter-spacing: 0px;&quot;&gt;instead of computing the likelihood directly, &lt;/span&gt;&lt;b&gt;a lower bound&lt;/b&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt; is maximized&lt;/span&gt;&lt;/u&gt;&lt;/p&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1077&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1391&quot; data-start=&quot;1386&quot; data-ke-size=&quot;size16&quot;&gt;Representative models&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1453&quot; data-start=&quot;1393&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1424&quot; data-start=&quot;1393&quot; data-section-id=&quot;v07yd8&quot;&gt;VAE (Variational Autoencoder)&lt;/li&gt;
&lt;li data-end=&quot;1453&quot; data-start=&quot;1425&quot; data-section-id=&quot;1v565t1&quot;&gt;Diffusion Model (in its variational interpretation)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1457&quot; data-start=&quot;1455&quot; data-ke-size=&quot;size16&quot;&gt;Characteristics&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1515&quot; data-start=&quot;1459&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1470&quot; data-start=&quot;1459&quot; data-section-id=&quot;19ooq39&quot;&gt;A probability model exists&lt;/li&gt;
&lt;li data-end=&quot;1500&quot; data-start=&quot;1471&quot; data-section-id=&quot;yt3s4y&quot;&gt;but the likelihood &lt;b&gt;cannot be computed directly&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1515&quot; data-start=&quot;1501&quot; data-section-id=&quot;1kmjsus&quot;&gt;instead, it is trained by &lt;b&gt;approximation&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
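To see why the marginal-likelihood integral is the bottleneck, here is a minimal Monte Carlo sketch. The model is deliberately chosen so the exact answer is known (z ~ N(0,1), x|z ~ N(z,1), hence marginally x ~ N(0,2)); with a deep decoder in place of the Gaussian, no closed form exists, and naive sampling like this becomes hopeless in high dimensions.

```python
import math, random

random.seed(0)

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Latent-variable model: z ~ N(0, 1), x|z ~ N(z, 1).
# The marginal p(x) = ∫ p(x|z) p(z) dz happens to be N(0, 2) here,
# so we can check the Monte Carlo estimate against the exact value.
x = 1.0
n = 200_000
mc_estimate = sum(normal_pdf(x, random.gauss(0, 1), 1.0) for _ in range(n)) / n
exact = normal_pdf(x, 0.0, 2.0)
```

Even in this 1-D toy, 200,000 samples are needed for two to three digits of accuracy; in high dimensions almost no sampled z lands where p(x|z) is non-negligible, which is exactly the intractability the ELBO works around.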
&lt;h2 data-end=&quot;4286&quot; data-start=&quot;4260&quot; data-ke-size=&quot;size26&quot;&gt;4.2 Implicit Generative Model&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4531&quot; data-start=&quot;4288&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4326&quot; data-start=&quot;4288&quot;&gt;Here, &lt;b&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/b&gt;&lt;span&gt; is not explicitly defined at all&lt;/span&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1636&quot; data-start=&quot;1621&quot; data-section-id=&quot;15xk4sg&quot;&gt;no likelihood&lt;/li&gt;
&lt;li data-end=&quot;1645&quot; data-start=&quot;1637&quot; data-section-id=&quot;svq3hm&quot;&gt;no probability formula&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;4382&quot; data-start=&quot;4327&quot;&gt;Instead:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4382&quot; data-start=&quot;4335&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4382&quot; data-start=&quot;4335&quot;&gt;&lt;b&gt;Everything is left to a neural network&lt;/b&gt;, which is trained to produce data samples.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;4531&quot; data-start=&quot;4383&quot;&gt;Representative example: &lt;b&gt;GAN(Generative Adversarial Network)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4531&quot; data-start=&quot;4433&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4482&quot; data-start=&quot;4433&quot;&gt;The distribution defined by the Generator is trained to &lt;b&gt;implicitly&lt;/b&gt; mimic the data distribution.&lt;/li&gt;
&lt;li data-end=&quot;4531&quot; data-start=&quot;4485&quot;&gt;A way to train a generative model &amp;ldquo;without explicitly computing a likelihood.&amp;rdquo;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1754&quot; data-start=&quot;1734&quot; data-section-id=&quot;p81r52&quot;&gt;&lt;b&gt;Generator &lt;span&gt;&lt;span&gt;G(z)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1779&quot; data-start=&quot;1755&quot; data-section-id=&quot;lj1zn4&quot;&gt;&lt;b&gt;Discriminator &lt;span&gt;&lt;span&gt;D(x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;200&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cQjMlh/dJMcah4KxNf/qsyW64sndACHAE8DTEFFrK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cQjMlh/dJMcah4KxNf/qsyW64sndACHAE8DTEFFrK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cQjMlh/dJMcah4KxNf/qsyW64sndACHAE8DTEFFrK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcQjMlh%2FdJMcah4KxNf%2FqsyW64sndACHAE8DTEFFrK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;200&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;200&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-end=&quot;4571&quot; data-start=&quot;4538&quot; data-ke-size=&quot;size26&quot;&gt;5. GAN: the Representative Implicit Generative Model&lt;/h2&gt;
&lt;h2 data-end=&quot;4610&quot; data-start=&quot;4573&quot; data-ke-size=&quot;size26&quot;&gt;5.1 구조: Generator vs Discriminator&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4853&quot; data-start=&quot;4612&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4728&quot; data-start=&quot;4612&quot;&gt;&lt;b&gt;Generator &lt;span&gt;&lt;span&gt;G(z)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4728&quot; data-start=&quot;4640&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4698&quot; data-start=&quot;4640&quot;&gt;Random noise &lt;span&gt;z&lt;/span&gt; (a sample from the latent space) &amp;rarr; generates fake data &lt;span&gt;x~&lt;/span&gt;.&lt;/li&gt;
&lt;li data-end=&quot;4728&quot; data-start=&quot;4701&quot;&gt;e.g., suppose it generates sound, images, speech, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;4853&quot; data-start=&quot;4729&quot;&gt;&lt;b&gt;Discriminator &lt;span&gt;&lt;span&gt;D(x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4853&quot; data-start=&quot;4761&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4817&quot; data-start=&quot;4761&quot;&gt;A binary classifier that decides whether the input &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; is &lt;b&gt;real&lt;/b&gt; or &lt;b&gt;fake&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;4853&quot; data-start=&quot;4820&quot;&gt;Any architecture works: CNN, Transformer, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5375&quot; data-start=&quot;4879&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4942&quot; data-start=&quot;4879&quot;&gt;&lt;b&gt;Generator = counterfeiter&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4942&quot; data-start=&quot;4905&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4942&quot; data-start=&quot;4905&quot;&gt;Tries to make counterfeits so refined they cannot be told apart from real money.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;4998&quot; data-start=&quot;4943&quot;&gt;&lt;b&gt;Discriminator = police officer&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4998&quot; data-start=&quot;4971&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4998&quot; data-start=&quot;4971&quot;&gt;Judges whether the money it is handed is real or fake.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;5256&quot; data-start=&quot;4999&quot;&gt;&lt;b&gt;Training process:&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5256&quot; data-start=&quot;5010&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5066&quot; data-start=&quot;5010&quot;&gt;Real data &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;&amp;nbsp;&amp;rarr; the &lt;b&gt;Discriminator&lt;/b&gt; is trained to answer &lt;b&gt;1 (Real)&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;5146&quot; data-start=&quot;5069&quot;&gt;Fake data &lt;span&gt;x~&lt;/span&gt; made by the &lt;b&gt;Generator&lt;/b&gt; &amp;rarr; the Discriminator is trained to answer &lt;b&gt;0 (Fake)&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;5256&quot; data-start=&quot;5149&quot;&gt;At the same time, the &lt;b&gt;Generator&lt;/b&gt; tries to fool the &lt;b&gt;Discriminator&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5256&quot; data-start=&quot;5190&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5256&quot; data-start=&quot;5190&quot;&gt;when a fake &lt;span&gt;&lt;span&gt;x~&lt;/span&gt;&lt;/span&gt; is fed in, the &lt;b&gt;Discriminator&lt;/b&gt; should mistakenly answer &lt;b&gt;1 (Real)&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;5375&quot; data-start=&quot;5258&quot;&gt;This relationship is expressed as a minimax game:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5375&quot; data-start=&quot;5300&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5330&quot; data-start=&quot;5300&quot;&gt;the &lt;b&gt;Discriminator&lt;/b&gt; tries to raise its accuracy,&lt;/li&gt;
&lt;li data-end=&quot;5375&quot; data-start=&quot;5333&quot;&gt;while the &lt;b&gt;Generator&lt;/b&gt; tries to lower the &lt;b&gt;Discriminator&lt;/b&gt;'s accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
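The minimax game above can be checked numerically. Below is a deliberately tiny 1-D sketch (everything here is made up for illustration: real data from N(3,1), an affine generator G(z) = s·z + b, a logistic discriminator D(x) = sigmoid(w·x + c)): one gradient-ascent step for D raises the value V(D, G), and one gradient-descent step for G lowers it back.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Fixed minibatches so each step's effect on V is directly observable.
x_real = rng.normal(3.0, 1.0, size=256)   # "real" data
z = rng.normal(0.0, 1.0, size=256)        # latent noise
s, b = 1.0, 0.0                           # generator parameters: G(z) = s*z + b
w, c = 0.0, 0.0                           # discriminator parameters: D(x) = sigmoid(w*x + c)

def value(w, c, s, b):
    """Minimax objective V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * (s * z + b) + c)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

v0 = value(w, c, s, b)
lr = 0.01

# Discriminator step: gradient ASCENT on V (D wants to classify correctly).
x_fake = s * z + b
dr, df = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
grad_w = np.mean((1 - dr) * x_real) - np.mean(df * x_fake)
grad_c = np.mean(1 - dr) - np.mean(df)
w, c = w + lr * grad_w, c + lr * grad_c
v_after_d = value(w, c, s, b)   # V went up: D got better at telling real from fake

# Generator step: gradient DESCENT on V (G wants D to be wrong).
df = sigmoid(w * (s * z + b) + c)
dV_dxfake = -df * w             # derivative of log(1 - sigmoid(w*x̃ + c)) w.r.t. x̃
s, b = s - lr * np.mean(dV_dxfake * z), b - lr * np.mean(dV_dxfake)
v_after_g = value(w, c, s, b)   # V went back down: G pushed against D
```

Repeating these two alternating steps is the whole GAN training loop; the only thing either network ever sees is samples and D's verdicts, never an explicit likelihood.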
&lt;h2 data-end=&quot;5406&quot; data-start=&quot;5377&quot; data-ke-size=&quot;size26&quot;&gt;5.2 When to Stop Training: D's Accuracy &amp;asymp; 50%&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5602&quot; data-start=&quot;5408&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5490&quot; data-start=&quot;5408&quot;&gt;When the Generator is trained well enough:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5490&quot; data-start=&quot;5435&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5490&quot; data-start=&quot;5435&quot;&gt;the Discriminator can &lt;b&gt;barely tell real from fake and ends up guessing at 50%&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;5602&quot; data-start=&quot;5491&quot;&gt;At that point:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5602&quot; data-start=&quot;5503&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5546&quot; data-start=&quot;5503&quot;&gt;the Generator has already become &lt;b&gt;a model that mimics the data distribution well&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;5602&quot; data-start=&quot;5549&quot;&gt;Afterwards the Discriminator is discarded and &lt;b&gt;only the Generator is used to generate samples&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
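The 50% endpoint is not just intuition: it follows from the standard result about the optimal discriminator for the original GAN objective. For a fixed generator with sample distribution $p_g$,

```latex
D^*(x) \;=\; \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
```

so when the generator matches the data distribution, $p_g = p_{\text{data}}$, the best any discriminator can do is $D^*(x) = \tfrac{1}{2}$ everywhere, i.e. a coin flip.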
&lt;h2 data-end=&quot;5636&quot; data-start=&quot;5604&quot; data-ke-size=&quot;size26&quot;&gt;5.3 GAN Is Less a &amp;ldquo;Model Name&amp;rdquo; Than a &amp;ldquo;Loss Structure&amp;rdquo;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5795&quot; data-start=&quot;5638&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5686&quot; data-start=&quot;5638&quot;&gt;In practice, any architecture with an &lt;b&gt;Adversarial Loss&lt;/b&gt; attached counts as &amp;ldquo;GAN-style&amp;rdquo;.&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5795&quot; data-start=&quot;5694&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5737&quot; data-start=&quot;5694&quot;&gt;Attach a neural network playing the &lt;b&gt;Discriminator role&lt;/b&gt; to some module,&lt;/li&gt;
&lt;li data-end=&quot;5795&quot; data-start=&quot;5740&quot;&gt;and the whole structure in which the two are trained against each other is called &lt;b&gt;Adversarial&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;5861&quot; data-start=&quot;5802&quot; data-ke-size=&quot;size26&quot;&gt;6. VAE (Variational Autoencoder): Explicit Model + Approximation (Variational) Technique&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: center;&quot;&gt;- Although its name says autoencoder, VAE is in fact a completely different technique from the ordinary &amp;lsquo;Autoencoder&amp;rsquo;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: center;&quot;&gt;- Unlike an ordinary Autoencoder, it is a &lt;b&gt;probability-based generative model&lt;/b&gt; &lt;/span&gt;&lt;/p&gt;
&lt;h2 data-end=&quot;5977&quot; data-start=&quot;5938&quot; data-ke-size=&quot;size26&quot;&gt;6.1 Goal: Model &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt; Directly&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6247&quot; data-start=&quot;5979&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6037&quot; data-start=&quot;5979&quot;&gt;The goal of generative modeling is the same&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;47&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2f54P/dJMcafToNa8/jgqhNo562KYO1HfbflNMhk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2f54P/dJMcafToNa8/jgqhNo562KYO1HfbflNMhk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2f54P/dJMcafToNa8/jgqhNo562KYO1HfbflNMhk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2f54P%2FdJMcafToNa8%2FjgqhNo562KYO1HfbflNMhk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;207&quot; height=&quot;47&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;47&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6247&quot; data-start=&quot;5979&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6142&quot; data-start=&quot;6038&quot;&gt;But since it is hard to model directly, a &lt;b&gt;latent variable &lt;span&gt;&lt;span&gt;z&lt;/span&gt;&lt;/span&gt;&lt;/b&gt; is introduced. &lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6142&quot; data-start=&quot;6063&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;497&quot; data-start=&quot;470&quot; data-section-id=&quot;1ahd8rd&quot;&gt;The &lt;b&gt;hidden cause&lt;/b&gt; that generates the data &lt;span&gt;x&lt;/span&gt;&lt;/li&gt;
&lt;li data-end=&quot;530&quot; data-start=&quot;498&quot; data-section-id=&quot;m62cvi&quot;&gt;A low-dimensional representation (latent representation)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;256&quot; data-origin-height=&quot;66&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/k52Cv/dJMb996Ij7Z/u7IptHPW8CM4DyAqyENCD0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/k52Cv/dJMb996Ij7Z/u7IptHPW8CM4DyAqyENCD0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/k52Cv/dJMb996Ij7Z/u7IptHPW8CM4DyAqyENCD0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fk52Cv%2FdJMb996Ij7Z%2Fu7IptHPW8CM4DyAqyENCD0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;256&quot; height=&quot;66&quot; data-origin-width=&quot;256&quot; data-origin-height=&quot;66&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6247&quot; data-start=&quot;6158&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6247&quot; data-start=&quot;6209&quot;&gt;The probability model is then expressed as above.&amp;nbsp; This integral is usually &lt;b&gt;high-dimensional and intractable&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;6247&quot; data-start=&quot;6209&quot;&gt;What we really want to know is the &lt;b&gt;posterior distribution&lt;/b&gt; &lt;span&gt;&lt;span&gt;p&amp;theta;(z∣x)&lt;/span&gt;&lt;/span&gt;, but this too is hard to compute directly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;6283&quot; data-start=&quot;6249&quot; data-ke-size=&quot;size26&quot;&gt;6.2 Variational Inference and ELBO&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6968&quot; data-start=&quot;6285&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6375&quot; data-start=&quot;6285&quot;&gt;Since the posterior cannot be computed directly, an &lt;b&gt;approximate distribution&lt;/b&gt; is defined.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;102&quot; data-origin-height=&quot;43&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/TS1My/dJMcadOKpNW/PKdcUtKCXPgGKKveFVyNdk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/TS1My/dJMcadOKpNW/PKdcUtKCXPgGKKveFVyNdk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/TS1My/dJMcadOKpNW/PKdcUtKCXPgGKKveFVyNdk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FTS1My%2FdJMcadOKpNW%2FPKdcUtKCXPgGKKveFVyNdk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;102&quot; height=&quot;43&quot; data-origin-width=&quot;102&quot; data-origin-height=&quot;43&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;858&quot; data-start=&quot;845&quot; data-section-id=&quot;1vbtm9o&quot;&gt;&lt;b&gt;Encoder&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;893&quot; data-start=&quot;859&quot; data-section-id=&quot;i93c2m&quot;&gt;approximates the posterior &lt;b&gt;&lt;span&gt;&lt;span&gt;p&amp;theta;(z∣x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;442&quot; data-origin-height=&quot;42&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cLs83t/dJMcafePcUP/FSx5SwoazJmxTZWuwZAuZK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cLs83t/dJMcafePcUP/FSx5SwoazJmxTZWuwZAuZK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cLs83t/dJMcafePcUP/FSx5SwoazJmxTZWuwZAuZK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcLs83t%2FdJMcafePcUP%2FFSx5SwoazJmxTZWuwZAuZK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;442&quot; height=&quot;42&quot; data-origin-width=&quot;442&quot; data-origin-height=&quot;42&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6968&quot; data-start=&quot;6285&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6624&quot; data-start=&quot;6525&quot;&gt;&amp;nbsp;Here:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6624&quot; data-start=&quot;6534&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1060&quot; data-start=&quot;1025&quot; data-section-id=&quot;nix0l0&quot;&gt;&lt;b&gt;ELBO&lt;/b&gt; : Evidence Lower Bound&lt;/li&gt;
&lt;li data-end=&quot;1099&quot; data-start=&quot;1061&quot; data-section-id=&quot;1xnrp4c&quot;&gt;&lt;b&gt;KL&lt;/b&gt; : Kullback&amp;ndash;Leibler divergence&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;6727&quot; data-start=&quot;6625&quot;&gt;Since the KL term is always &amp;ge; 0:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;179&quot; data-origin-height=&quot;50&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bpCUEC/dJMcaiCyHhk/7FKjXRRYCJea5U3zxx1XA1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bpCUEC/dJMcaiCyHhk/7FKjXRRYCJea5U3zxx1XA1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bpCUEC/dJMcaiCyHhk/7FKjXRRYCJea5U3zxx1XA1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbpCUEC%2FdJMcaiCyHhk%2F7FKjXRRYCJea5U3zxx1XA1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;179&quot; height=&quot;50&quot; data-origin-width=&quot;179&quot; data-origin-height=&quot;50&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The ELBO is a &lt;b&gt;lower bound&lt;/b&gt; on &lt;span&gt;&lt;span&gt;log p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
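The bound comes from one identity (the standard derivation, using the encoder $q_\phi(z\mid x)$):

```latex
\log p_\theta(x)
\;=\;
\underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right]}_{\text{ELBO}}
\;+\;
\underbrace{\mathrm{KL}\!\left(q_\phi(z\mid x)\,\middle\|\,p_\theta(z\mid x)\right)}_{\ge\,0}
\;\;\ge\;\; \text{ELBO}
```

So maximizing the ELBO both pushes up $\log p_\theta(x)$ and, since the left side is fixed in $\phi$, pulls $q_\phi(z\mid x)$ toward the true posterior.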
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;6968&quot; data-start=&quot;6285&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;6968&quot; data-start=&quot;6729&quot;&gt;The idea of VAE:
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;643&quot; data-origin-height=&quot;175&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cEmrgR/dJMcacWEiza/7QVKLNz7b3V9gOuvk08n8K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cEmrgR/dJMcacWEiza/7QVKLNz7b3V9gOuvk08n8K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cEmrgR/dJMcacWEiza/7QVKLNz7b3V9gOuvk08n8K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcEmrgR%2FdJMcacWEiza%2F7QVKLNz7b3V9gOuvk08n8K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;643&quot; height=&quot;175&quot; data-origin-width=&quot;643&quot; data-origin-height=&quot;175&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-end=&quot;7027&quot; data-start=&quot;6970&quot; data-ke-size=&quot;size26&quot;&gt;6.3 The Two Terms of the ELBO: Reconstruction Loss + KL Regularization&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The ELBO decomposes into two terms.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;464&quot; data-origin-height=&quot;54&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lnXjD/dJMcagLyaQm/Eb0zkoTW9b6yvvPVhyWvVk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lnXjD/dJMcagLyaQm/Eb0zkoTW9b6yvvPVhyWvVk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lnXjD/dJMcagLyaQm/Eb0zkoTW9b6yvvPVhyWvVk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FlnXjD%2FdJMcagLyaQm%2FEb0zkoTW9b6yvvPVhyWvVk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;464&quot; height=&quot;54&quot; data-origin-width=&quot;464&quot; data-origin-height=&quot;54&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1. Reconstruction Term &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;210&quot; data-origin-height=&quot;41&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/p2LIQ/dJMcajuG6xJ/HsKESYK1CqCwrWwKNsz1y0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/p2LIQ/dJMcajuG6xJ/HsKESYK1CqCwrWwKNsz1y0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/p2LIQ/dJMcajuG6xJ/HsKESYK1CqCwrWwKNsz1y0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fp2LIQ%2FdJMcajuG6xJ%2FHsKESYK1CqCwrWwKNsz1y0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;210&quot; height=&quot;41&quot; data-origin-width=&quot;210&quot; data-origin-height=&quot;41&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1647&quot; data-start=&quot;1639&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1647&quot; data-start=&quot;1639&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;How well the &lt;/span&gt;&lt;b&gt;original data x&lt;/b&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt; is reconstructed from the latent z&lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1685&quot; data-start=&quot;1649&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1658&quot; data-start=&quot;1649&quot; data-section-id=&quot;d0ggr7&quot;&gt;&lt;b&gt;MSE&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1685&quot; data-start=&quot;1659&quot; data-section-id=&quot;18eqia&quot;&gt;&lt;b&gt;Binary Cross Entropy&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1716&quot; data-start=&quot;1687&quot; data-ke-size=&quot;size16&quot;&gt;Losses such as these are used as the reconstruction loss.&lt;/p&gt;
&lt;p data-end=&quot;1716&quot; data-start=&quot;1687&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2. KL Regularization &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;213&quot; data-origin-height=&quot;42&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/6Zc18/dJMcaflyFCS/ZLIDgLOpDdLmbITai8WUKk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/6Zc18/dJMcaflyFCS/ZLIDgLOpDdLmbITai8WUKk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/6Zc18/dJMcaflyFCS/ZLIDgLOpDdLmbITai8WUKk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F6Zc18%2FdJMcaflyFCS%2FZLIDgLOpDdLmbITai8WUKk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;213&quot; height=&quot;42&quot; data-origin-width=&quot;213&quot; data-origin-height=&quot;42&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;174&quot; data-origin-height=&quot;48&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b2LVDf/dJMcagLyaQF/U8aR60Mgz83GSshLlc30Sk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b2LVDf/dJMcagLyaQF/U8aR60Mgz83GSshLlc30Sk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b2LVDf/dJMcagLyaQF/U8aR60Mgz83GSshLlc30Sk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb2LVDf%2FdJMcagLyaQF%2FU8aR60Mgz83GSshLlc30Sk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;174&quot; height=&quot;48&quot; data-origin-width=&quot;174&quot; data-origin-height=&quot;48&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Regularizes the latent distribution produced by the encoder so that it &lt;b&gt;follows a standard normal distribution&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;1906&quot; data-start=&quot;1898&quot; data-ke-size=&quot;size16&quot;&gt;In an actual implementation, the encoder outputs the following.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;125&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/n3Ja3/dJMcadVwPAg/VMOgZH24BU711ERE1dpC6k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/n3Ja3/dJMcadVwPAg/VMOgZH24BU711ERE1dpC6k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/n3Ja3/dJMcadVwPAg/VMOgZH24BU711ERE1dpC6k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fn3Ja3%2FdJMcadVwPAg%2FVMOgZH24BU711ERE1dpC6k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;279&quot; height=&quot;125&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;125&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The KL divergence between this Gaussian and &lt;b&gt;N(0, I)&lt;/b&gt; is computed and added to the loss.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;636&quot; data-origin-height=&quot;114&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nwjde/dJMcahDEpXO/6u4XDlkVnE8H9Kefn3KYpK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nwjde/dJMcahDEpXO/6u4XDlkVnE8H9Kefn3KYpK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nwjde/dJMcahDEpXO/6u4XDlkVnE8H9Kefn3KYpK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fnwjde%2FdJMcahDEpXO%2F6u4XDlkVnE8H9Kefn3KYpK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;636&quot; height=&quot;114&quot; data-origin-width=&quot;636&quot; data-origin-height=&quot;114&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Reconstruction quality + latent-space regularization&lt;/p&gt;
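The two terms can be sketched in a few lines of NumPy. This is a minimal illustration, not the post's actual training code: `vae_loss`, the MSE choice for the reconstruction term, and the toy arrays are all assumptions, while the KL expression is the standard closed form for a diagonal Gaussian against N(0, I).

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """Toy VAE loss: reconstruction (MSE here) + closed-form KL to N(0, I)."""
    recon = np.sum((x - x_hat) ** 2)                            # reconstruction term
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))   # KL regularizer
    return recon + kl

# With mu = 0 and log_var = 0 the encoder already matches N(0, I),
# so the KL term vanishes and only the reconstruction error remains.
x = np.array([1.0, 2.0])
loss = vae_loss(x, x_hat=np.array([1.0, 1.0]), mu=np.zeros(2), log_var=np.zeros(2))
```

In practice the two terms are often weighted (as in the beta-VAE), trading reconstruction quality against how closely the latent space matches the prior.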
&lt;h2 data-end=&quot;7997&quot; data-start=&quot;7950&quot; data-ke-size=&quot;size26&quot;&gt;6.4 Reparameterization Trick&lt;/h2&gt;
&lt;p data-end=&quot;8002&quot; data-start=&quot;7999&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;8002&quot; data-start=&quot;7999&quot; data-ke-size=&quot;size16&quot;&gt;Problem:&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;616&quot; data-origin-height=&quot;208&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cB0qnk/dJMcaaLjQ8I/BV8u2nFK9KPWvcLNyqC1SK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cB0qnk/dJMcaaLjQ8I/BV8u2nFK9KPWvcLNyqC1SK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cB0qnk/dJMcaaLjQ8I/BV8u2nFK9KPWvcLNyqC1SK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcB0qnk%2FdJMcaaLjQ8I%2FBV8u2nFK9KPWvcLNyqC1SK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;515&quot; height=&quot;174&quot; data-origin-width=&quot;616&quot; data-origin-height=&quot;208&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8197&quot; data-start=&quot;8122&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8144&quot; data-start=&quot;8122&quot;&gt;&lt;b&gt;Sampling is a non-differentiable operation&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;8197&quot; data-start=&quot;8147&quot;&gt;Gradients cannot flow back to the encoder.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;8202&quot; data-start=&quot;8199&quot; data-ke-size=&quot;size16&quot;&gt;Solution:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8426&quot; data-start=&quot;8203&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8329&quot; data-start=&quot;8203&quot;&gt;Reparameterize using an &lt;span&gt;&lt;span&gt;ϵ&lt;/span&gt;&lt;/span&gt; sampled from the &lt;b&gt;standard normal distribution&lt;/b&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;650&quot; data-origin-height=&quot;114&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ERZva/dJMcaivMmIA/HX1LEGkpnhPiukqrGkCBe1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ERZva/dJMcaivMmIA/HX1LEGkpnhPiukqrGkCBe1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ERZva/dJMcaivMmIA/HX1LEGkpnhPiukqrGkCBe1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FERZva%2FdJMcaivMmIA%2FHX1LEGkpnhPiukqrGkCBe1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;650&quot; height=&quot;114&quot; data-origin-width=&quot;650&quot; data-origin-height=&quot;114&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8426&quot; data-start=&quot;8203&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8426&quot; data-start=&quot;8330&quot;&gt;With this:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8426&quot; data-start=&quot;8342&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2714&quot; data-start=&quot;2686&quot; data-section-id=&quot;zv8bs3&quot;&gt;sampling occurs only in &lt;span&gt;&lt;span&gt;ϵ&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li data-end=&quot;2731&quot; data-start=&quot;2715&quot; data-section-id=&quot;17zlmrt&quot;&gt;everything else is a &lt;b&gt;linear operation&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;8430&quot; data-start=&quot;8428&quot; data-ke-size=&quot;size16&quot;&gt;In other words:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8553&quot; data-start=&quot;8431&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8481&quot; data-start=&quot;8431&quot;&gt;&amp;ldquo;sampling from a Gaussian with mean &lt;span&gt;&lt;span&gt;&amp;mu;&lt;/span&gt;&lt;/span&gt; and standard deviation &lt;span&gt;&lt;span&gt;&amp;sigma;&lt;/span&gt;&lt;/span&gt;&amp;rdquo;&lt;/li&gt;
&lt;li data-end=&quot;8517&quot; data-start=&quot;8482&quot;&gt;is rewritten as &amp;ldquo;sampling from the standard normal + a linear transform&amp;rdquo;.&lt;/li&gt;
&lt;li data-end=&quot;8553&quot; data-start=&quot;8518&quot;&gt;This is the &lt;b&gt;Reparameterization Trick&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As a result, &lt;b&gt;backpropagation becomes possible&lt;/b&gt;. Why this trick matters:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8643&quot; data-start=&quot;8569&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8580&quot; data-start=&quot;8569&quot;&gt;not only in VAEs,&lt;/li&gt;
&lt;li data-end=&quot;8643&quot; data-start=&quot;8581&quot;&gt;but also in &lt;b&gt;Diffusion models&lt;/b&gt;, which keep working with means and variances,&lt;br /&gt;so this idea comes up again and again.&lt;/li&gt;
&lt;/ul&gt;
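The trick itself is a single line of code. A minimal sketch (the function name and toy arrays are illustrative, not from a specific library):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, with eps ~ N(0, I) as the only stochastic node."""
    eps = rng.standard_normal(mu.shape)   # sampling happens only here
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * eps               # the rest is a linear (differentiable) operation

mu = np.array([1.0, -1.0])
z = reparameterize(mu, log_var=np.zeros(2), rng=np.random.default_rng(0))
```

Because `mu` and `sigma` enter only through the linear expression `mu + sigma * eps`, gradients with respect to the encoder's outputs pass straight through, while the randomness is isolated in `eps`.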
&lt;h2 data-end=&quot;8659&quot; data-start=&quot;8645&quot; data-ke-size=&quot;size26&quot;&gt;6.5 Using the Model After Training&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8876&quot; data-start=&quot;8661&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8876&quot; data-start=&quot;8661&quot;&gt;Once training is finished:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8876&quot; data-start=&quot;8674&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8713&quot; data-start=&quot;8674&quot;&gt;the encoder &lt;span&gt;q&lt;sub&gt;ϕ&lt;/sub&gt;(z∣x)&lt;/span&gt; can be &lt;b&gt;discarded&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;8783&quot; data-start=&quot;8716&quot;&gt;sample &lt;span&gt;z &amp;sim; N(0, I)&lt;/span&gt; (or apply an appropriate linear transform), then&lt;/li&gt;
&lt;li data-end=&quot;8876&quot; data-start=&quot;8827&quot;&gt;feed it into the decoder &lt;span&gt;x &amp;sim; p&lt;sub&gt;&amp;theta;&lt;/sub&gt;(x∣z)&lt;/span&gt; to &lt;b&gt;generate a new x&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
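The generation loop is then just "sample, decode". In this sketch, `toy_decoder` and its weight matrix are hypothetical stand-ins for a trained decoder network:

```python
import numpy as np

def toy_decoder(z):
    """Hypothetical stand-in for p_theta(x | z); a trained network goes here."""
    W = np.array([[2.0, 0.0], [0.0, 3.0]])   # made-up decoder "weights"
    return W @ z

rng = np.random.default_rng(42)
z = rng.standard_normal(2)   # z ~ N(0, I): the encoder is no longer needed
x_new = toy_decoder(z)       # a freshly generated sample x
```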
&lt;h2 data-end=&quot;8925&quot; data-start=&quot;8883&quot; data-ke-size=&quot;size26&quot;&gt;7. Language Models (Autoregressive Models)&lt;/h2&gt;
&lt;p data-end=&quot;147&quot; data-start=&quot;112&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- An Explicit + Tractable generative model&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;147&quot; data-start=&quot;112&quot; data-ke-size=&quot;size16&quot;&gt;A language model computes the probability of an entire sentence using an &lt;b&gt;autoregressive factorization&lt;/b&gt;.&lt;/p&gt;
&lt;h2 data-end=&quot;8944&quot; data-start=&quot;8927&quot; data-ke-size=&quot;size26&quot;&gt;7.1 Autoregressive Factorization&lt;/h2&gt;
&lt;p data-end=&quot;8952&quot; data-start=&quot;8946&quot; data-ke-size=&quot;size16&quot;&gt;Example sentence:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;8984&quot; data-start=&quot;8953&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;8967&quot; data-start=&quot;8953&quot;&gt;Input: &amp;ldquo;넌 누구야?&amp;rdquo; (&amp;ldquo;Who are you?&amp;rdquo;)&lt;/li&gt;
&lt;li data-end=&quot;8984&quot; data-start=&quot;8968&quot;&gt;Output: &amp;ldquo;나는 챗GPT야&amp;rdquo; (&amp;ldquo;I am ChatGPT&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;300&quot; data-start=&quot;289&quot; data-ke-size=&quot;size16&quot;&gt;Viewing the sentence as tokens:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[x&lt;sub&gt;1&lt;/sub&gt;, x&lt;sub&gt;2&lt;/sub&gt;, x&lt;sub&gt;3&lt;/sub&gt;]&lt;/p&gt;
&lt;p data-end=&quot;348&quot; data-start=&quot;323&quot; data-ke-size=&quot;size16&quot;&gt;The language model factorizes the probability of the sentence as follows.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;400&quot; data-origin-height=&quot;58&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/R5xf6/dJMcahDEtKD/XQuaQPBrdl6DkNxz3BJZn1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/R5xf6/dJMcahDEtKD/XQuaQPBrdl6DkNxz3BJZn1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/R5xf6/dJMcahDEtKD/XQuaQPBrdl6DkNxz3BJZn1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FR5xf6%2FdJMcahDEtKD%2FXQuaQPBrdl6DkNxz3BJZn1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;58&quot; data-origin-width=&quot;400&quot; data-origin-height=&quot;58&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9276&quot; data-start=&quot;9129&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9211&quot; data-start=&quot;9129&quot;&gt;Each conditional distribution:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9211&quot; data-start=&quot;9145&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9158&quot; data-start=&quot;9145&quot;&gt;the probability of &amp;ldquo;나는&amp;rdquo; (&amp;ldquo;I&amp;rdquo;)&lt;/li&gt;
&lt;li data-end=&quot;9182&quot; data-start=&quot;9161&quot;&gt;the probability of &amp;ldquo;챗&amp;rdquo; after &amp;ldquo;나는&amp;rdquo;&lt;/li&gt;
&lt;li data-end=&quot;9211&quot; data-start=&quot;9185&quot;&gt;the probability of &amp;ldquo;GPT야&amp;rdquo; after &amp;ldquo;나는 챗&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;9276&quot; data-start=&quot;9212&quot;&gt;All of these are trained with &lt;b&gt;Cross Entropy&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9276&quot; data-start=&quot;9245&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9276&quot; data-start=&quot;9245&quot;&gt;in the end, it is solved as a &lt;b&gt;sequence of classification problems&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;9276&quot; data-start=&quot;9245&quot;&gt;each step is a &lt;b&gt;softmax classification problem over the entire vocabulary&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;p&lt;sub&gt;&amp;theta;&lt;/sub&gt;(x&lt;sub&gt;1&lt;/sub&gt;) =&amp;gt; p&lt;sub&gt;&amp;theta;&lt;/sub&gt;(x&lt;sub&gt;2&lt;/sub&gt;∣x&lt;sub&gt;1&lt;/sub&gt;) =&amp;gt; p&lt;sub&gt;&amp;theta;&lt;/sub&gt;(x&lt;sub&gt;3&lt;/sub&gt;∣x&lt;sub&gt;1&lt;/sub&gt;, x&lt;sub&gt;2&lt;/sub&gt;)&lt;/b&gt;&lt;/p&gt;
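The chain-rule factorization above can be sketched with hand-made conditional tables. The token probabilities below are invented numbers for illustration only; in a real model each table row would come from a softmax over the vocabulary.

```python
# Made-up conditional distributions for the 3-token example "나는 / 챗 / GPT야".
p_x1 = {"나는": 0.5, "너는": 0.5}
p_x2 = {("나는",): {"챗": 0.8, "사람": 0.2}}
p_x3 = {("나는", "챗"): {"GPT야": 0.9, "봇이야": 0.1}}

def sequence_prob(x1, x2, x3):
    """p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)."""
    return p_x1[x1] * p_x2[(x1,)][x2] * p_x3[(x1, x2)][x3]

prob = sequence_prob("나는", "챗", "GPT야")   # 0.5 * 0.8 * 0.9
```

Each factor is exactly one "next-token classification", which is why the whole model can be trained with per-step cross entropy.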
&lt;h2 data-end=&quot;9304&quot; data-start=&quot;9278&quot; data-ke-size=&quot;size26&quot;&gt;7.2 Why the Model Is Tractable&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9596&quot; data-start=&quot;9306&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9410&quot; data-start=&quot;9306&quot;&gt;At each step, an autoregressive model:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9410&quot; data-start=&quot;9332&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9368&quot; data-start=&quot;9332&quot;&gt;outputs softmax probabilities over the vocabulary, so&lt;/li&gt;
&lt;li data-end=&quot;9410&quot; data-start=&quot;9371&quot;&gt;the &lt;b&gt;exact probability&lt;/b&gt; of any sentence or token sequence can be computed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;310&quot; data-origin-height=&quot;86&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qhea3/dJMcaaq1seF/bFgb4XajEN3fwaBKzzzK71/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qhea3/dJMcaaq1seF/bFgb4XajEN3fwaBKzzzK71/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qhea3/dJMcaaq1seF/bFgb4XajEN3fwaBKzzzK71/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fqhea3%2FdJMcaaq1seF%2FbFgb4XajEN3fwaBKzzzK71%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;310&quot; height=&quot;86&quot; data-origin-width=&quot;310&quot; data-origin-height=&quot;86&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9596&quot; data-start=&quot;9306&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9596&quot; data-start=&quot;9411&quot;&gt;Using this:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9596&quot; data-start=&quot;9424&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9596&quot; data-start=&quot;9424&quot;&gt;you can even build something like a &lt;b&gt;GPT Detector&lt;/b&gt; that distinguishes &amp;ldquo;GPT-style text&amp;rdquo; from &amp;ldquo;human-written text&amp;rdquo;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9596&quot; data-start=&quot;9496&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9512&quot; data-start=&quot;9496&quot;&gt;given a sentence,&lt;/li&gt;
&lt;li data-end=&quot;9554&quot; data-start=&quot;9517&quot;&gt;if the language model assigns it a suspiciously high probability &amp;rarr; likely written by AI.&lt;/li&gt;
&lt;li data-end=&quot;9596&quot; data-start=&quot;9559&quot;&gt;if the probability drops unusually low in certain spans &amp;rarr; possibly a trace of human writing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;An Explicit + Tractable generative model&lt;/b&gt;&lt;/p&gt;
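The detector idea above boils down to a score over exact per-token probabilities, which only a tractable model provides. A minimal sketch; the probability lists are invented for illustration and `avg_log_prob` is not a real detector's algorithm:

```python
import math

def avg_log_prob(token_probs):
    """Average log-probability the model assigns to each token of a sequence."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

model_like = avg_log_prob([0.9, 0.8, 0.95])   # uniformly high -> suspiciously "model-like"
human_like = avg_log_prob([0.9, 0.01, 0.7])   # one surprising token drags the score down
```

A real detector would compare such scores (or perplexity) against thresholds calibrated on known human and AI text.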
&lt;h2 data-end=&quot;9651&quot; data-start=&quot;9603&quot; data-ke-size=&quot;size26&quot;&gt;8. Diffusion Model&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A generative model that learns the process of &lt;b&gt;restoring noise &amp;rarr; data&lt;/b&gt;.&lt;/p&gt;
&lt;h2 data-end=&quot;9689&quot; data-start=&quot;9653&quot; data-ke-size=&quot;size26&quot;&gt;8.1 Forward Process: Destroying the Data (Adding Noise)&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9967&quot; data-start=&quot;9691&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9884&quot; data-start=&quot;9691&quot;&gt;Given an original image &lt;span&gt;x&lt;sub&gt;0&lt;/sub&gt;&lt;/span&gt;,
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9884&quot; data-start=&quot;9717&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9837&quot; data-start=&quot;9717&quot;&gt;a small amount of Gaussian noise is added at every step,&lt;/li&gt;
&lt;li data-end=&quot;9884&quot; data-start=&quot;9840&quot;&gt;so that by &lt;span&gt;x&lt;sub&gt;T&lt;/sub&gt;&lt;/span&gt; it has become &lt;b&gt;pure noise in which the original is completely unrecognizable&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;256&quot; data-origin-height=&quot;46&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cqhOz4/dJMcaio3E6E/EUwjqyxQr9b29RgTPS1mY1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cqhOz4/dJMcaio3E6E/EUwjqyxQr9b29RgTPS1mY1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cqhOz4/dJMcaio3E6E/EUwjqyxQr9b29RgTPS1mY1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcqhOz4%2FdJMcaio3E6E%2FEUwjqyxQr9b29RgTPS1mY1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;256&quot; height=&quot;46&quot; data-origin-width=&quot;256&quot; data-origin-height=&quot;46&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9967&quot; data-start=&quot;9691&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9967&quot; data-start=&quot;9885&quot;&gt;This process:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;9967&quot; data-start=&quot;9896&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;9967&quot; data-start=&quot;9896&quot;&gt;has each step&amp;rsquo;s transformation &lt;b&gt;written explicitly as a Gaussian formula&lt;/b&gt;,&lt;br /&gt;so it is easy to implement in code (randn plus a few coefficients).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;126&quot; data-origin-height=&quot;49&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bAXFpa/dJMcacWEmde/SbcqPdQpOLq0njQT9uYRB0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bAXFpa/dJMcacWEmde/SbcqPdQpOLq0njQT9uYRB0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bAXFpa/dJMcacWEmde/SbcqPdQpOLq0njQT9uYRB0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbAXFpa%2FdJMcacWEmde%2FSbcqPdQpOLq0njQT9uYRB0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;126&quot; height=&quot;49&quot; data-origin-width=&quot;126&quot; data-origin-height=&quot;49&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
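One forward step really is just "randn plus coefficients", as described. A minimal sketch assuming a DDPM-style step with a fixed variance schedule; the constant `beta=0.02` and the step count are illustrative choices:

```python
import numpy as np

def forward_step(x_prev, beta, rng):
    """One forward step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * eps."""
    eps = rng.standard_normal(x_prev.shape)                      # randn ...
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * eps    # ... plus coefficients

rng = np.random.default_rng(0)
x = np.ones(4)            # stand-in for an image x_0
for _ in range(1000):     # after many steps, x is essentially pure noise
    x = forward_step(x, beta=0.02, rng=rng)
```

Each application shrinks the signal by sqrt(1 - beta) while injecting fresh Gaussian noise, so after enough steps the original x_0 is statistically unrecoverable from x alone.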
&lt;h2 data-end=&quot;10005&quot; data-start=&quot;9969&quot; data-ke-size=&quot;size26&quot;&gt;8.2 Reverse Process: Cleaning Up (Removing Noise)&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10269&quot; data-start=&quot;10007&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10097&quot; data-start=&quot;10007&quot;&gt;What we want is the opposite direction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;223&quot; data-origin-height=&quot;42&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bh1wEP/dJMcaf6USuv/JNIqtjm3O71fxTy6k3kEbK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bh1wEP/dJMcaf6USuv/JNIqtjm3O71fxTy6k3kEbK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bh1wEP/dJMcaf6USuv/JNIqtjm3O71fxTy6k3kEbK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbh1wEP%2FdJMcaf6USuv%2FJNIqtjm3O71fxTy6k3kEbK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;223&quot; height=&quot;42&quot; data-origin-width=&quot;223&quot; data-origin-height=&quot;42&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10269&quot; data-start=&quot;10007&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10140&quot; data-start=&quot;10098&quot;&gt;즉, &lt;b&gt;순수 노이즈에서 시작해 데이터로 가는 과정&lt;/b&gt;을 학습&lt;/li&gt;
&lt;li data-end=&quot;10269&quot; data-start=&quot;10141&quot;&gt;문제:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10269&quot; data-start=&quot;10149&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10200&quot; data-start=&quot;10149&quot;&gt;&lt;span&gt;&lt;span&gt;p(x&lt;sub&gt;t&amp;minus;1&lt;/sub&gt;∣x&lt;sub&gt;t&lt;/sub&gt;)&lt;/span&gt;&lt;/span&gt; 같은 역방향 분포는 &lt;b&gt;직접 알 수 없음&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;10269&quot; data-start=&quot;10203&quot;&gt;그래서 여기에 &lt;b&gt;신경망&lt;/b&gt;을 써서:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10269&quot; data-start=&quot;10230&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10269&quot; data-start=&quot;10230&quot;&gt;각 step에서 &lt;b&gt;제거해야 할 노이즈&lt;/b&gt;를 예측하도록 학습시킨다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
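&lt;p data-ke-size=&quot;size16&quot;&gt;위 아이디어를 코드 감각으로 보면 다음과 같다. DDPM 스타일 학습 한 스텝을 numpy로 단순화한 스케치이며, 실제로는 노이즈 예측기 자리에 U-Net 같은 신경망이 들어가지만 여기서는 자리표시용 가짜 출력(eps_pred)을 가정했다.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# DDPM 학습 한 스텝의 단순화 스케치 (1차원 toy 데이터 가정).
# 실제로는 eps_pred를 신경망 eps_theta(x_t, t)가 출력한다.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # forward process의 분산 스케줄
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # alpha_bar_t: t까지의 누적 곱

def q_sample(x0, t, eps):
    """Forward process의 닫힌형: x_t = sqrt(ab_t)*x0 + sqrt(1-ab_t)*eps"""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.normal(size=(8,))              # toy 배치
t = int(rng.integers(0, T))             # 임의의 timestep
eps = rng.normal(size=x0.shape)         # 씌운 노이즈 (정답)
x_t = q_sample(x0, t, eps)

eps_pred = 0.9 * x_t                    # 가상의 모델 출력 (placeholder)
loss = float(np.mean((eps - eps_pred) ** 2))  # "제거할 노이즈" 예측 MSE
```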
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;92&quot; data-origin-height=&quot;60&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/0sHzx/dJMcagSjlNJ/hV8koHMImvUv24zbNruKDk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/0sHzx/dJMcagSjlNJ/hV8koHMImvUv24zbNruKDk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/0sHzx/dJMcagSjlNJ/hV8koHMImvUv24zbNruKDk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F0sHzx%2FdJMcagSjlNJ%2FhV8koHMImvUv24zbNruKDk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;92&quot; height=&quot;60&quot; data-origin-width=&quot;92&quot; data-origin-height=&quot;60&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;10294&quot; data-start=&quot;10271&quot; data-ke-size=&quot;size26&quot;&gt;8.3 &amp;lsquo;아이 방 어질러 놓기&amp;rsquo;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10589&quot; data-start=&quot;10296&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10356&quot; data-start=&quot;10296&quot;&gt;Forward:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10356&quot; data-start=&quot;10309&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10356&quot; data-start=&quot;10309&quot;&gt;아이가 방에서 장난쳐서 물건을 마구 어질러서 &lt;b&gt;쓰레기장(노이즈)&lt;/b&gt; 상태가 됨.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;10483&quot; data-start=&quot;10357&quot;&gt;Reverse:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10483&quot; data-start=&quot;10370&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10409&quot; data-start=&quot;10370&quot;&gt;부모(모델)는 &amp;ldquo;원래 방이 어땠는지 한 번 본 적이 있다&amp;rdquo;고 가정.&lt;/li&gt;
&lt;li data-end=&quot;10483&quot; data-start=&quot;10412&quot;&gt;그 기억(트레이닝 데이터)을 활용해서:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10483&quot; data-start=&quot;10440&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10483&quot; data-start=&quot;10440&quot;&gt;현재 쓰레기장 상태에서 &lt;b&gt;조금씩 정리해 나가며&lt;/b&gt; 원래 방처럼 만들어감.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;10589&quot; data-start=&quot;10484&quot;&gt;완벽히 똑같은 방이 아니라:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10589&quot; data-start=&quot;10504&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10539&quot; data-start=&quot;10504&quot;&gt;원래 방과는 &lt;b&gt;다르지만 그럴싸한 새로운 방&lt;/b&gt;이 되는 것.&lt;/li&gt;
&lt;li data-end=&quot;10589&quot; data-start=&quot;10542&quot;&gt;즉, &lt;b&gt;train set과는 다르지만 같은 분포에서 나온 새로운 샘플 생성&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;10628&quot; data-start=&quot;10591&quot; data-ke-size=&quot;size26&quot;&gt;8.4 Diffusion과 VAE/Variational의 연결&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10963&quot; data-start=&quot;10630&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10684&quot; data-start=&quot;10630&quot;&gt;Diffusion의 목적도 결국: &lt;span&gt;&lt;span&gt;&lt;span&gt;max&lt;sub&gt;&amp;theta;&lt;/sub&gt; log p&lt;sub&gt;&amp;theta;&lt;/sub&gt;(x)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li data-end=&quot;10757&quot; data-start=&quot;10685&quot;&gt;이 목표에서 출발해 Variational 방식으로 유도하면,
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10757&quot; data-start=&quot;10723&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10757&quot; data-start=&quot;10723&quot;&gt;VAE 때처럼 &lt;b&gt;ELBO 형태의 학습 목적&lt;/b&gt;이 나온다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;10877&quot; data-start=&quot;10758&quot;&gt;이 유도 과정에서:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10877&quot; data-start=&quot;10773&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10817&quot; data-start=&quot;10773&quot;&gt;&lt;b&gt;Forward process(노이즈 추가)는 명시적인 Gaussian 수식.&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;10877&quot; data-start=&quot;10820&quot;&gt;&lt;b&gt;Reverse process는 신경망이 &amp;ldquo;얼마나 노이즈를 지워야 하는지&amp;rdquo;를 예측하도록 학습.&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;10963&quot; data-start=&quot;10878&quot;&gt;그래서:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;10963&quot; data-start=&quot;10887&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;10963&quot; data-start=&quot;10887&quot;&gt;&lt;b&gt;Reparameterization Trick&lt;/b&gt;, Gaussian 평균&amp;middot;분산 다루기 등이&lt;br /&gt;Diffusion에서도 핵심.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
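&lt;p data-ke-size=&quot;size16&quot;&gt;Diffusion에서도 핵심이 되는 Reparameterization Trick을 최소 스케치로 적어보면 다음과 같다 (mu, log_var는 설명용으로 지어낸 가상의 값).&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

# Reparameterization Trick: z ~ N(mu, sigma^2)를 직접 뽑는 대신
# z = mu + sigma * eps (eps ~ N(0, I))로 쓰면, 확률성이 eps에 격리되어
# 신경망이 출력한 mu, sigma에 대해 미분(backprop)이 가능해진다.
mu = np.array([0.5, -1.0])              # 인코더가 출력했다고 가정한 평균
log_var = np.array([0.0, 0.2])          # 인코더가 출력했다고 가정한 log 분산

eps = rng.standard_normal(mu.shape)
sigma = np.exp(0.5 * log_var)           # sigma = exp(log_var / 2)
z = mu + sigma * eps                    # 미분 가능한 샘플링
```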
&lt;h2 data-end=&quot;10990&quot; data-start=&quot;10970&quot; data-ke-size=&quot;size26&quot;&gt;9. 정리&lt;/h2&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;12246&quot; data-start=&quot;10992&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;11105&quot; data-start=&quot;10992&quot;&gt;&lt;b&gt;생성 vs 판별&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11105&quot; data-start=&quot;11011&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11058&quot; data-start=&quot;11011&quot;&gt;판별 (Discriminative Model) :&lt;b&gt; &lt;span&gt;&lt;span&gt;p&amp;theta;(y∣x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;를 학습 &amp;rarr; 분류/검출/세그멘테이션.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11058&quot; data-start=&quot;11011&quot;&gt;입력 &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;가 주어졌을 때 &lt;b&gt;라벨 &lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;의 확률을 학습&lt;/li&gt;
&lt;li data-end=&quot;11058&quot; data-start=&quot;11011&quot;&gt;사용 분야
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;298&quot; data-start=&quot;233&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;254&quot; data-start=&quot;233&quot; data-section-id=&quot;19ng1pf&quot;&gt;분류 (classification)&lt;/li&gt;
&lt;li data-end=&quot;274&quot; data-start=&quot;255&quot; data-section-id=&quot;izzycz&quot;&gt;객체 검출 (detection)&lt;/li&gt;
&lt;li data-end=&quot;298&quot; data-start=&quot;275&quot; data-section-id=&quot;beegq7&quot;&gt;세그멘테이션 (segmentation)&lt;/li&gt;
&lt;/ul&gt;
예시 모델
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;374&quot; data-start=&quot;307&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;330&quot; data-start=&quot;307&quot; data-section-id=&quot;1rx6rqf&quot;&gt;Logistic Regression&lt;/li&gt;
&lt;li data-end=&quot;349&quot; data-start=&quot;331&quot; data-section-id=&quot;7pdyno&quot;&gt;CNN classifier&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11105&quot; data-start=&quot;11062&quot;&gt;생성 (Generative Model) :&lt;b&gt; &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;를 학습 &amp;rarr; 데이터 분포 자체를 모델링.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11225&quot; data-start=&quot;11107&quot;&gt;&lt;b&gt;MLE &amp;amp; Likelihood&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11225&quot; data-start=&quot;11134&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11180&quot; data-start=&quot;11134&quot;&gt;생성 모델의 기본 목표는 데이터가 나올 &amp;ldquo;&lt;b&gt;Likelihood&lt;/b&gt;&amp;rdquo;를 최대화하는 파라미터 &lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;를 찾는 것.&lt;/li&gt;
&lt;li data-end=&quot;11180&quot; data-start=&quot;11134&quot;&gt;데이터가 생성될 확률이 가장 높아지도록 파라미터를 찾는다&lt;/li&gt;
&lt;li data-end=&quot;11225&quot; data-start=&quot;11184&quot;&gt;Likelihood는 확률분포가 아니라, &lt;b&gt;파라미터에 대한 함수&lt;/b&gt;.&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;700&quot; data-start=&quot;683&quot; data-section-id=&quot;l7dkc6&quot;&gt;&lt;span&gt;&lt;span&gt;x &lt;/span&gt;&lt;/span&gt;: 고정된 데이터&lt;/li&gt;
&lt;li data-end=&quot;718&quot; data-start=&quot;701&quot; data-section-id=&quot;dmqc7v&quot;&gt;&lt;span&gt;&lt;span aria-hidden=&quot;true&quot;&gt;&lt;span&gt;&lt;span&gt;&amp;theta;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; : 변수&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11477&quot; data-start=&quot;11227&quot;&gt;&lt;b&gt;명시적 vs 암묵적 생성 모델&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11477&quot; data-start=&quot;11254&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11433&quot; data-start=&quot;11254&quot;&gt;명시적 (Explicit) : &lt;span&gt;&lt;span&gt;p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;를 수식으로 명시 (Gaussian, 오토리그레시브 LM, VAE, Diffusion 등).
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11433&quot; data-start=&quot;11333&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11374&quot; data-start=&quot;11333&quot;&gt;&lt;b&gt;Tractable&lt;/b&gt;: 언어모델처럼 정확한 &lt;b&gt;likelihood&lt;/b&gt; 계산 가능. (GPT)&lt;/li&gt;
&lt;li data-end=&quot;11433&quot; data-start=&quot;11380&quot;&gt;&lt;b&gt;Approximate&lt;/b&gt;: 정확한 Likelihood 계산이 불가능 &amp;rarr; VAE, Diffusion처럼 ELBO/Variational로 &lt;b&gt;근사&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11477&quot; data-start=&quot;11437&quot;&gt;암묵적 (Implicit) : GAN처럼 모델이 정의하는 분포를 수식으로 직접 쓰지 않음.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11477&quot; data-start=&quot;11437&quot;&gt;샘플 생성 과정만 학습 (GAN)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11634&quot; data-start=&quot;11479&quot;&gt;&lt;b&gt;GAN&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11634&quot; data-start=&quot;11493&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11534&quot; data-start=&quot;11493&quot;&gt;Generator vs Discriminator의 &lt;b&gt;적대적 학습&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;11601&quot; data-start=&quot;11538&quot;&gt;Discriminator 정확도 &amp;asymp; 50%가 되면 Generator가 분포를 잘 흉내내고 있다고 볼 수 있음.&lt;/li&gt;
&lt;li data-end=&quot;11634&quot; data-start=&quot;11605&quot;&gt;Adversarial Loss 구조 자체가 핵심.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11797&quot; data-start=&quot;11636&quot;&gt;&lt;b&gt;VAE&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11797&quot; data-start=&quot;11650&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11705&quot; data-start=&quot;11650&quot;&gt;&lt;span&gt;&lt;span&gt;log p&amp;theta;(x)&lt;/span&gt;&lt;/span&gt;를 직접 최적화하기 어렵기 때문에 &lt;b&gt;ELBO&lt;/b&gt;를 최대화.&lt;/li&gt;
&lt;li data-end=&quot;11734&quot; data-start=&quot;11709&quot;&gt;ELBO = 재구성 손실 + KL 정규화.&lt;/li&gt;
&lt;li data-end=&quot;11797&quot; data-start=&quot;11738&quot;&gt;Reparameterization Trick으로 샘플링 과정까지 미분 가능하게 만들어 backprop.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;11980&quot; data-start=&quot;11799&quot;&gt;&lt;b&gt;Diffusion&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;11980&quot; data-start=&quot;11819&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;11858&quot; data-start=&quot;11819&quot;&gt;Forward: 데이터에 점점 노이즈 추가 &amp;rarr; pure noise.&lt;/li&gt;
&lt;li data-end=&quot;11923&quot; data-start=&quot;11862&quot;&gt;Reverse: 노이즈에서 출발해 노이즈를 조금씩 지워가며 데이터를 복원하는 과정을 &lt;b&gt;신경망이 학습&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;11980&quot; data-start=&quot;11927&quot;&gt;Variational 관점에서 유도하면, VAE와 비슷한 형태의 목적식(ELBO)이 나온다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
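&lt;p data-ke-size=&quot;size16&quot;&gt;위 정리에서 말한 &amp;ldquo;ELBO = 재구성 손실 + KL 정규화&amp;rdquo;를 수치로 확인해 보는 최소 스케치 (모든 값은 설명용 가상의 값이고, 실제 VAE에서는 mu, log_var, x_hat이 신경망 출력이다).&lt;/p&gt;

```python
import numpy as np

# -ELBO = 재구성 손실 + KL 정규화 항을 계산하는 스케치.
mu = np.array([0.3, -0.7])              # 인코더 출력 평균 (가상)
log_var = np.array([-0.1, 0.4])         # 인코더 출력 log 분산 (가상)

# Gaussian 사전분포 N(0, I)에 대한 KL의 닫힌형:
# KL(N(mu, sigma^2) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)
kl = float(0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0))

x = np.array([1.0, 0.0, 1.0])           # 입력
x_hat = np.array([0.9, 0.1, 0.8])       # 디코더 복원 결과 (가상)
recon = float(np.sum((x - x_hat) ** 2)) # Gaussian likelihood 가정 시 MSE로 환원

neg_elbo = recon + kl                   # 학습에서 최소화할 손실
```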
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;생성 모델링, 특히 VAE&amp;middot;Diffusion을 제대로 이해하려면:&lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;12246&quot; data-start=&quot;11996&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;12143&quot; data-start=&quot;12040&quot;&gt;&lt;b&gt;MLE, Likelihood, Bayes, Variational Inference, KL, Reparameterization&lt;/b&gt; 등을&lt;br /&gt;연결해서 보는 관점이 필수.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/129</guid>
      <comments>https://c0mputermaster.tistory.com/129#entry129comment</comments>
      <pubDate>Sat, 22 Nov 2025 01:14:07 +0900</pubDate>
    </item>
    <item>
      <title>[Object Tracking] Visual Object Tracking (VOT) 알아보기 (Distance Learning)</title>
      <link>https://c0mputermaster.tistory.com/119</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot;&gt;이번 포스팅에서는&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;b&gt;Visual&amp;nbsp;Object&amp;nbsp;Tracking&amp;nbsp;(VOT)&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot;&gt;에 대해 다뤄보았다&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;141&quot; data-start=&quot;122&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Visual&amp;nbsp;Object&amp;nbsp;Tracking&amp;nbsp;(VOT)&lt;/b&gt;&lt;/h2&gt;
&lt;p data-end=&quot;296&quot; data-start=&quot;143&quot; data-ke-size=&quot;size16&quot;&gt;Visual Object Tracking (VOT)은 비디오 상에서 특정 객체(Target Object)의 움직임을 &lt;b&gt;지속적으로 추적하는 기술&lt;/b&gt;이다.&lt;br /&gt;즉, 비디오의 각 프레임에서 목표 객체의 위치를 지속적으로 예측하고, 시간의 흐름에 따라 그 궤적을 추적한다.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1123&quot; data-origin-height=&quot;605&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rMjJ4/dJMcajAwLIC/9m0STLHN0dOPOmFQmRV8D1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rMjJ4/dJMcajAwLIC/9m0STLHN0dOPOmFQmRV8D1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rMjJ4/dJMcajAwLIC/9m0STLHN0dOPOmFQmRV8D1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrMjJ4%2FdJMcajAwLIC%2F9m0STLHN0dOPOmFQmRV8D1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;726&quot; height=&quot;391&quot; data-origin-width=&quot;1123&quot; data-origin-height=&quot;605&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;397&quot; data-start=&quot;298&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;347&quot; data-start=&quot;298&quot;&gt;&lt;b&gt;Single Object Tracking (SOT)&lt;/b&gt; : 하나의 객체를 추적&lt;/li&gt;
&lt;li data-end=&quot;397&quot; data-start=&quot;348&quot;&gt;&lt;b&gt;Multi Object Tracking (MOT)&lt;/b&gt; : 여러 객체를 동시에 추적&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;443&quot; data-start=&quot;399&quot; data-ke-size=&quot;size16&quot;&gt;이번 포스팅에서는 &lt;b&gt;Single Object Tracking&lt;/b&gt;을 중심으로 다룬다.&lt;/p&gt;
&lt;p data-end=&quot;443&quot; data-start=&quot;399&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;491&quot; data-start=&quot;450&quot; data-ke-size=&quot;size26&quot;&gt;2. Object Detection vs Object Tracking&lt;/h2&gt;
&lt;div&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1181&quot; data-origin-height=&quot;557&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ky5Qt/dJMcai9soBr/0Ungb0kjGWEo5RE8wnrxk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ky5Qt/dJMcai9soBr/0Ungb0kjGWEo5RE8wnrxk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ky5Qt/dJMcai9soBr/0Ungb0kjGWEo5RE8wnrxk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fky5Qt%2FdJMcai9soBr%2F0Ungb0kjGWEo5RE8wnrxk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;330&quot; data-origin-width=&quot;1181&quot; data-origin-height=&quot;557&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;

&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 210px;&quot; border=&quot;1&quot; data-end=&quot;1045&quot; data-start=&quot;493&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;Object Detection&lt;/td&gt;
&lt;td&gt;Object Tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt; &lt;b&gt;목적&lt;/b&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td&gt;한 장의 이미지 내에서 객체의 위치와 클래스 식별&lt;/td&gt;
&lt;td&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;비디오에서 객체의&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;시간적 변화와 일관성 유지&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;713&quot; data-start=&quot;655&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;667&quot; data-start=&quot;655&quot;&gt;&lt;b&gt;입력 단위&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;692&quot; data-start=&quot;667&quot; data-col-size=&quot;sm&quot;&gt;Single Frame (정적인 이미지)&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;713&quot; data-start=&quot;692&quot; data-col-size=&quot;sm&quot;&gt;Video (연속된 Frame)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;796&quot; data-start=&quot;714&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;723&quot; data-start=&quot;714&quot;&gt;&lt;b&gt;출력&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;752&quot; data-start=&quot;723&quot; data-col-size=&quot;sm&quot;&gt;Bounding Box + Class Label&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;796&quot; data-start=&quot;752&quot; data-col-size=&quot;sm&quot;&gt;연속된 Frame 상의 &lt;b&gt;객체 ID + Bounding Box 궤적&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;868&quot; data-start=&quot;797&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;809&quot; data-start=&quot;797&quot;&gt;&lt;b&gt;학습 내용&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;834&quot; data-start=&quot;809&quot; data-col-size=&quot;sm&quot;&gt;객체의 &lt;b&gt;시멘틱 정보 (의미)&lt;/b&gt; 학습&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;868&quot; data-start=&quot;834&quot; data-col-size=&quot;sm&quot;&gt;객체 간 &lt;b&gt;관계(Correspondence)&lt;/b&gt; 학습&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;915&quot; data-start=&quot;869&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;892&quot; data-start=&quot;869&quot;&gt;&lt;b&gt;Temporal Context&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;897&quot; data-start=&quot;892&quot; data-col-size=&quot;sm&quot;&gt;없음&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;915&quot; data-start=&quot;897&quot; data-col-size=&quot;sm&quot;&gt;있음 (시간적 문맥 고려)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;995&quot; data-start=&quot;916&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;929&quot; data-start=&quot;916&quot;&gt;&lt;b&gt;관심 포인트&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;963&quot; data-start=&quot;929&quot; data-col-size=&quot;sm&quot;&gt;정확도(Accuracy) &amp;amp; 효율성(Efficiency)&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;995&quot; data-start=&quot;963&quot; data-col-size=&quot;sm&quot;&gt;ID 일관성, Occlusion(가림), 이동 추적&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;1045&quot; data-start=&quot;996&quot;&gt;
&lt;td style=&quot;height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;1008&quot; data-start=&quot;996&quot;&gt;&lt;b&gt;응용 분야&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;1021&quot; data-start=&quot;1008&quot; data-col-size=&quot;sm&quot;&gt;정적인 이미지 분석&lt;/td&gt;
&lt;td style=&quot;height: 21px;&quot; data-end=&quot;1045&quot; data-start=&quot;1021&quot; data-col-size=&quot;sm&quot;&gt;비디오 기반 궤적 추적 및 행동 분석&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Object Detection은 &amp;ldquo;무엇이 어디에 있는가&amp;rdquo;를, &lt;/span&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Object Tracking은 &amp;ldquo;이전 프레임의 객체가 다음 프레임에서 어디로 이동했는가&amp;rdquo;를 다룬다.&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p data-end=&quot;1154&quot; data-start=&quot;1047&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;1186&quot; data-start=&quot;1161&quot; data-ke-size=&quot;size26&quot;&gt;Object Tracking의 개념&lt;/h2&gt;
&lt;p data-end=&quot;1289&quot; data-start=&quot;1188&quot; data-ke-size=&quot;size16&quot;&gt;비디오가 진행될 때, 동일한 객체를 &lt;b&gt;시간적으로 연결(Association)&lt;/b&gt; 해야 한다. 즉, 프레임 간 객체의 &lt;b&gt;Identity(정체성)&lt;/b&gt; 를 유지하는 것이 핵심이다.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1154&quot; data-origin-height=&quot;646&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cXWQfh/dJMcaihjFk8/wSjyxrTThqmBVnk6UwtYO0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cXWQfh/dJMcaihjFk8/wSjyxrTThqmBVnk6UwtYO0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cXWQfh/dJMcaihjFk8/wSjyxrTThqmBVnk6UwtYO0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcXWQfh%2FdJMcaihjFk8%2FwSjyxrTThqmBVnk6UwtYO0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;780&quot; height=&quot;437&quot; data-origin-width=&quot;1154&quot; data-origin-height=&quot;646&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1289&quot; data-start=&quot;1188&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1355&quot; data-start=&quot;1291&quot; data-ke-size=&quot;size16&quot;&gt;Tracking은 매 프레임마다 새롭게 검출된 객체들이 이전 프레임의 어떤 객체와 동일한지를 판단하는 과정이다.&lt;/p&gt;
&lt;p data-end=&quot;1380&quot; data-start=&quot;1357&quot; data-ke-size=&quot;size16&quot;&gt;이때 중요한 구성요소는 다음 두 가지이다:&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;1604&quot; data-start=&quot;1382&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;1503&quot; data-start=&quot;1382&quot;&gt;&lt;b&gt;Appearance Representation (외형 표현)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1503&quot; data-start=&quot;1428&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1458&quot; data-start=&quot;1428&quot;&gt;객체의 외형 정보를 어떤 방식으로 표현할 것인가&lt;/li&gt;
&lt;li data-end=&quot;1503&quot; data-start=&quot;1462&quot;&gt;RGB, Histogram, Feature Descriptor 등 사용&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1604&quot; data-start=&quot;1504&quot;&gt;&lt;b&gt;Data Association (데이터 연관성)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1604&quot; data-start=&quot;1543&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1578&quot; data-start=&quot;1543&quot;&gt;이전 프레임의 객체와 현재 프레임의 객체를 매칭하는 과정&lt;/li&gt;
&lt;li data-end=&quot;1604&quot; data-start=&quot;1582&quot;&gt;유사도, 거리, 확률 기반 매칭 수행&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
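&lt;p data-ke-size=&quot;size16&quot;&gt;위 두 구성요소를 합친 최소 스케치. 가상의 패치에서 정규화 히스토그램(Appearance Representation)을 뽑고, Bhattacharyya 유사도로 후보 중 가장 비슷한 것을 고른다(Data Association). 패치와 값의 범위는 설명용 가정이다.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(7)

def color_histogram(patch, bins=16):
    """Appearance Representation: 픽셀 값 분포를 정규화 히스토그램으로 표현"""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def bhattacharyya(p, q):
    """두 히스토그램의 유사도 (동일 분포일수록 1에 가까움)"""
    return float(np.sum(np.sqrt(p * q)))

template = rng.integers(0, 256, size=(32, 32))   # 추적 대상 패치 (가상)
candidates = [rng.integers(0, 128, size=(32, 32)) for _ in range(3)]  # 어두운 후보들
candidates.append(template.copy())               # 동일 객체도 후보에 포함

h_t = color_histogram(template)
scores = [bhattacharyya(h_t, color_histogram(c)) for c in candidates]
best = int(np.argmax(scores))    # Data Association: 가장 유사한 후보 선택
```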
&lt;h2 data-end=&quot;1650&quot; data-start=&quot;1611&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;1. Appearance Representation (외형 표현)&lt;/b&gt;&lt;/h2&gt;
&lt;p data-end=&quot;3466&quot; data-start=&quot;3432&quot; data-ke-size=&quot;size16&quot;&gt;Tracking의 주요 난제 (Challenges)&lt;/p&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-end=&quot;3765&quot; data-start=&quot;3468&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody data-end=&quot;3765&quot; data-start=&quot;3496&quot;&gt;
&lt;tr data-end=&quot;3543&quot; data-start=&quot;3496&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;3522&quot; data-start=&quot;3496&quot;&gt;&lt;b&gt;Deformation (형태 변화)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;3543&quot; data-start=&quot;3522&quot; data-col-size=&quot;sm&quot;&gt;객체의 포즈나 모습이 계속 바뀜&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;3592&quot; data-start=&quot;3544&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;3565&quot; data-start=&quot;3544&quot;&gt;&lt;b&gt;Occlusion (가림)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;3592&quot; data-start=&quot;3565&quot; data-col-size=&quot;sm&quot;&gt;객체가 다른 물체에 의해 일시적으로 가려짐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;3638&quot; data-start=&quot;3593&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;3620&quot; data-start=&quot;3593&quot;&gt;&lt;b&gt;Fast Motion (급격한 이동)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;3638&quot; data-start=&quot;3620&quot; data-col-size=&quot;sm&quot;&gt;짧은 시간에 큰 이동 발생&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;3703&quot; data-start=&quot;3639&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;3673&quot; data-start=&quot;3639&quot;&gt;&lt;b&gt;Illumination Change (조명 변화)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;3703&quot; data-start=&quot;3673&quot; data-col-size=&quot;sm&quot;&gt;빛의 세기와 색 변화로 Appearance 변화&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;3765&quot; data-start=&quot;3704&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;3737&quot; data-start=&quot;3704&quot;&gt;&lt;b&gt;Scale / Rotation Variation&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;3765&quot; data-start=&quot;3737&quot; data-col-size=&quot;sm&quot;&gt;크기나 방향 변화에 대한 불변성 확보 어려움&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
이러한 이유로 Object Tracking은 &lt;b&gt;컴퓨터 비전 분야에서 가장 어려운 태스크 중 하나&lt;/b&gt;로 꼽힌다.&lt;/div&gt;
&lt;h3 data-end=&quot;1901&quot; data-start=&quot;1883&quot; data-ke-size=&quot;size23&quot;&gt;대표적인 전통적 특징 표현&lt;/h3&gt;
&lt;div&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;723&quot; data-origin-height=&quot;192&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vCWaQ/dJMcacVHH42/h4DAQ0ejmkPy1fC1XnS2ZK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vCWaQ/dJMcacVHH42/h4DAQ0ejmkPy1fC1XnS2ZK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vCWaQ/dJMcacVHH42/h4DAQ0ejmkPy1fC1XnS2ZK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvCWaQ%2FdJMcacVHH42%2Fh4DAQ0ejmkPy1fC1XnS2ZK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;723&quot; height=&quot;192&quot; data-origin-width=&quot;723&quot; data-origin-height=&quot;192&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;br /&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-end=&quot;2179&quot; data-start=&quot;1903&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody data-end=&quot;2179&quot; data-start=&quot;1931&quot;&gt;
&lt;tr data-end=&quot;1975&quot; data-start=&quot;1931&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;1955&quot; data-start=&quot;1931&quot;&gt;&lt;b&gt;Histogram (히스토그램)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;1975&quot; data-start=&quot;1955&quot; data-col-size=&quot;sm&quot;&gt;픽셀 값의 분포를 확률로 표현&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2049&quot; data-start=&quot;1976&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2020&quot; data-start=&quot;1976&quot;&gt;&lt;b&gt;HOG (Histogram of Oriented Gradients)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;2049&quot; data-start=&quot;2020&quot; data-col-size=&quot;sm&quot;&gt;Gradient 방향 분포를 이용한 형태 특징&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2122&quot; data-start=&quot;2050&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2097&quot; data-start=&quot;2050&quot;&gt;&lt;b&gt;SIFT (Scale-Invariant Feature Transform)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;2122&quot; data-start=&quot;2097&quot; data-col-size=&quot;sm&quot;&gt;크기&amp;middot;회전에 불변한 키포인트 기반 특징&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2179&quot; data-start=&quot;2123&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2147&quot; data-start=&quot;2123&quot;&gt;&lt;b&gt;Optical Flow (광류)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;2179&quot; data-start=&quot;2147&quot; data-col-size=&quot;sm&quot;&gt;픽셀의 움직임 벡터를 계산하여 모션 기반 추적 수행&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;h2 data-end=&quot;2212&quot; data-start=&quot;2186&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;2. Association (연관성 판단)&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;562&quot; data-origin-height=&quot;219&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zifpk/dJMcakzq2No/a4VsKhqe9IPuwAXLHLLkH1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zifpk/dJMcakzq2No/a4VsKhqe9IPuwAXLHLLkH1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zifpk/dJMcakzq2No/a4VsKhqe9IPuwAXLHLLkH1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fzifpk%2FdJMcakzq2No%2Fa4VsKhqe9IPuwAXLHLLkH1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;562&quot; height=&quot;219&quot; data-origin-width=&quot;562&quot; data-origin-height=&quot;219&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2305&quot; data-start=&quot;2214&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2305&quot; data-start=&quot;2214&quot; data-ke-size=&quot;size16&quot;&gt;Given the object to track (the Template or Target), association is &lt;b&gt;the process of finding the most similar object&lt;/b&gt; among the candidate regions (Candidates) in the next frame.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2427&quot; data-start=&quot;2307&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2335&quot; data-start=&quot;2307&quot;&gt;Select the candidate region with the &lt;b&gt;highest similarity&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2380&quot; data-start=&quot;2336&quot;&gt;When tracking multiple objects, assign an ID to each object and perform &lt;b&gt;Matching&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2427&quot; data-start=&quot;2381&quot;&gt;&lt;b&gt;Velocity and position constraints (Constraint)&lt;/b&gt; can be added when needed&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2503&quot; data-start=&quot;2429&quot; data-ke-size=&quot;size16&quot;&gt;Matching can also exploit the Appearance difference between the target and the background, computed from the difference against the background (Background).&lt;/p&gt;
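As a minimal illustration of the association step (a toy sketch only; real trackers use far richer appearance models), we can pick the candidate patch most similar to the template, using negative sum of squared differences as the similarity score:

```python
import numpy as np

def associate(template, candidates):
    """Pick the index of the candidate patch most similar to the template.

    Similarity here is the negative sum of squared differences (SSD),
    so the best match is the candidate with the smallest SSD.
    """
    scores = [-np.sum((template - c) ** 2) for c in candidates]
    return int(np.argmax(scores))

# Toy example: the second candidate is identical to the template.
template = np.array([[1.0, 2.0], [3.0, 4.0]])
candidates = [template + 5.0, template.copy(), template * 0.0]
print(associate(template, candidates))  # 1
```

Constraints on velocity or position would simply exclude (or down-weight) candidates before this scoring step.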
&lt;p data-end=&quot;2503&quot; data-start=&quot;2429&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-end=&quot;2503&quot; data-start=&quot;2429&quot; data-ke-size=&quot;size20&quot;&gt;Traditional Methods vs. Deep Learning Methods&lt;/h4&gt;
&lt;h2 data-end=&quot;2562&quot; data-start=&quot;2510&quot; data-ke-size=&quot;size26&quot;&gt;1. Traditional Method: Histogram Back Projection + Mean Shift&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1084&quot; data-origin-height=&quot;563&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/OG5Oh/dJMcaap3bmm/HAz7541amdGLRZBKJ7kgE0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/OG5Oh/dJMcaap3bmm/HAz7541amdGLRZBKJ7kgE0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/OG5Oh/dJMcaap3bmm/HAz7541amdGLRZBKJ7kgE0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FOG5Oh%2FdJMcaap3bmm%2FHAz7541amdGLRZBKJ7kgE0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;777&quot; height=&quot;404&quot; data-origin-width=&quot;1084&quot; data-origin-height=&quot;563&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 data-end=&quot;2597&quot; data-start=&quot;2564&quot; data-ke-size=&quot;size23&quot;&gt;(1) Histogram Back Projection&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2729&quot; data-start=&quot;2599&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2644&quot; data-start=&quot;2599&quot;&gt;Select the target object in the first frame and compute a &lt;b&gt;histogram of its pixel values&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2688&quot; data-start=&quot;2645&quot;&gt;Remap the image so each pixel holds the probability of its value under that histogram &amp;rarr; this yields a &amp;ldquo;probability map&amp;rdquo;&lt;/li&gt;
&lt;li data-end=&quot;2729&quot; data-start=&quot;2689&quot;&gt;The brighter (more probable) a pixel in this map, the more likely the object is present there&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2773&quot; data-start=&quot;2731&quot; data-ke-size=&quot;size16&quot;&gt;In short, this finds similar regions across the whole video based on the target's color distribution.&lt;/p&gt;
&lt;h3 data-end=&quot;2815&quot; data-start=&quot;2775&quot; data-ke-size=&quot;size23&quot;&gt;(2) Mean Shift (Mode Seeking)&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2952&quot; data-start=&quot;2817&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2864&quot; data-start=&quot;2817&quot;&gt;Starting from the histogram back-projection result, &lt;b&gt;shift the window toward regions of high probability density&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2905&quot; data-start=&quot;2865&quot;&gt;Track the object by moving toward bright (high-probability) regions in every frame&lt;/li&gt;
&lt;li data-end=&quot;2952&quot; data-start=&quot;2906&quot;&gt;Because it moves toward the &amp;ldquo;mode&amp;rdquo; (the peak), it can be viewed as &lt;b&gt;clustering-based optimization&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3085&quot; data-start=&quot;2954&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Mean Shift Tracking (including CamShift)&lt;/b&gt;&lt;br /&gt;Appearance = histogram, Association = mean-shift movement &amp;rarr; the representative traditional Visual Object Tracking method&lt;/p&gt;
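The mode-seeking loop can be sketched in pure NumPy on a toy probability map (real systems typically use OpenCV's back-projection and meanShift; this is only an illustrative version of the window-shifting step):

```python
import numpy as np

def mean_shift(prob, start, win=5, iters=20):
    """Shift a square window toward the probability-weighted centroid.

    prob  : 2-D probability map (e.g., a histogram back-projection)
    start : (row, col) initial window center
    """
    r, c = start
    h = win // 2
    for _ in range(iters):
        r0, r1 = max(r - h, 0), min(r + h + 1, prob.shape[0])
        c0, c1 = max(c - h, 0), min(c + h + 1, prob.shape[1])
        window = prob[r0:r1, c0:c1]
        if window.sum() == 0:
            break
        rows, cols = np.mgrid[r0:r1, c0:c1]
        nr = int(round((rows * window).sum() / window.sum()))
        nc = int(round((cols * window).sum() / window.sum()))
        if (nr, nc) == (r, c):   # converged on a mode
            break
        r, c = nr, nc
    return r, c

# A blob of high probability around (12, 14); start the window at (10, 12).
prob = np.zeros((20, 20))
prob[10:15, 12:17] = 1.0
print(mean_shift(prob, (10, 12)))  # (12, 14)
```

Each iteration moves the window to the centroid of the probability mass it currently covers, which is exactly the "move toward the bright region" behavior described above.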
&lt;p data-end=&quot;3085&quot; data-start=&quot;2954&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;3127&quot; data-start=&quot;3092&quot; data-ke-size=&quot;size26&quot;&gt;2. Deep-Learning-Based Visual Object Tracking&lt;/h2&gt;
&lt;p data-end=&quot;3230&quot; data-start=&quot;3129&quot; data-ke-size=&quot;size16&quot;&gt;Traditional methods are fragile to Appearance changes, so a &lt;b&gt;Neural Network&lt;/b&gt; is used to perform &lt;b&gt;Feature Representation (representation learning)&lt;/b&gt;.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3362&quot; data-start=&quot;3232&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3293&quot; data-start=&quot;3232&quot;&gt;Learn a &lt;b&gt;Feature Embedding&lt;/b&gt; instead of raw &lt;b&gt;Color Intensity&lt;/b&gt; for better discriminative power&lt;/li&gt;
&lt;li data-end=&quot;3326&quot; data-start=&quot;3294&quot;&gt;Can pick out the correct target even in scenes full of similar objects&lt;/li&gt;
&lt;li data-end=&quot;3362&quot; data-start=&quot;3327&quot;&gt;The focus is on &amp;ldquo;keeping a consistent representation&amp;rdquo; rather than the object's semantic meaning&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3425&quot; data-start=&quot;3364&quot; data-ke-size=&quot;size16&quot;&gt;That is, a &lt;b&gt;deep-learning-based Tracker&lt;/b&gt; learns a &lt;b&gt;consistent representation&lt;/b&gt; of objects to achieve robust tracking.&lt;/p&gt;
&lt;p data-end=&quot;2503&quot; data-start=&quot;2429&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;2503&quot; data-start=&quot;2429&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt; Distance Learning / Similarity Learning&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;One deep-learning approach to Object Tracking uses the distance/similarity between learned embeddings.&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;484&quot; data-start=&quot;213&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;385&quot; data-start=&quot;213&quot;&gt;&lt;b&gt;Distance Learning&lt;/b&gt; or &lt;b&gt;Similarity Learning&lt;/b&gt;&lt;br /&gt;- Learns the &lt;b&gt;distance&lt;/b&gt; or &lt;b&gt;similarity&lt;/b&gt; between two data points.&lt;br /&gt;- Similarity and distance can be understood as roughly &lt;b&gt;inverse&lt;/b&gt; to each other.&lt;/li&gt;
&lt;li data-end=&quot;484&quot; data-start=&quot;387&quot;&gt;Rather than a &lt;b&gt;classification&lt;/b&gt; problem (&amp;ldquo;which class does this datum belong to?&amp;rdquo;), it is the problem of learning &lt;b&gt;how similar two data points are&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; One-Shot Learning &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1015&quot; data-origin-height=&quot;569&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zd0WN/dJMcaa4EKVB/RC3mSsRMkWiW6EMAJNTVjK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zd0WN/dJMcaa4EKVB/RC3mSsRMkWiW6EMAJNTVjK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zd0WN/dJMcaa4EKVB/RC3mSsRMkWiW6EMAJNTVjK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fzd0WN%2FdJMcaa4EKVB%2FRC3mSsRMkWiW6EMAJNTVjK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;727&quot; height=&quot;408&quot; data-origin-width=&quot;1015&quot; data-origin-height=&quot;569&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;897&quot; data-start=&quot;733&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;568&quot; data-start=&quot;531&quot;&gt;A learning problem in which &lt;b&gt;only a single image per class&lt;/b&gt; is available.&lt;/li&gt;
&lt;li data-end=&quot;617&quot; data-start=&quot;569&quot;&gt;Conventional classifiers require large amounts of data, so training is hard.&lt;/li&gt;
&lt;li data-end=&quot;720&quot; data-start=&quot;618&quot;&gt;Example: &lt;b&gt;Omniglot Dataset&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;720&quot; data-start=&quot;649&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;668&quot; data-start=&quot;649&quot;&gt;Composed of characters from many ancient scripts.&lt;/li&gt;
&lt;li data-end=&quot;720&quot; data-start=&quot;671&quot;&gt;20-way one-shot &amp;rarr; 20 classes, with exactly one image per class.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;897&quot; data-start=&quot;857&quot;&gt;Humans perform one-shot learning without any extra training, by judging similarity; that is, &lt;b&gt;similarity judgment&lt;/b&gt; comes naturally to us.&lt;/li&gt;
&lt;li data-end=&quot;897&quot; data-start=&quot;857&quot;&gt;The goal of one-shot learning is to make &lt;b&gt;AI learn&lt;/b&gt;, as humans do, that &amp;ldquo;this one is similar to that one&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;970&quot; data-origin-height=&quot;534&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/FU79X/dJMcahipjCj/hFRCgKsPK6siWXQ0o0clkk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/FU79X/dJMcahipjCj/hFRCgKsPK6siWXQ0o0clkk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/FU79X/dJMcahipjCj/hFRCgKsPK6siWXQ0o0clkk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FFU79X%2FdJMcahipjCj%2FhFRCgKsPK6siWXQ0o0clkk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;651&quot; height=&quot;358&quot; data-origin-width=&quot;970&quot; data-origin-height=&quot;534&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1064&quot; data-start=&quot;996&quot;&gt;Rather than comparing raw visual appearance, the AI maps data into a &lt;b&gt;latent space via a neural network&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1118&quot; data-start=&quot;1065&quot;&gt;In this space, the model is trained so that &lt;b&gt;similar data lie close together and dissimilar data lie far apart&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;1162&quot; data-start=&quot;1125&quot; data-ke-size=&quot;size26&quot;&gt;Latent Space and Feature Embedding&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1408&quot; data-start=&quot;1164&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1227&quot; data-start=&quot;1164&quot;&gt;Input data are mapped into the latent space by a &lt;b&gt;model with shared weights&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1289&quot; data-start=&quot;1228&quot;&gt;The model converts each input into a &lt;b&gt;feature vector on which similarity is easy to compute&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1347&quot; data-start=&quot;1290&quot;&gt;In this space, &lt;b&gt;nearby vectors = similar data&lt;/b&gt; and &lt;b&gt;distant vectors = different data&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1408&quot; data-start=&quot;1348&quot;&gt;That is, the task is not Classification but an &lt;b&gt;Embedding + Similarity computation problem&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt; Siamese Network Architecture &lt;/b&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;961&quot; data-origin-height=&quot;514&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bUfmJ3/dJMcaaQ7zLT/BOHh9iaRoqrQbjgk7gVoSK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bUfmJ3/dJMcaaQ7zLT/BOHh9iaRoqrQbjgk7gVoSK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bUfmJ3/dJMcaaQ7zLT/BOHh9iaRoqrQbjgk7gVoSK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbUfmJ3%2FdJMcaaQ7zLT%2FBOHh9iaRoqrQbjgk7gVoSK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;783&quot; height=&quot;419&quot; data-origin-width=&quot;961&quot; data-origin-height=&quot;514&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Two identical networks share the same parameters.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: none;&quot; data-end=&quot;1610&quot; data-start=&quot;1498&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1610&quot; data-start=&quot;1556&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1585&quot; data-start=&quot;1556&quot;&gt;F: the same neural network (CNN, MLP, etc.)&lt;/li&gt;
&lt;li data-end=&quot;1610&quot; data-start=&quot;1588&quot;&gt;h1, h2: the embedding vectors of the two inputs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1673&quot; data-start=&quot;1611&quot;&gt;The &lt;b&gt;distance (similarity)&lt;/b&gt; between the two vectors is computed to learn positive/negative relationships.&lt;/li&gt;
&lt;/ul&gt;
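A minimal sketch of the Siamese idea, with a hand-made linear map standing in for F (the shared network is an assumption for illustration; a real tracker would use a CNN). Both inputs pass through identical weights and the embeddings are compared by Euclidean distance:

```python
import numpy as np

# Shared weights: the *same* parameters embed both inputs (a toy linear F).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 0.5]])

def F(x):
    """The shared branch network; both inputs pass through identical F."""
    return W @ x

def distance(x1, x2):
    h1, h2 = F(x1), F(x2)          # h1, h2: the two embedding vectors
    return float(np.linalg.norm(h1 - h2))

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 0.1, 0.0])      # nearly the same input as a
c = np.array([0.0, 0.0, 1.0])      # a very different input
print(distance(a, b), distance(a, c))  # the a-b distance is the smaller one
```

Because the two branches are literally the same parameters, similar inputs are guaranteed to land near each other in the embedding space.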
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Contrastive Learning &lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1102&quot; data-origin-height=&quot;512&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cl5pax/dJMcaaXS8GJ/8htJAgZFOBlcUetkYVBbN1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cl5pax/dJMcaaXS8GJ/8htJAgZFOBlcUetkYVBbN1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cl5pax/dJMcaaXS8GJ/8htJAgZFOBlcUetkYVBbN1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcl5pax%2FdJMcaaXS8GJ%2F8htJAgZFOBlcUetkYVBbN1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;720&quot; height=&quot;335&quot; data-origin-width=&quot;1102&quot; data-origin-height=&quot;512&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1859&quot; data-start=&quot;1716&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1757&quot; data-start=&quot;1716&quot;&gt;&lt;b&gt;Positive pair:&lt;/b&gt; same class (two similar data points)&lt;/li&gt;
&lt;li data-end=&quot;1798&quot; data-start=&quot;1758&quot;&gt;&lt;b&gt;Negative pair:&lt;/b&gt; different classes (two dissimilar data points)&lt;/li&gt;
&lt;li data-end=&quot;1859&quot; data-start=&quot;1799&quot;&gt;Goal:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1859&quot; data-start=&quot;1807&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1832&quot; data-start=&quot;1807&quot;&gt;pull Positive pairs &lt;b&gt;close together&lt;/b&gt;,&lt;/li&gt;
&lt;li data-end=&quot;1859&quot; data-start=&quot;1835&quot;&gt;push Negative pairs &lt;b&gt;far apart&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2100&quot; data-start=&quot;1889&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1913&quot; data-start=&quot;1889&quot;&gt;One option is Binary Cross Entropy (BCE) with Positive pair &amp;rarr; target 1 and Negative pair &amp;rarr; target 0; otherwise, the Contrastive Loss can be used.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Contrastive Loss&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;608&quot; data-origin-height=&quot;150&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/8SAFC/dJMcake8hXF/21TqTDGuiXp8vdNkwEPV4K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/8SAFC/dJMcake8hXF/21TqTDGuiXp8vdNkwEPV4K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/8SAFC/dJMcake8hXF/21TqTDGuiXp8vdNkwEPV4K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F8SAFC%2FdJMcake8hXF%2F21TqTDGuiXp8vdNkwEPV4K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;608&quot; height=&quot;150&quot; data-origin-width=&quot;608&quot; data-origin-height=&quot;150&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2422&quot; data-start=&quot;2371&quot;&gt;Positive pair: &amp;rarr; minimize the distance between the two vectors (Pull together)&lt;/li&gt;
&lt;li data-end=&quot;2474&quot; data-start=&quot;2423&quot;&gt;Negative pair: &amp;rarr; push the two vectors at least &amp;epsilon; apart (Push apart)&lt;/li&gt;
&lt;li data-end=&quot;2544&quot; data-start=&quot;2475&quot;&gt;Iterate:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2544&quot; data-start=&quot;2490&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2619&quot; data-start=&quot;2585&quot;&gt;Positive pair &amp;rarr; the loss decreases as the pair moves closer&lt;/li&gt;
&lt;li data-end=&quot;2660&quot; data-start=&quot;2620&quot;&gt;Negative pair &amp;rarr; the loss persists until the pair is at least &amp;epsilon; apart&lt;/li&gt;
&lt;li data-end=&quot;2701&quot; data-start=&quot;2661&quot;&gt;Once the distance is large enough, the gradient becomes 0 &amp;rarr; updates stop.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
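The behavior described above can be written out directly. This sketch assumes the common form of the contrastive loss with Euclidean distance and margin eps (epsilon):

```python
import numpy as np

def contrastive_loss(h1, h2, y, eps=1.0):
    """Contrastive loss for one pair of embeddings.

    y = 1 (positive pair): loss = d**2, pulling the pair together.
    y = 0 (negative pair): loss = max(0, eps - d)**2, pushing the pair
        apart until their distance reaches at least eps (then loss = 0).
    """
    d = np.linalg.norm(h1 - h2)
    return float(y * d**2 + (1 - y) * max(0.0, eps - d) ** 2)

h1 = np.array([0.0, 0.0])
h2 = np.array([1.0, 0.0])          # distance exactly 1.0

print(contrastive_loss(h1, h2, y=1))  # 1.0 (positive pair still far apart)
print(contrastive_loss(h1, h2, y=0))  # 0.0 (negative pair already eps apart)
```

Note how the negative-pair term hits exactly zero once the margin is reached, which is why the gradient vanishes and updates stop for well-separated negatives.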
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Face Recognition Example &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1051&quot; data-origin-height=&quot;508&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/uB5OL/dJMcaeMKL8U/ZOzGoPkN0y7u2CtvB5kN5K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/uB5OL/dJMcaeMKL8U/ZOzGoPkN0y7u2CtvB5kN5K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/uB5OL/dJMcaeMKL8U/ZOzGoPkN0y7u2CtvB5kN5K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FuB5OL%2FdJMcaeMKL8U%2FZOzGoPkN0y7u2CtvB5kN5K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;765&quot; height=&quot;370&quot; data-origin-width=&quot;1051&quot; data-origin-height=&quot;508&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2783&quot; data-start=&quot;2748&quot; data-ke-size=&quot;size16&quot;&gt;If face recognition is trained as a &lt;b&gt;classification problem&lt;/b&gt;,&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2828&quot; data-start=&quot;2786&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2814&quot; data-start=&quot;2786&quot;&gt;the model must be retrained every time a new person is added,&lt;/li&gt;
&lt;li data-end=&quot;2828&quot; data-start=&quot;2817&quot;&gt;which is impractical.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2862&quot; data-start=&quot;2830&quot; data-ke-size=&quot;size16&quot;&gt;- Overcoming this with Distance Learning&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3039&quot; data-start=&quot;2863&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2894&quot; data-start=&quot;2863&quot;&gt;The DB stores only &lt;b&gt;one representative face&lt;/b&gt; per person.&lt;/li&gt;
&lt;li data-end=&quot;2991&quot; data-start=&quot;2895&quot;&gt;When a new input image arrives:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2991&quot; data-start=&quot;2915&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2936&quot; data-start=&quot;2915&quot;&gt;Convert it into an embedding vector with a CNN.&lt;/li&gt;
&lt;li data-end=&quot;2968&quot; data-start=&quot;2939&quot;&gt;Compute the &lt;b&gt;distance&lt;/b&gt; to the embedding vectors stored in the DB.&lt;/li&gt;
&lt;li data-end=&quot;2991&quot; data-start=&quot;2971&quot;&gt;The closest vector = the same person.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;3039&quot; data-start=&quot;2992&quot;&gt;When a new person is added, &lt;b&gt;no retraining is needed&lt;/b&gt;; just add their vector to the DB.&lt;/li&gt;
&lt;/ul&gt;
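The DB-lookup scheme above can be sketched with hypothetical identities and hand-made embedding vectors (names and vectors here are invented for illustration; a real system would store CNN embeddings):

```python
import numpy as np

# Hypothetical face DB: one representative embedding per person.
db = {
    "alice": np.array([0.9, 0.1, 0.0]),
    "bob":   np.array([0.0, 0.8, 0.6]),
}

def identify(query_embedding, db):
    """Return the DB identity whose stored embedding is nearest to the query."""
    return min(db, key=lambda name: np.linalg.norm(db[name] - query_embedding))

query = np.array([1.0, 0.0, 0.1])        # embedding of a new photo
print(identify(query, db))                # alice

# Enrolling a new person needs no retraining: just insert one vector.
db["carol"] = np.array([0.0, 0.0, 1.0])
print(identify(np.array([0.1, 0.0, 0.9]), db))  # carol
```

This is exactly why the embedding approach scales: recognition is a nearest-neighbor search, and enrollment is a dictionary insert.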
&lt;h3 data-end=&quot;3090&quot; data-start=&quot;3076&quot; data-ke-size=&quot;size23&quot;&gt;Triplet Structure&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3163&quot; data-start=&quot;3091&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3129&quot; data-start=&quot;3091&quot;&gt;Three inputs (Anchor, Positive, Negative)&lt;/li&gt;
&lt;li data-end=&quot;3163&quot; data-start=&quot;3130&quot;&gt;Each passes through the same CNN to produce its own embedding vector.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Anchor&amp;nbsp;&amp;rarr;&amp;nbsp;CNN&amp;nbsp;&amp;rarr;&amp;nbsp;f(A) &lt;br /&gt;Positive&amp;nbsp;&amp;rarr;&amp;nbsp;CNN&amp;nbsp;&amp;rarr;&amp;nbsp;f(P) &lt;br /&gt;Negative&amp;nbsp;&amp;rarr;&amp;nbsp;CNN&amp;nbsp;&amp;rarr;&amp;nbsp;f(N)&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;474&quot; data-origin-height=&quot;140&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Vxgos/dJMcacnRKg3/Hawkk4Lhqa2Qv3Nvy5Qni0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Vxgos/dJMcacnRKg3/Hawkk4Lhqa2Qv3Nvy5Qni0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Vxgos/dJMcacnRKg3/Hawkk4Lhqa2Qv3Nvy5Qni0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FVxgos%2FdJMcacnRKg3%2FHawkk4Lhqa2Qv3Nvy5Qni0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;474&quot; height=&quot;140&quot; data-origin-width=&quot;474&quot; data-origin-height=&quot;140&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3409&quot; data-start=&quot;3308&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3323&quot; data-start=&quot;3308&quot;&gt;&lt;b&gt;&amp;alpha;:&lt;/b&gt; margin&lt;/li&gt;
&lt;li data-end=&quot;3409&quot; data-start=&quot;3324&quot;&gt;Goal:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3409&quot; data-start=&quot;3332&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3345&quot; data-start=&quot;3332&quot;&gt;make the A-P distance small,&lt;/li&gt;
&lt;li data-end=&quot;3361&quot; data-start=&quot;3348&quot;&gt;make the A-N distance large.&lt;/li&gt;
&lt;li data-end=&quot;3409&quot; data-start=&quot;3364&quot;&gt;Specifically, train so that the (A,P) distance is &lt;b&gt;smaller than the (A,N) distance by at least &amp;alpha;&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3498&quot; data-start=&quot;3422&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3498&quot; data-start=&quot;3462&quot;&gt;The margin &amp;alpha; is a distance criterion: &amp;ldquo;the gap must be at least this large.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
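The triplet objective can be sketched as follows, assuming the common squared-Euclidean form with margin alpha:

```python
import numpy as np

def triplet_loss(fa, fp, fn, alpha=0.2):
    """max(0, d(A,P) - d(A,N) + alpha) with squared Euclidean distances.

    The loss is zero once the A-N distance exceeds the A-P distance
    by at least the margin alpha; otherwise the triplet still pushes
    the embeddings apart.
    """
    d_ap = np.sum((fa - fp) ** 2)   # anchor-positive distance
    d_an = np.sum((fa - fn) ** 2)   # anchor-negative distance
    return float(max(0.0, d_ap - d_an + alpha))

fa = np.array([0.0, 0.0])           # f(A): anchor embedding
fp = np.array([0.1, 0.0])           # f(P): same identity as the anchor
fn = np.array([1.0, 0.0])           # f(N): a different identity

print(triplet_loss(fa, fp, fn))     # 0.0, the margin is already satisfied
```

Triplets whose loss is already zero contribute no gradient, which is one reason hard negative mining (discussed below) matters in practice.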
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Considerations When Training with Contrastive Learning&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1057&quot; data-origin-height=&quot;514&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b54a6f/dJMb99SdfgB/TPM9JleKCx5mxjOmk7fRjk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b54a6f/dJMb99SdfgB/TPM9JleKCx5mxjOmk7fRjk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b54a6f/dJMb99SdfgB/TPM9JleKCx5mxjOmk7fRjk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb54a6f%2FdJMb99SdfgB%2FTPM9JleKCx5mxjOmk7fRjk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;763&quot; height=&quot;371&quot; data-origin-width=&quot;1057&quot; data-origin-height=&quot;514&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;3569&quot; data-start=&quot;3544&quot; data-ke-size=&quot;size16&quot;&gt;(1) Data Augmentation&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3668&quot; data-start=&quot;3570&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3640&quot; data-start=&quot;3570&quot;&gt;Generate &lt;b&gt;Positive pairs&lt;/b&gt; by applying varied transformations to the same image&lt;br /&gt;(e.g., desaturation, rotation, cropping)&lt;/li&gt;
&lt;li data-end=&quot;3668&quot; data-start=&quot;3641&quot;&gt;The two transformed images are treated as the &lt;b&gt;same object&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3694&quot; data-start=&quot;3670&quot; data-ke-size=&quot;size16&quot;&gt;(2) Large Batch Size&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3806&quot; data-start=&quot;3695&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3748&quot; data-start=&quot;3695&quot;&gt;Each batch must contain a diverse mix of &lt;b&gt;Positive / Negative pairs&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;3787&quot; data-start=&quot;3749&quot;&gt;Larger batches make the contrastive loss more stable to compute.&lt;/li&gt;
&lt;li data-end=&quot;3806&quot; data-start=&quot;3788&quot;&gt;GPU memory demands are high.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3836&quot; data-start=&quot;3808&quot; data-ke-size=&quot;size16&quot;&gt;(3) Hard Negative Mining&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3965&quot; data-start=&quot;3837&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3881&quot; data-start=&quot;3837&quot;&gt;Effective training requires &lt;b&gt;hard Negative samples&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;3914&quot; data-start=&quot;3882&quot;&gt;Example: images that look similar but actually belong to a different class.&lt;/li&gt;
&lt;li data-end=&quot;3965&quot; data-start=&quot;3915&quot;&gt;Using hard Negatives rather than easy ones makes the model learn finer distinctions.&lt;/li&gt;
&lt;/ul&gt;</description>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/119</guid>
      <comments>https://c0mputermaster.tistory.com/119#entry119comment</comments>
      <pubDate>Thu, 25 Sep 2025 17:45:09 +0900</pubDate>
    </item>
    <item>
      <title>[Segmentation] DeepLab, Mask R-CNN, PanopticFPN</title>
      <link>https://c0mputermaster.tistory.com/118</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;이번 포스팅에서는 시멘틱 세그멘테이션(Semantic Segmentation) 에 대해 저번 포스팅에 이어서, 보다 세밀한 픽셀 단위의 인식 과정을 이해하기 위해 Dilated Convolution(또는 Atrous Convolution) 과 이를 기반으로 한 대표 모델인 DeepLab 시리즈를 살펴보고, 이어지는 Mask R-CNN 과 Panoptic FPN 으로 확장되는 세그멘테이션 계열 모델들을 함께 알아보겠다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&amp;nbsp;Recap) The Concept of Semantic Segmentation&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Semantic segmentation is the problem of &lt;b&gt;deciding which class (meaning) each pixel of an image belongs to&lt;/b&gt;; that is, for every pixel of the input image, predicting &amp;ldquo;is this pixel sky? road? a person?&amp;rdquo;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This requires considering not only each pixel's own information but also the &lt;b&gt;context of the surrounding region&lt;/b&gt;.&lt;br /&gt;&amp;rarr; The concept describing &amp;ldquo;how widely a pixel looks at its surroundings&amp;rdquo; is the &lt;b&gt;Receptive Field&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt; Receptive Field &lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;867&quot; data-origin-height=&quot;439&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bKAGz4/dJMb87UvqI6/xrXgf2PojMUcnWuJvvufK1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bKAGz4/dJMb87UvqI6/xrXgf2PojMUcnWuJvvufK1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bKAGz4/dJMb87UvqI6/xrXgf2PojMUcnWuJvvufK1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbKAGz4%2FdJMb87UvqI6%2FxrXgf2PojMUcnWuJvvufK1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;307&quot; data-origin-width=&quot;867&quot; data-origin-height=&quot;439&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Definition:&lt;/b&gt; the region of the input image needed to form a particular Feature.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;642&quot; data-start=&quot;540&quot;&gt;That is, it indicates &amp;ldquo;how many input pixels a given Feature was built from&amp;rdquo;.&lt;/li&gt;
&lt;li data-end=&quot;702&quot; data-start=&quot;643&quot;&gt;A wider Receptive Field &amp;rarr; Features can be learned that reflect broader contextual information.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Classic Approaches to Enlarging the Receptive Field&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1079&quot; data-start=&quot;952&quot;&gt;&lt;b&gt;Pooling (Max/Average Pooling)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1079&quot; data-start=&quot;994&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1044&quot; data-start=&quot;994&quot;&gt;Shrinks the Feature Map, indirectly enlarging the Receptive Field.&lt;/li&gt;
&lt;li data-end=&quot;1079&quot; data-start=&quot;1048&quot;&gt;Downside: resolution loss destroys fine detail.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1196&quot; data-start=&quot;1081&quot;&gt;&lt;b&gt;Strided Convolution&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1196&quot; data-start=&quot;1113&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1175&quot; data-start=&quot;1113&quot;&gt;Increases the convolution stride, compressing features while enlarging the Receptive Field.&lt;/li&gt;
&lt;li data-end=&quot;1196&quot; data-start=&quot;1179&quot;&gt;Downside: some information is lost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1293&quot; data-start=&quot;1198&quot;&gt;&lt;b&gt;Enlarging the Filter Size directly (e.g., 3&amp;times;3 &amp;rarr; 7&amp;times;7)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1293&quot; data-start=&quot;1245&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1269&quot; data-start=&quot;1245&quot;&gt;Parameter count and computation increase sharply.&lt;/li&gt;
&lt;li data-end=&quot;1293&quot; data-start=&quot;1273&quot;&gt;Compute cost and overfitting risk grow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Dilated (Atrous) Convolution&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;940&quot; data-origin-height=&quot;449&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c511SK/dJMb9PTUqUR/EEcmo1aWE09VKcfKu6bpu1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c511SK/dJMb9PTUqUR/EEcmo1aWE09VKcfKu6bpu1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c511SK/dJMb9PTUqUR/EEcmo1aWE09VKcfKu6bpu1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc511SK%2FdJMb9PTUqUR%2FEEcmo1aWE09VKcfKu6bpu1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;649&quot; height=&quot;310&quot; data-origin-width=&quot;940&quot; data-origin-height=&quot;449&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;필터 커널 사이에 &amp;ldquo;구멍(hole)&amp;rdquo;을 뚫어 간격을 띄운 형태의 컨볼루션. 즉, &lt;b&gt;같은 수의 파라미터(같은 3&amp;times;3 필터)&lt;/b&gt; 를 사용하면서 &lt;b&gt;더 넓은 영역을 커버&lt;/b&gt;하도록 설계한 기법.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1735&quot; data-start=&quot;1590&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1631&quot; data-start=&quot;1590&quot;&gt;&lt;b&gt;D = 1:&lt;/b&gt; 일반적인 Convolution (연속된 픽셀 관찰)&lt;/li&gt;
&lt;li data-end=&quot;1670&quot; data-start=&quot;1632&quot;&gt;&lt;b&gt;D = 2:&lt;/b&gt; 한 칸씩 건너뛰며 관찰 (구멍이 하나씩 생김)&lt;/li&gt;
&lt;li data-end=&quot;1735&quot; data-start=&quot;1671&quot;&gt;&lt;b&gt;D = 3:&lt;/b&gt; 두 칸을 띄워 관찰 &amp;rarr; 중간 픽셀은 보지 않지만, Receptive Field가 넓어짐.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3&amp;times;3 필터는 여전히 9개의 파라미터를 사용하지만 실제로는 5&amp;times;5, 7&amp;times;7 수준의 넓은 영역을 관찰하는 효과를 냄. 연산량 증가 없이 Receptive Field 확장 가능.&lt;/p&gt;
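&lt;p data-ke-size=&quot;size16&quot;&gt;이를 간단한 계산으로 확인해 볼 수 있다. 유효 커널 크기가 k + (k-1)(D-1)로 커진다는 일반적인 공식에 기반한 스케치이며, 함수명 effective_kernel_size는 설명용이다.&lt;/p&gt;

```python
def effective_kernel_size(k: int, d: int) -> int:
    """k x k 필터를 dilation d로 적용할 때 실제로 커버하는 영역의 한 변 길이.
    파라미터 수는 k*k 그대로 유지된다."""
    return k + (k - 1) * (d - 1)

# 3x3 필터 기준: D=1 -> 3x3, D=2 -> 5x5, D=3 -> 7x7 (파라미터는 항상 9개)
print([effective_kernel_size(3, d) for d in (1, 2, 3)])
```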
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;픽셀 단위 분류(Semantic Segmentation)는 디테일 정보와 전역 정보 모두 필요함. Pooling으로 해상도를 줄이는 대신 &lt;b&gt;Dilated Convolution&lt;/b&gt;을 사용해 &lt;b&gt;세밀한 픽셀 정보는 유지하면서 전역 문맥 정보까지 확보&lt;/b&gt; 가능.&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;DeepLab&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;452&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bi4Geo/dJMb9N9DeOW/iabk4KP17aV51S0sOQW3t1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bi4Geo/dJMb9N9DeOW/iabk4KP17aV51S0sOQW3t1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bi4Geo/dJMb9N9DeOW/iabk4KP17aV51S0sOQW3t1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbi4Geo%2FdJMb9N9DeOW%2Fiabk4KP17aV51S0sOQW3t1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;687&quot; height=&quot;385&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;452&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2627&quot; data-start=&quot;2579&quot;&gt;&lt;b&gt;Dilated Convolution을 도입한 대표적 시멘틱 세그멘테이션 모델&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2681&quot; data-start=&quot;2628&quot;&gt;Google이 제안, 버전별로 진화:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2681&quot; data-start=&quot;2653&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2681&quot; data-start=&quot;2653&quot;&gt;DeepLab v1 &amp;rarr; v2 &amp;rarr; v3 &amp;rarr; v3+&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2752&quot; data-start=&quot;2682&quot;&gt;&lt;b&gt;기본 철학: &lt;/b&gt;Receptive Field를 넓히되, 공간 해상도 손실 없이 픽셀 단위 분류 정밀도를 높이자.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2808&quot; data-start=&quot;2782&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(1) Encoder-Decoder 구조&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2905&quot; data-start=&quot;2809&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2840&quot; data-start=&quot;2809&quot;&gt;Encoder: CNN 기반 (예: ResNet)&lt;/li&gt;
&lt;li data-end=&quot;2905&quot; data-start=&quot;2841&quot;&gt;Decoder: Bilinear Upsampling 또는 Transposed Convolution으로 크기 복원&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2950&quot; data-start=&quot;2907&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(2) 핵심: Atrous (Dilated) Convolution 도입&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2996&quot; data-start=&quot;2951&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2996&quot; data-start=&quot;2951&quot;&gt;Receptive Field를 확장하면서 Feature의 공간적 크기는 유지.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3013&quot; data-start=&quot;2998&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(3) 결과적 문제점&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3089&quot; data-start=&quot;3014&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3089&quot; data-start=&quot;3014&quot;&gt;Upsampling만으로 복원하면 경계가 흐릿하고 Coarse한 결과 발생. &amp;rarr; 후처리(Post-processing) 필요.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; DeepLab의 후처리: Fully Connected Conditional Random Field (CRF)&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Segmentation 결과를 &lt;b&gt;정제(Refine)&lt;/b&gt;하기 위한 별도의 후처리 단계. 신경망 학습과는 별개로 수행되며, 예측된 확률 맵(Logit Map)을 입력으로 받는다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;733&quot; data-origin-height=&quot;458&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/scopF/dJMb9Qyv8sT/SNtYwNvdgAaRL2VV6KDNQ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/scopF/dJMb9Qyv8sT/SNtYwNvdgAaRL2VV6KDNQ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/scopF/dJMb9Qyv8sT/SNtYwNvdgAaRL2VV6KDNQ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FscopF%2FdJMb9Qyv8sT%2FSNtYwNvdgAaRL2VV6KDNQ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;629&quot; height=&quot;393&quot; data-origin-width=&quot;733&quot; data-origin-height=&quot;458&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3441&quot; data-start=&quot;3283&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3363&quot; data-start=&quot;3283&quot;&gt;입력 이미지와 예측된 Segmentation Score Map을 기반으로, 픽셀 간 유사도(색상&amp;middot;위치)를 고려해 경계선을 부드럽게 맞춤.&lt;/li&gt;
&lt;li data-end=&quot;3441&quot; data-start=&quot;3364&quot;&gt;&lt;b&gt;Belief Propagation&lt;/b&gt;, &lt;b&gt;MCMC&lt;/b&gt; 등 확률 그래프 추론 알고리즘이나 &lt;b&gt;Bilateral Filter&lt;/b&gt; 기반의 고속 근사(mean-field) 방법 사용.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3456&quot; data-start=&quot;3443&quot; data-ke-size=&quot;size16&quot;&gt;(2) Loss (&lt;span&gt;Unary Term + Pairwise Term&lt;/span&gt;)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3631&quot; data-start=&quot;3457&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3541&quot; data-start=&quot;3457&quot;&gt;&lt;b&gt;Unary Term:&lt;/b&gt;&lt;br /&gt;각 픽셀의 예측 확률(딥러닝 결과값). Cross Entropy Loss와 유사한 단일 픽셀 기반 정보.&lt;/li&gt;
&lt;li data-end=&quot;3631&quot; data-start=&quot;3543&quot;&gt;&lt;b&gt;Pairwise Term:&lt;/b&gt;&lt;br /&gt;인접 픽셀 간의 관계(색상 차이, 거리 등) 고려. 가까우면서 색상 유사한 픽셀은 같은 클래스로 유도.&lt;/li&gt;
&lt;/ul&gt;
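&lt;p data-ke-size=&quot;size16&quot;&gt;Unary Term과 Pairwise Term의 결합은 다음과 같은 에너지 함수 스케치로 나타낼 수 있다. crf_energy라는 함수명, Potts model 형태의 페널티, 단순화된 bilateral kernel(theta_p, theta_c)은 모두 설명을 위한 가정이며, 실제 Fully Connected CRF는 mean-field 근사로 훨씬 효율적으로 계산한다.&lt;/p&gt;

```python
import numpy as np

def crf_energy(unary, labels, pos, color, w=1.0, theta_p=3.0, theta_c=10.0):
    """Fully Connected CRF 에너지의 단순화 스케치.
    unary: (N, C) 픽셀별 -log 확률, labels: (N,) 라벨 할당,
    pos: (N, 2) 좌표, color: (N, 3) 색상. 에너지가 낮을수록 좋은 할당."""
    N = len(labels)
    # Unary Term: 각 픽셀의 예측 확률(딥러닝 결과값) 기반 비용
    E = sum(unary[i, labels[i]] for i in range(N))
    # Pairwise Term: 가깝고 색이 비슷한데 라벨이 다르면 페널티 (Potts model)
    for i in range(N):
        for j in range(i + 1, N):
            if labels[i] != labels[j]:
                dp = np.sum((pos[i] - pos[j]) ** 2)
                dc = np.sum((color[i] - color[j]) ** 2)
                E += w * np.exp(-dp / (2 * theta_p**2) - dc / (2 * theta_c**2))
    return E
```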
&lt;p data-end=&quot;3669&quot; data-start=&quot;3633&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Atrous Spatial Pyramid Pooling&lt;/b&gt; (ASPP)&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;422&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cXWLnX/dJMb8WrUqNz/vF7WqI2yKoGvLeAm0k38p1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cXWLnX/dJMb8WrUqNz/vF7WqI2yKoGvLeAm0k38p1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cXWLnX/dJMb8WrUqNz/vF7WqI2yKoGvLeAm0k38p1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcXWLnX%2FdJMb8WrUqNz%2FvF7WqI2yKoGvLeAm0k38p1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;583&quot; height=&quot;305&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;422&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;서로 다른 dilation factor를 병렬로 적용해 다양한 스케일의 문맥 정보를 취합&amp;middot;통합한다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;DeepLab v3+에서 ASPP와 전체 아키텍쳐&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;824&quot; data-origin-height=&quot;422&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Ii0R2/dJMb9QFhtQW/Wgvzg5boA9PV2QbpYyEuY1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Ii0R2/dJMb9QFhtQW/Wgvzg5boA9PV2QbpYyEuY1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Ii0R2/dJMb9QFhtQW/Wgvzg5boA9PV2QbpYyEuY1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FIi0R2%2FdJMb9QFhtQW%2FWgvzg5boA9PV2QbpYyEuY1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;678&quot; height=&quot;347&quot; data-origin-width=&quot;824&quot; data-origin-height=&quot;422&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;div&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 75.1163%;&quot; border=&quot;1&quot; data-end=&quot;4687&quot; data-start=&quot;4442&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody data-end=&quot;4687&quot; data-start=&quot;4498&quot;&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 31.2791%;&quot;&gt;&lt;b&gt;연산&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot;&gt;&lt;b&gt;Rate&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot;&gt;&lt;b&gt;역할&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;4522&quot; data-start=&quot;4498&quot;&gt;
&lt;td style=&quot;width: 31.2791%;&quot; data-col-size=&quot;sm&quot; data-end=&quot;4509&quot; data-start=&quot;4498&quot;&gt;1&amp;times;1 Conv&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot; data-end=&quot;4513&quot; data-start=&quot;4509&quot; data-col-size=&quot;sm&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot; data-end=&quot;4522&quot; data-start=&quot;4513&quot; data-col-size=&quot;sm&quot;&gt;채널 축소&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;4560&quot; data-start=&quot;4523&quot;&gt;
&lt;td style=&quot;width: 31.2791%;&quot; data-col-size=&quot;sm&quot; data-end=&quot;4534&quot; data-start=&quot;4523&quot;&gt;3&amp;times;3 Conv&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot; data-end=&quot;4538&quot; data-start=&quot;4534&quot; data-col-size=&quot;sm&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot; data-end=&quot;4560&quot; data-start=&quot;4538&quot; data-col-size=&quot;sm&quot;&gt;좁은 Receptive Field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;4599&quot; data-start=&quot;4561&quot;&gt;
&lt;td style=&quot;width: 31.2791%;&quot; data-col-size=&quot;sm&quot; data-end=&quot;4572&quot; data-start=&quot;4561&quot;&gt;3&amp;times;3 Conv&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot; data-end=&quot;4577&quot; data-start=&quot;4572&quot; data-col-size=&quot;sm&quot;&gt;12&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot; data-end=&quot;4599&quot; data-start=&quot;4577&quot; data-col-size=&quot;sm&quot;&gt;중간 Receptive Field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;4638&quot; data-start=&quot;4600&quot;&gt;
&lt;td style=&quot;width: 31.2791%;&quot; data-col-size=&quot;sm&quot; data-end=&quot;4611&quot; data-start=&quot;4600&quot;&gt;3&amp;times;3 Conv&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot; data-end=&quot;4616&quot; data-start=&quot;4611&quot; data-col-size=&quot;sm&quot;&gt;18&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot; data-end=&quot;4638&quot; data-start=&quot;4616&quot; data-col-size=&quot;sm&quot;&gt;넓은 Receptive Field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;4687&quot; data-start=&quot;4639&quot;&gt;
&lt;td style=&quot;width: 31.2791%;&quot; data-col-size=&quot;sm&quot; data-end=&quot;4655&quot; data-start=&quot;4639&quot;&gt;Image Pooling&lt;/td&gt;
&lt;td style=&quot;width: 9.18605%;&quot; data-end=&quot;4659&quot; data-start=&quot;4655&quot; data-col-size=&quot;sm&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;width: 34.5349%;&quot; data-end=&quot;4687&quot; data-start=&quot;4659&quot; data-col-size=&quot;sm&quot;&gt;전역 문맥(Global Context) 반영&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-end=&quot;4737&quot; data-start=&quot;4689&quot; data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 이 결과들을 &lt;b&gt;Concat &amp;rarr; 1&amp;times;1 Conv &amp;rarr; Feature Map 통합&lt;/b&gt;.&lt;/p&gt;
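&lt;p data-ke-size=&quot;size16&quot;&gt;ASPP의 병렬 dilated convolution &amp;rarr; 결합 흐름을 1차원으로 단순화한 스케치이다. dilated_conv1d, aspp_1d라는 함수명과, 1&amp;times;1 Conv 대신 평균으로 결합하는 부분은 설명용 가정이다.&lt;/p&gt;

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """'same' padding의 1D dilated convolution (naive 구현)."""
    k = len(w)
    pad = (k - 1) * d // 2  # 유효 커널 크기에 맞춘 padding
    xp = np.pad(x.astype(float), pad)
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * xp[i + j * d]  # d 간격으로 건너뛰며 샘플링
    return out

def aspp_1d(x, w, rates=(1, 6, 12, 18)):
    """서로 다른 dilation rate를 병렬 적용 후 결합 (1x1 Conv 대신 평균으로 단순화)."""
    branches = [dilated_conv1d(x, w, d) for d in rates]
    return np.mean(branches, axis=0)
```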
&lt;p data-end=&quot;4737&quot; data-start=&quot;4689&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Encoder&lt;/b&gt;: Dilated Conv + ASPP&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Decoder&lt;/b&gt;: Upsampling + Low-level Feature 병합&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;입력 해상도가 (H&amp;times;W)라면, 인코더 출력 Feature는 약 (H/16 &amp;times; W/16) 수준으로 축소됨.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Decoder는 이를 4배 업샘플링 후, (H/4 &amp;times; W/4) 레벨의 Feature와 Concat하여 세밀한 복원 수행.&lt;/p&gt;
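&lt;p data-ke-size=&quot;size16&quot;&gt;위 해상도 흐름은 다음과 같이 정리할 수 있다 (output stride 16을 가정한 스케치, 함수명은 설명용).&lt;/p&gt;

```python
def deeplab_v3plus_shapes(H, W):
    """DeepLab v3+의 대략적인 feature 해상도 흐름 (output stride 16 가정)."""
    enc = (H // 16, W // 16)   # Encoder 출력 (ASPP 적용 지점)
    up4 = (H // 4, W // 4)     # 4배 업샘플 후, low-level feature와 Concat
    out = (H, W)               # 최종 업샘플로 입력 해상도 복원
    return enc, up4, out
```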
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;5041&quot; data-start=&quot;5024&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;+ 그 후 최종 출력 및 후처리로 구성&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5149&quot; data-start=&quot;5043&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5080&quot; data-start=&quot;5043&quot;&gt;Softmax를 통과하여 각 픽셀별 클래스 확률 분포 생성.&lt;/li&gt;
&lt;li data-end=&quot;5104&quot; data-start=&quot;5081&quot;&gt;경계가 흐릿할 경우 CRF로 정제.&lt;/li&gt;
&lt;li data-end=&quot;5149&quot; data-start=&quot;5105&quot;&gt;반복 적용 시 점차 Ground Truth에 가까운 세그멘테이션 결과 획득.&lt;/li&gt;
&lt;/ul&gt;
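&lt;p data-ke-size=&quot;size16&quot;&gt;픽셀별 클래스 확률 분포 생성 단계는 다음과 같은 softmax 스케치로 표현할 수 있다 (함수명은 설명용).&lt;/p&gt;

```python
import numpy as np

def pixelwise_softmax(logits):
    """(H, W, C) logit map -> 각 픽셀별 클래스 확률 분포 (채널 합 = 1)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # 수치 안정화
    return e / e.sum(axis=-1, keepdims=True)
```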
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;912&quot; data-origin-height=&quot;441&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfaI11/dJMb9NIyFBH/4giH4b8aKcV8GYuS2K9rWk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfaI11/dJMb9NIyFBH/4giH4b8aKcV8GYuS2K9rWk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfaI11/dJMb9NIyFBH/4giH4b8aKcV8GYuS2K9rWk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbfaI11%2FdJMb9NIyFBH%2F4giH4b8aKcV8GYuS2K9rWk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;734&quot; height=&quot;355&quot; data-origin-width=&quot;912&quot; data-origin-height=&quot;441&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;5598&quot; data-start=&quot;5546&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;5598&quot; data-start=&quot;5546&quot; data-ke-size=&quot;size16&quot;&gt;확률 그래프 모델(Probabilistic Graphical Model) &lt;b&gt;(CRF의 배경)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;5782&quot; data-start=&quot;5600&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;5710&quot; data-start=&quot;5600&quot;&gt;CRF는 확률 그래프 모델(Probabilistic Graphical Model, PGM)의 한 종류로, &lt;b&gt;픽셀을 노드&lt;/b&gt;, &lt;b&gt;픽셀 간 관계를 엣지&lt;/b&gt;로 하는 그래프 구조로 표현한다.&lt;/li&gt;
&lt;li data-end=&quot;5782&quot; data-start=&quot;5711&quot;&gt;Fully Connected CRF는 모든 픽셀 쌍이 연결되어 있어, 전역적으로 경계 정제를 수행하는 효과를 가진다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;정리&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;724&quot; data-origin-height=&quot;258&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bQWoBo/dJMb9Mv7ipk/vBR9bh7mYBh9CdtPyyp7kK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bQWoBo/dJMb9Mv7ipk/vBR9bh7mYBh9CdtPyyp7kK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bQWoBo/dJMb9Mv7ipk/vBR9bh7mYBh9CdtPyyp7kK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbQWoBo%2FdJMb9Mv7ipk%2FvBR9bh7mYBh9CdtPyyp7kK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;690&quot; height=&quot;246&quot; data-origin-width=&quot;724&quot; data-origin-height=&quot;258&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Instance Segmentation&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;시멘틱 세그멘테이션은 영상의 각 &lt;b&gt;픽셀마다 클래스(의미)를 예측&lt;/b&gt;하는 문제이다. 예를 들어 자동차가 여러 대 있는 이미지에서, 모든 자동차 픽셀은 전부 같은 &amp;ldquo;car&amp;rdquo; 클래스로 라벨링된다. 따라서 &amp;ldquo;자동차 1&amp;rdquo;, &amp;ldquo;자동차 2&amp;rdquo;처럼 &lt;b&gt;개별 객체(Instance)를 구분하지 못한다&lt;/b&gt;는 한계가 존재한다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;847&quot; data-origin-height=&quot;440&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/q0WsY/dJMb9QFht6U/soK2PgpgEv6rYuTIQKiAY0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/q0WsY/dJMb9QFht6U/soK2PgpgEv6rYuTIQKiAY0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/q0WsY/dJMb9QFht6U/soK2PgpgEv6rYuTIQKiAY0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fq0WsY%2FdJMb9QFht6U%2FsoK2PgpgEv6rYuTIQKiAY0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;637&quot; height=&quot;331&quot; data-origin-width=&quot;847&quot; data-origin-height=&quot;440&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;458&quot; data-start=&quot;426&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;458&quot; data-start=&quot;426&quot; data-ke-size=&quot;size16&quot;&gt;Instance Segmentation은 &lt;b&gt;시멘틱 세그멘테이션 + 객체 인스턴스 구분(Object Instance Identification)&lt;/b&gt;으로, 각 픽셀의 클래스뿐 아니라 &lt;b&gt;&amp;ldquo;어떤 객체에 속하는 픽셀인가&amp;rdquo;&lt;/b&gt; 를 함께 예측한다.&lt;/p&gt;
&lt;p data-end=&quot;458&quot; data-start=&quot;426&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 시멘틱 세그멘테이션이 인스턴스를 구분하지 못하므로, &lt;b&gt;Object Detection 기능을 결합&lt;/b&gt;하는 것이 핵심 아이디어다. 객체 검출로 얻은 &lt;b&gt;Bounding Box&lt;/b&gt; 내부에서 픽셀 분류를 수행하면 각 객체별 마스크가 생성된다.&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;데이터셋과 챌린지로는 대표적으로 &lt;b&gt;DAVIS Challenge (Densely Annotated Video Segmentation)와 데이터셋&lt;/b&gt;이 있다. 각 객체 인스턴스별로 픽셀 단위 마스크가 제공되어 인스턴스 세그멘테이션 학습에 활용 가능하고, 여러 객체가 프레임마다 개별적으로 라벨링되어 있어 Tracking, Segmentation 연구의 기반이 된다.&lt;/p&gt;
&lt;h4 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Mask R-CNN&lt;/b&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;904&quot; data-origin-height=&quot;459&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bLOEQx/dJMb9Mv7iOL/h7SpAssAKe6zZInSGDZKAk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bLOEQx/dJMb9Mv7iOL/h7SpAssAKe6zZInSGDZKAk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bLOEQx/dJMb9Mv7iOL/h7SpAssAKe6zZInSGDZKAk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbLOEQx%2FdJMb9Mv7iOL%2Fh7SpAssAKe6zZInSGDZKAk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;904&quot; height=&quot;459&quot; data-origin-width=&quot;904&quot; data-origin-height=&quot;459&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1569&quot; data-start=&quot;1543&quot; data-ke-size=&quot;size16&quot;&gt;Mask R-CNN의 핵심 아이디어&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1720&quot; data-start=&quot;1571&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1633&quot; data-start=&quot;1571&quot;&gt;Faster R-CNN에 &lt;b&gt;Segmentation Mask Prediction Branch&lt;/b&gt;를 추가.&lt;/li&gt;
&lt;li data-end=&quot;1720&quot; data-start=&quot;1634&quot;&gt;즉, Object Detection (Faster R-CNN) + Semantic Segmentation = Instance Segmentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;837&quot; data-origin-height=&quot;371&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/JA07L/dJMb9NaIBz3/BvQbOgLb1DckEVSKKBmwk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/JA07L/dJMb9NaIBz3/BvQbOgLb1DckEVSKKBmwk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/JA07L/dJMb9NaIBz3/BvQbOgLb1DckEVSKKBmwk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FJA07L%2FdJMb9NaIBz3%2FBvQbOgLb1DckEVSKKBmwk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;724&quot; height=&quot;321&quot; data-origin-width=&quot;837&quot; data-origin-height=&quot;371&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;Classification + Bounding Box Regression + Mask Prediction까지 세 가지를 동시에 학습하는 &lt;b&gt;Multi-Task Learning&lt;/b&gt; 구조이다.&amp;nbsp;&lt;b&gt;ROI Pooling 대신 ROI Align이 들어간 형태&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Multi-Task Learning 효과&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;756&quot; data-start=&quot;679&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2032&quot; data-start=&quot;1980&quot;&gt;서로 관련성 높은 태스크를 함께 학습하면 &lt;b&gt;공유 Feature 표현력이 향상&lt;/b&gt;된다.&lt;/li&gt;
&lt;li data-end=&quot;2087&quot; data-start=&quot;2033&quot;&gt;Mask Branch 추가만으로도 Classification 성능이 향상되는 경우가 있다.&lt;/li&gt;
&lt;li data-end=&quot;2153&quot; data-start=&quot;2088&quot;&gt;이유: Segmentation Loss 학습 중 추출된 세밀한 Feature가 다른 태스크에도 도움이 되기 때문.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre id=&quot;code_1760798924720&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;입력 이미지  
   &amp;darr;  
Backbone CNN  
   &amp;darr;  
Region Proposal Network (RPN)  
   &amp;darr;  
ROI Align &amp;rarr; Feature Extraction  
   ├── Classification + Bounding Box Regression (Faster R-CNN 기능)  
   └── Mask Head (FCN 기반 Segmentation Branch)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2451&quot; data-start=&quot;2409&quot; data-ke-size=&quot;size16&quot;&gt;각 ROI마다 &lt;b&gt;14&amp;times;14&amp;times;256 Feature Map&lt;/b&gt;을 생성한 뒤&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2556&quot; data-start=&quot;2454&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2505&quot; data-start=&quot;2454&quot;&gt;Classification + Box Regression &amp;rarr; 객체 종류 및 위치 예측&lt;/li&gt;
&lt;li data-end=&quot;2556&quot; data-start=&quot;2508&quot;&gt;Mask Branch &amp;rarr; 픽셀 단위 이진 마스크 예측 (0 = 배경, 1 = 객체)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2921&quot; data-start=&quot;2895&quot; data-ke-size=&quot;size16&quot;&gt;마스크 예측 (14&amp;times;14&amp;times;C 출력)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3062&quot; data-start=&quot;2923&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2981&quot; data-start=&quot;2923&quot;&gt;클래스 수 C 만큼 채널을 두고, 각 채널은 해당 클래스의 Binary Mask (0 또는 1).&lt;/li&gt;
&lt;li data-end=&quot;3023&quot; data-start=&quot;2982&quot;&gt;예: 사람 &amp;rarr; 14&amp;times;14 Mask, 개 &amp;rarr; 14&amp;times;14 Mask 등.&lt;/li&gt;
&lt;li data-end=&quot;3062&quot; data-start=&quot;3024&quot;&gt;Upsampling 을 통해 입력 크기 (H&amp;times;W) 에 맞게 복원.&lt;/li&gt;
&lt;/ul&gt;
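&lt;p data-ke-size=&quot;size16&quot;&gt;클래스별 채널에서 이진 마스크를 얻는 과정은 다음과 같이 스케치할 수 있다. Mask R-CNN은 클래스별 독립 sigmoid를 사용하며(클래스 간 softmax 경쟁 없음), select_instance_mask라는 함수명과 threshold 0.5는 설명용 가정이다.&lt;/p&gt;

```python
import numpy as np

def select_instance_mask(mask_logits, cls_id, thresh=0.5):
    """Mask head 출력 (14, 14, C)에서 예측 클래스 채널만 골라 이진 마스크 생성.
    각 채널에 독립적인 sigmoid를 적용한 뒤 threshold로 0/1 마스크를 만든다."""
    prob = 1.0 / (1.0 + np.exp(-mask_logits[:, :, cls_id]))
    return (prob > thresh).astype(np.uint8)
```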
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ROI Pooling vs ROI Align&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;874&quot; data-origin-height=&quot;431&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ujPiF/dJMb8V0PZIJ/A7AqBLO8Xy8zTJDAGDWvK0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ujPiF/dJMb8V0PZIJ/A7AqBLO8Xy8zTJDAGDWvK0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ujPiF/dJMb8V0PZIJ/A7AqBLO8Xy8zTJDAGDWvK0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FujPiF%2FdJMb8V0PZIJ%2FA7AqBLO8Xy8zTJDAGDWvK0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;694&quot; height=&quot;342&quot; data-origin-width=&quot;874&quot; data-origin-height=&quot;431&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-end=&quot;2842&quot; data-start=&quot;2596&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;ROI Pooling&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;ROI Align&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2724&quot; data-start=&quot;2666&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2671&quot; data-start=&quot;2666&quot;&gt;연산&lt;/td&gt;
&lt;td data-end=&quot;2700&quot; data-start=&quot;2671&quot; data-col-size=&quot;sm&quot;&gt;Feature Map 을 정수 좌표로 강제 맞춤&lt;/td&gt;
&lt;td data-end=&quot;2724&quot; data-start=&quot;2700&quot; data-col-size=&quot;sm&quot;&gt;&lt;b&gt;부동소수점 좌표 보정 및 보간&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2782&quot; data-start=&quot;2725&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2731&quot; data-start=&quot;2725&quot;&gt;문제점&lt;/td&gt;
&lt;td data-end=&quot;2750&quot; data-start=&quot;2731&quot; data-col-size=&quot;sm&quot;&gt;정수 좌표 매핑 시 오차 발생&lt;/td&gt;
&lt;td data-end=&quot;2782&quot; data-start=&quot;2750&quot; data-col-size=&quot;sm&quot;&gt;위치 정확도 향상 (Interpolation 사용)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;2842&quot; data-start=&quot;2783&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;2788&quot; data-start=&quot;2783&quot;&gt;방식&lt;/td&gt;
&lt;td data-end=&quot;2802&quot; data-start=&quot;2788&quot; data-col-size=&quot;sm&quot;&gt;Max Pooling&lt;/td&gt;
&lt;td data-end=&quot;2842&quot; data-start=&quot;2802&quot; data-col-size=&quot;sm&quot;&gt;Bilinear Interpolation &amp;rarr; Max Pooling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p data-end=&quot;2888&quot; data-start=&quot;2844&quot; data-ke-size=&quot;size16&quot;&gt;&amp;rarr; &lt;b&gt;ROI Align&lt;/b&gt; 은 Mask R-CNN의 정확도 상승 요소 중 하나이다.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;918&quot; data-origin-height=&quot;411&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bmn4ge/dJMb9gRrQA8/G3InKYmLNL4QUsX1n5zt70/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bmn4ge/dJMb9gRrQA8/G3InKYmLNL4QUsX1n5zt70/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bmn4ge/dJMb9gRrQA8/G3InKYmLNL4QUsX1n5zt70/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbmn4ge%2FdJMb9gRrQA8%2FG3InKYmLNL4QUsX1n5zt70%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;759&quot; height=&quot;340&quot; data-origin-width=&quot;918&quot; data-origin-height=&quot;411&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2888&quot; data-start=&quot;2844&quot; data-ke-size=&quot;size16&quot;&gt;ROI Align was proposed to &lt;b&gt;fix the coordinate quantization (integer rounding) error introduced by ROI Pooling&lt;/b&gt;. It &lt;b&gt;preserves fractional positions&lt;/b&gt; on the original feature map and samples the feature values at those positions more accurately via &lt;b&gt;bilinear interpolation&lt;/b&gt;.&lt;/p&gt;
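The fractional-coordinate sampling described above can be sketched in a few lines of plain Python; `bilinear_sample` is a hypothetical helper written for this post, not code from the Mask R-CNN implementation.

```python
def bilinear_sample(fmap, y, x):
    """Sample a 2-D feature map at a fractional (y, x) position.

    ROI Align keeps sub-pixel coordinates and interpolates between the
    four surrounding integer grid points instead of rounding them away.
    """
    h, w = len(fmap), len(fmap[0])
    # Surrounding integer coordinates, clamped to the map boundary.
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four neighbours.
    return (fmap[y0][x0] * (1 - dy) * (1 - dx)
            + fmap[y0][x1] * (1 - dy) * dx
            + fmap[y1][x0] * dy * (1 - dx)
            + fmap[y1][x1] * dy * dx)

fmap = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))  # midpoint of all four values -> 1.5
```

The real ROI Align averages several such samples per output bin, but each sample is computed exactly this way.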
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;715&quot; data-origin-height=&quot;158&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/k1dbW/dJMb9PzB6wY/5vmbrcIw4zUKBcCXAwTazK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/k1dbW/dJMb9PzB6wY/5vmbrcIw4zUKBcCXAwTazK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/k1dbW/dJMb9PzB6wY/5vmbrcIw4zUKBcCXAwTazK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fk1dbW%2FdJMb9PzB6wY%2F5vmbrcIw4zUKBcCXAwTazK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;715&quot; height=&quot;158&quot; data-origin-width=&quot;715&quot; data-origin-height=&quot;158&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2888&quot; data-start=&quot;2844&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2888&quot; data-start=&quot;2844&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;R-CNN Family Summary&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;703&quot; data-origin-height=&quot;234&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b8PRUh/dJMb9X5o6hZ/cNUYIHckKx2BEtWZAfxu0K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b8PRUh/dJMb9X5o6hZ/cNUYIHckKx2BEtWZAfxu0K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b8PRUh/dJMb9X5o6hZ/cNUYIHckKx2BEtWZAfxu0K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb8PRUh%2FdJMb9X5o6hZ%2FcNUYIHckKx2BEtWZAfxu0K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;703&quot; height=&quot;234&quot; data-origin-width=&quot;703&quot; data-origin-height=&quot;234&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Panoptic Segmentation&lt;/b&gt;&lt;/h2&gt;
&lt;p data-end=&quot;174&quot; data-start=&quot;149&quot; data-ke-size=&quot;size16&quot;&gt;- How segmentation has evolved&lt;/p&gt;
&lt;p data-end=&quot;298&quot; data-start=&quot;176&quot; data-ke-size=&quot;size16&quot;&gt;Segmentation has evolved from &lt;b&gt;Semantic Segmentation&lt;/b&gt; to &lt;b&gt;Instance Segmentation&lt;/b&gt;, and then to &lt;b&gt;Panoptic Segmentation&lt;/b&gt;, which combines the two.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;684&quot; data-origin-height=&quot;473&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/51A2X/dJMb9Pl4Ymp/N5hlvLsf2NVMxCtrRm6yq0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/51A2X/dJMb9Pl4Ymp/N5hlvLsf2NVMxCtrRm6yq0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/51A2X/dJMb9Pl4Ymp/N5hlvLsf2NVMxCtrRm6yq0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F51A2X%2FdJMb9Pl4Ymp%2FN5hlvLsf2NVMxCtrRm6yq0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;555&quot; height=&quot;384&quot; data-origin-width=&quot;684&quot; data-origin-height=&quot;473&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;298&quot; data-start=&quot;176&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;456&quot; data-start=&quot;300&quot;&gt;&lt;u&gt;&lt;b&gt;Semantic Segmentation&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;Classifies each pixel of the input image by &amp;lsquo;what&amp;rsquo; it is.&lt;br /&gt;For example, it labels sky, buildings, roads, and cars, but does not separate individual objects within a class (car 1, car 2, and so on).&lt;/li&gt;
&lt;li data-end=&quot;629&quot; data-start=&quot;458&quot;&gt;&lt;u&gt;&lt;b&gt;Instance Segmentation&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;For &lt;b&gt;countable objects (Things)&lt;/b&gt; &amp;mdash; e.g., people, cars &amp;mdash; it assigns &lt;b&gt;a unique ID to each instance&lt;/b&gt;. However, it does not handle Stuff (uncountable regions) such as sky, grass, and roads.&lt;/li&gt;
&lt;li data-end=&quot;891&quot; data-start=&quot;631&quot;&gt;&lt;u&gt;&lt;b&gt;Panoptic Segmentation&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;A &lt;b&gt;unified form&lt;/b&gt; of Semantic and Instance Segmentation.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;891&quot; data-start=&quot;714&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;774&quot; data-start=&quot;714&quot;&gt;&lt;b&gt;Things (countable)&lt;/b&gt; &amp;rarr; Instance Segmentation (IDs assigned)&lt;/li&gt;
&lt;li data-end=&quot;891&quot; data-start=&quot;777&quot;&gt;&lt;b&gt;Stuff (uncountable)&lt;/b&gt; &amp;rarr; Semantic Segmentation (per class)&lt;br /&gt;So sky and buildings each become a single class mask, while cars and people are separated per instance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
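The Things/Stuff split above can be made concrete with a toy labeling scheme. The `class_id * 1000 + instance_id` encoding mirrors a common panoptic-format convention, but the class names and ids below are invented for illustration.

```python
# Toy panoptic merge: every pixel ends up with either a per-instance id
# (Things) or a single per-class label (Stuff).
STUFF = {"sky": 1, "road": 2}       # uncountable regions -> one semantic label each
THINGS = {"car": 10, "person": 11}  # countable objects -> unique id per instance

def panoptic_label(class_name, instance_id=0):
    """Return a single integer label combining class and instance."""
    if class_name in STUFF:
        return STUFF[class_name] * 1000              # one label per stuff class
    return THINGS[class_name] * 1000 + instance_id   # unique per instance

print(panoptic_label("sky"))      # every sky pixel gets the same label
print(panoptic_label("car", 1))   # car #1
print(panoptic_label("car", 2))   # car #2, distinct from car #1
```

This is exactly the output contract a panoptic model has to satisfy: one label map covering the whole image, instance-aware only where the class is countable.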
&lt;h4 data-ke-size=&quot;size20&quot;&gt;A representative Panoptic Segmentation model &amp;ndash; &lt;b&gt;Panoptic FPN&lt;/b&gt;&lt;/h4&gt;
&lt;p data-end=&quot;995&quot; data-start=&quot;953&quot; data-ke-size=&quot;size16&quot;&gt;(1) FPN(Feature Pyramid Network) Recap&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1258&quot; data-start=&quot;996&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1258&quot; data-start=&quot;1159&quot;&gt;A backbone CNN produces features at several levels; FPN &lt;b&gt;fuses these multi-scale features&lt;/b&gt; to build a &lt;b&gt;rich feature map without information loss&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a style=&quot;background-color: #e6f5ff; color: #0070d1; text-align: start;&quot; href=&quot;https://c0mputermaster.tistory.com/118&quot;&gt;2025.09.25 - [All posts] - [Segmentation] DeepLab, Mask R-CNN, PanopticFPN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;813&quot; data-origin-height=&quot;420&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/I71gQ/dJMb9WZJdqe/PXwLd5a1Flzs2iNSYSlZjk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/I71gQ/dJMb9WZJdqe/PXwLd5a1Flzs2iNSYSlZjk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/I71gQ/dJMb9WZJdqe/PXwLd5a1Flzs2iNSYSlZjk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FI71gQ%2FdJMb9WZJdqe%2FPXwLd5a1Flzs2iNSYSlZjk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;610&quot; height=&quot;315&quot; data-origin-width=&quot;813&quot; data-origin-height=&quot;420&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1286&quot; data-start=&quot;1260&quot; data-ke-size=&quot;size16&quot;&gt;(2) Panoptic FPN architecture overview&lt;/p&gt;
&lt;p data-end=&quot;1325&quot; data-start=&quot;1287&quot; data-ke-size=&quot;size16&quot;&gt;- Panoptic FPN builds on FPN and splits into two branches.&lt;/p&gt;
&lt;p data-end=&quot;1365&quot; data-start=&quot;1327&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Top branch (Instance Segmentation)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1536&quot; data-start=&quot;1366&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1390&quot; data-start=&quot;1366&quot;&gt;Uses the Mask R-CNN architecture as-is&lt;/li&gt;
&lt;li data-end=&quot;1440&quot; data-start=&quot;1391&quot;&gt;&lt;i&gt;Region Proposal Network &amp;rarr; ROI Align &amp;rarr; Mask Head&lt;/i&gt;, which predicts:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1536&quot; data-start=&quot;1451&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1469&quot; data-start=&quot;1451&quot;&gt;&lt;b&gt;Class (classification)&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1500&quot; data-start=&quot;1472&quot;&gt;&lt;b&gt;Bounding Box (coordinate regression)&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1536&quot; data-start=&quot;1503&quot;&gt;&lt;b&gt;Instance Mask (pixel-level mask)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1576&quot; data-start=&quot;1538&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Bottom branch (Semantic Segmentation)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1757&quot; data-start=&quot;1577&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1635&quot; data-start=&quot;1577&quot;&gt;Performs pixel-level segmentation using the shared features extracted by the FPN&lt;/li&gt;
&lt;li data-end=&quot;1690&quot; data-start=&quot;1636&quot;&gt;&lt;b&gt;Semantic Segmentation of the Stuff classes&lt;/b&gt; such as sky, buildings, and roads&lt;/li&gt;
&lt;li data-end=&quot;1757&quot; data-start=&quot;1691&quot;&gt;Upsamples the feature maps with a CNN decoder (1/32 &amp;rarr; 1/4 &amp;rarr; original size)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Asymmetric Feature Pyramid Network&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1836&quot; data-origin-height=&quot;855&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bmD1n6/dJMb9YwtF3y/0c4Vjnd1pK1Ekst7vb9Wok/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bmD1n6/dJMb9YwtF3y/0c4Vjnd1pK1Ekst7vb9Wok/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bmD1n6/dJMb9YwtF3y/0c4Vjnd1pK1Ekst7vb9Wok/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbmD1n6%2FdJMb9YwtF3y%2F0c4Vjnd1pK1Ekst7vb9Wok%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;705&quot; height=&quot;328&quot; data-origin-width=&quot;1836&quot; data-origin-height=&quot;855&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;A key feature of Panoptic FPN is its &lt;b&gt;asymmetric FPN structure&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;The original RetinaNet and FPN use a &lt;b&gt;symmetric structure&lt;/b&gt;: features at every level keep the same channel count (C).&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;Panoptic FPN uses an &lt;b&gt;asymmetric structure&lt;/b&gt;:&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2078&quot; data-start=&quot;1992&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2041&quot; data-start=&quot;1992&quot;&gt;The channel counts of the backbone features are &lt;b&gt;reduced with 1&amp;times;1 convolutions&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2078&quot; data-start=&quot;2044&quot;&gt;e.g., 1024 &amp;rarr; 256 channels, 512 &amp;rarr; 256 channels&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2181&quot; data-start=&quot;2079&quot;&gt;&lt;b&gt;Why:&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2181&quot; data-start=&quot;2093&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2116&quot; data-start=&quot;2093&quot;&gt;Reduces the number of multiply-add operations&lt;/li&gt;
&lt;li data-end=&quot;2158&quot; data-start=&quot;2119&quot;&gt;Lower memory use and computational complexity&lt;/li&gt;
&lt;li data-end=&quot;2181&quot; data-start=&quot;2161&quot;&gt;Efficient computation without sacrificing accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
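The savings can be checked with a quick multiply-accumulate count; the map size of 64x64 and the channel numbers are illustrative, taken from the 1024 to 256 example above rather than any published configuration.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k

# Applying a 3x3 conv directly on 1024 channels vs. first projecting
# to 256 channels with a 1x1 conv and then applying the 3x3 conv.
full = conv_macs(64, 64, 1024, 256, 3)
reduced = conv_macs(64, 64, 256, 256, 3) + conv_macs(64, 64, 1024, 256, 1)
print(full, reduced, round(full / reduced, 2))  # cheaper even counting the 1x1 projection
```

The projection pays for itself because every later layer now operates on a quarter of the channels.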
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1868&quot; data-start=&quot;1816&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Semantic Segmentation bottom branch in detail&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;882&quot; data-origin-height=&quot;357&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cLP1Sc/dJMb9gjBUtv/CMeKs33U8C8ny8wEDN8gH0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cLP1Sc/dJMb9gjBUtv/CMeKs33U8C8ny8wEDN8gH0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cLP1Sc/dJMb9gjBUtv/CMeKs33U8C8ny8wEDN8gH0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcLP1Sc%2FdJMb9gjBUtv%2FCMeKs33U8C8ny8wEDN8gH0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;660&quot; height=&quot;267&quot; data-origin-width=&quot;882&quot; data-origin-height=&quot;357&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2372&quot; data-start=&quot;2331&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2372&quot; data-start=&quot;2331&quot; data-ke-size=&quot;size16&quot;&gt;The semantic segmentation performed in the bottom branch must satisfy the following conditions:&lt;/p&gt;
&lt;p data-end=&quot;2372&quot; data-start=&quot;2331&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;2408&quot; data-start=&quot;2373&quot;&gt;&lt;b&gt;Obtain high-resolution features&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2440&quot; data-start=&quot;2409&quot;&gt;&lt;b&gt;Take multi-level features into account&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;2466&quot; data-start=&quot;2441&quot;&gt;&lt;b&gt;Preserve rich semantic information&lt;/b&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1607&quot; data-origin-height=&quot;856&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bmbLGp/dJMb9OU0f84/3SXIo0KkVbjDew66kW41Jk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bmbLGp/dJMb9OU0f84/3SXIo0KkVbjDew66kW41Jk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bmbLGp/dJMb9OU0f84/3SXIo0KkVbjDew66kW41Jk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbmbLGp%2FdJMb9OU0f84%2F3SXIo0KkVbjDew66kW41Jk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;648&quot; height=&quot;345&quot; data-origin-width=&quot;1607&quot; data-origin-height=&quot;856&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;2531&quot; data-start=&quot;2487&quot;&gt;Builds on the features from the backbone + neck (FPN).&lt;/li&gt;
&lt;li data-end=&quot;2568&quot; data-start=&quot;2532&quot;&gt;The feature maps start from 1/32 of the original size.&lt;/li&gt;
&lt;li data-end=&quot;2672&quot; data-start=&quot;2569&quot;&gt;Repeats Convolution &amp;rarr; Upsampling (&amp;times;2) &amp;rarr; Convolution &amp;rarr; Upsampling (&amp;times;2)&lt;br /&gt;&amp;rarr; eventually producing a &lt;b&gt;1/4-scale feature map&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;2704&quot; data-start=&quot;2673&quot;&gt;The channel count is adjusted at each stage (e.g., 256 &amp;rarr; 128)&lt;/li&gt;
&lt;li data-end=&quot;2757&quot; data-start=&quot;2705&quot;&gt;Since all the feature maps end up with the same size and channel count, an element-wise sum is possible.&lt;/li&gt;
&lt;li data-end=&quot;2817&quot; data-start=&quot;2758&quot;&gt;The merged features go through a convolution and are &lt;b&gt;upsampled 4&amp;times; back to the original size&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;2882&quot; data-start=&quot;2818&quot;&gt;Output: a &lt;b&gt;Semantic Mask&lt;/b&gt; over the &lt;b&gt;Stuff&lt;/b&gt; classes (e.g., Grass, Sky, Building)&lt;/li&gt;
&lt;/ol&gt;
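The numbered steps above repeatedly halve the downsampling factor. A tiny sketch (pure Python; the function name is mine) tracks the scales the deepest feature map passes through on its way from 1/32 to 1/4:

```python
def semantic_branch_scales(start_scale=32, target_scale=4):
    """Track the feature-map scale through the repeated
    conv -> 2x upsample stages of the semantic branch."""
    scale, steps = start_scale, []
    while scale > target_scale:
        scale //= 2               # each upsample doubles the resolution
        steps.append(f"1/{scale}")
    return steps

print(semantic_branch_scales())  # ['1/16', '1/8', '1/4']
```

From 1/4 scale, a final 4x upsample (step 6) restores the original resolution.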
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training &amp;mdash; Multi-Task Learning&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1809&quot; data-origin-height=&quot;889&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/TGbDL/dJMb9YwtGap/5TlFqyPNrBaDMIZvbIUbaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/TGbDL/dJMb9YwtGap/5TlFqyPNrBaDMIZvbIUbaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/TGbDL/dJMb9YwtGap/5TlFqyPNrBaDMIZvbIUbaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FTGbDL%2FdJMb9YwtGap%2F5TlFqyPNrBaDMIZvbIUbaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;687&quot; height=&quot;338&quot; data-origin-width=&quot;1809&quot; data-origin-height=&quot;889&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2930&quot; data-start=&quot;2889&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;3043&quot; data-start=&quot;2932&quot; data-ke-size=&quot;size16&quot;&gt;Panoptic FPN must perform &lt;b&gt;Instance Segmentation + Semantic Segmentation&lt;/b&gt; simultaneously, so it uses a &lt;b&gt;Multi-Task Learning&lt;/b&gt; setup.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;182&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/z9ytW/dJMb83EzrCM/p9ziOalpGfFmvABTgMP4k1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/z9ytW/dJMb83EzrCM/p9ziOalpGfFmvABTgMP4k1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/z9ytW/dJMb83EzrCM/p9ziOalpGfFmvABTgMP4k1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fz9ytW%2FdJMb83EzrCM%2Fp9ziOalpGfFmvABTgMP4k1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;279&quot; height=&quot;182&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;182&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;3043&quot; data-start=&quot;2932&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;3095&quot; data-start=&quot;3045&quot; data-ke-size=&quot;size16&quot;&gt;(1) Instance Segmentation Loss&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3252&quot; data-start=&quot;3096&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3150&quot; data-start=&quot;3096&quot;&gt;&lt;b&gt;Classification Loss:&lt;/b&gt; cross-entropy loss for object classification&lt;/li&gt;
&lt;li data-end=&quot;3210&quot; data-start=&quot;3151&quot;&gt;&lt;b&gt;Bounding Box Regression Loss:&lt;/b&gt; Smooth L1 or MSE loss&lt;/li&gt;
&lt;li data-end=&quot;3252&quot; data-start=&quot;3211&quot;&gt;&lt;b&gt;Mask Loss:&lt;/b&gt; pixel-wise cross-entropy loss&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3288&quot; data-start=&quot;3254&quot; data-ke-size=&quot;size16&quot;&gt;(2) Semantic Segmentation Loss&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3334&quot; data-start=&quot;3289&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3334&quot; data-start=&quot;3289&quot;&gt;Adds a pixel-wise cross-entropy loss for the Stuff classes&lt;/li&gt;
&lt;/ul&gt;
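Putting the two loss groups together gives the usual weighted sum; the lambda weights below are illustrative placeholders, not the values used in the Panoptic FPN paper.

```python
# Hypothetical weighted sum of the loss terms listed above.
def panoptic_fpn_loss(l_cls, l_box, l_mask, l_sem,
                      w_instance=1.0, w_semantic=0.5):
    instance = l_cls + l_box + l_mask   # the three Mask R-CNN branch losses
    return w_instance * instance + w_semantic * l_sem

print(panoptic_fpn_loss(0.4, 0.3, 0.5, 0.8))  # about 1.6
```

Because every term is differentiable, a single backward pass trains both branches and the shared FPN at once.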
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Recent Directions&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1792&quot; data-origin-height=&quot;885&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/PLjQh/dJMb9ap9401/WqDc52HnnwWYExMVfjqurK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/PLjQh/dJMb9ap9401/WqDc52HnnwWYExMVfjqurK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/PLjQh/dJMb9ap9401/WqDc52HnnwWYExMVfjqurK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FPLjQh%2FdJMb9ap9401%2FWqDc52HnnwWYExMVfjqurK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;698&quot; height=&quot;345&quot; data-origin-width=&quot;1792&quot; data-origin-height=&quot;885&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;3745&quot; data-start=&quot;3710&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(1) Transformer-based Segmentation&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;3769&quot; data-start=&quot;3746&quot; data-ke-size=&quot;size16&quot;&gt;Using Transformers instead of CNNs:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3866&quot; data-start=&quot;3770&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3814&quot; data-start=&quot;3770&quot;&gt;Representative models include &lt;b&gt;SegFormer&lt;/b&gt; and &lt;b&gt;MaskFormer&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;3866&quot; data-start=&quot;3815&quot;&gt;Advantage: learning long-range dependencies lets the model capture wider spatial relationships&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3895&quot; data-start=&quot;3868&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(2) Video-based Segmentation&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4018&quot; data-start=&quot;3896&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3939&quot; data-start=&quot;3896&quot;&gt;Segmentation over &lt;b&gt;consecutive frames (video)&lt;/b&gt; rather than single images&lt;/li&gt;
&lt;li data-end=&quot;4018&quot; data-start=&quot;3940&quot;&gt;Uses a &lt;b&gt;neural memory module&lt;/b&gt; to maintain temporal consistency&lt;br /&gt;(memory realized by a neural network, not hardware memory)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;4060&quot; data-start=&quot;4020&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(3) Foundation-model-based Segmentation&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4225&quot; data-start=&quot;4061&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4225&quot; data-start=&quot;4061&quot;&gt;&lt;b&gt;Segment Anything Model (SAM)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4225&quot; data-start=&quot;4100&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4132&quot; data-start=&quot;4100&quot;&gt;A general-purpose segmentation foundation model released by Meta&lt;/li&gt;
&lt;li data-end=&quot;4185&quot; data-start=&quot;4135&quot;&gt;Trained on massive data, it is &lt;b&gt;capable of zero-shot segmentation&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;4225&quot; data-start=&quot;4188&quot;&gt;Can be fine-tuned for a variety of downstream tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(4) Generative Segmentation&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;4387&quot; data-start=&quot;4287&quot; data-ke-size=&quot;size16&quot;&gt;Segmentation has traditionally been a &lt;b&gt;discriminative&lt;/b&gt; task.&lt;br /&gt;Recently, however, approaches based on &lt;b&gt;generative&lt;/b&gt; models have been explored.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4688&quot; data-start=&quot;4389&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4597&quot; data-start=&quot;4389&quot;&gt;&lt;b&gt;Diffusion Model&lt;/b&gt;-based approaches:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;4597&quot; data-start=&quot;4420&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4506&quot; data-start=&quot;4420&quot;&gt;Use the attention maps produced during the diffusion process to analyze&lt;br /&gt;&quot;where the model looks while generating&quot; &amp;rarr; repurposed as segmentation&lt;/li&gt;
&lt;li data-end=&quot;4597&quot; data-start=&quot;4509&quot;&gt;An attention map marks the visually salient regions,&lt;br /&gt;so it can play the same role as a segmentation mask&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;4688&quot; data-start=&quot;4599&quot;&gt;Related work: &lt;b&gt;SegRef&lt;/b&gt;, &lt;b&gt;Deformer&lt;/b&gt;, &lt;b&gt;DiffusionNet&lt;/b&gt;, etc.&lt;br /&gt;&amp;rarr; converting a generative model's regions of visual focus into segmentations&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;4745&quot; data-start=&quot;4704&quot;&gt;&lt;b&gt;Semantic Segmentation&lt;/b&gt;: classifies regions by class&lt;/li&gt;
&lt;li data-end=&quot;4789&quot; data-start=&quot;4746&quot;&gt;&lt;b&gt;Instance Segmentation&lt;/b&gt;: assigns an ID to each individual object&lt;/li&gt;
&lt;li data-end=&quot;4832&quot; data-start=&quot;4790&quot;&gt;&lt;b&gt;Panoptic Segmentation&lt;/b&gt;: the unified form of the two&lt;/li&gt;
&lt;li data-end=&quot;4914&quot; data-start=&quot;4833&quot;&gt;&lt;b&gt;Panoptic FPN&lt;/b&gt;: an efficient FPN-based multi-task learning architecture&lt;br /&gt;that segments Stuff and Things simultaneously and maximizes computational efficiency through its asymmetric FPN structure&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/118</guid>
      <comments>https://c0mputermaster.tistory.com/118#entry118comment</comments>
      <pubDate>Thu, 25 Sep 2025 17:27:55 +0900</pubDate>
    </item>
    <item>
      <title>[Segmentation] Semantic Segmentation 알아보기 FCN, U-Net</title>
      <link>https://c0mputermaster.tistory.com/116</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;이번 포스팅에서는 &lt;b&gt;시멘틱 세그멘테이션(Semantic Segmentation)&lt;/b&gt; 에 대해 다뤄보았다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;What is Semantic Segmentation?&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;347&quot; data-start=&quot;317&quot;&gt;&lt;b&gt;Segmentation&lt;/b&gt; &amp;rarr; partitioning, dividing&lt;/li&gt;
&lt;li data-end=&quot;372&quot; data-start=&quot;348&quot;&gt;&lt;b&gt;Semantic&lt;/b&gt; &amp;rarr; concerning meaning&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Having studied computer vision, you will have come across classical image segmentation methods. &lt;span style=&quot;letter-spacing: 0px;&quot;&gt;These divide an image into regions by similar color, brightness, or texture, regardless of meaning. To recap:&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Thresholding, edge detection / watershed, clustering, and graph-based segmentation; going further, we also covered methods like Selective Search.&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Semantic Segmentation&lt;/b&gt;, today&amp;rsquo;s topic, classifies each pixel not by mere color or texture but by learning to answer &amp;ldquo;what is this pixel?&amp;rdquo;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Instance Segmentation will be covered in a later post.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;908&quot; data-origin-height=&quot;306&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/BWBrw/btsQ3EMkHJS/sKhMe9PY1phm7Xru2urP3k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BWBrw/btsQ3EMkHJS/sKhMe9PY1phm7Xru2urP3k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/BWBrw/btsQ3EMkHJS/sKhMe9PY1phm7Xru2urP3k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBWBrw%2FbtsQ3EMkHJS%2FsKhMe9PY1phm7Xru2urP3k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;730&quot; height=&quot;246&quot; data-origin-width=&quot;908&quot; data-origin-height=&quot;306&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For example, given a photo of a person and a cat, the model predicts, &lt;b&gt;for every pixel, &amp;lsquo;this is person&amp;rsquo;, &amp;lsquo;this is cat&amp;rsquo;, or &amp;lsquo;this is background&amp;rsquo;&lt;/b&gt;. That is, each pixel of the image is assigned a &lt;b&gt;class label&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;740&quot; data-origin-height=&quot;295&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/6CJZ9/btsQ3wHzVRq/Yf00maxFk25MA7PvtPxck1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/6CJZ9/btsQ3wHzVRq/Yf00maxFk25MA7PvtPxck1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/6CJZ9/btsQ3wHzVRq/Yf00maxFk25MA7PvtPxck1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F6CJZ9%2FbtsQ3wHzVRq%2FYf00maxFk25MA7PvtPxck1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;549&quot; height=&quot;219&quot; data-origin-width=&quot;740&quot; data-origin-height=&quot;295&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2237&quot; data-start=&quot;2168&quot; data-ke-size=&quot;size16&quot;&gt;A single pixel on its own, however, tells us nothing about what it is. Without surrounding context, neither a human nor a model can tell.&lt;/p&gt;
&lt;p data-end=&quot;2334&quot; data-start=&quot;2239&quot; data-ke-size=&quot;size16&quot;&gt;So models usually look at a &lt;b&gt;surrounding region (window)&lt;/b&gt; as well.&lt;/p&gt;
&lt;p data-end=&quot;2334&quot; data-start=&quot;2239&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;368&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bawTyd/btsQ4Onbmqy/iKSEcN1dDkZSfuhFgni2R0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bawTyd/btsQ4Onbmqy/iKSEcN1dDkZSfuhFgni2R0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bawTyd/btsQ4Onbmqy/iKSEcN1dDkZSfuhFgni2R0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbawTyd%2FbtsQ4Onbmqy%2FiKSEcN1dDkZSfuhFgni2R0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;583&quot; height=&quot;255&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;368&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2334&quot; data-start=&quot;2239&quot; data-ke-size=&quot;size16&quot;&gt;Applying a window at every pixel, however, produces a huge amount of overlapping &lt;b&gt;redundant computation&lt;/b&gt;, which is inefficient.&lt;/p&gt;
&lt;p data-end=&quot;2334&quot; data-start=&quot;2239&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;971&quot; data-origin-height=&quot;384&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/00Fwm/btsQ5actAco/nLuw7Kv7sQhDffs7m6RzIk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/00Fwm/btsQ5actAco/nLuw7Kv7sQhDffs7m6RzIk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/00Fwm/btsQ5actAco/nLuw7Kv7sQhDffs7m6RzIk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F00Fwm%2FbtsQ5actAco%2FnLuw7Kv7sQhDffs7m6RzIk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;584&quot; height=&quot;231&quot; data-origin-width=&quot;971&quot; data-origin-height=&quot;384&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To solve this problem, the idea of feeding the &lt;b&gt;entire image into a CNN at once&lt;/b&gt; was proposed. But as the figure above shows, convolutions applied only at full resolution mostly capture &lt;b&gt;low-level features&amp;nbsp;&lt;/b&gt;, so downsampling is needed to build up global context.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The &lt;b&gt;Encoder&amp;ndash;Decoder&lt;/b&gt; architecture&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;972&quot; data-origin-height=&quot;348&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/1wIu9/btsQ3yFk8gt/IRKKrQ7CkKG12qJhOVzlQ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/1wIu9/btsQ3yFk8gt/IRKKrQ7CkKG12qJhOVzlQ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/1wIu9/btsQ3yFk8gt/IRKKrQ7CkKG12qJhOVzlQ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F1wIu9%2FbtsQ3yFk8gt%2FIRKKrQ7CkKG12qJhOVzlQ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;617&quot; height=&quot;221&quot; data-origin-width=&quot;972&quot; data-origin-height=&quot;348&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The architecture that emerged to overcome this limitation is the &lt;b&gt;Encoder&amp;ndash;Decoder&lt;/b&gt; structure.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Encoder:&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; The encoder's goal is to summarize the meaning of the image. It progressively compresses the input, keeping only the important information and discarding the rest.&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1399&quot; data-start=&quot;1344&quot;&gt;The input image is downsampled through several stages of &lt;b&gt;Convolution + Pooling&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1454&quot; data-start=&quot;1400&quot;&gt;The spatial size (H&amp;times;W) shrinks while the number of channels (Feature Maps) grows&lt;/li&gt;
&lt;li data-end=&quot;1514&quot; data-start=&quot;1455&quot;&gt;The result is a &lt;b&gt;compressed representation (Feature)&lt;/b&gt; that captures &lt;b&gt;global context&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Decoder:&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;1799&quot; data-start=&quot;1709&quot; data-ke-size=&quot;size16&quot;&gt;The decoder plays the opposite role. It expands the compressed Feature Map back to the &lt;b&gt;original image size&lt;/b&gt;, and&lt;br /&gt;&lt;b&gt;in Semantic Segmentation&lt;/b&gt; predicts &amp;ldquo;what&amp;rdquo; each pixel is.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1946&quot; data-start=&quot;1801&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1868&quot; data-start=&quot;1801&quot;&gt;Uses &lt;b&gt;Up Sampling&lt;/b&gt;, &lt;b&gt;Transpose Convolution&lt;/b&gt;, &lt;b&gt;Unpooling&lt;/b&gt;, etc.&lt;/li&gt;
&lt;li data-end=&quot;1909&quot; data-start=&quot;1869&quot;&gt;Gradually restores the spatial size while producing &lt;b&gt;class probabilities&lt;/b&gt; for each pixel&lt;/li&gt;
&lt;li data-end=&quot;1946&quot; data-start=&quot;1910&quot;&gt;The final output is an &lt;b&gt;H&amp;times;W mask&lt;/b&gt; the same size as the input image&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Upsampling techniques &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;886&quot; data-origin-height=&quot;362&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bJtfIQ/btsQ3pIB5A4/kvlN0ntmFqqyxEEvW2agGK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bJtfIQ/btsQ3pIB5A4/kvlN0ntmFqqyxEEvW2agGK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bJtfIQ/btsQ3pIB5A4/kvlN0ntmFqqyxEEvW2agGK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbJtfIQ%2FbtsQ3pIB5A4%2FkvlN0ntmFqqyxEEvW2agGK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;647&quot; height=&quot;264&quot; data-origin-width=&quot;886&quot; data-origin-height=&quot;362&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;You may have seen this in image processing or computer vision courses as well. Since the shrunken feature map must be enlarged again, an &lt;b&gt;upsampling technique&lt;/b&gt; is required. Common choices include &lt;b&gt;Nearest Neighbor&lt;/b&gt;, which copies the closest pixel value when enlarging; &lt;b&gt;Bed of Nails&lt;/b&gt;, which keeps each value in one position and fills the rest with zeros; and &lt;b&gt;Max Unpooling&lt;/b&gt;, which stores the index of each maximum during pooling and restores the value to that position.&lt;/p&gt;
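&lt;p data-ke-size=&quot;size16&quot;&gt;As a quick sanity check, the first two schemes can be sketched in a few lines of plain Python on toy 2D lists (the function names here are my own, not from any library):&lt;/p&gt;

```python
def nearest_neighbor_upsample(x, scale=2):
    """Nearest Neighbor: repeat each value into a scale x scale block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(scale)]   # repeat horizontally
        out.extend([wide[:] for _ in range(scale)])     # repeat vertically
    return out

def bed_of_nails_upsample(x, scale=2):
    """Bed of Nails: keep each value at the top-left of its cell, zero elsewhere."""
    h, w = len(x), len(x[0])
    out = [[0] * (w * scale) for _ in range(h * scale)]
    for i in range(h):
        for j in range(w):
            out[i * scale][j * scale] = x[i][j]
    return out

x = [[1, 2],
     [3, 4]]
print(nearest_neighbor_upsample(x))  # [[1,1,2,2],[1,1,2,2],[3,3,4,4],[3,3,4,4]]
print(bed_of_nails_upsample(x))      # [[1,0,2,0],[0,0,0,0],[3,0,4,0],[0,0,0,0]]
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Max Unpooling works like Bed of Nails, except the stored pooling index decides where inside each cell the value lands.&lt;/p&gt;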
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;944&quot; data-origin-height=&quot;434&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bE7YJX/btsQ5BOpFDb/unQ2cRp1rDVAi9ImHeccd0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bE7YJX/btsQ5BOpFDb/unQ2cRp1rDVAi9ImHeccd0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bE7YJX/btsQ5BOpFDb/unQ2cRp1rDVAi9ImHeccd0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbE7YJX%2FbtsQ5BOpFDb%2FunQ2cRp1rDVAi9ImHeccd0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;744&quot; height=&quot;342&quot; data-origin-width=&quot;944&quot; data-origin-height=&quot;434&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Among these, the core of the decoder is the&amp;nbsp;&lt;b&gt;Transpose Convolution&lt;/b&gt; (Deconvolution).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Transpose Convolution acts like the &amp;ldquo;reverse&amp;rdquo; of an ordinary convolution: its role is to enlarge the Feature Map.&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;3359&quot; data-start=&quot;3353&quot; data-ke-size=&quot;size16&quot;&gt;For example:&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;860&quot; data-origin-height=&quot;323&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zMBhk/btsQ5Yo5BkS/JdfGIhrjtZd7g3ucgJoRaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zMBhk/btsQ5Yo5BkS/JdfGIhrjtZd7g3ucgJoRaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zMBhk/btsQ5Yo5BkS/JdfGIhrjtZd7g3ucgJoRaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzMBhk%2FbtsQ5Yo5BkS%2FJdfGIhrjtZd7g3ucgJoRaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;674&quot; height=&quot;253&quot; data-origin-width=&quot;860&quot; data-origin-height=&quot;323&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3457&quot; data-start=&quot;3361&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3405&quot; data-start=&quot;3361&quot;&gt;&lt;b&gt;Standard Convolution: 4&amp;times;4 input &amp;rarr; 3&amp;times;3 filter &amp;rarr; 2&amp;times;2 output&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;567&quot; data-origin-height=&quot;331&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Yn2Vf/btsQ5GWuTZL/m1uHzk6oKujbHn2l9RNEa0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Yn2Vf/btsQ5GWuTZL/m1uHzk6oKujbHn2l9RNEa0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Yn2Vf/btsQ5GWuTZL/m1uHzk6oKujbHn2l9RNEa0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FYn2Vf%2FbtsQ5GWuTZL%2Fm1uHzk6oKujbHn2l9RNEa0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;469&quot; height=&quot;274&quot; data-origin-width=&quot;567&quot; data-origin-height=&quot;331&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3457&quot; data-start=&quot;3361&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3457&quot; data-start=&quot;3406&quot;&gt;&lt;b&gt;Transpose Convolution: 2&amp;times;2 input &amp;rarr; 3&amp;times;3 filter &amp;rarr; 4&amp;times;4 output&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
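&lt;p data-ke-size=&quot;size16&quot;&gt;The stride-1 case above can be sketched directly: each input value &amp;ldquo;stamps&amp;rdquo; a copy of the kernel, scaled by that value, into the output, and overlapping stamps are summed. A minimal pure-Python illustration (toy values; the function name is my own):&lt;/p&gt;

```python
def transpose_conv2d(x, k, stride=1):
    """Transpose convolution: every input value stamps the kernel (scaled by
    that value) into the output; overlapping stamps are summed."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    oh, ow = (h - 1) * stride + kh, (w - 1) * stride + kw   # output size
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(w):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += x[i][j] * k[di][dj]
    return out

x = [[1, 2],
     [3, 4]]
k = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
y = transpose_conv2d(x, k)   # 2x2 input, 3x3 kernel, stride 1 -> 4x4 output
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With stride 1 all four stamps overlap in the middle, so the center entry y[1][1] collects contributions from every input value (1+2+3+4 = 10); this uneven accumulation is exactly what produces the checkerboard artifacts discussed next.&lt;/p&gt;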
&lt;p data-end=&quot;3490&quot; data-start=&quot;3459&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;3533&quot; data-start=&quot;3497&quot; data-ke-size=&quot;size14&quot;&gt;&lt;b&gt;1. Checkerboard Artifacts&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;3533&quot; data-start=&quot;3497&quot; data-ke-size=&quot;size14&quot;&gt;- In Transpose Convolution, regions of &lt;b&gt;filter overlap&lt;/b&gt; occur, and uneven overlap produces the characteristic checkerboard pattern.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;951&quot; data-origin-height=&quot;460&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bafE4h/btsQ3x7Dxjn/eWzq2mU14KT34KFC160MkK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bafE4h/btsQ3x7Dxjn/eWzq2mU14KT34KFC160MkK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bafE4h/btsQ3x7Dxjn/eWzq2mU14KT34KFC160MkK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbafE4h%2FbtsQ3x7Dxjn%2FeWzq2mU14KT34KFC160MkK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;714&quot; height=&quot;343&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;951&quot; data-origin-height=&quot;460&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;3666&quot; data-start=&quot;3655&quot; data-ke-size=&quot;size14&quot;&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;Therefore,&lt;/span&gt; when using &lt;b&gt;Transpose Convolution&lt;/b&gt;, either set the Stride and Kernel Size so the stamps tile evenly (e.g., Stride=2, Kernel=2), or first upsample with Bilinear / Nearest Neighbor Interpolation and then refine with an ordinary Conv.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2. Why the name &amp;ldquo;Transpose&amp;rdquo; &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;619&quot; data-origin-height=&quot;377&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/y92FZ/btsQ5XqaYWL/xGKoFVrme2kh34Nv9M0P1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/y92FZ/btsQ5XqaYWL/xGKoFVrme2kh34Nv9M0P1K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/y92FZ/btsQ5XqaYWL/xGKoFVrme2kh34Nv9M0P1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fy92FZ%2FbtsQ5XqaYWL%2FxGKoFVrme2kh34Nv9M0P1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;453&quot; height=&quot;276&quot; data-origin-width=&quot;619&quot; data-origin-height=&quot;377&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A convolution can in fact be written as a &lt;b&gt;matrix multiplication&lt;/b&gt;: &lt;b&gt;y = W &amp;times; x&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1171&quot; data-start=&quot;1127&quot;&gt;x is the input image (4&amp;times;4) flattened into a &lt;b&gt;vector (16&amp;times;1)&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1211&quot; data-start=&quot;1172&quot;&gt;W is a &lt;b&gt;large matrix (4&amp;times;16)&lt;/b&gt; that plays the role of applying the filter&lt;/li&gt;
&lt;li data-end=&quot;1241&quot; data-start=&quot;1212&quot;&gt;y is the output (2&amp;times;2) flattened into a &lt;b&gt;4&amp;times;1 vector&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1308&quot; data-start=&quot;1282&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1308&quot; data-start=&quot;1282&quot; data-ke-size=&quot;size16&quot;&gt;Sparse Matrix&lt;/p&gt;
&lt;p data-end=&quot;1368&quot; data-start=&quot;1310&quot; data-ke-size=&quot;size16&quot;&gt;Here W is a &lt;b&gt;Sparse Matrix&lt;/b&gt; whose entries are mostly &lt;b&gt;zero&lt;/b&gt;: each row of W records the 3&amp;times;3 region the filter sees at one particular output position.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;So, by viewing the reverse mapping as x' = Wᵀ &amp;times; y, we can compute x' from the transposed matrix Wᵀ and the output y: this is where the name &amp;ldquo;transpose&amp;rdquo; convolution comes from.&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Note that x' is a reconstruction (an estimate); it is not the original x.&lt;/p&gt;
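&lt;p data-ke-size=&quot;size16&quot;&gt;Using the same toy sizes (4&amp;times;4 input, 3&amp;times;3 filter, 2&amp;times;2 output), the matrix view can be verified in plain Python. The helper names below are my own, and the identity-like kernel is chosen only so W's sparsity and the Wᵀ mapping are easy to read off:&lt;/p&gt;

```python
def conv_as_matrix(kernel, in_size=4):
    """Build the sparse matrix W so that y = W x implements a valid 3x3
    convolution on a flattened in_size x in_size image."""
    k = 3
    out_size = in_size - k + 1
    rows = []
    for oi in range(out_size):
        for oj in range(out_size):
            row = [0.0] * (in_size * in_size)      # mostly zeros (sparse)
            for di in range(k):
                for dj in range(k):
                    row[(oi + di) * in_size + (oj + dj)] = kernel[di][dj]
            rows.append(row)
    return rows

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # "pick the center pixel" kernel
W = conv_as_matrix(kernel)                  # 4 rows x 16 cols
x = list(range(16))                         # flattened 4x4 input
y = matvec(W, x)                            # flattened 2x2 output (4x1)
x_prime = matvec(transpose(W), y)           # transpose conv: 16x1 estimate
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With this kernel, y picks the four interior pixels [5, 6, 9, 10], and x' scatters them back to their original positions, with zeros everywhere else; x' clearly differs from x.&lt;/p&gt;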
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://excelsior-cjh.tistory.com/130&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://excelsior-cjh.tistory.com/130&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1760002110997&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[코딩더매트릭스]Chap05 - 행렬 The Matrix&quot; data-og-description=&quot;깃헙으로 Jupyter Notebook을 볼 경우 LaTex 문법이 깨지는 경우가 있어 되도록 nbviewer로 보는 것을 추천한다. ​ nbviewer에서 보기Chap 05 - 행렬(The Matrix)5.1 행렬이란 무엇인가?5.1.1 전통적인 행렬일반적&quot; data-og-host=&quot;excelsior-cjh.tistory.com&quot; data-og-source-url=&quot;https://excelsior-cjh.tistory.com/130&quot; data-og-url=&quot;https://excelsior-cjh.tistory.com/130&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/xPWk6/hyZKOLE8do/dlTYm6roYM8G7hc7VXbY4K/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/qq80e/hyZKOEUih7/Sm9ychCwp3dBWWkyCievg0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/boUEkt/hyZKP4TJ8V/ucXO4j25khNU131MmJKJkk/img.png?width=1122&amp;amp;height=1002&amp;amp;face=0_0_1122_1002&quot;&gt;&lt;a href=&quot;https://excelsior-cjh.tistory.com/130&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://excelsior-cjh.tistory.com/130&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/xPWk6/hyZKOLE8do/dlTYm6roYM8G7hc7VXbY4K/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/qq80e/hyZKOEUih7/Sm9ychCwp3dBWWkyCievg0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/boUEkt/hyZKP4TJ8V/ucXO4j25khNU131MmJKJkk/img.png?width=1122&amp;amp;height=1002&amp;amp;face=0_0_1122_1002');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[코딩더매트릭스]Chap05 - 행렬 The Matrix&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;깃헙으로 Jupyter Notebook을 볼 경우 LaTex 문법이 깨지는 경우가 있어 되도록 nbviewer로 보는 것을 추천한다. ​ nbviewer에서 보기Chap 05 - 행렬(The Matrix)5.1 행렬이란 무엇인가?5.1.1 전통적인 행렬일반적&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;excelsior-cjh.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-end=&quot;151&quot; data-start=&quot;121&quot; data-ke-size=&quot;size20&quot;&gt;Semantic Segmentation and Dense Prediction&lt;/h4&gt;
&lt;p data-end=&quot;176&quot; data-start=&quot;153&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;The concept of Dense Prediction&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;394&quot; data-start=&quot;177&quot; data-ke-size=&quot;size16&quot;&gt;Dense Prediction refers to tasks that make a prediction for every pixel of an image. Typical examples are inferring depth from an RGB image, and Semantic Segmentation, which predicts which object each pixel belongs to. For such dense predictions, architectures that connect an Encoder and a Decoder are widely used.&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt; FCN (Fully Convolutional Network) &lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;447&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/w6COA/btsQ4l6GQMe/HHIVKgBelLyj92T7EJ97Qk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/w6COA/btsQ4l6GQMe/HHIVKgBelLyj92T7EJ97Qk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/w6COA/btsQ4l6GQMe/HHIVKgBelLyj92T7EJ97Qk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fw6COA%2FbtsQ4l6GQMe%2FHHIVKgBelLyj92T7EJ97Qk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;774&quot; height=&quot;411&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;447&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;FCN was one of the first models to apply the encoder&amp;ndash;decoder structure to semantic segmentation.&lt;br /&gt;The input image is progressively reduced through the encoder in a &lt;b&gt;downsampling (Encoding)&lt;/b&gt; stage, then &lt;b&gt;upsampled (Decoding)&lt;/b&gt; by the decoder to output a pixel-wise classification result (a segmentation mask).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Key features&lt;/p&gt;
&lt;p data-end=&quot;875&quot; data-start=&quot;845&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1. Removing the Fully Connected Layer&lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;1049&quot; data-start=&quot;876&quot; data-ke-size=&quot;size16&quot;&gt;In a typical CNN classifier, the feature map is flattened at the end and passed through FC Layers to predict the final class.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;825&quot; data-origin-height=&quot;455&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bOTmv3/btsQ2VOLboL/og3mmwmCn9KQ0kyMDL0CZ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bOTmv3/btsQ2VOLboL/og3mmwmCn9KQ0kyMDL0CZ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bOTmv3/btsQ2VOLboL/og3mmwmCn9KQ0kyMDL0CZ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbOTmv3%2FbtsQ2VOLboL%2Fog3mmwmCn9KQ0kyMDL0CZ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;439&quot; height=&quot;242&quot; data-origin-width=&quot;825&quot; data-origin-height=&quot;455&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1049&quot; data-start=&quot;876&quot; data-ke-size=&quot;size16&quot;&gt;FCN instead replaces these with &lt;b&gt;Convolutions&lt;/b&gt;, making the network &amp;ldquo;Fully Convolutional.&amp;rdquo;&lt;/p&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1051&quot; data-ke-size=&quot;size14&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1081&quot; data-start=&quot;1051&quot; data-ke-size=&quot;size14&quot;&gt;&lt;b&gt;Problems with Fully Connected Layers&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;1431&quot; data-start=&quot;1082&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;1259&quot; data-start=&quot;1082&quot;&gt;&lt;b&gt;Location Information Loss&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1259&quot; data-start=&quot;1130&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1194&quot; data-start=&quot;1130&quot;&gt;An FC Layer flattens its input into a single vector, so spatial information is lost.&lt;/li&gt;
&lt;li data-end=&quot;1259&quot; data-start=&quot;1198&quot;&gt;Convolution uses local filters and preserves location information; an FC Layer destroys it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1431&quot; data-start=&quot;1261&quot;&gt;&lt;b&gt;Fixed Input Size (Input Size Dependency)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1431&quot; data-start=&quot;1308&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1377&quot; data-start=&quot;1308&quot;&gt;An FC Layer has a fixed input Feature dimension, so it cannot process images of any size other than the one it was trained on.&lt;/li&gt;
&lt;li data-end=&quot;1431&quot; data-start=&quot;1381&quot;&gt;Replacing it with Convolution makes the network flexible to input size while also keeping location information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-end=&quot;1490&quot; data-start=&quot;1481&quot; data-ke-size=&quot;size23&quot;&gt;Conventional approach&lt;/h3&gt;
&lt;p data-end=&quot;1561&quot; data-start=&quot;1491&quot; data-ke-size=&quot;size16&quot;&gt;Example: a 512&amp;times;7&amp;times;7 Feature is flattened into a 25,088-dimensional vector&lt;br /&gt;&amp;rarr; mapping it to 4096 units requires 512&amp;times;7&amp;times;7 &amp;times; 4096 weights&lt;/p&gt;
&lt;h3 data-end=&quot;1573&quot; data-start=&quot;1563&quot; data-ke-size=&quot;size23&quot;&gt;FCN approach&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1749&quot; data-start=&quot;1574&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1614&quot; data-start=&quot;1574&quot;&gt;Implement the same operation as a &lt;b&gt;7&amp;times;7 Convolution Filter&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;1644&quot; data-start=&quot;1615&quot;&gt;Using 4096 such filters performs exactly the same computation&lt;/li&gt;
&lt;li data-end=&quot;1698&quot; data-start=&quot;1645&quot;&gt;The output is a 1&amp;times;1&amp;times;4096 tensor (i.e., height=1, width=1, channel=4096)&lt;/li&gt;
&lt;li data-end=&quot;1749&quot; data-start=&quot;1699&quot;&gt;A 1&amp;times;1 Convolution is then applied to map to 21 classes (for PASCAL VOC)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1822&quot; data-start=&quot;1751&quot; data-ke-size=&quot;size16&quot;&gt;By replacing FC Layers with Convolutions, this approach preserves spatial information while handling inputs of various sizes.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Class Prediction with 1&amp;times;1 Convolutions&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;890&quot; data-origin-height=&quot;448&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bHB0cM/btsQ4ia5Wl6/QA4FpEnIEmeLgeXUQjork1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bHB0cM/btsQ4ia5Wl6/QA4FpEnIEmeLgeXUQjork1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bHB0cM/btsQ4ia5Wl6/QA4FpEnIEmeLgeXUQjork1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbHB0cM%2FbtsQ4ia5Wl6%2FQA4FpEnIEmeLgeXUQjork1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;790&quot; height=&quot;398&quot; data-origin-width=&quot;890&quot; data-origin-height=&quot;448&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;1872&quot; data-start=&quot;1829&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1994&quot; data-start=&quot;1873&quot; data-ke-size=&quot;size16&quot;&gt;Applying 21 1&amp;times;1 filters to the 4096-channel Feature Map yields 21 class scores at every pixel.&lt;br /&gt;In other words, &lt;b&gt;pixel-wise classification (Semantic Segmentation)&lt;/b&gt; becomes possible. Thanks to this structure, FCN achieves the effect of a linear transformation without any Fully Connected Layer.&lt;/p&gt;
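&lt;p data-ke-size=&quot;size16&quot;&gt;The per-pixel linear-map view of a 1&amp;times;1 convolution can be sketched as follows (tiny toy shapes in place of 4096 and 21 channels; the function name is my own):&lt;/p&gt;

```python
def conv1x1(feature_map, weights):
    """1x1 convolution: apply the same linear map (C_in -> C_out)
    independently at every spatial position.
    feature_map: H x W x C_in (nested lists), weights: C_out x C_in."""
    return [[[sum(w * v for w, v in zip(wo, pixel)) for wo in weights]
             for pixel in row]
            for row in feature_map]

# toy 2x2 feature map with 3 channels, mapped to 2 "class" channels per pixel
fmap = [[[1, 0, 2], [0, 1, 0]],
        [[2, 2, 0], [1, 1, 1]]]
W = [[1, 0, 0],    # class-0 score = channel 0
     [0, 1, 1]]    # class-1 score = channel 1 + channel 2
scores = conv1x1(fmap, W)   # 2x2x2: class scores at every pixel
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Note the spatial layout is untouched: the same 2-row weight matrix is reused at every pixel, which is exactly why a 1&amp;times;1 convolution behaves like a shared per-pixel classifier.&lt;/p&gt;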
&lt;p data-end=&quot;2080&quot; data-start=&quot;1996&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2080&quot; data-start=&quot;1996&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Upsampling (Decoding): Transpose Convolution &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;854&quot; data-origin-height=&quot;426&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bQTEd7/btsQ5jURDJl/sNVIp81cX2wRPelJX6E2KK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bQTEd7/btsQ5jURDJl/sNVIp81cX2wRPelJX6E2KK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bQTEd7/btsQ5jURDJl/sNVIp81cX2wRPelJX6E2KK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbQTEd7%2FbtsQ5jURDJl%2FsNVIp81cX2wRPelJX6E2KK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;692&quot; height=&quot;345&quot; data-origin-width=&quot;854&quot; data-origin-height=&quot;426&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2266&quot; data-start=&quot;2131&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2266&quot; data-start=&quot;2131&quot; data-ke-size=&quot;size16&quot;&gt;To restore the Feature Map shrunken by the encoder to the original image size, &lt;b&gt;Transpose Convolution&lt;/b&gt; is used.&lt;br /&gt;The operation used to be called Deconvolution, but that term is now considered a misnomer, since it does not actually invert a convolution.&lt;/p&gt;
&lt;p data-end=&quot;2266&quot; data-start=&quot;2131&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-start=&quot;2268&quot; data-end=&quot;2340&quot; data-ke-size=&quot;size16&quot;&gt;For example, if the encoder output is 4&amp;times;4, Transpose Convolution can enlarge it 32&amp;times; to restore a 128&amp;times;128 map.&lt;/p&gt;
&lt;p data-start=&quot;2342&quot; data-end=&quot;2457&quot; data-ke-size=&quot;size16&quot;&gt;But upsampling by too large a factor in a single step makes the result &lt;b&gt;Coarse&lt;/b&gt;.&lt;br /&gt;To address this, FCN introduced &lt;b&gt;Multi-Level Features and a Skip Architecture&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;2340&quot; data-start=&quot;2268&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2340&quot; data-start=&quot;2268&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Skip Architecture (Skip Connection)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;964&quot; data-origin-height=&quot;316&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/3zpNK/btsQ3E6FkT8/HzMoNzRXUHfLr7DjUfOd7k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/3zpNK/btsQ3E6FkT8/HzMoNzRXUHfLr7DjUfOd7k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/3zpNK/btsQ3E6FkT8/HzMoNzRXUHfLr7DjUfOd7k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F3zpNK%2FbtsQ3E6FkT8%2FHzMoNzRXUHfLr7DjUfOd7k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;964&quot; height=&quot;316&quot; data-origin-width=&quot;964&quot; data-origin-height=&quot;316&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;2651&quot; data-start=&quot;2607&quot; data-ke-size=&quot;size16&quot;&gt;Intermediate Features are used to upsample in several stages, restoring fine detail along the way.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;2786&quot; data-start=&quot;2652&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;2709&quot; data-start=&quot;2652&quot;&gt;Upsample the final Feature and merge it with an intermediate Feature (Element-wise Sum)&lt;/li&gt;
&lt;li data-end=&quot;2744&quot; data-start=&quot;2710&quot;&gt;Upsample again and merge with an even lower-level Feature&lt;/li&gt;
&lt;li data-end=&quot;2786&quot; data-start=&quot;2745&quot;&gt;Finally, upsample to the full size and classify each pixel with a Softmax&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-end=&quot;2891&quot; data-start=&quot;2788&quot; data-ke-size=&quot;size16&quot;&gt;This structure is similar to the Identity Skip of Residual Networks (ResNet); the intermediate Features &lt;b&gt;refine&lt;/b&gt; the coarse prediction into a finer result.&lt;/p&gt;
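&lt;p data-ke-size=&quot;size16&quot;&gt;The merge step above reduces to &amp;ldquo;upsample, then add.&amp;rdquo; A minimal sketch, assuming nearest-neighbor 2&amp;times; upsampling for simplicity (FCN itself uses learned transpose convolutions; the names here are my own):&lt;/p&gt;

```python
def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a 2D list."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out

def fuse(coarse, fine):
    """FCN-style skip merge: upsample the coarse score map, then add the
    finer (higher-resolution) map element-wise."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(ur, fr)] for ur, fr in zip(up, fine)]

coarse = [[1, 2],
          [3, 4]]                      # deep, low-resolution score map
fine = [[0] * 4 for _ in range(4)]     # same resolution as the upsampled map
fine[0][0] = 10                        # localized detail from an earlier layer
fused = fuse(coarse, fine)             # 4x4 map carrying both signals
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The element-wise sum lets coarse semantics and fine localization coexist in one map, which is the whole point of the coarse-to-fine skip architecture.&lt;/p&gt;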
&lt;p data-end=&quot;2891&quot; data-start=&quot;2788&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2891&quot; data-start=&quot;2788&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Summary: key contributions of FCN&lt;/b&gt;&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot; data-start=&quot;3241&quot; data-end=&quot;3574&quot;&gt;
&lt;li data-start=&quot;3241&quot; data-end=&quot;3361&quot;&gt;&lt;b&gt;Introduced the Fully Convolutional operation&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;3286&quot; data-end=&quot;3361&quot;&gt;
&lt;li data-start=&quot;3286&quot; data-end=&quot;3361&quot;&gt;Replacing every FC layer with a convolution
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;3327&quot; data-end=&quot;3361&quot;&gt;
&lt;li data-start=&quot;3327&quot; data-end=&quot;3342&quot;&gt;prevents loss of spatial (location) information&lt;/li&gt;
&lt;li data-start=&quot;3348&quot; data-end=&quot;3361&quot;&gt;removes the fixed input-size constraint&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;3363&quot; data-end=&quot;3452&quot;&gt;&lt;b&gt;Learnable upsampling via transpose convolution&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;3413&quot; data-end=&quot;3452&quot;&gt;
&lt;li data-start=&quot;3413&quot; data-end=&quot;3452&quot;&gt;A learnable upsampling structure, rather than simple interpolation, allows fine-grained reconstruction&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;3454&quot; data-end=&quot;3574&quot;&gt;&lt;b&gt;Introduced the skip architecture (coarse to fine)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;3504&quot; data-end=&quot;3574&quot;&gt;
&lt;li data-start=&quot;3504&quot; data-end=&quot;3574&quot;&gt;Proposed a structure that combines multi-level features to refine coarse predictions&lt;br /&gt;and recover fine object boundaries&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
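&lt;p data-ke-size=&quot;size16&quot;&gt;The coarse-to-fine refinement above can be sketched in a few lines of PyTorch. This is an illustrative fragment, not the paper's exact network: the layer names (score_pool4, score_pool5) and the channel sizes are assumptions loosely modeled on FCN's VGG backbone.&lt;/p&gt;

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21
# 1x1 "score" layers turn feature maps into per-class score maps
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)
score_pool5 = nn.Conv2d(4096, num_classes, kernel_size=1)

with torch.no_grad():
    feat_pool4 = torch.randn(1, 512, 28, 28)   # shallower, higher-resolution feature
    feat_pool5 = torch.randn(1, 4096, 14, 14)  # deepest, coarsest feature

    coarse = score_pool5(feat_pool5)           # 1 x 21 x 14 x 14
    up = F.interpolate(coarse, scale_factor=2, mode='bilinear', align_corners=False)
    # element-wise sum with the shallower prediction (the FCN skip)
    fused = up + score_pool4(feat_pool4)       # 1 x 21 x 28 x 28
    # final upsampling back to the input resolution
    out = F.interpolate(fused, scale_factor=8, mode='bilinear', align_corners=False)

print(fused.shape, out.shape)
```

In the actual paper the final upsampling is a (learnable) transpose convolution rather than bilinear interpolation; interpolation is used here only to keep the sketch short.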
&lt;h2 data-end=&quot;2891&quot; data-start=&quot;2788&quot; data-ke-size=&quot;size26&quot;&gt;U-Net&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;760&quot; data-origin-height=&quot;451&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/SnXyI/btsQ5C0XHT2/hE7oh36g6CsRszVBkkltO1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/SnXyI/btsQ5C0XHT2/hE7oh36g6CsRszVBkkltO1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/SnXyI/btsQ5C0XHT2/hE7oh36g6CsRszVBkkltO1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FSnXyI%2FbtsQ5C0XHT2%2FhE7oh36g6CsRszVBkkltO1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;629&quot; height=&quot;373&quot; data-origin-width=&quot;760&quot; data-origin-height=&quot;451&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;355&quot; data-start=&quot;169&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;355&quot; data-start=&quot;169&quot; data-ke-size=&quot;size16&quot;&gt;U-Net gets its name from the &lt;b&gt;U shape&lt;/b&gt; of the network.&lt;br /&gt;Structurally it is similar to the &lt;b&gt;FCN (Fully Convolutional Network)&lt;/b&gt; described above, but it differs in how the &lt;b&gt;skip connections&lt;/b&gt; link the &lt;b&gt;encoder and decoder&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;503&quot; data-start=&quot;357&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;503&quot; data-start=&quot;357&quot; data-ke-size=&quot;size16&quot;&gt;In particular, when passing encoder feature maps to the decoder, U-Net uses &lt;b&gt;concatenation along the channel dimension&lt;/b&gt; rather than &lt;b&gt;element-wise sum&lt;/b&gt;. This design preserves localization (spatial position) information more precisely.&lt;/p&gt;
&lt;p data-end=&quot;503&quot; data-start=&quot;357&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;503&quot; data-start=&quot;357&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Encoder (Contracting Path)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;853&quot; data-origin-height=&quot;363&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/BhBjg/btsQ3kU2ZKT/6jbKCuBLCkZONRvEX4vSfk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BhBjg/btsQ3kU2ZKT/6jbKCuBLCkZONRvEX4vSfk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/BhBjg/btsQ3kU2ZKT/6jbKCuBLCkZONRvEX4vSfk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBhBjg%2FbtsQ3kU2ZKT%2F6jbKCuBLCkZONRvEX4vSfk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;818&quot; height=&quot;348&quot; data-origin-width=&quot;853&quot; data-origin-height=&quot;363&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2122&quot; data-start=&quot;2096&quot;&gt;Input: 572&amp;times;572&amp;times;1&lt;/li&gt;
&lt;li data-end=&quot;2188&quot; data-start=&quot;2123&quot;&gt;Operation: Convolution (3&amp;times;3, no padding) &amp;rarr; Activation &amp;rarr; Normalization&lt;/li&gt;
&lt;li data-end=&quot;2220&quot; data-start=&quot;2189&quot;&gt;Result: a 570&amp;times;570&amp;times;64 feature map&lt;/li&gt;
&lt;li data-end=&quot;2257&quot; data-start=&quot;2221&quot;&gt;Another 3&amp;times;3 convolution &amp;rarr; 568&amp;times;568&amp;times;64&lt;/li&gt;
&lt;li data-end=&quot;2346&quot; data-start=&quot;2258&quot;&gt;Then max pooling (stride=2) halves the spatial size (284&amp;times;284)&lt;br /&gt;while the number of channels doubles (128)&lt;/li&gt;
&lt;li data-end=&quot;2413&quot; data-start=&quot;2347&quot;&gt;Repeating this process keeps shrinking the spatial size and increasing the channels&lt;br /&gt;&amp;rarr; the features become progressively more abstract&lt;/li&gt;
&lt;/ul&gt;
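&lt;p data-ke-size=&quot;size16&quot;&gt;The shape arithmetic above can be verified with a minimal sketch of one contracting-path stage. Only the sizes (572 &amp;rarr; 570 &amp;rarr; 568 &amp;rarr; 284) come from the figure; everything else is illustrative.&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Two unpadded 3x3 convolutions followed by 2x2 max pooling,
# mirroring the first U-Net encoder stage.
block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3),   # no padding: 572 -> 570
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3),  # 570 -> 568
    nn.ReLU(inplace=True),
)
pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 568 -> 284

with torch.no_grad():
    x = torch.randn(1, 1, 572, 572)
    f = block(x)   # 1 x 64 x 568 x 568
    p = pool(f)    # 1 x 64 x 284 x 284

print(f.shape, p.shape)
```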
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-start=&quot;2519&quot; data-end=&quot;2544&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Decoder (Expanding Path)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;853&quot; data-origin-height=&quot;344&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cmxrYY/btsQ5SWOa86/46D2owBKyCGzJNcnNoZxKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cmxrYY/btsQ5SWOa86/46D2owBKyCGzJNcnNoZxKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cmxrYY/btsQ5SWOa86/46D2owBKyCGzJNcnNoZxKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcmxrYY%2FbtsQ5SWOa86%2F46D2owBKyCGzJNcnNoZxKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;660&quot; height=&quot;266&quot; data-origin-width=&quot;853&quot; data-origin-height=&quot;344&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2566&quot; data-start=&quot;2546&quot; data-ke-size=&quot;size16&quot;&gt;The decoder plays the opposite role of the encoder.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2845&quot; data-start=&quot;2568&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2663&quot; data-start=&quot;2568&quot;&gt;Uses &lt;b&gt;transpose convolution (UpConv)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2663&quot; data-start=&quot;2612&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2632&quot; data-start=&quot;2612&quot;&gt;2&amp;times;2 filter, stride=2&lt;/li&gt;
&lt;li data-end=&quot;2663&quot; data-start=&quot;2635&quot;&gt;Doubles the spatial size of the feature map&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2718&quot; data-start=&quot;2664&quot;&gt;At each stage, two 3&amp;times;3 convolutions refine the features&lt;/li&gt;
&lt;li data-end=&quot;2845&quot; data-start=&quot;2719&quot;&gt;&lt;b&gt;Concatenates&lt;/b&gt; the corresponding encoder features
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2845&quot; data-start=&quot;2758&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2797&quot; data-start=&quot;2758&quot;&gt;If the spatial sizes differ, the encoder feature is cropped for alignment before concatenation&lt;/li&gt;
&lt;li data-end=&quot;2845&quot; data-start=&quot;2800&quot;&gt;Concatenation is performed &lt;b&gt;channel-wise&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2926&quot; data-start=&quot;2847&quot; data-ke-size=&quot;size16&quot;&gt;Repeating this process gradually enlarges the feature map while halving the channel count,&lt;br /&gt;finally yielding a segmentation map at nearly the input resolution (in the original paper the output is 388&amp;times;388 for a 572&amp;times;572 input, slightly smaller because the convolutions are unpadded).&lt;/p&gt;
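&lt;p data-ke-size=&quot;size16&quot;&gt;One decoder step (up-convolution, center crop of the encoder feature, channel-wise concatenation, then two 3&amp;times;3 convolutions) can be sketched as follows. The channel counts match the U-Net figure, but this is an illustrative fragment and the center_crop helper is my own, not from the paper's code.&lt;/p&gt;

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)  # doubles H and W
conv = nn.Sequential(
    nn.Conv2d(1024, 512, kernel_size=3), nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, kernel_size=3), nn.ReLU(inplace=True),
)

def center_crop(feat, target_hw):
    # crop the (larger) encoder feature to the decoder feature's spatial size
    _, _, h, w = feat.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[:, :, top:top + th, left:left + tw]

with torch.no_grad():
    dec = torch.randn(1, 1024, 28, 28)  # bottleneck feature
    enc = torch.randn(1, 512, 64, 64)   # matching encoder feature (larger)

    u = up(dec)                                                  # 1 x 512 x 56 x 56
    cat = torch.cat([center_crop(enc, u.shape[2:]), u], dim=1)   # 1 x 1024 x 56 x 56
    out = conv(cat)                                              # 1 x 512 x 52 x 52

print(out.shape)
```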
&lt;p data-end=&quot;2926&quot; data-start=&quot;2847&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2926&quot; data-start=&quot;2847&quot; data-ke-size=&quot;size16&quot;&gt;As a result, the skip connections provide:&lt;/p&gt;
&lt;p data-end=&quot;2926&quot; data-start=&quot;2847&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3198&quot; data-start=&quot;3041&quot;&gt;&lt;b&gt;Preserving localization information&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3198&quot; data-start=&quot;3076&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3120&quot; data-start=&quot;3076&quot;&gt;Toward the bottom of the encoder, features become more abstract and location information is lost.&lt;/li&gt;
&lt;li data-end=&quot;3198&quot; data-start=&quot;3124&quot;&gt;Passing the upper-level features (those with larger spatial size) directly to the decoder&lt;br /&gt;preserves fine-grained local (location) information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;3378&quot; data-start=&quot;3200&quot;&gt;&lt;b&gt;Combining with context information&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3378&quot; data-start=&quot;3228&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3273&quot; data-start=&quot;3228&quot;&gt;Low-level features are rich in location information but lack context.&lt;/li&gt;
&lt;li data-end=&quot;3322&quot; data-start=&quot;3277&quot;&gt;Deep-level features are rich in context, but their location information is degraded.&lt;/li&gt;
&lt;li data-end=&quot;3378&quot; data-start=&quot;3326&quot;&gt;Concatenating the two exploits both &lt;b&gt;local and global information&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;+ During backpropagation, the skip connections also smooth the flow of gradients, mitigating the &lt;b&gt;vanishing gradient&lt;/b&gt; problem. This is the same theoretical effect as &lt;b&gt;ResNet's identity shortcut&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/1505.04597&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://arxiv.org/abs/1505.04597&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1760029436979&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;U-Net: Convolutional Networks for Biomedical Image Segmentation&quot; data-og-description=&quot;There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated &quot; data-og-host=&quot;arxiv.org&quot; data-og-source-url=&quot;https://arxiv.org/abs/1505.04597&quot; data-og-url=&quot;https://arxiv.org/abs/1505.04597v1&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/buYFat/hyZKaPEDU4/WTxYe9Krt3oAQLKdUdf0oK/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/F5nq2/hyZKLImFQC/1pzkle1qDCMtgsoBvhZnt0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/1505.04597&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://arxiv.org/abs/1505.04597&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/buYFat/hyZKaPEDU4/WTxYe9Krt3oAQLKdUdf0oK/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/F5nq2/hyZKLImFQC/1pzkle1qDCMtgsoBvhZnt0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;U-Net: Convolutional Networks for Biomedical Image Segmentation&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;arxiv.org&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/1411.4038&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://arxiv.org/abs/1411.4038&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1760029684408&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;Fully Convolutional Networks for Semantic Segmentation&quot; data-og-description=&quot;Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build&quot; data-og-host=&quot;arxiv.org&quot; data-og-source-url=&quot;https://arxiv.org/abs/1411.4038&quot; data-og-url=&quot;https://arxiv.org/abs/1411.4038v2&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/xBVhs/hyZKHlEBzG/v2b8Z4szuMktNFoR3PLNdK/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/bNZ51E/hyZKEPZ4Kb/M3ntK8YuK4eUoP0Jcapg40/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/1411.4038&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://arxiv.org/abs/1411.4038&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/xBVhs/hyZKHlEBzG/v2b8Z4szuMktNFoR3PLNdK/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/bNZ51E/hyZKEPZ4Kb/M3ntK8YuK4eUoP0Jcapg40/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Fully Convolutional Networks for Semantic Segmentation&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;arxiv.org&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Computer Vision1/Computer Vision</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/116</guid>
      <comments>https://c0mputermaster.tistory.com/116#entry116comment</comments>
      <pubDate>Wed, 24 Sep 2025 19:54:31 +0900</pubDate>
    </item>
    <item>
      <title>[Deep Learning] Partial Fine-Tuning 해보기</title>
      <link>https://c0mputermaster.tistory.com/114</link>
      <description>&lt;blockquote data-ke-style=&quot;style2&quot;&gt;Feature Extractor만 활용해보는 ResNet50 전이학습&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;361&quot; data-start=&quot;278&quot; data-ke-size=&quot;size16&quot;&gt;When training a deep learning model, &lt;b&gt;training the whole model from scratch&lt;/b&gt; takes a long time and requires a lot of data.&lt;/p&gt;
&lt;p data-end=&quot;496&quot; data-start=&quot;363&quot; data-ke-size=&quot;size16&quot;&gt;So we usually rely on &lt;b&gt;transfer learning&lt;/b&gt;. In this post we experiment with &lt;b&gt;partial fine-tuning&lt;/b&gt;, that is, &lt;b&gt;transfer learning that uses only the pretrained feature extractor&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;496&quot; data-start=&quot;363&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-end=&quot;529&quot; data-start=&quot;503&quot; data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Three fine-tuning approaches&lt;/b&gt;&lt;/h4&gt;
&lt;p data-end=&quot;547&quot; data-start=&quot;530&quot; data-ke-size=&quot;size16&quot;&gt;First, a quick recap of the concepts.&lt;/p&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 63px;&quot; border=&quot;1&quot; data-end=&quot;843&quot; data-start=&quot;549&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;673&quot; data-start=&quot;625&quot;&gt;
&lt;td style=&quot;width: 36.279%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;644&quot; data-start=&quot;625&quot;&gt;&lt;b&gt;From Scratch&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 63.6047%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;662&quot; data-start=&quot;644&quot;&gt;Train from random initialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;744&quot; data-start=&quot;674&quot;&gt;
&lt;td style=&quot;width: 36.279%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;697&quot; data-start=&quot;674&quot;&gt;&lt;b&gt;Full Fine-Tuning&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 63.6047%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;733&quot; data-start=&quot;697&quot;&gt;Initialize with pretrained weights (e.g., ImageNet) and retrain all layers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot; data-end=&quot;843&quot; data-start=&quot;745&quot;&gt;
&lt;td style=&quot;width: 36.279%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;791&quot; data-start=&quot;745&quot;&gt;&lt;b&gt;Partial Fine-Tuning (Feature Extractor)&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 63.6047%; height: 21px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;831&quot; data-start=&quot;791&quot;&gt;Freeze the pretrained backbone and train only the final FC layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;Partial fine-tuning keeps the &lt;/span&gt;&lt;b&gt;general visual features&lt;/b&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;&amp;nbsp;the model has already learned, and adjusts only the&amp;nbsp;&lt;/span&gt;&lt;b&gt;final classifier&lt;/b&gt;&lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;&amp;nbsp;to fit the new dataset.&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Code implementation&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759684967819&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torchvision import models
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import os, time, copy

# define the training function
def train_resnet(model, criterion, optimizer, scheduler, num_epochs=25):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print(f'-------------- epoch {epoch+1} ----------------')&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1759685215883&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;        for phase in ['train', 'val']:
            model.train() if phase == 'train' else model.eval()
            running_loss, running_corrects = 0.0, 0

            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(DEVICE), labels.to(DEVICE)
                
                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            if phase == 'val' and epoch_acc &amp;gt; best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print(f'Best val Acc so far: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1684&quot; data-start=&quot;1652&quot;&gt;&lt;b&gt;model.train() =&amp;gt; train phase:&lt;/b&gt; switches layers such as Dropout/BatchNorm to training mode; weights are updated in this phase&lt;/li&gt;
&lt;li data-end=&quot;1720&quot; data-start=&quot;1685&quot;&gt;&lt;b&gt;model.eval()&amp;nbsp; =&amp;gt; val phase:&lt;/b&gt; switches them to evaluation mode to check the trained model's generalization performance&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;optimizer.zero_grad()&lt;/b&gt; &amp;rarr; clears the gradients computed for the previous batch&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;u&gt;=&amp;gt; PyTorch accumulates gradients by default, so they must be reset at every step&lt;/u&gt;&lt;/p&gt;
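&lt;p data-ke-size=&quot;size16&quot;&gt;The accumulation behavior is easy to verify on a tiny example (the numbers are made up just for illustration):&lt;/p&gt;

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# backward() ADDS to .grad; without zeroing, gradients pile up
loss = (w * 3).sum()
loss.backward()
print(w.grad)  # tensor([3.])

loss = (w * 3).sum()
loss.backward()
print(w.grad)  # tensor([6.])  (accumulated, not replaced)

w.grad.zero_()  # what optimizer.zero_grad() does for every parameter
loss = (w * 3).sum()
loss.backward()
print(w.grad)  # tensor([3.])
```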
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1759685518147&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
    if phase == 'train':
        loss.backward()
        optimizer.step()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2645&quot; data-start=&quot;2509&quot;&gt;torch.set_grad_enabled(phase == 'train')&lt;br /&gt;&amp;rarr; &lt;b&gt;enables gradient computation only during training&lt;/b&gt;,&lt;br /&gt;&lt;b&gt;and disables it during validation&lt;/b&gt; &amp;rarr; saving memory and speeding up evaluation&lt;/li&gt;
&lt;li data-end=&quot;2700&quot; data-start=&quot;2646&quot;&gt;&lt;b&gt;outputs = model(inputs)&lt;/b&gt; &amp;rarr; runs forward propagation&lt;/li&gt;
&lt;li data-end=&quot;2780&quot; data-start=&quot;2701&quot;&gt;&lt;b&gt;torch.max(outputs, 1)&lt;/b&gt; &amp;rarr; takes the index of the highest-scoring class as the prediction (preds); softmax is monotonic, so applying it would not change the argmax&lt;/li&gt;
&lt;li data-end=&quot;2835&quot; data-start=&quot;2781&quot;&gt;&lt;b&gt;criterion(outputs, labels)&lt;/b&gt; &amp;rarr; computes the CrossEntropyLoss&lt;/li&gt;
&lt;li data-end=&quot;2880&quot; data-start=&quot;2836&quot;&gt;&lt;b&gt;loss.backward()&lt;/b&gt; &amp;rarr; computes gradients (backpropagation)&lt;/li&gt;
&lt;li data-end=&quot;2912&quot; data-start=&quot;2881&quot;&gt;&lt;b&gt;optimizer.step()&lt;/b&gt; &amp;rarr; updates the weights&lt;/li&gt;
&lt;/ul&gt;
&lt;pre id=&quot;code_1759685557325&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;if phase == 'train':
    scheduler.step()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3345&quot; data-start=&quot;3304&quot;&gt;Calls the &lt;b&gt;StepLR scheduler&lt;/b&gt; once at the end of each training phase&lt;/li&gt;
&lt;li data-end=&quot;3410&quot; data-start=&quot;3346&quot;&gt;Every step_size epochs, the learning rate is multiplied by gamma&lt;/li&gt;
&lt;/ul&gt;
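&lt;p data-ke-size=&quot;size16&quot;&gt;With the settings used later in this post (step_size=7, gamma=0.1), the learning rate is multiplied by 0.1 every 7 epochs. A quick check on a dummy parameter:&lt;/p&gt;

```python
import torch
from torch.optim import lr_scheduler

param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([param], lr=0.001)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(opt.param_groups[0]['lr'])
    opt.step()    # normally called once per batch
    sched.step()  # called once per epoch

# lr stays at 1e-3 for epochs 0-6, drops to ~1e-4 at epoch 7, ~1e-5 at epoch 14
print(lrs[0], lrs[7], lrs[14])
```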
&lt;pre id=&quot;code_1759685692035&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;data_transforms = {
    'train': transforms.Compose([
        transforms.Resize([64, 64]),          # unify image size
        transforms.RandomHorizontalFlip(),    # horizontal flip
        transforms.RandomVerticalFlip(),      # vertical flip
        transforms.RandomCrop(52),            # random crop (data augmentation)
        transforms.ToTensor(),                # convert to Tensor
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # ImageNet normalization stats
    ]),
    'val': transforms.Compose([
        transforms.Resize([64, 64]),
        transforms.RandomCrop(52),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Normalize() standardizes the inputs with the ImageNet mean and standard deviation, matching the input distribution the &lt;/span&gt;&lt;b&gt;pretrained model (ResNet50)&lt;/b&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt; was trained on.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Defining the dataset and dataloaders&lt;/b&gt;&lt;/h4&gt;
&lt;pre id=&quot;code_1759685765772&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;data_dir = './splitted'  # parent folder containing the train and val folders
image_datasets = {
    x: ImageFolder(root=os.path.join(data_dir, x),
                   transform=data_transforms[x])
    for x in ['train', 'val']
}

dataloaders = {
    x: DataLoader(image_datasets[x],
                  batch_size=BATCH_SIZE,
                  shuffle=True,
                  num_workers=4)   # load data in parallel with 4 worker processes
    for x in ['train', 'val']
}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ImageFolder: automatically labels images from the folder structure and wraps them in a Dataset&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Defining the ResNet50 models (From Scratch, Full Fine-Tuning, Partial Fine-Tuning)&lt;/b&gt;&lt;/h4&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;From Scratch&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759686079020&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;resnet = models.resnet50(pretrained=False)

num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 33)

resnet = resnet.to(DEVICE)  # move to GPU&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;Trains entirely from scratch.&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;in_features: reads the &lt;b&gt;number of input features&lt;/b&gt; of the existing fc layer,&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;resnet.fc: replaces the original 1000-class output with a 33-class one&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;Full Fine-Tuning&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759686600444&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;resnet = models.resnet50(pretrained=True)
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 33)

resnet = resnet.to(DEVICE)  # move to GPU&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;pretrained=True loads the &lt;b&gt;ImageNet pretrained weights&lt;/b&gt; and then trains the whole network&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;Partial Fine-Tuning&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759686643421&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;resnet = models.resnet50(pretrained=True)
print(&quot;&amp;gt;&amp;gt;&amp;gt; Using Partial Fine-Tuning version (Feature Extractor)&quot;)

for param in resnet.parameters():
    param.requires_grad = False  # freeze the feature-extractor part

num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 33)

for param in resnet.fc.parameters():
    param.requires_grad = True   # train only the final FC layer

resnet = resnet.to(DEVICE)  # move to GPU&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3166&quot; data-start=&quot;3114&quot;&gt;pretrained=True loads the &lt;b&gt;ImageNet pretrained weights&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;3243&quot; data-start=&quot;3167&quot;&gt;requires_grad=False &amp;rarr; freezes the &lt;b&gt;ResNet backbone (feature extractor)&lt;/b&gt; so it is excluded from training&lt;/li&gt;
&lt;li data-end=&quot;3337&quot; data-start=&quot;3293&quot;&gt;requires_grad=True: only the new FC layer is trained&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Loss function / optimizer / scheduler setup&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759686723964&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;criterion = nn.CrossEntropyLoss()  # loss for multi-class classification

optimizer_ft = optim.Adam(
    filter(lambda p: p.requires_grad, resnet.parameters()),
    lr=0.001
)

exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;225&quot; data-origin-height=&quot;84&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cN5a37/btsQ2jaJlqP/3nlKktbqysRZe6WKK8xmt1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cN5a37/btsQ2jaJlqP/3nlKktbqysRZe6WKK8xmt1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cN5a37/btsQ2jaJlqP/3nlKktbqysRZe6WKK8xmt1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcN5a37%2FbtsQ2jaJlqP%2F3nlKktbqysRZe6WKK8xmt1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;225&quot; height=&quot;84&quot; data-origin-width=&quot;225&quot; data-origin-height=&quot;84&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;From Scratch&lt;/b&gt;는 가중치를 처음부터 학습했음에도 95.61%로 높은 성능을 보였다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Full Fine-Tuning&lt;/b&gt;은 사전학습된 가중치를 기반으로 모든 레이어를 재학습해 99.13%의 최고 정확도를 기록했다.&lt;br /&gt;반면 예상외로 &lt;b&gt;Partial Fine-Tuning&lt;/b&gt;은 Feature Extractor를 고정하고 FC 레이어만 학습해 79.19%로 가장 낮은 정확도를 보였다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;데이터셋의 이미지 특성이 ImageNet과 달라, 고정된 Feature Extractor만으로는 모델이 &lt;b&gt;새로운 데이터셋의 특성에 충분히 적응하지 못한 것&lt;/b&gt;이 원인으로 보인다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;검증 및 결과&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1759687129733&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(DEVICE), target.to(DEVICE)
            output = model(data)

            test_loss += F.cross_entropy(output, target, reduction='sum').item()

            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    return test_loss, test_accuracy

if __name__ == '__main__':
    USE_CUDA = torch.cuda.is_available()
    DEVICE = torch.device(&quot;cuda&quot; if USE_CUDA else &quot;cpu&quot;)
    BATCH_SIZE = 256

    transform_resNet = transforms.Compose([
        transforms.Resize([64, 64]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    # Evaluate model
    test_resNet = ImageFolder(root='./splitted/test', transform=transform_resNet)
    test_loader_resNet = torch.utils.data.DataLoader(test_resNet, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

    '''
    Compare the transfer-learned model with the learned model from scratch
    '''
    resnet = torch.load('resnet50_from_partial_0.7919.pt', weights_only=False)      # change model
    # resnet = torch.load('resnet50_from_pretrained_0.9913.pt', weights_only=False) # change model
    # resnet = torch.load('resnet50_from_scratch_0.9561.pt', weights_only=False)    # change model
    print(resnet)
    resnet.to(DEVICE)
    test_loss, test_accuracy = evaluate(resnet, test_loader_resNet)

    print('test acc: ', test_accuracy)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ResNet50&lt;/b&gt; 세 가지 학습 방식을 Test 데이터셋으로 검증한 결과는 다음과 같다.&lt;/p&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-end=&quot;616&quot; data-start=&quot;194&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody data-end=&quot;616&quot; data-start=&quot;350&quot;&gt;
&lt;tr&gt;
&lt;td data-col-size=&quot;sm&quot;&gt;&lt;b&gt;방식&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot;&gt;&lt;b&gt;초기 가중치&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot;&gt;&lt;b&gt;학습 범위&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot;&gt;&lt;b&gt;검증 정확도&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot;&gt;&lt;b&gt;Test 정확도&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;414&quot; data-start=&quot;350&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;369&quot; data-start=&quot;350&quot;&gt;&lt;b&gt;From Scratch&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;380&quot; data-start=&quot;369&quot;&gt;랜덤 초기화&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;389&quot; data-start=&quot;380&quot;&gt;전체 레이어&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;401&quot; data-start=&quot;389&quot;&gt;약 &lt;b&gt;95%&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;414&quot; data-start=&quot;401&quot;&gt;&lt;b&gt;89.5%&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;500&quot; data-start=&quot;415&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;438&quot; data-start=&quot;415&quot;&gt;&lt;b&gt;Full Fine-Tuning&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;462&quot; data-start=&quot;438&quot;&gt;ImageNet pretrained&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;475&quot; data-start=&quot;462&quot;&gt;전체 레이어 재학습&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;487&quot; data-start=&quot;475&quot;&gt;약 &lt;b&gt;99%&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;500&quot; data-start=&quot;487&quot;&gt;&lt;b&gt;98.5%&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;616&quot; data-start=&quot;501&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;527&quot; data-start=&quot;501&quot;&gt;&lt;b&gt;Partial Fine-Tuning&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;551&quot; data-start=&quot;527&quot;&gt;ImageNet pretrained&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;590&quot; data-start=&quot;551&quot;&gt;FC layer만 재학습 (feature extractor 고정)&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;602&quot; data-start=&quot;590&quot;&gt;약 &lt;b&gt;79%&lt;/b&gt;&lt;/td&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;616&quot; data-start=&quot;602&quot;&gt;&lt;b&gt;73.3%&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;762&quot; data-start=&quot;638&quot;&gt;&lt;b&gt;Full Fine-Tuning&lt;/b&gt;은 사전학습된 가중치를 기반으로 전체를 재학습하여 가장 높은 정확도를 기록했다.&lt;br /&gt;&amp;rarr; 사전학습 모델의 일반적 시각 특징을 유지하면서, 새로운 도메인에 맞게 최적화되었기 때문.&lt;/li&gt;
&lt;li data-end=&quot;861&quot; data-start=&quot;764&quot;&gt;&lt;b&gt;From Scratch&lt;/b&gt;는 ImageNet 사전학습을 사용하지 않고도 약 89%의 테스트 정확도를 달성했지만,&lt;br /&gt;수렴 속도가 느리고 많은 데이터가 필요했다.&lt;/li&gt;
&lt;li data-end=&quot;1029&quot; data-start=&quot;863&quot;&gt;&lt;b&gt;Partial Fine-Tuning&lt;/b&gt;은 Feature Extractor를 고정했기 때문에&lt;br /&gt;새로운 도메인의 세부적인 질감이나 색상 패턴을 학습하지 못해 &lt;b&gt;성능이 20% 이상 낮게&lt;/b&gt; 나타났다.&lt;br /&gt;즉, 사전학습된 일반적인 특성만으로는 도메인 적응이 어렵다는 한계가 드러났다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; 참고자료 &lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://hi-ai0913.tistory.com/32&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://hi-ai0913.tistory.com/32&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1759687899692&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[딥러닝] 전이학습(Transfer learning)과 파인튜닝(Fine tuning)&quot; data-og-description=&quot;전이 학습(Transfer Learning)과 파인 튜닝(Fine-Tuning)은 현대 딥러닝 연구와 실용화에서 핵심적인 역할을 하는 전략입니다. 이들은 특히 데이터가 제한적이거나 특정 작업에 대한 사전 지식이 필요한 &quot; data-og-host=&quot;hi-ai0913.tistory.com&quot; data-og-source-url=&quot;https://hi-ai0913.tistory.com/32&quot; data-og-url=&quot;https://hi-ai0913.tistory.com/32&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/bMZQu3/hyZKcGjv8X/J97nqV7Ba6mzt7KZ6RzCYK/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/p4aaY/hyZKxqxIC2/HZDDqtpBcUyxp839aUzCRk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://hi-ai0913.tistory.com/32&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://hi-ai0913.tistory.com/32&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/bMZQu3/hyZKcGjv8X/J97nqV7Ba6mzt7KZ6RzCYK/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/p4aaY/hyZKxqxIC2/HZDDqtpBcUyxp839aUzCRk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[딥러닝] 전이학습(Transfer learning)과 파인튜닝(Fine tuning)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;전이 학습(Transfer Learning)과 파인 튜닝(Fine-Tuning)은 현대 딥러닝 연구와 실용화에서 핵심적인 역할을 하는 전략입니다. 이들은 특히 데이터가 제한적이거나 특정 작업에 대한 사전 지식이 필요한&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;hi-ai0913.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/95&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.06.14 - [Technology Notes] - [Deep Learning] Transfer Learning과 Knowledge distillation&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1759688010279&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[Deep Learning] Transfer Learning과 Knowledge distillation&quot; data-og-description=&quot;Pre-trained Model 개념 Pre-trained Model (사전 학습 모델)은 대규모 데이터셋으로 이미 학습이 끝난 모델.이 모델은 특정 문제를 풀기 위해서 처음부터 학습한 것이 아니라, 충분히 크고 일반적인 데이&quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/95&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/95&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/vRiuh/hyZKzhAh32/v9uqc28AIg07VJuRYMYGA0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/btLRuT/hyZKjZHA7H/ke2To47ORHPXV7O4k5G2kK/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/Oabbq/hyZKHSDpWm/KiVwpXHjeo7lKJWEN8dmq1/img.png?width=816&amp;amp;height=394&amp;amp;face=0_0_816_394&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/95&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/95&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/vRiuh/hyZKzhAh32/v9uqc28AIg07VJuRYMYGA0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/btLRuT/hyZKjZHA7H/ke2To47ORHPXV7O4k5G2kK/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/Oabbq/hyZKHSDpWm/KiVwpXHjeo7lKJWEN8dmq1/img.png?width=816&amp;amp;height=394&amp;amp;face=0_0_816_394');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[Deep Learning] Transfer Learning과 Knowledge distillation&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Pre-trained Model 개념 Pre-trained Model (사전 학습 모델)은 대규모 데이터셋으로 이미 학습이 끝난 모델.이 모델은 특정 문제를 풀기 위해서 처음부터 학습한 것이 아니라, 충분히 크고 일반적인 데이&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;</description>
      <category>Computer Vision1/Project</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/114</guid>
      <comments>https://c0mputermaster.tistory.com/114#entry114comment</comments>
      <pubDate>Fri, 19 Sep 2025 18:50:43 +0900</pubDate>
    </item>
    <item>
      <title>[Object Detection] One-Stage Object Detection - YOLO, SSD, RetinaNet</title>
      <link>https://c0mputermaster.tistory.com/112</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;One-Stage Object Detection 개요&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;460&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bcUPQe/btsQXGqxhvs/0b4DsokpHsCWRn8ikyLH90/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bcUPQe/btsQXGqxhvs/0b4DsokpHsCWRn8ikyLH90/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bcUPQe/btsQXGqxhvs/0b4DsokpHsCWRn8ikyLH90/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbcUPQe%2FbtsQXGqxhvs%2F0b4DsokpHsCWRn8ikyLH90%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;601&quot; height=&quot;332&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;460&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;우리는 이전까지 &lt;b&gt;Two-Stage Object Detection을&amp;nbsp;&lt;/b&gt;리뷰하였고 이번에는 Region Proposal 과정을 생략하고 Bounding Box Regression + Classification 동시에 수행하는&lt;b&gt; One-Stage Detector&lt;/b&gt;를 살펴보겠다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt; YOLO&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;YOLO&lt;/b&gt; = You Only Look Once.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Region Proposal을 생략하고 &lt;b&gt;한 번의 Forward Pass&lt;/b&gt;로 Detection 수행.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;YOLOv1 &amp;rarr; v2 &amp;rarr; v3 &amp;rarr; &amp;hellip; &amp;rarr; v5, v8, v10까지 지속 개발되어 지금도 널리 쓰이는 대표적인 One-Stage Detector이다.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;898&quot; data-origin-height=&quot;255&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bAgwZA/btsQ2fy5HDI/jGSZ2uM3dYHtcOG3yvRPgK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bAgwZA/btsQ2fy5HDI/jGSZ2uM3dYHtcOG3yvRPgK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bAgwZA/btsQ2fy5HDI/jGSZ2uM3dYHtcOG3yvRPgK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbAgwZA%2FbtsQ2fy5HDI%2FjGSZ2uM3dYHtcOG3yvRPgK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;726&quot; height=&quot;206&quot; data-origin-width=&quot;898&quot; data-origin-height=&quot;255&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;우선 YOLO는 입력 이미지를 &lt;b&gt;S&amp;times;S Grid&lt;/b&gt;로 나눔. (YOLO v1에서는 7&amp;times;7)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;966&quot; data-start=&quot;883&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;909&quot; data-start=&quot;883&quot;&gt;단순히 이미지를 잘라내는 전처리가 아님.&lt;/li&gt;
&lt;li data-end=&quot;966&quot; data-start=&quot;913&quot;&gt;Feature Map의 한 픽셀이 원본 이미지의 일정 영역(64&amp;times;64)을 담당하는 개념.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;각 Grid Cell은&lt;/b&gt; &lt;b&gt;B개의 Bounding Box를&lt;/b&gt; 예측한다. (YOLO v1에서는 B=2).&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1106&quot; data-start=&quot;987&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1080&quot; data-start=&quot;1036&quot;&gt;각 Box는 (x, y, w, h, confidence) 5개 값 출력.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;따라서 &lt;b&gt;한 Grid Cell 출력 차원&lt;/b&gt;: 5*B + C. ( 예: B=2, C=20 &amp;rarr; 30차원. )&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1224&quot; data-start=&quot;1178&quot;&gt;전체 7&amp;times;7=49개의 Grid Cell &amp;rarr; 총 98개의 Box 후보 생성.&lt;/li&gt;
&lt;/ul&gt;
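&lt;p data-ke-size=&quot;size16&quot;&gt;위 차원 계산은 간단히 확인할 수 있다 (논문 기준 S=7, B=2, C=20 가정):&lt;/p&gt;

```python
# YOLO v1 출력 차원 계산 (논문 기준 값 가정)
S, B, C = 7, 2, 20            # Grid 크기, Cell당 Box 수, 클래스 수 (PASCAL VOC)

cell_dim = 5 * B + C          # 한 Grid Cell의 출력 차원
total_dim = S * S * cell_dim  # 네트워크 최종 출력 벡터 길이
total_boxes = S * S * B       # Box 후보 총 개수

print(cell_dim, total_dim, total_boxes)  # 30 1470 98
```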
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1343&quot; data-origin-height=&quot;719&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bIWHXJ/btsQ118ZHSP/Bg10oyoi1bSA8HBrk7wDX0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bIWHXJ/btsQ118ZHSP/Bg10oyoi1bSA8HBrk7wDX0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bIWHXJ/btsQ118ZHSP/Bg10oyoi1bSA8HBrk7wDX0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbIWHXJ%2FbtsQ118ZHSP%2FBg10oyoi1bSA8HBrk7wDX0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;679&quot; height=&quot;364&quot; data-origin-width=&quot;1343&quot; data-origin-height=&quot;719&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;입력: 448&amp;times;448&amp;times;3 RGB 이미지.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Backbone: &lt;b&gt;&lt;i&gt;GoogLeNet&lt;/i&gt; &lt;/b&gt;기반 Convolution Network.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Fully Connected Layer를 거쳐 4096차원 벡터 &amp;rarr; 1470차원 벡터로 변환하고, 이를 다시 &lt;b&gt;7&amp;times;7&amp;times;30 Feature로 재구성(reshape)&lt;/b&gt;한다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;558&quot; data-origin-height=&quot;269&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b9qVv0/btsQ1T38bvW/C9xgK0tIFLaRrR4NuDgSH0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b9qVv0/btsQ1T38bvW/C9xgK0tIFLaRrR4NuDgSH0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b9qVv0/btsQ1T38bvW/C9xgK0tIFLaRrR4NuDgSH0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb9qVv0%2FbtsQ1T38bvW%2FC9xgK0tIFLaRrR4NuDgSH0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;504&quot; height=&quot;243&quot; data-origin-width=&quot;558&quot; data-origin-height=&quot;269&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1560&quot; data-start=&quot;1517&quot;&gt;7&amp;times;7 Grid 기준, 각 픽셀은 원본 이미지의 64&amp;times;64 영역 담당.&lt;/li&gt;
&lt;li data-end=&quot;1710&quot; data-start=&quot;1561&quot;&gt;각 Grid Cell 출력 벡터(30차원):
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1710&quot; data-start=&quot;1590&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1630&quot; data-start=&quot;1590&quot;&gt;첫 5개: Box1 (x, y, w, h, confidence).&lt;/li&gt;
&lt;li data-end=&quot;1674&quot; data-start=&quot;1633&quot;&gt;다음 5개: Box2 (x, y, w, h, confidence).&lt;/li&gt;
&lt;li data-end=&quot;1710&quot; data-start=&quot;1677&quot;&gt;나머지 20개: Class Probabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;최종적으로 Confidence &amp;times; Class Probability = &lt;b&gt;Class-specific Score&lt;/b&gt;를 구함&lt;/p&gt;
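&lt;p data-ke-size=&quot;size16&quot;&gt;한 Cell의 30차원 벡터를 (Box1, Box2, 클래스 확률)로 쪼개고 Class-specific Score를 구하는 과정을 임의의 난수 텐서로 흉내 낸 스케치:&lt;/p&gt;

```python
import torch

S, B, C = 7, 2, 20
output = torch.rand(S, S, 5 * B + C)     # 네트워크 출력이라고 가정한 난수 텐서

cell = output[3, 3]                      # 임의의 한 Grid Cell (30차원)
box1 = cell[0:5]                         # 첫 5개: Box1 (x, y, w, h, confidence)
box2 = cell[5:10]                        # 다음 5개: Box2
class_probs = cell[10:30]                # 나머지 20개: 클래스 확률

# Class-specific Score = Box Confidence x Class Probability
score1 = box1[4] * class_probs           # Box1 기준 클래스별 점수 (20개)
score2 = box2[4] * class_probs           # Box2 기준 클래스별 점수
print(score1.shape)                      # torch.Size([20])
```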
&lt;p data-ke-size=&quot;size16&quot;&gt;데이터셋 클래스 수(C)에 따라 마지막 출력 차원 변경이 필요하다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Loss Function &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1261&quot; data-origin-height=&quot;723&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cEMbxO/btsQ3zpWUTt/RkbGHlmDtPues0vUycWRS1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cEMbxO/btsQ3zpWUTt/RkbGHlmDtPues0vUycWRS1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cEMbxO/btsQ3zpWUTt/RkbGHlmDtPues0vUycWRS1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcEMbxO%2FbtsQ3zpWUTt%2FRkbGHlmDtPues0vUycWRS1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;656&quot; height=&quot;376&quot; data-origin-width=&quot;1261&quot; data-origin-height=&quot;723&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2406&quot; data-start=&quot;2252&quot;&gt;&lt;b&gt;Bounding Box Regression Loss&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2406&quot; data-start=&quot;2293&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2335&quot; data-start=&quot;2293&quot;&gt;(x, y, w, h) 예측값과 Ground Truth 차이 최소화.&lt;/li&gt;
&lt;li data-end=&quot;2369&quot; data-start=&quot;2339&quot;&gt;모든 Grid Cell &amp;times; Box에 대해 계산.&lt;/li&gt;
&lt;li data-end=&quot;2406&quot; data-start=&quot;2373&quot;&gt;&amp;lambda; (lambda)라는 하이퍼파라미터로 가중치 조절.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2546&quot; data-start=&quot;2407&quot;&gt;&lt;b&gt;Confidence Loss&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2546&quot; data-start=&quot;2435&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2470&quot; data-start=&quot;2435&quot;&gt;Object 존재 시: Confidence &amp;rarr; 1 근접.&lt;/li&gt;
&lt;li data-end=&quot;2509&quot; data-start=&quot;2474&quot;&gt;No Object 시: Confidence &amp;rarr; 0 근접.&lt;/li&gt;
&lt;li data-end=&quot;2546&quot; data-start=&quot;2513&quot;&gt;Indicator Function(지시 함수) 사용.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2630&quot; data-start=&quot;2547&quot;&gt;&lt;b&gt;Classification Loss&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2630&quot; data-start=&quot;2579&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2630&quot; data-start=&quot;2579&quot;&gt;클래스별 예측 확률 분포가 Ground Truth One-hot 벡터와 가까워지도록 학습.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
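&lt;p data-ke-size=&quot;size16&quot;&gt;논문의 Loss를 그대로 구현한 것은 아니고, 세 항(좌표&amp;middot;Confidence&amp;middot;분류)의 구조만 보이도록 Cell당 Box 1개로 단순화한 스케치이다. responsible box 선택 등 세부는 생략했고 함수&amp;middot;변수명은 임의이다.&lt;/p&gt;

```python
import torch

def yolo_loss_sketch(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """pred/target: (S, S, 5 + C) 형태, Box 1개로 단순화한 스케치.
    obj_mask: (S, S) -- 해당 Cell에 객체가 있으면 1 (Indicator Function 역할)."""
    obj = obj_mask.unsqueeze(-1)   # (S, S, 1)로 브로드캐스팅 준비

    # 1) Bounding Box Regression Loss: (x, y)는 그대로, (w, h)는 sqrt 후 MSE
    xy_loss = ((pred[..., 0:2] - target[..., 0:2]) ** 2).sum(-1, keepdim=True)
    wh_loss = ((pred[..., 2:4].clamp(min=0).sqrt()
                - target[..., 2:4].sqrt()) ** 2).sum(-1, keepdim=True)
    coord_loss = lambda_coord * (obj * (xy_loss + wh_loss)).sum()

    # 2) Confidence Loss: 객체 있는 Cell은 1로, 없는 Cell은 0으로 유도
    conf_err = (pred[..., 4:5] - target[..., 4:5]) ** 2
    conf_loss = (obj * conf_err).sum() + lambda_noobj * ((1 - obj) * conf_err).sum()

    # 3) Classification Loss: 객체 있는 Cell에서만 클래스 확률 MSE
    cls_loss = (obj * (pred[..., 5:] - target[..., 5:]) ** 2).sum()

    return coord_loss + conf_loss + cls_loss

S, C = 7, 20
pred = torch.rand(S, S, 5 + C)
target = torch.zeros(S, S, 5 + C)
obj_mask = torch.zeros(S, S)
obj_mask[3, 3] = 1.0                 # (3, 3) Cell에만 객체가 있다고 가정
loss = yolo_loss_sketch(pred, target, obj_mask)
print(loss.item())
```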
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- YOLO의 장단점&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;987&quot; data-origin-height=&quot;663&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Oijnb/btsQ2oiqRHP/flAQiNKWD9IKq9Btnfkp5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Oijnb/btsQ2oiqRHP/flAQiNKWD9IKq9Btnfkp5k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Oijnb/btsQ2oiqRHP/flAQiNKWD9IKq9Btnfkp5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FOijnb%2FbtsQ2oiqRHP%2FflAQiNKWD9IKq9Btnfkp5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;533&quot; height=&quot;358&quot; data-origin-width=&quot;987&quot; data-origin-height=&quot;663&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Faster R-CNN 대비 6배 빠르고 Background Error가 적음&lt;/span&gt;&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;420&quot; data-start=&quot;217&quot;&gt;&lt;b&gt;Two-Stage Detector (예: Faster R-CNN)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;420&quot; data-start=&quot;264&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;321&quot; data-start=&quot;264&quot;&gt;RPN(Region Proposal Network)이 수많은 후보 영역(anchors)을 생성.&lt;/li&gt;
&lt;li data-end=&quot;364&quot; data-start=&quot;324&quot;&gt;이 중 많은 영역이 사실상 &lt;b&gt;배경인데도 객체 후보로 전달&lt;/b&gt;됨.&lt;/li&gt;
&lt;li data-end=&quot;420&quot; data-start=&quot;367&quot;&gt;Classification 단계에서 배경을 걸러내야 하므로 &lt;b&gt;FP가 증가&lt;/b&gt;하는 경향.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;644&quot; data-start=&quot;422&quot;&gt;&lt;b&gt;One-Stage Detector (YOLO)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;644&quot; data-start=&quot;458&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;492&quot; data-start=&quot;458&quot;&gt;이미지 전체를 &lt;b&gt;고정된 Grid Cell&lt;/b&gt;로 나눔.&lt;/li&gt;
&lt;li data-end=&quot;553&quot; data-start=&quot;495&quot;&gt;각 Grid Cell은 자신이 담당하는 영역 안에 객체가 있으면 예측, 없으면 &amp;ldquo;배경&amp;rdquo;으로 처리.&lt;/li&gt;
&lt;li data-end=&quot;585&quot; data-start=&quot;556&quot;&gt;불필요한 수천 개의 후보 영역을 만들지 않음.&lt;/li&gt;
&lt;li data-end=&quot;644&quot; data-start=&quot;588&quot;&gt;따라서 &lt;b&gt;배경을 객체로 잘못 잡는 경우(Background Error)가 구조적으로 줄어듦&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;하지만 한 Grid Cell은 사실상 하나의 객체만 예측할 수 있어 여러 객체가 겹치면 성능이 저하되고, 작은 객체의 Localization에 한계가 있으며, 입력 크기가 고정된다는 단점이 있다.&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; color: #333333; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;614&quot; data-start=&quot;582&quot;&gt;YOLO v1은 최종 7&amp;times;7 Feature만 사용&lt;/li&gt;
&lt;li data-end=&quot;689&quot; data-start=&quot;615&quot;&gt;CNN의 뒤쪽 Feature는 시맨틱 정보(무엇인지)는 잘 잡지만, 공간 정보(어디에 있는지)는 손실됨&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;YOLOv2~v3 이후에는 &lt;b&gt;Multi-Scale Feature Map(FPN)&lt;/b&gt;, &lt;b&gt;Anchor Box&lt;/b&gt;, &lt;b&gt;Stride 축소&lt;/b&gt; 등을 도입해 작은 객체도 잘 잡도록 개선됨.&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;SSD&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;671&quot; data-origin-height=&quot;347&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/sgP2n/btsQ34iVMSS/kipNrsVvCKaW5C9Zc8vbMK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/sgP2n/btsQ34iVMSS/kipNrsVvCKaW5C9Zc8vbMK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/sgP2n/btsQ34iVMSS/kipNrsVvCKaW5C9Zc8vbMK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FsgP2n%2FbtsQ34iVMSS%2FkipNrsVvCKaW5C9Zc8vbMK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;559&quot; height=&quot;289&quot; data-origin-width=&quot;671&quot; data-origin-height=&quot;347&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;487&quot; data-start=&quot;185&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;228&quot; data-start=&quot;185&quot;&gt;&lt;b&gt;SSD = Single Shot MultiBox Detector&lt;/b&gt;&lt;/li&gt;
&lt;li data-end=&quot;273&quot; data-start=&quot;229&quot;&gt;이름 그대로 한 번(Shot)에 Detection을 끝내는 구조.&lt;/li&gt;
&lt;li data-end=&quot;316&quot; data-start=&quot;274&quot;&gt;YOLO와 마찬가지로 &lt;b&gt;Region Proposal 과정 없음&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;하지만 &lt;b&gt;YOLO와 차별점&lt;/b&gt;은 &lt;b&gt;Multiple Feature Map&lt;/b&gt;을 활용한다는 것이다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;i&gt;- YOLO&lt;/i&gt;&lt;/b&gt;: 마지막 하나의 Feature Map만 사용 (7&amp;times;7).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;i&gt;- SSD:&lt;/i&gt;&lt;/b&gt; 여러 층에서 나온 Feature Map을 동시에 사용 &amp;rarr; 작은 물체부터 큰 물체까지 다양한 크기 탐지 가능.&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; color: #333333; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;514&quot; data-end=&quot;614&quot;&gt;&lt;b&gt;YOLO v1&lt;/b&gt;: 448&amp;times;448 입력 &amp;rarr; CNN &amp;rarr; 마지막 7&amp;times;7 Feature Map 사용.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;576&quot; data-end=&quot;614&quot;&gt;
&lt;li data-start=&quot;576&quot; data-end=&quot;614&quot;&gt;단점: 해상도(7&amp;times;7)가 너무 작아서 작은 물체 검출에 한계.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;615&quot; data-end=&quot;838&quot;&gt;&lt;b&gt;SSD&lt;/b&gt;: 300&amp;times;300 입력 이미지 &amp;rarr; CNN(VGG16) &amp;rarr;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;중간 Feature Map들을 여러 개 사용&lt;/b&gt;.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;690&quot; data-end=&quot;838&quot;&gt;
&lt;li data-start=&quot;690&quot; data-end=&quot;734&quot;&gt;예: 38&amp;times;38, 19&amp;times;19, 10&amp;times;10, 5&amp;times;5, 3&amp;times;3, 1&amp;times;1 등.&lt;/li&gt;
&lt;li data-start=&quot;737&quot; data-end=&quot;803&quot;&gt;각각의 Feature Map에서 Classification + Bounding Box Regression 수행.&lt;/li&gt;
&lt;li data-start=&quot;806&quot; data-end=&quot;838&quot;&gt;따라서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;다양한 스케일의 객체&lt;/b&gt;를 탐지할 수 있음&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
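&lt;p data-ke-size=&quot;size16&quot;&gt;SSD300의 Feature Map 구성(위의 38&amp;times;38 ~ 1&amp;times;1, Cell당 4~6개 Box 가정)에서 Default Box 총 개수를 세어 보면 8732개가 나온다:&lt;/p&gt;

```python
# SSD300 기준 Feature Map별 Default Box 수 집계 (논문 구성 가정)
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]  # (크기, Cell당 Box 수)
total = sum(size * size * num_boxes for size, num_boxes in feature_maps)
print(total)  # 8732
```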
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; SSD의 핵심 특징 &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;746&quot; data-origin-height=&quot;450&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Ez0jt/btsQ1pvv0h3/klhAiT9ntxrJkGh2k7HDk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Ez0jt/btsQ1pvv0h3/klhAiT9ntxrJkGh2k7HDk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Ez0jt/btsQ1pvv0h3/klhAiT9ntxrJkGh2k7HDk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FEz0jt%2FbtsQ1pvv0h3%2FklhAiT9ntxrJkGh2k7HDk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;655&quot; height=&quot;395&quot; data-origin-width=&quot;746&quot; data-origin-height=&quot;450&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;890&quot; data-start=&quot;862&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;890&quot; data-start=&quot;862&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(1) Multiple Feature Maps&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;890&quot; data-start=&quot;862&quot;&gt;Uses feature maps of various sizes, not just the last one.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;1048&quot; data-start=&quot;1016&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(2) Default Box (Anchor Box)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1289&quot; data-start=&quot;1049&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1081&quot; data-start=&quot;1049&quot;&gt;Borrows the anchor concept from Faster R-CNN.&lt;/li&gt;
&lt;li data-end=&quot;1139&quot; data-start=&quot;1082&quot;&gt;Each grid cell is assigned several &lt;b&gt;default boxes&lt;/b&gt; (predefined sizes and aspect ratios).&lt;/li&gt;
&lt;li data-end=&quot;1173&quot; data-start=&quot;1140&quot;&gt;e.g., 4~6 default boxes per cell.&lt;/li&gt;
&lt;li data-end=&quot;1231&quot; data-start=&quot;1174&quot;&gt;The network learns &lt;b&gt;coordinate corrections (x, y, w, h)&lt;/b&gt; relative to these default boxes.&lt;/li&gt;
&lt;li data-end=&quot;1289&quot; data-start=&quot;1232&quot;&gt;In other words, it learns to &lt;b&gt;transform each default box into the actual object box&lt;/b&gt;, bringing it closer to the ground truth.&lt;/li&gt;
&lt;/ul&gt;
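&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough sketch, the usual SSD-style decoding (shown here with the standard center-offset parameterization; the variance scaling found in many implementations is omitted, and the function name is purely illustrative) turns a default box plus predicted corrections into an actual box:&lt;/p&gt;

```python
import math

def decode(default_box, offsets):
    """Apply predicted corrections (dx, dy, dw, dh) to a default box (cx, cy, w, h)."""
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    # The center shift is scaled by the default box size, and width/height
    # are scaled exponentially so the predicted box always stays positive.
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Zero corrections leave the default box unchanged.
print(decode((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))  # (0.5, 0.5, 0.2, 0.2)
```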
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; SSD Architecture &lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Input: 300&amp;times;300 (SSD300) / 512&amp;times;512 (SSD512).&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;767&quot; data-origin-height=&quot;413&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/PZ4M4/btsQ2oCJJSa/g5EXqK4ONeluMltfDN8VZ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/PZ4M4/btsQ2oCJJSa/g5EXqK4ONeluMltfDN8VZ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/PZ4M4/btsQ2oCJJSa/g5EXqK4ONeluMltfDN8VZ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FPZ4M4%2FbtsQ2oCJJSa%2Fg5EXqK4ONeluMltfDN8VZ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;576&quot; height=&quot;310&quot; data-origin-width=&quot;767&quot; data-origin-height=&quot;413&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A feature map is taken from each layer (the numbers in red are the anchor-box counts).&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1596&quot; data-start=&quot;1510&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1541&quot; data-start=&quot;1510&quot;&gt;Classification: probabilities over C classes.&lt;/li&gt;
&lt;li data-end=&quot;1596&quot; data-start=&quot;1544&quot;&gt;Regression: coordinate corrections for the default box (&amp;Delta;x, &amp;Delta;y, &amp;Delta;w, &amp;Delta;h).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In total, &lt;b&gt;8,732 box predictions (for SSD300)&lt;/b&gt;.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1674&quot; data-start=&quot;1639&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1674&quot; data-start=&quot;1639&quot;&gt;YOLO v1 predicts only 98 boxes &amp;rarr; SSD produces far more candidates.&lt;/li&gt;
&lt;/ul&gt;
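&lt;p data-ke-size=&quot;size16&quot;&gt;The 8,732 figure can be reproduced from the feature-map sizes listed above, using the per-cell default-box counts from the SSD paper (4 or 6 depending on the layer):&lt;/p&gt;

```python
# SSD300: (feature-map side length, default boxes per cell), per the SSD paper.
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

total = sum(side * side * boxes for side, boxes in feature_maps)
print(total)  # 8732
```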
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;362&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eyReSA/btsQ19lMq6p/rg90uV8ft4al8c0ka0NYXK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eyReSA/btsQ19lMq6p/rg90uV8ft4al8c0ka0NYXK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eyReSA/btsQ19lMq6p/rg90uV8ft4al8c0ka0NYXK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeyReSA%2FbtsQ19lMq6p%2Frg90uV8ft4al8c0ka0NYXK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;619&quot; height=&quot;266&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;362&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2509&quot; data-start=&quot;2488&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;However, SSD is still weak at detecting small objects, and the sheer number of boxes (8,732) slows it down. &lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2656&quot; data-start=&quot;2513&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2546&quot; data-start=&quot;2513&quot;&gt;It does use the earlier, higher-resolution feature maps,&lt;/li&gt;
&lt;li data-end=&quot;2620&quot; data-start=&quot;2550&quot;&gt;but features from the early CNN layers are &lt;b&gt;low-level features&lt;/b&gt; whose information is not yet &quot;mature&quot; &amp;rarr; prediction quality is low.&lt;/li&gt;
&lt;li data-end=&quot;2656&quot; data-start=&quot;2624&quot;&gt;In practice, detection rates for small airplanes, bicycles, birds, etc. are low.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Loss Function&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;743&quot; data-origin-height=&quot;335&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yy8Te/btsQ3vVoccA/5v5tEzyNmyYnWEy8WQZw11/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yy8Te/btsQ3vVoccA/5v5tEzyNmyYnWEy8WQZw11/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yy8Te/btsQ3vVoccA/5v5tEzyNmyYnWEy8WQZw11/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fyy8Te%2FbtsQ3vVoccA%2F5v5tEzyNmyYnWEy8WQZw11%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;552&quot; height=&quot;249&quot; data-origin-width=&quot;743&quot; data-origin-height=&quot;335&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1981&quot; data-start=&quot;1948&quot;&gt;Uses nearly the same loss as Faster R-CNN:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;2176&quot; data-start=&quot;1990&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;2071&quot; data-start=&quot;1990&quot;&gt;&lt;b&gt;Classification Loss&lt;/b&gt;: Cross-entropy (softmax-based).
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2071&quot; data-start=&quot;2053&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2071&quot; data-start=&quot;2053&quot;&gt;Accuracy of the object-class prediction.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2176&quot; data-start=&quot;2074&quot;&gt;&lt;b&gt;Bounding Box Regression Loss&lt;/b&gt;: Smooth L1 loss.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2176&quot; data-start=&quot;2134&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2176&quot; data-start=&quot;2134&quot;&gt;Coordinate correction from the default box to the ground-truth box.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
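&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of this combined loss on toy inputs (illustrative only; a real SSD implementation also matches boxes to ground truth and applies hard negative mining, both omitted here):&lt;/p&gt;

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear beyond (less sensitive to outliers)."""
    absx = np.abs(x)
    return np.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

def multibox_loss(cls_logits, cls_targets, loc_preds, loc_targets, alpha=1.0):
    """Softmax cross-entropy over classes + Smooth L1 over box offsets, per matched box."""
    shifted = cls_logits - cls_logits.max(axis=1, keepdims=True)  # stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(cls_targets)), cls_targets].mean()
    loc = smooth_l1(loc_preds - loc_targets).sum(axis=1).mean()   # (dx, dy, dw, dh)
    return ce + alpha * loc

logits = np.array([[2.0, 0.1], [0.2, 1.5]])  # two boxes, two classes
labels = np.array([0, 1])
loss = multibox_loss(logits, labels, np.zeros((2, 4)), np.zeros((2, 4)))
print(round(float(loss), 4))
```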
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;SSD did improve small-object detection, but its performance on small objects still lags behind that on large objects.&lt;/b&gt;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt; RetinaNet &lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;436&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/o0mhh/btsQ4B1N5LH/OVlPHNtfjF5tgbieVq5lj1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/o0mhh/btsQ4B1N5LH/OVlPHNtfjF5tgbieVq5lj1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/o0mhh/btsQ4B1N5LH/OVlPHNtfjF5tgbieVq5lj1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fo0mhh%2FbtsQ4B1N5LH%2FOVlPHNtfjF5tgbieVq5lj1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;657&quot; height=&quot;340&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;436&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;209&quot; data-start=&quot;153&quot;&gt;Retina = the retina of the eye, Net = neural network &amp;rarr; aims to pick out objects in an image as well as the eye does.&lt;/li&gt;
&lt;li data-end=&quot;424&quot; data-start=&quot;210&quot;&gt;Proposed to solve a core &lt;b&gt;problem of one-stage detectors&lt;/b&gt;.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;424&quot; data-start=&quot;257&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;326&quot; data-start=&quot;257&quot;&gt;Two-stage (Faster R-CNN, etc.): region proposals filter out most of the background &amp;rarr; fewer false positives.&lt;/li&gt;
&lt;li data-end=&quot;424&quot; data-start=&quot;329&quot;&gt;One-stage (YOLO, SSD, etc.): no region proposals &amp;rarr; &lt;b&gt;far too many negative (background) samples, causing class imbalance&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-end=&quot;445&quot; data-start=&quot;426&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Class Imbalance&lt;/b&gt;&lt;/h3&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;680&quot; data-start=&quot;446&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;576&quot; data-start=&quot;446&quot;&gt;&lt;b&gt;Foreground vs. background imbalance&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;576&quot; data-start=&quot;487&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;532&quot; data-start=&quot;487&quot;&gt;Most of an image is background with only a few objects &amp;rarr; thousands to tens of thousands of background anchors.&lt;/li&gt;
&lt;li data-end=&quot;576&quot; data-start=&quot;536&quot;&gt;e.g., COCO dataset &amp;rarr; objects &amp;lt; 1%, background &amp;gt; 99%.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;680&quot; data-start=&quot;578&quot;&gt;&lt;b&gt;Imbalance within the foreground&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;680&quot; data-start=&quot;608&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;651&quot; data-start=&quot;608&quot;&gt;The &quot;person&quot; class has hundreds of thousands of instances, while a rare class (e.g., wild ginseng?) has only a handful.&lt;/li&gt;
&lt;li data-end=&quot;680&quot; data-start=&quot;655&quot;&gt;The class distribution within the dataset is extremely imbalanced.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;As a result, one-stage detectors were thought to underperform because training is dominated by the &lt;b&gt;many easy negative samples&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Key Ideas of RetinaNet &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;928&quot; data-origin-height=&quot;440&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/buPqjV/btsQ32SYbuV/uaqLtbN5iY3VVToPtJkR5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/buPqjV/btsQ32SYbuV/uaqLtbN5iY3VVToPtJkR5k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/buPqjV/btsQ32SYbuV/uaqLtbN5iY3VVToPtJkR5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbuPqjV%2FbtsQ32SYbuV%2FuaqLtbN5iY3VVToPtJkR5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;590&quot; height=&quot;280&quot; data-origin-width=&quot;928&quot; data-origin-height=&quot;440&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;text-align: left;&quot; data-end=&quot;799&quot; data-start=&quot;781&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;(1) Focal Loss&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;737&quot; data-origin-height=&quot;168&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Gcstb/btsQ3ExXYuZ/yDETFb3HRoHr38S4juWLPK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Gcstb/btsQ3ExXYuZ/yDETFb3HRoHr38S4juWLPK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Gcstb/btsQ3ExXYuZ/yDETFb3HRoHr38S4juWLPK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FGcstb%2FbtsQ3ExXYuZ%2FyDETFb3HRoHr38S4juWLPK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;553&quot; height=&quot;126&quot; data-origin-width=&quot;737&quot; data-origin-height=&quot;168&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;931&quot; data-start=&quot;800&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;839&quot; data-start=&quot;800&quot;&gt;Adds a &lt;b&gt;modulating factor (gamma)&lt;/b&gt; to the plain cross-entropy.&lt;/li&gt;
&lt;li data-end=&quot;931&quot; data-start=&quot;840&quot;&gt;It shrinks the loss for well-classified &amp;ldquo;easy&amp;rdquo; examples (especially background),&lt;br /&gt;and concentrates (focuses) the loss on hard examples (rare classes, small objects).&lt;/li&gt;
&lt;li data-end=&quot;931&quot; data-start=&quot;840&quot;&gt;Suppresses negative samples and emphasizes positive ones &amp;rarr; mitigates class imbalance.&lt;/li&gt;
&lt;/ul&gt;
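&lt;p data-ke-size=&quot;size16&quot;&gt;The binary form of the focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), can be sketched as follows (gamma = 2 and alpha = 0.25 are the defaults reported in the paper):&lt;/p&gt;

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss. p: predicted probability of the positive class, y: 0/1 label."""
    p_t = np.where(y == 1, p, 1 - p)             # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy background sample (true-class probability 0.99) is down-weighted by
# (1 - 0.99)**2 = 1e-4, while a hard positive (0.1) keeps most of its loss.
easy = focal_loss(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.1]), np.array([1]))
print(float(easy[0]), float(hard[0]))
```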
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; (2) Feature Pyramid Network (FPN, the Neck) &lt;/b&gt;&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;813&quot; data-origin-height=&quot;403&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dcRdq1/btsQ4mjnXNd/uwbH0kbuUqACrvTcHtRTe0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dcRdq1/btsQ4mjnXNd/uwbH0kbuUqACrvTcHtRTe0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dcRdq1/btsQ4mjnXNd/uwbH0kbuUqACrvTcHtRTe0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdcRdq1%2FbtsQ4mjnXNd%2FuwbH0kbuUqACrvTcHtRTe0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;594&quot; height=&quot;294&quot; data-origin-width=&quot;813&quot; data-origin-height=&quot;403&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;To improve multi-scale object detection,&amp;nbsp;&lt;/b&gt;a &lt;b&gt;Neck = FPN&lt;/b&gt; is inserted between the backbone (ResNet) and the head.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Neck&lt;span&gt;?&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;744&quot; data-origin-height=&quot;364&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c2ZMKg/btsQ3E5LuOo/ZGEgqBck3s6f8svNj4yYQ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c2ZMKg/btsQ3E5LuOo/ZGEgqBck3s6f8svNj4yYQ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c2ZMKg/btsQ3E5LuOo/ZGEgqBck3s6f8svNj4yYQ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc2ZMKg%2FbtsQ3E5LuOo%2FZGEgqBck3s6f8svNj4yYQ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;631&quot; height=&quot;309&quot; data-origin-width=&quot;744&quot; data-origin-height=&quot;364&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;An intermediate stage that &lt;b&gt;processes, transforms, and fuses&lt;/b&gt; the feature maps extracted by the backbone.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It combines features at multiple scales to strengthen the &lt;b&gt;multi-scale representation&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;749&quot; data-start=&quot;682&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;FPN&lt;/b&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;902&quot; data-origin-height=&quot;417&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pwXDl/btsQ1XS2gaf/ph4ocVa2o5JXsFJ9PvbHB0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pwXDl/btsQ1XS2gaf/ph4ocVa2o5JXsFJ9PvbHB0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pwXDl/btsQ1XS2gaf/ph4ocVa2o5JXsFJ9PvbHB0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpwXDl%2FbtsQ1XS2gaf%2Fph4ocVa2o5JXsFJ9PvbHB0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;683&quot; height=&quot;316&quot; data-origin-width=&quot;902&quot; data-origin-height=&quot;417&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;There are several ways to achieve scale invariance:&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;h3 data-end=&quot;206&quot; data-start=&quot;165&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;A. Multi-scale Input (Image Pyramid)&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;338&quot; data-start=&quot;207&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;246&quot; data-start=&quot;207&quot;&gt;Resize the input image to several scales and feed each into the CNN.&lt;/li&gt;
&lt;li data-end=&quot;279&quot; data-start=&quot;247&quot;&gt;Run detection on each scale, then merge the results.&lt;/li&gt;
&lt;li data-end=&quot;297&quot; data-start=&quot;280&quot;&gt;Pros: simple and effective.&lt;/li&gt;
&lt;li data-end=&quot;338&quot; data-start=&quot;298&quot;&gt;Cons: the CNN must run once per input scale, so it is &lt;b&gt;very slow&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-end=&quot;389&quot; data-start=&quot;345&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;B. Single Feature Map (the YOLO v1 approach)&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;556&quot; data-start=&quot;390&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;438&quot; data-start=&quot;390&quot;&gt;Feed the image into the CNN and &lt;b&gt;use only the final feature map&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;478&quot; data-start=&quot;439&quot;&gt;e.g., YOLO v1 &amp;rarr; predictions based on a 7&amp;times;7 feature map.&lt;/li&gt;
&lt;li data-end=&quot;490&quot; data-start=&quot;479&quot;&gt;Pros: fast.&lt;/li&gt;
&lt;li data-end=&quot;556&quot; data-start=&quot;491&quot;&gt;Cons:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;556&quot; data-start=&quot;501&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;533&quot; data-start=&quot;501&quot;&gt;The resolution is too low, so it is &lt;b&gt;weak at detecting small objects&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;556&quot; data-start=&quot;536&quot;&gt;Overlapping objects are hard to detect.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-end=&quot;602&quot; data-start=&quot;563&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;C. Multi-Feature Map (the SSD approach)&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;815&quot; data-start=&quot;603&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;659&quot; data-start=&quot;603&quot;&gt;Use both the &lt;b&gt;intermediate feature maps&lt;/b&gt; and the &lt;b&gt;final feature map&lt;/b&gt; of the CNN.&lt;/li&gt;
&lt;li data-end=&quot;688&quot; data-start=&quot;660&quot;&gt;Small feature maps &amp;rarr; detect large objects.&lt;/li&gt;
&lt;li data-end=&quot;717&quot; data-start=&quot;689&quot;&gt;Large feature maps &amp;rarr; detect small objects.&lt;/li&gt;
&lt;li data-end=&quot;743&quot; data-start=&quot;718&quot;&gt;Pros: handles objects of various sizes.&lt;/li&gt;
&lt;li data-end=&quot;815&quot; data-start=&quot;744&quot;&gt;Cons: the early-layer features are still &lt;b&gt;immature low-level features&lt;/b&gt;, so small-object detection performance suffers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-end=&quot;869&quot; data-start=&quot;822&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;D. Feature Fusion (the FPN / RetinaNet approach)&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1107&quot; data-start=&quot;870&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;886&quot; data-start=&quot;870&quot;&gt;Compensates for the weakness of SSD.&lt;/li&gt;
&lt;li data-end=&quot;934&quot; data-start=&quot;887&quot;&gt;&lt;b&gt;Higher-level features&lt;/b&gt; (semantically rich, low spatial resolution)&lt;/li&gt;
&lt;li data-end=&quot;1033&quot; data-start=&quot;935&quot;&gt;&lt;b&gt;Lower-level features&lt;/b&gt; (high spatial resolution, less semantic information)&lt;br /&gt;&amp;rarr; matched in size and channels via upsampling + 1&amp;times;1 conv, then &lt;b&gt;fused&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1066&quot; data-start=&quot;1034&quot;&gt;As a result, both small and large objects can be detected.&lt;/li&gt;
&lt;li data-end=&quot;1107&quot; data-start=&quot;1067&quot;&gt;RetinaNet adopts this structure as its neck (FPN).&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;946&quot; data-origin-height=&quot;417&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/InbGP/btsQ2pn5hdS/XXO8plTwh6YVHP53kfSg4k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/InbGP/btsQ2pn5hdS/XXO8plTwh6YVHP53kfSg4k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/InbGP/btsQ2pn5hdS/XXO8plTwh6YVHP53kfSg4k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FInbGP%2FbtsQ2pn5hdS%2FXXO8plTwh6YVHP53kfSg4k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;699&quot; height=&quot;308&quot; data-origin-width=&quot;946&quot; data-origin-height=&quot;417&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Bilinear interpolation&lt;/b&gt; enlarges the smaller map to match in size, a 1&amp;times;1 conv matches the channels, and the two feature maps are then &lt;b&gt;added element-wise, per (x, y) position and per channel&lt;/b&gt;.&lt;/p&gt;
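&lt;p data-ke-size=&quot;size16&quot;&gt;That fusion step can be sketched as follows (nearest-neighbor upsampling stands in for bilinear interpolation to keep the example short, and the 1&amp;times;1-conv weights are arbitrary):&lt;/p&gt;

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """A 1x1 convolution is per-pixel channel mixing: (C_out, C_in) @ (C_in, H*W)."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, h * w)).reshape(-1, h, w)

high = np.ones((4, 2, 2))           # higher-level map: semantically rich, low resolution
low = np.ones((8, 4, 4))            # lower-level map: high resolution, more channels
lateral_w = np.full((4, 8), 0.125)  # arbitrary 1x1-conv weights mapping 8 -> 4 channels

# Match sizes (upsample) and channels (1x1 conv), then add element-wise.
fused = upsample2x(high) + conv1x1(low, lateral_w)
print(fused.shape)  # (4, 4, 4)
```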
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Performance&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;785&quot; data-origin-height=&quot;416&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bPNZyL/btsQ37UddFP/6auKLo6f3067uAfqSC3D8K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bPNZyL/btsQ37UddFP/6auKLo6f3067uAfqSC3D8K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bPNZyL/btsQ37UddFP/6auKLo6f3067uAfqSC3D8K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbPNZyL%2FbtsQ37UddFP%2F6auKLo6f3067uAfqSC3D8K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;629&quot; height=&quot;333&quot; data-origin-width=&quot;785&quot; data-origin-height=&quot;416&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Significantly higher AP (accuracy)&lt;/b&gt; than previous one-stage detectors (YOLO, SSD).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Recent Research Trends&lt;/b&gt;&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;h2 data-end=&quot;112&quot; data-start=&quot;93&quot; data-ke-size=&quot;size26&quot;&gt;1. Diversification of Backbones&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;212&quot; data-start=&quot;113&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;188&quot; data-start=&quot;113&quot;&gt;Object detectors have appeared that use &lt;b&gt;EfficientNet&lt;/b&gt;-family backbones instead of ResNet.&lt;/li&gt;
&lt;li data-end=&quot;212&quot; data-start=&quot;189&quot;&gt;A trend that pursues efficiency and accuracy at the same time.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;244&quot; data-start=&quot;214&quot; data-ke-size=&quot;size26&quot;&gt;2. Transformer-based Detection&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;361&quot; data-start=&quot;245&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;284&quot; data-start=&quot;245&quot;&gt;&lt;b&gt;DETR (Detection Transformer)&lt;/b&gt; introduced.&lt;/li&gt;
&lt;li data-end=&quot;314&quot; data-start=&quot;285&quot;&gt;Many variants have since been developed with improved performance.&lt;/li&gt;
&lt;li data-end=&quot;361&quot; data-start=&quot;315&quot;&gt;Transformer-based detection architectures that go beyond the limits of CNNs are being actively researched.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;394&quot; data-start=&quot;363&quot; data-ke-size=&quot;size26&quot;&gt;3. The Anchor Problem and Anchor-free Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;624&quot; data-start=&quot;395&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;483&quot; data-start=&quot;395&quot;&gt;Faster R-CNN, SSD, RetinaNet, etc. require predefined anchor boxes &amp;rarr; human involvement needed, hyperparameter-dependent.&lt;/li&gt;
&lt;li data-end=&quot;513&quot; data-start=&quot;484&quot;&gt;Drawback: performance varies with the predefined sizes and aspect ratios.&lt;/li&gt;
&lt;li data-end=&quot;624&quot; data-start=&quot;514&quot;&gt;Solution: &lt;b&gt;anchor-free detectors&lt;/b&gt; (notably &lt;b&gt;FCOS&lt;/b&gt;)
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;624&quot; data-start=&quot;568&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;624&quot; data-start=&quot;568&quot;&gt;Directly learn the object center point and size &amp;rarr; no predefined anchors needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;651&quot; data-start=&quot;626&quot; data-ke-size=&quot;size26&quot;&gt;4. 3D Object Detection&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;833&quot; data-start=&quot;652&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;690&quot; data-start=&quot;652&quot;&gt;Before: 2D image input &amp;rarr; 2D bounding-box output.&lt;/li&gt;
&lt;li data-end=&quot;748&quot; data-start=&quot;691&quot;&gt;Recently: 3D sensors such as LiDAR &amp;rarr; &lt;b&gt;3D (cuboid) bounding-box&lt;/b&gt; output.&lt;/li&gt;
&lt;li data-end=&quot;777&quot; data-start=&quot;749&quot;&gt;Becoming an essential technology for autonomous driving, robotics, etc.&lt;/li&gt;
&lt;li data-end=&quot;833&quot; data-start=&quot;778&quot;&gt;Representative approaches: &lt;b&gt;point-based 3D detection, CVF, 3D DETR-style models&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;862&quot; data-start=&quot;835&quot; data-ke-size=&quot;size26&quot;&gt;5. Multimodal Object Detection&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1062&quot; data-start=&quot;863&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;961&quot; data-start=&quot;863&quot;&gt;Rather than finding every object in an image, &lt;b&gt;text-prompt-driven detection&lt;/b&gt; research is spreading.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;961&quot; data-start=&quot;924&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;961&quot; data-start=&quot;924&quot;&gt;e.g., handling requests like &amp;ldquo;find only the people&amp;rdquo; or &amp;ldquo;find only the bicycles&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1062&quot; data-start=&quot;962&quot;&gt;Representative example: &lt;b&gt;GLIP&lt;/b&gt; (Grounded Language-Image Pretraining)
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1062&quot; data-start=&quot;1022&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1062&quot; data-start=&quot;1022&quot;&gt;Built on large-scale foundation models, fusing text and images.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Computer Vision1/Paper reviews</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/112</guid>
      <comments>https://c0mputermaster.tistory.com/112#entry112comment</comments>
      <pubDate>Fri, 12 Sep 2025 00:15:29 +0900</pubDate>
    </item>
    <item>
      <title>[Generative AI] Flow Matching for Generative Modeling</title>
      <link>https://c0mputermaster.tistory.com/111</link>
      <description>&lt;blockquote data-ke-style=&quot;style1&quot;&gt;&lt;span style=&quot;font-family: 'Noto Serif KR';&quot;&gt;이 리뷰는 오직 학습과 참고 목적으로 작성되었으며, 해당 논문을 통해 얻은 통찰력과 지식을 공유하고자 하는 의도에서 작성된 것입니다. 본 리뷰를 통해 수익을 창출하는 것이 아니라, 제 학습과 연구를 위한 공부의 일환으로 작성되었음을 미리 알려드립니다.&lt;/span&gt;&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;blockquote style=&quot;color: #666666; text-align: left;&quot; data-ke-style=&quot;style2&quot;&gt;This paper proposes Flow Matching (FM) as a way to train Continuous Normalizing Flows (CNFs) efficiently, without simulation. Since Flow Matching has been drawing attention in recent generative modeling, I decided to review the paper.&lt;/blockquote&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/2210.02747&quot;&gt;https://arxiv.org/abs/2210.02747&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275884445&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;Flow Matching for Generative Modeling&quot; data-og-description=&quot;We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs base&quot; data-og-host=&quot;arxiv.org&quot; data-og-source-url=&quot;https://arxiv.org/abs/2210.02747&quot; data-og-url=&quot;https://arxiv.org/abs/2210.02747v2&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/rkFZv/hyZJiGBY29/pYek3xzUcTZwEhfNOPh0ck/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/cryNok/hyZJpFHwrY/ASA3Glq9rdfCm8jQanIfH0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/2210.02747&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://arxiv.org/abs/2210.02747&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/rkFZv/hyZJiGBY29/pYek3xzUcTZwEhfNOPh0ck/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/cryNok/hyZJpFHwrY/ASA3Glq9rdfCm8jQanIfH0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Flow Matching for Generative Modeling&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs base&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;arxiv.org&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot; data-start=&quot;154&quot; data-end=&quot;172&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;최근 몇 년간 생성 모델 분야에서 디퓨전(diffusion) 계열 모델은 탁월한 성능을 보여주며 사실상 주류로 자리 잡았다. 여러 단계에 걸친 반복적 샘플링을 통해 고품질의 이미지를 생성할 수 있다는 점은 큰 장점이지만, 동시에 이러한 과정은 추론 속도를 심각하게 저하시킨다는 한계를 지닌다.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;본 논문은 Continuous Normalizing Flows(CNF)를 시뮬레이션 없이&lt;span style=&quot;text-align: left;&quot;&gt;(simulation-free)&lt;/span&gt;&amp;nbsp;효율적으로 학습할 수 있는 새로운 훈련 방법으로서 Flow Matching(FM)을 제안한다. 이 글에서는 해당 논문을 중심으로 Flow Matching의 핵심 아이디어와 기여점을 정리하고자 한다.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&lt;b&gt;Simulation-Free?&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;CNF? 하나씩 알아보자&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;2. Continuous Normalizing Flows&lt;/h2&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Recap) 우선 이전 생성모델의 구조를 살펴보자&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;885&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dz8hrP/btsQzxMcdyp/wRrtfX47Tt0qlKneOUZidK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dz8hrP/btsQzxMcdyp/wRrtfX47Tt0qlKneOUZidK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dz8hrP/btsQzxMcdyp/wRrtfX47Tt0qlKneOUZidK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdz8hrP%2FbtsQzxMcdyp%2FwRrtfX47Tt0qlKneOUZidK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;688&quot; height=&quot;476&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;885&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&lt;b&gt;생성모델&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;: 데이터의 분포를 학습 (&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Source(p0)&lt;/b&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;분포에서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;Target(p1)&lt;/b&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;분포로 변화되는 과정을 학습 )&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;174&quot; data-end=&quot;339&quot;&gt;&lt;b&gt;- GAN, VAE&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;573&quot; data-end=&quot;645&quot;&gt;대부분의 생성 모델은&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;샘플링이 쉬운 분포&lt;/b&gt;(예: 가우시안 분포) z에서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;데이터 분포&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;x로의 변환을 학습한다.&lt;/li&gt;
&lt;li data-start=&quot;186&quot; data-end=&quot;249&quot;&gt;&lt;b&gt;GAN&lt;/b&gt;: z &amp;rarr; x 변환을 적대적 학습(Adversarial Training)을 통해 학습한다.&lt;/li&gt;
&lt;li data-start=&quot;250&quot; data-end=&quot;308&quot;&gt;&lt;b&gt;VAE&lt;/b&gt;: z &amp;rarr; x 생성과 x &amp;rarr; z 인코딩을 동시에 학습해 잠재공간을 정규화한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;904&quot; data-origin-height=&quot;464&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/QnpIB/btsQzcbnUKT/zVz3t83FLQuicQgqNN1zxk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/QnpIB/btsQzcbnUKT/zVz3t83FLQuicQgqNN1zxk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/QnpIB/btsQzcbnUKT/zVz3t83FLQuicQgqNN1zxk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FQnpIB%2FbtsQzcbnUKT%2FzVz3t83FLQuicQgqNN1zxk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;548&quot; height=&quot;281&quot; data-origin-width=&quot;904&quot; data-origin-height=&quot;464&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;=&amp;gt; GAN, VAE&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;같은 모델은&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Source(p0)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;분포에서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Target(p1)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;분포를 한번에 mapping하는 방식&lt;/p&gt;
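이 "한 번에 mapping"이라는 점을 코드로 스케치하면 다음과 같다. 아래의 `generator`는 설명을 위해 가정한 1-layer 변환일 뿐, 실제 GAN/VAE의 네트워크가 아니다.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W, b):
    # 가설적인 1-layer generator: z(소스 분포 샘플)를 x(데이터 공간 샘플)로 한 번에 변환
    return np.tanh(z @ W) + b

z = rng.standard_normal((4, 2))  # p0: 샘플링이 쉬운 가우시안 분포
W = rng.standard_normal((2, 2))
b = np.ones(2)
x = generator(z, W, b)           # 한 번의 forward로 p1 쪽 샘플을 생성
print(x.shape)                   # (4, 2)
```

핵심은 z에서 x까지 forward가 단 한 번이라는 점이고, 이것이 아래의 Diffusion 방식과 대비된다.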
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Diffusion model&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;319&quot; data-origin-height=&quot;183&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b8Zwnd/btsQAR5zcEE/6PFJALEFH4i7PfHtJETNm1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b8Zwnd/btsQAR5zcEE/6PFJALEFH4i7PfHtJETNm1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b8Zwnd/btsQAR5zcEE/6PFJALEFH4i7PfHtJETNm1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb8Zwnd%2FbtsQAR5zcEE%2F6PFJALEFH4i7PfHtJETNm1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;392&quot; height=&quot;225&quot; data-origin-width=&quot;319&quot; data-origin-height=&quot;183&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #333333; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;404&quot; data-end=&quot;528&quot;&gt;&lt;b&gt;Diffusion model&lt;/b&gt;: x &amp;rarr; z 방향으로&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;점진적으로 노이즈를 주입&lt;/b&gt;하는 과정을 학습하고, 반대로 z &amp;rarr; x 방향으로&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;노이즈 제거(Denoising)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;과정을 통해 데이터를 생성한다. =&amp;gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;노이즈 제거 함수&lt;/b&gt;를 학습&lt;/li&gt;
&lt;li data-start=&quot;404&quot; data-end=&quot;528&quot;&gt;복잡한 데이터 분포 학습 가능, 학습이 효율적&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;But&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;생성 속도가 느림 ( 여러 번의&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;model forward&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;필요 )&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;=&amp;gt; Diffusion model은 Source(p0)&lt;/b&gt; 분포에서 &lt;b&gt;Target(p1)&lt;/b&gt; 분포로 단계적으로 변화&lt;/p&gt;
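"여러 번의 model forward" 때문에 샘플링이 느려지는 구조를 루프로 스케치하면 다음과 같다. `denoise_step`은 학습된 denoiser의 forward 1회를 흉내 내는 가정일 뿐이다.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # 가설적인 denoiser: 실제로는 학습된 네트워크의 forward 1회에 해당
    return x * 0.9  # 노이즈를 조금씩 줄이는 흉내

x = rng.standard_normal(3)  # z: 순수 노이즈에서 시작
T = 50
for t in range(T, 0, -1):   # T번의 model forward => 느린 샘플링의 원인
    x = denoise_step(x, t)
print(x)
```

GAN/VAE가 forward 1회로 끝나는 것과 달리, 샘플 하나를 만드는 데 T번의 forward가 필요하다.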
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Normalizing flow ( Flow 모델, NICE )&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;250&quot; data-end=&quot;308&quot;&gt;&lt;b&gt;Normalizing flow&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;: x &amp;rarr; z로의&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;가역적 변환(flow)&lt;/b&gt;을 학습하고, 역변환을 통해 z &amp;rarr; x 생성을 수행한다.&lt;/li&gt;
&lt;li data-start=&quot;250&quot; data-end=&quot;308&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #474747; text-align: start;&quot;&gt;&lt;b&gt;Likelihood&lt;/b&gt;를 계산 가능하다는 장점이 있음 =&amp;gt; 확률추정이 가능&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;583&quot; data-origin-height=&quot;327&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/CcWnx/btsQxvIUqDF/67tWVjb5cLtlwkB0wpexh0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/CcWnx/btsQxvIUqDF/67tWVjb5cLtlwkB0wpexh0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/CcWnx/btsQxvIUqDF/67tWVjb5cLtlwkB0wpexh0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FCcWnx%2FbtsQxvIUqDF%2F67tWVjb5cLtlwkB0wpexh0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;442&quot; height=&quot;248&quot; data-origin-width=&quot;583&quot; data-origin-height=&quot;327&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Normalizing flow&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;모델은&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;역변환이 가능한 모델 구조&lt;/b&gt;(Jacobian determinant 계산 가능)가 필요하고, 역변환을 계속 수행해야 하기 때문에 학습이 비효율적이다.&lt;/p&gt;
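Likelihood 계산이 가능하다는 것이 무슨 뜻인지 1차원 예로 스케치해 보자. 가역 변환 f(x) = a*x + c를 가정하면 change of variables에 의해 log p_x(x) = log p_z(f(x)) + log|df/dx|이고, 여기서 |df/dx| = |a|이다 (이 f는 설명용 가정이다).

```python
import numpy as np

a, c = 2.0, 0.5

def f(x):
    # x를 z로 보내는 가역 변환 (설명용 1차원 예)
    return a * x + c

def log_prob_z(z):
    # 잠재 분포: 표준 가우시안의 log density
    return -0.5 * (z ** 2) - 0.5 * np.log(2 * np.pi)

x = 1.3
# change of variables: log p_x(x) = log p_z(f(x)) + log|a|
log_px = log_prob_z(f(x)) + np.log(abs(a))
print(log_px)
```

고차원에서는 log|a| 자리에 Jacobian determinant의 log가 들어가며, 이것을 매번 계산할 수 있는 구조를 강제하는 것이 NF의 제약이다.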
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://angeloyeo.github.io/2020/07/24/Jacobian.html&quot;&gt;https://angeloyeo.github.io/2020/07/24/Jacobian.html&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275897260&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;자코비안(Jacobian) 행렬의 기하학적 의미 - 공돌이의 수학정리노트 (Angelo's Math Notes)&quot; data-og-description=&quot;&quot; data-og-host=&quot;angeloyeo.github.io&quot; data-og-source-url=&quot;https://angeloyeo.github.io/2020/07/24/Jacobian.html&quot; data-og-url=&quot;https://angeloyeo.github.io/2020/07/24/Jacobian.html&quot; data-og-image=&quot;&quot;&gt;&lt;a href=&quot;https://angeloyeo.github.io/2020/07/24/Jacobian.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://angeloyeo.github.io/2020/07/24/Jacobian.html&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url();&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;자코비안(Jacobian) 행렬의 기하학적 의미 - 공돌이의 수학정리노트 (Angelo's Math Notes)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;angeloyeo.github.io&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Flow model?&lt;/b&gt;&lt;/h2&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Source(p0)&lt;/b&gt;를&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Target(p1)로 변환하는 Flow를 찾는 모델&amp;nbsp;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;923&quot; data-origin-height=&quot;356&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lIyg0/btsQwBQFIDc/RTk1VRkbkakHbag46s0JaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lIyg0/btsQwBQFIDc/RTk1VRkbkakHbag46s0JaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lIyg0/btsQwBQFIDc/RTk1VRkbkakHbag46s0JaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FlIyg0%2FbtsQwBQFIDc%2FRTk1VRkbkakHbag46s0JaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;780&quot; height=&quot;301&quot; data-origin-width=&quot;923&quot; data-origin-height=&quot;356&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;309&quot; data-end=&quot;380&quot;&gt;&lt;b&gt;Flow model&lt;/b&gt;: source 분포 p0를 target 분포 p1으로 변환해주는 flow&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;i&gt;&lt;b&gt;&amp;psi;t&lt;/b&gt;&lt;/i&gt;를 찾는 모델&lt;/li&gt;
&lt;li data-start=&quot;309&quot; data-end=&quot;380&quot;&gt;&lt;b&gt;Flow?&lt;/b&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;X0를 Xt로 Mapping해주는 함수 (&lt;i&gt;&lt;b&gt;Diffeomorphism &amp;lt;= 특징&lt;/b&gt;&lt;/i&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;[미분가능 / 역함수 존재])&lt;/span&gt;&lt;/li&gt;
&lt;li data-start=&quot;309&quot; data-end=&quot;380&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;하지만&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Continuous time&lt;/b&gt;(중간중간 샘플이 없는) 상황에서 flow를 직접적으로 학습하기는 어려움&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;이해를 돕자면&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Flow는&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;데이터 분포와 간단한 분포(보통 정규분포)를 연결하는 변환이며&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;220&quot; data-origin-height=&quot;34&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bxteZL/btsQxEs2aSl/koj12zSYySgEKr2UBuEf9K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bxteZL/btsQxEs2aSl/koj12zSYySgEKr2UBuEf9K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bxteZL/btsQxEs2aSl/koj12zSYySgEKr2UBuEf9K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbxteZL%2FbtsQxEs2aSl%2Fkoj12zSYySgEKr2UBuEf9K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;220&quot; height=&quot;34&quot; data-origin-width=&quot;220&quot; data-origin-height=&quot;34&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;f&lt;sub&gt;&amp;theta;&lt;/sub&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;자체&lt;/b&gt;를 학습해서 데이터 분포를 얻음, 즉&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Flow 자체를 파라미터화해서 학습&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;u&gt;그래서 Flow model에서는&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;&lt;b&gt;vector field&lt;/b&gt;&lt;/b&gt;를 통해 flow를 간접적으로 계산&lt;/u&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Velocity Field (&lt;b&gt;vector field)&lt;/b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1607&quot; data-origin-height=&quot;746&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yMKiO/btsQxKzZxpk/8LtQcttTY1eO0bw6nJSrh0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yMKiO/btsQxKzZxpk/8LtQcttTY1eO0bw6nJSrh0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yMKiO/btsQxKzZxpk/8LtQcttTY1eO0bw6nJSrh0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FyMKiO%2FbtsQxKzZxpk%2F8LtQcttTY1eO0bw6nJSrh0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;640&quot; height=&quot;297&quot; data-origin-width=&quot;1607&quot; data-origin-height=&quot;746&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Flow를 간접적으로 계산하는 과정, 간단하게 u&lt;sub&gt;t&lt;/sub&gt;(x) = d&amp;psi;&lt;sub&gt;t&lt;/sub&gt;(x)/dt, 즉 Flow를 t에 대해 미분한 것&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- 한마디로 Velocity Field는 각각의 포인트에서 화살표 방향이다. 어디로 갈지, 즉 각각의 점에서 어디로 움직여야 소스에서 타겟으로 옮겨갈 수 있나를 알려주는것을 Velocity Field라고 한다. ODE로 정의됨&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;i&gt;&lt;b&gt;Flow의 미분을 통해 Velocity Field를 구하고, Solver를 통해 다시 Flow를 구할 수 있음&lt;/b&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1589&quot; data-origin-height=&quot;770&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/P3lMA/btsQA2Mvgz8/Pzl4A87D15j8GHJYM82DLk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/P3lMA/btsQA2Mvgz8/Pzl4A87D15j8GHJYM82DLk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/P3lMA/btsQA2Mvgz8/Pzl4A87D15j8GHJYM82DLk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FP3lMA%2FbtsQA2Mvgz8%2FPzl4A87D15j8GHJYM82DLk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;594&quot; height=&quot;288&quot; data-origin-width=&quot;1589&quot; data-origin-height=&quot;770&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;스텝 h를 지정한 뒤&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;i&gt;Velocity Field와 Solver&lt;/i&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;를 통해 시간 t일 때의 분포 샘플 Xt를 구할 수 있다.&lt;/b&gt;&lt;/p&gt;
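위 과정(스텝 h 지정 후 Velocity Field와 Solver로 Xt를 구하는 것)을 가장 단순한 Euler solver로 스케치하면 다음과 같다. 여기서 사용한 velocity field는 각 점을 목표점 x1로 곧게 보내는 설명용 가정이다.

```python
import numpy as np

def velocity(x, t, x1):
    # 설명용 가정: x를 목표점 x1로 직선으로 보내는 field, u_t(x) = (x1 - x) / (1 - t)
    return (x1 - x) / (1.0 - t + 1e-8)

x = np.zeros(2)               # t=0: source 샘플
x1 = np.array([1.0, 2.0])     # target 지점
h = 0.1                       # 스텝 h 지정
t = 0.0
for _ in range(10):           # Euler solver: x_{t+h} = x_t + h * u_t(x_t)
    x = x + h * velocity(x, t, x1)
    t = t + h
print(x)                      # t=1 근처에서 x1에 가까워짐
```

각 스텝마다 "현재 위치에서의 화살표 방향"을 읽고 h만큼 전진하는 것이 전부이며, 이것이 ODE 적분이 여러 번의 forward를 요구하는 이유이기도 하다.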
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;+ Probability Paths&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1381&quot; data-origin-height=&quot;345&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bj2AB6/btsQyu4x5FO/IyQqnbfkhKXCsyWFLoVWxk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bj2AB6/btsQyu4x5FO/IyQqnbfkhKXCsyWFLoVWxk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bj2AB6/btsQyu4x5FO/IyQqnbfkhKXCsyWFLoVWxk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbj2AB6%2FbtsQyu4x5FO%2FIyQqnbfkhKXCsyWFLoVWxk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;728&quot; height=&quot;182&quot; data-origin-width=&quot;1381&quot; data-origin-height=&quot;345&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Probability Paths(pt) = source 분포 p0에서 target 분포 p1로 가는 과정의 t시점의 분포를 말함&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;646&quot; data-end=&quot;803&quot;&gt;&lt;b&gt;Normalizing Flow(NF)&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;678&quot; data-end=&quot;803&quot;&gt;
&lt;li data-start=&quot;678&quot; data-end=&quot;738&quot;&gt;x &amp;rarr; z (데이터 분포 &amp;rarr; 잠재 분포)로 가는 가역적인 변환 f를 뉴럴 네트워크가 학습한다.&lt;/li&gt;
&lt;li data-start=&quot;741&quot; data-end=&quot;803&quot;&gt;학습 후에는 z0 ~ N(0, I)를 샘플링한 뒤 f⁻&amp;sup1;(z0)을 통해 데이터를 생성할 수 있다.&lt;/li&gt;
&lt;li data-start=&quot;741&quot; data-end=&quot;803&quot;&gt;쉽게 말하자면&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;&lt;u&gt;&lt;span style=&quot;color: #666666; text-align: left;&quot;&gt;데이터 분포인&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;color: #666666; text-align: left;&quot; data-mathml=&quot;&amp;lt;math xmlns=&amp;quot;http://www.w3.org/1998/Math/MathML&amp;quot;&amp;gt;&amp;lt;mi&amp;gt;x&amp;lt;/mi&amp;gt;&amp;lt;/math&amp;gt;&quot;&gt;&lt;span aria-hidden=&quot;true&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;color: #666666; text-align: left;&quot;&gt;에서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;color: #666666; text-align: left;&quot; data-mathml=&quot;&amp;lt;math xmlns=&amp;quot;http://www.w3.org/1998/Math/MathML&amp;quot;&amp;gt;&amp;lt;mi&amp;gt;z&amp;lt;/mi&amp;gt;&amp;lt;/math&amp;gt;&quot;&gt;&lt;span aria-hidden=&quot;true&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;text-align: left;&quot;&gt;z&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;color: #666666; text-align: left;&quot;&gt;로의 역변환이 가능한 함수(Flow)를 학습하는 모델&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;805&quot; data-end=&quot;1085&quot;&gt;&lt;b&gt;Continuous Normalizing Flow(CNF)&lt;/b&gt;:&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;849&quot; data-end=&quot;1085&quot;&gt;
&lt;li data-start=&quot;875&quot; data-end=&quot;921&quot;&gt;변환 함수 자체를 학습하는 대신,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;vector field&lt;/b&gt;를 학습한다.&lt;/li&gt;
&lt;li data-start=&quot;924&quot; data-end=&quot;1020&quot;&gt;시간축 t &amp;isin; [0, 1]을 따라&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;정의된 흐름(flow)&lt;/b&gt;에서,
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;969&quot; data-end=&quot;1020&quot;&gt;
&lt;li data-start=&quot;969&quot; data-end=&quot;990&quot;&gt;&lt;i&gt;&lt;b&gt;t=0일&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;때 분포는 z&lt;/i&gt;,&lt;/li&gt;
&lt;li data-start=&quot;995&quot; data-end=&quot;1020&quot;&gt;&lt;b&gt;t=1일 때 분포는 x가 된다&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;1023&quot; data-end=&quot;1085&quot;&gt;따라서, z에서 출발해&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ODE Solver&lt;/b&gt;로 적분하면 최종적으로 데이터 분포 샘플을 생성할 수 있다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;1087&quot; data-end=&quot;1114&quot;&gt;&lt;b&gt;장점:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;역변환 가능해야 한다는 제약 해소.&lt;/li&gt;
&lt;li data-start=&quot;1115&quot; data-end=&quot;1208&quot;&gt;&lt;b&gt;단점&lt;/b&gt;: 학습과 샘플링 과정에서&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ODE Solver(수치적분)&lt;/b&gt;를 반복적으로 호출해야 하므로 비용이 크다. &amp;rarr; Diffusion의 느린 샘플링 문제와 유사, &quot;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;ODE의 적분에는 많은 시간이 걸린다&lt;/span&gt;&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;NF&lt;/b&gt;가 복잡한 분포에 맞게 함수 파라미터를 최적화한다면, CNF는&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;u&gt;&lt;b&gt;시간에 따라 분포가 알맞게 흘러가도록 vector field를 최적화&lt;/b&gt;&lt;/u&gt;한다.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;정리:&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Data&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: d차원 벡터 공간에 존재하는 포인트&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Flow(ϕ)&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: 데이터를 시간에 따라 연속적으로 변환하는 함수. ODE(미분 방정식)&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Vector Field&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: 데이터 공간의 각 위치에서 어떤 방향으로 얼마나 이동할지를 나타내는 정보&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Probability density path(p)&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: 시간에 따라 변화하는 확률 밀도 함수&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Continuous Normalizing Flow(CNF)&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: 위의 vector field를 neural network로 나타낸 모델,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;연속적인 시간 변화에 따른 데이터 변환을 모델링. 즉, 간단한 데이터 분포에서 복잡한 분포로 변환하는 역할&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;Flow Matching (FM)&lt;/h2&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Flow Matching&lt;/b&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;은 적분 없이 Velocity Field를 학습한다. 즉,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;Flow Matching은 CNF를 학습하기 위한 새로운 목적 함수로, vector field를 직접 학습하도록 하는 것&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;A CNF is a very powerful framework, but it is very hard to train on large datasets: training requires an integral, so every step needs many forward passes through an ODE solver. Put simply, the slow sampling procedure of a diffusion model would have to run at every training iteration. Let us see how Flow Matching, our goal, makes training a &lt;b&gt;CNF&lt;/b&gt; &lt;b&gt;simulation-free&lt;/b&gt;, i.e., possible without performing the actual integration.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;356&quot; data-origin-height=&quot;36&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/njEa7/btsQzaY30yF/I79ByBPWTPqGgUWJqzs1O1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/njEa7/btsQzaY30yF/I79ByBPWTPqGgUWJqzs1O1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/njEa7/btsQzaY30yF/I79ByBPWTPqGgUWJqzs1O1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnjEa7%2FbtsQzaY30yF%2FI79ByBPWTPqGgUWJqzs1O1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;356&quot; height=&quot;36&quot; data-origin-width=&quot;356&quot; data-origin-height=&quot;36&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;As with a CNF we learn a &lt;b&gt;vector field&lt;/b&gt;, but we must do so without computing the density directly. &lt;span style=&quot;color: #666666; text-align: start;&quot;&gt;The x1 are samples from the unknown data distribution q(x1), i.e., the dataset we want to train the generative model on. We will define a probability path pt such that p0 is a simple distribution p we know, and p1 approximates q.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;570&quot; data-origin-height=&quot;208&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d4WpJ5/btsQzM4J0zL/S7nW45pNjdmKxu69CwJf8K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d4WpJ5/btsQzM4J0zL/S7nW45pNjdmKxu69CwJf8K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d4WpJ5/btsQzM4J0zL/S7nW45pNjdmKxu69CwJf8K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd4WpJ5%2FbtsQzM4J0zL%2FS7nW45pNjdmKxu69CwJf8K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;556&quot; height=&quot;203&quot; data-origin-width=&quot;570&quot; data-origin-height=&quot;208&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: left;&quot;&gt;We hold data samples, but not the data distribution itself, and we regress so that the model field &lt;b&gt;v&lt;/b&gt; matches the target field &lt;b&gt;u&lt;/b&gt;. When this loss approaches zero, the CNF can generate the &lt;b&gt;probability path p&lt;/b&gt;. But since we know neither &lt;b&gt;p&lt;/b&gt; nor &lt;b&gt;u&lt;/b&gt;, Flow Matching makes the loss usable by constructing the supervision over the probability path: we build &lt;b&gt;p&lt;/b&gt; and &lt;b&gt;u&lt;/b&gt; ourselves. Naturally we cannot invent them for the whole distribution at once, so we design &lt;b&gt;p&lt;/b&gt; and &lt;b&gt;u&lt;/b&gt; per sample; let us see how.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Conditional Flow Matching (CFM)&lt;/b&gt;&lt;/h2&gt;
&lt;h4 style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;1. &lt;b&gt;Defining pt and ut&lt;/b&gt;&lt;/h4&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Conditional probability path, pt(x∣x1)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;At t=0, p0(x∣x1) is the simple initial distribution p(x).&lt;/li&gt;
&lt;li&gt;At t=1, p1(x∣x1) is a normal distribution with mean x1 and a small standard deviation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- &lt;b&gt;Marginal probability path, pt(x)&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;: the result of combining the conditional probability paths&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;297&quot; data-origin-height=&quot;70&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zJL0B/btsQA7fWKNz/zJawgHDHUqoYbXdIFwgk7k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zJL0B/btsQA7fWKNz/zJawgHDHUqoYbXdIFwgk7k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zJL0B/btsQA7fWKNz/zJawgHDHUqoYbXdIFwgk7k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzJL0B%2FbtsQA7fWKNz%2FzJawgHDHUqoYbXdIFwgk7k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;297&quot; height=&quot;70&quot; data-origin-width=&quot;297&quot; data-origin-height=&quot;70&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Marginal vector field, ut(x)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The conditional vector field ut(x∣x1) is the vector field that generates the conditional probability path pt(x∣x1).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;398&quot; data-origin-height=&quot;73&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZFtIv/btsQzQlGBXY/SQwLdUQ0SKUSJalLBeaxJK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZFtIv/btsQzQlGBXY/SQwLdUQ0SKUSJalLBeaxJK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZFtIv/btsQzQlGBXY/SQwLdUQ0SKUSJalLBeaxJK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZFtIv%2FbtsQzQlGBXY%2FSQwLdUQ0SKUSJalLBeaxJK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;398&quot; height=&quot;73&quot; data-origin-width=&quot;398&quot; data-origin-height=&quot;73&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;The marginal vector field is obtained by combining all the conditional vector fields.&lt;/span&gt;&lt;/p&gt;
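In a toy 1-D setting this combination can be written out directly: the marginal field is a weighted average of the conditional fields, with weights pt(x∣x1)q(x1)/pt(x). The Gaussian path, the σ value, and the three data points below are illustrative assumptions; the conditional field formula follows the optimal-transport path used in the Flow Matching paper.

```python
import numpy as np

x1s = np.array([-2.0, 0.5, 3.0])   # toy 1-D dataset: three samples x1
q = np.full(3, 1.0 / 3.0)          # uniform weights q(x1)
SIGMA = 0.1                        # assumed sigma_min

def cond_density(x, x1, t):
    """p_t(x | x1): Gaussian with mean t*x1 and std shrinking toward SIGMA."""
    mu = t * x1
    sd = 1.0 - (1.0 - SIGMA) * t
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def cond_field(x, x1, t):
    """u_t(x | x1) that generates the path above (OT-path formula)."""
    return (x1 - (1.0 - SIGMA) * x) / (1.0 - (1.0 - SIGMA) * t)

def marginal_field(x, t):
    """u_t(x): conditional fields averaged with weights p_t(x|x1)q(x1)/p_t(x)."""
    w = np.array([cond_density(x, x1, t) for x1 in x1s]) * q
    w /= w.sum()  # normalizing by the sum is exactly dividing by p_t(x)
    u = np.array([cond_field(x, x1, t) for x1 in x1s])
    return float(w @ u)
```

At t=0 every conditional density evaluates to the same standard normal, so the weights are equal and marginal_field(0.0, 0.0) is simply the mean of the three conditional fields.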
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;In the formula above, &lt;b&gt;ut(x∣x1)&lt;/b&gt; describes how a point x from the simple distribution should move for a specific sample x1, and the weight &lt;b&gt;pt(x∣x1)q(x1)/pt(x)&lt;/b&gt; determines how much each data sample's &lt;b&gt;conditional vector field (ut(x∣x1))&lt;/b&gt; contributes to the &lt;b&gt;marginal vector field (ut(x))&lt;/b&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #666666; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;u&gt;&lt;b&gt;Theorem 1&lt;/b&gt;: guarantees that if we define the conditional problems well and combine them by marginalization, we obtain a valid vector field that can generate the overall distribution.&lt;/u&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;2.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;Conditional flow matching&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;915&quot; data-origin-height=&quot;366&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Iqj76/btsQzmMlJqu/FteQqzmKM5x2OuJyZ5dKVk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Iqj76/btsQzmMlJqu/FteQqzmKM5x2OuJyZ5dKVk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Iqj76/btsQzmMlJqu/FteQqzmKM5x2OuJyZ5dKVk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FIqj76%2FbtsQzmMlJqu%2FFteQqzmKM5x2OuJyZ5dKVk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;783&quot; height=&quot;313&quot; data-origin-width=&quot;915&quot; data-origin-height=&quot;366&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;The marginal probability path and vector field described in the previous section involve intractable expressions with integrals that are hard to compute directly, so evaluating the Flow Matching objective is impractical. The paper therefore proposes a simpler objective, the &lt;b&gt;Conditional Flow Matching loss&lt;/b&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;t&amp;sim;U[0,1] is a time drawn from the uniform distribution.&lt;/li&gt;
&lt;li&gt;x1&amp;sim;q(x1) is a sample drawn from the data distribution.&lt;/li&gt;
&lt;li&gt;x&amp;sim;pt(x∣x1) is a sample drawn from the conditional probability path.&lt;/li&gt;
&lt;/ul&gt;
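The three sampling steps in the bullets above are all that one CFM training step needs. A minimal sketch of estimating the CFM loss on a minibatch; the linear Gaussian path (with an assumed sigma_min) and the placeholder model are illustrative choices, not fixed by the loss itself:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_MIN = 0.01  # assumed small terminal std

def sample_cfm_batch(data, batch=128):
    """Draw (t, x, u): t ~ U[0,1], x1 ~ q, x ~ p_t(x|x1), u = target field."""
    t = rng.uniform(size=(batch, 1))
    x1 = data[rng.integers(len(data), size=batch)]
    x0 = rng.standard_normal(x1.shape)                 # noise from simple p0
    x = (1.0 - (1.0 - SIGMA_MIN) * t) * x0 + t * x1    # sample of p_t(x|x1)
    u = x1 - (1.0 - SIGMA_MIN) * x0                    # conditional target
    return t, x, u

def cfm_loss(model, data):
    """Monte-Carlo estimate of E ||v_theta(t, x) - u_t(x|x1)||^2."""
    t, x, u = sample_cfm_batch(data)
    v = model(t, x)  # predicted vector field v_theta
    return float(np.mean((v - u) ** 2))

data = rng.standard_normal((1000, 2)) + 3.0  # toy dataset centered at (3, 3)
dummy_model = lambda t, x: np.zeros_like(x)  # placeholder network
loss = cfm_loss(dummy_model, data)
```

In practice v_theta is a neural network and this scalar is minimized by gradient descent; note that no ODE solver appears anywhere in the loop.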
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Theorem 2&lt;/b&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;272&quot; data-origin-height=&quot;27&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/9fllz/btsQAUBlAch/kOPwuyw3d1UdUollQ3CPN1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/9fllz/btsQAUBlAch/kOPwuyw3d1UdUollQ3CPN1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/9fllz/btsQAUBlAch/kOPwuyw3d1UdUollQ3CPN1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F9fllz%2FbtsQAUBlAch%2FkOPwuyw3d1UdUollQ3CPN1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;272&quot; height=&quot;27&quot; data-origin-width=&quot;272&quot; data-origin-height=&quot;27&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Here appears the proof that the two losses are equivalent: we may optimize per sample and take the expectation, and this still optimizes the original objective. The proof is in the paper's Appendix. With this we know how to train a CNF from suitable supervision, with no ODE solver required. What remains is to define appropriate p and u.&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;Let us now actually define the probability path p and make the conditional path (the object that will serve as our &amp;ldquo;label&amp;rdquo;) concrete.&lt;/span&gt;&lt;/p&gt;
&lt;h4 style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;(1) Defining the probability path p&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;The paper uses a Gaussian distribution as the probability path.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;304&quot; data-origin-height=&quot;44&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c4vbIs/btsQzHvBo5o/9gelWQ1tkEKgIjbDn2Unm0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c4vbIs/btsQzHvBo5o/9gelWQ1tkEKgIjbDn2Unm0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c4vbIs/btsQzHvBo5o/9gelWQ1tkEKgIjbDn2Unm0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc4vbIs%2FbtsQzHvBo5o%2F9gelWQ1tkEKgIjbDn2Unm0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;304&quot; height=&quot;44&quot; data-origin-width=&quot;304&quot; data-origin-height=&quot;44&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;The conditional path is assumed to be a normal distribution whose mean &amp;mu;&lt;sub&gt;t&lt;/sub&gt; and standard deviation &amp;sigma;&lt;sub&gt;t&lt;/sub&gt; depend on x&lt;sub&gt;1&lt;/sub&gt;, with the conditions on &amp;mu;&lt;sub&gt;t&lt;/sub&gt; and &amp;sigma;&lt;sub&gt;t&lt;/sub&gt; chosen as below.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;581&quot; data-origin-height=&quot;82&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cuoOxB/btsQym0r69x/hqeFMmHvFpORv0XMUL3eu1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cuoOxB/btsQym0r69x/hqeFMmHvFpORv0XMUL3eu1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cuoOxB/btsQym0r69x/hqeFMmHvFpORv0XMUL3eu1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcuoOxB%2FbtsQym0r69x%2FhqeFMmHvFpORv0XMUL3eu1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;581&quot; height=&quot;82&quot; data-origin-width=&quot;581&quot; data-origin-height=&quot;82&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;Setting the mean to 0 and the standard deviation to 1 at t=0, and the mean to x&lt;sub&gt;1&lt;/sub&gt; and the standard deviation to a sufficiently small value at t=1, we obtain the following flow.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;282&quot; data-origin-height=&quot;128&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b9a4cn/btsQAeNSQkh/VOXfTgUibkFyJ94CHDAHAK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b9a4cn/btsQAeNSQkh/VOXfTgUibkFyJ94CHDAHAK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b9a4cn/btsQAeNSQkh/VOXfTgUibkFyJ94CHDAHAK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb9a4cn%2FbtsQAeNSQkh%2FVOXfTgUibkFyJ94CHDAHAK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;282&quot; height=&quot;128&quot; data-origin-width=&quot;282&quot; data-origin-height=&quot;128&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;277&quot; data-origin-height=&quot;33&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Rodxc/btsQAQZ3InQ/Z47JWOHFawblJbMQnbS4D1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Rodxc/btsQAQZ3InQ/Z47JWOHFawblJbMQnbS4D1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Rodxc/btsQAQZ3InQ/Z47JWOHFawblJbMQnbS4D1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FRodxc%2FbtsQAQZ3InQ%2FZ47JWOHFawblJbMQnbS4D1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;277&quot; height=&quot;33&quot; data-origin-width=&quot;277&quot; data-origin-height=&quot;33&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
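&lt;p data-ke-size=&quot;size16&quot;&gt;As a minimal sketch (my own, not the paper&amp;rsquo;s code), the Gaussian conditional path above can be written out directly; &lt;code&gt;SIGMA_MIN&lt;/code&gt; is an assumed value for the paper&amp;rsquo;s small terminal standard deviation:&lt;/p&gt;

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed value for the small terminal std

def mu_t(t, x1):
    """Conditional mean: 0 at t=0, x1 at t=1."""
    return t * x1

def sigma_t(t):
    """Conditional std: 1 at t=0, SIGMA_MIN at t=1."""
    return 1.0 - (1.0 - SIGMA_MIN) * t

def psi_t(t, x0, x1):
    """Conditional flow: pushes a standard-normal sample x0 toward x1."""
    return sigma_t(t) * x0 + mu_t(t, x1)

x1 = np.array([2.0, -1.0])                        # a "data point"
x0 = np.random.default_rng(0).standard_normal(2)  # a prior sample
print(psi_t(0.0, x0, x1))  # equals x0: the flow starts at the prior sample
print(psi_t(1.0, x0, x1))  # almost x1: the flow ends near the data point
```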
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Differentiating this flow with respect to time t yields the vector field.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;333&quot; data-origin-height=&quot;339&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/R5luY/btsQAZimo5j/6wgolgNnhlgN8AnY6ursD1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/R5luY/btsQAZimo5j/6wgolgNnhlgN8AnY6ursD1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/R5luY/btsQAZimo5j/6wgolgNnhlgN8AnY6ursD1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FR5luY%2FbtsQAZimo5j%2F6wgolgNnhlgN8AnY6ursD1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;333&quot; height=&quot;339&quot; data-origin-width=&quot;333&quot; data-origin-height=&quot;339&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;u&gt;&lt;span style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot;&gt;Theorem 3: the vector field is finally obtained with the formula above.&lt;/span&gt;&lt;/u&gt;&lt;/p&gt;
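&lt;p data-ke-size=&quot;size16&quot;&gt;The closed-form field for this Gaussian path can be sanity-checked numerically against the general expression u&lt;sub&gt;t&lt;/sub&gt;(x|x&lt;sub&gt;1&lt;/sub&gt;) = (&amp;sigma;&amp;prime;&lt;sub&gt;t&lt;/sub&gt;/&amp;sigma;&lt;sub&gt;t&lt;/sub&gt;)(x &amp;minus; &amp;mu;&lt;sub&gt;t&lt;/sub&gt;) + &amp;mu;&amp;prime;&lt;sub&gt;t&lt;/sub&gt;. The sketch below is illustrative only, with an assumed &lt;code&gt;SIGMA_MIN&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed small terminal std

def u_t(t, x, x1):
    """Closed-form conditional vector field for the Gaussian (OT) path:
    u_t(x|x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t)."""
    return (x1 - (1.0 - SIGMA_MIN) * x) / (1.0 - (1.0 - SIGMA_MIN) * t)

def u_t_general(t, x, x1):
    """General form u_t = (sigma_t'/sigma_t)(x - mu_t) + mu_t',
    with mu_t = t * x1 and sigma_t = 1 - (1 - sigma_min) * t."""
    sigma = 1.0 - (1.0 - SIGMA_MIN) * t
    dsigma = -(1.0 - SIGMA_MIN)   # d(sigma_t)/dt
    mu, dmu = t * x1, x1          # mu_t and d(mu_t)/dt
    return (dsigma / sigma) * (x - mu) + dmu

x1 = np.array([2.0, -1.0])
x = np.array([0.5, 0.3])
print(np.allclose(u_t(0.3, x, x1), u_t_general(0.3, x, x1)))  # True
```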
&lt;p style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p id=&quot;related-work&quot; style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RELATED WORK&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;70&quot; data-end=&quot;289&quot;&gt;To summarize: Continuous Normalizing Flows (CNFs) learned the data distribution via&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ODE integration&lt;/b&gt;, but that integration is computationally heavy and slow. Follow-up work added augmentation or regularization, yet this merely regularized the ODE and did not change the training algorithm itself.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;70&quot; data-end=&quot;289&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;291&quot; data-end=&quot;416&quot;&gt;Simulation-free CNF training frameworks were developed to speed up CNF training, but they still carried the burden of integral computation. Flow Matching was proposed to overcome this limitation: it trains CNFs without any simulation at all, making training simple and fast.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;291&quot; data-end=&quot;416&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;557&quot; data-end=&quot;775&quot;&gt;Flow Matching&amp;rsquo;s Conditional Flow Matching (CFM) started from diffusion-based designs but generalized the approach of matching vector fields directly. Flow Matching thus showed for the first time that probability paths can be learned directly without a diffusion process, opening a new direction for CNF training.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;557&quot; data-end=&quot;775&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;557&quot; data-end=&quot;775&quot;&gt;- References&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;557&quot; data-end=&quot;775&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot; data-start=&quot;557&quot; data-end=&quot;775&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/2210.02747&quot;&gt;https://arxiv.org/abs/2210.02747&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275908346&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;Flow Matching for Generative Modeling&quot; data-og-description=&quot;We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs base&quot; data-og-host=&quot;arxiv.org&quot; data-og-source-url=&quot;https://arxiv.org/abs/2210.02747&quot; data-og-url=&quot;https://arxiv.org/abs/2210.02747v2&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/rkFZv/hyZJiGBY29/pYek3xzUcTZwEhfNOPh0ck/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/cryNok/hyZJpFHwrY/ASA3Glq9rdfCm8jQanIfH0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000&quot;&gt;&lt;a href=&quot;https://arxiv.org/abs/2210.02747&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://arxiv.org/abs/2210.02747&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/rkFZv/hyZJiGBY29/pYek3xzUcTZwEhfNOPh0ck/img.png?width=1200&amp;amp;height=700&amp;amp;face=0_0_1200_700,https://scrap.kakaocdn.net/dn/cryNok/hyZJpFHwrY/ASA3Glq9rdfCm8jQanIfH0/img.png?width=1000&amp;amp;height=1000&amp;amp;face=0_0_1000_1000');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Flow Matching for Generative Modeling&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs base&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;arxiv.org&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://youtu.be/YFZbFr3cjpA?si=K4yX3xw-CTp1Y3wv&quot;&gt;https://youtu.be/YFZbFr3cjpA?si=K4yX3xw-CTp1Y3wv&lt;/a&gt;&lt;/p&gt;
&lt;figure data-ke-type=&quot;video&quot; data-ke-style=&quot;alignCenter&quot; data-video-host=&quot;youtube&quot; data-video-url=&quot;https://www.youtube.com/watch?v=YFZbFr3cjpA&quot; data-video-thumbnail=&quot;https://scrap.kakaocdn.net/dn/bv3e0d/hyZJKPKwkK/wpASChN1IJkFoCzDAUftY0/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=0_0_1280_720,https://scrap.kakaocdn.net/dn/dq7KvO/hyZJpZ1JQS/bSEOwcdmzBaiszb0gfnIP0/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=0_0_1280_720&quot; data-video-width=&quot;860&quot; data-video-height=&quot;484&quot; data-video-origin-width=&quot;860&quot; data-video-origin-height=&quot;484&quot; data-ke-mobilestyle=&quot;widthContent&quot; data-video-title=&quot;Flow Matching을 여행하는 히치하이커를 위한 안내서&quot; data-original-url=&quot;&quot;&gt;&lt;iframe src=&quot;https://www.youtube.com/embed/YFZbFr3cjpA&quot; width=&quot;860&quot; height=&quot;484&quot; frameborder=&quot;&quot; allowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;
&lt;figcaption style=&quot;display: none;&quot;&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://seastar105.tistory.com/176&quot;&gt;https://seastar105.tistory.com/176&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275917549&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;Flow Matching 설명&quot; data-og-description=&quot;Introduction 디퓨전 계열이 생성 모델에서 엄청난 성능을 보여주며 주류가 되어 버린지는 한참 되었다. 그러나 여러 번에 걸친 샘플링이 디퓨전 모델의 좋은 성능을 만들어 주는 것처럼 보이지만 &quot; data-og-host=&quot;seastar105.tistory.com&quot; data-og-source-url=&quot;https://seastar105.tistory.com/176&quot; data-og-url=&quot;https://seastar105.tistory.com/176&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/cOY9xV/hyZJfDNEGv/A1ObpMIk0VuKbiczKEm55k/img.png?width=800&amp;amp;height=553&amp;amp;face=0_0_800_553,https://scrap.kakaocdn.net/dn/bVn4AX/hyZI8dDoN1/DrsiejIweHpRRh5xaivft0/img.png?width=800&amp;amp;height=553&amp;amp;face=0_0_800_553,https://scrap.kakaocdn.net/dn/cfXdJp/hyZJv60tJz/JszvcR186YVxjQSJD7kLXK/img.png?width=3416&amp;amp;height=2362&amp;amp;face=0_0_3416_2362&quot;&gt;&lt;a href=&quot;https://seastar105.tistory.com/176&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://seastar105.tistory.com/176&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/cOY9xV/hyZJfDNEGv/A1ObpMIk0VuKbiczKEm55k/img.png?width=800&amp;amp;height=553&amp;amp;face=0_0_800_553,https://scrap.kakaocdn.net/dn/bVn4AX/hyZI8dDoN1/DrsiejIweHpRRh5xaivft0/img.png?width=800&amp;amp;height=553&amp;amp;face=0_0_800_553,https://scrap.kakaocdn.net/dn/cfXdJp/hyZJv60tJz/JszvcR186YVxjQSJD7kLXK/img.png?width=3416&amp;amp;height=2362&amp;amp;face=0_0_3416_2362');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Flow Matching 설명&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Introduction 디퓨전 계열이 생성 모델에서 엄청난 성능을 보여주며 주류가 되어 버린지는 한참 되었다. 그러나 여러 번에 걸친 샘플링이 디퓨전 모델의 좋은 성능을 만들어 주는 것처럼 보이지만&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;seastar105.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Computer Vision1/Paper reviews</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/111</guid>
      <comments>https://c0mputermaster.tistory.com/111#entry111comment</comments>
      <pubDate>Thu, 11 Sep 2025 22:49:51 +0900</pubDate>
    </item>
    <item>
      <title>[ILSVRC 논문 정리해 보기] DenseNet, SENet과 대회 그 이후</title>
      <link>https://c0mputermaster.tistory.com/110</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.09.05 - [분류 전체보기] - [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758277293558&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&quot; data-og-description=&quot;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/bCY1P7/hyZJxX5w8B/CQXXysD4IRu1BZ4gONxu50/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/gus6a/hyZJmI1QDT/RgUZFamo7GbdjegmVRnPb1/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/bd4pVV/hyZI85NfHB/K5ZmumwUWjAS6JuiN6zTLK/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/109&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/bCY1P7/hyZJxX5w8B/CQXXysD4IRu1BZ4gONxu50/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/gus6a/hyZJmI1QDT/RgUZFamo7GbdjegmVRnPb1/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/bd4pVV/hyZI85NfHB/K5ZmumwUWjAS6JuiN6zTLK/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000;&quot; data-ke-size=&quot;size26&quot;&gt;DenseNet&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;867&quot; data-origin-height=&quot;482&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yif1Y/btsQycIh61J/RRaVxGbVgOEAg9FyuM8AK1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yif1Y/btsQycIh61J/RRaVxGbVgOEAg9FyuM8AK1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yif1Y/btsQycIh61J/RRaVxGbVgOEAg9FyuM8AK1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fyif1Y%2FbtsQycIh61J%2FRRaVxGbVgOEAg9FyuM8AK1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;610&quot; height=&quot;339&quot; data-origin-width=&quot;867&quot; data-origin-height=&quot;482&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Where ResNet&amp;rsquo;s residual block computes y = F(x) + x, DenseNet&amp;rsquo;s dense block takes the output F(x) and the input x and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;concatenates them along the channel dimension&lt;/b&gt;: y = [x, F(x)]. The channel count therefore keeps growing, and the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;rate&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;of that growth is controlled by the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Growth Rate (k)&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;203&quot; data-origin-height=&quot;287&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Rz3Wt/btsQyyj3g7L/yrkArCZZMCpsukt45w3hr0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Rz3Wt/btsQyyj3g7L/yrkArCZZMCpsukt45w3hr0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Rz3Wt/btsQyyj3g7L/yrkArCZZMCpsukt45w3hr0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FRz3Wt%2FbtsQyyj3g7L%2FyrkArCZZMCpsukt45w3hr0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;203&quot; height=&quot;287&quot; data-origin-width=&quot;203&quot; data-origin-height=&quot;287&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because concatenation passes existing features&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;straight through&lt;/b&gt;, a small parameter budget yields rich features, but a&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;channel-growth problem&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;arises =&amp;gt; solved by inserting&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;transition layers ( conv + pooling )&lt;/b&gt;&amp;nbsp;in between.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;930&quot; data-origin-height=&quot;456&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bSRNLn/btsQu6JE06g/AaeptsnK7nnUk6Y3GvTVtK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bSRNLn/btsQu6JE06g/AaeptsnK7nnUk6Y3GvTVtK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bSRNLn/btsQu6JE06g/AaeptsnK7nnUk6Y3GvTVtK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbSRNLn%2FbtsQu6JE06g%2FAaeptsnK7nnUk6Y3GvTVtK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;672&quot; height=&quot;329&quot; data-origin-width=&quot;930&quot; data-origin-height=&quot;456&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
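&lt;p data-ke-size=&quot;size16&quot;&gt;The channel bookkeeping above can be shown with a toy NumPy sketch; the random 1x1 &amp;ldquo;convs&amp;rdquo; and all widths are illustrative assumptions, not DenseNet&amp;rsquo;s real BN-ReLU-Conv layers:&lt;/p&gt;

```python
import numpy as np

def dense_layer(x, k, rng):
    """One dense-layer sketch: produce k (growth-rate) new channels with a
    random 1x1 'conv', then concatenate onto the input: y = [x, F(x)]."""
    w = rng.standard_normal((k, x.shape[0]))   # 1x1 conv == per-pixel linear map
    f = np.einsum('kc,chw->khw', w, x)         # F(x): k new channels
    return np.concatenate([x, f], axis=0)

def transition(x, rng):
    """Transition-layer sketch: a 1x1 conv halves the channels and 2x2
    average pooling halves the spatial size, taming the concat growth."""
    w = rng.standard_normal((x.shape[0] // 2, x.shape[0]))
    h = np.einsum('oc,chw->ohw', w, x)
    c, hh, ww = h.shape
    return h.reshape(c, hh // 2, 2, ww // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))  # 16 input channels
for _ in range(4):                   # 4 layers with growth rate k = 12
    x = dense_layer(x, 12, rng)
print(x.shape)                       # (64, 8, 8): 16 + 4*12 channels
x = transition(x, rng)
print(x.shape)                       # (32, 4, 4): channels and size halved
```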
&lt;h4 style=&quot;color: #000000;&quot; data-ke-size=&quot;size20&quot; data-start=&quot;743&quot; data-end=&quot;762&quot;&gt;Bottleneck 구조&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;763&quot; data-end=&quot;991&quot;&gt;
&lt;li data-start=&quot;763&quot; data-end=&quot;991&quot;&gt;Both ResNet and DenseNet use the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;bottleneck&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;idea, but it operates differently in each.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;478&quot; data-origin-height=&quot;390&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xw6ZF/btsQvZDBM5T/S1cbKz2who21qzqWicMnQK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xw6ZF/btsQvZDBM5T/S1cbKz2who21qzqWicMnQK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xw6ZF/btsQvZDBM5T/S1cbKz2who21qzqWicMnQK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fxw6ZF%2FbtsQvZDBM5T%2FS1cbKz2who21qzqWicMnQK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;478&quot; height=&quot;390&quot; data-origin-width=&quot;478&quot; data-origin-height=&quot;390&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Applying a 3&amp;times;3 convolution directly when a CNN has many channels is&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;too expensive, so the computation is routed through a narrow bottleneck&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;385&quot; data-end=&quot;423&quot;&gt;&lt;b&gt;1&amp;times;1 conv&lt;/b&gt;: channel reduction (e.g. 256 &amp;rarr; 64)&lt;/li&gt;
&lt;li data-start=&quot;424&quot; data-end=&quot;476&quot;&gt;&lt;b&gt;3&amp;times;3 conv&lt;/b&gt;: the actual feature extraction (on only 64 channels)&lt;/li&gt;
&lt;li data-start=&quot;477&quot; data-end=&quot;510&quot;&gt;&lt;b&gt;1&amp;times;1 conv&lt;/b&gt;: channel restoration (back to 256)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In DenseNet, however, the channels accumulate anyway, so instead of ResNet&amp;rsquo;s&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;1&amp;times;1 expansion followed by addition, a 3&amp;times;3 conv is applied and its output is concatenated&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here k = growth rate.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
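&lt;p data-ke-size=&quot;size16&quot;&gt;Under the same toy setup, the two bottleneck styles can be contrasted side by side: ResNet reduces, convolves, restores, and adds, while DenseNet-B narrows to 4k, produces k channels, and concatenates. All widths and the random 1x1 stand-ins are illustrative assumptions:&lt;/p&gt;

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 'conv' as a per-pixel linear map over channels."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(1)
x = rng.standard_normal((256, 8, 8))

# ResNet bottleneck: 256 -> 64 -> (3x3 on 64 ch) -> 256, then ADD.
h = conv1x1(x, rng.standard_normal((64, 256)))   # reduce
h = conv1x1(h, rng.standard_normal((64, 64)))    # stand-in for the 3x3 conv
h = conv1x1(h, rng.standard_normal((256, 64)))   # restore
y_res = x + h                                    # residual add: still 256 channels

# DenseNet-B bottleneck: 256 -> 4k -> k, then CONCAT (k = growth rate).
k = 12
h = conv1x1(x, rng.standard_normal((4 * k, 256)))  # narrow to 4k
h = conv1x1(h, rng.standard_normal((k, 4 * k)))    # stand-in for the 3x3 conv
y_dense = np.concatenate([x, h], axis=0)           # channels grow by k

print(y_res.shape[0], y_dense.shape[0])  # 256 268
```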
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;SENet (&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;text-align: start;&quot;&gt;Squeeze-and-Excitation Networks&lt;/span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;)&amp;nbsp;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;907&quot; data-origin-height=&quot;504&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/BmdLw/btsQuquVvI4/7zwi0loM7tmYZRweXhMTgk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BmdLw/btsQuquVvI4/7zwi0loM7tmYZRweXhMTgk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/BmdLw/btsQuquVvI4/7zwi0loM7tmYZRweXhMTgk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBmdLw%2FbtsQuquVvI4%2F7zwi0loM7tmYZRweXhMTgk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;616&quot; height=&quot;342&quot; data-origin-width=&quot;907&quot; data-origin-height=&quot;504&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;Squeeze-and-Excitation Networks, winner of the final ILSVRC&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;Where earlier ILSVRC CNN models proposed whole network architectures, SENet proposes a single plug-in module (the SE block) =&amp;gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555;&quot;&gt;It proposes feature recalibration: attached like a plug-in, the module recalibrates features using global information.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;Let us look at Squeeze and Excitation one at a time.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;h4 style=&quot;color: #000000;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;Squeeze&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;902&quot; data-origin-height=&quot;409&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d5fFD8/btsQuoDV1Wa/QIUhHqVCd8VlkX13YonrE0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d5fFD8/btsQuoDV1Wa/QIUhHqVCd8VlkX13YonrE0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d5fFD8/btsQuoDV1Wa/QIUhHqVCd8VlkX13YonrE0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd5fFD8%2FbtsQuoDV1Wa%2FQIUhHqVCd8VlkX13YonrE0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;675&quot; height=&quot;306&quot; data-origin-width=&quot;902&quot; data-origin-height=&quot;409&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Squeeze =&amp;gt; compute the mean of each channel = global average pooling (GAP)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;(H x W x C) =&amp;gt; global average pooling turns the feature map into a vector =&amp;gt; this yields C values of size 1x1, one descriptor per channel&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;282&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bto3EP/btsQyaX9urB/smH43Ol4geCvc6oeC1U2m1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bto3EP/btsQyaX9urB/smH43Ol4geCvc6oeC1U2m1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bto3EP/btsQyaX9urB/smH43Ol4geCvc6oeC1U2m1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbto3EP%2FbtsQyaX9urB%2FsmH43Ol4geCvc6oeC1U2m1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;282&quot; height=&quot;57&quot; data-origin-width=&quot;282&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
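As a minimal sketch (assuming a channels-last NumPy array; the sizes are illustrative), the squeeze step is just a per-channel mean:

```python
import numpy as np

# Squeeze step: global average pooling over the spatial dimensions.
# Layout assumption: channels-last (H, W, C).
x = np.random.rand(7, 7, 64)   # an H x W x C feature map
z = x.mean(axis=(0, 1))        # one scalar descriptor per channel
print(z.shape)                 # (64,)
```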
&lt;h4 style=&quot;color: #000000;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;Excitation&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;914&quot; data-origin-height=&quot;493&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cfEOdR/btsQxsZvKbt/cPGcfBQTUMBwMRgRO72Bo0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cfEOdR/btsQxsZvKbt/cPGcfBQTUMBwMRgRO72Bo0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cfEOdR/btsQxsZvKbt/cPGcfBQTUMBwMRgRO72Bo0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcfEOdR%2FbtsQxsZvKbt%2FcPGcfBQTUMBwMRgRO72Bo0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;560&quot; height=&quot;302&quot; data-origin-width=&quot;914&quot; data-origin-height=&quot;493&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Recalibration =&amp;gt; the vector passes through FC1 - ReLU - FC2 - Sigmoid, so each output becomes a weight between 0 and 1 (the SE paper uses a sigmoid gate here)&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Finally, these outputs are used as weights and multiplied back onto the original channels; in other words, the block learns the importance of each channel&lt;/p&gt;
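Putting squeeze, excitation, and recalibration together, a minimal NumPy sketch (channels-last; the FC weights here are random stand-ins for learned parameters, and the gate is the sigmoid used in the SE paper):

```python
import numpy as np

def se_block(x, r=16, seed=0):
    """Minimal Squeeze-and-Excitation sketch (channels-last NumPy).
    The FC weights are random stand-ins; a real block learns them."""
    rng = np.random.default_rng(seed)
    h, w, c = x.shape
    z = x.mean(axis=(0, 1))                        # squeeze: (C,) channel descriptors
    w1 = rng.standard_normal((c, c // r)) * 0.1    # FC1: reduce by ratio r
    w2 = rng.standard_normal((c // r, c)) * 0.1    # FC2: restore to C
    s = 1 / (1 + np.exp(-(np.maximum(z @ w1, 0) @ w2)))  # ReLU then sigmoid gate, (C,)
    return x * s                                   # recalibrate: scale each channel

x = np.random.rand(7, 7, 64)
y = se_block(x)
print(y.shape)   # (7, 7, 64) -- same shape, channels reweighted
```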
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;The Squeeze-and-Excitation process =&amp;nbsp;a form of Attention&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;&lt;a href=&quot;https://codingopera.tistory.com/41&quot;&gt;https://codingopera.tistory.com/41&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275514194&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;3. Attention [초등학생도 이해하는 자연어처리]&quot; data-og-description=&quot;안녕하세요&amp;nbsp;'코딩 오페라'블로그를 운영하고 있는 저는&amp;nbsp;'Master.M'입니다. 현재 저는&amp;nbsp;'초등학생도 이해하는&amp;nbsp;자연어 처리'라는 주제로 자연어 처리(NLP)에 대해 포스팅을 하고 있습니다. 제목처럼 &quot; data-og-host=&quot;codingopera.tistory.com&quot; data-og-source-url=&quot;https://codingopera.tistory.com/41&quot; data-og-url=&quot;https://codingopera.tistory.com/41&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/23NeU/hyZJi7F2Bk/KTkRNhxhhNvf7ZSDl4VRuk/img.jpg?width=800&amp;amp;height=450&amp;amp;face=5_93_795_268,https://scrap.kakaocdn.net/dn/bH7syJ/hyZJoNz0P7/a2piXjyiBjYaKMJhbzsYG1/img.jpg?width=800&amp;amp;height=450&amp;amp;face=5_93_795_268,https://scrap.kakaocdn.net/dn/qobA9/hyZJtVC7pJ/mTArMMPdKxAG7ELbcynHK0/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=8_150_1272_434&quot;&gt;&lt;a href=&quot;https://codingopera.tistory.com/41&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://codingopera.tistory.com/41&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/23NeU/hyZJi7F2Bk/KTkRNhxhhNvf7ZSDl4VRuk/img.jpg?width=800&amp;amp;height=450&amp;amp;face=5_93_795_268,https://scrap.kakaocdn.net/dn/bH7syJ/hyZJoNz0P7/a2piXjyiBjYaKMJhbzsYG1/img.jpg?width=800&amp;amp;height=450&amp;amp;face=5_93_795_268,https://scrap.kakaocdn.net/dn/qobA9/hyZJtVC7pJ/mTArMMPdKxAG7ELbcynHK0/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=8_150_1272_434');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;3. Attention [초등학생도 이해하는 자연어처리]&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;안녕하세요&amp;nbsp;'코딩 오페라'블로그를 운영하고 있는 저는&amp;nbsp;'Master.M'입니다. 현재 저는&amp;nbsp;'초등학생도 이해하는&amp;nbsp;자연어 처리'라는 주제로 자연어 처리(NLP)에 대해 포스팅을 하고 있습니다. 제목처럼&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;codingopera.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;892&quot; data-origin-height=&quot;430&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dDnMDU/btsQu7Wg3JU/u79rON64vUyDtW4YOuQNQk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dDnMDU/btsQu7Wg3JU/u79rON64vUyDtW4YOuQNQk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dDnMDU/btsQu7Wg3JU/u79rON64vUyDtW4YOuQNQk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdDnMDU%2FbtsQu7Wg3JU%2Fu79rON64vUyDtW4YOuQNQk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;728&quot; height=&quot;351&quot; data-origin-width=&quot;892&quot; data-origin-height=&quot;430&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The module can be attached to a model like this, but not just anywhere: it reportedly shows the largest gains when attached where the network handles global information.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;color: #555555; text-align: start;&quot;&gt;After ILSVRC&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;667&quot; data-origin-height=&quot;456&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b40zsE/btsQxIVa22A/tPbVJPBBS80SlvbO2TvGv0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b40zsE/btsQxIVa22A/tPbVJPBBS80SlvbO2TvGv0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b40zsE/btsQxIVa22A/tPbVJPBBS80SlvbO2TvGv0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb40zsE%2FbtsQxIVa22A%2FtPbVJPBBS80SlvbO2TvGv0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;530&quot; height=&quot;362&quot; data-origin-width=&quot;667&quot; data-origin-height=&quot;456&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot; data-start=&quot;119&quot; data-end=&quot;141&quot;&gt;&lt;b&gt;From accuracy-focused &amp;rarr; to lightweight-focused&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;145&quot; data-end=&quot;309&quot;&gt;
&lt;li data-start=&quot;145&quot; data-end=&quot;201&quot;&gt;Early on, the goal was higher accuracy (the AlexNet &amp;rarr; VGG &amp;rarr; ResNet lineage).&lt;/li&gt;
&lt;li data-start=&quot;205&quot; data-end=&quot;264&quot;&gt;Later, research on&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;lightweight&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;models took off so they could run in mobile and embedded environments.&lt;/li&gt;
&lt;li data-start=&quot;268&quot; data-end=&quot;309&quot;&gt;Representatives: MobileNet, ShuffleNet, EfficientNet&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ResNeXt&lt;/b&gt;: uses grouped convolution to cut computation while maintaining accuracy&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;441&quot; data-origin-height=&quot;288&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btNUqL/btsQwP1MN2p/5vxtKFIPxK9hbOTPMitqT1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btNUqL/btsQwP1MN2p/5vxtKFIPxK9hbOTPMitqT1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btNUqL/btsQwP1MN2p/5vxtKFIPxK9hbOTPMitqT1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbtNUqL%2FbtsQwP1MN2p%2F5vxtKFIPxK9hbOTPMitqT1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;441&quot; height=&quot;288&quot; data-origin-width=&quot;441&quot; data-origin-height=&quot;288&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Channels are split into groups and filtered per group =&amp;gt; computation decreases&lt;/p&gt;
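As a rough, illustrative check of why grouping cuts the work (the sizes below are assumptions, not from the paper):

```python
# Weight count of a 3x3 convolution, standard vs grouped (illustrative sizes).
c_in, c_out, k, g = 256, 256, 3, 32
standard = c_in * c_out * k * k                    # every filter sees all input channels
grouped = g * (c_in // g) * (c_out // g) * k * k   # each filter sees only C_in/g channels
print(standard // grouped)                         # 32 -> grouped is g times cheaper
```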
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;MobileNet&lt;/b&gt;: builds on the Xception idea&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Xception?&lt;/b&gt;: Depthwise + Pointwise Convolution (a large reduction in computation)&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;432&quot; data-origin-height=&quot;165&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dEgc0N/btsQwiCYasJ/WFlZKgIwQZOR84logypyRk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dEgc0N/btsQwiCYasJ/WFlZKgIwQZOR84logypyRk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dEgc0N/btsQwiCYasJ/WFlZKgIwQZOR84logypyRk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdEgc0N%2FbtsQwiCYasJ%2FWFlZKgIwQZOR84logypyRk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;432&quot; height=&quot;165&quot; data-origin-width=&quot;432&quot; data-origin-height=&quot;165&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A convolution that handles spatial information per channel and merges channel information with a 1x1 convolution, greatly reducing computation&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;102&quot; data-end=&quot;155&quot;&gt;&lt;b&gt;Depthwise&lt;/b&gt;: each channel is processed separately with e.g. a 3&amp;times;3 filter &amp;rarr; learns only spatial information&lt;/li&gt;
&lt;li data-start=&quot;156&quot; data-end=&quot;213&quot;&gt;&lt;b&gt;Pointwise (1&amp;times;1)&lt;/b&gt;: combines information across channels to produce the output channels &amp;rarr; learns channel information&lt;/li&gt;
&lt;/ul&gt;
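The two steps above can be sketched in NumPy (channels-last, no padding or stride, loop-based for clarity; this is illustrative, not how a real framework implements it):

```python
import numpy as np

def depthwise_pointwise(x, dw, pw):
    """Depthwise then pointwise conv sketch (NumPy, channels-last, no padding/stride)."""
    h, w, c = x.shape
    k = dw.shape[0]                    # dw: (K, K, C), one spatial filter per channel
    out_h, out_w = h - k + 1, w - k + 1
    dw_out = np.zeros((out_h, out_w, c))
    for i in range(out_h):             # depthwise: each channel filtered on its own
        for j in range(out_w):
            patch = x[i:i+k, j:j+k, :]                 # (K, K, C) window
            dw_out[i, j] = (patch * dw).sum(axis=(0, 1))
    return dw_out @ pw                 # pointwise: 1x1 conv = per-pixel matmul, C -> C_out

x = np.random.rand(8, 8, 4)
y = depthwise_pointwise(x, np.random.rand(3, 3, 4), np.random.rand(4, 6))
print(y.shape)   # (6, 6, 6)
```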
&lt;div style=&quot;background-color: #fafafa; color: #333333;&quot; data-text-less=&quot;Close&quot; data-text-more=&quot;Show more&quot; data-ke-type=&quot;moreLess&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;Show more&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Standard 3x3 convolution: FLOPs = H &amp;times; W &amp;times; C&lt;sub&gt;in&lt;/sub&gt; &amp;times; C&lt;sub&gt;out&lt;/sub&gt; &amp;times; K &amp;times; K&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Depthwise + Pointwise Convolution: FLOPs = (H &amp;times; W &amp;times; C&lt;sub&gt;in&lt;/sub&gt; &amp;times; K &amp;times; K) + (H &amp;times; W &amp;times; C&lt;sub&gt;in&lt;/sub&gt; &amp;times; C&lt;sub&gt;out&lt;/sub&gt; &amp;times; 1 &amp;times; 1) = H &amp;times; W &amp;times; C&lt;sub&gt;in&lt;/sub&gt; &amp;times; (K&lt;sup&gt;2&lt;/sup&gt; + C&lt;sub&gt;out&lt;/sub&gt;)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
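Plugging illustrative sizes into the formulas above shows the scale of the saving (the sizes are assumptions for the sake of the arithmetic):

```python
# FLOPs of standard vs depthwise-separable convolution (illustrative sizes).
H, W, C_in, C_out, K = 56, 56, 128, 128, 3
standard = H * W * C_in * C_out * K * K
separable = (H * W * C_in * K * K) + (H * W * C_in * C_out)  # depthwise + 1x1 pointwise
print(round(standard / separable, 1))   # 8.4 -> ~8x fewer operations in this setting
```

The ratio simplifies to roughly 1/C_out + 1/K², so the saving grows with the output channel count.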
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;456&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/J16Q5/btsQvP8XyPh/beC3u8MA1gOHxngkNTZ0hk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/J16Q5/btsQvP8XyPh/beC3u8MA1gOHxngkNTZ0hk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/J16Q5/btsQvP8XyPh/beC3u8MA1gOHxngkNTZ0hk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FJ16Q5%2FbtsQvP8XyPh%2FbeC3u8MA1gOHxngkNTZ0hk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;578&quot; height=&quot;281&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;456&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR'; color: #555555; text-align: start;&quot;&gt;&lt;b&gt;ShuffleNet&lt;/b&gt;: basically adopts MobileNet's structure and uses grouped convolution, so each filter considers only a subset of the channels rather than all of them; to still let every channel interact, the channels are shuffled between layers, which yields a gain in computation&lt;/span&gt;&lt;/p&gt;
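A minimal sketch of the channel shuffle ShuffleNet inserts between grouped convolutions (NumPy, channels-last; the reshape-transpose-reshape trick is the usual way to write it):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups (channels-last NumPy sketch)."""
    h, w, c = x.shape
    assert c % groups == 0
    # (H, W, C) -> (H, W, groups, C/groups) -> swap the two channel axes -> flatten back
    return x.reshape(h, w, groups, c // groups).transpose(0, 1, 3, 2).reshape(h, w, c)

x = np.arange(8).reshape(1, 1, 8)   # channels 0..7 in 2 groups of 4
y = channel_shuffle(x, 2)
print(y.ravel())                    # [0 4 1 5 2 6 3 7] -- groups interleaved
```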
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR'; color: #555555; text-align: start;&quot;&gt;NAS (Neural Architecture Search)&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;936&quot; data-origin-height=&quot;515&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/boPo5o/btsQwDGXsey/4sxtZCuzr00O4lfZcXndbk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/boPo5o/btsQwDGXsey/4sxtZCuzr00O4lfZcXndbk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/boPo5o/btsQwDGXsey/4sxtZCuzr00O4lfZcXndbk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FboPo5o%2FbtsQwDGXsey%2F4sxtZCuzr00O4lfZcXndbk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;682&quot; height=&quot;375&quot; data-origin-width=&quot;936&quot; data-origin-height=&quot;515&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;494&quot; data-end=&quot;560&quot;&gt;NAS (Neural Architecture Search): network architectures are searched automatically by AI instead of being designed by humans&lt;/li&gt;
&lt;li data-start=&quot;564&quot; data-end=&quot;593&quot;&gt;Examples: NASNet, AmoebaNet, FBNet&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 style=&quot;color: #000000;&quot; data-ke-size=&quot;size26&quot;&gt;Generalized Network Design&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;457&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vhuhk/btsQxF5fmnF/rSDGKEDyKhXNgH1KiIJIF0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vhuhk/btsQxF5fmnF/rSDGKEDyKhXNgH1KiIJIF0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vhuhk/btsQxF5fmnF/rSDGKEDyKhXNgH1KiIJIF0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fvhuhk%2FbtsQxF5fmnF%2FrSDGKEDyKhXNgH1KiIJIF0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;664&quot; height=&quot;323&quot; data-origin-width=&quot;939&quot; data-origin-height=&quot;457&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;An architecture born from the observation that automation is useful, but performance improves further when humans still hand-tune the design&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;49&quot; data-end=&quot;104&quot;&gt;NAS (Neural Architecture Search) can design CNN architectures automatically&lt;/li&gt;
&lt;li data-start=&quot;105&quot; data-end=&quot;148&quot;&gt;However,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;full automation alone does not always reach optimal performance&lt;/b&gt;&lt;/li&gt;
&lt;li data-start=&quot;149&quot; data-end=&quot;178&quot;&gt;Injecting human intuition at the right places can improve performance&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 style=&quot;color: #000000;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;185&quot; data-end=&quot;207&quot;&gt;&lt;b&gt;RegNet&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;208&quot; data-end=&quot;396&quot;&gt;
&lt;li data-start=&quot;208&quot; data-end=&quot;349&quot;&gt;&lt;b&gt;Defines a design space and searches within a constrained set of choices&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;248&quot; data-end=&quot;349&quot;&gt;
&lt;li data-start=&quot;248&quot; data-end=&quot;266&quot;&gt;Width (W): number of channels&lt;/li&gt;
&lt;li data-start=&quot;269&quot; data-end=&quot;285&quot;&gt;Depth: number of layers&lt;/li&gt;
&lt;li data-start=&quot;288&quot; data-end=&quot;317&quot;&gt;Bottleneck ratio: how much the channel axis is reduced&lt;/li&gt;
&lt;li data-start=&quot;320&quot; data-end=&quot;349&quot;&gt;Group convolution count: number of groups&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;350&quot; data-end=&quot;396&quot;&gt;Instead of letting NAS search everything,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;search within a constrained range&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;rarr; efficient&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;EfficientNet&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;434&quot; data-end=&quot;461&quot;&gt;&lt;b&gt;Human-designed rules + machine search&lt;/b&gt;&lt;/li&gt;
&lt;li data-start=&quot;462&quot; data-end=&quot;539&quot;&gt;Key factors:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;475&quot; data-end=&quot;539&quot;&gt;
&lt;li data-start=&quot;475&quot; data-end=&quot;493&quot;&gt;Width (W): number of channels&lt;/li&gt;
&lt;li data-start=&quot;496&quot; data-end=&quot;512&quot;&gt;Depth: number of layers&lt;/li&gt;
&lt;li data-start=&quot;515&quot; data-end=&quot;539&quot;&gt;Resolution: feature map H&amp;times;W&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;540&quot; data-end=&quot;670&quot;&gt;&lt;b&gt;Compound Scaling&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;578&quot; data-end=&quot;670&quot;&gt;
&lt;li data-start=&quot;578&quot; data-end=&quot;619&quot;&gt;The balance between Width, Depth, and Resolution is set by a formula&lt;/li&gt;
&lt;li data-start=&quot;622&quot; data-end=&quot;670&quot;&gt;e.g., double the Width &amp;rarr; halve the Depth, and adjust the Resolution accordingly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
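As a small worked example of compound scaling (using the constants reported in the EfficientNet paper; treat this as a sketch of the rule, not the full method):

```python
# EfficientNet compound scaling: one coefficient phi scales depth, width, resolution together.
alpha, beta, gamma = 1.2, 1.1, 1.15   # paper's constants, chosen so alpha * beta**2 * gamma**2 ~ 2
phi = 1                                # the single user-chosen scaling knob
depth, width, resolution = alpha ** phi, beta ** phi, gamma ** phi
flops_growth = depth * width ** 2 * resolution ** 2   # FLOPs grow roughly as d * w^2 * r^2
print(round(flops_growth, 2))          # ~1.92 -> each step of phi roughly doubles FLOPs
```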
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 style=&quot;color: #000000;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;After reviewing the ILSVRC papers&lt;/b&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;After surveying the various ILSVRC-era CNN architectures, the point to stress is that&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;what role the CNN plays&lt;/b&gt; matters more than simply knowing the structures.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;249&quot; data-end=&quot;306&quot;&gt;The CNN structure itself (layer count, filter size, Residual/Inception, etc.) is just a tool&lt;/li&gt;
&lt;li data-start=&quot;307&quot; data-end=&quot;353&quot;&gt;Understand a CNN as&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;a single function&lt;/b&gt; that takes an input&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt; and produces an output&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li data-start=&quot;354&quot; data-end=&quot;407&quot;&gt;The key is deciding&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;what role&lt;/b&gt; the CNN plays and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;which task&lt;/b&gt; to apply it to&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Rather than the architecture, what matters is&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;which features to extract&lt;/b&gt; with the CNN, and designing/using the CNN to match the training objective (loss function) and the characteristics of the data.&lt;/p&gt;
&lt;h4 style=&quot;color: #000000;&quot; data-ke-size=&quot;size20&quot; data-start=&quot;558&quot; data-end=&quot;578&quot;&gt;CNN usage examples&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;579&quot; data-end=&quot;738&quot;&gt;
&lt;li data-start=&quot;579&quot; data-end=&quot;625&quot;&gt;&lt;b&gt;Image Classification&lt;/b&gt;: extract useful features, then classify&lt;/li&gt;
&lt;li data-start=&quot;626&quot; data-end=&quot;678&quot;&gt;&lt;b&gt;Object Detection&lt;/b&gt;: extract features that represent the objects of interest well&lt;/li&gt;
&lt;li data-start=&quot;679&quot; data-end=&quot;738&quot;&gt;&lt;b&gt;Generative AI&lt;/b&gt;: map data into a latent space that is easy to understand&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&quot;&gt;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758275528093&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[5편] 합성곱 신경망(CNN) 개요와 동작 매커니즘 및 응용분야&quot; data-og-description=&quot;목 차합성곱 신경망, CNN(Convolution Neural Network) 알고리즘 개요CNN(Convolution Neural Network) 동작 개념도 및 동작 메커니즘 세부 설명CNN(Convolution Neural Network) 응용분야마무리본 편에서는 합성곱 신경망, &quot; data-og-host=&quot;newitlec.com&quot; data-og-source-url=&quot;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&quot; data-og-url=&quot;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/eeRkU6/hyZI2qZpui/kvpMRDpYDa2aFn8UdBoKcK/img.jpg?width=800&amp;amp;height=533&amp;amp;face=0_0_800_533,https://scrap.kakaocdn.net/dn/bPkK6g/hyZJsh7Xxy/sycG1Q6Wek3mlKPbJBxfm1/img.jpg?width=800&amp;amp;height=533&amp;amp;face=0_0_800_533,https://scrap.kakaocdn.net/dn/zBDd4/hyZJfKz8kG/kBcS04tOdQSEqbYBoobi30/img.png?width=844&amp;amp;height=433&amp;amp;face=0_0_844_433&quot;&gt;&lt;a href=&quot;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://newitlec.com/entry/5%ED%8E%B8-%ED%95%A9%EC%84%B1%EA%B3%B1-%EC%8B%A0%EA%B2%BD%EB%A7%9DCNN-%EA%B0%9C%EC%9A%94%EC%99%80-%EB%8F%99%EC%9E%91-%EB%A7%A4%EC%BB%A4%EB%8B%88%EC%A6%98-%EB%B0%8F-%EC%9D%91%EC%9A%A9%EB%B6%84%EC%95%BC&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/eeRkU6/hyZI2qZpui/kvpMRDpYDa2aFn8UdBoKcK/img.jpg?width=800&amp;amp;height=533&amp;amp;face=0_0_800_533,https://scrap.kakaocdn.net/dn/bPkK6g/hyZJsh7Xxy/sycG1Q6Wek3mlKPbJBxfm1/img.jpg?width=800&amp;amp;height=533&amp;amp;face=0_0_800_533,https://scrap.kakaocdn.net/dn/zBDd4/hyZJfKz8kG/kBcS04tOdQSEqbYBoobi30/img.png?width=844&amp;amp;height=433&amp;amp;face=0_0_844_433');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[5편] 합성곱 신경망(CNN) 개요와 동작 매커니즘 및 응용분야&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;목 차합성곱 신경망, CNN(Convolution Neural Network) 알고리즘 개요CNN(Convolution Neural Network) 동작 개념도 및 동작 메커니즘 세부 설명CNN(Convolution Neural Network) 응용분야마무리본 편에서는 합성곱 신경망,&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;newitlec.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;</description>
      <category>Computer Vision1/Paper reviews</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/110</guid>
      <comments>https://c0mputermaster.tistory.com/110#entry110comment</comments>
      <pubDate>Fri, 5 Sep 2025 13:34:47 +0900</pubDate>
    </item>
    <item>
      <title>[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet</title>
      <link>https://c0mputermaster.tistory.com/109</link>
      <description>&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size18&quot;&gt;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/85&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks)&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758277256521&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks)&quot; data-og-description=&quot;ILSVRC(ImageNet Large-Scale Visual Recognition Challenge)이란?- 2010년 ~ 2017년 매년 개최된 국제 컴퓨터 비전 경진대회로 대규모 데이터셋(ImageNet)을 기반으로 이미지 인식 성능을 겨루었던 역사적인 대회- 2012&quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/85&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/85&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/SPxhF/hyZJhhmvul/Yatm3FnEkwCKp3SzGM0x90/img.png?width=800&amp;amp;height=472&amp;amp;face=0_0_800_472,https://scrap.kakaocdn.net/dn/b042aM/hyZJlXAlDn/XqKifi9npdm3qzmgtVbtj1/img.png?width=800&amp;amp;height=472&amp;amp;face=0_0_800_472,https://scrap.kakaocdn.net/dn/qpi9B/hyZJpsaQJz/EmwKyQsIlXnCvxK2RVO8B1/img.png?width=497&amp;amp;height=633&amp;amp;face=0_0_497_633&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/85&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/85&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/SPxhF/hyZJhhmvul/Yatm3FnEkwCKp3SzGM0x90/img.png?width=800&amp;amp;height=472&amp;amp;face=0_0_800_472,https://scrap.kakaocdn.net/dn/b042aM/hyZJlXAlDn/XqKifi9npdm3qzmgtVbtj1/img.png?width=800&amp;amp;height=472&amp;amp;face=0_0_800_472,https://scrap.kakaocdn.net/dn/qpi9B/hyZJpsaQJz/EmwKyQsIlXnCvxK2RVO8B1/img.png?width=497&amp;amp;height=633&amp;amp;face=0_0_497_633');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;ILSVRC(ImageNet Large-Scale Visual Recognition Challenge)이란?- 2010년 ~ 2017년 매년 개최된 국제 컴퓨터 비전 경진대회로 대규모 데이터셋(ImageNet)을 기반으로 이미지 인식 성능을 겨루었던 역사적인 대회- 2012&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;VGGNet&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1036&quot; data-origin-height=&quot;617&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dgLhmB/btsQwOndbYG/b2NDxPDZSzvn937IL4Ddg1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dgLhmB/btsQwOndbYG/b2NDxPDZSzvn937IL4Ddg1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dgLhmB/btsQwOndbYG/b2NDxPDZSzvn937IL4Ddg1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdgLhmB%2FbtsQwOndbYG%2Fb2NDxPDZSzvn937IL4Ddg1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;632&quot; height=&quot;376&quot; data-origin-width=&quot;1036&quot; data-origin-height=&quot;617&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;35&quot; data-end=&quot;85&quot;&gt;&lt;b&gt;제안자&lt;/b&gt;: 옥스포드 대학의 Visual Geometry Group (VGG).&lt;/li&gt;
&lt;li data-start=&quot;86&quot; data-end=&quot;163&quot;&gt;&lt;b&gt;등장&lt;/b&gt;: ILSVRC 2014 이미지넷 대회에서 분류(Classification) 2등, 위치(Localization) 1등.&lt;/li&gt;
&lt;li data-start=&quot;164&quot; data-end=&quot;241&quot;&gt;&lt;b&gt;의의&lt;/b&gt;: 구조가 단순하면서도 강력하여, 이후 컴퓨터 비전 모델의&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;백본(Feature Extractor)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;으로 널리 사용됨.&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;263&quot; data-end=&quot;342&quot;&gt;&lt;b&gt;VGG-16 / VGG-19&lt;/b&gt;: 학습되는 층(Convolution + Fully Connected)을 각각 16개, 19개 사용.&lt;/li&gt;
&lt;li data-start=&quot;343&quot; data-end=&quot;462&quot;&gt;&lt;b&gt;특징&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;355&quot; data-end=&quot;462&quot;&gt;
&lt;li data-start=&quot;355&quot; data-end=&quot;390&quot;&gt;&lt;b&gt;3&amp;times;3 Convolution&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;필터를 일관적으로 사용.&lt;/li&gt;
&lt;li data-start=&quot;393&quot; data-end=&quot;419&quot;&gt;2&amp;times;2 Max Pooling으로 다운샘플링.&lt;/li&gt;
&lt;li data-start=&quot;422&quot; data-end=&quot;462&quot;&gt;Fully Connected Layer 3개 + Softmax 출력.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;463&quot; data-end=&quot;562&quot;&gt;&lt;b&gt;성능&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;475&quot; data-end=&quot;562&quot;&gt;
&lt;li data-start=&quot;475&quot; data-end=&quot;503&quot;&gt;Top-5 Accuracy:&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;92.7%&lt;/b&gt;.&lt;/li&gt;
&lt;li data-start=&quot;506&quot; data-end=&quot;562&quot;&gt;단순하고 규칙적인 구조 &amp;rarr; 재사용성, 전이학습(Transfer Learning)에서 높은 활용도.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
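위에서 말한 &quot;학습되는 층 16개&quot;가 어떻게 세어지는지 간단히 확인해 보면 다음과 같다. 블록별 Conv 개수는 논문의 D 구성(VGG-16) 기준이며, 확인용 스케치다.

```python
# VGG-16의 "학습되는 층" 수 세어 보기 (개념 확인용 스케치).
# 블록별 3x3 Conv 개수는 논문의 D 구성 기준: [2, 2, 3, 3, 3].
conv_per_block = [2, 2, 3, 3, 3]   # 각 블록 뒤에 2x2 Max Pooling (학습 파라미터 없음)
fc_layers = 3                      # FC 4096 -> FC 4096 -> FC 1000 (+ Softmax)

total_learned = sum(conv_per_block) + fc_layers
print(total_learned)  # 16 -> 'VGG-16'이라는 이름의 유래
```

Pooling과 Softmax는 학습 파라미터가 없으므로 층 수에 포함되지 않는다.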
&lt;h4 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;핵심 아이디어&lt;/b&gt;&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;595&quot; data-end=&quot;645&quot;&gt;&lt;b&gt;기존&lt;/b&gt;: AlexNet 등에서는 11&amp;times;11, 5&amp;times;5 등 다양한 크기의 필터 사용.&lt;/li&gt;
&lt;li data-start=&quot;646&quot; data-end=&quot;687&quot;&gt;&lt;b&gt;VGG&lt;/b&gt;: 모든 Conv Layer를&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;3&amp;times;3 필터&lt;/b&gt;로 통일.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1198&quot; data-origin-height=&quot;601&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bi8ebq/btsQur0So61/G9lxfaY2MRMzC1F2womfWk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bi8ebq/btsQur0So61/G9lxfaY2MRMzC1F2womfWk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bi8ebq/btsQur0So61/G9lxfaY2MRMzC1F2womfWk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbi8ebq%2FbtsQur0So61%2FG9lxfaY2MRMzC1F2womfWk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;620&quot; height=&quot;311&quot; data-origin-width=&quot;1198&quot; data-origin-height=&quot;601&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;706&quot; data-end=&quot;849&quot;&gt;&lt;b&gt;비선형성(Non-linearity) 증가&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;741&quot; data-end=&quot;849&quot;&gt;
&lt;li data-start=&quot;741&quot; data-end=&quot;762&quot;&gt;7&amp;times;7 한 번 = ReLU 1번&lt;/li&gt;
&lt;li data-start=&quot;766&quot; data-end=&quot;849&quot;&gt;3&amp;times;3 세 번 = ReLU 3번&lt;br /&gt;&amp;rarr; 같은 수용영역(Receptive Field)을 가지면서도 더 많은 비선형성 확보 &amp;rarr; 표현력이 강해짐.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;851&quot; data-end=&quot;974&quot;&gt;&lt;b&gt;파라미터 수 감소&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;873&quot; data-end=&quot;974&quot;&gt;
&lt;li data-start=&quot;873&quot; data-end=&quot;899&quot;&gt;7&amp;times;7 필터 1개 = 49개의 파라미터.&lt;/li&gt;
&lt;li data-start=&quot;903&quot; data-end=&quot;974&quot;&gt;3&amp;times;3 필터 3개 = 27개의 파라미터.&lt;br /&gt;&amp;rarr; 같은 리셉티브 필드지만&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;더 적은 파라미터&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;rarr; 과적합 위험 감소.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;976&quot; data-end=&quot;1063&quot;&gt;&lt;b&gt;표현력과 성능의 균형&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;1000&quot; data-end=&quot;1063&quot;&gt;
&lt;li data-start=&quot;1000&quot; data-end=&quot;1020&quot;&gt;더 깊은 네트워크 구조 가능.&lt;/li&gt;
&lt;li data-start=&quot;1024&quot; data-end=&quot;1063&quot;&gt;작은 필터를 여러 번 쌓음으로써 복잡한 패턴을 더 효과적으로 학습.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
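위의 파라미터 비교(7&amp;times;7 = 49개 vs 3&amp;times;3 세 번 = 27개)는 채널당 계산인데, 입출력 채널 수 C를 넣어도 비율은 그대로다. 아래는 C=64를 예시로 가정한 확인용 스케치다(bias 제외).

```python
# 같은 수용영역(7x7)을 만드는 두 방식의 파라미터 수 비교 (C: 입출력 채널 수, 예시 가정).
def conv_params(k, c_in, c_out, bias=False):
    """k x k Conv 레이어 하나의 가중치 파라미터 수."""
    return k * k * c_in * c_out + (c_out if bias else 0)

C = 64                              # 예시 채널 수 (가정)
p_7x7 = conv_params(7, C, C)        # 7x7 한 번   = 49 * C * C
p_3x3x3 = 3 * conv_params(3, C, C)  # 3x3 세 번   = 27 * C * C (수용영역 동일)

print(p_7x7, p_3x3x3)
print(p_3x3x3 / p_7x7)  # 27/49 ≈ 0.55 -> 약 45% 파라미터 절감
```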
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;구조가 단순하고 성능이 좋아 오늘날에도 Object Detection, Segmentation 등 다양한 CV 작업에서 백본(Feature Extractor)으로 쓰이고, 이미지 스타일 변환, 초해상화 등에서 손실 계산용 네트워크(Perceptual Loss)로 자주 활용됨.&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;GoogleNet&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1107&quot; data-origin-height=&quot;589&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2u4Kk/btsQt4dQdaq/W6CVIcEH3N1WnVsO9KVkK0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2u4Kk/btsQt4dQdaq/W6CVIcEH3N1WnVsO9KVkK0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2u4Kk/btsQt4dQdaq/W6CVIcEH3N1WnVsO9KVkK0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2u4Kk%2FbtsQt4dQdaq%2FW6CVIcEH3N1WnVsO9KVkK0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;622&quot; height=&quot;331&quot; data-origin-width=&quot;1107&quot; data-origin-height=&quot;589&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;(Inception v1)&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;119&quot; data-end=&quot;128&quot;&gt;배경&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;129&quot; data-end=&quot;273&quot;&gt;
&lt;li data-start=&quot;129&quot; data-end=&quot;175&quot;&gt;&lt;b&gt;발표&lt;/b&gt;: ILSVRC 2014 ImageNet 대회&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;1등&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;모델.&lt;/li&gt;
&lt;li data-start=&quot;176&quot; data-end=&quot;232&quot;&gt;&lt;b&gt;명칭&lt;/b&gt;: GoogLeNet, 혹은 Inception v1. (LeNet을 오마주한 이름)&lt;/li&gt;
&lt;li data-start=&quot;233&quot; data-end=&quot;273&quot;&gt;&lt;b&gt;의의&lt;/b&gt;: &amp;ldquo;더 깊게 쌓되, 효율적으로 설계할 방법&amp;rdquo;을 제안.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;280&quot; data-end=&quot;292&quot;&gt;기본 구조&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;293&quot; data-end=&quot;642&quot;&gt;
&lt;li data-start=&quot;293&quot; data-end=&quot;347&quot;&gt;&lt;b&gt;Stem&lt;/b&gt;: 입력 이미지를 다운샘플링하여 작은 크기의 feature map으로 변환.&lt;/li&gt;
&lt;li data-start=&quot;348&quot; data-end=&quot;499&quot;&gt;&lt;b&gt;Inception Module&lt;/b&gt;: GoogLeNet의 핵심 블록.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;394&quot; data-end=&quot;499&quot;&gt;
&lt;li data-start=&quot;394&quot; data-end=&quot;406&quot;&gt;1&amp;times;1 Conv&lt;/li&gt;
&lt;li data-start=&quot;409&quot; data-end=&quot;421&quot;&gt;3&amp;times;3 Conv&lt;/li&gt;
&lt;li data-start=&quot;424&quot; data-end=&quot;436&quot;&gt;5&amp;times;5 Conv&lt;/li&gt;
&lt;li data-start=&quot;439&quot; data-end=&quot;499&quot;&gt;3&amp;times;3 Max Pooling&lt;br /&gt;&amp;rarr; 모두 병렬 실행 후&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Concatenation&lt;/b&gt;으로 합침.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-start=&quot;500&quot; data-end=&quot;543&quot;&gt;&lt;b&gt;Inception 모듈 개수&lt;/b&gt;: 9개 쌓아서 전체 네트워크 형성.&lt;/li&gt;
&lt;li data-start=&quot;544&quot; data-end=&quot;642&quot;&gt;&lt;b&gt;Auxiliary Classifier (보조 분류기)&lt;/b&gt;: 중간에 2개 삽입, 학습 시에만 사용하여&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;vanishing gradient 방지&lt;/b&gt;. (추론 시에는 제거)&lt;/li&gt;
&lt;/ul&gt;
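병렬 브랜치를 Concatenation으로 합친다는 것은, 공간 크기는 그대로 두고 채널 축으로만 붙인다는 뜻이다. 논문의 inception(3a) 모듈의 브랜치별 출력 채널 수를 예로 확인해 보면:

```python
# Inception 모듈: 병렬 브랜치 출력을 채널 축으로 concat.
# 브랜치별 출력 채널 수는 GoogLeNet 논문의 inception(3a) 값 예시.
branch_channels = {
    "1x1": 64,
    "3x3": 128,       # 앞단 1x1(96채널)로 압축 후 3x3
    "5x5": 32,        # 앞단 1x1(16채널)로 압축 후 5x5
    "pool_proj": 32,  # 3x3 max pool 뒤 1x1 projection
}
out_channels = sum(branch_channels.values())
print(out_channels)  # 256: 공간 크기는 유지되고 채널만 합쳐짐
```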
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1022&quot; data-origin-height=&quot;587&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bJXFyC/btsQu6a6kB8/TjKI8PlTUkRFhtwv1Zipk1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bJXFyC/btsQu6a6kB8/TjKI8PlTUkRFhtwv1Zipk1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bJXFyC/btsQu6a6kB8/TjKI8PlTUkRFhtwv1Zipk1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbJXFyC%2FbtsQu6a6kB8%2FTjKI8PlTUkRFhtwv1Zipk1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;728&quot; height=&quot;418&quot; data-origin-width=&quot;1022&quot; data-origin-height=&quot;587&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;649&quot; data-end=&quot;675&quot;&gt;1&amp;times;1 Convolution의 의미&lt;/h3&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1228&quot; data-origin-height=&quot;581&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MTVFb/btsQufM6ATr/urPzfJaHXlXAoQcaMU3Bl1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MTVFb/btsQufM6ATr/urPzfJaHXlXAoQcaMU3Bl1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MTVFb/btsQufM6ATr/urPzfJaHXlXAoQcaMU3Bl1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMTVFb%2FbtsQufM6ATr%2FurPzfJaHXlXAoQcaMU3Bl1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;649&quot; height=&quot;307&quot; data-origin-width=&quot;1228&quot; data-origin-height=&quot;581&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;973&quot; data-end=&quot;1085&quot;&gt;
&lt;li data-start=&quot;676&quot; data-end=&quot;705&quot;&gt;공간적 정보는 보지 않고, 각 픽셀 위치에서 채널 방향으로만 연산.&lt;/li&gt;
&lt;li data-start=&quot;706&quot; data-end=&quot;748&quot;&gt;&lt;b&gt;채널 축을 압축/확장&lt;/b&gt;하는 역할 &amp;rarr; 파라미터 수와 연산량 감소.&lt;/li&gt;
&lt;li data-start=&quot;749&quot; data-end=&quot;798&quot;&gt;예: 192채널 입력 &amp;rarr; 1&amp;times;1 Conv(64 filters) &amp;rarr; 64채널 출력.&lt;/li&gt;
&lt;li data-start=&quot;799&quot; data-end=&quot;887&quot;&gt;&lt;b&gt;효과&lt;/b&gt;:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot; data-start=&quot;811&quot; data-end=&quot;887&quot;&gt;
&lt;li data-start=&quot;811&quot; data-end=&quot;847&quot;&gt;연산량 줄임 (3&amp;times;3이나 5&amp;times;5 Conv 앞단에 사용).&lt;/li&gt;
&lt;li data-start=&quot;850&quot; data-end=&quot;887&quot;&gt;채널 간 조합을 통해 더 다양한 feature 표현 가능.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
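1&amp;times;1 Conv를 3&amp;times;3, 5&amp;times;5 앞단에 두었을 때 연산량이 얼마나 줄어드는지 숫자로 확인해 보면 다음과 같다. 수치(입력 192채널, 출력 32채널, 28&amp;times;28 feature map, 압축 16채널)는 설명용 가정이다.

```python
# 1x1 Conv로 채널을 압축했을 때 5x5 Conv의 연산량(MAC 수) 변화 계산 스케치.
def conv_macs(h, w, k, c_in, c_out):
    """출력 위치마다 k*k*c_in번 곱셈-누산 -> 전체 MAC 수 (stride 1, same padding 가정)."""
    return h * w * k * k * c_in * c_out

H = W = 28
direct = conv_macs(H, W, 5, 192, 32)  # 192채널에 5x5를 바로 적용
bottleneck = (conv_macs(H, W, 1, 192, 16)    # 1x1로 16채널 압축
              + conv_macs(H, W, 5, 16, 32))  # 압축된 입력에 5x5

print(direct, bottleneck)
print(bottleneck / direct)  # 약 1/10 수준으로 감소
```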
&lt;h4 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;1x1 conv랑 FC랑 효과가 같다?&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Fully Connected Layer (FC)는 입력 벡터의 모든 요소와 출력 노드가&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;전부 연결&lt;/b&gt;됨. 예를 들어, 입력이 크기&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;N&lt;/span&gt;&lt;/span&gt;이고 출력이 크기&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;M&lt;/span&gt;&lt;/span&gt;이라면,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;M&amp;times;N 가중치 행렬&lt;/b&gt;로 곱한 후 bias를 더해서 출력 계산.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;1&amp;times;1 Convolution은? 한 픽셀 위치에서, 입력 채널들을 모두 받아서 선형 결합 후 출력 채널을 만듦. (&lt;span&gt;h&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;w)에서 채널 벡터에 가중치 행렬을 곱함&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;채널 축에서의 선형 변환&lt;/b&gt;이라는 점에서 수학적으로 동일하고 1&amp;times;1 Conv는&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;픽셀 위치별&lt;/b&gt;로 독립적으로 FC를 적용하는 것과 같다.&lt;/p&gt;
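이 동치 관계를 아주 작은 예로 직접 확인해 볼 수 있다. 아래는 2&amp;times;2 feature map(채널 3개)에 3&amp;rarr;2 채널 선형 변환을 적용하는 설명용 스케치로, 입력값과 가중치는 임의로 가정한 것이다.

```python
# 1x1 Conv = "픽셀 위치별로 같은 FC를 독립 적용"임을 작은 예로 확인 (설명용 스케치).
# 입력: 2x2 feature map, 채널 3개 -> x[h][w]는 길이 3짜리 채널 벡터.
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[7.0, 8.0, 9.0], [0.0, 1.0, 2.0]]]
W = [[1.0, 0.0, -1.0],   # 출력 채널 0의 가중치 (3 -> 2 채널 선형 변환, 임의 값)
     [0.5, 0.5, 0.5]]    # 출력 채널 1의 가중치

def fc(vec, W):
    """일반적인 FC: 가중치 행렬 x 입력 벡터 (bias 생략)."""
    return [sum(w_i * v_i for w_i, v_i in zip(row, vec)) for row in W]

# 1x1 Conv = 각 (h, w) 위치의 채널 벡터에 같은 FC를 독립적으로 적용
y = [[fc(x[h][w], W) for w in range(2)] for h in range(2)]
print(y[0][0])  # [-2.0, 3.0]: 위치 (0,0)의 채널 벡터 [1,2,3]을 FC에 통과시킨 값
```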
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-end=&quot;1104&quot; data-start=&quot;1092&quot;&gt;성능/특징&lt;/h3&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;AlexNet: 8개의 레이어, 파라미터 수 60M.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;GoogLeNet: 훨씬 깊은 구조(22층)이지만&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;파라미터 수 5M&lt;/b&gt;.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;연산량 최적화, 성능 우수하지만 구조가 복잡하여 구현/변형 난이도가 높음. 이후&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Inception v2, v3, v4&lt;/b&gt;에서 개선되어 계속 사용됨.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;ResNet&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1015&quot; data-origin-height=&quot;504&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bELRhc/btsQv2Uwsiw/EJkkxSTlxfmjCZUXSZYNDk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bELRhc/btsQv2Uwsiw/EJkkxSTlxfmjCZUXSZYNDk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bELRhc/btsQv2Uwsiw/EJkkxSTlxfmjCZUXSZYNDk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbELRhc%2FbtsQv2Uwsiw%2FEJkkxSTlxfmjCZUXSZYNDk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;757&quot; height=&quot;376&quot; data-origin-width=&quot;1015&quot; data-origin-height=&quot;504&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size18&quot; data-start=&quot;39&quot; data-end=&quot;69&quot;&gt;152개 레이어를 쌓은 ILSVRC 2015 우승 모델로, 이미지 인식 능력(Top-5 오류율 3.57%)이 사람 수준(약 5%)을 뛰어넘음.&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;39&quot; data-end=&quot;69&quot;&gt;기존 Degradation 문제&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;70&quot; data-end=&quot;199&quot;&gt;
&lt;li data-start=&quot;70&quot; data-end=&quot;157&quot;&gt;층을 계속 쌓으면&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;표현력은&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;늘지만, 실제 학습은&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;더 어려워져&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;오히려 훈련/테스트 오류가 커지는 현상(=&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;degradation&lt;/b&gt;).&lt;/li&gt;
&lt;li data-start=&quot;158&quot; data-end=&quot;199&quot;&gt;학습과 최적화 난이도가 높아지고 오버피팅,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;vanishing gradient&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;등의 문제가 생김&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;816&quot; data-origin-height=&quot;387&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/L2cRI/btsQwRSvdgU/Oq02rFGG62w1PlGjInBhhk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/L2cRI/btsQwRSvdgU/Oq02rFGG62w1PlGjInBhhk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/L2cRI/btsQwRSvdgU/Oq02rFGG62w1PlGjInBhhk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FL2cRI%2FbtsQwRSvdgU%2FOq02rFGG62w1PlGjInBhhk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;744&quot; height=&quot;353&quot; data-origin-width=&quot;816&quot; data-origin-height=&quot;387&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;CIFAR-10에서 56레이어와 20레이어 신경망을 비교한 결과, 레이어가 많은 56레이어 쪽이 Test error뿐 아니라&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Train error까지 더 높았음&lt;/b&gt;. 오버피팅이라면 Train error는 오히려 낮아야 하므로, 이는 깊어질수록 최적화 자체가 어려워지는 degradation 문제 = 층이 무조건 많다고 좋은 신경망이 아니었다.&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-end=&quot;242&quot; data-start=&quot;201&quot; data-ke-size=&quot;size23&quot;&gt;잔차 학습(Residual Learning, Block)&lt;/h3&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;738&quot; data-origin-height=&quot;455&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ci5VBB/btsQwDGGl3b/k8QwAVFIWXQhIvawXdpy4k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ci5VBB/btsQwDGGl3b/k8QwAVFIWXQhIvawXdpy4k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ci5VBB/btsQwDGGl3b/k8QwAVFIWXQhIvawXdpy4k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fci5VBB%2FbtsQwDGGl3b%2Fk8QwAVFIWXQhIvawXdpy4k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;738&quot; height=&quot;455&quot; data-origin-width=&quot;738&quot; data-origin-height=&quot;455&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;243&quot; data-end=&quot;423&quot;&gt;
&lt;li data-start=&quot;243&quot; data-end=&quot;330&quot;&gt;블록 출력&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;H(x)&lt;/span&gt;&lt;/span&gt;를 직접 학습하지 않고&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;차이&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;F(x) = H(x) &amp;minus; x&lt;/span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;를 학습 = 출력과 입력의 차이를 학습.&lt;/li&gt;
&lt;li data-start=&quot;331&quot; data-end=&quot;423&quot;&gt;입력&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;를&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;지름길(Shortcut)&lt;/b&gt;로 더해&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;아이덴티티 경로&lt;/b&gt;를 유지 &amp;rarr; 정보/그라디언트가 막힘없이 흐름 &amp;rarr; 깊게 쌓아도 학습 용이.&lt;/li&gt;
&lt;/ul&gt;
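이 아이디어의 핵심은, F가 아무것도 배우지 못해도(가중치가 0에 가까워도) 블록이 최소한 항등 함수는 보장한다는 점이다. 아래는 이를 보여 주는 설명용 스케치다(F, 입력값은 임의 가정).

```python
# 잔차 블록의 핵심: H(x) = F(x) + x. F가 0이어도 아이덴티티 경로가 보존됨 (스케치).
def residual_block(x, F):
    """F: 블록 내부 변환(예: Conv-BN-ReLU 스택의 출력), x: shortcut으로 더해지는 입력."""
    return [f_i + x_i for f_i, x_i in zip(F(x), x)]

# F가 아무것도 배우지 못해 0을 내놓아도(가중치 ~ 0), 블록 출력은 그대로 x.
zero_F = lambda x: [0.0] * len(x)
x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_F))  # [1.0, -2.0, 3.0] -> 깊게 쌓아도 최소한 항등은 보장
```

역전파에서도 덧셈 경로 덕분에 그라디언트가 shortcut을 타고 그대로 흘러, 깊은 망에서도 학습이 용이해진다.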
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;425&quot; data-end=&quot;442&quot;&gt;레지듀얼 블록 형태&lt;/h3&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;761&quot; data-origin-height=&quot;423&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bKC55b/btsQwNQa35r/vr7xlUY933LVVKSQFdxTQK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bKC55b/btsQwNQa35r/vr7xlUY933LVVKSQFdxTQK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bKC55b/btsQwNQa35r/vr7xlUY933LVVKSQFdxTQK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbKC55b%2FbtsQwNQa35r%2Fvr7xlUY933LVVKSQFdxTQK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;761&quot; height=&quot;423&quot; data-origin-width=&quot;761&quot; data-origin-height=&quot;423&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;CBR(Conv&amp;rarr;BN&amp;rarr;ReLU)? BRC?&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ResNet의 후속 논문(pre-activation ResNet)에서는 BN&amp;rarr;ReLU&amp;rarr;Conv 순서로 바뀜&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;443&quot; data-end=&quot;675&quot;&gt;
&lt;li data-start=&quot;443&quot; data-end=&quot;533&quot;&gt;&lt;b&gt;Basic Block&lt;/b&gt;&amp;nbsp;3&amp;times;3 Conv &amp;rarr; BN &amp;rarr; ReLU &amp;rarr; 3&amp;times;3 Conv &amp;rarr; BN +&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;skip add&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;rarr; ReLU&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;677&quot; data-end=&quot;707&quot;&gt;Batch Normalization?&lt;/h3&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size14&quot; data-start=&quot;677&quot; data-end=&quot;707&quot;&gt;&lt;span style=&quot;color: #333333; font-size: 16px; letter-spacing: 0px;&quot;&gt;정규화를 하는 이유?&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt;&amp;rarr;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;빠른 수렴&lt;/b&gt;&lt;span style=&quot;color: #333333;&quot;&gt;과&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;규제(Regularization) 효과&lt;/b&gt;. (오버피팅과 기울기 소실 문제 완화)&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #333333; font-size: 16px; letter-spacing: 0px;&quot;&gt;&lt;a href=&quot;https://velog.io/@cbkyeong/ML%EC%A0%95%EA%B7%9C%ED%99%94normalization%EC%99%80-%ED%91%9C%EC%A4%80%ED%99%94standardization%EB%8A%94-%EC%99%9C-%ED%95%98%EB%8A%94%EA%B1%B8%EA%B9%8C&quot;&gt;https://velog.io/@cbkyeong/ML%EC%A0%95%EA%B7%9C%ED%99%94normalization%EC%99%80-%ED%91%9C%EC%A4%80%ED%99%94standardization%EB%8A%94-%EC%99%9C-%ED%95%98%EB%8A%94%EA%B1%B8%EA%B9%8C&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758276122418&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ML]정규화(normalization)와 표준화(standardization)는 왜 하는걸까?&quot; data-og-description=&quot;exploration을 진행하다보니 **인공지능 모델을 훈련시키고 사용할 때, 일반적으로 입력은 0 ~ 1 사이의 값으로 정규화 시켜주는 것이 좋습니다.** 라는 말을 봤는데, 그 말에대한 설명이 없어 개인적&quot; data-og-host=&quot;velog.io&quot; data-og-source-url=&quot;https://velog.io/@cbkyeong/ML%EC%A0%95%EA%B7%9C%ED%99%94normalization%EC%99%80-%ED%91%9C%EC%A4%80%ED%99%94standardization%EB%8A%94-%EC%99%9C-%ED%95%98%EB%8A%94%EA%B1%B8%EA%B9%8C&quot; data-og-url=&quot;https://velog.io/@cbkyeong/ML정규화normalization와-표준화standardization는-왜-하는걸까&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/iRBDz/hyZJl4nrQH/MxF3BgbIFuKAwq8R4VYTwk/img.png?width=738&amp;amp;height=618&amp;amp;face=0_0_738_618,https://scrap.kakaocdn.net/dn/fctCJ/hyZJoz12ub/8llXoUkk60DXOMxvaHOXK0/img.png?width=738&amp;amp;height=618&amp;amp;face=0_0_738_618,https://scrap.kakaocdn.net/dn/dALO2F/hyZJl4nrRZ/gYFqubDc6IdVut45Iv5XT0/img.jpg?width=960&amp;amp;height=1280&amp;amp;face=464_410_556_510&quot;&gt;&lt;a href=&quot;https://velog.io/@cbkyeong/ML%EC%A0%95%EA%B7%9C%ED%99%94normalization%EC%99%80-%ED%91%9C%EC%A4%80%ED%99%94standardization%EB%8A%94-%EC%99%9C-%ED%95%98%EB%8A%94%EA%B1%B8%EA%B9%8C&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://velog.io/@cbkyeong/ML%EC%A0%95%EA%B7%9C%ED%99%94normalization%EC%99%80-%ED%91%9C%EC%A4%80%ED%99%94standardization%EB%8A%94-%EC%99%9C-%ED%95%98%EB%8A%94%EA%B1%B8%EA%B9%8C&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/iRBDz/hyZJl4nrQH/MxF3BgbIFuKAwq8R4VYTwk/img.png?width=738&amp;amp;height=618&amp;amp;face=0_0_738_618,https://scrap.kakaocdn.net/dn/fctCJ/hyZJoz12ub/8llXoUkk60DXOMxvaHOXK0/img.png?width=738&amp;amp;height=618&amp;amp;face=0_0_738_618,https://scrap.kakaocdn.net/dn/dALO2F/hyZJl4nrRZ/gYFqubDc6IdVut45Iv5XT0/img.jpg?width=960&amp;amp;height=1280&amp;amp;face=464_410_556_510');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ML]정규화(normalization)와 표준화(standardization)는 왜 하는걸까?&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;exploration을 진행하다보니 **인공지능 모델을 훈련시키고 사용할 때, 일반적으로 입력은 0 ~ 1 사이의 값으로 정규화 시켜주는 것이 좋습니다.** 라는 말을 봤는데, 그 말에대한 설명이 없어 개인적&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;velog.io&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;903&quot; data-origin-height=&quot;433&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mSqkt/btsQuV2PE9b/8KfbvKv2RgknLLW9r7uVl0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mSqkt/btsQuV2PE9b/8KfbvKv2RgknLLW9r7uVl0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mSqkt/btsQuV2PE9b/8KfbvKv2RgknLLW9r7uVl0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmSqkt%2FbtsQuV2PE9b%2F8KfbvKv2RgknLLW9r7uVl0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;666&quot; height=&quot;319&quot; data-origin-width=&quot;903&quot; data-origin-height=&quot;433&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot; data-start=&quot;677&quot; data-end=&quot;707&quot;&gt;&lt;span style=&quot;color: #333333; font-size: 16px; letter-spacing: 0px;&quot;&gt;모든 Conv 뒤에&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;BN&lt;/b&gt;&lt;span style=&quot;color: #333333; font-size: 16px; letter-spacing: 0px;&quot;&gt;을 넣어 내부 공변량 변화(Internal Covariate Shift) 완화 &amp;rarr;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;빠른 수렴&lt;/b&gt;&lt;span style=&quot;color: #333333; font-size: 16px; letter-spacing: 0px;&quot;&gt;과&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;b&gt;규제 효과&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;708&quot; data-end=&quot;897&quot;&gt;
&lt;li data-start=&quot;798&quot; data-end=&quot;851&quot;&gt;In conv paths, BN itself has a&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;regularizing effect, so Dropout is usually unnecessary&lt;/b&gt;&lt;/li&gt;
&lt;li data-start=&quot;852&quot; data-end=&quot;897&quot;&gt;&lt;b&gt;학습 시&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;배치 통계 사용,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;추론 시&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;이동평균(러닝 스탯) 사용.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;933&quot; data-origin-height=&quot;452&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ouhrf/btsQvAqADhr/UEKHeiOThkdMl8KlrBtMRK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ouhrf/btsQvAqADhr/UEKHeiOThkdMl8KlrBtMRK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ouhrf/btsQvAqADhr/UEKHeiOThkdMl8KlrBtMRK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fouhrf%2FbtsQvAqADhr%2FUEKHeiOThkdMl8KlrBtMRK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;740&quot; height=&quot;358&quot; data-origin-width=&quot;933&quot; data-origin-height=&quot;452&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;During&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;training&lt;/b&gt;, Batch Normalization&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;normalizes&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;each mini-batch using the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;mean&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;variance&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;computed per channel over that batch.&lt;/li&gt;
&lt;li&gt;Batch Normalization is not a separate preprocessing step: it is a layer inside the network that adjusts activations with these statistics during training.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;span-stylebackground-colore6e6fa-batch-normalization에서-중요한-것은-학습-단계와-추론-단계에서-다르게-적용되어야-한다-span&quot; style=&quot;background-color: #ffffff; color: #212529; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;background-color: #ffffff;&quot;&gt;Like Dropout, the key point about Batch Normalization is that it&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;must be applied differently in the training phase and the inference phase. Why?&lt;/b&gt;&lt;/span&gt;&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff;&quot;&gt;&lt;b&gt;Because training data arrives in batches:&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;433&quot; data-origin-height=&quot;248&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bL46Gt/btsQvZXSlQH/StFr2qU6aoSF7Uqc4wf3xK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bL46Gt/btsQvZXSlQH/StFr2qU6aoSF7Uqc4wf3xK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bL46Gt/btsQvZXSlQH/StFr2qU6aoSF7Uqc4wf3xK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbL46Gt%2FbtsQvZXSlQH%2FStFr2qU6aoSF7Uqc4wf3xK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;433&quot; height=&quot;248&quot; data-origin-width=&quot;433&quot; data-origin-height=&quot;248&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Here the mean and variance can be computed from x, but&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;u&gt;at test time there is no batch to compute them from&lt;/u&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;That is why the running mean and variance accumulated during training are stored and reused at inference; training and inference therefore behave differently.&lt;/p&gt;
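&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch (my own toy example, not from the original post) showing that BatchNorm updates its running statistics in train mode and then reuses them in eval mode:&lt;/p&gt;

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)                 # running_mean starts at 0, running_var at 1

x = torch.randn(8, 3, 4, 4) * 5 + 2   # a batch whose channels have mean ~2

bn.train()
_ = bn(x)                              # train mode: normalize with batch stats, update running stats
running_mean_after_train = bn.running_mean.clone()

bn.eval()
y = bn(x)                              # eval mode: normalize with the stored running stats
```

&lt;p data-ke-size=&quot;size16&quot;&gt;After one training step the running mean has already moved from 0 toward the batch mean, and that stored value is exactly what gets reused at inference.&lt;/p&gt;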
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;정리하자면 &amp;ldquo;어떤 CNN을 쓸지 모르겠다면&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ResNet부터&lt;/b&gt;&amp;rdquo;라는 말이 나올 정도로&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;범용 백본이다&lt;/b&gt;.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot; data-start=&quot;913&quot; data-end=&quot;1041&quot;&gt;
&lt;li data-start=&quot;913&quot; data-end=&quot;964&quot;&gt;대표 구성&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ResNet-152&lt;/b&gt;(152층).&lt;/li&gt;
&lt;li data-start=&quot;965&quot; data-end=&quot;1041&quot;&gt;As depth grows (34 &amp;rarr; 50 &amp;rarr; 101 &amp;rarr; 152), the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;error keeps dropping&lt;/b&gt;, yet&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;computational efficiency&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;remains better than VGG&lt;/li&gt;
&lt;li data-start=&quot;1118&quot; data-end=&quot;1216&quot;&gt;분류뿐 아니라&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Detection/Segmentation/재식별/Metric Learning/Perceptual Loss&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;등&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;거의 모든 CV 작업의 기본 뼈대&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;다른 논문들은 이어서 다음 포스팅에서 알아보겠다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a style=&quot;background-color: #e6f5ff; color: #0070d1; text-align: start;&quot; href=&quot;https://c0mputermaster.tistory.com/110&quot;&gt;2025.09.05 - [분류 전체보기] - [ILSVRC 논문 정리해 보기] DenseNet, SENet과 대회 그 이후&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758276710414&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ILSVRC 논문 정리해 보기] DenseNet, SENet과 대회 그 이후&quot; data-og-description=&quot;2025.06.14 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNetVGGNet 제안자: 옥스포드 대학의 Visual Geometry Group (VGG).등장: ILSVRC 2014 &quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/110&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/110&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/cnBhdh/hyZJpsaJed/8LOoiTp5RM7euqT7cJe2Gk/img.png?width=800&amp;amp;height=444&amp;amp;face=0_0_800_444,https://scrap.kakaocdn.net/dn/b5H3NB/hyZJb9e1kV/z8CbUSQYgyytBRizjyMtk0/img.png?width=800&amp;amp;height=444&amp;amp;face=0_0_800_444,https://scrap.kakaocdn.net/dn/49VNX/hyZJJXBWE0/qI8XpirAh2a33qgDHU5Wl1/img.png?width=930&amp;amp;height=456&amp;amp;face=0_0_930_456&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/110&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/110&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/cnBhdh/hyZJpsaJed/8LOoiTp5RM7euqT7cJe2Gk/img.png?width=800&amp;amp;height=444&amp;amp;face=0_0_800_444,https://scrap.kakaocdn.net/dn/b5H3NB/hyZJb9e1kV/z8CbUSQYgyytBRizjyMtk0/img.png?width=800&amp;amp;height=444&amp;amp;face=0_0_800_444,https://scrap.kakaocdn.net/dn/49VNX/hyZJJXBWE0/qI8XpirAh2a33qgDHU5Wl1/img.png?width=930&amp;amp;height=456&amp;amp;face=0_0_930_456');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ILSVRC 논문 정리해 보기] DenseNet, SENet과 대회 그 이후&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;2025.06.14 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNetVGGNet 제안자: 옥스포드 대학의 Visual Geometry Group (VGG).등장: ILSVRC 2014&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Computer Vision1/Paper reviews</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/109</guid>
      <comments>https://c0mputermaster.tistory.com/109#entry109comment</comments>
      <pubDate>Fri, 5 Sep 2025 13:34:36 +0900</pubDate>
    </item>
    <item>
      <title>[Project] Classification Model 구현해보기 (ResNet)</title>
      <link>https://c0mputermaster.tistory.com/108</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Classification 모델을 구현해보자 먼저 Train과 Test 구조를 정의&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Import&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1758384907011&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim

import os
import matplotlib.pyplot as plt&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; torch.nn as nn&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;469&quot; data-start=&quot;443&quot;&gt;신경망 레이어와 모델 구축을 위한 모듈.&lt;/li&gt;
&lt;li data-end=&quot;532&quot; data-start=&quot;470&quot;&gt;예: nn.Conv2d, nn.Linear, nn.BatchNorm2d 등을 사용해 레이어 정의.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; torch.nn.functional as F&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;616&quot; data-start=&quot;568&quot;&gt;활성화 함수나 pooling 같은 연산을 함수형(functional)으로 제공.&lt;/li&gt;
&lt;li data-end=&quot;656&quot; data-start=&quot;617&quot;&gt;예: F.relu(x), F.max_pool2d(x, 2).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; torch.optim as optim&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- SGD, Adam, RMSprop 등 다양한 Optimizer 제공.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;torchvision&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- CIFAR-10, ImageNet 등 유명 데이터셋과 미리 정의된 모델들(ResNet, VGG 등) 제공.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1758385057654&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def accuracy(output, target, topk=(1, )):
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        acc = []
        num_cor = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)  # reshape also handles non-contiguous slices
            num_cor.append(correct_k.clone())
            acc.append(correct_k.mul_(1 / batch_size))
    return acc, num_cor&lt;/code&gt;&lt;/pre&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;1857&quot; data-start=&quot;1820&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;top-k 분류 정확도를 계산&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1974&quot; data-start=&quot;1858&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1891&quot; data-start=&quot;1858&quot;&gt;topk=(1,) &amp;rarr; 일반적인 top-1 정확도.&lt;/li&gt;
&lt;li data-end=&quot;1974&quot; data-start=&quot;1892&quot;&gt;topk=(1,5) &amp;rarr; top-1 정확도와 top-5 정확도를 동시에 계산 가능.&lt;/li&gt;
&lt;/ul&gt;
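&lt;p data-ke-size=&quot;size16&quot;&gt;To see what the topk bookkeeping inside accuracy() computes, here is a hand-checkable toy example (values chosen by me for illustration):&lt;/p&gt;

```python
import torch

# 3 samples over 5 classes; sample 1's true class is only its 2nd-highest logit
logits = torch.tensor([[0.10, 0.90, 0.00, 0.00, 0.00],
                       [0.80, 0.00, 0.15, 0.00, 0.00],
                       [0.00, 0.20, 0.70, 0.00, 0.10]])
targets = torch.tensor([1, 2, 2])

_, pred = logits.topk(2, 1, True, True)           # top-2 class indices per sample
pred = pred.t()                                    # shape (k, batch), as in accuracy()
correct = pred.eq(targets.view(1, -1).expand_as(pred))

top1 = correct[:1].reshape(-1).float().sum().item() / 3   # only samples 0 and 2 are right
top2 = correct[:2].reshape(-1).float().sum().item() / 3   # sample 1 recovered at k=2
```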
&lt;pre id=&quot;code_1758385343721&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def initialize_weights(module):
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight.data, mode='fan_out')
    elif isinstance(module, nn.BatchNorm2d):
        module.weight.data.fill_(1)
        module.bias.data.zero_()
    elif isinstance(module, nn.Linear):
        module.bias.data.zero_()&lt;/code&gt;&lt;/pre&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;If the layer is a &lt;b&gt;convolution (Conv2d)&lt;/b&gt;, apply He (Kaiming) initialization [weights are drawn from a distribution whose variance is inversely proportional to the layer's fan count; with fan_in that is the number of input units, with fan_out the number of output units]&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://resultofeffort.tistory.com/114&quot;&gt;https://resultofeffort.tistory.com/114&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758385482856&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[Deep learning] 가중치 초기화(weight initialization) (feat. Xavier, He,normal, uniform)&quot; data-og-description=&quot;0. 딥러닝 모델 학습 / 모델 훈련 프로세스1. 모델 초기화(Initialization):&amp;nbsp;최초 가중치(weight) 값을 설정합니다.2. 예측(Prediction):&amp;nbsp;설정된 가중치와 입력 feature(X) 값을 사용하여 예측 값을 계산합니다.&quot; data-og-host=&quot;resultofeffort.tistory.com&quot; data-og-source-url=&quot;https://resultofeffort.tistory.com/114&quot; data-og-url=&quot;https://resultofeffort.tistory.com/114&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/UnjuV/hyZI7eSNF5/JR8nffR6dg8D9kwRQgdce0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/bqu6aM/hyZI1TgWs8/A3urbAh3L6UF4hfw4FgBu0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://resultofeffort.tistory.com/114&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://resultofeffort.tistory.com/114&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/UnjuV/hyZI7eSNF5/JR8nffR6dg8D9kwRQgdce0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/bqu6aM/hyZI1TgWs8/A3urbAh3L6UF4hfw4FgBu0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[Deep learning] 가중치 초기화(weight initialization) (feat. Xavier, He,normal, uniform)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;0. 딥러닝 모델 학습 / 모델 훈련 프로세스1. 모델 초기화(Initialization):&amp;nbsp;최초 가중치(weight) 값을 설정합니다.2. 예측(Prediction):&amp;nbsp;설정된 가중치와 입력 feature(X) 값을 사용하여 예측 값을 계산합니다.&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;resultofeffort.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;fan_out 모드&lt;/b&gt;: 출력 채널 기준으로 분산을 맞춤.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;만약 레이어가 &lt;b&gt;배치 정규화(BatchNorm2D)&lt;/b&gt; 라면 배치 정규화 레이어의 scale(&amp;gamma;)은 1로, shift(&amp;beta;)는 0으로 초기화. = 정규화된 출력 그대로 weight=1, bias=0.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;만약 레이어가 &lt;b&gt;완전연결층(FC, Linear)&lt;/b&gt; 라면 bias 항을 0으로 초기화.&lt;/p&gt;
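&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In practice this function is applied with model.apply(), which visits every submodule recursively; a small sketch (toy network of my own, same init scheme as above):&lt;/p&gt;

```python
import torch.nn as nn

def initialize_weights(module):
    # same scheme as in the post
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight.data, mode='fan_out')
    elif isinstance(module, nn.BatchNorm2d):
        module.weight.data.fill_(1)
        module.bias.data.zero_()
    elif isinstance(module, nn.Linear):
        module.bias.data.zero_()

net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),
)
net[1].weight.data.fill_(0.5)   # perturb gamma to show apply() resets it
net.apply(initialize_weights)   # one call initializes every layer
```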
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;정규화된 이미지를 다시 원래 값으로 되돌리는 함수&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1758385616404&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def inverse_normalize(tensor, mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)):
    for t, m, s in zip(tensor, mean, std):
        t.mul_(s).add_(m)
    return tensor&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;345&quot; data-start=&quot;306&quot;&gt;tensor: 정규화된 이미지 텐서 (C, H, W 형태).&lt;/li&gt;
&lt;li data-end=&quot;389&quot; data-start=&quot;346&quot;&gt;mean: 정규화 시 사용한 평균값 (CIFAR-10 채널 평균).&lt;/li&gt;
&lt;li data-end=&quot;433&quot; data-start=&quot;390&quot;&gt;std: 정규화 시 사용한 표준편차 (CIFAR-10 채널 표준편차).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;일반적으로 학습 전에 이미지를 Normalize(mean, std)로 변환함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;이렇게 되면 픽셀 값이 -1~1 근처의 값으로 바뀌어서 &lt;b&gt;시각화할 때 원본 이미지와 다르게 보임 &lt;/b&gt;따라서 다시 원래 값으로 돌려줄 필요가 있기 때문에 정의&lt;/p&gt;
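&lt;p data-ke-size=&quot;size16&quot;&gt;A quick round-trip check (my own sketch, using the CIFAR-10 statistics from above): normalizing and then inverse-normalizing should reproduce the original pixel values up to floating-point error:&lt;/p&gt;

```python
import torch

mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)

def inverse_normalize(tensor, mean=mean, std=std):
    # undo transforms.Normalize channel by channel: x * std + mean (in place)
    for t, m, s in zip(tensor, mean, std):
        t.mul_(s).add_(m)
    return tensor

img = torch.rand(3, 32, 32)              # fake image with values in [0, 1]
norm = img.clone()
for t, m, s in zip(norm, mean, std):     # what Normalize does: (x - mean) / std
    t.sub_(m).div_(s)

restored = inverse_normalize(norm)
```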
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Train&lt;/p&gt;
&lt;pre id=&quot;code_1758385801917&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def train(epochs):
    best_acc = 0.0
    print('[*] start training')
    for epoch in range(1, epochs + 1): # +1 so that exactly `epochs` epochs run
        model.train() # switch to training mode (Dropout/BatchNorm behave as in training)
        for step, (data, targets) in enumerate(trainloader): # trainloader에서 데이터(batch) 꺼내오기.
            data = data.to(device, dtype=torch.float)
            targets = targets.to(device)
            optimizer.zero_grad() # optimizer.zero_grad() &amp;rarr; 이전 배치에서 계산된 gradient 초기화.

            outputs = model(data) # 모델에 입력 넣고 예측값 outputs
            loss = nn.CrossEntropyLoss(reduction='mean')(outputs, targets) # 손실 함수

            loss.backward() # gradient 계산 (backpropagation).
            optimizer.step() # Adam optimizer로 가중치 업데이트.

            loss = loss.item()
            acc, _ = accuracy(outputs, targets)
            acc = acc[0].item()

            if step % 10 == 0: # 중간 학습 상황 출력.
                print(f'[Epoch {epoch}/{epochs}, Step {step}/{len(trainloader)}] Loss {loss:.4f}, Accuracy {acc:.4f}')
        scheduler.step() # 에폭이 끝날 때마다 학습률(learning rate) 조정


        model.eval() # 평가 모드 전환 (Dropout 비활성화, BatchNorm 고정).
        total_cor = 0
        total_samples = 0

        with torch.no_grad():
            for step, (data, targets) in enumerate(testloader):
                data = data.to(device, dtype=torch.float)
                targets = targets.to(device)
                outputs = model(data)
                _, num_cor = accuracy(outputs, targets)
                num_cor = num_cor[0].item()
                total_samples += data.size(0)
                total_cor += num_cor
            acc = total_cor / total_samples
            print(f'Epoch {epoch} : Test Accuracy {acc:.4f}')

        if acc &amp;gt; best_acc: # 모델 저장 새로운 최고 정확도 달성 시 &amp;rarr; 모델 저장.
            print('[*] model saving...')
            state = {
                'model': model.state_dict(),#모델 가중치만 저장
                'acc': acc,
                'epoch': epoch,
            }
            if not os.path.isdir('ckpt_0'):
                os.mkdir('ckpt_0')

            path = f'ckpt_0/model_{model.__class__.__name__}_state_{epoch:03d}_{acc:.4f}.st'
            torch.save(state, path)
            best_acc = acc
    print(f'Best Test Accuracy {best_acc:.4f}')&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;best_acc = 최고 test accuracy 저장&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;model.train() &amp;rarr; 학습 모드 전환. (Dropout, BatchNorm 등 학습용 동작 활성화)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1758386180117&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def test(ckpt_path):
    print(f'[*] load {ckpt_path}')
    model.eval() #평가 모드 전환
    state_dict = torch.load(ckpt_path) # 학습된 모델 checkpoint 파일 불러오기.
    model.load_state_dict(state_dict['model'], strict=True) # 저장된 가중치 불러오기.
    # (strict=True &amp;rarr; 저장된 키와 모델 구조가 정확히 일치해야 함)

    total_cor = 0
    total_samples = 0
    with torch.no_grad():
        for step, (data, targets) in enumerate(testloader):
            data = data.to(device, dtype=torch.float)
            targets = targets.to(device)
            outputs = model(data)
            _, num_cor = accuracy(outputs, targets)
            num_cor = num_cor[0].item()
            total_samples += data.size(0)
            total_cor += num_cor
        acc = total_cor / total_samples # 전체 정답 수 &amp;divide; 전체 샘플 수 &amp;rarr; 최종 Test Accuracy 출력.
        print(f'Test Accuracy {acc:.4f} of Loaded Model {model.__class__.__name__}')

        # Visualize
        images = []
        pred_classes = []
        labels = []
        pred = outputs.topk(1, dim=1, largest=True, sorted=True) # 각 샘플의 예측 클래스(Top-1) 뽑기
        fig, axes = plt.subplots(3, 3, figsize=(15, 5))  # 3 row, 3 columns
        axes = axes.flatten()
        for k in range(9):  # check only the first 9 images
            images.append(inverse_normalize(data[k, :, :, :]).detach().cpu().permute(1, 2, 0).numpy())
            pred_classes.append(classes[pred[1][k].item()])
            labels.append(classes[targets[k].item()])
        for k, image in enumerate(images):
            axes[k].imshow(image)
            axes[k].axis('off')
            axes[k].set_title(f'label: {labels[k]}, pred: {pred_classes[k]}', fontsize=10)
        plt.tight_layout()
        plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2213&quot; data-start=&quot;2190&quot;&gt;저장된 모델 checkpoint 로드&lt;/li&gt;
&lt;li data-end=&quot;2245&quot; data-start=&quot;2216&quot;&gt;전체 test dataset에 대해 정확도 계산&lt;/li&gt;
&lt;li data-end=&quot;2282&quot; data-start=&quot;2248&quot;&gt;일부 이미지를 &lt;b&gt;예측 결과 vs 실제 라벨&lt;/b&gt;로 시각화&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Main - train&lt;/p&gt;
&lt;pre id=&quot;code_1758386521928&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;if __name__ == '__main__':
    # 학습에 사용할 디바이스 설정 (CUDA가 있으면 GPU, 없으면 CPU)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(device)

    # --------------------
    # 1. 데이터 준비
    # --------------------
    print('[*] preparing data')
    # 학습 데이터 변환 (데이터 증강 포함)
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),         # 랜덤 크롭 (여백 포함)
        transforms.RandomHorizontalFlip(),            # 랜덤 좌우 반전
        transforms.ToTensor(),                        # 텐서 변환
        transforms.Normalize((0.4914, 0.4822, 0.4465),# 채널별 평균 정규화
                             (0.2023, 0.1994, 0.2010))# 채널별 표준편차 정규화
    ])

    # 테스트 데이터 변환 (증강 X, 정규화만 적용)
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),
                             (0.2023, 0.1994, 0.2010))
    ])

    # CIFAR-10 학습 데이터셋 (5만 장)
    trainset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform_train)
    # 학습 데이터 로더
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=128, shuffle=True, num_workers=2)

    # CIFAR-10 테스트 데이터셋 (1만 장)
    testset = torchvision.datasets.CIFAR10(
        root='./data', train=False, download=True, transform=transform_test)
    # 테스트 데이터 로더
    testloader = torch.utils.data.DataLoader(
        testset, batch_size=100, shuffle=False, num_workers=2)

    # CIFAR-10 클래스 이름
    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    # --------------------
    # 2. 모델 정의
    # --------------------
    print('[*] building model')
    # model = ToyNetwork()   # 간단한 CNN (연습용)
    model = ResNet50()       # 실제 학습에 사용할 ResNet50 모델
    model.to(device)         # GPU/CPU 디바이스에 올리기

    # --------------------
    # 3. 손실 함수
    # --------------------
    criterion = nn.CrossEntropyLoss(reduction='mean') # 다중 분류용 손실 함수

    # --------------------
    # 4. 최적화 도구 &amp;amp; 학습률 스케줄러
    # --------------------
    epochs = 100
    params = model.parameters()
    optimizer = optim.Adam(params, lr=1e-3) # Adam Optimizer (lr=0.001)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs)             # Cosine Annealing 학습률 스케줄러

    # --------------------
    # 5. 학습 시작
    # --------------------
    train(epochs)  # 학습 루프 실행&lt;/code&gt;&lt;/pre&gt;
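&lt;p data-ke-size=&quot;size16&quot;&gt;A side note on the scheduler: CosineAnnealingLR decays the learning rate from its initial value toward roughly zero over T_max calls to scheduler.step(). A standalone sketch (dummy parameter, my own example):&lt;/p&gt;

```python
import torch
import torch.optim as optim

param = torch.nn.Parameter(torch.zeros(1))          # dummy parameter
optimizer = optim.Adam([param], lr=1e-3)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

lrs = []
for _ in range(100):                                # one scheduler.step() per epoch
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()
    scheduler.step()
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The recorded learning rates trace the cosine curve: 1e-3 at epoch 1, near zero by epoch 100.&lt;/p&gt;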
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Test&lt;/p&gt;
&lt;pre id=&quot;code_1758386640694&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;    directory = './ckpt_0'
    ckpt_list = os.listdir(directory) # ckpt_0 폴더 안의 모든 파일 목록 불러오기.
    ckpt_list = [f for f in ckpt_list if os.path.isfile(os.path.join(directory, f)) and model.__class__.__name__ in f]
    # 리스트 컴프리헨션으로 필터링 파일 이름에 현재 모델 이름 포함된 것만 선택
    ckpt_list.sort()
    ckpt_path = os.path.join(directory, ckpt_list[-1]) # 가장 최신 checkpoint 파일 경로 얻기.
    print(ckpt_path)
    test(ckpt_path=ckpt_path)&lt;/code&gt;&lt;/pre&gt;
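&lt;p data-ke-size=&quot;size16&quot;&gt;The checkpoint round trip above can be tried in isolation with a tiny stand-in model (my own sketch, same state-dict layout as the train() code):&lt;/p&gt;

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                                   # tiny stand-in model
state = {'model': model.state_dict(), 'acc': 0.9, 'epoch': 3}
path = os.path.join(tempfile.mkdtemp(), 'model_Linear_state_003_0.9000.st')
torch.save(state, path)

restored = nn.Linear(4, 2)                                # fresh model, different weights
ckpt = torch.load(path)
restored.load_state_dict(ckpt['model'], strict=True)      # keys must match exactly
```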
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Model ( ResNet50 )&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.09.05 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758386937372&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&quot; data-og-description=&quot;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/rQvsC/hyZJtuR6QL/xG4nYvPEQGhDyAockuHJl0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/cu9Ntg/hyZJGfD0ME/tewgNZl8JY73Rqt1qQGZq0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/bPPeuc/hyZJU5tzsZ/AmZqCjWz7rdQsClnnls2O1/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/109&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/rQvsC/hyZJtuR6QL/xG4nYvPEQGhDyAockuHJl0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/cu9Ntg/hyZJGfD0ME/tewgNZl8JY73Rqt1qQGZq0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/bPPeuc/hyZJU5tzsZ/AmZqCjWz7rdQsClnnls2O1/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ResNet을 구성하는 기본 단위 블록&lt;/p&gt;
&lt;pre id=&quot;code_1758387061149&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion *
                               planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- BasicBlock&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1758387171805&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;class BasicBlock(nn.Module):
    expansion = 1&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;567&quot; data-start=&quot;281&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;322&quot; data-start=&quot;281&quot;&gt;conv1: 3&amp;times;3 Conv; the stride depends on the block's position in the network.&lt;/li&gt;
&lt;li data-end=&quot;353&quot; data-start=&quot;323&quot;&gt;bn1: BatchNorm &amp;rarr; stabilizes training.&lt;/li&gt;
&lt;li data-end=&quot;381&quot; data-start=&quot;354&quot;&gt;conv2: another 3&amp;times;3 Conv.&lt;/li&gt;
&lt;li data-end=&quot;403&quot; data-start=&quot;382&quot;&gt;bn2: BatchNorm.&lt;/li&gt;
&lt;li data-end=&quot;567&quot; data-start=&quot;404&quot;&gt;shortcut: the residual connection.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;567&quot; data-start=&quot;450&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;474&quot; data-start=&quot;450&quot;&gt;By default, a simple identity connection.&lt;/li&gt;
&lt;li data-end=&quot;567&quot; data-start=&quot;477&quot;&gt;But when &lt;b&gt;stride &amp;ne; 1 or in_planes &amp;ne; expansion&amp;times;planes&lt;/b&gt;, the shapes no longer match &amp;rarr;&lt;br /&gt;a 1&amp;times;1 Conv + BatchNorm maps the input to the right dimensions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; forward()&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1758387163460&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
out += self.shortcut(x)     # add the input back: skip connection
out = F.relu(out)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;792&quot; data-start=&quot;745&quot;&gt;The skip connection (residual connection) adds the input straight through to the output.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;- Bottleneck&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1758387198256&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;class Bottleneck(nn.Module):
    expansion = 4&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;947&quot; data-start=&quot;920&quot;&gt;The block used in ResNet-50 and deeper.&lt;/li&gt;
&lt;li data-end=&quot;986&quot; data-start=&quot;948&quot;&gt;expansion=4 &amp;rarr; the output channel count is 4&amp;times; the base width (planes).&lt;/li&gt;
&lt;li data-end=&quot;1040&quot; data-start=&quot;995&quot;&gt;conv1: 1&amp;times;1 Conv &amp;rarr; channel reduction (shrinks the dimensionality to cut computation).&lt;/li&gt;
&lt;li data-end=&quot;1079&quot; data-start=&quot;1041&quot;&gt;conv2: 3&amp;times;3 Conv &amp;rarr; the actual feature extraction.&lt;/li&gt;
&lt;li data-end=&quot;1130&quot; data-start=&quot;1080&quot;&gt;conv3: 1&amp;times;1 Conv &amp;rarr; channel expansion (planes &amp;rarr; planes&amp;times;4).&lt;/li&gt;
&lt;li data-end=&quot;1155&quot; data-start=&quot;1131&quot;&gt;BatchNorm after each conv.&lt;/li&gt;
&lt;li data-end=&quot;1205&quot; data-start=&quot;1156&quot;&gt;shortcut: likewise mapped with a 1&amp;times;1 Conv when the stride or channel count differs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; forward()&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1758387226375&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;out = F.relu(self.bn1(self.conv1(x)))   # channel reduction
out = F.relu(self.bn2(self.conv2(out))) # 3x3 conv
out = self.bn3(self.conv3(out))         # channel expansion
out += self.shortcut(x)                 # skip connection
out = F.relu(out)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1594&quot; data-start=&quot;1534&quot;&gt;&lt;b&gt;BasicBlock&lt;/b&gt;: two 3&amp;times;3 Convs &amp;rarr; used in ResNet-18 and ResNet-34.&lt;/li&gt;
&lt;li data-end=&quot;1659&quot; data-start=&quot;1595&quot;&gt;&lt;b&gt;Bottleneck&lt;/b&gt;: 1&amp;times;1 &amp;rarr; 3&amp;times;3 &amp;rarr; 1&amp;times;1 Conv structure &amp;rarr; used in ResNet-50 and deeper.&lt;/li&gt;
&lt;/ul&gt;
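&lt;p data-ke-size=&quot;size16&quot;&gt;The savings from the bottleneck design can be checked with simple arithmetic. A minimal sketch (convolution weights only, ignoring BatchNorm and shortcut parameters) comparing a hypothetical 3x3-only block and a Bottleneck at 256 input channels:&lt;/p&gt;

```python
def basic_block_weights(channels):
    # two 3x3 convs at the same width: 2 * (3*3 * C_in * C_out)
    return 2 * 9 * channels * channels

def bottleneck_weights(in_planes, planes, expansion=4):
    # 1x1 reduce + 3x3 conv + 1x1 expand
    reduce = 1 * 1 * in_planes * planes
    conv3x3 = 3 * 3 * planes * planes
    expand = 1 * 1 * planes * (expansion * planes)
    return reduce + conv3x3 + expand

print(basic_block_weights(256))     # 1179648
print(bottleneck_weights(256, 64))  # 69632
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Roughly a 17x reduction in conv weights while still operating on 256-channel features, which is what makes the deeper ResNets affordable.&lt;/p&gt;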
&lt;pre id=&quot;code_1758387267659&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])


def ResNet34():
    return ResNet(BasicBlock, [3, 4, 6, 3])


def ResNet50():
    return ResNet(Bottleneck, [3, 4, 6, 3])


def ResNet101():
    return ResNet(Bottleneck, [3, 4, 23, 3])


def ResNet152():
    return ResNet(Bottleneck, [3, 8, 36, 3])


def test():
    net = ResNet18()
    y = net(torch.randn(1, 3, 32, 32))
    print(y.size())&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;410&quot; data-start=&quot;360&quot;&gt;block: which block type to use (BasicBlock or Bottleneck).&lt;/li&gt;
&lt;li data-end=&quot;452&quot; data-start=&quot;411&quot;&gt;num_blocks: how many blocks to stack in each layer.&lt;/li&gt;
&lt;li data-end=&quot;496&quot; data-start=&quot;453&quot;&gt;num_classes: the number of classes to predict (CIFAR-10 &amp;rarr; 10).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-end=&quot;144&quot; data-start=&quot;124&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Numeric Representation and Data Preprocessing&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;496&quot; data-start=&quot;145&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;189&quot; data-start=&quot;145&quot;&gt;The default dtype in deep learning frameworks is float32 (32-bit floating point).&lt;/li&gt;
&lt;li data-end=&quot;255&quot; data-start=&quot;190&quot;&gt;Some optimization/compression techniques apply &lt;b&gt;quantization&lt;/b&gt; to lower-precision types such as float16 or int4.&lt;/li&gt;
&lt;li data-end=&quot;405&quot; data-start=&quot;256&quot;&gt;Data normalization (Normalize):
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;405&quot; data-start=&quot;282&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;313&quot; data-start=&quot;282&quot;&gt;Normalize each channel (R, G, B) by its mean and standard deviation.&lt;/li&gt;
&lt;li data-end=&quot;405&quot; data-start=&quot;316&quot;&gt;For CIFAR-10, the dataset's own per-channel statistics (mean [0.4914, 0.4822, 0.4465], std [0.2023, 0.1994, 0.2010]) are commonly used.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;496&quot; data-start=&quot;406&quot;&gt;During training, data augmentation (RandomCrop, Flip, etc.) is applied; at test time, only the same Normalize is applied.&lt;/li&gt;
&lt;/ul&gt;
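&lt;p data-ke-size=&quot;size16&quot;&gt;Per-channel normalization is just (x - mean) / std broadcast over the spatial dimensions. A minimal sketch of what torchvision's transforms.Normalize computes, using the CIFAR-10 statistics above (the input tensor here is random, for illustration):&lt;/p&gt;

```python
import torch

# CIFAR-10 per-channel statistics
mean = torch.tensor([0.4914, 0.4822, 0.4465])
std = torch.tensor([0.2023, 0.1994, 0.2010])

x = torch.rand(3, 32, 32)  # fake image tensor with values in [0, 1]

# broadcast mean/std over H and W: shape (3,) -> (3, 1, 1)
x_norm = (x - mean[:, None, None]) / std[:, None, None]
```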
&lt;h2 data-end=&quot;521&quot; data-start=&quot;503&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Datasets &amp;amp; DataLoaders&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;874&quot; data-start=&quot;522&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;700&quot; data-start=&quot;522&quot;&gt;&lt;b&gt;Writing a Dataset class&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;700&quot; data-start=&quot;544&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;576&quot; data-start=&quot;544&quot;&gt;Inherit from torch.utils.data.Dataset.&lt;/li&gt;
&lt;li data-end=&quot;700&quot; data-start=&quot;579&quot;&gt;Required methods:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;700&quot; data-start=&quot;596&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;629&quot; data-start=&quot;596&quot;&gt;__init__: load and initialize paths/data/labels.&lt;/li&gt;
&lt;li data-end=&quot;669&quot; data-start=&quot;634&quot;&gt;__getitem__: return the sample and label at a given index.&lt;/li&gt;
&lt;li data-end=&quot;700&quot; data-start=&quot;674&quot;&gt;__len__: return the total dataset size.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;874&quot; data-start=&quot;701&quot;&gt;&lt;b&gt;DataLoader&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;874&quot; data-start=&quot;728&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;758&quot; data-start=&quot;728&quot;&gt;Feeds data to the training pipeline in batches.&lt;/li&gt;
&lt;li data-end=&quot;817&quot; data-start=&quot;761&quot;&gt;Arguments: dataset, batch_size, shuffle, num_workers.&lt;/li&gt;
&lt;li data-end=&quot;874&quot; data-start=&quot;820&quot;&gt;num_workers: lets the CPU preprocess data in parallel while the GPU trains, improving throughput.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
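&lt;p data-ke-size=&quot;size16&quot;&gt;The three required methods can be sketched with a hypothetical toy in-memory dataset (the shapes mirror CIFAR-10, but the data is random):&lt;/p&gt;

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A toy in-memory dataset with CIFAR-10-like shapes (random data)."""

    def __init__(self, n=100):
        # normally: load file paths / arrays / labels here
        self.data = torch.randn(n, 3, 32, 32)
        self.labels = torch.randint(0, 10, (n,))

    def __getitem__(self, idx):
        # return one (sample, label) pair by index
        return self.data[idx], self.labels[idx]

    def __len__(self):
        # total number of samples
        return len(self.data)

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True, num_workers=0)
xb, yb = next(iter(loader))  # one batch: (16, 3, 32, 32) and (16,)
```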
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Model Structure (PyTorch)&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;992&quot; data-start=&quot;903&quot;&gt;Custom networks inherit from nn.Module.
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;992&quot; data-start=&quot;933&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;954&quot; data-start=&quot;933&quot;&gt;Define the layers in __init__.&lt;/li&gt;
&lt;li data-end=&quot;992&quot; data-start=&quot;957&quot;&gt;Define the forward pass in forward() (this builds the computation graph).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1082&quot; data-start=&quot;993&quot;&gt;Learnable tensors (requires_grad=True) are the &lt;b&gt;parameters (filters, weights)&lt;/b&gt;; input data usually has requires_grad=False.&lt;/li&gt;
&lt;li data-end=&quot;1167&quot; data-start=&quot;1083&quot;&gt;Convolution layer example:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1167&quot; data-start=&quot;1109&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1167&quot; data-start=&quot;1109&quot;&gt;Conv2d(3, 32, kernel_size=7) &amp;rarr; parameter shape: (32, 3, 7, 7).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1217&quot; data-start=&quot;1168&quot;&gt;A 1&amp;times;1 convolution has the same effect as a fully connected layer applied at every spatial position.&lt;/li&gt;
&lt;/ul&gt;
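&lt;p data-ke-size=&quot;size16&quot;&gt;Both claims above can be checked directly. A quick sketch verifying the Conv2d weight shape, and that a bias-free 1x1 convolution matches a Linear layer with the same weights applied per pixel (the channel counts 8 and 16 are illustrative):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Conv2d(in=3, out=32, k=7) stores weights as (out, in, kH, kW)
conv = nn.Conv2d(3, 32, kernel_size=7)
print(tuple(conv.weight.shape))  # (32, 3, 7, 7)

# 1x1 conv vs fully connected layer with identical weights
conv1x1 = nn.Conv2d(8, 16, kernel_size=1, bias=False)
fc = nn.Linear(8, 16, bias=False)
with torch.no_grad():
    fc.weight.copy_(conv1x1.weight.view(16, 8))

x = torch.randn(1, 8, 5, 5)
out_conv = conv1x1(x)                                    # (1, 16, 5, 5)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)   # same result via Linear
```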
&lt;h2 data-end=&quot;1246&quot; data-start=&quot;1224&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;The Differentiation Graph (Autograd)&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1477&quot; data-start=&quot;1247&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1281&quot; data-start=&quot;1247&quot;&gt;The core of PyTorch: &lt;b&gt;automatic differentiation (Autograd)&lt;/b&gt;.&lt;/li&gt;
&lt;li data-end=&quot;1416&quot; data-start=&quot;1282&quot;&gt;Each tensor carries three relevant members:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;1416&quot; data-start=&quot;1306&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;1347&quot; data-start=&quot;1306&quot;&gt;requires_grad: whether the tensor is trained (True/False).&lt;/li&gt;
&lt;li data-end=&quot;1386&quot; data-start=&quot;1350&quot;&gt;grad_fn: which operation produced it (tracked for differentiation).&lt;/li&gt;
&lt;li data-end=&quot;1416&quot; data-start=&quot;1389&quot;&gt;grad: the gradient computed during backpropagation.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1477&quot; data-start=&quot;1417&quot;&gt;Forward pass = graph construction; backward pass = gradient propagation and parameter updates.&lt;/li&gt;
&lt;/ul&gt;
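&lt;p data-ke-size=&quot;size16&quot;&gt;The three members can be inspected on a small worked example: for y = x^2 + 3x at x = 2 (a hypothetical scalar, chosen for illustration), the gradient is dy/dx = 2x + 3 = 7:&lt;/p&gt;

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # a learnable leaf tensor
y = x**2 + 3*x                             # forward pass builds the graph

print(x.requires_grad)  # True
print(y.grad_fn)        # records the op that produced y

y.backward()            # backward pass populates .grad
print(x.grad)           # dy/dx = 2x + 3 = 7 at x = 2
```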
&lt;h2 data-end=&quot;1495&quot; data-start=&quot;1484&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Training Process&lt;/b&gt;&lt;/h2&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;1692&quot; data-start=&quot;1496&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;1514&quot; data-start=&quot;1496&quot;&gt;Fetch a batch from the data loader.&lt;/li&gt;
&lt;li data-end=&quot;1538&quot; data-start=&quot;1515&quot;&gt;Model forward &amp;rarr; compute the outputs.&lt;/li&gt;
&lt;li data-end=&quot;1579&quot; data-start=&quot;1539&quot;&gt;Compute the loss (CrossEntropyLoss, etc.).&lt;/li&gt;
&lt;li data-end=&quot;1615&quot; data-start=&quot;1580&quot;&gt;loss.backward() &amp;rarr; compute gradients.&lt;/li&gt;
&lt;li data-end=&quot;1651&quot; data-start=&quot;1616&quot;&gt;The optimizer (Adam, SGD) updates the parameters.&lt;/li&gt;
&lt;li data-end=&quot;1692&quot; data-start=&quot;1652&quot;&gt;Loop: (batch &amp;rarr; optimizer step) &amp;times; (number of epochs).&lt;/li&gt;
&lt;/ol&gt;
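&lt;p data-ke-size=&quot;size16&quot;&gt;The six steps above fit in a few lines. A minimal sketch with a stand-in linear model and random data (the model, shapes, and hyperparameters here are placeholders for the ResNet + CIFAR-10 setup, not taken from it):&lt;/p&gt;

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# stand-in model and data (placeholders for ResNet + CIFAR-10)
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16, shuffle=True)

for epoch in range(2):                    # 6. repeat for each epoch
    for xb, yb in loader:                 # 1. fetch a batch
        optimizer.zero_grad()             # clear old gradients
        loss = criterion(model(xb), yb)   # 2-3. forward + loss
        loss.backward()                   # 4. compute gradients
        optimizer.step()                  # 5. update parameters
```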
&lt;h2 data-end=&quot;1715&quot; data-start=&quot;1699&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Training &amp;amp; Inference Modes&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1811&quot; data-start=&quot;1716&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1757&quot; data-start=&quot;1716&quot;&gt;model.train(): dropout/batch normalization run in training mode.&lt;/li&gt;
&lt;li data-end=&quot;1811&quot; data-start=&quot;1758&quot;&gt;model.eval(): inference mode (dropout disabled; batch normalization uses its running (EMA) statistics).&lt;/li&gt;
&lt;/ul&gt;
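&lt;p data-ke-size=&quot;size16&quot;&gt;The mode switch is easy to observe with a Dropout layer on its own (a small sketch, not from the post's model):&lt;/p&gt;

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()              # training mode: elements are randomly zeroed
out_train = drop(x)

drop.eval()               # inference mode: dropout is a no-op
out_eval = drop(x)

print((out_train == 0).any().item())  # some activations were dropped
print(torch.equal(out_eval, x))       # input passes through unchanged
```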
&lt;h2 data-end=&quot;1845&quot; data-start=&quot;1818&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Learning Rate Control&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1974&quot; data-start=&quot;1846&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1872&quot; data-start=&quot;1846&quot;&gt;Default: a constant lr (e.g., 0.001).&lt;/li&gt;
&lt;li data-end=&quot;1974&quot; data-start=&quot;1873&quot;&gt;Advanced techniques:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1974&quot; data-start=&quot;1884&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1928&quot; data-start=&quot;1884&quot;&gt;&lt;b&gt;Cosine Annealing&lt;/b&gt;: decays the lr along a cosine curve.&lt;/li&gt;
&lt;li data-end=&quot;1974&quot; data-start=&quot;1931&quot;&gt;&lt;b&gt;Warm-up&lt;/b&gt;: start with a small lr, increase it gradually, then decay.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
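&lt;p data-ke-size=&quot;size16&quot;&gt;Both techniques are available as PyTorch schedulers. A sketch chaining a 5-step linear warm-up into cosine annealing (the step counts and lr values are illustrative, not the post's settings):&lt;/p&gt;

```python
import torch

param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.1)

# 5 warm-up steps from 0.1*lr up to lr, then cosine decay over 45 steps
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=45)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5])

lrs = []
for _ in range(50):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
```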
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Result Visualization&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1004&quot; data-origin-height=&quot;496&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DtpN2/btsQNoW0Uvo/aiQa3mNNN2dkYlLJVBpZIK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DtpN2/btsQNoW0Uvo/aiQa3mNNN2dkYlLJVBpZIK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DtpN2/btsQNoW0Uvo/aiQa3mNNN2dkYlLJVBpZIK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDtpN2%2FbtsQNoW0Uvo%2FaiQa3mNNN2dkYlLJVBpZIK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;672&quot; height=&quot;332&quot; data-origin-width=&quot;1004&quot; data-origin-height=&quot;496&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For more details on the model, see my earlier post.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.09.05 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758716209473&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&quot; data-og-description=&quot;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&quot; data-og-host=&quot;c0mputermaster.tistory.com&quot; data-og-source-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-url=&quot;https://c0mputermaster.tistory.com/109&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/dD17Ci/hyZJZzvksR/AF40JuxwHYKle4cZbEQlo0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/HwcQh/hyZJXuWlBH/nK2vCw3ud3IKTtAtyvUmU0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/z5k9V/hyZJxdyqeh/sURFj59J8O8A0OBxNaNowK/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587&quot;&gt;&lt;a href=&quot;https://c0mputermaster.tistory.com/109&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://c0mputermaster.tistory.com/109&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/dD17Ci/hyZJZzvksR/AF40JuxwHYKle4cZbEQlo0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/HwcQh/hyZJXuWlBH/nK2vCw3ud3IKTtAtyvUmU0/img.png?width=800&amp;amp;height=378&amp;amp;face=0_0_800_378,https://scrap.kakaocdn.net/dn/z5k9V/hyZJxdyqeh/sURFj59J8O8A0OBxNaNowK/img.png?width=1022&amp;amp;height=587&amp;amp;face=0_0_1022_587');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[ILSVRC 논문 정리해 보기] VGGNet, GoogleNet, ResNet&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;이전 논문 리뷰에 이어서 ILSVRC논문을 정리해보았다.2025.05.07 - [Computer Vision1/Paper reviews] - [ILSVRC 논문 정리해 보기] AlexNet (ImageNet Classification with Deep Convolutional Neural Networks) [ILSVRC 논문 정리해 보기]&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;c0mputermaster.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Computer Vision1/Project</category>
      <author>임승택</author>
      <guid isPermaLink="true">https://c0mputermaster.tistory.com/108</guid>
      <comments>https://c0mputermaster.tistory.com/108#entry108comment</comments>
      <pubDate>Tue, 12 Aug 2025 09:06:28 +0900</pubDate>
    </item>
  </channel>
</rss>