🧠 EfficientNet: A Deep Learning Model That Balances Accuracy and Computational Efficiency

📌 Overview

EfficientNet is an image classification model published by Google Brain in 2019.
It was designed with the goal of "achieving higher accuracy with fewer parameters."

🎯 Key idea: rather than simply increasing network depth, width, or input resolution on its own,
EfficientNet **scales all three in a balanced way (compound scaling)** to optimize the trade-off between accuracy and computation.


🔍 EfficientNet at a Glance

| Item | Description |
|---|---|
| Model name | EfficientNet (B0 to B7) |
| Year published | 2019 |
| Proposed by | Google Brain |
| Paper | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks |
| Key concepts | Compound Scaling, MBConv, AutoML |
| Strengths | High accuracy, few parameters, computational efficiency |

🔧 Core Components

1. 🧱 MBConv (Mobile Inverted Bottleneck Convolution)

  • A lightweight convolution block introduced in MobileNetV2
  • Built on depthwise separable convolution, which reduces computation
  • Inverted residual structure with a built-in SE (Squeeze-and-Excitation) block
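
To make the computational saving concrete, the sketch below counts multiply-accumulate operations (MACs) for a standard convolution and for a depthwise separable one. The layer dimensions and function names are illustrative assumptions, not values taken from EfficientNet itself.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k standard convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
h, w, c_in, c_out, k = 56, 56, 64, 128, 3
std = standard_conv_macs(h, w, c_in, c_out, k)
sep = depthwise_separable_macs(h, w, c_in, c_out, k)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2
print(f"standard: {std:,}  separable: {sep:,}  ratio: {ratio:.3f}")
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the saving comes from.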

2. ⚙️ Compound Scaling

Instead of growing a single factor, EfficientNet adjusts three factors together:

| Factor | Meaning | Effect |
|---|---|---|
| depth (d) | Number of layers | Learns more complex representations |
| width (w) | Number of channels | Processes more information per layer |
| resolution (r) | Input image resolution | Detects finer-grained features |

📐 Formulas:

  • depth: d = α^φ
  • width: w = β^φ
  • resolution: r = γ^φ
    where α · β² · γ² ≈ 2, so each increment of the compound coefficient φ roughly doubles the FLOPs; α, β, and γ are chosen to maximize accuracy under this constraint
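
The scaling rule can be sketched in a few lines of Python. The base values α = 1.2, β = 1.1, γ = 1.15 are the ones reported in the EfficientNet paper for the B0 baseline; the function name `compound_scale` is a hypothetical helper for illustration. The published B1 to B7 configurations use rounded values, so they will not match these multipliers exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases reported in the paper


def compound_scale(phi):
    """Return (depth, width, resolution, FLOPs) multipliers for compound coefficient phi."""
    d = ALPHA ** phi
    w = BETA ** phi
    r = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = d * w ** 2 * r ** 2
    return d, w, r, flops


for phi in range(4):
    d, w, r, flops = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{flops:.2f}")
```

With these bases, α · β² · γ² comes to about 1.92, so each step of φ multiplies the FLOPs by slightly less than 2.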

📊 EfficientNet Model Family Comparison

| Model | Top-1 Accuracy (ImageNet) | Parameters | FLOPs |
|---|---|---|---|
| B0 | 77.1% | 5.3M | 0.39B |
| B1 | 79.1% | 7.8M | 0.7B |
| B3 | 81.6% | 12M | 1.8B |
| B5 | 83.6% | 30M | 9.9B |
| B7 | 84.3% | 66M | 37B |

B0 is the baseline architecture; later versions are obtained by scaling it up with compound scaling.
B7 was among the most accurate models on ImageNet at the time of publication.
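
The diminishing returns in the table can be read off with a little arithmetic. The snippet below is only a convenience sketch: the numbers are copied from the table above, and the dictionary name `MODELS` is made up for this example.

```python
# (top-1 %, parameters in millions, FLOPs in billions), copied from the table above
MODELS = {
    "B0": (77.1, 5.3, 0.39),
    "B1": (79.1, 7.8, 0.7),
    "B3": (81.6, 12.0, 1.8),
    "B5": (83.6, 30.0, 9.9),
    "B7": (84.3, 66.0, 37.0),
}

base_acc, base_params, base_flops = MODELS["B0"]
for name, (acc, params, flops) in MODELS.items():
    print(
        f"{name}: +{acc - base_acc:.1f} pts top-1, "
        f"{params / base_params:.1f}x params, {flops / base_flops:.1f}x FLOPs"
    )
```

Relative to B0, B7 gains about 7.2 points of top-1 accuracy at roughly 12.5x the parameters and 95x the FLOPs, which is why the smaller variants are often the practical choice.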


🚀 Hands-On Example (PyTorch)

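```python
import torch
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# Load ImageNet-pretrained weights.
# torchvision >= 0.13 uses `weights=`; the older `pretrained=True` is deprecated.
model = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)
model.eval()

# Example input: a batch of one 224x224 RGB image
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():  # inference only, no gradients needed
    output = model(dummy_input)  # class logits, shape (1, 1000)
```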
  • In PyTorch, torchvision.models provides the B0 to B7 models
  • In Keras, they are available as tf.keras.applications.EfficientNetB0 through EfficientNetB7

💡 Advantages of EfficientNet

| Advantage | Description |
|---|---|
| ✅ Computational efficiency | High accuracy with few FLOPs |
| ✅ Lightweight models | Well suited to mobile and edge environments (especially B0 to B2) |
| ✅ Scalable | Compound scaling adapts the model to different resource budgets |
| ✅ SOTA record | Reached state-of-the-art ImageNet accuracy at the time of release |

⚠️ Drawbacks and Limitations

| Item | Description |
|---|---|
| ❗ Training time | High up-front training cost due to the compound-scaled architecture, especially for the larger variants |
| ❗ Architectural complexity | Hard to customize, and somewhat difficult to understand |
| ❗ Competition from newer models | Its advantage has narrowed since newer architectures appeared, such as the Transformer-based ViT and the convolutional ConvNeXt |

🧪 Use Cases

| Field | Example |
|---|---|
| Medical image analysis | Classifying lung CT and cardiac ultrasound images |
| Automated quality inspection | Detecting defects in manufacturing processes |
| Retail analytics | Product image classification, inventory tracking |
| Mobile apps | Automatic photo tagging, applying AR filters |

🧬 Derived Model: EfficientNetV2

  • Successor model published by Google in 2021
  • Introduces the Fused-MBConv block
  • Faster training and lower memory usage
  • Comes in EfficientNetV2-S, M, L, and XL variants

✅ Conclusion

EfficientNet is a leading example of a vision model built around finding "the optimal balance."

  • Very well suited to projects that need both high accuracy and efficiency
  • Works for research and production, on both mobile and server environments
  • Worth comparing against newer models such as EfficientNetV2, MobileNetV3, and ConvNeXt

📌 "If you want high performance at low cost, EfficientNet is still an excellent choice."