Published on

Leverage: MFS


Mobile Face Swap

๐Ÿ“ Objective: Real time face swap โ†’ 30fps
  • elevate synthesis frame speed by multiprocessing in video frame loop
  • add upscale module โ†’ FHD super resolution result

Experiment

์‹ค์‹œ๊ฐ„ ํ•ฉ์„ฑ fps๋ฅผ ๋ชฉํ‘œ๋กœ ๋ชจ๋ธ ๋ถ„์„ ๋ณด๋‹ค๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ์— focusํ•จ. ์ดํ›„ SOTA upscale module์„ ๋ถ™์—ฌ์„œ ๊ณ ํ’ˆ์งˆ ์˜์ƒ ํ•ฉ์„ฑ์„ ๊ฒฐ๊ณผ๋ฌผ ๋‚ด๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•จ.

Configuration

์‹คํ—˜์€ ์ž์ฒด ์„œ๋ฒ„ wswg3์—์„œ ์ง„ํ–‰.

A/B test: multi-process vs multi-thread

๋ณ‘๋ ฌ๋กœ ์ž‘์—…์„ ์ฒ˜๋ฆฌํ•œ๋‹ค๋Š” ๊ณตํ†ต์ ์ด ์žˆ์œผ๋‚˜ ๋ฉ”๋ชจ๋ฆฌ ๊ณต์œ ์— ์žˆ์–ด์„œ ์„œ๋กœ ๋‹ค๋ฅธ ์–‘์ƒ์„ ๋ณด์ž„.
  • ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์Šค๋กœ ์ž‘์—…ํ•  ๊ฒฝ์šฐ ๋…๋ฆฝ์ ์œผ๋กœ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๊ด€๋ฆฌํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ์ฒ˜๋ฆฌ๊ฐ€ ๋œ๋‹ค๋Š” ์ . ํ•˜์ง€๋งŒ ๋ฉ”๋ชจ๋ฆฌ ๋‚ด context switching ์ž์ฃผ ์ผ์–ด๋‚˜์„œ ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ ๋ฐœ์ƒํ•˜์—ฌ ์„ฑ๋Šฅ์ €ํ•˜๊ฐ€ ์ผ์–ด๋‚  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋‹จ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.
  • ๋ฐ˜๋ฉด ๋ฉ€ํ‹ฐ์“ฐ๋ ˆ๋“œ ์ž์›์„ ๊ณต์œ ํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ ์ž์› ์†Œ๋ชจ๊ฐ€ ๊ฐ์†Œํ•˜๊ณ  ๊ณต์œ ๋œ ๋ฉ”๋ชจ๋ฆฌ๋‹ค ๋ณด๋‹ˆ context switching ๋น ๋ฅด๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ๋ฐ๋“œ๋ฝ์ด๋‚˜ ์“ฐ๋ ˆ๋“œ ํ•˜๋‚˜ ๋ฌธ์ œ๊ฐ€ ์ „์ฒด ํ”„๋กœ์„ธ์Šค๊ฐ€ ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค๋Š” ๋‹จ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.
์ด์— ๋”ฐ๋ผ ๋‘ ๊ฐ€์ง€ ์‹คํ—˜์„ ํ•˜๋ฉด์„œ ์•ž์œผ๋กœ upscale module์„ ๋ถ™์˜€์„ ๋•Œ ๋” ์„ฑ๋Šฅ์ด ๋‚˜์€ ๊ฒฝ์šฐ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋Š” 2์ฐจ๋…„๋„์—์„œ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ๋“ค ๊ธฐ์ค€์œผ๋กœ ์ง„ํ–‰.
  • source image: koo.png
  • target video: test.mp4

Single process

๊ธฐ์กด ์ฝ”๋“œ์—์„œ gpu 4๊ฐœ ํ™œ์„ฑํ™” ์ฝ”๋“œ ์ถ”๊ฐ€ ๋ฐ ๋ฆฌํŒฉํ† ๋ง์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ํฌ๊ฒŒ ์„ฑ๋Šฅ์„ ๋ฒ—์–ด๋‚˜๊ฒŒ ๊ธฐ๋Šฅ ์ถ”๊ฐ€ํ•˜์ง€ ์•Š๋Š” ์ƒํƒœ์—์„œ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€์„ ๋•Œ ์•„๋ž˜์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์™”๋‹ค.
Single process: video demo test
Single process: video demo test

Multi process

๊ฐ ํ”„๋กœ์„ธ์Šค ๋‹น frame read โ†’ frame process โ†’ frame write ๋‹จ๊ณ„๋กœ ์ด 3๋‹จ๊ณ„๋กœ ๋‚˜๋‰˜์—ˆ๋‹ค.
์ธ๋ฑ์Šค๊ฐ€ ์ ํžŒ worker๋Š” ๋ชจ๋ธ์„ loadํ•˜๊ณ  ๊ทธ๋งŒํผ ์ €์žฅ๊ณต๊ฐ„์„ ํ™•๋ณดํ•˜๋ฏ€๋กœ ๋‹ค๋ฅธ ํ”„๋กœ์„ธ์Šค์— ๋น„ํ•ด ์ฐจ์ง€ํ•˜๋Š” ์šฉ๋Ÿ‰์ด ๋น„๊ต์  ๋†’์€ ํŽธ. ๋”ฐ๋ผ์„œ lightweightํ•œ process๋ผ๊ณ  ๋ณผ ์ˆ˜ ์—†๋‹ค. ์ด์ „ single process์— ๋น„ํ•ด์„œ๋Š” ์•ฝ 2๋ฐฐ ๊ฐœ์„ ์ด ๋˜์—ˆ๋‹ค. ํ•œ ํ”„๋กœ์„ธ์Šค์—์„œ loaded model์„ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ์•ˆ์œผ๋กœ multi thread๋กœ ์‹คํ—˜์„ ์ง„ํ–‰.
Multi process: video demo test
Multi process: video demo test

Multi thread

๋‹จ์ผ ํ”„๋กœ์„ธ์Šค์—์„œ ์ง„ํ–‰. ๊ธฐ์กด single process๋ณด๋‹ค ์•ฝ 4๋ฐฐ, multi process์—์„œ๋Š” 2๋ฐฐ ์ •๋„ ์†๋„๊ฐ€ ํ–ฅ์ƒ๋˜์—ˆ๋‹ค. single process์™€ ์šฉ๋Ÿ‰ ์ฐจ์ด๊ฐ€ ๋‚œ ์ด์œ ๋Š” ์•„๋ž˜ ๊ฐ ๊ธฐ๋Šฅ๋ณ„ ์“ฐ๋ ˆ๋“œ๋กœ ์ฒ˜๋ฆฌ์‹œ ๋ณ„๋„์˜ ๋Œ€๊ธฐ ํ๋ฅผ ์„ค์ •ํ•˜์˜€๊ธฐ ๋•Œ๋ฌธ์œผ๋กœ ์ถ”์ •ํ•˜๊ณ  ์žˆ๋‹ค.
Multi thread: video demo test
Multi thread: video demo test

A/B test: High resolution module

Real-ESRGAN

๊ฐ„์ด ํ…Œ์ŠคํŠธ๋กœ ์ง„ํ–‰. 1~2fps ๋‹ฌ์„ฑ์œผ๋กœ ๊ฒฝ๋Ÿ‰ํ™”๋œ ๋ชจ๋“ˆ ์ฐพ๋Š” ๊ฒƒ์œผ๋กœ ์ง„ํ–‰
notion image

Fast-SRGAN

๊ธฐ์กด ๋ณด๋‹ค๋Š” 2๋ฐฐ ๋น ๋ฅด๋‚˜, ์—ฌ์ „ํžˆ 2~3fps์— ์›ƒ๋Œ๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธ
notion image

Bicubic interpolation

๋ชจ๋ธ ๋‘˜๋‹ค ๋А๋ ค์„œ ์•„๋ž˜ ๋ฒค์น˜๋งˆํฌ๋ฅผ ํ™•์ธ.
  • Video Upscalers Benchmark
  • ADSADSBicubic++: Slim, Slimmer, Slimmest -- Designing an Industry-Grade Super-Resolution Network
๋ชจ๋ธ ๊ธฐ๋ฐ˜ ๋ฐฉ๋ฒ•๋“ค์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋งค์šฐ ๋А๋ ค์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฐœ์„ ๋œ Bicubic++ ํ…Œ์ŠคํŠธ ์ง„ํ–‰
โœ… fps ๋‹ฌ์„ฑ ์ค€์ˆ˜ํ•œ ํ€„๋ฆฌํ‹ฐ ํ•ด์ƒ๋„ ์˜์ƒ ๊ฒฐ๊ณผ๋ฌผ ํ™•์ธ
Before upsample via bicubic++ (360p)
Before upsample via bicubic++ (360p)
ย 
After upsample via bicubic++ (1080p)
After upsample via bicubic++ (1080p)
notion image

Others or Issues

  • Ubuntu 22.04์—์„œ cv.Videowriter X โ†’ ๊ฒฐ๊ณผ๋ฌผ ๋น„๋””์˜ค๋ฅผ ๋ณ„๋„๋กœ google drive์— ์˜ฌ๋ ค์„œ ํ™•์ธํ•จ

Result

Face swap

๊ธฐ์กด ํ•ฉ์„ฑ ์†๋„์— ๋น„ํ•ด 4๋ฐฐ ํ–ฅ์ƒ
demo video result with improved compositon speed
demo video result with improved compositon speed

Super resolution

๊ธฐ์กด ๋Œ€๋น„ 25fps โ†’ 45fps ํ–ฅ์ƒ ๋ฐ sr ๋ชจ๋“ˆ ๋„์ž… ๊ณ ๋„ํ™” ์™„๋ฃŒ
notion image

Summary

  • DNN ๊ธฐ๋ฐ˜ ์–ผ๊ตด ํ•ฉ์„ฑ
    • โœ… ์ถ”๊ฐ€ ๊ฐœ๋ฐœ ์š”์†Œ 1): ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ โ†’ ํ•ฉ์„ฑ ์†๋„ ํ–ฅ์ƒ
    • โœ… ์ถ”๊ฐ€ ๊ฐœ๋ฐœ ์š”์†Œ 2): upscale ๋ชจ๋“ˆ ์ถ”๊ฐ€ โ†’ ํ•ฉ์„ฑ ๊ฒฐ๊ณผ FHD๋กœ super-resolution
๋ณธ ๊ณผ์ œ์—์„œ fps ํ–ฅ์ƒ์„ ์œ„ํ•ด multi process/thread๋ฅผ ๋„์ž…ํ•˜์˜€์œผ๋ฉฐ, context switching์ด ์—†๋‹ค๋Š” ์ ์—์„œ multi thread๊ฐ€ ๋” ๋น ๋ฅธ ๋ชจ์Šต์„ ๋ณด์—ฌ์คŒ.
๋˜ํ•œ ๊ณ ๋„ํ™”๋ฅผ ์œ„ํ•ด ์—ฌ๋Ÿฌ SR benchmark๋ฅผ ์‚ดํ•€ ๊ฒฐ๊ณผ fps๊ฐ€ ์ƒ๊ฐ๋ณด๋‹ค 20fps๋ฅผ ๋„˜๋Š” sr ๋ชจ๋“ˆ์ด ๋งŽ์ด ์—†์—ˆ์Œ. ๋”ฐ๋ผ์„œ bicubic interpolation implement ์ค‘์— bicubic++์™€ ๊ฐ™์ด ๊ธฐ์กด ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ leverageํ•œ methodology๋ฅผ ๋ฐœ๊ฒฌํ•˜์˜€๊ณ , bicubic์ด 1ms๋กœ ์ฒ˜๋ฆฌํ•  ๋•Œ bicubic++๋Š” 2.9ms๋กœ ์ฒ˜๋ฆฌํ•œ๋‹ค๋Š” ์ ์—์„œ ๋‹ค๋ฅธ ๋ชจ๋ธ์— ๋น„ํ•ด์„œ ๋น ๋ฅด๊ฒŒ ์ฒ˜๋ฆฌํ•จ. ์ด์‹ ํ›„ 480p โ†’ 1080p upscale ํ™•์ธ. ๋‹ค๋งŒ bicubic++ SR rate๊ฐ€ 3์œผ๋กœ ๊ณ ์ •์ ์ธ ๋ฌธ์ œ์ ์„ ์ง€๋‹ˆ๊ณ  ์žˆ๋Š”๋ฐ(๋ชจ๋ธ ์ž์ฒด ๋ฌธ์ œ) ์ด์™€ ๊ด€๋ จํ•ด์„œ๋Š” ์‹คํ—˜์€ ๋” ์ด์ƒ ์ง„ํ–‰ํ•˜์ง€ ์•Š์Œ.

Reference

  • Doodle-KyungchaeDoodle-KyungchaePython Multiprocessing์œผ๋กœ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ, ๋น„๋””์˜ค ์ฒ˜๋ฆฌ๋กœ ๋ง›๋ณด๊ธฐ
  • Video Upscalers Benchmark
ย 
ย 
Authors