
🛠 | 3D Pose-Based Temporal Action Segmentation for Figure Skating

Review Today

Accurately identifying figure skating movements from monocular video alone runs into several limits. In particular, the complex rotations and the three-dimensional nature of the jumps are hard to analyze fully from 2D footage. Existing methods lack depth information, which caps how accurately they can classify movements. To address this, the work adopts an approach based on 3D pose data: 2D pose estimates are lifted to 3D, and a deep learning model that accounts for the time-series nature of the data is used to analyze the jumps. The goal is a practical system that still overcomes the limits of a single camera.

Today I briefly summarize the paper my current project is based on, and add some extra explanation of MotionAGFormer, the main model it uses.

Proposed method

overview

The paper's contributions fall broadly into two. (The paper lists three, but in practice they boil down to these two.)

  1. Creation of a figure skating jump dataset
  2. A demonstration that this dataset is useful for Pose Estimation and Temporal Action Segmentation.

Figure Skating Jump Dataset

The dataset used in this project consists of video captured simultaneously by 12 cameras, together with 3D coordinate data (c3d) and metadata (json).

├─c3d
│  ├─Skater_A
│  │  ├─Axel, Comb, Flip, Loop, Lutz, Salchow, Toeloop
├─json
│  ├─Skater_A (same structure, 2D-based)
├─skater_A
│  ├─cam_1 ~ cam_12 (captured footage)

Jump types: the dataset covers seven in total, namely Axel, Flip, Loop, Lutz, Salchow, Toe loop, and Combination. Each jump has its own takeoff style and rotation characteristics, so telling them apart accurately was the core challenge of this project.

Figure Skating Jump TAS with 3D Poses

I agree with the authors here: judging figure skating movements from monocular video has its limits. 3D data is therefore expected to give meaningful results for that judgment. The overall structure of the estimation pipeline is as follows.

DWpose → MotionAGFormer → Frame Action Cross Attention

Restricted to estimation, the inputs and outputs are as follows.

| Method | Input | Output |
| --- | --- | --- |
| DWpose | Monocular RGB video | (nframes, 17 joints, pixel coord x, pixel coord y, confidence score) |
| MotionAGFormer | (nframes, 17 joints, pixel coord x, pixel coord y, confidence score) | (nframes, 17 joints, xyz world coord) |
| Frame Action Cross Attention | aligned 3D flattened pose, dummy labels | segments |
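To make the shapes in the table concrete, here is a minimal sketch of the data flow with stand-in functions. The names `fake_2d_estimator`, `fake_lifter`, and `flatten_pose` are hypothetical placeholders, not the real DWpose or MotionAGFormer APIs. I also assume that "(nframes, 17 joints, x, y, confidence)" means an array of shape `(n_frames, 17, 3)` and that the flattened pose is `(n_frames, 51)`.

```python
import numpy as np

N_JOINTS = 17  # joint count taken from the table above


def fake_2d_estimator(n_frames, seed=0):
    """Stand-in for the 2D estimator: returns (n_frames, 17, 3) = x, y, confidence."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(0.0, 1080.0, size=(n_frames, N_JOINTS, 2))
    conf = rng.uniform(0.0, 1.0, size=(n_frames, N_JOINTS, 1))
    return np.concatenate([xy, conf], axis=-1)


def fake_lifter(pose_2d):
    """Stand-in for the 2D-to-3D lifter: (n_frames, 17, 3) in, (n_frames, 17, 3) xyz out."""
    if pose_2d.ndim != 3 or pose_2d.shape[1:] != (N_JOINTS, 3):
        raise ValueError(f"expected (n_frames, {N_JOINTS}, 3), got {pose_2d.shape}")
    return np.zeros((pose_2d.shape[0], N_JOINTS, 3))  # dummy 3D output


def flatten_pose(pose_3d):
    """Flatten each frame's joints into one feature vector: (n_frames, 51)."""
    return pose_3d.reshape(pose_3d.shape[0], -1)


pose_2d = fake_2d_estimator(n_frames=8)
pose_3d = fake_lifter(pose_2d)
features = flatten_pose(pose_3d)
print(pose_2d.shape, pose_3d.shape, features.shape)
```

Checking shapes at each hand-off like this is cheap, and it catches most format mismatches between the three stages early.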

Of these, the models that went through separate training are MotionAGFormer and Frame Action Cross Attention.

MotionAGFormer

The overall structure of MotionAGFormer is shown below. It is designed to model the spatial and temporal characteristics of human motion at the same time, and is built from two parallel branches.

[Figure: MotionAGFormer]

The image shows the AGFormer block and its data flow. Inside AGFormer there are two main parts. (b) is the spatial MetaFormer, which handles joint position information and models the spatial relationships between joints. By learning how each joint relates to its neighbors, it lets the model reflect the constraints of natural human motion. (c) is the temporal MetaFormer, which handles trajectory information: it learns the relationships among past, present, and future frames.

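To follow this, you first need the MetaFormer structure: a Transformer-style block (normalization, token mixer, residual connection, then normalization, channel MLP, residual connection) in which the token mixer is left abstract.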

The slot is labeled TokenMixer, but from skimming the paper's code, the only difference seems to be whether Attention or GCN goes into it. The two are not blended inside one mixer either; the paper itself describes them as parallel modules.
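The "same block, different token mixer" idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the MotionAGFormer implementation: the attention uses identity Q/K/V projections, the MLP uses ReLU, the skeleton graph is a simple chain, and all function names are my own.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token (row) over its channels."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention_mixer(x):
    """Single-head self-attention over tokens (identity Q/K/V for brevity)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def make_gcn_mixer(adjacency):
    """Graph convolution without learned weights: D^-1/2 (A + I) D^-1/2 @ x."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return lambda x: a_norm @ x


def metaformer_block(x, token_mixer, w1, w2):
    """Generic MetaFormer block: only `token_mixer` changes between variants."""
    x = x + token_mixer(layer_norm(x))        # token mixing + residual
    hidden = np.maximum(layer_norm(x) @ w1, 0.0)  # channel MLP with ReLU
    return x + hidden @ w2                    # residual


rng = np.random.default_rng(0)
tokens = rng.normal(size=(17, 8))             # 17 joints, 8 channels
chain = np.zeros((17, 17))
idx = np.arange(16)
chain[idx, idx + 1] = chain[idx + 1, idx] = 1.0  # toy chain skeleton
w1 = rng.normal(size=(8, 16)) * 0.1
w2 = rng.normal(size=(16, 8)) * 0.1

out_attn = metaformer_block(tokens, attention_mixer, w1, w2)
out_gcn = metaformer_block(tokens, make_gcn_mixer(chain), w1, w2)
print(out_attn.shape, out_gcn.shape)
```

Both calls go through the identical `metaformer_block`; swapping the mixer is the whole difference between the two streams in this sketch.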

The GCN computation requires a substantial amount of matrix arithmetic, and I confirmed that it can become a major bottleneck in the overall training process. Optimizing this part is therefore an important consideration when building a real system.

Data preprocessing

Two models were fine-tuned; the data preprocessing for each is described below.

MotionAGFormer

The dataset below has already been prepared in the form MotionAGFormer expects.

If you want to prepare a custom dataset, it must contain the following fields.

dict_keys(['joint_2d', 'confidence', 'joint3d_image', 'joints_2.5d_image', '2.5d_factor', 'camera_name', 'action', 'source', 'frame', 'world_3d', 'cam_3d', 'cam_param'])

Here 'joint3d_image' holds the joints transformed into the image coordinate system (world → camera → image), with the z value being depth. A scale factor is applied to account for perspective, so that joint3d_image * 2.5d_factor = joints_2.5d_image.
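As a rough illustration of the world → camera → image chain, here is a pinhole-camera sketch. The extrinsic convention `X_cam = R (X_world - t)`, the parameter names, and keeping raw camera depth as z are all assumptions on my part; the dataset's actual preprocessing may normalize depth differently.

```python
import numpy as np


def world_to_camera(world_xyz, rotation, translation):
    """Rigid transform, assuming X_cam = R @ (X_world - t). world_xyz: (J, 3)."""
    return (world_xyz - translation) @ rotation.T


def camera_to_image(cam_xyz, fx, fy, cx, cy):
    """Pinhole projection to pixels; z is kept as camera-space depth (assumption)."""
    z = cam_xyz[..., 2]
    u = fx * cam_xyz[..., 0] / z + cx
    v = fy * cam_xyz[..., 1] / z + cy
    return np.stack([u, v, z], axis=-1)


# Hypothetical camera: identity extrinsics, simple intrinsics.
R = np.eye(3)
t = np.zeros(3)
joints_world = np.array([[0.0, 0.0, 2.0],    # on the optical axis
                         [1.0, 0.0, 2.0]])   # 1 unit to the right
joints_cam = world_to_camera(joints_world, R, t)
joints_img = camera_to_image(joints_cam, fx=1000.0, fy=1000.0, cx=500.0, cy=400.0)
print(joints_img)
```

A point on the optical axis lands on the principal point `(cx, cy)`, which is a handy sanity check when wiring up real camera parameters.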

The 2D joints and confidence come from a 2D pose estimator, and the 3D joints from a 3D pose estimator in the world frame. If you also have the camera's intrinsic and extrinsic parameters, building the dataset up to this point is not particularly hard. The real question is how to obtain the 2.5d factor:

2.5d_factor
lambda = (box[2] - box[0] + 1) / rectangle_3d_size

Here lambda is the scale ratio from the world coordinate system to the image coordinate system. The 2.5d factor, conversely, is the factor for the inverse transform from image coordinates back to world coordinates, so the following relation holds:

2.5d_factor = 1/lambda

joint3d_image is the result of transforming the 3D world coordinates through the camera coordinate system into the image coordinate system. Its z value carries the actual depth information, and to reflect perspective correctly the following computation is applied:

joint3d_image * 2.5d_factor = joints_2.5d_image
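Putting the two formulas together, a minimal sketch might look like this. I assume `box` is `(x_min, y_min, x_max, y_max)` in pixels and use `rectangle_3d_size=2000.0` purely as a placeholder, since the post does not state the value; the function names are mine.

```python
import numpy as np


def scale_factors(box, rectangle_3d_size=2000.0):
    """Return (lambda, 2.5d_factor) for one bounding box.

    box: (x_min, y_min, x_max, y_max) in pixels (assumed layout).
    rectangle_3d_size: world-space box size; 2000.0 is a placeholder value.
    """
    lam = (box[2] - box[0] + 1) / rectangle_3d_size
    return lam, 1.0 / lam


def to_2_5d(joint3d_image, factor):
    """joint3d_image * 2.5d_factor = joints_2.5d_image."""
    return np.asarray(joint3d_image, dtype=float) * factor


lam, factor = scale_factors(box=(100, 50, 499, 449))  # box is 400 px wide
joint3d_image = np.array([[250.0, 300.0, 40.0]])
joints_2_5d = to_2_5d(joint3d_image, factor)
print(lam, factor, joints_2_5d)
```

With a 400 px wide box, lambda is 400 / 2000 and the factor is its reciprocal, so image-space values are scaled back up toward world-space magnitudes.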

Frame Action Cross Attention

pose alignment

For preprocessing, the features are the aligned local pose of the 17 joints with the associated Euler angles appended. This lets the model take orientation into account as well as joint position, which allows more accurate motion analysis.
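One way to picture "aligned local pose plus an angle" is root-centering followed by yaw alignment. The sketch below is a simplification under several assumptions: z is the vertical axis, joints follow the H36M order (0 = pelvis, 1 = right hip, 4 = left hip), and only a single yaw angle is appended rather than a full set of Euler angles. It is not the paper's alignment code.

```python
import numpy as np


def align_pose(pose_3d, root=0, r_hip=1, l_hip=4):
    """Root-center each frame and rotate about z so the hip axis points along +x.

    pose_3d: (T, 17, 3). Returns (aligned (T, 17, 3), yaw (T,)) in radians.
    Joint indices assume H36M ordering; z-up is also an assumption.
    """
    local = pose_3d - pose_3d[:, root:root + 1, :]
    hip_vec = local[:, l_hip] - local[:, r_hip]
    yaw = np.arctan2(hip_vec[:, 1], hip_vec[:, 0])
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.zeros((pose_3d.shape[0], 3, 3))
    rot[:, 0, 0], rot[:, 0, 1] = c, -s
    rot[:, 1, 0], rot[:, 1, 1] = s, c
    rot[:, 2, 2] = 1.0
    # aligned[t, k] = rot[t] @ local[t, k]
    aligned = np.einsum("tij,tkj->tki", rot, local)
    return aligned, yaw


def build_features(pose_3d):
    """Flattened aligned pose (51 values) plus the removed yaw angle: (T, 52)."""
    aligned, yaw = align_pose(pose_3d)
    return np.concatenate([aligned.reshape(len(yaw), -1), yaw[:, None]], axis=1)


rng = np.random.default_rng(1)
poses = rng.normal(size=(4, 17, 3))
aligned, yaw = align_pose(poses)
features = build_features(poses)
print(features.shape)
```

Keeping the removed yaw as a feature preserves the rotation information that alignment would otherwise throw away, which matters for jumps.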

Inspecting the dataset gave the following:

[Figure: corrupted]

It shows that the previously extracted 3D data is not very good. TAS appears to have been run on the data in this state.

Results

The evaluation results for MotionAGFormer and Frame Action Cross Attention, in that order, are as follows.

MotionAGFormer Error
Protocol #1 Error (MPJPE): 68.64220575394518 mm
Acceleration error: 0.8875901212590517 mm/s^2
Protocol #2 Error (P-MPJPE): 10.61481703004164 mm

Frame Action Cross Attention

| Metric | Value |
| --- | --- |
| Edit | 88.12912075922682 |
| AccB | 97.32553214576997 |
| Acc | 97.32553214576997 |
| F1@0.10 | 88.6706877013139 |
| F1@0.25 | 88.46973316829319 |
| F1@0.50 | 87.56543776970041 |
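For readers unfamiliar with the two pose errors above, here is a minimal sketch of how MPJPE and P-MPJPE are commonly defined: mean per-joint Euclidean distance, and the same distance after a per-frame similarity (Procrustes) alignment. This follows the usual definitions rather than the exact evaluation script behind the reported numbers.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints/frames."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align one pose pred (J, 3) to gt (J, 3): scale, rotation, shift."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.sign(np.linalg.det(u @ vt))       # guard against reflections
    corr = np.array([1.0, 1.0, d])
    rot = (u * corr) @ vt                    # u @ diag(corr) @ vt
    scale = (s * corr).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def p_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment. pred, gt: (T, J, 3)."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


rng = np.random.default_rng(0)
gt = rng.normal(size=(3, 17, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
# Prediction = ground truth, but rotated, scaled by 2, and shifted.
pred = 2.0 * gt @ rz.T + np.array([10.0, -5.0, 3.0])
print(mpjpe(pred, gt), p_mpjpe(pred, gt))
```

The toy prediction has a large MPJPE but a P-MPJPE of essentially zero, which illustrates why a big gap between the two protocols usually points to a global scale, rotation, or offset problem rather than a wrong pose shape.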

In actual recognition the results were more disappointing than I expected. I confirmed that the annotation feature data is heavily corrupted, and the model seems to be recognizing with that corruption baked in. On the video, it makes the mistake of recognizing the jump as a Lutz or a Loop.
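For reference, the Acc and Edit values in the results are typically computed along the following lines: frame-wise accuracy, and a segmental edit score based on the Levenshtein distance between the ordered segment labels. This is a sketch of the common definitions, not the project's evaluation code, and the labels used are made up for illustration.

```python
def segments(labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_score(pred, gt):
    """Segmental edit score in percent (100 = identical segment order)."""
    p, g = segments(pred), segments(gt)
    return 100.0 * (1.0 - levenshtein(p, g) / max(len(p), len(g), 1))


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


gt = ["bg"] * 4 + ["axel"] * 4 + ["bg"] * 2
pred_shifted = ["bg"] * 3 + ["axel"] * 5 + ["bg"] * 2   # boundary off by one frame
pred_wrong = ["bg"] * 3 + ["lutz"] * 5 + ["bg"] * 2     # wrong jump class

print(frame_accuracy(pred_shifted, gt), edit_score(pred_shifted, gt))
print(frame_accuracy(pred_wrong, gt), edit_score(pred_wrong, gt))
```

In the second case the segment boundaries are nearly right but the jump class is wrong, so the edit score drops while the background frames still keep frame accuracy well above zero. High aggregate numbers can therefore coexist with exactly the kind of jump confusion described above.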

Wrapup & Conclusion

As mentioned in the previous post, a fair amount of research on this seems to be under way in Japan, where models for analyzing athletes' training are being built on top of the AI scoring technology already used in gymnastics.

This paper reproduction ended up close to a failure, but I think there is plenty of room for improvement.

  • Add foot features: takeoff and landing are important judging criteria, so the annotations need additional components for them. If these are computed as the velocity between the previous and the current frame, it looks like the foot itself may not need to be added.
  • More annotation data needed: datasets covering landing and takeoff are still markedly scarce, and they involve subjective elements. Data would need to be collected from many people.
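The velocity idea in the first bullet could be sketched as a finite difference on the ankle joints. The ankle indices (3 = right, 6 = left) assume H36M joint order, and the function name and frame rate are hypothetical; this is an illustration of the idea, not something the project has implemented.

```python
import numpy as np


def foot_velocity(pose_3d, fps, r_ankle=3, l_ankle=6):
    """Per-frame ankle velocity from consecutive-frame differences.

    pose_3d: (T, 17, 3). Returns (T, 2, 3); the first frame has zero velocity
    because it has no previous frame. Joint indices assume H36M ordering.
    """
    feet = pose_3d[:, [r_ankle, l_ankle], :]
    velocity = np.zeros_like(feet, dtype=float)
    velocity[1:] = (feet[1:] - feet[:-1]) * fps
    return velocity


# Toy sequence: both ankles rise 0.1 units per frame along z.
n_frames, fps = 5, 30.0
poses = np.zeros((n_frames, 17, 3))
poses[:, [3, 6], 2] = 0.1 * np.arange(n_frames)[:, None]

vel = foot_velocity(poses, fps)
foot_features = vel.reshape(n_frames, -1)   # (T, 6), ready to append per frame
print(foot_features.shape)
```

A sharp rise in vertical ankle speed would be a candidate takeoff cue and a sharp drop back to near zero a landing cue, though with noisy 3D estimates some temporal smoothing would likely be needed first.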

Also, the paper uses DWposeEstimator, but its results come out in COCO format, so I converted them to H36M format before estimation. The codebase already shipped with a 2D pose estimator, HRNet, and since HRNet was faster at estimation, I kept DWpose in only as a fallback. Oh, and while digging through this data I also learned that there are two (?) COCO formats. All in all, I got quite a lot out of this project.
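For the COCO to H36M step, one common approximation is sketched below. H36M has joints that COCO lacks (pelvis, spine, thorax, head), so they are synthesized here as midpoints; this particular recipe and the H36M index order are assumptions for illustration and not necessarily the conversion used in this project.

```python
import numpy as np

# Standard COCO 17-keypoint order.
COCO = {name: i for i, name in enumerate([
    "nose", "l_eye", "r_eye", "l_ear", "r_ear",
    "l_shoulder", "r_shoulder", "l_elbow", "r_elbow", "l_wrist", "r_wrist",
    "l_hip", "r_hip", "l_knee", "r_knee", "l_ankle", "r_ankle"])}


def coco_to_h36m(kpts):
    """Convert (T, 17, C) COCO keypoints to an H36M-style 17-joint layout.

    Missing H36M joints are approximated by midpoints (an assumption):
    pelvis = mid-hips, thorax = mid-shoulders, spine = mid(pelvis, thorax),
    neck/nose = nose, head = mid-eyes.
    """
    k = np.asarray(kpts, dtype=float)
    c = COCO
    pelvis = (k[:, c["l_hip"]] + k[:, c["r_hip"]]) / 2.0
    thorax = (k[:, c["l_shoulder"]] + k[:, c["r_shoulder"]]) / 2.0
    out = np.zeros_like(k)
    out[:, 0] = pelvis
    for dst, src in [(1, "r_hip"), (2, "r_knee"), (3, "r_ankle"),
                     (4, "l_hip"), (5, "l_knee"), (6, "l_ankle"),
                     (9, "nose"),
                     (11, "l_shoulder"), (12, "l_elbow"), (13, "l_wrist"),
                     (14, "r_shoulder"), (15, "r_elbow"), (16, "r_wrist")]:
        out[:, dst] = k[:, c[src]]
    out[:, 7] = (pelvis + thorax) / 2.0
    out[:, 8] = thorax
    out[:, 10] = (k[:, c["l_eye"]] + k[:, c["r_eye"]]) / 2.0
    return out


# Toy input: every coordinate of COCO joint i equals i, so the mapping is easy to read.
coco_kpts = np.tile(np.arange(17, dtype=float)[None, :, None], (2, 1, 3))
h36m_kpts = coco_to_h36m(coco_kpts)
print(h36m_kpts[0, :, 0])
```

If the confidence score is carried as a third channel it gets averaged the same way for the synthesized joints, which is a crude but workable default.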

