STNet:羽毛球运动小目标定位跟踪网络
STNet: a small moving target localization and tracking network for badminton
- 2025年30卷第1期 页码:148-160
纸质出版日期: 2025-01-16
DOI: 10.11834/jig.230800
钟亮, 欧巧凤, 沈敏杰, 熊邦书. STNet:羽毛球运动小目标定位跟踪网络[J]. 中国图象图形学报, 2025,30(1):148-160.
Zhong L, Ou Q F, Shen M J and Xiong B S. STNet: a small moving target localization and tracking network for badminton [J]. Journal of Image and Graphics, 2025, 30(1): 148-160.
目的
相比于一般视频目标检测跟踪任务,视频羽毛球的实时定位跟踪主要面临两大难点:1)羽毛球属于小目标,同时伴有严重的运动模糊以及相似目标的干扰,使用基于矩形框的目标检测跟踪方法准确率低且会带来中心点误差问题;2)单帧图像很难准确定位羽毛球目标,利用视频前后帧的时域特征则可以跟踪到羽毛球目标,而现有提取时域特征的网络模块结构复杂,难以满足实时性要求。针对以上问题,本文使用热力图轮廓检测方法,提出了羽毛球运动小目标的定位跟踪网络算法(shuttlecock track net,STNet)。
方法
网络主体采用“U”型编解码结构;针对小目标像素信息少的问题,基于SE(squeeze and excitation)通道注意力与残差结构设计高效特征提取模块(SE channel attention and residual,SECAR),实现了空域信息的高效提取与传递,提高了网络的定位性能;针对目标丢失与相似目标干扰问题,设计了时序网络(temporal network,TPN)结构用于提取和记忆视频时域特征,提高了网络跟踪性能。
结果
在羽毛球比赛公开数据集TrackNetv2与自制数据集上的实验表明,本文方法在多个指标上取得了最好的性能表现。相较于现有性能较好的羽毛球定位跟踪方法TrackNetv2,本文方法在准确率、精确率和F1上分别提高7.5%、15.7%和7.5%,并且显著降低了参数量,满足实时处理需求(54帧/s)。
结论
本文提出的STNet羽毛球定位跟踪网络,在面对羽毛球目标外观剧烈变化以及背景干扰严重时,能够准确定位羽毛球比赛视频帧中可能存在的羽毛球,实现羽毛球的稳定跟踪,相比其他羽毛球定位跟踪网络,具有更优的性能。
Objective
Badminton has become one of the most popular sports in the world. Badminton match videos contain a wealth of useful information, and extracting it can provide data support for precise analysis of the sport. Among the various types of information, shuttlecock trajectory information is particularly important, because ball speed, athlete explosiveness, and flight trajectory data can all be derived from the shuttlecock's positioning information. Compared with general video object detection and tracking tasks, real-time localization and tracking in badminton videos faces two major challenges: 1) shuttlecocks are small targets subject to severe motion blur and interference from similar objects, so detection and tracking methods based on rectangular bounding boxes have low accuracy and introduce center-point estimation errors; 2) accurately locating the shuttlecock in a single frame is difficult, whereas its position can be tracked using temporal features from adjacent frames, but existing network modules for extracting temporal features are structurally complex and struggle to meet real-time requirements. Early research on badminton localization and tracking mainly comprised threshold-based methods and feature-learning-based methods, which performed well in specific scenarios. However, these methods focused only on target information during model construction and could not exploit background information, so their localization and tracking performance suffers in complex environments. Currently, most deep-learning-based object detection and tracking methods rely on rectangular bounding-box algorithms, but shuttlecocks exhibit severe motion blur in single-frame images, making it difficult for such algorithms to accurately locate the shuttlecock's pixel coordinates.
Subsequently, a localization method based on heatmap contour detection emerged, which overcame the limitations of rectangular bounding-box methods by directly determining the pixel position of the shuttlecock's head center using segmentation and morphological detection. However, current approaches based on heatmap contour detection extract spatiotemporal features only weakly for video object localization and tracking, making them inadequate for real-world requirements. To enhance the network's ability to extract temporal and spatial features from video frames, we propose a new badminton localization and tracking network algorithm, shuttlecock track net (STNet), based on heatmap contour detection.
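The heatmap-based localization step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it thresholds a predicted heatmap and takes a response-weighted centroid of the activated region, standing in for the segmentation and morphological contour detection the paper uses; the threshold value and function name are assumptions.

```python
import numpy as np

def locate_center(heatmap, threshold=0.5):
    """Return the (x, y) centroid of the above-threshold heatmap region,
    or None if no pixel responds (shuttlecock absent in this frame).
    The threshold value is an illustrative assumption."""
    mask = heatmap >= threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    # Weight each activated pixel by its heatmap response for a
    # sub-pixel centroid estimate.
    w = heatmap[ys, xs]
    return float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum())

# Toy 2D Gaussian "heatmap" peaked at (x=12, y=7).
yy, xx = np.mgrid[0:32, 0:32]
hm = np.exp(-((xx - 12) ** 2 + (yy - 7) ** 2) / 8.0)
print(locate_center(hm))  # centroid close to (12.0, 7.0)
```

Running this per frame yields the pixel coordinate stream from which a trajectory can be assembled; a full pipeline would additionally apply morphological cleanup before taking the centroid.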
Method
The backbone of STNet adopts a "U"-shaped encoder-decoder structure. To address the limited pixel information of small targets, we design an efficient feature extraction module, SE channel attention and residual (SECAR), built on squeeze-and-excitation (SE) channel attention and a residual structure. This module enables efficient extraction and transmission of spatial information, improving the network's localization performance. To address target loss and interference from similar objects, we design a temporal network (TPN) structure for extracting and memorizing temporal features, enhancing the network's tracking performance. STNet consists of four main parts: input layer, encoder, decoder, and output layer. The input layer introduces the TPN structure, enabling the network to achieve a neural-network-style Kalman filtering effect. The encoder and decoder use SECAR feature extraction modules, whose multi-level residual bypass channels improve the network's spatial feature extraction capability and mitigate information loss for small targets. The output layer restores the output heatmap to the predetermined size. After contour detection on the output heatmap, the pixel coordinates of the shuttlecock can be obtained, and stable tracking is achieved by locating these coordinates frame by frame.
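The combination of SE channel attention with a residual bypass that underlies the SECAR module can be sketched in a few lines. This is an illustrative numpy sketch of the general SE-plus-residual pattern, not the paper's exact module: the layer layout, reduction ratio, and weight shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_residual_block(x, w1, w2):
    """SE-style channel attention inside a residual bypass, in the spirit
    of the SECAR module (exact structure is an assumption).

    x : feature map of shape (C, H, W)
    w1: (C//r, C) squeeze FC weights; w2: (C, C//r) excitation FC weights
    """
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = x.mean(axis=(1, 2))                  # (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives channel gates.
    s = np.maximum(w1 @ z, 0.0)              # (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # (C,) values in (0, 1)
    # Reweight channels by their gates, then add the residual bypass.
    return x + x * gate[:, None, None]

C, r = 16, 4
x = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_residual_block(x, w1, w2)
print(y.shape)  # (16, 8, 8): shape preserved, channels re-scaled
```

The residual bypass keeps the original features flowing even when a channel gate is near zero, which is what helps preserve the scarce pixel information of small targets through deep layers.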
Result
Experiments on the public TrackNetv2 dataset and a self-built dataset show that STNet achieves an accuracy of 92.0%, a precision of 96.7%, and an F1 score of 95.0%. Compared with TrackNetv2, a strong existing shuttlecock localization and tracking method, our method improves accuracy, precision, and F1 by 7.5%, 15.7%, and 7.5%, respectively, while significantly reducing the parameter count and meeting real-time processing requirements (54 frames per second).
Conclusion
This paper proposes a badminton localization and tracking algorithm based on SECAR and a temporal encoder-decoder network. The algorithm uses the SE channel attention mechanism and a residual bypass structure to build the SECAR feature extraction unit, which effectively improves the network's ability to extract spatial features. To extract temporal information from video frames and achieve a neural-network-style Kalman filtering effect, we introduce the TPN structure. Experimental results demonstrate that even in low-frame-rate badminton match videos with severe motion blur, the proposed network accurately locates the pixel coordinates of the shuttlecock and achieves stable tracking.
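The "Kalman filtering effect" that the TPN structure is described as emulating can be illustrated with a classical fixed-gain (alpha-beta) filter over per-frame positions. This is an analogy only, not part of STNet: the gains, the 1D setting, and the coasting behavior are assumptions chosen to show how temporal memory bridges frames where the detector loses the shuttlecock.

```python
def alpha_beta_track(measurements, alpha=0.85, beta=0.005):
    """Alpha-beta filter (a fixed-gain Kalman simplification) over 1D
    per-frame positions; gains are illustrative assumptions. Missing
    detections (None) are coasted through on the velocity estimate,
    the behavior temporal memory gives a tracker when the target is
    lost. Assumes the first frame contains a detection."""
    x, v = measurements[0], 0.0
    out = [x]
    for z in measurements[1:]:
        x_pred = x + v            # constant-velocity prediction
        if z is None:             # detection lost: coast on the prediction
            x = x_pred
        else:                     # blend prediction with the new measurement
            r = z - x_pred
            x = x_pred + alpha * r
            v = v + beta * r
        out.append(x)
    return out

# Noisy, roughly linear track with one dropped frame.
track = [0.0, 1.2, 1.9, None, 4.1, 5.0]
print([round(p, 2) for p in alpha_beta_track(track)])
```

The filtered track stays defined across the dropped frame, which is the qualitative effect the paper attributes to TPN's memorized temporal features during target loss.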
羽毛球定位跟踪;小目标;热力图;轮廓检测;编解码网络;时序网络
shuttlecock localization and tracking; small target; heatmap; contour detection; encoder-decoder network; temporal network
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2023-11-20]. https://arxiv.org/pdf/2004.10934.pdf
Cao J L, Li Y L, Sun H Q, Xie J, Huang K Q and Pang Y W. 2022. A survey on deep learning based visual object detection. Journal of Image and Graphics, 27(6): 1697-1722
曹家乐, 李亚利, 孙汉卿, 谢今, 黄凯奇, 庞彦伟. 2022. 基于深度学习的视觉目标检测技术综述. 中国图象图形学报, 27(6): 1697-1722 [DOI: 10.11834/jig.220069]
Cao Z G, Liao T B, Song W, Chen Z H and Li C S. 2021. Detecting the shuttlecock for a badminton robot: a YOLO based approach. Expert Systems with Applications, 164: #113833 [DOI: 10.1016/j.eswa.2020.113833]
Chen W, Liao T B, Li Z H, Lin H Z, Xue H, Zhang L, Guo J and Cao Z G. 2019. Using FTOC to track shuttlecock for the badminton robot. Neurocomputing, 334: 182-196 [DOI: 10.1016/j.neucom.2019.01.023]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
Goh G L, Goh G D, Pan J W, Teng P S P and Kong P W. 2023. Automated service height fault detection using computer vision and machine learning for badminton matches. Sensors, 23(24): #9759 [DOI: 10.3390/s23249759]
Huang Y C, Liao I N, Chen C H, İk T U and Peng W C. 2019. TrackNet: a deep learning network for tracking high-speed and tiny objects in sports applications//Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Taipei, China: IEEE: 1-8 [DOI: 10.1109/AVSS.2019.8909871]
Jacob A, Wan Zakaria W N and Md Tomari M R B. 2016. Implementation of IMU sensor for elbow movement measurement of badminton players//Proceedings of the 2nd IEEE International Symposium on Robotics and Manufacturing Automation (ROMA). Ipoh, Malaysia: IEEE: 1-6 [DOI: 10.1109/ROMA.2016.7847813]
Komorowski J, Kurzejamski G and Sarwas G. 2019. DeepBall: deep neural-network ball detector [EB/OL]. [2023-11-20]. https://arxiv.org/pdf/1902.07304.pdf
Lai S W, Xu L H, Liu K and Zhao J. 2015. Recurrent convolutional neural networks for text classification//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Austin, USA: AAAI: 2267-2273 [DOI: 10.1609/aaai.v29i1.9513]
Lea C, Flynn M D, Vidal R, Reiter A and Hager G D. 2017. Temporal convolutional networks for action segmentation and detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 1003-1012 [DOI: 10.1109/CVPR.2017.113]
Liu C X, Wang W, Liu H R and Wang J. 2022. Application of hawk-eye technology to sports events//Proceedings of the 2nd International Conference on Information Technology and Contemporary Sports (TCS). Guangzhou, China: IEEE: 1-5 [DOI: 10.1109/TCS56119.2022.9918811]
Liu J B, Huang G G, Hyyppä J, Li J, Gong X D and Jiang X F. 2023. A survey on location and motion tracking technologies, methodologies and applications in precision sports. Expert Systems with Applications, 229: #120492 [DOI: 10.1016/j.eswa.2023.120492]
Liu P and Wang J H. 2022. MonoTrack: shuttle trajectory reconstruction from monocular badminton video//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New Orleans, USA: IEEE: 3512-3521 [DOI: 10.1109/CVPRW56347.2022.00395]
Liu Y, Liu H Y, Fan J L, Gong Y C, Li Y H, Wang F P and Lu J. 2020. A survey of research and application of small object detection based on deep learning. Acta Electronica Sinica, 48(3): 590-601
刘颖, 刘红燕, 范九伦, 公衍超, 李莹华, 王富平, 卢津. 2020. 基于深度学习的小目标检测研究与应用综述. 电子学报, 48(3): 590-601 [DOI: 10.3969/j.issn.0372-2112.2020.03.024]
McLaughlin N, del Rincon J M and Miller P. 2016. Recurrent convolutional network for video-based person re-identification//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1325-1334 [DOI: 10.1109/CVPR.2016.148]
Raina A, Lakshmi T G and Murthy S. 2017. CoMBaT: wearable technology based training system for novice badminton players//Proceedings of the 17th IEEE International Conference on Advanced Learning Technologies (ICALT). Timisoara, Romania: IEEE: 153-157 [DOI: 10.1109/ICALT.2017.96]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2023-11-20]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Shishido H, Kitahara I, Kameda Y and Ohta Y. 2014. A trajectory estimation method for badminton shuttlecock utilizing motion blur//Proceedings of the 6th Pacific-Rim Symposium on Image and Video Technology. Guanajuato, Mexico: Springer: 325-336 [DOI: 10.1007/978-3-642-53842-1_28]
Sun N E, Lin Y C, Chuang S P, Hsu T H, Yu D R, Chung H Y and İk T U. 2020. TrackNetV2: efficient shuttlecock tracking network//Proceedings of 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). Taipei, China: IEEE: 86-91 [DOI: 10.1109/ICPAI51961.2020.00023]
Tao S and Wang M L. 2022. Stroke recognition in badminton videos based on pose estimation and temporal segment networks analysis. Journal of Image and Graphics, 27(11): 3280-3291
陶树, 王美丽. 2022. 结合姿态估计和时序分段网络分析的羽毛球视频动作识别. 中国图象图形学报, 27(11): 3280-3291 [DOI: 10.11834/jig.210407]
Tarashima S, Haq M A, Wang Y S and Tagawa N. 2023. Widely applicable strong baseline for sports ball detection and tracking [EB/OL]. [2023-11-20]. https://arxiv.org/pdf/2311.05237.pdf
Van Zandycke G and De Vleeschouwer C. 2019. Real-time CNN-based segmentation architecture for ball detection in a single view setup//Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports. Nice, France: ACM: 51-58 [DOI: 10.1145/3347318.3355517]
Wang C Y, Bochkovskiy A and Liao H Y M. 2023. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 7464-7475 [DOI: 10.1109/CVPR52729.2023.00721]
Zhang Y M, Chen C X and Hu R L. 2022. YOLO-BTM: a novel shuttlecock detection method for embedded badminton robots//Proceedings of 2022 International Conference on Automation, Robotics and Computer Engineering (ICARCE). Wuhan, China: IEEE: 1-6 [DOI: 10.1109/ICARCE55724.2022.10046579]