字符敏感编辑距离的零样本汉字识别
Character-aware edit distance for zero-shot Chinese character recognition
2024, Vol. 29, No. 11, pp. 3383-3400
Print publication date: 2024-11-16
DOI: 10.11834/jig.230875
陈宇, 王大寒, 池雪可, 江楠峰, 张煦尧, 王驰明, 朱顺痣. 2024. 字符敏感编辑距离的零样本汉字识别. 中国图象图形学报, 29(11):3383-3400
Chen Yu, Wang Dahan, Chi Xueke, Jiang Nanfeng, Zhang Xuyao, Wang Chiming, Zhu Shunzhi. 2024. Character-aware edit distance for zero-shot Chinese character recognition. Journal of Image and Graphics, 29(11):3383-3400
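Objective
Zero-shot Chinese character recognition (ZSCCR) has attracted wide attention because it can recognize unseen Chinese characters with zero or few training samples. Most existing ZSCCR methods adopt a radical-sequence matching framework: the radical sequence is predicted first, and minimum edit distance (MED) matching is then performed against an ideographic description sequence (IDS) dictionary. However, existing MED algorithms assume that the substitution, insertion, and deletion costs are identical for all radicals, so the distance costs of candidate character classes become ambiguous and redundant during matching. To address this, a character-aware edit distance (CAED) is proposed to match the correct target character class.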
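Method
Several radical information extraction methods are designed to obtain finer-grained radical descriptions, which yield more accurate radical substitution costs and improve the robustness and effectiveness of MED. In addition, a radical counting module is proposed to predict the number of radicals in a sample; it forms a cost gate that constrains and adjusts the insertion and deletion costs, overcoming the effect of inaccurately predicted IDS sequence lengths.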
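Result
Experiments on handwritten, scene, and ancient-book Chinese character datasets show that, compared with previous methods, the proposed CAED improves the accuracy on unseen character classes by 4.64%, 1.1%, and 5.08%, respectively, while maintaining comparable performance on seen character classes, which demonstrates the effectiveness of the method.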
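Conclusion
The proposed character-aware edit distance adapts the substitution, insertion, and deletion costs to each character and effectively improves the recognition of unseen Chinese characters.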
Objective
Zero-shot Chinese character recognition (ZSCCR) has attracted increasing attention in recent years because it can recognize unseen Chinese characters with zero or few training samples. The fundamental concept of zero-shot learning is to solve the new-class recognition problem by generalizing semantic knowledge from seen classes to unseen classes; this knowledge is usually represented by auxiliary information, such as attribute descriptions shared between classes. Chinese characters are composed of radicals, so radicals are often used as the attributes shared between character classes. Most existing ZSCCR methods adopt a radical-based sequence matching framework that first predicts the radical sequence of a character and then performs minimum edit distance (MED) matching against an ideographic description sequence (IDS) dictionary. MED compares the predicted radical sequence with each entry in the IDS dictionary, measures the difference between the two sequences, and thereby determines the unseen character category. However, the algorithm sets the insertion, deletion, and substitution costs all to 1, which assumes that the cost is the same for every pair of radicals. In practice, the substitution cost between similar radicals should be lower than that between dissimilar radicals. Moreover, the fixed costs lack flexibility when the predicted IDS sequence is too long or too short, which produces redundant insertion or deletion costs. Therefore, a character-aware edit distance (CAED) is proposed that refines the radical substitution costs and accounts for the impact of the insertion and deletion costs.
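The baseline matching step can be illustrated with a minimal sketch of unit-cost MED against a small IDS dictionary. The dictionary `TOY_IDS` and the function names are assumptions made for this illustration; they are not taken from the paper, and a real IDS dictionary would cover thousands of characters.

```python
from typing import Dict, List, Sequence

# Toy IDS dictionary (illustrative entries only, not the paper's dictionary).
TOY_IDS: Dict[str, List[str]] = {
    "好": ["⿰", "女", "子"],
    "明": ["⿰", "日", "月"],
    "林": ["⿰", "木", "木"],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic edit distance: insertion, deletion, substitution all cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x -> y (0 if equal)
            ))
        prev = curr
    return prev[-1]


def match(predicted: Sequence[str], ids_dict: Dict[str, List[str]]) -> str:
    """Return the character whose IDS is closest to the predicted sequence."""
    return min(ids_dict, key=lambda ch: edit_distance(predicted, ids_dict[ch]))


# With unit costs, a wrong prediction can be equally far from two candidates.
noisy = ["⿰", "木", "月"]
distances = {ch: edit_distance(noisy, seq) for ch, seq in TOY_IDS.items()}
```

In this toy case the sequence `noisy` is at distance 1 from both 明 and 林, so unit-cost matching cannot separate the two candidates. This is the kind of ambiguity that character-dependent costs are meant to resolve.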
Method
The proposed CAED adaptively adjusts the substitution, insertion, and deletion costs of the edit distance for each character so that the unseen character category is matched correctly. In ZSCCR, the radical-based approach depends on two things: recognizing the radical sequence and measuring the distance between the predicted and candidate sequences. The accuracy of this metric directly determines the performance of the final model, so the metric between radical sequences must be refined. Specifically, CAED analyzes each cost of the edit distance. The substitution cost is computed from the similarity probability between radicals, which is obtained by weighting the structure of the radicals, the number of strokes, partials, and four-corner method information. The distance cost between different radicals is thereby finely adjusted, which improves the robustness and performance of MED. In addition, a radical counting module is introduced to predict the number of radicals. The insertion and deletion costs are constrained by comparing this count with the number of radicals in the predicted sequence, which mitigates the problem of excessively long or short predicted radical sequences. Refined distances between radical sequences are thus obtained. Compared with traditional methods, the proposed method can match the correct character class with the shortest distance despite misrecognition of similar radicals, mismatch of radical sequence lengths, or both.
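The following is a rough sketch of how character-dependent costs could enter the edit-distance recurrence; it is not the authors' implementation. The attribute table `ATTRS`, the equal weights, the similarity formulas, and the relaxed cost of 0.5 are all assumptions chosen for illustration, and only stroke count and four-corner code are used here, whereas the abstract also mentions structure and partials.

```python
from typing import Dict, Sequence, Tuple

# Toy attributes per radical: (stroke count, four-digit four-corner code).
# Illustrative values; a real system would load a verified table.
ATTRS: Dict[str, Tuple[int, str]] = {
    "日": (4, "6010"),
    "目": (5, "6010"),
    "月": (4, "7722"),
    "丁": (2, "1020"),
}
W_STROKE = 0.5  # hypothetical weight of stroke-count similarity
W_CORNER = 0.5  # hypothetical weight of four-corner similarity


def substitution_cost(a: str, b: str) -> float:
    """Return 1 - weighted similarity; unknown symbols fall back to cost 1."""
    if a == b:
        return 0.0
    if a not in ATTRS or b not in ATTRS:
        return 1.0
    strokes_a, corner_a = ATTRS[a]
    strokes_b, corner_b = ATTRS[b]
    stroke_sim = 1.0 - abs(strokes_a - strokes_b) / max(strokes_a, strokes_b)
    corner_sim = sum(x == y for x, y in zip(corner_a, corner_b)) / len(corner_a)
    return 1.0 - (W_STROKE * stroke_sim + W_CORNER * corner_sim)


def is_structure_symbol(token: str) -> bool:
    """True for ideographic description characters such as ⿰ and ⿱."""
    return "\u2ff0" <= token <= "\u2fff"


def gated_indel_cost(pred: Sequence[str], radical_count: int,
                     relaxed: float = 0.5) -> float:
    """Assumed gate: relax insert/delete cost when the counter disagrees
    with the number of radicals in the predicted sequence."""
    n_radicals = sum(not is_structure_symbol(t) for t in pred)
    return 1.0 if n_radicals == radical_count else relaxed


def caed(pred: Sequence[str], cand: Sequence[str], radical_count: int) -> float:
    """Weighted edit distance with gated insertion and deletion costs."""
    indel = gated_indel_cost(pred, radical_count)
    prev = [j * indel for j in range(len(cand) + 1)]
    for i, x in enumerate(pred, start=1):
        curr = [i * indel]
        for j, y in enumerate(cand, start=1):
            curr.append(min(
                prev[j] + indel,                        # delete x
                curr[j - 1] + indel,                    # insert y
                prev[j - 1] + substitution_cost(x, y),  # substitute or match
            ))
        prev = curr
    return prev[-1]


# 目 predicted in place of 日: closer to 明 (⿰日月) than to 盯 (⿰目丁).
pred = ["⿰", "目", "月"]
d_ming = caed(pred, ["⿰", "日", "月"], radical_count=2)
d_ding = caed(pred, ["⿰", "目", "丁"], radical_count=2)
```

Under unit costs both candidates would be at distance 1 from `pred`. With the toy similarity above, substituting 目 for 日 is cheaper than substituting 月 for 丁, so 明 is ranked first. The gate only changes the result when the predicted sequence length disagrees with the radical count, for example when a radical is predicted twice.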
Result
Experiments are conducted on the handwriting database (HWDB) and the 12th International Conference on Document Analysis and Recognition (ICDAR 2013) datasets, the Chinese text in the wild (CTW) dataset, and the ancient handwritten characters database (AHCDB). First, on the handwritten and scene Chinese character datasets, the proposed CAED consistently outperforms current state-of-the-art ZSCCR methods. Second, CAED is integrated with other networks on the ancient Chinese character dataset to demonstrate its scalability. The performance of the radical counting module is also evaluated, because it directly affects cost gating. Ablation experiments then validate the effectiveness of the insertion and deletion cost constraint module and the substitution cost refinement module, and a combinatorial analysis of the multiple pieces of information that contribute to the substitution cost determines the value of each. Finally, conventional Chinese character recognition experiments evaluate CAED on seen character categories only, where the accuracy reaches 97.02% on ICDAR 2013. Although this is not the best result among the compared methods, CAED remains highly competitive. Overall, compared with previous methods, CAED improves the accuracy on unseen Chinese characters by 4.64%, 1.1%, and 5.08% on the handwritten (HWDB and ICDAR 2013), scene (CTW), and ancient (AHCDB) Chinese character datasets, respectively.
Conclusion
A CAED for zero-shot Chinese character recognition is proposed, in which the editing costs of the edit distance adapt to each character. The method refines the substitution cost between radicals with multiple pieces of information, which can correct cases where the model confuses similar radicals. Moreover, a radical counting module forms a cost gate that constrains the insertion and deletion costs, which alleviates the problem of mismatched radical sequence lengths. In addition, the method can be combined with any network based on radical sequence recognition to improve its resistance to misrecognition.
zero-shot Chinese character recognition (ZSCCR); ideographic description sequence (IDS); edit distance; character-aware; radical information; cost gate