IconFormer: An Icon Generation Model Based on CNNs and Transformer
2024, pp. 1-11
Online publication date: 2024-12-23
DOI: 10.11834/jig.240570
Hou Donghui, Zhu Leqing. IconFormer: An Icon Generation Model Based on CNNs and Transformer[J]. Journal of Image and Graphics
Objective
Automatic icon generation can improve the efficiency of graphical user interface design for software. Existing automatic icon generation methods suffer from insufficient diversity, complicated generation procedures, and demanding input requirements, which limit the freedom and creativity of the generated results. This paper proposes an efficient and flexible Transformer-based icon generation method: given any pair of a content icon and a style icon, it generates a new icon image with the specified style.
Method
This paper proposes an icon generation model named IconFormer, whose network architecture comprises a VGG feature encoder, a style encoder based on CNNs (convolutional neural networks), a multi-layer Transformer decoder, and a CNN decoder. The network is optimized with a composite loss consisting of content loss, style loss, identity (consistency) loss, and gradient loss.
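The exact formulations of these loss terms are not given in this abstract. The following is a minimal PyTorch sketch, assuming common style-transfer conventions (VGG feature distance for the content term, channel-wise feature statistics for the style term, self-reconstruction for the identity term, and finite-difference image gradients for the gradient term); the weights and helper names (composite_loss, mean_std, image_gradients) are hypothetical, not the authors' formulation.

```python
# Minimal sketch of a composite loss with content, style, identity and gradient
# terms; weights and helpers are illustrative assumptions, not the paper's exact loss.
import torch.nn.functional as F


def mean_std(feat, eps=1e-5):
    # Channel-wise mean and standard deviation of a feature map (B, C, H, W).
    b, c = feat.shape[:2]
    flat = feat.reshape(b, c, -1)
    return flat.mean(dim=2), (flat.var(dim=2) + eps).sqrt()


def image_gradients(img):
    # Finite-difference gradients along height and width.
    return img[:, :, 1:, :] - img[:, :, :-1, :], img[:, :, :, 1:] - img[:, :, :, :-1]


def composite_loss(enc, output, content, style, out_cc, out_ss,
                   w_c=1.0, w_s=3.0, w_id=1.0, w_g=1.0):
    """enc(x) is assumed to return a list of VGG feature maps; out_cc / out_ss are
    the model outputs when the same icon is used as both content and style."""
    f_out, f_c, f_s = enc(output), enc(content), enc(style)

    # Content loss: feature distance between the generated icon and the content icon.
    loss_c = sum(F.mse_loss(o, c) for o, c in zip(f_out, f_c))

    # Style loss: distance between channel-wise feature statistics.
    loss_s = sum(F.mse_loss(mo, ms) + F.mse_loss(so, ss)
                 for (mo, so), (ms, ss) in ((mean_std(o), mean_std(s))
                                            for o, s in zip(f_out, f_s)))

    # Identity (consistency) loss: reconstructing an icon from itself should be easy.
    loss_id = F.mse_loss(out_cc, content) + F.mse_loss(out_ss, style)

    # Gradient loss: keep the edges of the generated icon close to the content icon.
    (gh, gw), (ch, cw) = image_gradients(output), image_gradients(content)
    loss_g = F.l1_loss(gh, ch) + F.l1_loss(gw, cw)

    return w_c * loss_c + w_s * loss_s + w_id * loss_id + w_g * loss_g
```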
Result
To evaluate the proposed icon generation model, a dataset of 43,741 icon samples was constructed, on which IconFormer was trained and evaluated and compared, under identical conditions, with state-of-the-art related methods. The evaluation results show that the icons generated by IconFormer are more complete in color and structure, whereas the related methods exhibit, to varying extents, missing content, insufficient stylization, and colorized backgrounds; IconFormer also clearly outperforms the other models on quantitative metrics such as content difference and gradient score. Ablation experiments further demonstrate that each of the innovations in IconFormer contributes positively to the icon generation process.
Conclusion
The proposed icon generation model IconFormer combines the advantages of convolutional neural networks and the Transformer, and can quickly and efficiently generate high-quality icons in different styles.
Objective
Icons are essential components in the graphical user interface design of software and websites: they convey meaning to users quickly and directly through visual information and thus improve usability. However, manually creating a large number of icon images with a consistent style and a harmonious color scheme is a labour-intensive and time-consuming procedure that requires professional artists. Researchers have therefore explored deep learning models that generate icons automatically, so as to improve the efficiency of graphical user interface design. Several state-of-the-art icon generation methods have been proposed in recent years; however, some of them, based on generative adversarial networks, suffer from insufficient diversity in the generated icons, while others require users to provide initial icon sketches or color prompts as auxiliary inputs, which complicates the generation process. This paper therefore proposes a novel icon generation method based on the Transformer and convolutional neural networks, with which new icons can be generated from a given pair of a content icon and a style icon. In this way, icons can be generated more efficiently and flexibly, and with better quality, than with previous methods. The proposed model, IconFormer, effectively establishes the relationship between content and style through the Transformer, and avoids the problems of missing local content details and insufficient stylization.
Method
This paper proposes an icon generation model named IconFormer based on deep neural networks. The network architecture consists of a feature encoder based on VGG, a style encoder based on CNNs (convolutional neural networks), a multi-layer Transformer decoder, and a CNN decoder. The style encoder is designed to extract richer style information from the style features, and the Transformer decoder achieves a high degree of integration between the content encoding and the style encoding. To train and test the proposed icon generation model, this paper collected a high-quality dataset containing 43,741 icon images covering different styles, categories, and structures. The icons were organized into pairs, each consisting of a content icon and a style icon, and the dataset was divided into a training set and a testing set at a ratio of 9:1. The content features and style features are first extracted from the input content icon and style icon with the ImageNet pre-trained VGG19 encoder, and the style features are further encoded into a style key K and a style value V by the style encoder. Subsequently, the content features, serving as the query Q, together with the style key K and style value V, are fed into the multi-layer Transformer decoder for feature fusion. Finally, the fused features are decoded into a stylized new icon by the CNN decoder. A loss function combining content loss, style loss, identity loss, and gradient loss is adopted to optimize the network parameters.
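As a concrete illustration of this data flow, the following minimal PyTorch sketch wires a frozen VGG19 encoder, a CNN style encoder producing the style key K and value V, a stack of cross-attention (Transformer decoder) layers that take the content features as the query Q, and a CNN decoder. The layer counts, channel widths, and module designs (CrossAttnLayer, IconFormerSketch, to_kv, to_icon) are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of the content/style fusion pipeline described above.
import torch.nn as nn
from torchvision.models import vgg19


class CrossAttnLayer(nn.Module):
    """One Transformer decoder layer: content queries attend to style keys/values."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, q, k, v):
        q = self.norm1(q + self.attn(q, k, v, need_weights=False)[0])
        return self.norm2(q + self.ff(q))


class IconFormerSketch(nn.Module):
    def __init__(self, dim=512, n_layers=3, n_heads=8):
        super().__init__()
        # ImageNet pre-trained VGG19 up to relu4_1, frozen, shared by content and style.
        self.encoder = vgg19(weights="IMAGENET1K_V1").features[:21].eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        # CNN style encoder: maps style features to a style key K and a style value V.
        self.to_kv = nn.Conv2d(dim, 2 * dim, kernel_size=3, padding=1)
        # Multi-layer Transformer decoder for content/style fusion.
        self.decoder = nn.ModuleList([CrossAttnLayer(dim, n_heads) for _ in range(n_layers)])
        # CNN decoder: upsample the fused features back to an RGB icon.
        self.to_icon = nn.Sequential(
            nn.Conv2d(dim, 256, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, content, style):
        f_c = self.encoder(content)                              # content features -> queries Q
        k, v = self.to_kv(self.encoder(style)).chunk(2, dim=1)   # style key K and value V
        b, c, h, w = f_c.shape
        flat = lambda x: x.flatten(2).transpose(1, 2)            # (B, C, H, W) -> (B, HW, C) tokens
        q, k, v = flat(f_c), flat(k), flat(v)
        for layer in self.decoder:
            q = layer(q, k, v)
        fused = q.transpose(1, 2).reshape(b, c, h, w)
        return self.to_icon(fused)                               # decode fused features into an icon
```

With 256 x 256 inputs, calling IconFormerSketch()(content, style) would return a batch of 3 x 256 x 256 stylized icons, since the relu4_1 features are downsampled 8x and the decoder upsamples by the same factor.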
Result
The proposed IconFormer is evaluated on the icon dataset and compared with previous state-of-the-art methods under the same configuration, namely AdaIN, ArtFlow, StyleFormer, StyTr2, CAP-VSTNet, and S2WAT. The experimental results show that the icons generated by IconFormer are more complete in color and structure than those of previous methods. The icons generated by AdaIN, ArtFlow, and StyleFormer exhibit content loss and insufficient stylization to varying extents, while StyTr2 cannot effectively distinguish the main structure of an icon from its background, so most of the backgrounds of its generated icons are colorized. The quantitative results show that IconFormer outperforms previous methods in terms of content and gradient differences. AdaIN gives the highest content difference, indicating content loss, while ArtFlow gives the highest style difference, indicating that it cannot effectively stylize the content icons. Extensive ablation experiments were conducted to verify the effectiveness of the feature encoder, the style encoder, and the loss function design in the icon generation process; the results show that the VGG feature extractor, the style encoder, and the composite loss with the gradient term all have positive effects on the resulting icons. Additional experiments on generating a set of icons with a unified style show that IconFormer can conveniently produce icon sets with consistent style, harmonious colors, and high quality.
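The abstract reports content and gradient differences as quantitative scores but does not define them. The sketch below shows one plausible way such scores could be computed (feature-space distance for content preservation, finite-difference gradient distance for structural fidelity); the function names, the choice of encoder layer, and the distance norms are assumptions rather than the paper's exact metrics.

```python
# Hedged sketch of possible content-difference and gradient-difference scores.
import torch.nn.functional as F


def content_difference(enc, generated, content):
    # Feature-space distance between the generated icon and the original content
    # icon; lower means the content is better preserved. enc is assumed to be a
    # single VGG feature extractor.
    return F.mse_loss(enc(generated), enc(content)).item()


def gradient_difference(generated, content):
    # L1 distance between finite-difference image gradients, reflecting how well
    # the edges and structure of the content icon survive stylization.
    def grads(x):
        return x[:, :, 1:, :] - x[:, :, :-1, :], x[:, :, :, 1:] - x[:, :, :, :-1]
    (gh, gw), (ch, cw) = grads(generated), grads(content)
    return (F.l1_loss(gh, ch) + F.l1_loss(gw, cw)).item()
```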
Conclusion
The icon generation model IconFormer proposed in this paper combines the advantages of convolutional neural networks and the Transformer. It can generate new, high-quality icons efficiently, saving time and cost in the GUI design of software and websites.
Keywords: icon generation; image style transfer; convolutional neural networks; Transformer; self-attention mechanism
An J, Huang S and Song Y. 2021. ArtFlow: Unbiased image style transfer via reversible neural flows//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 862-871 [DOI: 10.1109/CVPR46437.2021.00092]
Carion N, Massa F and Synnaeve G. 2020. End-to-end object detection with transformers//European Conference on Computer Vision. Cham: Springer International Publishing: 213-229 [DOI: 10.1007/978-3-030-58452-8_1]
Carlier A, Danelljan M and Alahi A. 2020. DeepSVG: A hierarchical generative network for vector graphics animation. Advances in Neural Information Processing Systems, 33: 16351-16361 [DOI: 10.48550/arXiv.2007.11301]
Chen D, Yuan L and Liao J. 2017. StyleBank: An explicit representation for neural image style transfer//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 1897-1906 [DOI: 10.1109/CVPR.2017.296]
Chen H, Wang Y and Guo T. 2021. Pre-trained image processing transformer//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 12299-12310 [DOI: 10.1109/CVPR46437.2021.01212]
Chen Y, Pan Z and Shi M. 2022. Design what you desire: Icon generation from orthogonal application and theme labels//Proceedings of the 30th ACM International Conference on Multimedia: 2536-2546 [DOI: 10.1145/3503161.3548109]
Deng Y, Tang F and Dong W. 2022. StyTr2: Image style transfer with transformers//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 11326-11336 [DOI: 10.1109/CVPR52688.2022.01104]
Gatys L A, Ecker A S and Bethge M. 2016. Image style transfer using convolutional neural networks//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 2414-2423 [DOI: 10.1109/CVPR.2016.265]
Han Q R, Zhu W Z and Zhu Q. 2020. Icon colorization based on triple conditional generative adversarial networks//2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). IEEE: 462-468 [DOI: 10.1109/VCIP49819.2020.9301890]
Huang X and Belongie S. 2017. Arbitrary style transfer in real-time with adaptive instance normalization//Proceedings of the IEEE International Conference on Computer Vision: 1501-1510 [DOI: 10.1109/ICCV.2017.167]
Johnson J, Alahi A and Fei-Fei L. 2016. Perceptual losses for real-time style transfer and super-resolution//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II. Springer International Publishing: 694-711 [DOI: 10.1007/978-3-319-46475-6_43]
Kingma D P and Ba J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 [DOI: 10.48550/arXiv.1412.6980]
Li C and Wand M. 2016. Precomputed real-time texture synthesis with Markovian generative adversarial networks//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III. Springer International Publishing: 702-716 [DOI: 10.1007/978-3-319-46487-9_43]
Li Y, Lien Y H and Wang Y S. 2022. Style-structure disentangled features and normalizing flows for diverse icon colorization//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 11244-11253 [DOI: 10.1109/CVPR52688.2022.01096]
Liao Y H, Qian W H and Cao J D. 2023. MStarGAN: a face style transfer network with changeable style intensity. Journal of Image and Graphics, 28(12): 3784-3796 [DOI: 10.11834/jig.221149]
Lin J, Jiang Z and Guo J. 2024. IconDM: Text-guided icon set expansion using diffusion models//Proceedings of the 32nd ACM International Conference on Multimedia: 156-165 [DOI: 10.1145/3664647.3681057]
Liu S, Lin T and He D. 2021. AdaAttN: Revisit attention mechanism in arbitrary neural style transfer//Proceedings of the IEEE/CVF International Conference on Computer Vision: 6649-6658 [DOI: 10.1109/ICCV48922.2021.00658]
Park D Y and Lee K H. 2019. Arbitrary style transfer with style-attentional networks//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 5880-5888 [DOI: 10.1109/CVPR.2019.00603]
Reddy M D M, Basha M S M and Hari M M C. 2021. Dall-e: Creating images from text. UGC Care Group I Journal, 8(14): 71-75.
Sun M T, Dai L Q and Tang J H. 2023. Transformer-based multi-style information transfer in image processing. Journal of Image and Graphics, 28(11): 3536-3549 [DOI: 10.11834/jig.211237]
Sun T H, Lai C H and Wong S K. 2019. Adversarial colorization of icons based on contour and color conditions//Proceedings of the 27th ACM International Conference on Multimedia: 683-691 [DOI: 10.1145/3343031.3351041]
Vaswani A, Shazeer N and Parmar N. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30 [DOI: 10.5555/3295222.3295349]
Wen L, Gao C and Zou C. 2023. CAP-VSTNet: Content affinity preserved versatile style transfer//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 18300-18309 [DOI: 10.1109/CVPR52729.2023.01755]
Wu R, Su W and Ma K. 2023. IconShop: Text-guided vector icon synthesis with autoregressive transformers. ACM Transactions on Graphics (TOG), 42(6): 1-14 [DOI: 10.1145/3618364]
Wu X, Hu Z and Sheng L. 2021. StyleFormer: Real-time arbitrary style transfer via parametric style composition//Proceedings of the IEEE/CVF International Conference on Computer Vision: 14618-14627 [DOI: 10.1109/ICCV48922.2021.01435]
Yang H, Xue C and Yang X. 2021. Icon generation based on generative adversarial networks. Applied Sciences, 11(17): 7890 [DOI: 10.3390/app11177890]
Zhang C, Xu X and Wang L. 2024. S2WAT: Image style transfer via hierarchical vision transformer using strips window attention//Proceedings of the AAAI Conference on Artificial Intelligence, 38(7): 7024-7032 [DOI: 10.1609/aaai.v38i7.28529]
Zheng S, Gao P and Zhou P. 2024. Puff-Net: Efficient style transfer with pure content and style feature fusion network//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 8059-8068 [DOI: 10.48550/arXiv.2405.19775]