RoCC: robust covert communication based on cross-modal information retrieval
2024, Vol. 29, No. 2, pages 369-381
Print publication date: 2024-02-16
DOI: 10.11834/jig.230504
Zhang Yanming, Chen Kejiang, Ding Jinyang, Zhang Weiming, Yu Nenghai. 2024. RoCC: robust covert communication based on cross-modal information retrieval. Journal of Image and Graphics, 29(2): 369-381
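Objective
Covert communication is an important research direction in information security. Existing methods that build covert channels on multimedia data streams do not consider the packet loss caused by fluctuations in network transmission. This paper proposes a covert communication method based on cross-modal information retrieval that is robust to network anomalies while also meeting the requirements of high covertness and high security.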
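Method
We propose a general covert communication framework named RoCC (robust covert communication), built on cross-modal information retrieval and provably secure steganography. The method combines direct and indirect communication. Direct communication runs over a VoIP (voice over internet protocol) call service and carries an audio stream generated in real time, which the receiver can convert back to text through speech recognition. Indirect communication uses a public network database to transmit the stego data; the receiver restores the stego text with its complete semantics through text semantic similarity matching, which helps resolve the loss of text semantics caused by network packet loss and speech recognition errors.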
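Results
Experiments show that the proposed method is more general with respect to protocols, improves resistance to packet loss by 5% over the method of Saenger et al., and uses a steganographic algorithm that is provably secure. RoCC also achieves a data transmission rate of 73 to 136 bps (bits per second), which meets the needs of real-time communication.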
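Conclusion
The RoCC covert communication framework combines the advantages of provably secure steganography, generative machine learning methods, and cross-modal retrieval methods. Compared with existing methods, it is more covert and more secure, and it is currently the model most robust to packet loss during data transmission.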
Objective
Covert communication is a pivotal research area in information security. A highly covert and secure channel for transmitting sensitive information is needed to safeguard the privacy of communicating users and to prevent eavesdropping on confidential data. Most existing methods build covert channels by tunneling through multimedia streams but do not consider the packet loss caused by fluctuations in network transmission. This study proposes a covert communication method that is robust to network anomalies and is based on cross-modal information retrieval and provably secure steganography.
Method
We propose a general covert communication framework named robust covert communication (RoCC), which is based on cross-modal information retrieval and provably secure steganography. Content generated by artificial intelligence (AI) systems, including deep synthesis models, AI-driven artwork, intelligent voice assistants, and conversational chatbots, is now widespread, and these models can synthesize multimodal data such as video, images, audio, and text. As generative models have advanced, provably secure steganography has become practical. We therefore introduce generative models and provably secure steganography into our framework and embed secret messages in generated cover text. In addition, mature open-source models for speech synthesis and recognition now enable cross-modal conversion between speech and text. Our approach combines direct and indirect communication. Direct communication uses a voice over internet protocol (VoIP) call service to deliver audio stream data synthesized in real time, and the receiver restores the text through speech recognition. Indirect communication uses a public network database to carry the steganographic text. The receiver recovers the text semantics lost to network packet loss and speech recognition errors via text semantic similarity matching.
The communication process is as follows. Let Alice be the sender of the confidential data and Bob the recipient; the two share the same generative model and the same parameter settings for provably secure steganography. Alice embeds the confidential data into generated text using provably secure steganography and publishes the result in a publicly accessible, searchable network database. The only direct communication between the two parties is a VoIP voice call carrying the synthesized speech, over which network packets may be lost. Using the semantic information that survives, Bob performs cross-modal information retrieval on the public database and locates the corresponding steganographic text among the cover texts. Bob then recovers the confidential data from the steganographic text with the shared generative model and steganography parameter settings.
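Bob's retrieval step can be illustrated with a minimal sketch. It assumes a simple bag-of-words cosine similarity in place of whatever semantic similarity model RoCC actually uses; the names `tokenize`, `cosine_similarity`, `retrieve`, `PUBLIC_DATABASE`, and the sample texts are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(transcript, database, top_k=1):
    """Return the top_k database texts most similar to the noisy transcript."""
    query = Counter(tokenize(transcript))
    scored = [(cosine_similarity(query, Counter(tokenize(doc))), doc) for doc in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical public database: one stego text mixed with unrelated posts.
PUBLIC_DATABASE = [
    "The weather in the mountains was calm and the hikers reached the summit before noon.",
    "Our team released a new version of the library with faster startup times.",
    "The museum opened a gallery of modern sculpture near the old harbor.",
]

# Hypothetical transcript after packet loss and speech recognition errors.
noisy_transcript = "the whether in the mountains was and the hikers reach summit noon"

# Bob retrieves the intact text, which he could then pass to the stego decoder.
best_match = retrieve(noisy_transcript, PUBLIC_DATABASE, top_k=1)[0]
print(best_match)
```

The point of the sketch is that the lossy transcript only has to be close enough to select the right database entry; the secret is extracted from the intact published text, not from the degraded transcript.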
Result
Speech recognition experiments indicate that recognition frequently causes semantic loss. Even the best model has a sentence error rate of 0.6125, which falls short of the text recovery capability needed to build a covert channel through direct cross-modal conversion alone. Text similarity analysis experiments show that the best model achieves a recall of 1.0, which in theory enables complete restoration of the semantic information. In the packet loss experiment, RoCC achieves an information recovery rate of 0.9921 at a packet loss rate of 10% with a K value of 2. This result demonstrates the resilience of RoCC to network anomalies and makes it the current state-of-the-art solution. In the real-time performance experiment, we validate the efficiency of each RoCC component, including speech synthesis and recognition, secure steganographic encoding and decoding, and text semantic similarity analysis. These results show that RoCC can meet the real-time requirements of covert channel communication. In comparative experiments against eight representative methods, RoCC shows clear advantages in protocol versatility, robustness, and the provable security of its steganography. Compared with the existing robust method of Saenger et al., RoCC improves resistance to packet loss by 5% in the network packet loss experiment.
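The sentence error rate quoted above can be made concrete with a short sketch. It uses the common definition (the fraction of sentences whose recognized text differs from the reference in any way) and assumes a simple case and whitespace normalization; the paper's exact normalization is not stated here, and the sample sentences are hypothetical.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences are not counted as errors."""
    return " ".join(sentence.lower().split())


def sentence_error_rate(references, hypotheses):
    """Fraction of sentences whose recognized text differs from the reference."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one sentence is required")
    errors = sum(
        1 for ref, hyp in zip(references, hypotheses)
        if normalize(ref) != normalize(hyp)
    )
    return errors / len(references)


# Hypothetical reference sentences and speech recognition outputs.
references = [
    "the call starts at nine",
    "please bring the report",
    "we meet near the station",
    "the file is on the server",
]
hypotheses = [
    "the call starts at nine",
    "please bring the reports",    # one wrong word makes the whole sentence an error
    "we meet near the station",
    "the file is on the surfer",   # homophone-style recognition error
]

ser = sentence_error_rate(references, hypotheses)
print(ser)
```

Because a single wrong word marks the whole sentence as an error, a sentence error rate above 0.6 means most recognized sentences are not exact copies of the original, which is why RoCC falls back on retrieval rather than decoding the transcript directly.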
Conclusion
The covert communication framework proposed in this study combines the advantages of provably secure steganography, generative machine learning methods, and cross-modal retrieval methods, making the covert communication process more stealthy and secure. We also implement the first method that uses semantic similarity to restore data lost to abnormal transmission. Experiments verify that the framework meets the performance requirements of real-time communication, with a real-time transmission rate of 73 to 136 bps.
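A toy simulation can show how an information recovery rate under packet loss might be estimated. This is only a sketch under strong assumptions: packets are modeled as fixed-size groups of words rather than audio frames, similarity is the character-level ratio from Python's `difflib` rather than a semantic model, and `drop_packets`, `best_match_index`, `recovery_rate`, and the sample database are hypothetical names and data. It does not reproduce the paper's experiment or its figures.

```python
import random
from difflib import SequenceMatcher


def drop_packets(words, loss_rate, packet_size, rng):
    """Group words into fixed-size packets and drop each packet with probability loss_rate."""
    kept = []
    for start in range(0, len(words), packet_size):
        if rng.random() >= loss_rate:
            kept.extend(words[start:start + packet_size])
    return kept


def best_match_index(received, database):
    """Index of the database text most similar to the received text."""
    scores = [SequenceMatcher(None, received, doc).ratio() for doc in database]
    return max(range(len(database)), key=scores.__getitem__)


def recovery_rate(database, loss_rate, packet_size=3, trials=200, seed=0):
    """Fraction of trials in which the lossy text still retrieves the original entry."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    hits = 0
    for _ in range(trials):
        target = rng.randrange(len(database))
        kept = drop_packets(database[target].split(), loss_rate, packet_size, rng)
        if best_match_index(" ".join(kept), database) == target:
            hits += 1
    return hits / trials


# Hypothetical database of published texts.
DATABASE = [
    "the weather in the mountains was calm and the hikers reached the summit before noon",
    "our team released a new version of the library with faster startup times",
    "the museum opened a gallery of modern sculpture near the old harbor",
]

for rate in (0.0, 0.1, 0.3):
    print(rate, recovery_rate(DATABASE, rate))
```

With no loss, every trial retrieves the original entry; as the loss rate grows, retrieval succeeds for as long as the surviving fragments remain closer to the right entry than to any other.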
Keywords: information hiding; covert communication; generative model; data cross-modal conversion; provably secure steganography; multimedia information retrieval; similarity analysis
Ao J Y, Wang R, Zhou L, Wang C Y, Ren S, Wu Y, Liu S J, Ko T, Li Q, Zhang Y, Wei Z H, Qian Y, Li J Y and Wei F R. 2022. SpeechT5: unified-modal encoder-decoder pre-training for spoken language processing [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/2110.07205.pdf
Barradas D, Santos N, Rodrigues L and Nunes V. 2020. Poking a hole in the wall: efficient censorship-resistant internet communications by parasitizing on WebRTC//Proceedings of 2020 ACM SIGSAC Conference on Computer and Communications Security. Virtual Event, USA: ACM: 35-48 [DOI: 10.1145/3372297.3417874]
Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I and Amodei D. 2020. Language models are few-shot learners//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 1877-1901
Chen K J, Zhou H, Zhao H Q, Chen D D, Zhang W M and Yu N H. 2022. Distribution-preserving steganography based on text-to-speech generative models. IEEE Transactions on Dependable and Secure Computing, 19(5): 3343-3356 [DOI: 10.1109/TDSC.2021.3095072]
Devlin J, Chang M W, Lee K and Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/1810.04805.pdf
Ding J Y, Chen K J, Wang Y F, Zhao N, Zhang W M and Yu N H. 2023. Discop: provably secure steganography in practice based on “Distribution Copies”//2023 IEEE Symposium on Security and Privacy (SP). San Francisco, USA: IEEE: 2238-2255 [DOI: 10.1109/SP46215.2023.10179287]
Figueira G, Barradas D and Santos N. 2022. Stegozoa: enhancing WebRTC covert channels with video steganography for internet censorship circumvention//Proceedings of 2022 ACM on Asia Conference on Computer and Communications Security. Nagasaki, Japan: ACM: 1154-1167 [DOI: 10.1145/3488932.3517419]
Gao Z F, Zhang S L, Lei M and McLoughlin I. 2020. Universal ASR: unifying streaming and non-streaming ASR using a single encoder-decoder model [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/2010.14099.pdf
Gao Z F, Zhang S L, McLoughlin I and Yan Z J. 2023. Paraformer: fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/2206.08317.pdf
Hopper N J, Langford J and Von Ahn L. 2002. Provably secure steganography//Proceedings of the 22nd Annual International Cryptology Conference. Santa Barbara, USA: Springer: 77-92 [DOI: 10.1007/3-540-45708-9_6]
Houmansadr A, Riedl T J, Borisov N and Singer A C. 2013. I want my voice to be heard: IP over Voice-over-IP for unobservable censorship circumvention//20th Annual Network and Distributed System Security Symposium. San Diego, USA: The Internet Society: 861-878
Kaptchuk G, Jois T M, Green M and Rubin A D. 2021. Meteor: cryptographically secure steganography for realistic distributions//Proceedings of 2021 ACM SIGSAC Conference on Computer and Communications Security. Virtual Event, Republic of Korea: ACM: 1529-1548 [DOI: 10.1145/3460120.3484550]
Kerckhoffs A. 1883. La cryptographie militaire. Journal des Sciences Militaires, IX: 5-38
Kohls K, Holz T, Kolossa D and Pöpper C. 2016. SkypeLine: robust hidden data transmission for VoIP//Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. Xi’an, China: ACM: 877-888 [DOI: 10.1145/2897845.2897913]
Lang R L, Xia Y, Zhi Y and Dai G Z. 2004. Analysis and evaluation of several typical steganalysis algorithms. Journal of Image and Graphics, 9(2): 249-256 (in Chinese) [DOI: 10.3969/j.issn.1006-8961.2004.02.023]
Li F H, Li C Y, Guo C, Li Z F, Fang L and Guo Y C. 2022. Survey on key technologies of covert channel in ubiquitous network environment. Journal on Communications, 43(4): 186-201 (in Chinese) [DOI: 10.11959/j.issn.1000-436x.2022072]
Li Y J and Liu B. 2007. A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6): 1091-1095 [DOI: 10.1109/TPAMI.2007.1078]
McPherson R, Houmansadr A and Shmatikov V. 2016. CovertCast: using live streaming to evade internet censorship. Proceedings on Privacy Enhancing Technologies, 2016(3): 212-225 [DOI: 10.1515/popets-2016-0024]
Peng J H, Jiang Y J, Tang S Y and Meziane F. 2021. Security of streaming media communications with logistic map and self-adaptive detection-based steganography. IEEE Transactions on Dependable and Secure Computing, 18(4): 1962-1973 [DOI: 10.1109/TDSC.2019.2946138]
Reimers N and Gurevych I. 2019. Sentence-BERT: sentence embeddings using siamese BERT-networks [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/1908.10084.pdf
Ren Y, Ruan Y J, Tan X, Qin T, Zhao S, Zhao Z and Liu T Y. 2019. FastSpeech: fast, robust and controllable text to speech//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 3171-3180
Rosen M B, Parker J and Malozemoff A J. 2021. Balboa: bobbing and weaving around network censorship//The 30th USENIX Security Symposium. Virtual Event: USENIX Association: 3399-3413
Saenger J, Mazurczyk W, Keller J and Caviglione L. 2020. VoIP network covert channels to enhance privacy and information sharing. Future Generation Computer Systems, 111: 96-106 [DOI: 10.1016/j.future.2020.04.032]
Salton G and Buckley C. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5): 513-523 [DOI: 10.1016/0306-4573(88)90021-0]
Tian J, Xiong G, Li Z and Gou G P. 2020. A survey of key technologies for constructing network covert channel. Security and Communication Networks, 2020: #8892896 [DOI: 10.1155/2020/8892896]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang Y X, Skerry-Ryan R J, Stanton D, Wu Y H, Weiss R J, Jaitly N, Yang Z H, Xiao Y, Chen Z F, Bengio S, Le Q, Agiomyrgiannakis Y, Clark R and Saurous R A. 2017. Tacotron: towards end-to-end speech synthesis [EB/OL]. [2023-07-25]. https://arxiv.org/pdf/1703.10135.pdf
Zhang W M, Wang H X, Li B, Ren Y Z, Yang Z L, Chen K J, Li W X, Zhang X P and Yu N H. 2022. Overview of steganography on multimedia. Journal of Image and Graphics, 27(6): 1918-1943 (in Chinese) [DOI: 10.11834/jig.211272]