Research on enterprise scientific and technological achievement management platform based on natural language processing
- Citation:
- HAN Guangming, CHE Jiannv, GUO Long, HAN Yulin, WANG Jipeng. Research on enterprise scientific and technological achievement management platform based on natural language processing[J]. Natural Gas and Oil, 2025, 43(1): 43-50. doi:10.3969/j.issn.1006-5539.2025.01.006
- DOI:
- 10.3969/j.issn.1006-5539.2025.01.006
- Authors:
- HAN Guangming1, CHE Jiannv1, GUO Long2,3, HAN Yulin1, WANG Jipeng1
- Affiliations:
- 1. CNOOC China Ltd., Hainan Branch, Haikou, Hainan 570100, China; 2. CNOOC South China Sea Oil & Gas Energy Academician Workstation, Haikou, Hainan 570100, China; 3. Key Laboratory of Deep Sea Deep Formation Energy Engineering of Hainan Province, Haikou, Hainan 570100, China
- Keywords:
- Natural Language Processing (NLP); Support Vector Machine (SVM); Convolutional Neural Network (CNN); word vectorization (Word2vec); Swift Object Storage Service (Swift); enterprise scientific and technological achievement management; Advanced Encryption Standard (AES) algorithm
- Abstract:
Enterprise scientific and technological achievements involve complex data and contain a large amount of sensitive information. Existing text classification results do not meet practical confidentiality management needs, which creates risks of data leakage and unauthorized access. To address this, an enterprise scientific and technological achievement management platform based on Natural Language Processing (NLP) is designed to solve the classic problem that keyword retrieval cannot accurately classify confidential documents. The platform uses a Convolutional Neural Network (CNN) to extract text features automatically and a Support Vector Machine (SVM) as the final classifier, forming a CNN-SVM model. Convolution is performed with multiple kernels of different sizes, a fully connected layer receives and processes the output of the attention layer, and the SVM classifier then classifies the scientific and technological achievement texts. The attachment management module deploys the Swift Object Storage Service (Swift), and the Advanced Encryption Standard (AES) algorithm encrypts achievement text data during transmission and storage, completing the platform design. To verify the effectiveness of the designed platform, comparative experiments were conducted against System A and System B. Under data theft attacks of varying frequencies, no more than 1 MB of achievement data was stolen, retrieval consistency exceeded 90%, and the recall of semantic confidentiality checks after document classification reached up to 97%.
These results indicate that the designed platform achieves good automatic document classification and can help protect enterprise intellectual property. By combining NLP techniques with advanced encryption, the platform effectively improves the confidentiality management of scientific and technological achievement documents, largely prevents data leakage and unauthorized access, and keeps document classification accurate and efficient.
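The CNN-SVM pipeline summarized in the abstract can be illustrated with a minimal pure-Python sketch. It assumes each document is already a list of word vectors (for example from Word2vec), uses hypothetical kernel widths of 2, 3 and 4 with random, untrained filters, applies ReLU and max-over-time pooling to turn a variable-length text into a fixed-length feature vector, and trains a linear SVM on the hinge loss with a Pegasos-style subgradient update. The attention layer, the fully connected layer and CNN training are omitted, and every function name is hypothetical; this illustrates the general technique, not the authors' implementation.

```python
import random

# Hypothetical sketch: labels are +1 (confidential) and -1 (non-confidential).


def make_filters(widths, n_per_width, dim, seed=0):
    """Random, untrained filters: {width: [filter, ...]}, each filter a width x dim matrix."""
    rng = random.Random(seed)
    return {
        k: [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(n_per_width)]
        for k in widths
    }


def conv1d_maxpool(embeddings, filt):
    """Slide one filter over the token sequence, apply ReLU, keep the largest response."""
    k = len(filt)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        s = sum(w * x for f_row, e_row in zip(filt, window)
                for w, x in zip(f_row, e_row))
        best = max(best, max(0.0, s))  # ReLU, then max-over-time pooling
    return best


def cnn_features(embeddings, filters):
    """Concatenate pooled outputs of all filters, grouped by kernel width."""
    feats = []
    for k in sorted(filters):
        feats.extend(conv1d_maxpool(embeddings, f) for f in filters[k])
    return feats


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.
    A constant 1.0 is appended to each sample so the last weight acts as a
    bias (a simplification: the bias is regularized too)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1.0:  # hinge loss is active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    s = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if s >= 0.0 else -1


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (confidential) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    filters = make_filters(widths=[2, 3, 4], n_per_width=4, dim=8)
    rng = random.Random(1)
    doc = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]  # 12 toy word vectors
    print(len(cnn_features(doc, filters)))  # prints 12 (3 widths x 4 filters)
```

The feature length depends only on the number of filters, not on document length, which is what lets a fixed-input classifier such as an SVM sit on top of the convolutional layers. In a typical CNN-SVM setup the CNN is trained first and its penultimate-layer features are then passed to the SVM; the random filters here only show the data flow. The `recall` helper is the standard definition of the metric quoted for the semantic confidentiality checks.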