Research on enterprise scientific and technological achievement management platform based on natural language processing
Authors: HAN Guangming1, CHE Jiannv1, GUO Long2,3, HAN Yulin1, WANG Jipeng1
Affiliations: 1. CNOOC China Ltd., Hainan Branch, Haikou, Hainan, 570100, China; 2. CNOOC South China Sea Oil & Gas Energy Academician Workstation, Haikou, Hainan, 570100, China; 3. Key Laboratory of Deep Sea Deep Formation Energy Engineering of Hainan Province, Haikou, Hainan, 570100, China
Key Words: Natural Language Processing (NLP); Support Vector Machine (SVM); Convolutional Neural Network (CNN); Word vectorization (Word2vec); Swift Object Storage Service (Swift); Enterprise scientific and technological achievement management; Advanced Encryption Standard (AES) algorithm
Abstract:
Enterprise scientific and technological achievements contain complex data and
a large amount of sensitive information. Existing text classification methods
cannot meet practical confidentiality management needs, which creates risks of
data leakage and unauthorized access. To address this, an enterprise scientific
and technological achievement management platform based on natural language
processing (NLP) is designed, targeting the inability of classic keyword
retrieval to classify confidential documents accurately. The platform uses a
Convolutional Neural Network (CNN) to extract text features automatically and a
Support Vector Machine (SVM) as the final classifier, forming a CNN-SVM model.
The model applies multiple convolution kernels of different sizes in its
convolution operations, uses fully connected layers to receive and process the
output of the attention layers, and applies the SVM classifier to classify
scientific and technological achievement texts. The attachment management
module is deployed on the Swift Object Storage Service (Swift). Scientific and
technological achievement text data is encrypted during transmission and
storage with the Advanced Encryption Standard (AES) algorithm, which completes
the design of the platform. To verify the effectiveness of the designed
platform, a comparative experiment was conducted against System A and System B.
The experiments show that under data theft attacks
of varying frequencies, the amount of stolen scientific and technological
achievement data does not exceed 1 MB, retrieval consistency exceeds 90%, and
the recall rate of semantic confidentiality inspection after document
classification reaches up to 97%. These results indicate that the platform
classifies documents automatically with good effect and can help protect
enterprise intellectual property. By combining NLP with AES encryption, the
platform improves the confidentiality management of scientific and
technological achievement documents: it largely prevents data leakage and
unauthorized access while maintaining the accuracy and effectiveness of
document classification.
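
The CNN-SVM idea summarized in the abstract (convolution kernels of several widths extract features, and an SVM makes the final decision) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the two-dimensional token embeddings `S` and `P`, the hand-set kernels, the toy documents, and the hyperparameters are all hypothetical, the attention and fully connected layers are omitted, and the linear SVM is trained with a simple hinge-loss sub-gradient loop chosen only for brevity.

```python
# Toy sketch: multi-width convolution + max pooling, followed by a linear SVM.
# All embeddings, kernels, documents, and hyperparameters are assumptions.

S = [1.0, 0.0]  # hypothetical embedding of a "sensitive" token
P = [0.0, 1.0]  # hypothetical embedding of a "public" token

# Two hand-set kernels of different widths (rows = positions, cols = embedding dims).
KERNELS = [
    [[1.0, -1.0]] * 2,   # width 2: responds to runs of sensitive tokens
    [[-1.0, 1.0]] * 3,   # width 3: responds to runs of public tokens
]

def conv_max_pool(tokens, kernel):
    """Slide one kernel over the token sequence, apply ReLU, max-pool over time."""
    width = len(kernel)
    best = 0.0  # ReLU floor: pooled activation is never negative
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        act = sum(w * x
                  for k_row, t_row in zip(kernel, window)
                  for w, x in zip(k_row, t_row))
        best = max(best, act)
    return best

def extract_features(tokens):
    """One pooled feature per kernel, concatenated into a feature vector."""
    return [conv_max_pool(tokens, k) for k in KERNELS]

def train_linear_svm(samples, labels, epochs=200, lr=0.05, reg=0.01):
    """Hinge-loss sub-gradient descent for a linear SVM (labels are +1 / -1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # margin violated: step along the hinge sub-gradient
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # margin satisfied: only apply L2 shrinkage
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy corpus: +1 = confidential, -1 = non-confidential.
docs = [[S, S, S, P], [P, P, P, S], [S, S, P, S], [P, S, P, P]]
labels = [1, -1, 1, -1]

features = [extract_features(d) for d in docs]
w, b = train_linear_svm(features, labels)
predictions = [predict(x, w, b) for x in features]
print(features)
print(predictions)
```

In a realistic setting the kernels would be learned jointly with the embeddings rather than set by hand, and the SVM would be trained on features from held-out labeled documents; the sketch only shows how pooled convolution outputs become the input vector of the final classifier.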