K-Sign Consortium completes construction of 'Cyber security AI training dataset'
K-Sign (CEO Choi Seung-rak) announced on the 11th that, together with Sands Lab and ESTsecurity, it has completed building a cybersecurity AI training dataset of 400 million items.
COVID-19 is rapidly changing the security threat landscape as companies and organizations shift to remote, non-face-to-face work. With cyberattacks growing more sophisticated and more frequent, security has become a more important issue than ever.
Last year, the Korea Internet & Security Agency (KISA) carried out the 'Cyber Security AI Dataset Construction Project' as part of the Ministry of Science and ICT's K-Cyber Defense Promotion Strategy. The project aims to create a virtuous cycle for building cybersecurity AI datasets through cooperation between public and private experts in cyber breach response. It also aims to lay the groundwork for preemptively responding to rapidly increasing new and variant security threats by making domestic security technologies intelligent.
In the project, the K-Sign consortium classified more than 300 million benign and malicious files into more than 300 malware families. It then built a ready-to-use cybersecurity AI dataset for the malware domain. The consortium drew on specialized know-how in two areas: extracting AI features from malware, and cloud-based methods for transferring petabyte-scale datasets. It drew industry attention by producing an optimized dataset and setting up a system for transferring and verifying it.
The K-Sign consortium extracted 300 million representative malware samples from the 2 billion malware analysis records on Malwares.com, which is operated by Sands Lab. It then classified these samples into 300 families based on their characteristic features.
The dataset contains raw data together with a total of 150 types of metadata. In addition, 100 million malware analysis records were produced through advanced correlation analysis of malware attributes such as attack groups, attack techniques, and distribution methods. The technology was recognized for its excellence on three counts:
- It generated, as attributes, in-depth information that static and dynamic analysis alone cannot yield.
- It performed similarity analysis on the samples.
- It built the dataset from the resulting clusters.
The dataset was also validated with a variety of AI models from multiple organizations. To improve its quality, the consortium had it verified by malware specialists and by ten experts in each field.
Shin Dae-kyun, the K-Sign project manager in charge of this project, said: “Small and medium-sized businesses find it difficult to extract malware-related metadata because they lack the know-how and resources. It is meaningful that we have successfully built a core malware dataset that can be used across the industry.”
After the project ends, K-Sign plans to keep using the dataset for research and development of core AI technologies. It will also support the dataset's use as an important base dataset for responding to cybersecurity threats.
Meanwhile, the 'Cyber Security AI Training Dataset' will be opened to the private sector through KISA's Cyber Security Big Data Center. It includes various metadata extracted from malware, such as image and n-gram data, that even non-experts can use to build and test AI models. It also includes up-to-date data aligned with global trends, such as MITRE ATT&CK technique ID (T-ID) mapping.