Building an AI training dataset optimized for cyber security.

sandslab 2021.09.26


The 2021 artificial intelligence information security conference 'AIS 2021', hosted by Daily Secu, was held successfully online on September 16th. The event drew 900 attendees, and 1,400 people had pre-registered.

At the event, Sands Lab CEO Kim Ki-hong gave a lecture on 'Efforts to create high-quality AI learning datasets for intrusion response and malicious code detection'.

Within cyber security, artificial intelligence is being adopted most actively in intrusion detection and malicious code detection. Kim shared the AI technologies most commonly used in these two areas and described how Sands Lab is working to create the high-quality datasets those technologies rely on.

CEO Kim Ki-hong said, “Datasets are an essential element in developing artificial intelligence, and they are built on training data. The more data there is, the higher the accuracy and performance, and the more diverse the feature information, the higher the dimensionality. Securing freedom in development on the basis of a rich dataset is more important than anything else.”

He continued, “Dataset construction in the intrusion incident field aims to build original attack data from real environments, rather than from a simple simulation environment. […] is about 150 or more, and the labeling aims to provide normal/abnormal labels, the actual attack methods, and raw data.”

SandsLab announced that it would prepare the incident dataset built in this way for use in various fields, such as intelligent network threat analysis models for national agencies and response models for telecommunication companies facing threats against large numbers of devices.

CEO Kim Ki-hong said, “Domestic cyber security AI research is still highly dependent on foreign datasets and is weak at detecting actual domestic incidents and attacks. To improve this, we aim to build a dataset optimized for the domestic environment and to export it overseas as well as use it domestically.”

Sands Lab team leader Shin Dae-gyun, who also gave a lecture, explained dataset construction in the malicious code field. “The biggest cause of AI project failure is the problem of securing training data,” he said. “Securing data is the most important step, but most research is conducted either by using overseas datasets or by collecting malicious code and creating metadata with each team's own method.”

For high-quality datasets, SandsLab is preparing raw data covering executable files, documents, compressed files, and scripts for platforms such as Windows, Linux, Android, and iOS.

The company also intends to classify malware types according to clear criteria, so that the datasets and their type labels are internationally compatible.

The dataset provides about 250 metadata items in total, and the company is preparing to organize the final training dataset into various feature sets.
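As a rough illustration of what organizing metadata items into feature sets could look like, here is a minimal Python sketch. The item names (`file_size`, `entropy`, `section_count`, `import_count`), the feature set names, and the sample values are all hypothetical and are not taken from the SandsLab dataset; they only show how named groups of metadata items might be selected to form a training row.

```python
from typing import Dict, List

# Hypothetical feature sets: each name groups a few metadata items.
# None of these names come from the actual dataset.
FEATURE_SETS: Dict[str, List[str]] = {
    "static_basic": ["file_size", "entropy"],
    "pe_structure": ["section_count", "import_count"],
}


def build_row(sample: Dict[str, float], set_names: List[str]) -> List[float]:
    """Concatenate the chosen feature sets into one numeric row.

    Metadata items missing from the sample are filled with 0.0.
    """
    row: List[float] = []
    for name in set_names:
        for key in FEATURE_SETS[name]:
            row.append(float(sample.get(key, 0.0)))
    return row


# Invented metadata for a single sample, for illustration only.
sample = {"file_size": 20480, "entropy": 7.2, "section_count": 5, "import_count": 42}

print(build_row(sample, ["static_basic"]))
print(build_row(sample, ["static_basic", "pe_structure"]))
```

Keeping the feature sets as named groups lets the same metadata be used to train different models without changing the underlying records.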

In addition, SandsLab plans to apply to malicious code its own technology that matches function-level opcode sets to MITRE ATT&CK technique IDs (T-IDs), learn detection information for each attack technique, and provide metadata that can be used to detect new malicious code.
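The article does not describe how the matching works, so the following Python sketch is only one possible way to picture it, not SandsLab's method. It assumes each technique has a signature set of opcodes and tags a function with a technique ID when enough of that signature appears in the function. T1027 and T1055 are real ATT&CK technique IDs, but the opcode sets, the `match_techniques` helper, and the 0.75 threshold are invented for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical signatures. The technique IDs exist in MITRE ATT&CK,
# but the opcode sets are made up and carry no real detection value.
SIGNATURES: Dict[str, FrozenSet[str]] = {
    "T1027": frozenset({"xor", "loop", "inc"}),           # Obfuscated Files or Information
    "T1055": frozenset({"push", "call", "mov", "test"}),  # Process Injection
}


def match_techniques(opcodes: List[str], threshold: float = 0.75) -> List[str]:
    """Return technique IDs whose signature is covered at or above the threshold.

    Coverage is the fraction of a signature's opcodes found in the function.
    """
    seen = set(opcodes)
    hits = []
    for tid, signature in SIGNATURES.items():
        coverage = len(signature & seen) / len(signature)
        if coverage >= threshold:
            hits.append(tid)
    return sorted(hits)


# Opcodes from one disassembled function (invented example).
function_opcodes = ["mov", "xor", "inc", "loop", "ret"]
print(match_techniques(function_opcodes))  # prints ['T1027']
```

A production system would likely use opcode order or learned models rather than plain set overlap; the set-based check is used here only because it is easy to follow.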
