RU2745371C1

RU2745371C1 - Method and a system for prediction of cyber security risks during the development of software products

Info

Publication number: RU2745371C1
Application number: RU2020131501A
Authority: RU
Inventors: Дмитрий Сергеевич Кудияров; Виталий Оттович Биферт; Елена Анатольевна Демьянова; Геннадий Геннадьевич Глотов; Александр Артурович Анистратенко
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2020-09-24
Filing date: 2020-09-24
Publication date: 2021-03-24
Also published as: EA202092897A1; WO2022066038A1

Abstract

FIELD: computing. SUBSTANCE: the invention relates to computing. Disclosed is a computer-implemented method for predicting cybersecurity risks in the development of software products, in which: data is obtained containing information about software development teams and the software products being developed; the obtained data is processed using a machine learning model trained on expert cybersecurity data, during which the following is carried out: division of the obtained data into categorical and numerical variables; processing of the resulting variables, with vectorization of the categorical variables and normalization of the numerical variables; concatenation of the processed variables and construction of a vector from them; assessment, using the vector, of the degree of occurrence of cybersecurity risks for each software product; and classification of the development teams, with assignment of a degree of probability of cybersecurity risk occurrence based on the assessment of the software products being developed. EFFECT: the invention improves the speed and accuracy of predicting cybersecurity risks and of classifying agile teams according to the degree to which they fulfill cybersecurity requirements. 7 cl, 4 dwg

Description

FIELD OF TECHNOLOGY

[0001] The claimed technical solution relates generally to the field of computing, and in particular to an automated method and system for predicting cybersecurity risks in the development of software products using machine learning algorithms.

PRIOR ART

[0002] Developing software for large financial institutions (banks) is always laborious and painstaking work. In addition, when developing a software product, all risks of cybersecurity defects must be taken into account. Cybersecurity experts are brought in for these checks and manually check the software product under development for critical defects. The number of cybersecurity experts needed to work with development teams (for example, Agile teams) that build banking products depends linearly on the number of such teams. With a limited headcount of cybersecurity experts and a growing number of Agile teams, full, high-quality participation of cybersecurity experts in the work of every Agile team becomes impossible.

[0003] The work of Agile teams that are experienced in cybersecurity is slowed down by the involvement of cybersecurity experts, even though such involvement is not needed. At the same time, a cybersecurity expert has no way to make a reasoned assessment of whether a team is experienced, which forces the expert to take part in the team's work and makes it harder for cybersecurity experts to substitute for one another.

[0004] The work of Agile teams that are inexperienced in cybersecurity requires closer involvement of cybersecurity experts, because inattention to cybersecurity in the early stages of product development leads either to cybersecurity corrections to the program code at late stages or to the cybersecurity expert blocking the release into production at acceptance testing. This increases software development time, raises cybersecurity risks (because the cybersecurity functions are not organic to the application and do not match its architecture), demotivates Agile team members, and causes reputational losses for the experts and for the cybersecurity function as a whole.

[0005] Patent US 8631384 B2, "Creating a test progression plan" (patentee: IBM; published: 01.12.2011), is known from the prior art. It describes an automated process for drawing up software test plans. The known solution automatically creates a software test progression plan by calculating, for each unit x of the testing period, the effort to execute test units (ATTx) and the effort to complete test unit execution (CCx). The calculation introduces three variables that characterize the testing strategy: effectiveness, which represents the effectiveness of the test team; the defect density rate; and the verification rate. By choosing a testing strategy, the test manager sets the values of these three variables, which affect the progression plan. During test execution, the cumulative "attempt" curve of ATTx values and the cumulative "completion" curve of CCx values allow the test manager to compare the effort already spent with the effort expected for test units that have been attempted and for test units that have been completed, that is, for which the defects found in the code have been fixed.

[0006] A disadvantage of this known solution is that it cannot automatically predict cybersecurity risk or classify Agile teams according to the degree to which they fulfill cybersecurity requirements when developing software products.

DISCLOSURE OF THE INVENTION

[0007] The claimed technical solution proposes a new approach to predicting cybersecurity risks and classifying agile teams according to the degree to which they fulfill cybersecurity requirements when developing software products. The solution uses a machine learning algorithm that automates both the prediction of cybersecurity risks and this classification of agile teams.

[0008] The technical problem thereby solved is the automated prediction of cybersecurity risks and classification of agile teams according to the degree to which they fulfill cybersecurity requirements.

[0009] The technical result achieved in solving this problem is an increase in the speed and accuracy of predicting cybersecurity risks and of classifying agile teams according to the degree to which they fulfill cybersecurity requirements.

[0010] This technical result is achieved by a computer-implemented method for predicting cybersecurity risks in the development of software products, performed by at least one processor and comprising the steps of:

- obtaining data containing information at least about software development teams and the software products being developed by each of those teams;

- processing the obtained data using a machine learning (ML) model trained on expert cybersecurity data about teams' compliance with cybersecurity requirements when developing software products, wherein the processing comprises:

- dividing the obtained data into categorical and numerical variables;

- processing the resulting variables, in which the categorical variables are vectorized and the numerical variables are normalized;

- concatenating the processed variables and constructing a vector from them;

- assessing, using said vector, the degree of occurrence of cybersecurity risks for each software product; and

- classifying the development teams, assigning each a degree of probability of cybersecurity risk occurrence based on the assessment of the software products being developed.
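
The preprocessing steps listed above (splitting, vectorization, normalization, concatenation) can be illustrated with a minimal sketch. The patent does not specify the encodings, so one-hot encoding and min-max scaling are assumptions here, as are the field names `team_type`, `bug_count`, and `release_count` and every function name.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[str, int, float]


def split_variables(record: Dict[str, Value]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Divide one record into categorical (string) and numerical variables."""
    categorical = {k: v for k, v in record.items() if isinstance(v, str)}
    numerical = {k: float(v) for k, v in record.items() if not isinstance(v, str)}
    return categorical, numerical


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Vectorize a categorical value as a one-hot vector (assumed encoding)."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def min_max(value: float, low: float, high: float) -> float:
    """Normalize a number into [0, 1] (assumed scaling); constant columns map to 0."""
    return 0.0 if high == low else (value - low) / (high - low)


def build_vectors(records: List[Dict[str, Value]]) -> List[List[float]]:
    """Split, vectorize, normalize, and concatenate into one vector per record."""
    split = [split_variables(r) for r in records]
    cat_keys = sorted(split[0][0])
    num_keys = sorted(split[0][1])
    # Vocabulary and value range are derived from the data itself.
    vocab = {k: sorted({cat[k] for cat, _ in split}) for k in cat_keys}
    bounds = {
        k: (min(num[k] for _, num in split), max(num[k] for _, num in split))
        for k in num_keys
    }
    vectors = []
    for cat, num in split:
        vec: List[float] = []
        for k in cat_keys:
            vec.extend(one_hot(cat[k], vocab[k]))  # categorical part
        for k in num_keys:
            vec.append(min_max(num[k], *bounds[k]))  # numerical part
        vectors.append(vec)
    return vectors


# Hypothetical input records, one per team.
records = [
    {"team_type": "backend", "bug_count": 10, "release_count": 2},
    {"team_type": "frontend", "bug_count": 30, "release_count": 6},
]
vectors = build_vectors(records)
```

Each resulting vector places the one-hot categorical block first and the scaled numerical values after it; the order is arbitrary but must be the same for every record fed to the model.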

[0011] In one particular embodiment of the method, the obtained data is processed using a machine learning model based on a random forest classifier.
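
To make the idea of a random forest concrete, here is a deliberately simplified stand-in written in plain Python: each "tree" is a one-level decision stump trained on a bootstrap sample and restricted to one randomly chosen feature, and the ensemble predicts by majority vote. This is not the model used in the patent; a real random forest grows deeper trees and samples features at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, str, str]


def majority(labels: Sequence[str]) -> str:
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: List[List[float]], y: List[str], feature: int) -> Stump:
    """Pick the midpoint threshold on one feature that misclassifies fewest rows."""
    values = sorted({row[feature] for row in X})
    fallback = majority(y)
    best_errors, best = len(y) + 1, (feature, 0.0, fallback, fallback)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if errors < best_errors:
            best_errors, best = errors, (feature, threshold, left_label, right_label)
    return best


def fit_forest(X: List[List[float]], y: List[str],
               n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Train stumps on bootstrap samples, each on a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> str:
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Toy, well-separated data: two normalized features per team (hypothetical).
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y = ["low", "low", "high", "high"]
forest = fit_forest(X, y)
```

In practice a library implementation would be the natural choice; the sketch only shows why bootstrapping plus voting makes the ensemble less sensitive to any single unlucky sample.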

[0012] In another particular embodiment of the method, teams are classified by assigning a high, medium, or low degree of probability of cybersecurity risk occurrence.

[0013] In another particular embodiment of the method, information on development teams classified with a medium or high degree is automatically sent, flagged for increased control, to the automated workstations of the cybersecurity experts who interact with those development teams.
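
A minimal sketch of the three-level classification and the routing rule follows. The patent does not say how a level is derived from the model output, so the probability thresholds (0.33 and 0.66), the notice format, and all names below are assumptions.

```python
from typing import Dict, List


def risk_level(probability: float,
               medium_from: float = 0.33,
               high_from: float = 0.66) -> str:
    """Map a risk probability to a level; the thresholds are illustrative only."""
    if probability >= high_from:
        return "high"
    if probability >= medium_from:
        return "medium"
    return "low"


def teams_for_increased_control(team_probabilities: Dict[str, float]) -> List[dict]:
    """Build notices for teams rated medium or high, flagged for increased control."""
    notices = []
    for team, probability in sorted(team_probabilities.items()):
        level = risk_level(probability)
        if level in ("medium", "high"):
            notices.append({"team": team, "level": level, "increased_control": True})
    return notices


# Hypothetical model outputs per team.
notices = teams_for_increased_control({"team-a": 0.1, "team-b": 0.5, "team-c": 0.9})
```

Teams rated low produce no notice, which matches the stated goal of not spending expert time on teams that already meet the requirements.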

[0014] In another particular embodiment of the method, the data on the teams being classified contains at least:

i. data on the tasks of team members during software product development, taken from the task management system for software development;

ii. data on the structure of the team and the professional qualities of its members;

iii. data on communications between team members and cybersecurity experts during software product development over the entire lifetime of the team;

iv. data on the source code of software products released by the team.

[0015] In another particular embodiment of the method, the data on the software products being developed contains at least:

v. data on the number of critical cybersecurity defects found during acceptance testing of the team's software products released into production over the entire lifetime of the team;

vi. data on the number of critical non-cybersecurity defects found during acceptance testing of the team's products released into production over the entire lifetime of the team;

vii. data on the testing of the team's finished software product before its release into production;

viii. data on the static and dynamic analysis checks for vulnerabilities that the team's finished software products underwent before their release into production;

ix. data on vulnerabilities in the team's software products discovered after their release into production.

[0016] In addition, the claimed technical result is achieved by a system for predicting cybersecurity risks in the development of software products, comprising:

- at least one processor;

- at least one memory coupled to the processor and containing machine-readable instructions which, when executed by the at least one processor, carry out the method for assessing the likelihood of critical cybersecurity defects arising during acceptance testing of product releases.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] The features and advantages of the present technical solution will become apparent from the following detailed description and the accompanying drawings.

[0018] FIG. 1 illustrates a flowchart of the claimed method.

[0019] FIG. 2 illustrates the ROC curve (error curve) for the team classifier based on a random forest.

[0020] FIG. 3 illustrates the confusion matrix (without normalization) for the team classifier based on a random forest.

[0021] FIG. 4 illustrates an example of the general layout of a computing system that implements the claimed solution.

IMPLEMENTATION OF THE INVENTION

[0022] The concepts and terms needed to understand this technical solution are described below.

[0023] A model in machine learning (ML) is a set of artificial intelligence methods whose characteristic feature is that they do not solve a problem directly but learn by applying solutions to many similar problems.

[0024] The F-1 measure is a joint measure of precision and recall.
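
As a short illustration of this definition, the F-1 measure is the harmonic mean of precision and recall, both of which can be computed from counts of true positives, false positives, and false negatives. The function names and the example counts are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
score = f1_score(tp=8, fp=2, fn=2)
```

Because it is a harmonic mean, the F-1 measure stays low unless precision and recall are both reasonably high.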

[0025] A ROC curve is a graphical characteristic of the quality of a binary classifier; it shows how the true positive rate depends on the false positive rate as the threshold of the decision rule is varied.
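
The construction can be sketched as follows: for each candidate threshold, every object whose score is at or above the threshold is treated as positive, and the resulting false positive rate and true positive rate give one point on the curve. The scores and labels are made-up example data, and the sketch assumes that both classes are present.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, strictest threshold first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier scores and true labels (1 = positive class).
points = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

The curve always runs from (0, 0) to (1, 1); the closer it bends toward the top-left corner, the better the classifier separates the classes.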

[0026] A confusion matrix (error matrix) is a way of dividing the classified objects into four categories depending on the combination of the actual class and the classifier's response.
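
For the binary case described here, the four categories are true positives, false positives, false negatives, and true negatives. A minimal sketch with made-up labels:

```python
from typing import Dict, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count the four combinations of actual class and classifier response."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1  # actual positive, predicted positive
        elif not a and p:
            counts["FP"] += 1  # actual negative, predicted positive
        elif a and not p:
            counts["FN"] += 1  # actual positive, predicted negative
        else:
            counts["TN"] += 1  # actual negative, predicted negative
    return counts


# Hypothetical actual classes and classifier responses (1 = positive).
matrix = confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With three classes (high, medium, low), as in the team classifier, the same idea yields a 3x3 table of actual versus predicted class.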

[0027] Connectors are software components that collect data from information sources (task management system, system for collaborative work on releases, version control system, project management system, enterprise service management system, etc.) and bring the data into the required structure and format.
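
One way to picture a connector is as an adapter that reads source-specific rows and emits records in a single common shape. The sketch below is hypothetical throughout: the `Record` fields, the `Connector` interface, the raw column names (`Key`, `Team`, `Type`), and the in-memory list standing in for the storage are all assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Protocol


@dataclass(frozen=True)
class Record:
    """Common structure every connector produces (assumed layout)."""
    source: str
    team: str
    kind: str
    payload: Dict[str, str]


class Connector(Protocol):
    """Anything that can collect records from one information source."""
    def collect(self) -> Iterable[Record]: ...


class TaskTrackerConnector:
    """Hypothetical connector for rows exported from a task management system."""

    def __init__(self, raw_rows: List[Dict[str, str]]) -> None:
        self._rows = raw_rows

    def collect(self) -> Iterator[Record]:
        for row in self._rows:
            yield Record(
                source="task_tracker",
                team=row["Team"].strip().lower(),  # normalize team names
                kind=row["Type"].strip().lower(),  # normalize ticket types
                payload={"id": row["Key"]},
            )


def load_into_storage(connectors: Iterable[Connector], storage: List[Record]) -> List[Record]:
    """Run every connector and append its records to the storage."""
    for connector in connectors:
        storage.extend(connector.collect())
    return storage


# Hypothetical raw export with inconsistent spacing and casing.
raw = [{"Key": "T-1", "Team": " Team-A ", "Type": "Bug"}]
storage = load_into_storage([TaskTrackerConnector(raw)], [])
```

Keeping the normalization inside each connector means that downstream feature extraction only ever sees one record format, whichever source the data came from.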

[0028] Storage is a system for storing large volumes of data collected and processed by the connectors, as well as data generated by other components of the system.

[0029] This technical solution can be implemented on a computer, in the form of an automated information system (AIS) or a machine-readable medium containing instructions for performing the above method.

[0030] The technical solution can be implemented as a distributed computer system.

[0031] In this solution, a system means a computer system, a computer, a CNC (computer numerical control) system, a PLC (programmable logic controller), a computerized control system, or any other device capable of performing a given, well-defined sequence of computing operations (actions, instructions).

[0032] A command processing device means an electronic unit or integrated circuit (microprocessor) that executes machine instructions (programs).

[0033] The command processing device reads and executes machine instructions (programs) from one or more data storage devices, such as random access memory (RAM) and/or read-only memory (ROM). The ROM can be, but is not limited to, hard disk drives (HDD), flash memory, solid-state drives (SSD), optical storage media (CD, DVD, BD, MD, etc.), and so on.

[0034] A program is a sequence of instructions intended for execution by the control device of a computer or by a command processing device.

[0035] The ML model is trained on pre-labeled data. At the time the machine learning model was created, a total of 142 development teams active as of 01.06.2019 were available. To assess the quality of the model, the dataset was split into two parts: a training sample (92 teams) and a control sample (50 teams). The teams in the training sample were classified by surveying the cybersecurity experts who work (or had worked) with those teams, and each team was classified more than once, by different experts.

[0036] Where the experts' opinions differed, the opinion of a simple majority of experts was taken. Where the numbers of experts holding opposing opinions were equal, the team was classified according to the opinion of the expert working with the team at the time of the survey.
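
The labeling rule in this paragraph can be sketched as a vote with a tie-break. The sketch takes the most frequent opinion (a plurality, which coincides with a simple majority when only two opinions compete) and falls back to the current expert on a tie; the expert identifiers and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict


def consensus_label(opinions: Dict[str, str], current_expert: str) -> str:
    """Pick the most common expert opinion; on a tie, trust the current expert.

    `opinions` maps an expert identifier to that expert's label for the team;
    `current_expert` is the expert working with the team when surveyed.
    """
    ranked = Counter(opinions.values()).most_common()
    top_count = ranked[0][1]
    leaders = [label for label, count in ranked if count == top_count]
    if len(leaders) == 1:
        return leaders[0]
    return opinions[current_expert]


# Two of three experts say "high": the majority wins.
majority_case = consensus_label({"e1": "high", "e2": "low", "e3": "high"}, "e2")
# One vote each: the current expert's opinion decides.
tie_case = consensus_label({"e1": "high", "e2": "low"}, "e2")
```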

[0037] The weighted F-1 measure for the classifier is about 0.62, and the accuracy is about 0.63.
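
For a multi-class classifier, a weighted F-1 measure is commonly computed as the average of the per-class F-1 values, weighted by how many objects actually belong to each class. The sketch below assumes that definition; the labels are made-up and it does not reproduce the figures reported above.

```python
from typing import Sequence


def weighted_f1(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Average of per-class F-1 values, weighted by each class's share of `actual`."""
    total = 0.0
    for label in sorted(set(actual)):
        tp = sum(a == label and p == label for a, p in zip(actual, predicted))
        fp = sum(a != label and p == label for a, p in zip(actual, predicted))
        fn = sum(a == label and p != label for a, p in zip(actual, predicted))
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        total += f1 * (tp + fn)  # tp + fn is the number of objects in this class
    return total / len(actual)


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of objects whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels for six teams.
actual = ["high", "high", "low", "low", "medium", "medium"]
predicted = ["high", "low", "low", "low", "medium", "high"]
f1 = weighted_f1(actual, predicted)
acc = accuracy(actual, predicted)
```

Weighting by class size keeps a rare class from dominating the score, which matters when the three risk levels are unevenly represented among teams.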

[0038] FIG. 2 shows the ROC curve (error curve) for the team classifier based on a random forest.

[0039] FIG. 3 shows the confusion matrix (without normalization) for the team classifier based on a random forest.

[0040] The connectors obtain the necessary information from the sources (by downloading files, querying databases and APIs, parsing web pages, reading event logs, etc.) and save it to the storage.

[0041] The connectors extract from the loaded data the parameters that are significant for further calculations, preprocess them, and build in the storage a table of the values of these parameters (or features) per team, with the following columns:

Number of tickets of type Bug for the team;

Number of tickets of type Feature for the team;

Number of tickets with minor priority for the team;

Number of tickets with major priority for the team;

Number of tickets with critical priority for the team;

Number of communications between team members and the cybersecurity expert;

Average time from creation of a ticket of type Release until it is marked as resolved;

Average time from creation of a ticket of type Feature until it is marked as resolved;

Average time from creation of a ticket of type Bug until it is marked as resolved;

Number of releases shipped by the team.
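
Most of these columns are simple aggregates over a team's tickets. The following sketch computes them from a list of ticket dictionaries; the ticket layout (`team`, `type`, `priority`, `created`, `resolved`), the column names, the use of hours as the time unit, and the choice to count Release tickets as releases are all assumptions. The communications column is omitted because it comes from other sources.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List


def team_features(tickets: List[dict]) -> Dict[str, dict]:
    """Aggregate per-team counts and average resolution times (in hours)."""
    by_team = defaultdict(list)
    for ticket in tickets:
        by_team[ticket["team"]].append(ticket)

    table = {}
    for team, items in by_team.items():
        row = {}
        for kind in ("Bug", "Feature"):
            row[f"{kind.lower()}_count"] = sum(t["type"] == kind for t in items)
        for priority in ("minor", "major", "critical"):
            row[f"{priority}_count"] = sum(t["priority"] == priority for t in items)
        for kind in ("Release", "Feature", "Bug"):
            hours = [
                (t["resolved"] - t["created"]).total_seconds() / 3600
                for t in items
                if t["type"] == kind and t["resolved"] is not None
            ]
            # None marks "no resolved tickets of this type yet".
            row[f"avg_hours_{kind.lower()}"] = mean(hours) if hours else None
        # Assumption: each Release ticket corresponds to one shipped release.
        row["release_count"] = sum(t["type"] == "Release" for t in items)
        table[team] = row
    return table


# Hypothetical tickets for a single team.
tickets = [
    {"team": "team-a", "type": "Bug", "priority": "major",
     "created": datetime(2020, 1, 1, 0), "resolved": datetime(2020, 1, 1, 12)},
    {"team": "team-a", "type": "Bug", "priority": "critical",
     "created": datetime(2020, 1, 2, 0), "resolved": datetime(2020, 1, 3, 12)},
    {"team": "team-a", "type": "Release", "priority": "minor",
     "created": datetime(2020, 1, 5, 0), "resolved": None},
]
features = team_features(tickets)
```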

[0042] Based on the team parameter values contained in the table, the machine learning algorithm labels the teams present in it as complying with cybersecurity requirements to a high, medium, or low degree. The labeling results are saved as a table in the storage.

[0043] As shown in FIG. 1, the claimed method (100) for predicting cybersecurity risks in the development of software products consists of several steps performed by at least one processor.

[0044] At step (101), data containing information at least about the software development teams and the software products being developed by each of those teams is fed to the input of the machine learning model. The data may also contain information about:

- заявках сотрудников agile-команд в системе управления задачами на разработку: номер, тип, важность, статус, время создания, время взятия в работу, время отметки как решенного, список зависимых заявок и типы зависимостей, список зависящих заявок и типы зависимостей, ответственная agile-команда, ответственный член команды- tickets from agile teams in the development task management system: number, type, importance, status, time of creation, time of commissioning, time of marking as resolved, list of dependent tickets and types of dependencies, list of dependent tickets and types of dependencies, responsible agile -team, responsible team member

- количестве критических дефектов по кибербезопасности, выявленных на приемо-сдаточных испытаниях всех предыдущих релизов программных продуктов agile-команд- the number of critical cybersecurity defects identified during acceptance tests of all previous releases of software products of agile teams

- the number of critical non-cybersecurity defects identified during acceptance testing of all previous releases of the agile teams' software products

- agile teams and their members: teams, the employees who are team members, their roles in the team, positions, training completed, exams taken and their results, previous transfers of employees between agile teams and changes of position, and the list of releases the employees worked on

- documentation for the agile teams' product releases: its volume and page hierarchy, and the number of attempts and the dates of its approval by cybersecurity experts and other employees

- communications between agile team members and cybersecurity experts over the lifetime of the products: date, participants, and duration of calls (from the telephony system), meetings (from corporate calendars), and video conferences (from the video conferencing management system); date and participants of electronic correspondence (from the corporate mail system and the corporate instant messaging system)

- the source code of the agile teams' product releases: languages used, number of modules, amount of code, number of functions, methods, classes, variables, and files

- the coding of the agile teams' product releases: number of build attempts, number of errors and warnings raised during build attempts, amount of code sent for build, number of functions, methods, classes, variables, and files

- the testing of the agile teams' product releases: number of attempts to pass automated tests, load testing, and functional testing, amount of code sent for testing, number of functions, methods, classes, variables, and files

- checks by the static and dynamic analysis system for vulnerabilities in the agile teams' product releases: number and types of vulnerabilities detected, how release developers marked them in the system (true-positive or false-positive), amount of code sent for build, number of functions, methods, classes, variables, and files

- vulnerabilities found in the agile teams' software products after release into production: release number, date of detection, the developer who wrote the vulnerable code, vulnerability type, vulnerability severity, and who discovered the vulnerability.

[0045] Next, at step (102), the received data is processed using a machine learning (ML) model, for example, but not limited to, a machine learning algorithm based on a random forest classifier.

[0046] During processing, the machine learning algorithm performs:

- at step (103), division of the received data into categorical and numerical variables;

- at step (104), processing of the obtained variables, in which the categorical variables are vectorized and the numerical variables are normalized;

- at step (105), concatenation of the processed variables and construction of a vector from them;

- at step (106), assessment, using said vector, of the degree of cybersecurity risk for each software product, and

- at step (107), classification of the development teams, assigning each a degree of probability that a cybersecurity risk will occur, based on the assessment of the software products being developed. The vector is mapped to a numerical estimate (from 0 to 1) of the probability of the risk. The numerical estimate is then mapped to a qualitative risk level according to the range in which it falls: 0 to 0.4 is low risk, 0.4 to 0.8 is medium risk, and 0.8 to 1 is high risk.
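Steps (103)-(105) and the score-to-level mapping of step (107) can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation described in the application: one-hot encoding stands in for "vectorization" and min-max scaling for "normalization" (the text names neither technique), the field names, vocabularies, and value ranges are hypothetical, and the boundary values 0.4 and 0.8 are assigned to the higher band because the text leaves them ambiguous. The trained model that turns the vector into a score (step 106) is omitted.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]


def split_variables(record: Dict[str, Value]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Step 103: separate categorical (string) and numerical fields."""
    categorical = {k: v for k, v in record.items() if isinstance(v, str)}
    numerical = {k: float(v) for k, v in record.items() if not isinstance(v, str)}
    return categorical, numerical


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Step 104 (assumed technique): one-hot encode a categorical value."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def min_max(value: float, low: float, high: float) -> float:
    """Step 104 (assumed technique): scale a numerical value to [0, 1]."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def build_vector(
    record: Dict[str, Value],
    vocabularies: Dict[str, List[str]],
    ranges: Dict[str, Tuple[float, float]],
) -> List[float]:
    """Step 105: concatenate the processed variables into one vector."""
    categorical, numerical = split_variables(record)
    vector: List[float] = []
    for name in sorted(categorical):  # fixed order keeps vectors comparable
        vector.extend(one_hot(categorical[name], vocabularies[name]))
    for name in sorted(numerical):
        low, high = ranges[name]
        vector.append(min_max(numerical[name], low, high))
    return vector


def risk_band(score: float) -> str:
    """Step 107: map a probability in [0, 1] to a qualitative level.

    Assumption: 0.4 counts as medium and 0.8 counts as high.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.4:
        return "low"
    if score < 0.8:
        return "medium"
    return "high"


# Hypothetical record for one product release.
record = {"main_language": "java", "build_attempts": 30, "critical_defects": 2}
vocabularies = {"main_language": ["java", "python", "go"]}
ranges = {"build_attempts": (0, 100), "critical_defects": (0, 10)}

vector = build_vector(record, vocabularies, ranges)
print(vector)           # [1.0, 0.0, 0.0, 0.3, 0.2]
print(risk_band(0.55))  # medium
```

Sorting the field names before concatenation is a design choice made here so that every record yields a vector with the same layout, which any downstream classifier would require.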

[0047] The claimed technical solution provides a new capability: automated assessment of the cybersecurity risk levels generated by the activities of agile product teams, and classification of those teams as complying with cybersecurity requirements to a high, medium, or low degree when developing their products. It also allows a prioritized list of tasks for cybersecurity experts to be generated automatically from the risk level calculated by the algorithm. This saves labor for cybersecurity experts and agile team members while reducing the enterprise cybersecurity risks generated by the activities of agile product teams.
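A prioritized list of this kind could be produced as in the minimal sketch below. It is illustrative only: the `TeamRisk` type, the team names, and the scores are hypothetical, and the rule of forwarding only teams with a score of 0.4 or higher (medium and high risk) and ordering them from highest to lowest score is an assumption modeled on the forwarding described in claim 4.

```python
from typing import List, NamedTuple


class TeamRisk(NamedTuple):
    team: str     # hypothetical team identifier
    score: float  # risk probability in [0, 1] produced by the model


def prioritized_tasks(teams: List[TeamRisk], threshold: float = 0.4) -> List[TeamRisk]:
    """Keep teams at or above the threshold, highest risk first."""
    flagged = [t for t in teams if t.score >= threshold]
    return sorted(flagged, key=lambda t: t.score, reverse=True)


teams = [TeamRisk("alpha", 0.15), TeamRisk("beta", 0.85), TeamRisk("gamma", 0.55)]
queue = prioritized_tasks(teams)
for item in queue:
    print(item.team, item.score)
```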

[0048] FIG. 4 shows a general example of a computing system (300) that implements the claimed method or forms part of a computer system, for example a server, a personal computer, or a part of a computing cluster, that processes the data required to implement the claimed technical solution.

[0049] In the general case, the system (300) contains, connected by a common data exchange bus, one or more processors (301), memory such as RAM (302) and ROM (303), input/output interfaces (304), input/output devices (305), and a networking device (306).

[0050] The processor (301) (or several processors, a multi-core processor, etc.) can be selected from the range of devices widely used at present, for example from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc. The processor, or one of the processors used in the system (300), should also be understood to include a graphics processor, for example an NVIDIA GPU or Graphcore, which is likewise suitable for full or partial execution of the method and can also be used for training and applying machine learning models in various information systems.

[0051] The RAM (302) is random access memory intended for storing machine-readable instructions executed by the processor (301) to perform the necessary logical data processing operations. The RAM (302) typically contains executable instructions of the operating system and of the corresponding software components (applications, software modules, etc.). The available memory of a graphics card or graphics processor can also serve as the RAM (302).

[0052] The ROM (303) is one or more persistent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0053] Various types of I/O interfaces (304) are used to organize the operation of the components of the system (300) and of external connected devices. The choice of interfaces depends on the specific design of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, Type-C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0054] Various I/O devices (305) are used for user interaction with the computing system (300), for example a keyboard, display (monitor), touch display, touchpad, joystick, mouse, light pen, stylus, touch panel, trackball, speakers, microphone, augmented reality devices, optical sensors, tablet, indicator lights, projector, camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

[0055] The networking device (306) provides data transmission over an internal or external computer network, for example an intranet, the Internet, a LAN, etc. One or more devices (306) may be, without limitation: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc.

[0056] The application materials presented here disclose preferred examples of implementing the technical solution. They should not be construed as limiting other, particular embodiments that do not go beyond the scope of the legal protection sought and that are obvious to those skilled in the relevant art.

Claims

1. A computer-implemented method for predicting cybersecurity risks in the development of software products, performed using at least one processor and containing the stages at which:

- receive data containing information about at least the software development teams and the software products being developed by each of said teams;

- carry out the processing of the obtained data using a machine learning model trained on the basis of expert data on cybersecurity on compliance with cybersecurity requirements by teams in the development of software products, and during this processing, the following is carried out:

- division of the received data into categorical and numerical variables;

- processing of the obtained variables, in which vectorization of categorical variables and normalization of numerical variables is performed;

- concatenation of processed variables and construction of a vector based on them;

- using the mentioned vector to evaluate the degree of cybersecurity risks occurrence for each software product, and

- classification of development teams with the assignment of the degree of likelihood of cybersecurity risk on the basis of the assessment of the developed software products.

2. The method according to claim 1, characterized in that the processing of the obtained data is carried out using a machine learning model based on a random forest classifier.

3. The method according to claim 1, characterized in that the classification of teams is carried out with the assignment of a high, medium and low degree of likelihood of a cybersecurity risk.

4. The method according to claim 3, characterized in that information on development teams classified with a medium and high degree is automatically sent to the automated workstation of cybersecurity experts interacting with the development teams, with a high control mark.

5. The method according to claim 1, characterized in that the data on the classified teams contain at least:

i. data on the tasks of the team members during the development of a software product in the task management system for the development of software products;

ii. data on the structure of the team and the professional qualities of the team members;

iii. data on communications between team members and cybersecurity experts in the development of software products for the entire lifetime of the team;

iv. data on the source code of software products released by the team.

6. The method according to claim 1, characterized in that the data on the developed software products contain at least:

i. data on the number of critical cybersecurity defects identified during acceptance tests of the team's software products released into commercial operation over the entire period of the team's existence;

ii. data on the number of critical non-cybersecurity defects identified during acceptance tests of the team's products released into commercial operation over the entire period of the team's existence;

iii. data on testing the finished software product of the team, before its release into industrial operation;

iv. data on the passage of checks by the static and dynamic analysis system for vulnerabilities in the finished software products of the team, before their release into industrial operation;

v. data on vulnerabilities in the team's software products discovered after the release into industrial operation.

7. A system for predicting cybersecurity risks in the development of software products, containing:

- at least one processor;

- at least one memory connected to the processor, which contains machine-readable instructions which, when executed by the at least one processor, ensure the execution of the method according to any one of claims 1-6.