HOW CAN A CONFUSION MATRIX HELP IN DETECTING CYBER CRIMES?

Raghav Agarwal
Jun 6, 2021

What is cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

Types of cybercrime

Here are some specific examples of the different types of cybercrime:

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used).
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyberextortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

Cybercrime that targets computers often involves viruses and other types of malware.

Cybercriminals may infect computers with viruses and malware to damage devices or stop them working. They may also use malware to delete or steal data.

How can ML help in detecting Cyber Crimes?

Today, enterprises across industries are using the cloud to build and manage software. Microservices are a widely used software development technique, and they typically expose their functionality through an Application Programming Interface (API). They are used in industries such as banking, storage and healthcare. Many microservice instances start automatically when required. In such a situation, it is not practical for humans to monitor every instance and check that it is genuine, which increases the risk of a cyber-attack.

A system with APIs is usually designed on the assumption that each routine will be called only a limited number of times per day, and this assumption can provide a viable way to detect such attacks. The number of calls might increase due to programmatic retries if the API fails to respond in a timely manner, or when debugging or troubleshooting procedures are performed. Even with troubleshooting, the number of calls is not expected to go beyond a defined maximum threshold per day.

Here, we can make a rudimentary assumption: if an API is invoked more than 100 times in a day, it may constitute a DoS/DDoS attack. An ML algorithm can then be trained on logging data to classify whether the system is under attack based on certain attributes.
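The threshold rule above can be sketched in Python as a labeling step that turns call counts into training labels. This is only a minimal sketch: the names `CALL_THRESHOLD` and `label_call_count` and the sample IP addresses are hypothetical, and the value 100 is the rudimentary assumption from the text, not a recommended setting.

```python
# Hypothetical threshold, taken from the rudimentary assumption in the text.
CALL_THRESHOLD = 100


def label_call_count(call_count: int, threshold: int = CALL_THRESHOLD) -> int:
    """Return 1 (suspected DoS/DDoS) if the count exceeds the threshold, else 0."""
    return 1 if call_count > threshold else 0


# Made-up daily call counts for three hypothetical clients.
daily_counts = {"10.0.0.5": 12, "10.0.0.9": 100, "10.0.0.42": 3500}
labels = {ip: label_call_count(n) for ip, n in daily_counts.items()}
print(labels)
```

Labels produced this way could serve as the target column when training a classifier on the logged attributes.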

The logs generated by various microservices are continuously monitored using log monitoring tools such as Fluentd. Various attributes, such as client IP address, API request and date and time, are retrieved from the acquired log data.
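As an illustration of this extraction step, the sketch below parses one log line into the three attributes mentioned above. The line layout (ISO timestamp, client IP, HTTP method, path) is an assumption made for this example; real Fluentd output depends on how it is configured, so the pattern would need adjusting. `LogRecord` and `parse_log_line` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: datetime
    client_ip: str
    api_request: str


# Assumed line layout: "<ISO timestamp> <client IP> <HTTP method> <path>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<method>[A-Z]+)\s+(?P<path>\S+)$"
)


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Extract timestamp, client IP and API request; return None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return LogRecord(
        timestamp=datetime.fromisoformat(match["ts"]),
        client_ip=match["ip"],
        api_request=f'{match["method"]} {match["path"]}',
    )


record = parse_log_line("2021-06-06T10:15:30 192.168.1.10 GET /api/v1/accounts")
print(record)
```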

This information can be fed into a preprocessor in real time, which calculates the number of hits on a given API for a given date, time and client IP address. There can be situations where multiple machines are used to attack multiple APIs exposed by a target. Every industry that uses APIs, especially for applications that deal with sensitive information, can be affected by DoS or DDoS attacks. These attacks are not only used to deny service to consumers; an attacker can also use them to deliver malware with the intent of gathering sensitive data.

Machine learning algorithms can be trained to detect whether a DoS/DDoS attack is under way. As soon as an attack is detected, an email notification can be sent to the security engineers. Any classification algorithm can be used to decide whether traffic is a DoS/DDoS attack or not. One example is the Support Vector Machine (SVM), a supervised learning method that analyses data and recognizes patterns.
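A preprocessor of this kind could look roughly like the following sketch. It assumes the log data has already been parsed into (timestamp, client IP, API path) tuples and that hits are bucketed per hour; both the bucketing choice and the function names are assumptions for illustration. The second function sums over client IPs, which is one simple way to notice an API being hit from many machines at once.

```python
from collections import Counter
from datetime import datetime

# Each event is (timestamp, client IP, API path); the values are made up.
events = [
    (datetime(2021, 6, 6, 10, 1), "10.0.0.5", "/api/v1/login"),
    (datetime(2021, 6, 6, 10, 2), "10.0.0.5", "/api/v1/login"),
    (datetime(2021, 6, 6, 10, 3), "10.0.0.9", "/api/v1/login"),
    (datetime(2021, 6, 6, 11, 0), "10.0.0.5", "/api/v1/login"),
]


def count_hits(events):
    """Count hits per (API, client IP, hour bucket)."""
    hits = Counter()
    for timestamp, client_ip, api in events:
        hour_bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        hits[(api, client_ip, hour_bucket)] += 1
    return hits


def count_hits_per_api(hits):
    """Sum hits over all client IPs for each (API, hour bucket)."""
    per_api = Counter()
    for (api, _client_ip, hour_bucket), n in hits.items():
        per_api[(api, hour_bucket)] += n
    return per_api


hits = count_hits(events)
per_api = count_hits_per_api(hits)
for (api, client_ip, hour), n in sorted(hits.items()):
    print(api, client_ip, hour.isoformat(), n)
```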
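The notification step can be sketched with Python's standard `email` package. The example only builds the message and does not send it; the addresses, the function name `build_attack_alert` and the wording of the alert are placeholders, not part of any particular tool.

```python
from email.message import EmailMessage


def build_attack_alert(api: str, client_ip: str, hits: int) -> EmailMessage:
    """Build (but do not send) an alert email for a suspected DoS/DDoS attack."""
    message = EmailMessage()
    # Placeholder addresses; replace with real ones in an actual deployment.
    message["From"] = "alerts@example.com"
    message["To"] = "security-team@example.com"
    message["Subject"] = f"Suspected DoS/DDoS attack on {api}"
    message.set_content(
        f"The classifier flagged {hits} calls to {api} from {client_ip} as a possible attack."
    )
    return message


alert = build_attack_alert("/api/v1/login", "10.0.0.42", 3500)
print(alert["Subject"])
```

Actually sending the message would go through `smtplib.SMTP(...).send_message(alert)` against your own mail server, which is left out here.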

What is a Confusion Matrix?

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

Understanding Errors in a Confusion Matrix (example: a classifier that predicts whether a patient has a disease) :-

  • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • true negatives (TN): We predicted no, and they don’t have the disease.
  • false positives (FP): We predicted yes, but they don’t actually have the disease. (Also known as a “Type I error.”)
  • false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a “Type II error.”)
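The four cells defined above can be counted directly from pairs of true and predicted labels. The sketch below assumes binary labels with 1 for "yes" and 0 for "no"; the label lists are made up and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels where 1 = positive and 0 = negative."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1  # predicted yes, actually no (Type I error)
        else:
            counts["FN"] += 1  # predicted no, actually yes (Type II error)
    return counts


# Made-up labels for eight test cases.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```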

In cybersecurity, the most dangerous error is usually the False Negative (FN): the machine predicted “no”, but the true answer was “yes”. For example, with “pass” as the positive class, the machine predicts that a student fails when the student actually passed. A False Positive (FP) is the opposite case, where the machine predicts “yes” for something that is actually “no”.

False Negatives cause serious problems in the cybersecurity world, where many tools are based on machine learning or AI: a missed attack goes unnoticed and can have dangerous consequences.

Therefore the role of the confusion matrix is important in the field of machine learning.

To see how a confusion matrix applies to detecting cyber attacks with ML, we can read its four cells as follows :-

  • True Positive (TP): The number of events classified as an attack that are actually attacks.
  • True Negative (TN): The number of events classified as normal that are actually normal.
  • False Positive (FP): The number of events classified as an attack that are actually normal (false alarms).
  • False Negative (FN): The number of events classified as normal that are actually attacks.
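From these four counts, a few standard rates show how well an attack detector is doing. The sketch below uses the usual definitions of accuracy, recall and precision, with attack as the positive class. The counts are made up for illustration, and the function does not guard against division by zero when a class is empty.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive common rates from confusion-matrix counts (attack = positive class)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of real attacks that were detected (recall / detection rate).
        "recall": tp / (tp + fn),
        # Share of raised alerts that were real attacks.
        "precision": tp / (tp + fp),
        # Share of normal traffic that triggered a false alarm.
        "false_alarm_rate": fp / (fp + tn),
        # Share of real attacks that were missed.
        "miss_rate": fn / (tp + fn),
    }


# Made-up counts for illustration only.
metrics = detection_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A detector can score high on accuracy while still missing attacks, so the miss rate deserves particular attention when False Negatives are the costly error.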

THANK YOU FOR READING THIS ARTICLE.

For any queries or suggestions, feel free to reach out to me on LinkedIn.

LinkedIn URL :- https://www.linkedin.com/in/raghav-agarwal-4864661a2/
