Many security services now use ChatGPT. Methods for attacking AI include, for example, adding noise to inputs, poisoning training data, and reverse engineering the model.
When you hear "AI and security", two things come to mind: first, using AI for cyber security; second, attacks against AI. In this article, I am going to discuss both topics.
- Using AI for security: Introducing some security applications that use ChatGPT.
- Attack against AI: What it is.
Using AI (ChatGPT) for security (purpose)
Since the announcement of GPT-3 in 2020 and the release of many image-generating AIs in 2022, using AI has become commonplace. In particular, ChatGPT became popular immediately after its release in November 2022 because of its ability to generate sentences that read quite naturally to humans.
ChatGPT is also used to generate code from natural-language descriptions, and it can explain the meaning of code, memory dumps, or logs in a way that is easy for humans to understand. Finding unusual patterns in large amounts of data is what AI is good at. Hence, there is a service that uses AI for incident response.
- Microsoft Security Copilot: Security incident response adviser.
This research uses ChatGPT to detect phishing sites and reports an accuracy of 98.3%.
However, many organizations are reluctant to share sensitive information with Microsoft or other vendors. In that case, it is possible to run a ChatGPT-like LLM offline on your own PC with an open-source LLM application, for example gpt4all. gpt4all is designed to run on consumer hardware, although larger models need more memory and benefit from a GPU.
ChatGPT will continue to be used for both offensive and defensive security.
Attack against AI
Before we discuss attacks against AI, let's briefly review how AI works. Research on AI has a long history. Today, however, "AI" usually refers to machine learning models or deep learning algorithms, some of which use neural networks. In this article, we discuss deep neural networks (DNNs).
A DNN works as follows. It consists of many nodes, and a group of nodes is called a layer. Each node belongs to a layer, and the layers are connected to each other. (Please see the pic below.)
Data from the input layer propagates through multiple (hidden) layers and finally reaches the output layer, which performs classification or regression. For example, a DNN can be trained on many pictures of animals and then used to identify (categorize) which animal is in a new picture.
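To make the propagation concrete, here is a minimal sketch of a forward pass in plain Python. The layer sizes, the random weights, and the choice of ReLU and softmax are my own assumptions for illustration; a real DNN learns its weights from training data and is usually built with a framework.

```python
import math
import random

def relu(values):
    # Hidden-layer activation: negative values become zero.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Turns the output layer's raw scores into probabilities that sum to 1.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # weights[j][i] connects node i of the previous layer to node j of this layer.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    # Propagate the data layer by layer: input -> hidden -> output.
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        z = dense(activations, weights, biases)
        is_output_layer = index == len(layers) - 1
        activations = softmax(z) if is_output_layer else relu(z)
    return activations

def random_layer(n_in, n_out, rng):
    # Untrained, random weights (assumption: for illustration only).
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# 4 input nodes -> 5 hidden nodes -> 3 output nodes (three hypothetical animal classes).
layers = [random_layer(4, 5, rng), random_layer(5, 3, rng)]
probabilities = forward([0.2, 0.7, 0.1, 0.9], layers)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the highest probability is the network's decision. Training is the process of adjusting the weights so that this decision matches the labels.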
What kinds of attacks are possible against AI?
A cyber security threat is anything that compromises a system's CIA (Confidentiality, Integrity, Availability). Attacks on AI likewise aim to force wrong decisions (loss of integrity), make the AI unavailable (loss of availability), or steal the decision model (loss of confidentiality). Among these, the best-known method is to add noise to the data fed to the input layer and force a wrong decision. This is called an adversarial example attack.
The panda picture on the left is the original data fed to the DNN; normally, the DNN categorizes it as a panda. However, if the attacker adds noise (middle picture), the DNN misjudges the result as a gibbon. In other words, this attack makes the AI reach a wrong decision without humans noticing.
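One well-known way to compute such noise is the fast gradient sign method (FGSM): move every input feature a small step in the direction that increases the model's loss. The sketch below is only an illustration under my own assumptions. It swaps the DNN for a tiny logistic-regression classifier with hand-picked weights, and `weights`, `x`, and `epsilon` are hypothetical values, not taken from any real model.

```python
import math

def predict(weights, bias, x):
    # Probability of class 1 under a logistic model (stand-in for a DNN).
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, label, epsilon):
    # For the logistic loss, the gradient with respect to input i is (p - label) * w_i.
    p = predict(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    # Step each feature by at most epsilon in the direction that raises the loss.
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]

# Hypothetical 100-feature model and input, chosen by hand.
weights = [0.5, -0.5] * 50
bias = 0.0
x = [0.6, 0.4] * 50          # classified as class 1
epsilon = 0.15               # maximum change allowed per feature

x_adv = fgsm(weights, bias, x, label=1, epsilon=epsilon)

print("original prediction:   ", predict(weights, bias, x))      # above 0.5 (class 1)
print("adversarial prediction:", predict(weights, bias, x_adv))  # below 0.5 (class 0)
print("largest change:", max(abs(a - b) for a, b in zip(x_adv, x)))
```

No single feature changes by more than `epsilon`, yet the small changes add up across all 100 features and flip the decision. Images have far more features (pixels), which is part of why the noise can stay invisible to humans.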
The panda example is an attack on an image classifier. Another example is ShapeShifter, which attacks object detectors. By making stop signs undetectable, it could cause an AI-driven self-driving car to have an accident without humans noticing anything wrong.
You might think that if the DNN model in a self-driving car is kept secret, an attacker cannot get the information needed to attack that specific model. However, the paper below discusses how an adversarial example designed for one model can transfer to other models as well (transferability).
That means that even if attackers are unable to examine the target DNN model, they can still experiment with other DNN models and use the results to attack it.
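Transferability can be illustrated with a toy setup: the attacker crafts the noise using only a surrogate model of their own and then feeds the result to a target model whose weights they never see. Everything here is an assumption made for illustration. Both models are small logistic classifiers with hand-picked, similar-but-different weights, standing in for two DNNs trained on similar data.

```python
import math

def score(weights, bias, x):
    # Probability of class 1 under a logistic model.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def craft_on_surrogate(weights, bias, x, label, epsilon):
    # Sign-of-gradient noise computed from the surrogate model only.
    p = score(weights, bias, x)
    adversarial = []
    for w, xi in zip(weights, x):
        gradient = (p - label) * w
        if gradient > 0:
            adversarial.append(xi + epsilon)
        elif gradient < 0:
            adversarial.append(xi - epsilon)
        else:
            adversarial.append(xi)
    return adversarial

# Hypothetical models: the attacker owns the surrogate and cannot inspect the target.
surrogate_weights = [0.5, -0.5] * 50
target_weights = [0.45, -0.55] * 50

x = [0.6, 0.4] * 50
x_adv = craft_on_surrogate(surrogate_weights, 0.0, x, label=1, epsilon=0.15)

target_before = score(target_weights, 0.0, x)      # above 0.5: class 1
target_after = score(target_weights, 0.0, x_adv)   # below 0.5: class 0
print(target_before, target_after)
```

The target's weights are never used to build `x_adv`, but because the two models learned similar decision boundaries, the noise crafted on one also fools the other.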
Data poisoning attack
In an adversarial example attack, the model and its training data are left unchanged; noise is added to the input instead. Attacks that poison the training data also exist. In a data poisoning attack, the attacker gains access to the data used to train the DNN model and inserts incorrect data, either to make the model produce results that benefit the attacker or to reduce its accuracy. Planting a backdoor is also possible.
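The effect of poisoned training data can be seen even on a very small model. The following sketch is an assumption-heavy illustration: it replaces the DNN with a one-dimensional nearest-centroid classifier, and the data points and the injected mislabeled samples are made up by hand.

```python
def train_centroids(samples):
    # "Training" here means computing the mean value of each class.
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    # Pick the class whose centroid is closest to the value.
    return min(centroids, key=lambda label: abs(value - centroids[label]))

def accuracy(centroids, samples):
    correct = sum(1 for value, label in samples if classify(centroids, value) == label)
    return correct / len(samples)

# Hypothetical clean training data: small values are class 0, large values are class 1.
clean_training = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
# Poison injected by the attacker: large values deliberately labeled as class 0.
poison = [(9, 0), (10, 0), (10, 0)]
test_data = [(0, 0), (2, 0), (4, 0), (6, 1), (8, 1), (10, 1)]

clean_model = train_centroids(clean_training)
poisoned_model = train_centroids(clean_training + poison)

print("clean accuracy:   ", accuracy(clean_model, test_data))
print("poisoned accuracy:", accuracy(poisoned_model, test_data))
print("value 6 is now classified as:", classify(poisoned_model, 6))
```

The three mislabeled samples drag the class 0 centroid toward the class 1 region, so the value 6, which the clean model assigns to class 1, is now assigned to class 0. The attacker never touched the model itself, only the data it learned from.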
Some cryptographic vulnerabilities allow an attacker to learn the encryption model by analyzing input/output strings, which are easy to obtain. Similarly, AI models can potentially be reverse engineered or copied by analyzing their inputs (training data) and outputs (decision results). These papers discuss this.
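As a simplified illustration of copying a model from its inputs and outputs, the sketch below recovers the parameters of a black-box linear model using nothing but queries. This is an assumption-laden toy: the "secret" weights are made up, the black box returns raw scores rather than just a label, and a real DNN is far harder to copy and typically needs many more queries.

```python
def make_black_box():
    # Hypothetical secret parameters that the attacker cannot read directly.
    secret_weights = [0.8, -1.5, 2.0]
    secret_bias = 0.25

    def query(x):
        # The only thing the attacker can do: send an input, observe the output.
        return sum(w * xi for w, xi in zip(secret_weights, x)) + secret_bias

    return query

def extract_linear_model(query, n_features):
    # An all-zero input reveals the bias.
    zero = [0.0] * n_features
    bias = query(zero)
    # Switching on one feature at a time reveals each weight.
    weights = []
    for i in range(n_features):
        probe = zero.copy()
        probe[i] = 1.0
        weights.append(query(probe) - bias)
    return weights, bias

query = make_black_box()
stolen_weights, stolen_bias = extract_linear_model(query, n_features=3)

# Check that the copied model agrees with the original on an unseen input.
new_input = [0.3, 0.9, -0.4]
original_output = query(new_input)
copied_output = sum(w * xi for w, xi in zip(stolen_weights, new_input)) + stolen_bias
print(stolen_weights, stolen_bias)
print(original_output, copied_output)
```

Four queries are enough here because the model is linear. The general idea is the same for larger models: the more input/output pairs an attacker can collect, the closer a copy can get.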