Code for our NeurIPS 2023 paper: RADAR: Robust AI-Text Detection via Adversarial Learning.
Live demo for RADAR: RADAR-Demo
We tested RADAR on 8 LLMs, including Vicuna and LLaMA. The results show that RADAR attains good detection performance on LLM-generated text while remaining robust to paraphrasing.
# Go to the env directory
cd env
# Create a conda environment with the core packages installed
conda env create -f radar_core.yaml
# Activate the conda environment
conda activate radar_env
# Install the remaining packages with pip
pip install -r radar_requirements.txt
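Once the environment is active, a quick import check can confirm that the install worked. The sketch below is not part of the RADAR repository: the helper name `missing_packages` is hypothetical, and the package list is an assumption based on the libraries imported by the snippets in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list: the modules imported by the snippets in this README.
    required = ["torch", "transformers", "sklearn", "openai"]
    missing = missing_packages(required)
    print("Missing packages:", missing if missing else "none")
```

An empty result only means the packages are importable; it does not check their versions.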
Our RADAR detector is trained from the RoBERTa-large model, so you can load and use it the same way you would use RoBERTa-large. The following example uses RADAR to get the probability that a text was generated by Vicuna.
import torch
import torch.nn.functional as F
import transformers

# Use a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the RADAR detector and its tokenizer
detector = transformers.AutoModelForSequenceClassification.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
tokenizer = transformers.AutoTokenizer.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
detector.eval()
detector.to(device)
Text_input = ["I'm not a chatbot"]
with torch.no_grad():
    inputs = tokenizer(Text_input, padding=True, truncation=True, max_length=512, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Class index 0 holds the AI-generated probability
    output_probs = F.log_softmax(detector(**inputs).logits, -1)[:, 0].exp().tolist()
print("Probability of AI-generated texts is", output_probs)
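The `log_softmax(...)[:, 0].exp()` step in the example is the softmax probability of class index 0, which the example reports as the AI-generated probability. The dependency-free sketch below reproduces that arithmetic on made-up logits and then applies a decision threshold. The function names and the 0.5 threshold are illustrative assumptions, not values taken from the paper.

```python
import math


def ai_probability(logits):
    """Softmax probability of class index 0 for one row of logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return exps[0] / sum(exps)


def flag_ai_text(probs, threshold=0.5):
    """Mark each probability at or above `threshold` as AI-generated."""
    return [p >= threshold for p in probs]


# Made-up logits for two texts: [AI-generated logit, human-written logit]
example_logits = [[2.0, 0.0], [-1.0, 1.0]]
probs = [ai_probability(row) for row in example_logits]
flags = flag_ai_text(probs)
print(probs, flags)
```

In practice the threshold would be chosen on held-out data, for example to hit a target false-positive rate.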
We prompt gpt-3.5-turbo or gpt-4 to paraphrase the AI-generated text so that it reads more like human-written text.
import openai

openai.api_key = "your_api_key"

def _openai_response(text, openai_model):
    # Get a paraphrase of text from the OpenAI model.
    # openai_model can be "gpt-3.5-turbo" or "gpt-4".
    # Note: openai.ChatCompletion is the legacy interface of openai<1.0.
    system_instruct = {"role": "system", "content": "Enhance the word choices in the sentence to sound more like that of a human."}
    user_input = {"role": "user", "content": text}
    messages = [system_instruct, user_input]
    k_wargs = {"messages": messages, "model": openai_model}
    r = openai.ChatCompletion.create(**k_wargs)['choices'][0].message.content
    return r
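One way to use the paraphraser is to score each AI-generated text before and after paraphrasing and compare the two detector outputs. The sketch below shows that loop with the paraphraser and the detector passed in as plain callables, so the stubs can be swapped for `_openai_response` and a RADAR scoring function. `score_before_and_after` and both stub functions are hypothetical names introduced here for illustration, and the stub scores are made up.

```python
from typing import Callable, List, Tuple


def score_before_and_after(
    texts: List[str],
    paraphrase: Callable[[str], str],
    score: Callable[[str], float],
) -> List[Tuple[float, float]]:
    """Return (original score, paraphrased score) for each text."""
    results = []
    for text in texts:
        rewritten = paraphrase(text)
        results.append((score(text), score(rewritten)))
    return results


# Stand-in paraphraser and detector so the sketch runs offline.
def stub_paraphrase(text: str) -> str:
    return text.replace("utilize", "use")


def stub_score(text: str) -> float:
    return 0.9 if "utilize" in text else 0.4


pairs = score_before_and_after(["We utilize a model."], stub_paraphrase, stub_score)
print(pairs)
```

A detector that is robust to paraphrasing should keep the second score in each pair close to the first.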
To evaluate the detector, compute its detection AUROC.
from sklearn.metrics import auc, roc_curve

def get_roc_metrics(human_preds, ai_preds):
    # human_preds: list of AI-generated probabilities for the human-written texts
    # ai_preds: list of AI-generated probabilities for the AI-generated texts
    # Human-written texts are labeled 0 and AI-generated texts are labeled 1.
    fpr, tpr, _ = roc_curve([0] * len(human_preds) + [1] * len(ai_preds), human_preds + ai_preds, pos_label=1)
    roc_auc = auc(fpr, tpr)
    return fpr.tolist(), tpr.tolist(), float(roc_auc)
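As a sanity check on `get_roc_metrics`, AUROC can also be read as the probability that a randomly chosen AI-generated text receives a higher score than a randomly chosen human-written text, with ties counted as one half. The dependency-free sketch below computes that quantity directly. `pairwise_auroc` is a hypothetical helper, and the scores are made-up values, not results from the paper.

```python
def pairwise_auroc(human_preds, ai_preds):
    """Fraction of (AI, human) pairs where the AI text scores higher."""
    wins = 0.0
    for ai in ai_preds:
        for human in human_preds:
            if ai > human:
                wins += 1.0
            elif ai == human:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(ai_preds) * len(human_preds))


# Made-up detector outputs for two human-written and two AI-generated texts
human_scores = [0.1, 0.4]
ai_scores = [0.35, 0.8]
auroc = pairwise_auroc(human_scores, ai_scores)
print(auroc)  # 3 of the 4 pairs are ranked correctly, so 0.75
```

This loop is quadratic in the number of texts, so it is only meant for small checks; the `roc_curve`-based function is the better choice for full evaluations.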
We provide some examples of using RADAR in radar_examples.ipynb. Refer to it to become more familiar with the RADAR workflow.
If you find RADAR useful, please cite the following paper:
@inproceedings{DBLP:conf/nips/HuCH23,
author = {Xiaomeng Hu and
Pin{-}Yu Chen and
Tsung{-}Yi Ho},
title = {{RADAR:} Robust AI-Text Detection via Adversarial Learning},
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference
on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans,
LA, USA, December 10 - 16, 2023},
year = {2023}
}
Feel free to contact Xiaomeng Hu if you have any questions.