RADAR_AI_Detection

Code for our NeurIPS 2023 paper, RADAR: Robust AI-Text Detection via Adversarial Learning.

Live demo for RADAR: RADAR-Demo

We tested RADAR on 8 LLMs, including Vicuna and LLaMA. The results show that RADAR detects LLM-generated text well while remaining robust to paraphrasing.

Environment Build

    # go to the env directory
    cd env
    # create a conda environment with the conda-managed packages installed
    conda env create -f radar_core.yaml
    # activate the conda environment
    conda activate radar_env
    # install the remaining packages with pip
    pip install -r radar_requirements.txt

Use RADAR to get AI-generated probability

Our RADAR detector is trained from the RoBERTa-large model, so you can load and use it the same way as a RoBERTa-large model. Here is an example of using RADAR-Vicuna-7B to get the probability that a text is AI-generated.

import torch
import torch.nn.functional as F
import transformers

# Use a GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
detector = transformers.AutoModelForSequenceClassification.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
tokenizer = transformers.AutoTokenizer.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
detector.eval()
detector.to(device)
text_input = ["I'm not a chatbot"]
with torch.no_grad():
  inputs = tokenizer(text_input, padding=True, truncation=True, max_length=512, return_tensors="pt")
  inputs = {k: v.to(device) for k, v in inputs.items()}
  # Column 0 of the softmax output is read as the AI-generated probability
  output_probs = F.log_softmax(detector(**inputs).logits, -1)[:, 0].exp().tolist()
  print("Probability of AI-generated texts is", output_probs)
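When scoring many texts, it can help to run the detector in fixed-size chunks and then turn the probabilities into labels. The following is a minimal sketch and not part of the RADAR code: score_in_batches, label_texts, the batch size of 16, and the 0.5 threshold are all hypothetical choices, and score_batch stands for any function that maps a list of texts to a list of AI-generated probabilities (for example, a wrapper around the tokenizer and detector calls above).

```python
def score_in_batches(texts, score_batch, batch_size=16):
    """Score texts chunk by chunk and return one probability per text.

    score_batch is a caller-supplied function (an assumption of this sketch)
    that takes a list of strings and returns a list of floats.
    """
    probs = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        probs.extend(score_batch(batch))
    return probs


def label_texts(probs, threshold=0.5):
    """Label each probability; the 0.5 threshold is illustrative only."""
    return ["ai" if p >= threshold else "human" for p in probs]
```

Chunking keeps memory use bounded by the batch size rather than by the total number of texts. A suitable threshold depends on the false-positive rate you can tolerate, so it is worth tuning on held-out data rather than relying on 0.5.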

Paraphrase the AI-text to evade detection

We prompt gpt-3.5-turbo or gpt-4 to paraphrase the AI-generated text so that it reads more like human-written text.

import openai
openai.api_key = "your_api_key"
def _openai_response(text, openai_model):
    # Get a paraphrase of text from the OpenAI model.
    # openai_model can be "gpt-3.5-turbo" or "gpt-4".
    # Note: openai.ChatCompletion is the interface of openai package versions before 1.0.
    system_instruct = {"role": "system", "content": "Enhance the word choices in the sentence to sound more like that of a human."}
    user_input = {"role": "user", "content": text}
    messages = [system_instruct, user_input]
    kwargs = {"messages": messages, "model": openai_model}
    r = openai.ChatCompletion.create(**kwargs)["choices"][0].message.content
    return r
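API calls can fail transiently when paraphrasing a whole dataset, so a small retry loop around the paraphrase call may be useful. The sketch below is an assumption on our part rather than something from the RADAR code: paraphrase_all, max_retries, and wait_seconds are hypothetical names, and paraphrase_fn is any function that maps one text to its paraphrase.

```python
import time


def paraphrase_all(texts, paraphrase_fn, max_retries=3, wait_seconds=1.0):
    """Paraphrase every text, retrying a failed call up to max_retries times.

    paraphrase_fn is caller-supplied (hypothetical in this sketch); the wait
    grows linearly with the attempt number before each retry.
    """
    results = []
    for text in texts:
        for attempt in range(max_retries):
            try:
                results.append(paraphrase_fn(text))
                break
            except Exception:
                # Re-raise once the retry budget for this text is used up
                if attempt == max_retries - 1:
                    raise
                time.sleep(wait_seconds * (attempt + 1))
    return results
```

With the function above, a call could look like paraphrase_all(ai_texts, lambda t: _openai_response(t, "gpt-3.5-turbo")), where ai_texts is your own list of AI-generated texts.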

Calculate the Detection AUROC

To evaluate the detector, compute its detection AUROC from the AI-generated probabilities it assigns to human-written texts and to AI-generated texts.

from sklearn.metrics import auc, roc_curve
def get_roc_metrics(human_preds, ai_preds):
    # human_preds: list of AI-generated probabilities assigned to human-written texts
    # ai_preds: list of AI-generated probabilities assigned to AI-generated texts
    fpr, tpr, _ = roc_curve([0] * len(human_preds) + [1] * len(ai_preds), human_preds + ai_preds, pos_label=1)
    roc_auc = auc(fpr, tpr)
    return fpr.tolist(), tpr.tolist(), float(roc_auc)
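To build intuition for the number this returns: AUROC equals the probability that a randomly chosen AI-generated text receives a higher score than a randomly chosen human-written text, with ties counted as half. The sketch below computes that quantity directly in plain Python; auroc_by_pairs and the toy scores are illustrative and not taken from the paper.

```python
def auroc_by_pairs(human_preds, ai_preds):
    """AUROC as the fraction of (human, AI) pairs ranked correctly.

    A pair counts 1 when the AI text scores higher, 0.5 on a tie, 0 otherwise.
    """
    wins = 0.0
    for h in human_preds:
        for a in ai_preds:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(human_preds) * len(ai_preds))


# Made-up scores for illustration only
human_preds = [0.1, 0.4, 0.35]
ai_preds = [0.8, 0.3, 0.9]
print(auroc_by_pairs(human_preds, ai_preds))
```

In this toy case, 7 of the 9 pairs are ranked correctly, so the value is 7/9. The pairwise loop is quadratic in the number of texts, so the roc_curve-based function is the better choice for real evaluations.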

Examples

We provide examples of using RADAR in radar_examples.ipynb. Refer to it to become more familiar with the RADAR workflow.

Citation

If you find RADAR useful, please cite the following paper:

@inproceedings{DBLP:conf/nips/HuCH23,
  author       = {Xiaomeng Hu and
                  Pin{-}Yu Chen and
                  Tsung{-}Yi Ho},
  title        = {{RADAR:} Robust AI-Text Detection via Adversarial Learning},
  booktitle    = {Advances in Neural Information Processing Systems 36: Annual Conference
                  on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans,
                  LA, USA, December 10 - 16, 2023},
  year         = {2023}
}

Contact

Feel free to contact Xiaomeng Hu if you have any questions.
