You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
However, I cannot reproduce the results for AdvBench in Table 2. My results for LLaVA-v1.5-7B and LLaVA-v1.5-7B-Post-hoc-LoRA are much higher than reported (for both models, >25% for Vanilla, >95% for Suffix).
I did realize that there may be a difference in ASR calculation (I use the same method in https://github.com/ys-zong/VLGuard/blob/main/VLGuard_eval.py). Nonetheless, it is unclear (either from the code or from the paper) how exactly the authors calculate ASR for AdvBench, XSTest, and FigStep, which hinders result reproduction.
From a quick look at your code, it seems you didn't use the LLaVA conversation template but directly input the raw texts, which may cause the differences. Can you modify your inference code and check if that works? Also can you put your code into a markdown block for better visibility in case I miss something.
I'll try to find and release the code for AdvBench but the best estimate is in a few weeks. I'm too busy recently.
My observation
My experiment setup
I use the instructions in the column 'goal' in https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv as the input. Then, I use an evaluation code adapted from VLGuard_eval.py (see at the bottom).
My questions
Would the authors kindly release a complete official evaluation code, rather than providing only an evaluation for VLGuard dataset?
Many thanks!
----------------------My Advbench evaluation code below--------------------
import torch
import os
import json
import argparse
import numpy as np
import gc
from utils import utils, model_utils
model_mappings = {
'llava15-7b': 'liuhaotian/llava-v1.5-7b',
'llava15-13b': 'liuhaotian/llava-v1.5-13b',
'llava15-7b-mixed': 'ys-zong/llava-v1.5-7b-Mixed',
'llava15-13b-mixed': 'ys-zong/llava-v1.5-13b-Mixed',
}
def parse_args():
parser = argparse.ArgumentParser(description='AdvBench Evaluation')
parser.add_argument('--data_path', default='harmful_behaviors.csv', type=str)
parser.add_argument("--engine", "-e", default=["llava15-7b"], nargs="+")
parser.add_argument('--max_new_tokens', default=256, type=int)
return parser.parse_args()
def load_data(args):
import pandas as pd
df = pd.read_csv(args.data_path)
return df
def text_only_inference(model, tokenizer, prompt, args):
inputs = tokenizer([prompt], return_tensors="pt").to('cuda')
def eval_questions(args, questions, model, tokenizer, processor, engine):
results = {}
vanilla_succ = []
suffix_succ = []
with torch.no_grad():
for _, row in questions.iterrows():
# Vanilla instruction
vanilla_pred = text_only_inference(model, tokenizer, row['goal'], args)
vanilla_attack = utils.check_success_attack(vanilla_pred)
vanilla_succ.append(vanilla_attack)
vanilla_asr = round(np.mean(vanilla_succ) * 100, 2)
suffix_asr = round(np.mean(suffix_succ) * 100, 2)
print(f'Final Vanilla ASR: {vanilla_asr}%')
print(f'Final Suffix Injection ASR: {suffix_asr}%')
return results
if name == "main":
args = parse_args()
all_questions = load_data(args)
for engine in args.engine:
model, tokenizer, processor = model_utils.load_model(model_mappings[engine], args)
print(f"Loaded model: {engine}\n")
The text was updated successfully, but these errors were encountered: