
Kaggle competition repository for LLMs - You Can't Please Them All.

zixi-liu/LLMs-You-Cant-Please-Them-All

A Collection of Papers on LLM-as-a-Judge

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

  • MT-Bench
  • Chatbot Arena
  • Potential limitations of the LLM-as-a-judge approach
    • Position bias: an LLM exhibits a propensity to favor certain positions over others.
      • It could be rooted in the training data or be inherent to the left-to-right architecture of causal transformers.
    • Verbosity bias: an LLM judge favors longer, verbose responses, even if they are not as clear, high-quality, or accurate as shorter alternatives.
    • Self-enhancement bias: an LLM judge may favor the answers generated by itself.
  • Three LLM-as-a-judge variations (prompt sketches for each follow this list)
    • Pairwise comparison. An LLM judge is presented with a question and two answers, and tasked to determine which one is better or declare a tie.
      • may lack scalability when the number of players increases
    • Single answer grading. Alternatively, an LLM judge is asked to directly assign a score to a single answer.
      • may be unable to discern subtle differences between specific pairs, and its results may become unstable, as absolute scores are likely to fluctuate more than relative pairwise results if the judge model changes.
    • Reference-guided grading. In certain cases, it may be beneficial to provide a reference solution alongside the answer being graded.
  • Advantages of LLM-as-a-Judge
    • scalability and explainability
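
Below is a minimal prompt sketch of the three variations. It assumes a placeholder call_llm helper standing in for whatever chat-completion client is actually used; the prompt wording is illustrative and is not the exact MT-Bench judge prompt.

```python
# Minimal sketch of the three judge variations described above.
# call_llm is a placeholder for an actual chat-completion client;
# the prompt wording is illustrative, not the exact MT-Bench prompts.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")


def pairwise_comparison(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge to pick the better of two answers or declare a tie."""
    prompt = (
        "You are an impartial judge. Given the question and two answers, "
        "reply with 'A', 'B', or 'tie'.\n\n"
        f"Question: {question}\nAnswer A: {answer_a}\nAnswer B: {answer_b}"
    )
    return call_llm(prompt)


def single_answer_grading(question: str, answer: str) -> str:
    """Ask the judge to assign an absolute score to a single answer."""
    prompt = (
        "Rate the following answer on a scale of 1 to 10. "
        "Reply with only the number.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    return call_llm(prompt)


def reference_guided_grading(question: str, answer: str, reference: str) -> str:
    """Ask the judge to grade an answer against a reference solution."""
    prompt = (
        "Using the reference solution as a guide, rate the answer on a "
        "scale of 1 to 10. Reply with only the number.\n\n"
        f"Question: {question}\nReference: {reference}\nAnswer: {answer}"
    )
    return call_llm(prompt)
```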

Large Language Models are not Fair Evaluators

LLM Evaluators Recognize and Favor Their Own Generations

Universal Adversarial Triggers for Attacking and Analyzing NLP

Universal and Transferable Adversarial Attacks on Aligned Language Models

Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

How AI Evaluates the Value and Quality of an Article: A Multimodal Joint Model (AI如何评估文章的价值和质量:一个多模态的联合模型)

Key Techniques to Induce Disagreement Among LLM Judges

1. Use of Trigger Words

Emotionally Charged Words:

Words like "atrocious," "magnificent," or "revolutionary" can evoke strong reactions in LLMs, influencing their judgment of tone. For example, describing an event as a "tragic loss" rather than an "unexpected failure" in a reflective essay might shift the perceived tone or impact.

Jargon and Buzzwords:

Overuse of industry-specific jargon (e.g., "blockchain," "quantum supremacy") can lead a judge to overestimate an essay's sophistication, while simplified language can be perceived as lower complexity.
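
As a rough illustration of this technique, the sketch below swaps neutral words for emotionally charged alternatives so that the plain and charged variants of an essay can be scored side by side. The substitution table is a made-up example, and single_answer_grading refers to the hypothetical helper sketched earlier.

```python
# Illustrative sketch of the trigger-word technique: replace neutral wording
# with emotionally charged alternatives and compare how a judge scores the
# two variants. The substitution table below is a made-up example.

CHARGED_SUBSTITUTIONS = {
    "bad": "atrocious",
    "good": "magnificent",
    "new": "revolutionary",
    "failure": "tragic loss",
}


def inject_trigger_words(essay: str, substitutions: dict[str, str] = CHARGED_SUBSTITUTIONS) -> str:
    """Return a copy of the essay with neutral words replaced by charged ones."""
    for neutral, charged in substitutions.items():
        essay = essay.replace(neutral, charged)
    return essay


# Usage idea (single_answer_grading is the hypothetical helper from the
# earlier sketch):
#   plain = "The launch was a failure, though the underlying idea was good."
#   charged = inject_trigger_words(plain)
#   print(single_answer_grading("Evaluate this essay.", plain))
#   print(single_answer_grading("Evaluate this essay.", charged))
```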

2. Manipulate Style and Structure

Sentence Length:

Mixing extremely short and extremely long sentences can confuse evaluations of readability and coherence. For example: "This is important. Nevertheless, the amalgamation of factors necessitates an examination of underlying complexities."

Unusual Formatting:

Irregular use of line breaks, bullet points, or bolded text can trigger varied judgments of clarity and professionalism.

3. Contradictory Arguments

Introduce subtle contradictions within the essay to test logical consistency detection. Example: "Climate change is a pressing issue that requires immediate action. However, delaying efforts might reveal better technologies."

4. Incorporate Bias-Prone Topics

Choose subjects known to elicit biases in models due to societal or cultural sensitivities. Example: "Artificial intelligence will eliminate all creative jobs" might prompt a different response than "AI will assist artists in creating."

5. Present Fabricated Data or Citations

Invent statistics or reference fictional studies to see if the model verifies or challenges the claims. Example: "According to Dr. John Smith’s 2019 study, 75% of essays with passive voice are poorly received."

6. Overuse of Figures of Speech

Overload the essay with metaphors, analogies, or hyperboles to create ambiguity in meaning. Example: "The wind whispered secrets of the universe, veiling truths in the cloak of night."

7. Tone Switching

Switch between formal and informal tones abruptly to challenge the model’s judgment of tone consistency. Example: "This essay endeavors to elucidate the implications of globalization. BTW, it’s like a double-edged sword, you know?"

8. Purposefully Introduce Ambiguity

Use vague statements that can be interpreted differently based on context. Example: "Progress is a double-edged sword. It brings light, yet it casts shadows."
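
To tell whether any of these techniques actually split the judges, one simple proxy is to score the same essay with several judge models and look at the spread of their scores. The sketch below assumes judges is a list of callables that each return a numeric score for an essay; it is only an illustration, not the competition's official metric.

```python
# Sketch of a disagreement proxy: score one essay with several judges and
# report the spread of their scores. A larger spread suggests the techniques
# above (trigger words, contradictions, tone switching, ...) are working.

import statistics
from typing import Callable, Sequence


def disagreement(essay: str, judges: Sequence[Callable[[str], float]]) -> float:
    """Return the population standard deviation of the judges' scores."""
    scores = [judge(essay) for judge in judges]
    return statistics.pstdev(scores)


# Example: disagreement(essay, [judge_a, judge_b, judge_c]), where each
# judge_* is a hypothetical wrapper around a different judge model.
```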
