Paper History (2)

There are too many LLM-related papers now, so I'm collecting the LLM-focused papers separately here.

Previous page: https://ai-information.blogspot.com/2022/05/paper-history.html

To read

  • Scaling Inference
    • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
    • Small Language Models Need Strong Verifiers to Self-Correct Reasoning
  • LLM reasoner
    • Transfer Q⋆: Principled Decoding for LLM Alignment
    • Think before you speak: Training Language Models With Pause Tokens, ICLR 2024
  • Hallucination
    • Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
    • Knowledge Verification to Nip Hallucination in the Bud
    • Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
    • Investigating Data Contamination in Modern Benchmarks for Large Language Models
    • Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
    • Decoding-based
      • Factuality Enhanced Language Models for Open-Ended Text Generation, NeurIPS 2022
      • Contrastive Decoding Improves Reasoning in Large Language Models, Preprint 2023
      • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, NeurIPS 2023
      • Regularized Contrastive Decoding with Hard Negative Samples for Hallucination Mitigation
    • https://vr25.github.io/lrec-coling-hallucination-tutorial/
    • https://github.com/EdinburghNLP/awesome-hallucination-detection
    • https://github.com/LuckyyySTA/Awesome-LLM-hallucination
    • https://github.com/HillZhang1999/llm-hallucination-survey
    • https://github.com/ThuCCSLab/Awesome-LM-SSP/blob/main/collection/paper/safety/hallucination.md
  • decoding strategy
    • Automating Thought of Search: A Journey Towards Soundness and Completeness
    • Stream of Search (SoS): Learning to Search in Language
    • Chain-of-Thought Reasoning Without Prompting
    • Fast Inference from Transformers via Speculative Decoding
  • Self-learning
    • Self-Instruct: Aligning Language Models with Self-Generated Instructions, ACL 2023
    • Large Language Models Can Self-Improve, EMNLP 2023
    • SELF: Self-Evolution with Language Feedback, Preprint 2024
    • Beyond Human Data: Reinforced Self-Training for Large Language Models, TMLR 2024
    • SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, ICML 2024
    • Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge, Preprint 2024
    • Can Large Reasoning Models Self-Train?, Preprint 2025
    • SCoRe: Training Language Models to Self-Correct via Reinforcement Learning, ICLR 2025
  • Safety
    • https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers?tab=readme-ov-file
    • Base LLMs refuse too
    • Deduplicating Training Data Mitigates Privacy Risks in Language Models, ICML 2022
    • Ignore Previous Prompt: Attack Techniques For Language Models, NeurIPS ML Safety Workshop
    • Jailbreaking Black Box Large Language Models in Twenty Queries, 2023
    • Jailbroken: How Does LLM Safety Training Fail?, NeurIPS 2023
    • Safety of Multimodal Large Language Models on Images and Texts, IJCAI 2024
    • Qwen3Guard Technical Report
    • Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
    • https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard2/MODEL_CARD.md
    • WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
    • ShieldGemma: Generative AI Content Moderation Based on Gemma, Google 2024
    • ShieldGemma 2: Robust and Tractable Image Content Moderation, Google 2025
    • Blog posts
      • https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards/
      • https://openai.com/index/deliberative-alignment/
      • https://openai.com/index/introducing-gpt-oss-safeguard/
      • https://www.anthropic.com/news/activating-asl3-protections
    • Dataset
      • RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, Findings of EMNLP 2020
      • TruthfulQA: Measuring How Models Mimic Human Falsehoods, ACL 2022
      • ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection, ACL 2022
  • Finding papers to read
    • https://github.com/IAAR-Shanghai/ICSFSurvey
    • https://github.com/Hannibal046/Awesome-LLM?tab=readme-ov-file
    • https://github.com/dair-ai/ML-Papers-of-the-Week
    • https://www.promptingguide.ai/papers
  • Papers I co-authored

1. PLM

1.1 PLM Models

  1. (ELMo) Deep contextualized word representations. NAACL 2018. [pdf] [project]
  2. (GPT) Improving Language Understanding by Generative Pre-Training. Preprint. [pdf] [project]
  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. [code & model] [post]
  4. (GPT-2) Language Models are Unsupervised Multitask Learners. Preprint. [code] [post]
  5. (MT-DNN) Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019. [code & model] [post]
  6. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019. [code & model] [post]
  7. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL 2020. [post]
  8. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020. [pdf]
  9. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. ICLR 2020. [post]
  10. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 2020. [code] [post]
  11. Pre-training via Paraphrasing. NeurIPS 2020. [post]
  12. Cloze-driven Pretraining of Self-attention Networks, EMNLP 2019 [post]
  13. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data. [post]
  14. Reformer: The Efficient Transformer. [post]
  15. Linformer: Self-Attention with Linear Complexity. [post]
  16. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention, EMNLP 2020 [post]

1.2 Knowledge Distillation & Model Compression

  1. TinyBERT: Distilling BERT for Natural Language Understanding. Preprint. [code & model] [post]
  2. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Preprint. [post]
  3. Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019. [code] [post]
  4. Small and Practical BERT Models for Sequence Labeling. EMNLP 2019. [post]
  5. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020. [post]
  6. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Preprint. [post]

1.3 Analysis

  1. Language Models as Knowledge Bases? EMNLP 2019. [code] [post]
  2. Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models. EMNLP 2020. [post]

2. LLM

2.1 LLM Models

  • (GPT-3) Language Models are Few-Shot Learners [post]
  • (InstructGPT) Training language models to follow instructions with human feedback, OpenAI 2022.03 [post]
  • GPT-4 Technical Report, OpenAI [post]
  • LLaMA: Open and Efficient Foundation Language Models, Preprint 2023 [post]

2.2 Knowledge Distillation & Model Compression

  1. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, Findings of ACL 2023 [post]
  2. Large Language Models Are Reasoning Teachers, ACL 2023 [post]
  3. Compact Language Models via Pruning and Knowledge Distillation, NeurIPS 2024 [post]
  4. LLM Pruning and Distillation in Practice: The Minitron Approach, Preprint 2024 [post]

2.3 Analysis

  1. The False Promise of Imitating Proprietary LLMs, Preprint 2023 [post]
  2. Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning, ACL 2023 [post]

2.4 LLM Evaluator

  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, EAMT 2023 [post]
  • Can Large Language Models Be an Alternative to Human Evaluations?, ACL 2023 [post]
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023 [post]

2.5 LLM + RAG

  • RAFT: Adapting Language Model to Domain Specific RAG, Preprint 2024 [post]
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, ICLR 2024 [post]
  • Screening [post] (a minimal retrieve-then-generate sketch follows this list)
    • RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks 
    • FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
    • RETRO: Improving Language Models by Retrieving from Trillions of Tokens
    • FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
    • Atlas: Few-shot Learning with Retrieval Augmented Language Models
    • Re2G: Retrieve, Rerank, Generate
    • Active Retrieval Augmented Generation
    • Corrective Retrieval Augmented Generation
    • Learning to Filter Context for Retrieval-Augmented Generation
    • COCOM: Context Embeddings for Efficient Answer Generation in RAG
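The papers screened above all start from the same retrieve-then-generate baseline and then improve one stage of it (FiD fuses passages in the decoder, Re2G adds reranking, the filtering and corrective papers clean or redo the retrieval). A minimal sketch of that baseline; `retriever.search` and `llm.generate` are hypothetical placeholders, not any particular library's API:

```python
def rag_answer(question, retriever, llm, k=5):
    """Plain retrieve-then-generate baseline (illustrative sketch).

    retriever.search(question, k) -> list of passage strings  (hypothetical API)
    llm.generate(prompt)          -> answer string             (hypothetical API)
    """
    passages = retriever.search(question, k=k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the passages below. "
        "If they are not sufficient, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.generate(prompt)
```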

2.6 Hallucination

  1. How Language Model Hallucinations Can Snowball, ICML 2024 [post]
  2. Fine-grained Hallucination Detection and Editing for Language Models, COLM 2024 [post] [own dataset]
  3. Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild, EMNLP Industry 2024 [post]
  4. Reducing hallucination in structured outputs via Retrieval-Augmented Generation, NAACL Industry 2024 [post]
  5. Why Language Models Hallucinate, OpenAI 2025
  6. How do language models learn facts? Dynamics, curricula and hallucinations, DeepMind 2025
  7. Harmful Factuality Hallucination: LLMs Correcting What They Shouldn’t, ARR 2025.10
  8. Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models, ARR 2025.10
  9. Large Language Models Must Be Taught to Know What They Don’t Know, NeurIPS 2024 [post]
  10. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting, NeurIPS 2023 [post]

2.6.1 Datasets

  1. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models, EMNLP 2023 [post]
  2. FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs, Preprint 2024 [post]

2.6.2 Reference-free detection

  1. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023 [post] [own dataset]
  2. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation, Preprint 2023 [post] [own dataset (private)]
  3. SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, Findings of EMNLP 2023 [post, QA data]
  4. Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024 [post] [own dataset]
  5. Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation, ACL 2024 [post] [own dataset (private), TruthfulQA, BioGEN]
  6. Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus, EMNLP 2023 [post] [SelfCheckGPT data]
  7. Zero-Resource Hallucination Prevention for Large Language Models, Findings of EMNLP 2024 [post] [own dataset (private), pre-detection]
  8. Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification, Findings of ACL 2024 [post] [own dataset (private)]
  9. InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers, ACL 2024 [post] [books & movies]
  10. LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, ICLR 2025 [post]
  11. HaDeMiF: Hallucination Detection and Mitigation in Large Language Models, ICLR 2025 [post]

2.6.3 Decoding for Mitigating Hallucination

  • Contrastive decoding: Open-ended text generation as optimization, ACL 2023 [post] [wikinews, wikitext-103, bookcorpus] [[cc_news](https://huggingface.co/datasets/vblagoje/cc_news), [wikitext-103-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext)] (scoring sketch after this list)
  • DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models, ICLR 2024 [reference post 1] [reference post 2] [reference YouTube] [TruthfulQA (MC, Gen), FACTOR, StrQA, GSM8k]
  • CAD: Trusting Your Evidence: Hallucinate Less with Context-aware Decoding, NAACL 2024 [post] [CNN-DM, XSUM]
  • Integrative Decoding: Improve Factuality via Implicit Self-consistency, ICLR 2025 [post] [TruthfulQA, Biographies, LongFact]
  • Delta: Contrastive Decoding Mitigates Text Hallucinations in Large Language Models, Preprint 2025 [post] [SQuAD v1.1, SQuAD v2, TriviaQA, Natural Questions]
  • Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models, COLING 2025 [post] [TruthfulQA (MC), FACTOR (MC)]
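The contrastive entries above (Contrastive Decoding, DoLa, Delta) share one scoring idea: rank next tokens by how much more likely the strong model finds them than a weaker contrast distribution (a smaller model, an early layer, or a context-free pass), restricted to tokens the strong model itself deems plausible. A rough numpy sketch of that rule; the model pair and the threshold alpha are illustrative assumptions, not any single paper's exact recipe:

```python
import numpy as np

def contrastive_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Contrastive-decoding-style token scores (illustrative sketch).

    expert_logprobs / amateur_logprobs: 1-D arrays of next-token log-probs
    from the strong model and the weak contrast model (for DoLa the
    'amateur' would instead be an early layer of the same model).
    """
    # Plausibility constraint: only keep tokens the expert itself finds likely.
    cutoff = np.log(alpha) + expert_logprobs.max()
    plausible = expert_logprobs >= cutoff

    scores = np.full_like(expert_logprobs, -np.inf)
    # Contrastive objective: expert log-prob minus amateur log-prob.
    scores[plausible] = expert_logprobs[plausible] - amateur_logprobs[plausible]
    return scores  # argmax / sample over these instead of raw expert log-probs

# Toy example over a 4-token vocabulary
expert = np.log(np.array([0.50, 0.30, 0.15, 0.05]))
amateur = np.log(np.array([0.40, 0.40, 0.10, 0.10]))
print(contrastive_scores(expert, amateur).argmax())  # token 2: boosted most vs. amateur
```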

2.6.4 Sampling or Regeneration for Mitigating Hallucination

  • Towards Mitigating Hallucination in Large Language Models via Self-Reflection, Findings of EMNLP 2023 [post] [PubMedQA, MedQuAD, MEDIQA2019, LiveMedQA2017, MASH-QA]
  • SR: Self-refine: Iterative refinement with self-feedback, NeurIPS 2023 [post] [Dialogue Response Generation, Code Optimization, Code Readability Improvement, Math Reasoning, Sentiment Reversal]
  • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, NeurIPS 2023 [post] [NQ, TriviaQA, MMLU]
  • Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024 [post] [own dataset]
  • USC: Universal self-consistency for large language model generation, ICML workshop 2024 [post] [GSM8K, MATH Reasoning, TruthfulQA, BIRD-SQL, ARCADE, GovReport] (majority-vote sketch after this list)
  • UCS: Lightweight reranking for language model generations, ACL 2024 [post] [Xsum, MiniF2F, WMT14]
  • SE-SL, SE-RG: Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation, ACL 2024 [post] [HumanEval, HumanEval+, BIRD-SQL, DailyMail, SummScreen, GSM8K, MATH]
  • FSC: Improving LLM Generations via Fine-Grained Self-Endorsement, Findings of ACL 2024 [post] [Biographies, TriviaQA]
  • Self-Consistent Decoding for More Factual Open Responses, Preprint 2024 [post]
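Most of the sampling methods above reduce to the same loop: draw several answers at non-zero temperature and keep whichever answer the samples agree on. A minimal majority-vote sketch, with `generate` standing in for whatever LLM call you use (USC and the fine-grained variants replace the exact-match vote with an LLM- or segment-level consistency check):

```python
from collections import Counter

def self_consistent_answer(generate, prompt, n_samples=8, temperature=0.7):
    """Sample n answers and return the most frequent one (self-consistency).

    `generate(prompt, temperature)` is a placeholder; real systems normalize
    answers (strip reasoning, canonicalize numbers) before voting.
    """
    samples = [generate(prompt, temperature=temperature) for _ in range(n_samples)]
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / n_samples  # agreement ratio doubles as a confidence score

# Toy usage with a fake generator
import random
fake = lambda p, temperature: random.choice(["Paris", "Paris", "Paris", "Lyon"])
print(self_consistent_answer(fake, "Capital of France?"))
```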

2.6.5 Training for Mitigating Hallucination (including IDK)

  • Language Models (Mostly) Know What They Know, Anthropic 2022 [post]
  • Do Large Language Models Know What They Don’t Know?, Findings of ACL 2023 [post] [SelfAware]
  • Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, EMNLP 2024 [post]
  • Fine-tuning Language Models for Factuality, ICLR 2024 [post]
  • I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token, NeurIPS 2024 [post]
  • Alignment for Honesty, NeurIPS 2024 [post]
  • R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’, NAACL 2024 [post]
  • Understanding Finetuning for Factual Knowledge Extraction, ICML 2024 [post]
  • Unfamiliar Finetuning Examples Control How Language Models Hallucinate, NAACL 2025 [post]
  • Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning, ACL 2025 [post]
  • Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning, Findings of ACL 2025 [post]
  • KaFT: Knowledge-aware Fine-tuning for Boosting LLMs’ Domain-specific Question-Answering Performance, Findings of ACL 2025 [post]

2.7 Scaling inference

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, Preprint 2024 [post] (coverage sketch after this list)
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, Preprint 2024 [post]
  • Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, ICLR 2025 [post]
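The repeated-sampling results above are driven by coverage: if one sample solves a problem with probability p and samples were independent, N samples contain at least one correct answer with probability 1 - (1 - p)^N, so weak per-sample accuracy turns into high coverage at large N, provided a verifier (unit tests, a reward model) can pick out the correct sample. A tiny illustration:

```python
def coverage(p_single, n_samples):
    """Chance that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p_single) ** n_samples

for n in (1, 10, 100, 1000):
    print(n, round(coverage(0.02, n), 3))
# 1 0.02 | 10 0.183 | 100 0.867 | 1000 1.0
```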

2.8 LLM reasoner

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [post]
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, COLM 2024 [post]

2.9 Alignment learning

  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model [ref] (loss sketch after this list)
  • KTO: Model Alignment as Prospect Theoretic Optimization [ref]
  • ORPO: Monolithic Preference Optimization without Reference Model [ref]
  • Don't Use Your Data All at Once, COLING 2025 [ref]
  • SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning, Preprint 2025 [post]
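As a quick reference, a per-example sketch of the DPO objective from the first entry above: push the policy's log-probability margin between the chosen and rejected response above that of a frozen reference model. The beta value and toy log-probs below are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss (sketch):
    -log sigmoid( beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)] )

    logp_w / logp_l: summed token log-probs of the chosen / rejected response
    under the trainable policy; ref_logp_*: the same under the frozen
    reference model. beta controls how far the policy may drift from it.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Toy numbers: the policy already prefers the chosen answer more than the
# reference does, so the loss falls below log(2) ~= 0.693.
print(round(dpo_loss(-12.0, -15.0, -13.0, -14.0), 3))  # ~0.598
```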

2.10 Prompting

  • Making Pre-trained Language Models Better Few-shot Learners, ACL 2021 [post]
  • Exploring the Universal Vulnerability of Prompt-based Learning Paradigm, NAACL 2022 [post]
  • Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT, INLG 2022 [post]
  • Contrastive Chain-of-Thought Prompting, Preprint 2024 [post]

2.12 Further (Continual) training

2.12.1 Language Transfer

  • Extrapolating Large Language Models to Non-English by Aligning Languages, Preprint 2023 [ref]
  • Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer, Findings of NAACL 2024 [post]
  • LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation, Findings of ACL 2024 [post]
  • Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca, Preprint 2023 [post]
  • Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models, Preprint 2024 [post]
  • RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining, Preprint 2024 [post]
  • Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus, NVIDIA 2024 [post]

2.12.2 Domain Transfer

  • BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, Preprint 2022 [post]
  • Continual pre-training of language models, ICLR 2023 [post]
  • Efficient continual pre-training for building domain specific large language models, Findings of ACL 2024 [post]
  • Med-PaLM: Large language models encode clinical knowledge, Nature 2023 [post]
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models, Nature Medicine 2025 [post]

2.13 Multilingual LLM

  • A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models, ICLR 2024 [ref]
  • Cross-Lingual Supervision improves Large Language Models Pre-training, Preprint 2023 [post]

2.13.1 Consistency

  • Beneath the Surface of Consistency: Exploring Cross-Lingual Knowledge Representation Sharing in LLMs, Preprint 2024 [post]
  • CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment, SUMEval 2025 [post]
  • Align after Pre-train: Improving Multilingual Generative Models with Cross-Lingual Alignment, Preprint [post]

2.99 Miscellaneous

  1. LoRA: Low-Rank Adaptation of Large Language Models, ICLR 2022 [post]
  2. Taxonomy and Analysis of Sensitive User Queries in Generative AI Search, Review (NAVER) [post]
  3. Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling, LREC-COLING 2024 [post]
  4. Large Language Models for Data Annotation: A Survey, Preprint 2024 [post]

3. Safety

  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Anthropic 2022 [post]
  • Constitutional AI: Harmlessness from AI Feedback, Anthropic 2022 [post]
  • Rule Based Rewards for Language Model Safety, NeurIPS 2024 (OpenAI) [post]
  • Deliberative Alignment: Reasoning Enables Safer Language Models, OpenAI 2024 [post]
  • Safety Alignment Should be Made More Than Just a Few Tokens Deep, ICLR 2025 (oral) [post]
  • gpt-oss-120b & gpt-oss-20b Model Card, OpenAI 2025 [post]
  • Technical Report: Performance and baseline evaluations of gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, OpenAI 2025 [post]
