Paper History (2)

LLM-related papers have become too numerous, so I'm collecting the LLM-focused papers separately here.

Previous page: https://ai-information.blogspot.com/2022/05/paper-history.html

To read

  • Scaling Inference
    • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
    • Small Language Models Need Strong Verifiers to Self-Correct Reasoning
  • LLM reasoner
    • Transfer Q⋆: Principled Decoding for LLM Alignment
    • Think before you speak: Training Language Models With Pause Tokens, ICLR 2024
  • Omni
    • LLaMA-Omni: Seamless Speech Interaction with Large Language Models
    • OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
    • Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
    • VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
    • Qwen2.5-Omni Technical Report
  • Hallucination
    • Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
    • Knowledge Verification to Nip Hallucination in the Bud
    • Decoding-based
      • Factuality Enhanced Language Models for Open-Ended Text Generation, NeurIPS 2022
      • Contrastive Decoding Improves Reasoning in Large Language Models, Preprint 2023
      • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, NeurIPS 2023
      • A Single Model Ensemble Framework for Neural Machine Translation using Pivot Translation, Preprint 2025
      • Regularized Contrastive Decoding with Hard Negative Samples for Hallucination Mitigation
    • Training-based
    • Self-training
      • Self-Instruct: Aligning Language Models with Self-Generated Instructions, ACL 2023
      • Large Language Models Can Self-Improve, EMNLP 2023
      • SELF: Self-Evolution with Language Feedback, Preprint 2024
      • Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models, TMLR 2024
      • SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, ICML 2024
      • Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge, Preprint 2024
      • Can Large Reasoning Models Self-Train?, Preprint 2025
      • SCoRe: Training Language Models to Self-Correct via Reinforcement Learning, ICLR 2025
    • https://vr25.github.io/lrec-coling-hallucination-tutorial/
    • https://github.com/EdinburghNLP/awesome-hallucination-detection
    • https://github.com/LuckyyySTA/Awesome-LLM-hallucination
    • https://github.com/HillZhang1999/llm-hallucination-survey
    • https://github.com/ThuCCSLab/Awesome-LM-SSP/blob/main/collection/paper/safety/hallucination.md
  • Decoding strategy
    • Automating Thought of Search: A Journey Towards Soundness and Completeness
    • Stream of Search (SoS): Learning to Search in Language
    • Chain-of-Thought Reasoning Without Prompting
    • Fast Inference from Transformers via Speculative Decoding
  • Continual pre-training
    • Investigating continual pretraining in large language models: Insights and implications, 2024.
      • BloombergGPT: A Large Language Model for Finance, Preprint 2023
      • FinGPT: Open-Source Financial Large Language Models, Preprint 2023
      • Galactica: A Large Language Model for Science, Preprint 2023
    • AceGPT: Localizing Large Language Models in Arabic, NAACL 2024
      • ALLaM: Large Language Models for Arabic and English, ICLR 2025
      • Reuse, Don’t Retrain: A Recipe for Continued Pretraining of Language Models, NVIDIA 2024
      • VRCP: Vocabulary Replacement Continued Pretraining for Efficient Multilingual Language Models, Sumeval 2025
      • DISTILLM: Towards Streamlined Distillation for Large Language Models, ICML 2024
  • Finding papers to read
    • https://github.com/IAAR-Shanghai/ICSFSurvey
    • https://github.com/Hannibal046/Awesome-LLM?tab=readme-ov-file
    • https://github.com/dair-ai/ML-Papers-of-the-Week
    • https://www.promptingguide.ai/papers
  • Papers I co-authored

1. PLM

1.1 PLM Models

  1. (ELMo) Deep contextualized word representations. NAACL 2018. [pdf] [project]
  2. (GPT) Improving Language Understanding by Generative Pre-Training. Preprint. [pdf] [project]
  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. [code & model] [post]
  4. (GPT-2) Language Models are Unsupervised Multitask Learners. Preprint. [code] [post]
  5. (MT-DNN) Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019. [code & model] [post]
  6. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019. [code & model] [post]
  7. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL 2020. [post]
  8. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020. [pdf]
  9. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. ICLR 2020. [post]
  10. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 2020. [code] [post]
  11. Pre-training via Paraphrasing. NeurIPS 2020. [post]
  12. Cloze-driven Pretraining of Self-attention Networks. EMNLP 2019. [post]
  13. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data. [post]
  14. Reformer: The Efficient Transformer. [post]
  15. Linformer: Self-Attention with Linear Complexity. [post]
  16. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. EMNLP 2020. [post]

1.2 Knowledge Distillation & Model Compression

  1. TinyBERT: Distilling BERT for Natural Language Understanding. Preprint. [code & model] [post]
  2. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Preprint. [post]
  3. Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019. [code] [post]
  4. Small and Practical BERT Models for Sequence Labeling. EMNLP 2019. [post]
  5. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020. [post]
  6. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Preprint. [post]

1.3 Analysis

  1. Language Models as Knowledge Bases? EMNLP 2019. [code] [post]
  2. Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models. EMNLP 2020. [post]

2. LLM

2.1 LLM Models

  • (GPT-3) Language Models are Few-Shot Learners [post]
  • (InstructGPT) Training language models to follow instructions with human feedback, OpenAI 2022.03 [post]
  • GPT-4 Technical Report, OpenAI [post]
  • LLaMA: Open and Efficient Foundation Language Models, Preprint 2023 [post]

2.2 Knowledge Distillation & Model Compression

  1. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, Findings of ACL 2023 [post]
  2. Large Language Models Are Reasoning Teachers, ACL 2023 [post]
  3. Compact language models via pruning and knowledge distillation, NeurIPS 2024 [post]
  4. LLM Pruning and Distillation in Practice: The Minitron Approach, Preprint 2024 [post]

2.3 Analysis

  1. The False Promise of Imitating Proprietary LLMs, Preprint 2023 [post]
  2. Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning, ACL 2023 [post]

2.4 LLM Evaluator

  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, EAMT 2023 [post]
  • Can Large Language Models Be an Alternative to Human Evaluations?, ACL 2023 [post]
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023 [post]

2.5 LLM + RAG

  • RAFT: Adapting Language Model to Domain Specific RAG, Preprint 2024 [post]
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, ICLR 2024 [post]
  • Screening [post]
    • RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    • FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
    • RETRO: Improving Language Models by Retrieving from Trillions of Tokens
    • FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
    • Atlas: Few-shot Learning with Retrieval Augmented Language Models
    • Re2G: Retrieve, Rerank, Generate
    • Active Retrieval Augmented Generation
    • Corrective Retrieval Augmented Generation
    • Learning to Filter Context for Retrieval-Augmented Generation
    • COCOM: Context Embeddings for Efficient Answer Generation in RAG

2.6 Hallucination

  1. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, EMNLP 2024 [post]
  2. How Language Model Hallucinations Can Snowball, ICML 2024 [post]
  3. Fine-grained Hallucination Detection and Editing for Language Models, COLM 2024 [post] [own dataset]
  4. Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild, EMNLP Industry 2024 [post]
  5. Reducing hallucination in structured outputs via Retrieval-Augmented Generation, NAACL Industry 2024 [post]

2.6.1 Datasets

  1. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models, EMNLP 2023 [post]
  2. FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs, Preprint 2024 [post]

2.6.2 Reference-free detection

  1. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023 [post] [own dataset]
  2. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation, Preprint 2023 [post] [own dataset (not public)]
  3. SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, Findings of EMNLP 2023 [post] [QA data]
  4. Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024 [post] [own dataset]
  5. Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation, ACL 2024 [post] [own dataset (not public), TruthfulQA, BioGEN]
  6. Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus, EMNLP 2023 [post] [SelfCheckGPT data]
  7. Zero-Resource Hallucination Prevention for Large Language Models, Findings of EMNLP 2024 [post] [own dataset (not public), pre-detection]
  8. Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification, Findings of ACL 2024 [post] [own dataset (not public)]
  9. InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers, ACL 2024 [post] [books & movies]
  10. LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, ICLR 2025 [post]
  11. HaDeMiF: Hallucination Detection and Mitigation in Large Language Models, ICLR 2025 [post]

2.6.3 Decoding for Mitigating Hallucination

  • Contrastive decoding: Open-ended text generation as optimization, ACL 2023 [post] [wikinews, wikitext-103, bookcorpus] [cc_news: https://huggingface.co/datasets/vblagoje/cc_news, wikitext-103-raw-v1: https://huggingface.co/datasets/Salesforce/wikitext]
  • DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models, ICLR 2024 [reference post 1] [reference post 2] [reference YouTube video] [TruthfulQA (MC, Gen), FACTOR, StrQA, GSM8K]
  • CAD: Trusting Your Evidence: Hallucinate Less with Context-aware Decoding, NAACL 2024 [post] [CNN-DM, XSUM]
  • Integrative Decoding: Improve Factuality via Implicit Self-consistency, ICLR 2025 [post] [TruthfulQA, Biographies, LongFact]
  • Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models, Preprint 2025 [post] [SQuAD v1.1, SQuAD v2, TriviaQA, Natural Questions]
  • Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models, COLING 2025 [post] [TruthfulQA (MC), FACTOR (MC)]

2.6.4 Sampling or Regeneration for Mitigating Hallucination

  • Towards Mitigating Hallucination in Large Language Models via Self-Reflection, Findings of EMNLP 2023 [post] [PubMedQA, MedQuAD, MEDIQA2019, LiveMedQA2017, MASH-QA]
  • SR: Self-refine: Iterative refinement with self-feedback, NeurIPS 2023 [post] [Dialogue Response Generation, Code Optimization, Code Readability Improvement, Math Reasoning, Sentiment Reversal]
  • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model, NeurIPS 2023 [post] [NQ, TriviaQA, MMLU]
  • Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024 [post] [own dataset]
  • USC: Universal self-consistency for large language model generation, ICML Workshop 2024 [post] [GSM8K, MATH Reasoning, TruthfulQA, BIRD-SQL, ARCADE, GovReport]
  • UCS: Lightweight reranking for language model generations, ACL 2024 [post] [XSum, MiniF2F, WMT14]
  • SE-SL, SE-RG: Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation, ACL 2024 [post] [HumanEval, HumanEval+, BIRD-SQL, DailyMail, SummScreen, GSM8K, MATH]
  • FSC: Improving LLM Generations via Fine-Grained Self-Endorsement, Findings of ACL 2024 [post] [Biographies, TriviaQA]
  • Self-Consistent Decoding for More Factual Open Responses, Preprint 2024 [post]

2.6.5 Training for Mitigating Hallucination (including IDK)

  • Language Models (Mostly) Know What They Know, Anthropic 2022 [post]
  • Do Large Language Models Know What They Don’t Know?, Findings of ACL 2023 [post] [SelfAware]
  • Fine-tuning Language Models for Factuality, ICLR 2024 [post]
  • I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token, NeurIPS 2024 [post]
  • Alignment for Honesty, NeurIPS 2024 [post]
  • R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’, NAACL 2024 [post]
  • Unfamiliar Finetuning Examples Control How Language Models Hallucinate, NAACL 2025 [post]

2.7 Scaling inference

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, Preprint 2024 [post]
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, Preprint 2024 [post]
  • Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, ICLR 2025 [post]

2.8 LLM reasoner

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [post]
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, COLM 2024 [post]

2.9 Alignment learning

  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model [reference]
  • KTO: Model Alignment as Prospect Theoretic Optimization [reference]
  • ORPO: Monolithic Preference Optimization without Reference Model [reference]
  • Don't Use Your Data All at Once, COLING 2025 [reference]
  • SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning, Preprint 2025 [post]

2.10 Prompting

  • Making Pre-trained Language Models Better Few-shot Learners, ACL 2021 [post]
  • Exploring the Universal Vulnerability of Prompt-based Learning Paradigm, NAACL 2022 [post]
  • Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT, INLG 2022 [post]
  • Contrastive Chain-of-Thought Prompting, Preprint 2024 [post]

2.12 Further (Continual) training

2.12.1 Language Transfer

  • Extrapolating Large Language Models to Non-English by Aligning Languages, Preprint 2023 [reference]
  • Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer, Findings of NAACL 2024 [post]
  • LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation, Findings of ACL 2024 [post]
  • Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca, Preprint 2023 [post]
  • Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models, Preprint 2024 [post]
  • RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining, Preprint 2024 [post]
  • Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus, NVIDIA 2024 [post]

2.12.2 Domain Transfer

  • BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, Preprint 2022 [post]
  • Continual pre-training of language models, ICLR 2023 [post]
  • Efficient continual pre-training for building domain specific large language models, Findings of ACL 2024 [post]
  • Med-PaLM: Large language models encode clinical knowledge, Nature 2023 [post]
  • Med-PaLM2: Towards Expert-Level Medical Question Answering with Large Language Models, Nature Medicine 2025 [post]

2.13 Multilingual LLM

  • A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models, ICLR 2024 [reference]
  • Cross-Lingual Supervision improves Large Language Models Pre-training, Preprint 2023 [post]

2.13.1 Consistency

  • Beneath the Surface of Consistency: Exploring Cross-Lingual Knowledge Representation Sharing in LLMs, Preprint 2024 [post]
  • CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment, Sumeval 2025 [post]
  • Align after Pre-train: Improving Multilingual Generative Models with Cross-Lingual Alignment, Preprint [post]

2.99 Miscellaneous

  1. LoRA: Low-Rank Adaptation of Large Language Models, ICLR 2022 [post]
  2. Taxonomy and Analysis of Sensitive User Queries in Generative AI Search, Review (NAVER) [post]
  3. Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling, LREC-COLING 2024 [post]
  4. Large Language Models for Data Annotation: A Survey, Preprint 2024 [post]

3. Multi / Omni Models

  • Ola: Pushing the Frontiers of Omni-Modal Language Model, Preprint 2025 [post]