Paper History (2)

There are too many LLM-related papers now, so I'm collecting and tracking LLM-focused papers separately here.

Previous page: https://ai-information.blogspot.com/2022/05/paper-history.html

* To read

  • To read (high priority)
    • Backbone / service models
      • Llama 2: Open Foundation and Fine-Tuned Chat Models, Preprint 2023
      • Orion-14B: Open-source Multilingual Large Language Models
      • Mistral 7B
      • Yi: Open Foundation Models by 01.AI
      • SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
      • The Claude 3 Model Family: Opus, Sonnet, Haiku
      • Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
      • Gemma: Open Models Based on Gemini Research and Technology
    • LLM inference
      • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
      • Transfer Q⋆: Principled Decoding for LLM Alignment
      • Think before you speak: Training Language Models With Pause Tokens, ICLR 2024
    • How do Large Language Models Handle Multilingualism?
    • DeepSpeed / vLLM
    • LIMA: Less Is More for Alignment, Preprint 2023
    • RoPE
    • Chain-of-Thought Reasoning Without Prompting
    • Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
    • Scaling Instruction-Finetuned Language Models
    • Stealing Part of a Production Language Model
    • Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
    • Contrastive Post-training Large Language Models on Data Curriculum
    • CITING: Large Language Models Create Curriculum for Instruction Tuning
  • To read (lower priority)
    • Small Models are Valuable Plug-ins for Large Language Models
    • Let's Verify Step by Step, OpenAI 2023
    • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023
    • Challenges and Applications of Large Language Models
    • Self-Alignment with Instruction Backtranslation
    • STRUC-BENCH: Are Large Language Models Really Good at Generating Complex Structured Data?
  • Hallucination
  • Finding papers to read
    • https://github.com/jxzhangjhu/Awesome-LLM-RAG
    • Large Language Models: A Survey
    • https://github.com/Hannibal046/Awesome-LLM?tab=readme-ov-file
    • https://github.com/dair-ai/ML-Papers-of-the-Week
    • https://www.promptingguide.ai/papers
  • Papers I co-authored

1. PLM

1.1 PLM Models

  1. (ELMo) Deep contextualized word representations. NAACL 2018. [pdf] [project]
  2. (GPT) Improving Language Understanding by Generative Pre-Training. Preprint. [pdf] [project]
  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. [code & model] [post]
  4. (GPT-2) Language Models are Unsupervised Multitask Learners. Preprint. [code] [post]
  5. (MT-DNN) Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019. [code & model] [post]
  6. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019. [code & model] [post]
  7. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL 2020. [post]
  8. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020. [pdf]
  9. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. ICLR 2020. [post]
  10. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 2020. [code] [post]
  11. Pre-training via Paraphrasing. NeurIPS 2020. [post]
  12. Cloze-driven Pretraining of Self-attention Networks. EMNLP 2019. [post]
  13. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data. [post]
  14. Reformer: The Efficient Transformer. [post]
  15. Linformer: Self-Attention with Linear Complexity. [post]
  16. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. EMNLP 2020. [post]

1.2 Knowledge Distillation & Model Compression

  1. TinyBERT: Distilling BERT for Natural Language Understanding. Preprint. [code & model] [post]
  2. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Preprint. [post]
  3. Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019. [code] [post]
  4. Small and Practical BERT Models for Sequence Labeling. EMNLP 2019. [post]
  5. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020. [post]
  6. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Preprint. [post]

1.3 Analysis

  1. Language Models as Knowledge Bases? EMNLP 2019. [code] [post]
  2. Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models. EMNLP 2020. [post]

2. LLM

2.1 LLM Models

  • (GPT-3) Language Models are Few-Shot Learners [post]
  • (InstructGPT) Training language models to follow instructions with human feedback, OpenAI 2022.03 [post]
  • GPT-4 Technical Report, OpenAI [post]
  • LLaMA: Open and Efficient Foundation Language Models, Preprint 2023 [post]

2.2 Knowledge Distillation & Model Compression

  1. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, Findings of ACL 2023 [post]
  2. Large Language Models Are Reasoning Teachers, ACL 2023 [post]

2.3 Analysis

  1. The False Promise of Imitating Proprietary LLMs, Preprint 2023 [post]
  2. Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning, ACL 2023 [post]

2.4 LLM Evaluator

  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, EAMT 2023 [post]
  • Can Large Language Models Be an Alternative to Human Evaluations?, ACL 2023 [post]
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023 [post]

2.5 LLM + RAG

  • RAFT: Adapting Language Model to Domain Specific RAG, Preprint 2024 [post]
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, ICLR 2024 [post]
  • To screen & read closely
    • RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks 
    • FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
    • RETRO: Improving Language Models by Retrieving from Trillions of Tokens
    • FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
    • Atlas: Few-shot Learning with Retrieval Augmented Language Models
    • Re2G: Retrieve, Rerank, Generate
    • Active Retrieval Augmented Generation
    • Corrective Retrieval Augmented Generation
    • Learning to Filter Context for Retrieval-Augmented Generation
    • COCOM: Context Embeddings for Efficient Answer Generation in RAG

2.6 Hallucination

  1. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, Preprint 2024 [post]
  2. How Language Model Hallucinations Can Snowball, ICML 2024 [post]

2.7 LLM + data

  • Large Language Models for Data Annotation: A Survey, Preprint 2024 [post]

2.8 Scaling inference

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, Preprint 2024 [post]
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, Preprint 2024 [post]
  • Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, ICLR 2025

2.9 LLM reasoner

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [post]
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, COLM 2024 [post]

2.10 Alignment learning

  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model [ref]
  • KTO: Model Alignment as Prospect Theoretic Optimization [ref]
  • ORPO: Monolithic Preference Optimization without Reference Model [ref]
  • Don't Use Your Data All at Once, COLING 2025 [ref]

2.99 Miscellaneous

  1. LoRA: Low-Rank Adaptation of Large Language Models, ICLR 2022 [post]
  2. Taxonomy and Analysis of Sensitive User Queries in Generative AI Search, Review (NAVER) [post]
  3. Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling, LREC-COLING 2024 [post]