Paper History (2)
There are now too many LLM-related papers, so I am collecting the LLM-focused papers separately here.
Previous page: https://ai-information.blogspot.com/2022/05/paper-history.html
* To read
- To read (high priority)
- Backbone / service models
- Llama 2: Open Foundation and Fine-Tuned Chat Models, Preprint 2023
- Orion-14B: Open-source Multilingual Large Language Models
- Mistral 7B
- Yi: Open Foundation Models by 01.AI
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
- The Claude 3 Model Family: Opus, Sonnet, Haiku
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
- Gemma: Open Models Based on Gemini Research and Technology
- LLM inference
- Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
- Transfer Q⋆: Principled Decoding for LLM Alignment
- Think before you speak: Training Language Models With Pause Tokens, ICLR 2024
- How do Large Language Models Handle Multilingualism?
- DeepSpeed / vLLM
- LIMA: Less Is More for Alignment, Preprint 2023
- RoPE
- Chain-of-Thought Reasoning Without Prompting
- Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
- Scaling Instruction-Finetuned Language Models
- Stealing Part of a Production Language Model
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- Contrastive Post-training Large Language Models on Data Curriculum
- CITING: Large Language Models Create Curriculum for Instruction Tuning
- To read (lower priority)
- Small Models are Valuable Plug-ins for Large Language Models
- Let's Verify Step by Step, OpenAI 2023
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023
- Challenges and Applications of Large Language Models
- Self-Alignment with Instruction Backtranslation
- STRUC-BENCH: Are Large Language Models Really Good at Generating Complex Structured Data?
- Hallucination
- https://vr25.github.io/lrec-coling-hallucination-tutorial/
- Cognitive Mirage: A Review of Hallucinations in Large Language Models (Ye et al., 2023)
- Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024
- Finding papers to read
- https://github.com/jxzhangjhu/Awesome-LLM-RAG
- Large Language Models: A Survey
- https://github.com/Hannibal046/Awesome-LLM?tab=readme-ov-file
- https://github.com/dair-ai/ML-Papers-of-the-Week
- https://www.promptingguide.ai/papers
- Papers I co-authored
1. PLM
- References
1.1 PLM Models
- (ELMo) Deep contextualized word representations. NAACL 2018. [pdf] [project]
- (GPT) Improving Language Understanding by Generative Pre-Training. Preprint. [pdf] [project]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. [code & model] [post]
- (GPT-2) Language Models are Unsupervised Multitask Learners. Preprint. [code] [post]
- (MT-DNN) Multi-Task Deep Neural Networks for Natural Language Understanding. ACL 2019. [code & model] [post]
- XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS 2019. [code & model] [post]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL 2020. [post]
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020. [pdf]
- Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. ICLR 2020. [post]
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 2020. [code] [post]
- Pre-training via Paraphrasing. NeurIPS 2020. [post]
- Cloze-driven Pretraining of Self-attention Networks. EMNLP 2019. [post]
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data. [post]
- Reformer: The Efficient Transformer. [post]
- Linformer: Self-Attention with Linear Complexity. [post]
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. EMNLP 2020. [post]
1.2 Knowledge Distillation & Model Compression
- TinyBERT: Distilling BERT for Natural Language Understanding. Preprint. [code & model] [post]
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Preprint. [post]
- Patient Knowledge Distillation for BERT Model Compression. EMNLP 2019. [code] [post]
- Small and Practical BERT Models for Sequence Labeling. EMNLP 2019. [post]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020. [post]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Preprint. [post]
1.3 Analysis
2. LLM
2.1 LLM Models
2.2 Knowledge Distillation & Model Compression
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, Findings of ACL 2023 [post]
- Large Language Models Are Reasoning Teachers, ACL 2023 [post]
2.3 Analysis
- The False Promise of Imitating Proprietary LLMs, Preprint 2023 [post]
- Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning, ACL 2023 [post]
2.4 LLM Evaluator
- Large Language Models Are State-of-the-Art Evaluators of Translation Quality, EAMT 2023 [post]
- Can Large Language Models Be an Alternative to Human Evaluations?, ACL 2023 [post]
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023 [post]
2.5 LLM + RAG
- RAFT: Adapting Language Model to Domain Specific RAG, Preprint 2024 [post]
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, ICLR 2024 [post]
- Screening & to review in detail
- RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- RETRO: Improving Language Models by Retrieving from Trillions of Tokens
- FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
- Re2G: Retrieve, Rerank, Generate
- Active Retrieval Augmented Generation
- Corrective Retrieval Augmented Generation
- Learning to Filter Context for Retrieval-Augmented Generation
- COCOM: Context Embeddings for Efficient Answer Generation in RAG
2.6 Hallucination
2.7 LLM + data
- Large Language Models for Data Annotation: A Survey, Preprint 2024 [post]
2.8 Scaling inference
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, Preprint 2024 [post]
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, Preprint 2024 [post]
- Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, ICLR 2025