A quick look at LLM fine-tuning

(2020) REALM: Retrieval-Augmented Language Model Pre-Training
  • https://www.youtube.com/watch?v=gtf770SDkX4
(2020) RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks 
  • https://www.youtube.com/watch?v=gtOdvAQk6YU
(2020) FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
  • https://www.youtube.com/watch?v=6D-dcBH3KdU
(2021) RETRO: Improving Language Models by Retrieving from Trillions of Tokens
  • https://youtu.be/AtqzO7-B1K0
  • https://littlefoxdiary.tistory.com/107
(2022) FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
  • https://www.youtube.com/watch?v=6D-dcBH3KdU
(2022) Atlas: Few-shot Learning with Retrieval Augmented Language Models
  • https://www.youtube.com/watch?v=U5vMXa8IYp4&t=2s
(2022) Re2G: Retrieve, Rerank, Generate
  • https://www.youtube.com/watch?v=QkR7nXErhtM
(2023) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  • https://ai-information.blogspot.com/2024/05/nl-215-self-rag-learning-to-retrieve.html
(2023) Active Retrieval Augmented Generation
(2023) Corrective Retrieval Augmented Generation
(2023) Learning to Filter Context for Retrieval-Augmented Generation
(2024) RAFT: Adapting Language Model to Domain Specific RAG
  • https://ai-information.blogspot.com/2024/04/nl-214-raft-adapting-language-model-to.html
(2024) COCOM: Context Embeddings for Efficient Answer Generation in RAG
  • https://velog.io/@khs0415p/Paper

RAG

  • (24.05) Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
    • https://aiforeveryone.tistory.com/52
  • (24.06) Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
    • https://velog.io/@videorighter/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0-Buffer-of-Thoughts-Thought-Augmented-Reasoning-with-Large-Language-Models
  • Survey
    • https://velog.io/@ash-hun/RAG-Survey-A-Survey-on-Retrieval-Augmented-Text-Generation-for-Large-LanguageModels-2024
  • https://www.youtube.com/@dsba2979/search?query=RAG
  • https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/research_updates/rag_research_table.md
(2022) PPO, Training language models to follow instructions with human feedback
  • https://dalpo0814.tistory.com/56
    • It looks like a minus sign is missing from L_ppo here; in any case, this blog and its figures explain PPO well (a minimal sketch of the clipped loss follows this entry)
    • https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py#L25
  • https://github.com/airobotlab/KoChatGPT
    • Also not a bad reference
    • https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py
  • https://huggingface.co/blog/stackllama#stackllama-a-hands-on-guide-to-train-llama-with-rlhf
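A minimal sketch of the clipped PPO policy loss in PyTorch, following the general shape of the ColossalAI loss linked above (the function name and signature are mine, not that repo's API):

```python
import torch

def ppo_policy_loss(log_probs: torch.Tensor,
                    old_log_probs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    # Importance ratio between the current policy and the rollout (old) policy.
    ratio = (log_probs - old_log_probs).exp()
    # PPO clipped surrogate objective.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The surrogate is maximized, so the loss is its negative.
    # This leading minus is the sign the blog post above seems to drop.
    return -torch.min(surr1, surr2).mean()
```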
(2023) DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  • https://dalpo0814.tistory.com/62
  • https://hi-lu.tistory.com/entry/Paper-DPO-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
  • https://rubato-yeong.github.io/language/dpo_vs_ppo/
  • PPO has a higher ceiling, but DPO is far simpler to implement and uses less memory, so it still has clear advantages (see the loss sketch after this entry)
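A minimal sketch of the DPO loss, assuming all inputs are sequence-level log-probabilities (log pi(y|x) summed over response tokens) from the policy and a frozen reference model; names are illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    # Implicit rewards: beta-scaled log-ratio of the policy against the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry style objective: push the chosen response's implicit reward
    # above the rejected one's. No reward model or RL rollout is needed.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```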
(2024) sDPO: Don't Use Your Data All at Once
  • https://ai-information.blogspot.com/2024/07/nl-222-sdpo-dont-use-your-data-all-at.html
(2024) KTO: Model Alignment as Prospect Theoretic Optimization
  • https://ebbnflow.tistory.com/386
  • https://ai-for-value.tistory.com/29
  • Instead of paired preference data, it only needs a label for whether each response is good or bad (i.e., a thumbs up or thumbs down), and it can be trained even on imbalanced data
  • If curated paired preference data is available, DPO is still the better choice (a sketch of the KTO objective follows this entry)
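From memory of the paper, the KTO objective is roughly the following (a sketch, not an exact reproduction of its notation). Each response y only needs a desirable/undesirable label, and the weights \lambda_D, \lambda_U are what allow imbalanced positive/negative data:

```latex
r_\theta(x,y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
z_0 = \mathrm{KL}\!\left(\pi_\theta(y' \mid x) \,\|\, \pi_{\mathrm{ref}}(y' \mid x)\right)

v(x,y) =
\begin{cases}
\lambda_D \, \sigma\!\left(\beta \,(r_\theta(x,y) - z_0)\right) & \text{if } y \text{ is desirable} \\
\lambda_U \, \sigma\!\left(\beta \,(z_0 - r_\theta(x,y))\right) & \text{if } y \text{ is undesirable}
\end{cases}

\mathcal{L}_{\mathrm{KTO}} = \mathbb{E}_{x,y}\!\left[\lambda_y - v(x,y)\right]
```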
(2024) ORPO: Monolithic Preference Optimization without Reference Model
  • https://data-newbie.tistory.com/988
  • It feels similar to the old idea of training an LM with an unlikelihood loss: it learns from preference data directly. That is, rather than running a separate alignment stage after SFT, the win/lose signal is folded into the training loss itself (see the sketch after this entry). But since training then only uses data that exists in pairs, doesn't the amount of usable data shrink a lot?
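A sketch of the ORPO objective as I understand it: a standard SFT (NLL) loss on the winning response plus an odds-ratio term that pushes the win's odds above the loss's, so there is no reference model and no separate alignment stage:

```latex
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)

\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right]
```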
automatic dataset
  • https://youtu.be/pk_DoGp4Af8?t=3185
(2023) phi-1: Textbooks Are All You Need
  • https://all-the-meaning.tistory.com/m/41
  • https://bnmy6581.tistory.com/232
  • Achieves strong performance using filtered StackOverflow data plus synthetic data generated with OpenAI's GPT models
  • Beats the comparison models despite the smaller model size and training data volume
  • However, the dataset was not released
(2024) Cosmopedia
  • https://huggingface.co/blog/cosmopedia
Rejection-sampling fine-tuning
  • How do we build preference data? (a rough sketch follows this entry)
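A rough sketch of how rejection sampling can yield both an SFT set and preference pairs; `generate` and `reward_model` are hypothetical placeholders, not a specific library API:

```python
def build_rejection_sampling_data(prompts, generate, reward_model, n_samples=8):
    """Sample n candidates per prompt, score them, keep the best (and worst)."""
    sft_data, preference_pairs = [], []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        scored = sorted(candidates, key=lambda resp: reward_model(prompt, resp))
        best, worst = scored[-1], scored[0]
        sft_data.append((prompt, best))                 # rejection-sampling FT set
        preference_pairs.append((prompt, best, worst))  # chosen/rejected pairs for DPO etc.
    return sft_data, preference_pairs
```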
LLM judge
  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, EAMT 2023 [post]
  • Can Large Language Models Be an Alternative to Human Evaluations?, ACL 2023 [post]
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023 [post]
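A simplified sketch of the LLM-as-judge idea behind these papers (the actual G-Eval recipe additionally generates evaluation steps and uses probability-weighted scoring); `call_llm` is a placeholder for whatever chat-completion client is in use:

```python
JUDGE_PROMPT = """You are evaluating a summary for coherence on a 1-5 scale.

Source:
{source}

Summary:
{summary}

Briefly explain your reasoning, then end with a line of the form "Score: <1-5>"."""

def judge(source: str, summary: str, call_llm) -> int:
    reply = call_llm(JUDGE_PROMPT.format(source=source, summary=summary))
    # Assumes the model followed the format and ended with "Score: N".
    return int(reply.rsplit("Score:", 1)[-1].strip()[0])
```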