A quick look at the LLM fine-tuning side
(2020) REALM: Retrieval-Augmented Language Model Pre-Training
(2020) RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
(2020) FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
(2022) FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
(2022) Atlas: Few-shot Learning with Retrieval Augmented Language Models
- https://www.youtube.com/watch?v=QkR7nXErhtM
(2023) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- https://ai-information.blogspot.com/2024/05/nl-215-self-rag-learning-to-retrieve.html
(2023) Active Retrieval Augmented Generation
(2023) Corrective Retrieval Augmented Generation
(2023) Learning to Filter Context for Retrieval-Augmented Generation
RAG
- (24.05) Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
- https://aiforeveryone.tistory.com/52
- (24.06) Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
- https://velog.io/@videorighter/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0-Buffer-of-Thoughts-Thought-Augmented-Reasoning-with-Large-Language-Models
- Survey
- https://velog.io/@ash-hun/RAG-Survey-A-Survey-on-Retrieval-Augmented-Text-Generation-for-Large-LanguageModels-2024
- https://www.youtube.com/@dsba2979/search?query=RAG
- https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/research_updates/rag_research_table.md
(2022) PPO, Training language models to follow instructions with human feedback
- https://dalpo0814.tistory.com/56
- It looks like the minus sign is missing from L_ppo here; either way, this blog post and its figures explain it well (a minimal sketch of the clipped PPO loss follows after these links)
- https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py#L25
- https://github.com/airobotlab/KoChatGPT
- This one is also worth a look
- https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py
- https://huggingface.co/blog/stackllama#stackllama-a-hands-on-guide-to-train-llama-with-rlhf
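Regarding the missing-minus-sign note above: a minimal sketch of the clipped PPO policy loss, in the same spirit as ColossalAI's PolicyLoss, assuming PyTorch and per-token log-probabilities. Maximizing the clipped surrogate is implemented by minimizing its negation, which is where the leading minus sign comes from.

```python
import torch

def ppo_policy_loss(log_probs: torch.Tensor,      # log pi_theta(a_t | s_t), shape (B, T)
                    old_log_probs: torch.Tensor,  # log pi_theta_old(a_t | s_t), same shape
                    advantages: torch.Tensor,     # advantage estimates, broadcastable to (B, T)
                    clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate returned as a loss, hence the leading minus sign."""
    ratio = (log_probs - old_log_probs).exp()     # pi_theta / pi_theta_old
    surr1 = ratio * advantages
    surr2 = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # maximize min(surr1, surr2)  <=>  minimize -min(surr1, surr2)
    return -torch.min(surr1, surr2).mean()
```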
- https://dalpo0814.tistory.com/62
- https://hi-lu.tistory.com/entry/Paper-DPO-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
- https://rubato-yeong.github.io/language/dpo_vs_ppo/
- PPO has the higher ceiling. Still, DPO has its own advantages: it is much simpler to implement and uses less memory (a minimal DPO loss sketch follows below)
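For comparison with the PPO loss above, a minimal sketch of the DPO objective, assuming PyTorch and sequence-level log-probabilities already summed over the response tokens; beta follows the notation of the DPO paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (B,)
             policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
             ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x)
             ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), averaged over the batch."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

Only a frozen reference model is needed (no reward model, no sampling loop), which is where the simplicity and memory savings mentioned above come from.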
- https://ai-information.blogspot.com/2024/07/nl-222-sdpo-dont-use-your-data-all-at.html
- https://ebbnflow.tistory.com/386
- https://ai-for-value.tistory.com/29
- Needs only a signal of whether a response is good or bad (i.e., thumbs-up or thumbs-down) rather than pairwise preference data; it can train even on imbalanced data (see the unpaired-loss sketch below)
- If you do have clean, curated preference data, though, DPO is still the better choice
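The two notes above describe an unpaired, KTO-style objective (my reading; the source notes do not name the method): each response carries only a good/bad label and no (chosen, rejected) pair is required. A minimal sketch under that assumption; the reference point z0 is simplified here to a detached batch statistic, whereas the KTO paper estimates it from mismatched prompt/completion pairs.

```python
import torch

def unpaired_preference_loss(policy_logps: torch.Tensor,  # log pi_theta(y | x) per example, shape (B,)
                             ref_logps: torch.Tensor,     # log pi_ref(y | x) per example
                             is_good: torch.Tensor,       # bool: True = thumbs-up, False = thumbs-down
                             beta: float = 0.1,
                             lam_good: float = 1.0,       # class weights help with imbalanced labels
                             lam_bad: float = 1.0) -> torch.Tensor:
    """KTO-style loss: push good responses above a KL reference point, bad ones below it."""
    reward = policy_logps - ref_logps                      # implicit reward log(pi_theta / pi_ref)
    z0 = reward.detach().mean().clamp(min=0.0)             # simplified reference point (no gradient)
    v_good = lam_good * torch.sigmoid(beta * (reward - z0))
    v_bad = lam_bad * torch.sigmoid(beta * (z0 - reward))
    lam = torch.where(is_good, torch.full_like(reward, lam_good), torch.full_like(reward, lam_bad))
    value = torch.where(is_good, v_good, v_bad)
    return (lam - value).mean()                            # maximize value == minimize (lam - value)
```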
- https://data-newbie.tistory.com/988
- It feels similar to the old approach of training LMs with an unlikelihood loss: it learns from preference data directly. That is, rather than running as a separate stage after SFT, it folds the win/lose notion into the loss itself. But then, since it only trains on data that exists as pairs, doesn't the amount of usable data shrink sharply?
automatic dataset
- https://youtu.be/pk_DoGp4Af8?t=3185
- https://all-the-meaning.tistory.com/m/41
- https://bnmy6581.tistory.com/232
- Achieves strong performance using filtered StackOverflow data plus synthetic data generated with OpenAI GPT
- Beats the comparison models even with a smaller model and less data
- However, the dataset was not released
- https://huggingface.co/blog/cosmopedia
- How should we build preference data? (see the LLM judge sketch below)
LLM judge
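In reply to the "how should we build preference data?" note above: a minimal sketch of labeling (chosen, rejected) pairs with an LLM judge. It assumes the OpenAI chat completions API; the judge model name and prompt wording are placeholders, and each pair is judged twice with the order swapped to reduce position bias.

```python
# Minimal LLM-as-a-judge sketch for turning two candidate responses into a
# (chosen, rejected) preference pair. The model name and judge prompt below are
# placeholders, not from the source notes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = (
    "You are an impartial judge. Given a user prompt and two responses (A and B), "
    "reply with a single letter, A or B, naming the better response."
)

def judge_once(prompt: str, resp_a: str, resp_b: str, model: str = "gpt-4o-mini") -> str:
    out = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user",
             "content": f"Prompt:\n{prompt}\n\nResponse A:\n{resp_a}\n\nResponse B:\n{resp_b}"},
        ],
    )
    return out.choices[0].message.content.strip().upper()[:1]  # "A" or "B"

def make_preference_pair(prompt: str, resp1: str, resp2: str):
    """Judge twice with the order swapped; keep only consistent verdicts."""
    first = judge_once(prompt, resp1, resp2)
    second = judge_once(prompt, resp2, resp1)
    if first == "A" and second == "B":
        return {"prompt": prompt, "chosen": resp1, "rejected": resp2}
    if first == "B" and second == "A":
        return {"prompt": prompt, "chosen": resp2, "rejected": resp1}
    return None  # tie or position-biased verdict -> discard
```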