A quick look at the LLM fine-tuning side
(2020) REALM: Retrieval-Augmented Language Model Pre-Training
(2020) RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
(2020) FID: Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
(2022) FID-distillation: Distilling Knowledge from Reader to Retriever for Question Answering
(2022) Atlas: Few-shot Learning with Retrieval Augmented Language Models
- https://www.youtube.com/watch?v=QkR7nXErhtM
(2023) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- https://ai-information.blogspot.com/2024/05/nl-215-self-rag-learning-to-retrieve.html
(2023) Active Retrieval Augmented Generation
(2023) Corrective Retrieval Augmented Generation
(2023) Learning to Filter Context for Retrieval-Augmented Generation
RAG
- (24.05) Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
- https://aiforeveryone.tistory.com/52
- (24.06) Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
- https://velog.io/@videorighter/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0-Buffer-of-Thoughts-Thought-Augmented-Reasoning-with-Large-Language-Models
- Survey
- https://velog.io/@ash-hun/RAG-Survey-A-Survey-on-Retrieval-Augmented-Text-Generation-for-Large-LanguageModels-2024
- https://www.youtube.com/@dsba2979/search?query=RAG
- https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/research_updates/rag_research_table.md
(2022) PPO, Training language models to follow instructions with human feedback
- https://dalpo0814.tistory.com/56
- It looks like the minus sign is missing from L_ppo here; either way, this blog post and its figures explain it well (a minimal sketch of the clipped PPO loss follows after these links)
- https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py#L25
- https://github.com/airobotlab/KoChatGPT
- This one is also worth a look
- https://github.com/hpcaitech/ColossalAI/blob/2e16f842a9e5b1fb54e7e41070e9d2bb5cd64d7c/applications/ChatGPT/chatgpt/nn/loss.py
- https://huggingface.co/blog/stackllama#stackllama-a-hands-on-guide-to-train-llama-with-rlhf
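Regarding the missing-minus-sign note above: a minimal sketch of the clipped PPO policy loss, in the same spirit as ColossalAI's PolicyLoss, assuming PyTorch and per-token log-probabilities. Maximizing the clipped surrogate is implemented by minimizing its negation, which is where the leading minus sign comes from.

```python
import torch

def ppo_policy_loss(log_probs: torch.Tensor,      # log pi_theta(a_t | s_t), shape (B, T)
                    old_log_probs: torch.Tensor,  # log pi_theta_old(a_t | s_t), same shape
                    advantages: torch.Tensor,     # advantage estimates, broadcastable to (B, T)
                    clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate returned as a loss, hence the leading minus sign."""
    ratio = (log_probs - old_log_probs).exp()     # pi_theta / pi_theta_old
    surr1 = ratio * advantages
    surr2 = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # maximize min(surr1, surr2)  <=>  minimize -min(surr1, surr2)
    return -torch.min(surr1, surr2).mean()
```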
- https://dalpo0814.tistory.com/62
- https://hi-lu.tistory.com/entry/Paper-DPO-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
- https://rubato-yeong.github.io/language/dpo_vs_ppo/
- PPO has the higher ceiling. Still, DPO has its own advantages: it is much simpler to implement and uses less memory (a minimal DPO loss sketch follows below)
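For comparison with the PPO loss above, a minimal sketch of the DPO objective, assuming PyTorch and sequence-level log-probabilities already summed over the response tokens; beta follows the notation of the DPO paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (B,)
             policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
             ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x)
             ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), averaged over the batch."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

Only a frozen reference model is needed (no reward model, no sampling loop), which is where the simplicity and memory savings mentioned above come from.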
- https://ai-information.blogspot.com/2024/07/nl-222-sdpo-dont-use-your-data-all-at.html
- https://ebbnflow.tistory.com/386
- https://ai-for-value.tistory.com/29
- Needs only a signal of whether a response is good or bad (i.e., thumbs-up or thumbs-down) rather than pairwise preference data; it can train even on imbalanced data (see the unpaired-loss sketch below)
- If you do have clean, curated preference data, though, DPO is still the better choice
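The two notes above describe an unpaired, KTO-style objective (my reading; the source notes do not name the method): each response carries only a good/bad label and no (chosen, rejected) pair is required. A minimal sketch under that assumption; the reference point z0 is simplified here to a detached batch statistic, whereas the KTO paper estimates it from mismatched prompt/completion pairs.

```python
import torch

def unpaired_preference_loss(policy_logps: torch.Tensor,  # log pi_theta(y | x) per example, shape (B,)
                             ref_logps: torch.Tensor,     # log pi_ref(y | x) per example
                             is_good: torch.Tensor,       # bool: True = thumbs-up, False = thumbs-down
                             beta: float = 0.1,
                             lam_good: float = 1.0,       # class weights help with imbalanced labels
                             lam_bad: float = 1.0) -> torch.Tensor:
    """KTO-style loss: push good responses above a KL reference point, bad ones below it."""
    reward = policy_logps - ref_logps                      # implicit reward log(pi_theta / pi_ref)
    z0 = reward.detach().mean().clamp(min=0.0)             # simplified reference point (no gradient)
    v_good = lam_good * torch.sigmoid(beta * (reward - z0))
    v_bad = lam_bad * torch.sigmoid(beta * (z0 - reward))
    lam = torch.where(is_good, torch.full_like(reward, lam_good), torch.full_like(reward, lam_bad))
    value = torch.where(is_good, v_good, v_bad)
    return (lam - value).mean()                            # maximize value == minimize (lam - value)
```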
- https://data-newbie.tistory.com/988
- It feels similar to the old approach of training LMs with an unlikelihood loss: it learns from preference data directly. That is, rather than running as a separate stage after SFT, it folds the win/lose notion into the loss itself. But then, since it only trains on data that exists as pairs, doesn't the amount of usable data shrink sharply?
automatic dataset
- https://youtu.be/pk_DoGp4Af8?t=3185
- https://all-the-meaning.tistory.com/m/41
- https://bnmy6581.tistory.com/232
- Achieves strong performance using filtered StackOverflow data plus synthetic data generated with OpenAI GPT
- Beats the comparison models even with a smaller model and less data
- However, the dataset was not released
- https://huggingface.co/blog/cosmopedia
- How should we build preference data? (see the LLM judge sketch below)
LLM judge
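In reply to the "how should we build preference data?" note above: a minimal sketch of labeling (chosen, rejected) pairs with an LLM judge. It assumes the OpenAI chat completions API; the judge model name and prompt wording are placeholders, and each pair is judged twice with the order swapped to reduce position bias.

```python
# Minimal LLM-as-a-judge sketch for turning two candidate responses into a
# (chosen, rejected) preference pair. The model name and judge prompt below are
# placeholders, not from the source notes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = (
    "You are an impartial judge. Given a user prompt and two responses (A and B), "
    "reply with a single letter, A or B, naming the better response."
)

def judge_once(prompt: str, resp_a: str, resp_b: str, model: str = "gpt-4o-mini") -> str:
    out = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user",
             "content": f"Prompt:\n{prompt}\n\nResponse A:\n{resp_a}\n\nResponse B:\n{resp_b}"},
        ],
    )
    return out.choices[0].message.content.strip().upper()[:1]  # "A" or "B"

def make_preference_pair(prompt: str, resp1: str, resp2: str):
    """Judge twice with the order swapped; keep only consistent verdicts."""
    first = judge_once(prompt, resp1, resp2)
    second = judge_once(prompt, resp2, resp1)
    if first == "A" and second == "B":
        return {"prompt": prompt, "chosen": resp1, "rejected": resp2}
    if first == "B" and second == "A":
        return {"prompt": prompt, "chosen": resp2, "rejected": resp1}
    return None  # tie or position-biased verdict -> discard
```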