Paper history (3)
Reading papers on non-text modalities (audio, omni models)
To read
- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, Preprint 2025
1. Audio
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, NeurIPS 2020 [post]
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, TASLP 2021 [post]
- SoundStream: An End-to-End Neural Audio Codec, TASLP 2021 [post]
- EnCodec: High Fidelity Neural Audio Compression, TMLR 2023 [post]
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, OpenAI 2022 [post]
- CLAP: Contrastive Language-Audio Pretraining, Preprint 2022 [post]
2. Multi / Omni Models
- Ola: Pushing the Frontiers of Omni-Modal Language Model, Preprint 2025 [post]
- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, Preprint 2025