author
Aaron Courville
Abhay Gupta
Adam Fisch
Aixin Liu
Ajay Jaiswal
Amir Gholami
Amir H. Abdi
André F. T. Martins
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
| AdaSplash | AdaSplash: Adaptive Sparse Flash Attention | | | | note |
Aojun Zhou
Arvind Krishnamurthy
Ashish Panwar
Bairu Hou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| IFPruning | Instruction-Following Pruning for Large Language Models | ![]() | | | note |
| KVLink | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | ![]() | | | note |
Baris Kasikci
Bei Feng
Beidi Chen
Bin Gao
Bin Lin
Bin Wang
Bing Xue
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Bingxuan Wang
Bochao Wu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Chang Chen
Chang Gao
Chao Yang
Chaojun Xiao
Chen Chen
Chen Zhang
Chengda Lu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Chenggang Zhao
Chengqi Deng
Chengquan Jiang
Chengruidong Zhang
Chenyang Song
Chenyu Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Chong Ruan
Christos Kozyrakis
Chuang Gan
Clark Barrett
Cody Hao Yu
Coleman Hooper
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| SqueezeLLM | SqueezeLLM: Dense-and-Sparse Quantization | ![]() | | | note |
| KVQuant | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | | | | note |
Damai Dai
Dan Alistarh
Daxin Jiang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MFA | Multi-matrix Factorization Attention | | | | note |
| Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Daya Guo
DeepSeek-AI
Dejian Yang
Deli Chen
Dianhai Yu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| FlashMask | FlashMask: Efficient and Rich Mask Extension of FlashAttention | ![]() | | | note |
| CCQ | CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs | | | | note |
Dong Li
Dongjie Ji
Dongsheng Li
Dongyang Wang
Eldar Kurtic
Elias Frantar
Erhang Li
Fan Yang
Fangyun Lin
Fei Huang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| CateKV | CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration | ![]() | | | note |
| Qwen3 | Qwen3 Technical Report | ![]() | | | note |
Fucong Dai
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Fuli Luo
Furu Wei
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| Q-Sparse | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | ![]() | | | note |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
Genghan Zhang
Gongfan Fang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| LLM-Pruner | LLM-Pruner: On the Structural Pruning of Large Language Models | ![]() | | | note |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Guanchen Li
Guangbo Hao
Guangxuan Xiao
Guanting Chen
Guohao Dai
Guowei Li
H. Zhang
Hai Zhao
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | ![]() | | | note |
| SIFT | Sparse is Enough in Fine-tuning Pre-trained Large Language Models | | | | note |
Haibin Lin
Haibo Chen
Haifeng Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| FlashMask | FlashMask: Efficient and Rich Mask Extension of FlashAttention | ![]() | | | note |
| CCQ | CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs | | | | note |
Han Bao
Hanshi Sun
Hanwei Xu
Hao Zhang
Haocheng Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Haocheng Xi
Haofeng Huang
Haoli Bai
Haotian Tang
Haotong Xie
Haowei Zhang
Hayden Kwok-Hay So
Heung-Yeung Shum
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MFA | Multi-matrix Factorization Attention | | | | note |
| Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Honghui Ding
Hongsheng Li
Hrayr Harutyunyan
Huajian Xin
Huazuo Gao
Hui Li
Hui Qu
Huiqiang Jiang
Iman Mirzadeh
Ion Stoica
J. L. Cai
J. Zico Kolter
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| Wanda | A Simple and Effective Pruning Approach for Large Language Models | ![]() | | | note |
| massive-activations | Massive Activations in Large Language Models | ![]() | | | note |
Jan Kautz
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
| Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Jason D. Lee
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MeZO | Fine-Tuning Language Models with Just Forward Passes | | | | note |
| m | Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark | | | | note |
Jayashree Mohan
Jeff Pool
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Channel Permutations for N:M Sparsity | | | | |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Jia Wei
Jiaming Tang
Jiaming Xu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| SpecEE | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | ![]() | | | note |
Jian Liang
Jianfei Chen
Jianfeng Gao
Jiangfei Duan
Jianxi Ye
Jianyong Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| LeanK | LeanK: Learnable K Cache Channel Pruning for Efficient Decoding | ![]() | | | note |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
Jianzhong Guo
Jiaqi Ni
Jiashi Li
Jiawei Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Jie Zhou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DBudgetKV | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | ![]() | | | note |
| MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Jin Chen
Jin Fang
Jingchang Chen
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Jingyang Yuan
Jintao Zhang
Joseph E. Gonzalez
Jun Zhu
Junjie Qiu
Junlong Li
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Junxian Guo
Junxiao Song
Junyang Lin
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| CateKV | CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration | ![]() | | | note |
| Qwen3 | Qwen3 Technical Report | ![]() | | | note |
Kai Dong
Kai Hu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Kaige Gao
Kan Zhu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| Quest | Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | ![]() | | | note |
| NanoFlow | NanoFlow: Towards Optimal Large Language Model Serving Throughput | ![]() | | | note |
Kang Guan
Kang Zhao
Ke Hong
Kehong Yuan
Kexin Huang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Kuai Yu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Kurt Keutzer
Lean Wang
Lecong Zhang
Lefei Zhang
Lei Chen
Lei Xu
Leyi Xia
Li Dong
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
| SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Li-Wen Chang
Lian Liu
Liang Zhao
Lianmin Zheng
Lili Qiu
Litong Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Liyue Zhang
Lu Hou
Mao Yang
Maosong Sun
Marcos Treviso
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
| AdaSplash | AdaSplash: Adaptive Sparse Flash Attention | | | | note |
Mark Kurtz
Mehrdad Farajtabar
Meng Li
Mengdi Wang
Miaojun Wang
Michael Goin
Michael Hassid
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
| TOVA | Transformers are Multi-State RNNs | ![]() | | | note |
Michael W. Mahoney
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Mingjie Sun
Mingming Li
Minmin Sun
Ning Tian
Ningxin Zheng
Nipun Kwatra
Panpan Huang
Pavlo Molchanov
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
| Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Peiyi Wang
Peng Sun
Peng Zhang
Pengcheng He
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| AdaLoRA | AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | ![]() | | | |
| LoftQ | LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | ![]() | | | note |
Pengfei Zuo
Pengle Zhang
Qi Hou
Qianchao Zhu
Qiancheng Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Qianhui Wu
Qihao Zhu
Qinyu Chen
Qiushi Du
R. J. Chen
R. L. Jin
Ramachandran Ramjee
Ramya Prabhu
Rayan Saab
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| GPFQ | A Greedy Algorithm for Quantizing Neural Networks | | | | |
| GPFQv2 | Post-training Quantization for Neural Networks with Provable Guarantees | | | | |
Roy Schwartz
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
| TOVA | Transformers are Multi-State RNNs | ![]() | | | note |
Ruihang Lai
Ruiqi Ge
Ruisong Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ruizhe Pan
Runji Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Runxin Xu
Ruoyu Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ruyi Chen
S. S. Li
Sangmin Bae
Saurav Muralidharan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
| Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Sean Lie
Sehoon Kim
Shang Yang
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaoqing Wu
Shengen Yan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Shengfeng Ye
Shijie Cao
Shirong Ma
Shiwei Liu
Shiyao Li
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Shiyu Chang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| IFPruning | Instruction-Following Pruning for Large Language Models | ![]() | | | note |
| KVLink | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | ![]() | | | note |
Shiyu Wang
Shreyas Saxena
Shuang Zhou
Shuiping Yu
Shunfeng Zhou
Shuo Yang
Shuting Pan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Size Zheng
Song Han
Stephanie Wang
Surin Ahn
T. Wang
Tal Schuster
Tao Xie
Tao Yuan
Tao Yun
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Tian Pei
Tianle Cai
Tianlong Chen
Tianqi Chen
Tianqi Wu
Tianyu Fu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Tianyu Gao
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MeZO | Fine-Tuning Language Models with Just Forward Passes | | | | note |
| LLM-shearing | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ![]() | | | note |
Tianyu Sun
Tianzhu Ye
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
| SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Tim Dettmers
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| QLoRA | QLoRA: Efficient Finetuning of Quantized LLMs | ![]() | | | |
| SpQR | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | | | | |
Ting Cao
Tong Yang
Torsten Hoefler
Tri Dao
Tuo Zhao
Vithursan Thangarasa
W. L. Xiao
Wangding Zeng
Wanjia Zhao
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Wei An
Wei Lin
Wei Wang
Weilin Zhao
Weiyu Huang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Accelerating Transformer Pre-training with 2:4 Sparsity | | | | note |
| AdaptiveSparseTrainer | Pruning Large Language Models with Semi-Structural Adaptive Sparse Training | ![]() | | | note |
Weizhu Chen
Wen Liu
Wenfeng Liang
Wenjun Gao
Wenlei Bao
Wenqin Yu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Wentao Zhang
Woosuk Kwon
Wulong Liu
X. Q. Li
Xiafei Qiu
Xiandong Zhao
Xiang Liu
Xiangyu Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MFA | Multi-matrix Factorization Attention | | | | note |
| Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Xiangyue Jin
Xianzhi Yu
Xianzu Wang
Xiao Bi
Xiaodong Liu
Xiaohan Wang
Xiaojin Shen
Xiaokang Chen
Xiaokang Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xiaosha Chen
Xiaotao Nie
Xiaowei Li
Xiaowen Sun
Xiaoxiang Wang
Xin Chen
Xin Cheng
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xin Jin
Xin Liu
Xin Xie
Xinchao Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| LLM-Pruner | LLM-Pruner: On the Structural Pruning of Large Language Models | ![]() | | | note |
| MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Xingchao Liu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xingcheng Zhang
Xingkai Yu
Xinnan Song
Xinxia Shan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xinyi Zhou
Xinyu Yang
Xinyu Zhou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MoBA | MoBA: Mixture of Block Attention for Long-Context LLMs | ![]() | | | note |
| 0VRXJQ3F | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | ![]() | | | note |
Xinyuan Li
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xiuhong Li
Xu Han
Xu Owen He
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| FoX | Forgetting Transformer: Softmax Attention with a Forget Gate | | | | note |
| ACP | Adaptive Computation Pruning for the Forgetting Transformer | | | | note |
Xuecheng Su
Xuefei Ning
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Xuegui Zheng
Xufang Luo
Xuheng Lin
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Y. K. Li
Y. Q. Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Y. Wu
Y. X. Wei
Y. X. Zhu
Yang Li
Yang Zhang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yanhong Xu
Yankai Lin
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| ReLU2 | ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs | ![]() | | | note |
| MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Yann LeCun
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| OBD | Optimal Brain Damage | | | | |
Yanping Huang
Yao Li
Yao Zhao
Yaofeng Sun
Yaohui Li
Yaohui Wang
Yehui Tang
Yi Yu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yi Zheng
Yichao Zhang
Yifan Shi
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yikai Zhang
Yiliang Xiong
Yilong Zhao
Ying He
Ying Sheng
Ying Tang
Yingfa Chen
Yinhe Han
Yishi Piao
Yisong Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yiwu Yao
Yixiao Li
Yixin Dong
Yixin Song
Yixuan Tan
Yiyang Ma
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yiyuan Liu
Yiyuan Ma
Yizhao Gao
Yong Li
Yongqiang Guo
Yu Cheng
Yu Wang
Yu Wu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuan Ou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuandong Tian
Yuchen Zhu
Yucheng Li
Yuduan Wang
Yue Gong
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuezhou Hu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | Accelerating Transformer Pre-training with 2:4 Sparsity | | | | note |
| AdaptiveSparseTrainer | Pruning Large Language Models with Semi-Structural Adaptive Sparse Training | ![]() | | | note |
Yuheng Zou
Yuhui Xu
Yujia He
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yujun Lin
Yukun Zha
Yulhwa Kim
Yunfan Xiong
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yunhe Wang
Yunxian Ma
Yuqing Xia
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
| SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Yuqing Yang
Yutao Sun
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| ReSA | Rectified Sparse Attention | ![]() | | | note |
| SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Yuting Yan
Yuxiang Luo
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuxiang You
Yuxin Wu
Yuxiong He
Yuxuan Li
Yuxuan Liu
Yuyang Zhou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Z. F. Wu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Z. Z. Ren
Zefan Cai
Zehui Ren
Zeyu Mi
Zhangli Sha
Zhangyang Wang
Zhe Fu
Zhean Xu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhen Dong
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| SqueezeLLM | SqueezeLLM: Dense-and-Sparse Quantization | ![]() | | | note |
| R-KV | R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | ![]() | | | note |
Zhen Huang
Zhen Zhang
Zhenda Xie
Zhengyan Zhang
Zhenyu Zhang
Zhewei Yao
Zhewen Hao
Zhibin Gou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhicheng Ma
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhigang Yan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhihang Yuan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| RPTQ | RPTQ: Reorder-based Post-training Quantization for Large Language Models | | | | note |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
Zhihong Shao
Zhilin Yang
Zhipeng Xu
Zhixuan Lin
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| FoX | Forgetting Transformer: Softmax Attention with a Forget Gate | | | | note |
| ACP | Adaptive Computation Pruning for the Forgetting Transformer | | | | note |
Zhiyu Wu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhiyuan Liu
Zhongyu Zhang
Zhou Yu
Zhuang Liu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| Wanda | A Simple and Effective Pruning Approach for Large Language Models | ![]() | | | note |
| massive-activations | Massive Activations in Large Language Models | ![]() | | | note |
Zhuomin He
Zhuoshu Li
Zihan Wang
Zihao Ye
Ziheng Jiang
Zihui Gu
Zijia Zhu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zijun Liu
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zili Wang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| MFA | Multi-matrix Factorization Attention | | | | note |
| Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Zilin Li
Ziqing Yang
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| TextPruner | TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models | ![]() | | | |
| GRAIN | Gradient-based Intra-attention Pruning on Pre-trained Language Models | ![]() | | | note |
Ziwei Ji
Ziwei Xie
Zixiao Huang
Zixuan Zhou
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
| MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Ziyang Song
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ziyi Gao
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zizheng Pan
| Meta | Title | Cover | Publish | Code | Note |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
| DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |