# Author
Aaron Courville
Abhay Gupta
Adam Fisch
Aixin Liu
Ajay Jaiswal
Amir Gholami
Amir H. Abdi
André F. T. Martins
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
AdaSplash | AdaSplash: Adaptive Sparse Flash Attention | | | | note |
Aohan Zeng
Aojun Zhou
Aonian Li
Arvind Krishnamurthy
Ashish Panwar
Bairu Hou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
IFPruning | Instruction-Following Pruning for Large Language Models | ![]() | | | note |
KVLink | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | ![]() | | | note |
Bangwei Gong
Baris Kasikci
Bei Feng
Beidi Chen
Bin Cui
Bin Gao
Bin Lin
Bin Wang
Bing Xue
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Bingxuan Wang
Bo Yang
Bochao Wu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Bohan Zhuang
Boji Shan
Bowen Xu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
LaRoSA | LaRoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation | ![]() | | | note |
GLM-4.5 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models | | | | note |
Chang Chen
Chang Gao
Chao Wang
Chao Yang
Chaojun Xiao
Chen Chen
Chen Zhang
Cheng Zhu
Chengda Lu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Chenggang Zhao
Chengqi Deng
Chengquan Jiang
Chengruidong Zhang
Chenyang Song
Chenyu Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Chong Ruan
Christos Kozyrakis
Chuang Gan
Chunhao Zhang
Clark Barrett
Cody Hao Yu
Coleman Hooper
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
SqueezeLLM | SqueezeLLM: Dense-and-Sparse Quantization | ![]() | | | note |
KVQuant | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | ![]() | | | note |
Congchao Guo
Da Chen
Damai Dai
Dan Alistarh
Daxin Jiang
Daya Guo
DeepSeek-AI
Dejian Yang
Deli Chen
Dianhai Yu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
FlashMask | FlashMask: Efficient and Rich Mask Extension of FlashAttention | ![]() | | | note |
CCQ | CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs | | | | note |
Dong Li
Dongjie Ji
Dongsheng Li
Dongwon Jo
Dongyang Wang
Eldar Kurtic
Elias Frantar
Emad Barsoum
Enwei Jiao
Erhang Li
Fan Yang
Fangcheng Fu
Fangyun Lin
Fei Huang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
CateKV | CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration | ![]() | | | note |
Qwen3 | Qwen3 Technical Report | ![]() | | | note |
Fucong Dai
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Fuli Luo
Furu Wei
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
Q-Sparse | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | ![]() | | | note |
ReSA | Rectified Sparse Attention | ![]() | | | note |
Genghan Zhang
Gongfan Fang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
LLM-Pruner | LLM-Pruner: On the Structural Pruning of Large Language Models | ![]() | | | note |
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Guanchen Li
Guangbo Hao
Guangxuan Xiao
Guanting Chen
Guohao Dai
Guowei Li
H. Zhang
Hai Zhao
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | ![]() | | | note |
SIFT | Sparse is Enough in Fine-tuning Pre-trained Large Language Models | | | | note |
Haibin Lin
Haibo Chen
Haifeng Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
FlashMask | FlashMask: Efficient and Rich Mask Extension of FlashAttention | ![]() | | | note |
CCQ | CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs | | | | note |
Hailin Zhang
Han Bao
Hanshi Sun
Hanwei Xu
Hao Zhang
Haocheng Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Haocheng Xi
Haofeng Huang
Haohai Sun
Haoli Bai
Haotian Tang
Haotong Xie
Haowei Zhang
Hayden Kwok-Hay So
Heung-Yeung Shum
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MFA | Multi-matrix Factorization Attention | | | | note |
Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Hong Zhou
Honghui Ding
Hongsheng Li
Hrayr Harutyunyan
Huajian Xin
Huazuo Gao
Hui Li
Hui Qu
Huiqiang Jiang
Iman Mirzadeh
Ion Stoica
J. L. Cai
J. Zico Kolter
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
Wanda | A Simple and Effective Pruning Approach for Large Language Models | ![]() | | | note |
massive-activations | Massive Activations in Large Language Models | ![]() | | | note |
Jae-Joon Kim
Jan Kautz
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Jason D. Lee
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MeZO | Fine-Tuning Language Models with Just Forward Passes | | | | note |
m | Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark | | | | note |
Jayashree Mohan
Jeff Pool
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Channel Permutations for N:M Sparsity | | | | |
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Jia Wei
Jiajie Zhang
Jiaming Tang
Jiaming Xu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
SpecEE | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | ![]() | | | note |
Jian Liang
Jianfei Chen
Jianfeng Gao
Jiangfei Duan
Jianxi Ye
Jianyong Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
LeanK | LeanK: Learnable K Cache Channel Pruning for Efficient Decoding | ![]() | | | note |
ReSA | Rectified Sparse Attention | ![]() | | | note |
Jianzhong Guo
Jiaqi Ni
Jiaqi Zhuang
Jiashi Li
Jiawei Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Jiayuan Song
Jidong Zhai
Jie Liu
Jie Tang
Jie Ye
Jie Zhou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DBudgetKV | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | ![]() | | | note |
MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Jin Chen
Jin Fang
Jin Zhu
Jing Liu
Jingchang Chen
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Jingyang Li
Jingyang Yuan
Jintao Zhang
Joseph E. Gonzalez
Juanzi Li
Jun Zhu
Junhao Xu
Junjie Qiu
Junjie Yan
Junlong Li
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Junxian Guo
Junxiao Song
Junyang Lin
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
CateKV | CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration | ![]() | | | note |
Qwen3 | Qwen3 Technical Report | ![]() | | | note |
Kai Dong
Kai Hu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Kaige Gao
Kan Zhu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
Quest | Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | ![]() | | | note |
NanoFlow | NanoFlow: Towards Optimal Large Language Model Serving Throughput | ![]() | | | note |
Kang Guan
Kang Zhao
Ke Hong
Kecheng Xiao
Kehong Yuan
Kexin Huang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Krishna Teja Chitty-Venkata
Kuai Yu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Kurt Keutzer
Le Han
Lean Wang
Lecong Zhang
Lefei Zhang
Lei Chen
Lei Xu
Leyang Wang
Leyi Xia
Li Dong
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
ReSA | Rectified Sparse Attention | ![]() | | | note |
SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Li-Wen Chang
Lian Liu
Lianfei Yu
Liang Zhao
Lianmin Zheng
Liheng Feng
Lili Qiu
Lin Li
Lin Zheng
Litong Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Liyue Zhang
Lu Hou
Mao Yang
Maosong Sun
Marcos Treviso
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
AdaSplash | AdaSplash: Adaptive Sparse Flash Attention | | | | note |
Mark Kurtz
Mehrdad Farajtabar
Meng Li
Mengdi Wang
Miaojun Wang
Michael Goin
Michael Hassid
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
TOVA | Transformers are Multi-State RNNs | ![]() | | | note |
Michael W. Mahoney
Mingchuan Zhang
Minghao Li
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
CCQ | CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs | | | | note |
GLM-4.5 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models | | | | note |
Minghua Zhang
Minghui Tang
Mingjie Sun
Mingming Li
Mingyuan Chi
MiniMax
Minmin Sun
Minsik Cho
Mohammad Rastegari
Mozhi Zhang
Murali Emani
Ning Tian
Ningxin Zheng
Nipun Kwatra
Panpan Huang
Pavlo Molchanov
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Peiyi Wang
Peng Gao
Peng Sun
Peng Zhang
Pengcheng He
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
AdaLoRA | AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | ![]() | | | |
LoftQ | LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | ![]() | | | note |
Pengfei Li
Pengfei Zuo
Pengle Zhang
Pengyu Zhao
Ping Luo
Qi Hou
Qianchao Zhu
Qiancheng Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Qianhui Wu
Qidi Xu
Qihao Zhu
Qin Wang
Qingru Zhang
Qinkai Zheng
Qinyu Chen
Qiushi Du
R. J. Chen
R. L. Jin
Ramachandran Ramjee
Ramya Prabhu
Rayan Saab
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
GPFQ | A Greedy Algorithm for Quantizing Neural Networks | | | | |
GPFQv2 | Post-training Quantization for Neural Networks with Provable Guarantees | | | | |
Roy Schwartz
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Efficient Methods for Natural Language Processing: A Survey | ![]() | | | |
TOVA | Transformers are Multi-State RNNs | ![]() | | | note |
Ruihang Lai
Ruiqi Ge
Ruisong Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ruitao Leng
Ruizhe Pan
Runji Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Runxin Xu
Ruoyu Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ruyi Chen
S. S. Li
Saeed Maleki
Sangmin Bae
Saurav Muralidharan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Minitron | Compact Language Models via Pruning and Knowledge Distillation | ![]() | | | note |
Sean Lie
Sehoon Kim
Shang Yang
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaoqing Wu
Shengding Hu
Shengen Yan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Shengfeng Ye
Shengmin Shi
Shijie Cao
Shirong Ma
Shiwei Liu
Shiyao Li
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Shiyu Chang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
IFPruning | Instruction-Following Pruning for Large Language Models | ![]() | | | note |
KVLink | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | ![]() | | | note |
Shiyu Wang
Shreyas Saxena
Shuang Zhou
Shuiping Yu
Shunfeng Zhou
Shuo Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
InfiniteBench | ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens | | | | note |
MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Shuo Yang
Shuqi Yu
Shuting Pan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Sichen Li
Size Zheng
Song Han
Songquan Zhu
Stephanie Wang
Surin Ahn
T. Wang
Tal Schuster
Tao Xie
Tao Yu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
FLAP | Fluctuation-based Adaptive Structured Pruning for Large Language Models | ![]() | | | |
MXFP4Train | Training LLMs with MXFP4 | ![]() | | | note |
Tao Yuan
Tao Yun
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Tian Pei
Tianle Cai
Tianlong Chen
Tianqi Chen
Tianqi Wu
Tianrun Liang
Tianyu Fu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Tianyu Gao
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MeZO | Fine-Tuning Language Models with Just Forward Passes | | | | note |
LLM-shearing | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ![]() | | | note |
Tianyu Sun
Tianzhu Ye
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
ReSA | Rectified Sparse Attention | ![]() | | | note |
SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Tim Dettmers
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
QLoRA | QLoRA: Efficient Finetuning of Quantized LLMs | ![]() | | | |
SpQR | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | | | | |
Ting Cao
Tong Yang
Torsten Hoefler
Tri Dao
Tuo Zhao
Vithursan Thangarasa
W. L. Xiao
Wangding Zeng
Wanjia Zhao
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Wei An
Wei Lin
Wei Wang
Weigao Sun
Weilin Zhao
Weixuan Sun
Weiyu Cheng
Weiyu Huang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Accelerating Transformer Pre-training with 2:4 Sparsity | | | | note |
AdaptiveSparseTrainer | Pruning Large Language Models with Semi-Structural Adaptive Sparse Training | ![]() | | | note |
Weizhu Chen
Wen Liu
Wenfeng Liang
Wenjun Gao
Wenkai Li
Wenlei Bao
Wenqi Shao
Wenqin Yu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Wentao Zhang
Woosuk Kwon
Wulong Liu
X. Q. Li
Xiafei Qiu
Xiandong Zhao
Xiang Liu
Xiangjun Song
Xiangyu Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MFA | Multi-matrix Factorization Attention | | | | note |
Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Xiangyue Jin
Xianzhi Yu
Xianzu Wang
Xiao Bi
Xiao Su
Xiaodong Han
Xiaodong Ji
Xiaodong Liu
Xiaohan Wang
Xiaojin Shen
Xiaokang Chen
Xiaokang Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xiaosha Chen
Xiaotao Gu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
QA-LoRA | QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | ![]() | | | note |
GLM-4.5 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models | | | | note |
Xiaotao Nie
Xiaowei Li
Xiaowen Sun
Xiaoxiang Wang
Xin Chen
Xin Cheng
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xin Jin
Xin Liu
Xin Lv
Xin Xie
Xinchao Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
LLM-Pruner | LLM-Pruner: On the Structural Pruning of Large Language Models | ![]() | | | note |
MaskLLM | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | ![]() | | | note |
Xingchao Liu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xingcheng Zhang
Xingkai Yu
Xinnan Song
Xinxia Shan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xinyi Zhou
Xinyu Yang
Xinyu Zhou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
0VRXJQ3F | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | ![]() | | | note |
MoBA | MoBA: Mixture of Block Attention for Long-Context LLMs | ![]() | | | note |
Xinyuan Li
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xinzhu Hou
Xiuhong Li
Xu Han
Xu Owen He
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
FoX | Forgetting Transformer: Softmax Attention with a Forget Gate | | | | note |
ACP | Adaptive Computation Pruning for the Forgetting Transformer | | | | note |
Xuan Lu
Xuecheng Su
Xuefei Ning
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
MoA | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | ![]() | | | note |
Xuegui Zheng
Xufang Luo
Xuheng Lin
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Xun Zou
Xuyang Shen
Y. K. Li
Y. Q. Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Y. Wu
Y. X. Wei
Y. X. Zhu
Yan Gong
Yang Li
Yang Zhang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yanhong Xu
Yankai Lin
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
ReLU2 | ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs | ![]() | | | note |
MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Yanping Huang
Yao Li
Yao Zhao
Yaofeng Sun
Yaohui Li
Yaohui Wang
Yefei He
Yehui Tang
Yi Yu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yi Zheng
Yichao Zhang
Yifan Shi
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yikai Zhang
Yiliang Xiong
Yilong Zhao
Ying He
Ying Sheng
Ying Tang
Yingfa Chen
Yinhe Han
Yiran Zhong
Yishi Piao
Yisong Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yiwu Yao
Yixiao Li
Yixin Dong
Yixin Song
Yixuan Tan
Yiyang Ma
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yiyuan Liu
Yiyuan Ma
Yizhao Gao
Yong Li
Yongqiang Guo
Yongyi Hu
Yu Cheng
Yu Wang
Yu Wu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuan Ou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuandong Tian
Yuanxiang Fan
Yuchen Zhu
Yucheng Li
Yuduan Wang
Yue Gong
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuezhou Hu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | Accelerating Transformer Pre-training with 2:4 Sparsity | | | | note |
AdaptiveSparseTrainer | Pruning Large Language Models with Semi-Structural Adaptive Sparse Training | ![]() | | | note |
Yufeng Yang
Yuhang Li
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
BRECQ | BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction | | | | |
GLM-4.5 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models | | | | note |
Yuhao Li
Yuheng Zou
Yuhui Xu
Yujia He
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yujun Lin
Yukun Zha
Yulhwa Kim
Yunan Huang
Yunfan Xiong
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yunhe Wang
Yunji Li
Yunxian Ma
Yunzhi Xu
Yuqing Xia
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
ReSA | Rectified Sparse Attention | ![]() | | | note |
SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Yuqing Yang
Yushi Bai
Yutao Sun
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
ReSA | Rectified Sparse Attention | ![]() | | | note |
SeerAttention-R | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | ![]() | | | note |
Yuting Yan
Yuxiang Luo
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Yuxiang You
Yuxiao Dong
Yuxin Mao
Yuxin Wu
Yuxiong He
Yuxuan Li
Yuxuan Liu
Yuyang Zhou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Z. F. Wu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Z. Z. Ren
Zefan Cai
Zehan Li
Zehui Ren
Zeping Li
Zeyu Mi
Zhangli Sha
Zhangyang Wang
Zhe Fu
Zhean Xu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhen Dong
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
SqueezeLLM | SqueezeLLM: Dense-and-Sparse Quantization | ![]() | | | note |
R-KV | R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | ![]() | | | note |
Zhen Huang
Zhen Qin
Zhen Zhang
Zhenda Xie
Zhengxiao Du
Zhengyan Zhang
Zhenhua Fan
Zhenyu Zhang
Zhewei Yao
Zhewen Hao
Zhibin Gou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhicheng Ma
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhigang Yan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhihang Yu
Zhihang Yuan
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
RPTQ | RPTQ: Reorder-based Post-training Quantization for Large Language Models | | | | note |
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
Zhihong Shao
Zhilin Yang
Zhipeng Xu
Zhixuan Lin
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
FoX | Forgetting Transformer: Softmax Attention with a Forget Gate | | | | note |
ACP | Adaptive Computation Pruning for the Forgetting Transformer | | | | note |
Zhiyu Wu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zhiyuan Liu
Zhongyu Zhang
Zhou Yu
Zhuang Liu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
Wanda | A Simple and Effective Pruning Approach for Large Language Models | ![]() | | | note |
massive-activations | Massive Activations in Large Language Models | ![]() | | | note |
Zhuo Jiang
Zhuomin He
Zhuoshu Li
Zihan Wang
Zihao Ye
Ziheng Jiang
Zihui Gu
Zijia Wu
Zijia Zhu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zijun Liu
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Zili Wang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
MFA | Multi-matrix Factorization Attention | | | | note |
Step-3 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | | | | note |
Zilin Li
Ziqing Yang
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
TextPruner | TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models | ![]() | | | |
GRAIN | Gradient-based Intra-attention Pruning on Pre-trained Language Models | ![]() | | | note |
Ziwei Ji
Ziwei Xie
Zixiao Huang
Zixuan Zhou
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
m | A Survey on Efficient Inference for Large Language Models | ![]() | | | note |
MiniCPM4 | MiniCPM4: Ultra-Efficient LLMs on End Devices | ![]() | | | note |
Ziyang Song
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |
Ziyi Gao
Meta | Title | Cover | Publish | Code | Note |
---|---|---|---|---|---|
DeepSeek-V3 | DeepSeek-V3 Technical Report | ![]() | | | note |
DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | ![]() | | | note |