Efficient Paper
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Key idea: Load-as-Sparse and Compute-as-Dense
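
As a rough illustration of what "Load-as-Sparse and Compute-as-Dense" could mean, the sketch below keeps a weight matrix in a compressed form (only the non-zero entries are stored and moved), expands it into a dense tile just before use, and then runs an ordinary dense matrix multiply. This is a plain-Python toy under stated assumptions, not the paper's GPU kernel: the function names `compress`, `load_as_sparse`, and `compute_as_dense` and the (flat index, value) storage format are hypothetical and chosen only for this example.

```python
# Toy CPU sketch of the "load as sparse, compute as dense" idea.
# All names and the storage format here are illustrative assumptions.

def compress(dense):
    """Keep only non-zero weights as (flat_index, value) pairs plus the shape."""
    rows, cols = len(dense), len(dense[0])
    nonzeros = [
        (r * cols + c, v)
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0
    ]
    return nonzeros, (rows, cols)


def load_as_sparse(nonzeros, shape):
    """Read the compressed weights and scatter them into a dense tile.

    Only the non-zero entries are read; the zeros are filled in locally.
    """
    rows, cols = shape
    tile = [[0.0] * cols for _ in range(rows)]
    for flat_index, value in nonzeros:
        tile[flat_index // cols][flat_index % cols] = value
    return tile


def compute_as_dense(tile, x):
    """Ordinary dense matrix multiply on the expanded tile: tile @ x."""
    inner, out_cols = len(x), len(x[0])
    return [
        [sum(tile[i][k] * x[k][j] for k in range(inner)) for j in range(out_cols)]
        for i in range(len(tile))
    ]


if __name__ == "__main__":
    weights = [[0, 2, 0], [1, 0, 0]]   # unstructured sparsity: zeros anywhere
    activations = [[1], [2], [3]]
    nonzeros, shape = compress(weights)
    result = compute_as_dense(load_as_sparse(nonzeros, shape), activations)
    print(len(nonzeros), "of", shape[0] * shape[1], "weights stored")
    print(result)
```

The design point the sketch tries to capture is that the saving comes from moving fewer weights, while the arithmetic itself stays a regular dense multiply that needs no sparsity-aware logic.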