| 1 | +--- |
| 2 | +title: 'SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform' |
| 3 | + |
| 4 | +# Authors |
| 5 | +# If you created a profile for a user (e.g. the default `admin` user), write the username (folder name) here |
| 6 | +# and it will be replaced with their full name and linked to their profile. |
| 7 | +authors: |
| 8 | + - Kimia Saedi |
| 9 | + - Aditya Desai |
| 10 | + - Apoorv Walia |
| 11 | + - Jihyeong Lee |
| 12 | + - Keren Zhou |
| 13 | + - Anshumali Shrivastava |
| 14 | + |
| 15 | +# Author notes (optional) |
| 16 | +# author_notes: |
| 17 | +# - 'Equal contribution' |
| 18 | +# - 'Equal contribution' |
| 19 | + |
| 20 | +date: '2024-12-15T05:30:00Z' |
| 21 | +doi: '' |
| 22 | + |
| 23 | +# Schedule page publish date (NOT publication's date). |
| 24 | +publishDate: '2025-11-03T00:00:00Z' |
| 25 | + |
| 26 | +# Publication type. |
| 27 | +# Accepts a single type but formatted as a YAML list (for Hugo requirements). |
| 28 | +# Enter a publication type from the CSL standard. |
| 29 | +publication_types: ['paper-conference'] |
| 30 | + |
| 31 | +# Publication name and optional abbreviated publication name. |
| 32 | +publication: In The Thirty-eighth Annual Conference on Neural Information Processing Systems |
| 33 | +publication_short: In *NeurIPS 2024* |
| 34 | + |
| 35 | +abstract: 'Tensor multiplication with learned weight matrices is the fundamental building block in deep learning models. These matrices can often be sparsified, decomposed, quantized, or subjected to random parameter sharing without losing accuracy, suggesting the possibility of more efficient transforms. Although many variants of weight matrices exist, unstructured ones are incompatible with modern hardware, slowing inference and training. On the other hand, structured variants often limit expressivity or fail to deliver the promised latency benefits. We present Sketch Structured Transform (SS1), an expressive and GPU-friendly operator that accelerates inference. SS1 leverages parameter sharing in a random yet structured manner to reduce computation while retaining the rich expressive nature of parameter sharing. We confirm empirically that SS1 offers better quality-efficiency tradeoffs than competing variants. Interestingly, SS1 can be combined with quantization to achieve gains unattainable by either method alone, a finding we justify via theoretical analysis. The analysis may be of independent interest. Moreover, existing pre-trained models can be projected onto SS1 and finetuned for efficient deployment. Surprisingly, these projected models can perform reasonably well even without finetuning. Our experiments highlight various applications of SS1: (a) training GPT2 and DLRM models from scratch for faster inference; (b) finetuning projected BERT models for 1.31× faster inference while maintaining GLUE scores; and (c) a proof of concept with Llama-3-8B, showing 1.11× faster wall-clock inference using projected SS1 layers without finetuning. We open-source our code: https://github.com/apd10/Sketch-Structured-Linear/' |
| 36 | + |
| 37 | +# Summary. An optional shortened abstract. |
| 38 | +# summary: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. |
| 39 | + |
| 40 | +tags: [] |
| 41 | + |
| 42 | +# Display this page in the Featured widget? |
| 43 | +featured: false |
| 44 | + |
| 45 | +# Custom links (uncomment lines below) |
| 46 | +# links: |
| 47 | +# - name: Custom Link |
| 48 | +# url: http://example.org |
| 49 | + |
| 50 | +url_pdf: 'https://proceedings.neurips.cc/paper_files/paper/2024/file/57ef0373c890b30407eadfe6e06c8c84-Paper-Conference.pdf' |
| 51 | +# url_code: 'https://github.com/HugoBlox/hugo-blox-builder' |
| 52 | +# url_dataset: 'https://github.com/HugoBlox/hugo-blox-builder' |
| 53 | +# url_poster: '' |
| 54 | +# url_project: '' |
| 55 | +# url_slides: 'uploads/MLSH2024.pdf' |
| 56 | +# url_source: 'https://github.com/HugoBlox/hugo-blox-builder' |
| 57 | +# url_video: 'https://youtube.com' |
| 58 | + |
| 59 | +# Featured image |
| 60 | +# To use, add an image named `featured.jpg/png` to your page's folder. |
| 61 | +# image: |
| 62 | +# caption: 'Image credit: [**Unsplash**](https://unsplash.com/photos/pLCdAaMFLTE)' |
| 63 | +# focal_point: '' |
| 64 | +# preview_only: false |
| 65 | + |
| 66 | +# Associated Projects (optional). |
| 67 | +# Associate this publication with one or more of your projects. |
| 68 | +# Simply enter your project's folder or file name without extension. |
| 69 | +# E.g. `internal-project` references `content/project/internal-project/index.md`. |
| 70 | +# Otherwise, set `projects: []`. |
| 71 | +# projects: |
| 72 | +# - example |
| 73 | + |
| 74 | +# Slides (optional). |
| 75 | +# Associate this publication with Markdown slides. |
| 76 | +# Simply enter your slide deck's filename without extension. |
| 77 | +# E.g. `slides: "example"` references `content/slides/example/index.md`. |
| 78 | +# Otherwise, set `slides: ""`. |
| 79 | +# slides: example |
| 80 | +--- |
| 81 | + |
| 82 | +<!-- {{% callout note %}} |
| 83 | +Click the _Cite_ button above to demo the feature to enable visitors to import publication metadata into their reference management software. |
| 84 | +{{% /callout %}} |
| 85 | + |
| 86 | +{{% callout note %}} |
| 87 | +Create your slides in Markdown - click the _Slides_ button to check out the example. |
| 88 | +{{% /callout %}} --> |
| 89 | + |
| 90 | +<!-- Add the publication's **full text** or **supplementary notes** here. You can use rich formatting such as including [code, math, and images](https://docs.hugoblox.com/content/writing-markdown-latex/). --> |