mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
See what the GitHub community is most excited about today.
FSA/FST algorithms, differentiable, with PyTorch compatibility.
LLM training in simple, raw C/CUDA
RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.
cuVS - a library for vector search and clustering on the GPU
cuGraph - RAPIDS Graph Analytics Library
CUDA-accelerated GIS and spatiotemporal algorithms
RAPIDS Accelerator JNI For Apache Spark
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
A plugin for using NVIDIA GPUs in the PySCF package
Tile primitives for speedy kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Lightning fast differentiable SSIM.