Inferless (@Inferless_)

393 posts

Inferless

@Inferless_

Fastest serverless GPU inference for custom models (acq by @baseten)

San Francisco

Joined March 2023

Pinned
Inferless
@Inferless_
Mar 20
Announcing the acquihire of Inferless by Baseten
From baseten.co
623
Inferless
@Inferless_
Aug 20, 2024
Join us for our first live townhall showcasing Inferless' new user interface 🎉 If you are an ML engineer who is struggling with issues like high cold-start times at peak loads and effective serverless orchestration for your ML workloads - we understand. We just released the
5.9K
Inferless
@Inferless_
Mar 25, 2025
🚀 We’re officially LIVE on @ProductHunt Check out the video below—it's a quick dive into the Inferless journey. We talk about why we started, the powerful benefits of Serverless GPUs, and give you a hands-on product demo. This video represents nearly two years of relentless
6.7K
Inferless
@Inferless_
Jan 8, 2024
🎉 Exciting news from our Inferless community 🎉 🌟 We're thrilled to support our user @Tenyx_AI's incredible success. Their Tenyx_Chat-7B-v1 model is now leading the MT-Bench Leaderboard, surpassing giants like ChatGPT! We are a proud compute partner! 🚀 To help you get
2.7K
Inferless
@Inferless_
Jun 23, 2023
DM us! 💚
Aishwarya Goel (Ash)
@aishwarya_08
Jun 23, 2023
🚀 Hiring for 2 roles (Product + Engineering) Hybrid Work Culture Salary range 30-90 LPA + ESOPs + Perks! If you wanna work at a startup with an engineering-first culture whose ambition is to build a cloud Infra company proudly from India - read onwards 🧵
5K
Inferless
@Inferless_
Aug 1, 2024
▶︎ Inferless now supports text-to-speech streaming. Implement real-time TTS streaming using the Parler-TTS model today: • Quickly stream the generated audio from text input • Supports various voices & languages • Stream the audio chunk by chunk
2.2K
Inferless
@Inferless_
Jul 25, 2024
Llama 3.1 is now available on Inferless. Deploy Meta's latest open-source LLM with enterprise-grade performance and scalability. • Cold start: 15.44 sec • 74.79 tokens/sec • Running on A100 (80GB) Start building today: 🔗
AI at Meta
@AIatMeta
Jul 23, 2024
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context
docs.inferless.com
Deploy Llama-3.1-8B-Instruct using Inferless - Inferless
Llama-3.1-8B-Instruct is a new state-of-the-art model from Meta's Llama-3.1 series of large language models. The repository is for the Llama-3.1-8B-Instruct model for deploying the model in the...
2.1K
Inferless
@Inferless_
Mar 22, 2024
🚀 Dive into our latest blog post on LLMs Speed Benchmark and uncover the nuances of different Inference libraries! 📊 We've put Gemma 7B, Llama-2 7B, and Mistral 7B to the test across a range of scenarios to bring you comprehensive insights. Our detailed analysis covered
224
Inferless
@Inferless_
Aug 20, 2024
🚀 New Inferless Dashboard UI is live! Introducing a cleaner and more intuitive dashboard. 🎬 dub.sh/inferless-pw
3.3K
Inferless
@Inferless_
Jun 14, 2023
We're excited to recap our meetup on Deploying Generative AI Models, hosted with @huggingface & @peakxvpartners on June 10th, 2023 in Bangalore. Hundreds of AI, ML, and data science enthusiasts joined us to showcase their groundbreaking projects.
526
Inferless
@Inferless_
Jan 4, 2024
🎉 Kicking off the new year by optimizing @upstageai's SOLAR-10.7B-Instruct-v1.0 with the Auto-GPTQ library. When deployed on an A100 GPU using vLLM, the model shows impressive performance. Although Auto-GPTQ was an option, our experience suggests that vLLM is the superior choice
565
Inferless
@Inferless_
Jan 31, 2024
🎉 Congratulations to @SpoofSense on the launch of SpoofSense Face! 🚀 Your innovative Facial Liveness Detection model marks a significant step forward in safeguarding businesses against deepfake identity fraud. We're proud to be your Compute Partner and look forward to seeing
1.8K
Inferless
@Inferless_
Jan 17, 2024
Our Cofounder & CTO @nilesh_agarwal2 recently authored a tutorial on the @huggingface blog, where he explores how developers can utilize fractional GPUs in Kubernetes to potentially save up to 50% in costs. This approach involves dividing a single GPU into seven smaller units, each
2K
Inferless
@Inferless_
Mar 18, 2025
We're launching on Product Hunt in a week for the first time! If you've been waiting for an effortless way to deploy AI models, this is it. Would really appreciate your support. Set your reminders: be sure to click "Notify Me" to get updates on our launch on Product Hunt.
474