Baseten (@baseten) / X

Baseten

2,490 posts

Inference is everything.

San Francisco and New York

Joined March 2021

Pinned
Baseten
@baseten
Jun 22
We’re excited to announce our $1.5B Series F. Baseten exists to help companies own their intelligence and run AI products in production with speed, reliability, and control. As we enter this next chapter, three things are clear: 1. Customers like Abridge, Clay, Cursor, Decagon,
Tuhin Srivastava
@tuhinone
Jun 22
Article
Announcing our Series F
Today, we are thrilled to announce Baseten’s $1.5B Series F, led by Altimeter Capital, Conviction Partners, and Spark Capital, co-led by Sands Capital and Wellington Management, with participation...
61.1K
Baseten reposted
ali
@waterloo_intern
3h
Article
some notes on writing the fastest video kernel in the world
in this worklog, I explain how we retrofitted sparsity into a model at inference time, yielding the world's fastest video generation kernel. we iteratively optimize a frontier oss model by 54x the...
14.9K
Baseten
@baseten
9h
NYC team repping at Nasdaq tower this week! 💚 We're growing fast, and we're hiring across the board. If you're excited about helping companies own their intelligence, reach out!
2.3K
Baseten
@baseten
9h
Careers at Baseten
From baseten.co
337
Baseten reposted
Amir Haghighat
@amiruci
11h
Live inference data is the best training data for speculators. The hard part is training them online without storing hidden states or impacting inference performance and reliability. We built exactly that into the Baseten Inference Stack. Result: +20% median acceptance rates,
2.3K
Baseten
@baseten
13h
Live draft model training is now part of our Speculation Engine in the Baseten Inference Stack. Where rolled out, we see a 20% median increase in acceptance rate (= 20% faster speculative decoding), with 100%+ on some constrained traffic patterns.
Rachel Rapp
@rachelrapp
13h
Article
Live draft model training for speculative decoding
Draft models, such as EAGLE-3 and DFlash, have become a widely adopted technique for accelerating large language model (LLM) inference, leading to 2-3x higher throughput and lower latency. However,...
2.5K
Baseten
@baseten
Jun 25
Excited to power GLM-5.2 on @cline! How to use it in about 10 seconds:
Video
5.8K
Baseten reposted
Madison Kanna
@Madisonkanna
Jun 24
With the launch of GLM 5.2 this week, I see everyone asking "have open models caught up to closed models?" The more interesting question that's getting missed: what can you do with an open model that you can't do with a closed one? You can specialize them. And when you do, the
Video (27:58)
Madison Kanna
@Madisonkanna
Mar 26
What is AI inference engineering, why is it such an in-demand skill, and how do you break into the field? With author of Inference Engineering @philipkiely and head of training at Baseten @oneill_c 0:00: What is inference? 2:47: History of inference 4:59: Downstream effects
12.4K
Baseten
@baseten
Jun 24
"Frontier models for the hardest general intelligence and post-trained open source for high-volume and specialized workloads... Many specialized models, serving many specialized workflows, inside many specialized products." Thank you, Apoorv, for taking the time to write about
Apoorv Agrawal
@apoorv03
Jun 22
Article
Why we are doubling down on Baseten
We backed Baseten in Q4 2025, and I wrote up the thesis then. Six months on, it has only gotten more obvious to us, and faster. By the end of Q1, Baseten had already surpassed the full-year CY26...
4.3K
Baseten reposted
Alex Ker 🔭
@thealexker
Jun 24
Article
How to run GLM-5.2 in any harness
GLM-5.2 is this year’s DeepSeek moment. It’s already shifting the trajectory of how we interact with and consume intelligence. As we and our agents continue to tokenmax, tokenonomics and performance...
24.6K
Baseten
@baseten
Jun 24
You can now access our GLM-5.2 API through the Merge Gateway! GLM-5.2 matches frontier model intelligence while running 4x+ faster and at 1/5th the cost. Try it out: merge.dev/gateway
4.4K
Baseten
@baseten
Jun 23
"That's when they come to open-source models, that's when they come to Baseten, that's when they come to post-train models on Baseten, to be able to do it better, faster, and cheaper. That's when you get both intelligence everywhere and unit economics that make sense for your
Tuhin Srivastava
@tuhinone
Jun 23
Thanks to @EdLudlow for having us on Bloomberg Tech yesterday to talk about our latest fundraise and the growing number of companies owning their open and specialized models.
Video
4.5K
Baseten
@baseten
Jun 23
Excited to be a day 0 launch partner for BioNeMo, NVIDIA's new, fully-open agent toolkit for scientific workflows! All 10 BioNeMo NIMs are available in our model library. Learn more in our announcement: baseten.co/blog/nvidia-bi…
Video (00:35)
NVIDIA Healthcare
@NVIDIAHealth
Jun 23
Science is entering a new era - one where AI agents can do scientific work. 🧬 Today NVIDIA is launching the BioNeMo Agent Toolkit - an open, agent-ready toolkit that gives any AI agent callable tools for protein structure prediction, molecular docking, generative chemistry,
4.6K
Baseten reposted
Philip Kiely
@philipkiely
Jun 23
Article
How we built the world’s fastest API for GLM-5.2
GLM-5.2 is the biggest news in open models since DeepSeek-R1. It’s easy to see why. GLM-5.2 delivers comparable performance to GPT 5.5 and Opus 4.8 at a fraction of the cost, generally 70-80% less...
519K