Baseten
2,490 posts
Baseten
@baseten
Inference is everything.
San Francisco and New York
baseten.co
Joined March 2021
351 Following
12.9K Followers
  • Pinned
    Baseten
    @baseten
    Jun 22
    We’re excited to announce our $1.5B Series F. Baseten exists to help companies own their intelligence and run AI products in production with speed, reliability, and control. As we enter this next chapter, three things are clear: 1. Customers like Abridge, Clay, Cursor, Decagon,
    Tuhin Srivastava
    @tuhinone
    Jun 22
    Article
    Announcing our Series F
    Today, we are thrilled to announce Baseten’s $1.5B Series F, led by Altimeter Capital, Conviction Partners, and Spark Capital, co-led by Sands Capital and Wellington Management, with participation...
    61.1K
  • Baseten reposted
    ali
    Baseten
    @waterloo_intern
    3h
    Article
    some notes on writing the fastest video kernel in the world
    in this worklog, I explain how we retrofitted sparsity into a model at inference time, yielding the world's fastest video generation kernel. we iteratively optimize a frontier oss model by 54x the...
    14.9K
  • Baseten
    @baseten
    9h
    NYC team repping at Nasdaq tower this week! 💚 We're growing fast, and we're hiring across the board. If you're excited about helping companies own their intelligence, reach out!
    2.3K
    Baseten
    @baseten
    9h
    Careers at Baseten
    From baseten.co
    337
  • Baseten reposted
    Amir Haghighat
    Baseten
    @amiruci
    11h
    Live inference data is the best training data for speculators. The hard part is training them online without storing hidden states or impacting inference performance and reliability. We built exactly that into the Baseten Inference Stack. Result: +20% median acceptance rates,
    2.3K
  • Baseten
    @baseten
    13h
    Live draft model training is now part of our Speculation Engine in the Baseten Inference Stack. Where rolled out, we see a 20% median increase in acceptance rate (= 20% faster speculative decoding), with 100%+ on some constrained traffic patterns.
    Rachel Rapp
    Baseten
    @rachelrapp
    13h
    Article
    Live draft model training for speculative decoding
    Draft models, such as EAGLE-3 and DFlash, have become a widely adopted technique for accelerating large language model (LLM) inference, leading to 2-3x higher throughput and lower latency. However,...
    2.5K
  • Baseten
    @baseten
    Jun 25
    Excited to power GLM-5.2 on @cline! How to use it in about 10 seconds:
    5.8K
  • Baseten reposted
    Madison Kanna
    @Madisonkanna
    Jun 24
    With the launch of GLM 5.2 this week, I see everyone asking "have open models caught up to closed models?" The more interesting question that's getting missed: what can you do with an open model that you can't do with a closed one? You can specialize them. And when you do, the
    Madison Kanna
    @Madisonkanna
    Mar 26
    What is AI inference engineering, why is it such an in-demand skill, and how do you break into the field? With author of Inference Engineering @philipkiely and head of training at Baseten @oneill_c 0:00: What is inference? 2:47: History of inference 4:59: Downstream effects
    12.4K
  • Baseten
    @baseten
    Jun 24
    "Frontier models for the hardest general intelligence and post-trained open source for high-volume and specialized workloads... Many specialized models, serving many specialized workflows, inside many specialized products." Thank you, Apoorv, for taking the time to write about
    Apoorv Agrawal
    Altimeter Capital
    @apoorv03
    Jun 22
    Article
    Why we are doubling down on Baseten
    We backed Baseten in Q4 2025, and I wrote up the thesis then. Six months on, it has only gotten more obvious to us, and faster. By the end of Q1, Baseten had already surpassed the full-year CY26...
    4.3K
  • Baseten reposted
    Alex Ker 🔭
    Baseten
    @thealexker
    Jun 24
    Article
    How to run GLM-5.2 in any harness
    GLM-5.2 is this year’s DeepSeek moment. It’s already shifting the trajectory of how we interact with and consume intelligence. As we and our agents continue to tokenmax, tokenonomics and performance...
    24.6K
  • Baseten
    @baseten
    Jun 24
    You can now access our GLM-5.2 API through the Merge Gateway! GLM-5.2 matches frontier model intelligence while running 4x+ faster and at 1/5th the cost. Try it out: merge.dev/gateway
    4.4K
  • Baseten
    @baseten
    Jun 23
    "That's when they come to open-source models, that's when they come to Baseten, that's when they come to post-train models on Baseten, to be able to do it better, faster, and cheaper. That's when you get both intelligence everywhere and unit economics that make sense for your
    Tuhin Srivastava
    @tuhinone
    Jun 23
    Thanks to @EdLudlow for having us on Bloomberg Tech yesterday to talk about our latest fundraise and the growing number of companies owning their open and specialized models.
    4.5K
  • Baseten
    @baseten
    Jun 23
    Excited to be a day 0 launch partner for BioNeMo, NVIDIA's new, fully-open agent toolkit for scientific workflows! All 10 BioNeMo NIMs are available in our model library. Learn more in our announcement: baseten.co/blog/nvidia-bi…
    NVIDIA Healthcare
    NVIDIA
    @NVIDIAHealth
    Jun 23
    Science is entering a new era - one where AI agents can do scientific work. 🧬 Today NVIDIA is launching the BioNeMo Agent Toolkit - an open, agent-ready toolkit that gives any AI agent callable tools for protein structure prediction, molecular docking, generative chemistry,
    4.6K
  • Baseten reposted
    Philip Kiely
    Baseten
    @philipkiely
    Jun 23
    Article
    How we built the world’s fastest API for GLM-5.2
    GLM-5.2 is the biggest news in open models since DeepSeek-R1. It’s easy to see why. GLM-5.2 delivers comparable performance to GPT 5.5 and Opus 4.8 at a fraction of the cost, generally 70-80% less...
    519K