Reid Hoffman Weighs In: The 'Tokenmaxxing' Bet to Forge the Next Compute Giant

Published 12 hours ago · 4 minute read
Uche Emeka

The concept of “tokenmaxxing,” the practice of tracking employees’ AI token usage as a measure of AI adoption and potential productivity, has recently taken Silicon Valley by storm and sparked considerable debate. An AI token is a small chunk of data that an AI model processes to understand prompts and generate responses; it is also the unit in which AI service costs are measured. Companies have begun tracking token usage as a proxy for identifying employees who actively embrace AI tools, giving rise to the term “tokenmaxxing,” which borrows the Gen Z “-maxxing” suffix for optimizing something.
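
To make the unit concrete, here is a minimal sketch of counting tokens with OpenAI's open-source tiktoken tokenizer; the per-million-token price in it is a made-up placeholder rather than any provider's actual rate.

```python
# Minimal illustration of tokens as both a unit of data and a unit of billing.
# Assumes the open-source "tiktoken" package is installed; the price constant
# below is a hypothetical placeholder, not a real provider's rate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common tokenizer encoding

prompt = "Give me tokens. I want them fast. I want them cheap. I want them now."
token_ids = enc.encode(prompt)
print(f"{len(token_ids)} tokens: {token_ids}")

# Billing scales with the number of tokens processed (prompt in, response out).
HYPOTHETICAL_PRICE_PER_MILLION_TOKENS = 0.50  # USD, placeholder only
cost = len(token_ids) / 1_000_000 * HYPOTHETICAL_PRICE_PER_MILLION_TOKENS
print(f"Estimated prompt cost: ${cost:.8f}")
```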

However, the practice has faced scrutiny, with engineers questioning its viability as a productivity metric and comparing it to ranking people by how much money they spend. Despite this, LinkedIn co-founder and venture capitalist Reid Hoffman has publicly backed the idea. In an interview at Semafor’s World Economy Summit, Hoffman advocated tracking employee token spend as a valuable dashboard for companies adopting AI. He emphasized the importance of widespread engagement and experimentation with AI across functions, arguing that while token usage is not a perfect measure of productivity, it does show who is actually working with AI, even if some of that usage is exploratory. Hoffman advised pairing token tracking with an understanding of *how* tokens are being used, and encouraged companies to have a diverse group of employees experiment with AI collectively and simultaneously. He also suggested embedding AI across the organization and holding weekly check-ins to share learnings and new applications of AI, both for personal use and for company productivity.
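
As a rough illustration of what such a dashboard might aggregate, the sketch below groups invented usage-log records by employee and by how the tokens were spent, pairing raw volume with context as Hoffman suggests; the data and field names are hypothetical.

```python
# Toy sketch of a token-spend dashboard: aggregate usage per employee, broken
# out by use case so raw volume is paired with how the tokens were actually used.
# The log records and field names here are invented sample data.
import pandas as pd

usage_log = pd.DataFrame([
    {"employee": "alice", "use_case": "code-review agent", "tokens": 1_200_000},
    {"employee": "alice", "use_case": "drafting docs",     "tokens": 300_000},
    {"employee": "bob",   "use_case": "exploratory chat",  "tokens": 150_000},
])

dashboard = usage_log.groupby(["employee", "use_case"])["tokens"].sum()
print(dashboard)
```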

Beyond individual employee usage, there is an immense and growing demand for AI tokens at an infrastructural level. As Parasail CEO Mike Henry observes, developers building software on generative AI models often express a mantra: “Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now.” This reflects the critical need for efficient and cost-effective AI inference — the process of running a trained AI model to make predictions or generate content.

Parasail, a company that provides cloud computing services for running AI models for inference, is directly addressing this demand. Generating an astounding 500 billion tokens a day, Parasail recently secured $32 million in Series A funding. Henry, who previously built the cloud offering for LLM-focused chipmaker Groq, recognized early on that developers would require specialized cloud processing for AI models. While Parasail uses some of its own GPUs, its core strategy involves renting processing time from 40 data centers across 15 countries and purchasing more from liquidity markets. By cleverly orchestrating workloads and avoiding demand peaks, Parasail aims to significantly drive down the cost of inference requests, positioning itself to compete with firms that own their own silicon and may be constrained by existing customer commitments.
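
To illustrate the general idea of cost-aware compute brokerage (not Parasail's actual system), the toy scheduler below routes work to the cheapest rented GPU pool that still has headroom; all pool names, prices, and utilization figures are invented.

```python
# Toy illustration of cost-aware routing for a compute broker: send the next
# batch of inference requests to the cheapest GPU pool that still has spare
# capacity, one way to avoid paying peak-demand prices. Pools, prices, and
# utilization figures are invented; this is not Parasail's actual scheduler.
from dataclasses import dataclass

@dataclass
class GpuPool:
    name: str
    price_per_million_tokens: float  # current rental price (hypothetical, USD)
    utilization: float               # fraction of capacity currently in use

def route(pools: list[GpuPool], max_utilization: float = 0.85) -> GpuPool:
    """Pick the cheapest pool that is not near its demand peak."""
    candidates = [p for p in pools if p.utilization < max_utilization]
    if not candidates:
        raise RuntimeError("all pools are at peak demand; queue or defer the work")
    return min(candidates, key=lambda p: p.price_per_million_tokens)

pools = [
    GpuPool("eu-west-dc", 0.42, 0.91),  # cheapest, but too busy
    GpuPool("us-east-dc", 0.48, 0.60),
    GpuPool("apac-dc",    0.55, 0.30),
]
print(route(pools).name)  # -> us-east-dc
```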

Parasail's potential is further amplified by the continued proliferation of open-source models and agents, which are increasingly being favored over offerings from frontier labs like Anthropic and OpenAI due to growing costs and friction. Andreas Stuhlmüller, CEO of Elicit (a startup developing an LLM-based research assistant), highlights this shift, noting that his pharmaceutical company customers have moved towards open models for initial screenings to reduce costs, especially as they integrate agents to improve their offerings. This hybrid architecture, where open models handle initial tasks before a more capable frontier model provides a final answer, is becoming prevalent. The explosion of model queries, particularly with the rise of agents in software development, is driving substantial investment into infrastructure providers like Parasail that facilitate cheap inference. Samir Kumar of Touring Capital projects that inference will account for at least 20% of future software development costs.
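
A minimal sketch of that hybrid pattern is shown below: a cheap open-weights model takes the first pass, and only low-confidence cases are escalated to a pricier frontier model. The two call_* functions are hypothetical stand-ins for whatever inference APIs a team actually uses.

```python
# Minimal sketch of the hybrid routing pattern: an open model screens first,
# and only uncertain cases are escalated to a frontier model. Both call_*
# functions are hypothetical placeholders, not real provider APIs.
def call_open_model(query: str) -> tuple[str, float]:
    """Placeholder for a cheap, self-hosted open-weights model.
    Returns (answer, confidence)."""
    return "draft answer from the open model", 0.6  # dummy values

def call_frontier_model(query: str) -> str:
    """Placeholder for a more capable, more expensive frontier model."""
    return "final answer from the frontier model"  # dummy value

def answer(query: str, confidence_threshold: float = 0.8) -> str:
    draft, confidence = call_open_model(query)
    if confidence >= confidence_threshold:
        return draft                   # cheap path: open model is good enough
    return call_frontier_model(query)  # escalate only the hard cases

print(answer("Does compound X inhibit target Y?"))
```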

In the crowded cloud compute market, Henry believes Parasail distinguishes itself through its exclusive focus on inference (eschewing training) and its willingness to accommodate startup customers without demanding long-term commitments. This differentiates it from larger cloud-computing companies focused on enterprise business and even well-funded competitors in the cloud inference space. Despite the inherent risk of serving mostly seed and Series B startups in the unpredictable AI sector, investors remain optimistic. Steve Jang of Kindred Ventures asserts that the economics of deploying models will necessitate the kind of compute brokerage Parasail offers, even before widespread adoption of models for content generation and robotics. He emphatically states, “Everyone thought there was an AI bubble. There’s no AI bubble. Inference demand is far outstripping supply.”
