Amazon Ends AI Leaderboard as Tokenmaxxing Costs Rise

Amazon has shut down Kirorank, an internal leaderboard that ranked employees by how many AI tokens they burned, after staff started gaming it on purpose. The practice has a name now: tokenmaxxing, where workers point AI agents at pointless, low-value tasks just to inflate their score and climb the rankings. The dashboard, built informally by a group of employees and live for only a few weeks, did exactly what a usage metric does when you reward volume. It rewarded volume.

The irony is that the tools were meant to prove AI was paying off. Instead they taught people how to spend money proving it. Amazon is the latest name on a growing list that already includes Meta, Uber and Microsoft, all of which have quietly pulled back after the bill came in louder than the benefit.

Amazon Pulled the Plug on Kirorank

The leaderboard sat on Kiro, Amazon’s in-house AI coding platform, and it tracked one thing: how heavily employees were leaning on AI, measured in tokens consumed. After the gaming became obvious, Dave Treadwell, an Amazon senior vice president, sent staff a blunt instruction. “Please don’t use AI just for the sake of using AI,” he wrote in a memo first reported by the Financial Times.

An Amazon spokesperson told CNET the tool was never sanctioned in the first place. “One of the internal dashboards, called Kirorank, was recently created by a group of employees who wanted to drive awareness for how AI can accelerate work, and was never intended to promote the use of AI for usage’s sake,” the spokesperson said. “The beta dashboard was not a formal or approved tool, and has since been deprecated.”

The company says it still tracks token usage to understand cost and efficiency, but it has moved the spotlight to a metric it calls normalised deployments, meant to capture work that actually ships rather than raw activity. The message to engineers is that the number that matters is shipped work, not tokens consumed.


How a Usage Score Turned Into Tokenmaxxing

To follow the problem you need three plain definitions, because the whole mess turns on them.

Tokens are the chunks of text an AI model reads and writes, and they are the unit cloud providers bill on. More work, more tokens, more cost.
Tokenmaxxing is treating token count as a productivity score, then chasing the score by feeding AI busywork it was never needed for.
Normalised deployments is Amazon’s replacement yardstick, which ties credit to completed, useful work instead of how loud the meter runs.
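
To make the first definition concrete, per-token billing comes down to multiplication. The sketch below is a minimal illustration only: the function name and both per-million-token prices are hypothetical placeholders, not any provider's actual rates.

```python
# Hypothetical prices in US dollars per million tokens (assumed for illustration).
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the bill for one request under simple per-token pricing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# One modest request, then the same request repeated 1,000 times as busywork.
single = token_cost(input_tokens=2_000, output_tokens=500)
repeated = 1_000 * single
print(f"one request:   ${single:.4f}")
print(f"1,000 repeats: ${repeated:.2f}")
```

Under these assumed prices a single request costs a little over a cent, so the bill only becomes visible once usage is multiplied, which is what a volume leaderboard encourages.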

This is a textbook case of a measure becoming a target and then ceasing to be useful, the dynamic economists call Goodhart’s law. A leaderboard tells people what the company values. Workers responded rationally to a system that scored quantity and said nothing about quality.

The deeper trouble is that token count was never a clean proxy for value. A single careless prompt loop can rack up more tokens than a week of careful, surgical use. So the employee at the top of the board might be the one doing the least useful thing with the tool, which is the opposite of what the dashboard was built to celebrate.
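
A small sketch can show how the two rankings diverge. Everything here is hypothetical: the names, the token and deployment figures, and the tie-breaking rule are invented for illustration, and the output-based ranking is not Amazon's normalised deployments formula, which the reporting does not describe.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    tokens_used: int   # raw usage, the number a token leaderboard scores
    deployments: int   # shipped work, a stand-in for useful output


# Hypothetical figures, invented for illustration.
team = [
    Engineer("careful_user", tokens_used=4_000_000, deployments=6),
    Engineer("prompt_loop", tokens_used=900_000_000, deployments=0),
    Engineer("steady_user", tokens_used=20_000_000, deployments=4),
]


def rank_by_tokens(engineers):
    """Leaderboard order when only volume counts."""
    return sorted(engineers, key=lambda e: e.tokens_used, reverse=True)


def rank_by_deployments(engineers):
    """Order by shipped work; ties go to whoever spent fewer tokens."""
    return sorted(engineers, key=lambda e: (-e.deployments, e.tokens_used))


print([e.name for e in rank_by_tokens(team)])
print([e.name for e in rank_by_deployments(team)])
```

With these made-up numbers the runaway prompt loop tops the token board and lands last on the output board, which is the inversion described above.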

Meta, Uber and Microsoft Hit the Same Wall

Amazon’s retreat is not an outlier. Across the industry, companies that spent a year pushing employees to use as much AI as possible are now pumping the brakes, and most of them tripped over the same incentive.

Company	What happened	Trigger	Response
Amazon	Kirorank usage leaderboard	Staff gaming token scores	Killed the board, switched to normalised deployments
Meta	Employee-built “Claudeonomics” board	Top user hit 281 billion tokens in 30 days	Shut down two days after the leak went public
Uber	Claude Code rolled out to about 5,000 engineers	Entire 2026 AI budget gone by April	COO publicly questioned the return
Microsoft	Internal Claude Code licenses	Costs outran budget within six months	Cutting most licenses, steering teams to GitHub Copilot CLI

Why Token Billing Makes Heavy Use Expensive

The Meta dashboard, nicknamed Claudeonomics, aggregated AI use from more than 85,000 employees and handed out badges like “Token Legend” and “Cache Wizard.” The top contender reportedly burned through 281 billion tokens in a single 30-day window. Mark Zuckerberg, by various accounts, did not crack the top 250.
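
For a sense of scale, the reported figure can be converted into a rate. This is a back-of-the-envelope sketch that assumes usage was spread evenly around the clock for the full 30 days, which the reporting does not claim.

```python
TOKENS = 281_000_000_000            # reported 30-day total for the top user
SECONDS_IN_30_DAYS = 30 * 24 * 60 * 60

# Average rate under the even-usage assumption stated above.
tokens_per_second = TOKENS / SECONDS_IN_30_DAYS
print(f"{tokens_per_second:,.0f} tokens per second")  # about 108,410
```

A rate above 100,000 tokens per second, sustained for a month, is far beyond what anyone types by hand, which suggests automated agents running in bulk.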

Uber’s case shows where the costs land. The company deployed Anthropic’s Claude Code coding agent to roughly 5,000 engineers, and adoption climbed fast. The bills climbed faster.

Here is the shape of the squeeze that is showing up across these firms:

$500 to $2,000 per engineer, per month, in API costs at Uber, well past internal forecasts.
84% of Uber engineers classified as agentic coding users by March, up from 32% in February.
June 30 is the deadline Microsoft set for cutting most Claude Code access in its Windows and Office division.
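
Multiplying the Uber figures out gives a rough sense of the total. This sketch assumes every one of the roughly 5,000 engineers falls inside the reported per-engineer range, which overstates things if some barely use the tool; the function name is hypothetical.

```python
ENGINEERS = 5_000                 # approximate Claude Code rollout size
COST_LOW, COST_HIGH = 500, 2_000  # reported monthly API cost per engineer, in dollars


def monthly_spend_range(engineers: int, low: int, high: int) -> tuple[int, int]:
    """Return (lowest, highest) total monthly spend if every engineer is in range."""
    return engineers * low, engineers * high


low_total, high_total = monthly_spend_range(ENGINEERS, COST_LOW, COST_HIGH)
print(f"${low_total:,} to ${high_total:,} per month")
```

Under that assumption the range works out to $2.5 million to $10 million a month.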

The reason heavy use gets expensive so quickly is structural. Agentic tools, the kind that chain many steps and call the model again and again, can multiply the tokens behind a simple request tenfold or more. Salesforce and DoorDash have reportedly joined the rationing too. It is part of the same shift from throwing AI at everything to spending it where it earns its keep, a discipline familiar from the multibillion-dollar AI build-outs Amazon and Microsoft have committed to in India.
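
One way to see the multiplier is to model an agent that re-reads its whole history on every step. The sketch below is a simplified illustration, not a description of how any particular tool works: it assumes no prompt caching, counts input tokens only, and uses hypothetical token counts.

```python
def agent_input_tokens(prompt_tokens: int, step_output_tokens: int, steps: int) -> int:
    """Total input tokens billed when every step re-reads the full history."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context               # the model reads the whole context again
        context += step_output_tokens  # this step's output joins the history
    return total


# Hypothetical numbers: a 1,000-token request handled in one call or in ten steps.
one_shot = agent_input_tokens(prompt_tokens=1_000, step_output_tokens=500, steps=1)
ten_steps = agent_input_tokens(prompt_tokens=1_000, step_output_tokens=500, steps=10)
print(one_shot, ten_steps, ten_steps / one_shot)  # 1000 32500 32.5
```

Because the context grows with each step, total input tokens rise faster than the step count, so a ten-step run here costs 32.5 times the single call rather than ten times.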

Usage Is Still Going Vertical

None of this means the AI wave is cresting. Quite the opposite. At its developer conference on May 19, Google said it now processes 3.2 quadrillion tokens a month across its surfaces, up from 480 trillion a year earlier, a roughly sevenfold jump in twelve months, according to Google’s I/O 2026 keynote figures.

The forecasts point higher still. A Goldman Sachs Research forecast on agentic AI token demand projects a 24-fold rise by 2030, reaching 120 quadrillion tokens a month. So the cost story and the growth story are running at the same time, which is exactly why the pullback reads as housekeeping rather than collapse.
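
The growth figures can be checked with simple division. This sketch uses only the numbers quoted above, expressed in trillions of tokens per month; the "implied baseline" is an inference from the 24-fold multiple, not a figure stated in the forecast.

```python
# All values in trillions of tokens per month.
GOOGLE_NOW = 3_200         # 3.2 quadrillion
GOOGLE_YEAR_AGO = 480
GOLDMAN_2030 = 120_000     # 120 quadrillion
GOLDMAN_MULTIPLE = 24

google_growth = GOOGLE_NOW / GOOGLE_YEAR_AGO
implied_baseline = GOLDMAN_2030 / GOLDMAN_MULTIPLE  # inferred starting point

print(f"Google year-over-year growth: {google_growth:.2f}x")
print(f"Implied Goldman baseline: {implied_baseline:,.0f} trillion tokens a month")
```

The Google ratio comes to about 6.67, which rounds to the sevenfold figure, and the Goldman projection implies a starting point of about 5 quadrillion tokens a month.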

“It isn’t surprising, but probably not enough of a slowdown that it is going to burst the generative AI bubble that we seem to be in. As companies get better at sorting the applications that provide real value versus using AI just for the sake of using AI, demand will only increase.”

That assessment came from Jackie Rees Ulmer, dean of the Ohio University College of Business, in an email weighing the recent retreats.

Sorting Value From Volume

What the leaderboard era exposed is a measurement problem dressed up as a spending problem. Uber’s leadership put it plainly when adoption soared but shipped features did not keep pace.

“It’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features,’” said Andrew Macdonald, Uber’s chief operating officer, describing the gap between token consumption and real output. Will McGough, chief investment officer at Prime Capital Financial, told the Wall Street Journal that companies are still “figuring things out” on effective AI use.

Ulmer’s prescription for students is to learn the AI applications relevant to their field while they “double down” on “human skills, such as critical thinking and communication.” If the next phase rewards judgment over volume, the companies that win will be the ones that figured out what to stop measuring. The token meters will keep spinning regardless; whether they spin on work worth paying for is the question every dashboard was supposed to answer and none of them did.

Frequently Asked Questions

What Is Tokenmaxxing?

Tokenmaxxing is the practice of treating AI token consumption as a productivity score and then inflating that score on purpose, often by pointing AI tools at unnecessary or low-value tasks. Because cloud AI is billed per token, the behavior runs up real costs while producing little useful work.

Why Did Amazon Shut Down Kirorank?

Amazon shut down Kirorank because employees were gaming it to climb the rankings, which raised AI costs without delivering proportional value. The company called the leaderboard an unapproved beta tool and has shifted to a metric called normalised deployments that credits completed work rather than raw usage.

Which Companies Are Cutting Back on AI Use?

Amazon, Meta, Uber and Microsoft have all reined in internal AI use. Meta shut down an employee-built leaderboard, Uber burned through its 2026 AI budget by April, and Microsoft is cutting most internal Claude Code licenses. Salesforce and DoorDash have reportedly rationed AI spending too.

Does This Mean the AI Bubble Is Bursting?

Most signals say no. Google reported processing 3.2 quadrillion tokens a month as of May 2026, and Goldman Sachs projects token demand could rise 24-fold by 2030. The pullback is about cutting wasteful usage, not abandoning AI, and overall demand keeps climbing.

Why Does Agentic AI Cost So Much?

Agentic AI tools chain many steps together and call the model repeatedly, which can multiply a single request tenfold or more in token terms. Since billing is per token, those repeated calls push per-engineer monthly costs into the hundreds or thousands of dollars far faster than basic chatbot use.