Claude 4.7's new tokenizer costs 47% more tokens on real content
Original: Measuring Claude 4.7's tokenizer costs
Why This Matters
Higher tokenization costs impact AI development budgets and performance optimization strategies
Independent testing shows Claude 4.7's tokenizer uses 1.47x as many tokens on technical documentation, exceeding Anthropic's claimed 1.0-1.35x range. Real Claude Code content averaged a 1.325x increase, and code content was hit harder (1.29-1.39x) than prose (1.20x).
Testing of Anthropic's Claude Opus 4.7 tokenizer reveals higher token costs than officially stated. Using Anthropic's token counter API, a researcher measured seven real-world samples, including CLAUDE.md files, user prompts, and code diffs. Results showed a 1.325x weighted average increase across real content, with technical docs hitting 1.47x, exceeding Anthropic's upper 1.35x estimate. Code content was impacted more heavily (1.29-1.39x) than English prose (1.20x), while CJK languages saw minimal change (1.01x). The tokenizer appears to use shorter sub-word merges for English and code patterns. The practical effect is faster quota consumption, higher cached-prefix costs, and earlier rate-limit hits despite unchanged pricing. Characters-per-token for English dropped from 4.33 to 3.60, suggesting the tokenizer trades efficiency for other improvements.
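The characters-per-token figures imply the prose multiplier directly: for the same text, the token count scales inversely with chars-per-token. A minimal sketch of that arithmetic, using the 4.33 and 3.60 values reported above (the function name is illustrative, not from the article):

```python
# Sketch of the multiplier arithmetic implied by the article's figures.
# Assumes tokens ~= characters / chars_per_token for a given tokenizer.

def token_multiplier(old_cpt: float, new_cpt: float) -> float:
    """Token-count ratio when chars-per-token drops from old_cpt to new_cpt."""
    # Same text, so the character count cancels:
    # new_tokens / old_tokens = old_cpt / new_cpt
    return old_cpt / new_cpt

# English prose: 4.33 -> 3.60 chars per token
print(f"{token_multiplier(4.33, 3.60):.2f}x")  # 1.20x, matching the reported prose figure
```

The same relation explains why the 1.47x technical-doc result implies a proportionally steeper drop in chars-per-token for that content type.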