Thursday, October 16, 2025
newshub

AI trained on trillions of tokens – the staggering amount of text needed to create intelligence

2025/10/16/09:00
in AI

The development of large language models is no longer measured only in computing power and billions of parameters, but also in tokens, the small fragments of text that form the basis of linguistic understanding. At roughly four characters per token, one trillion tokens equal about four terabytes of raw text, illustrating the enormous scale of data today’s AI systems process.

From tokens to text
A token can represent a whole word, part of a word, or even a single character, depending on the language and the model’s tokenizer. As a rule of thumb for English, one token corresponds to about 0.75 words or around four characters of text. This means that a model trained on one trillion tokens has processed approximately 750 billion words – or, at about one byte per character, roughly four terabytes of uncompressed raw text.
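
The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function name `estimate_text_volume` and the constants are assumptions based on the ratios quoted in this article (0.75 words and four characters per token). It also assumes one byte per character, which holds for plain ASCII English but not for many other languages or encodings.

```python
# Rule-of-thumb ratios quoted in the article (approximate, English text).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4
BYTES_PER_CHAR = 1  # assumption: single-byte characters (plain ASCII)


def estimate_text_volume(tokens: int) -> dict:
    """Return a rough estimate of words and raw bytes for a token count."""
    words = tokens * WORDS_PER_TOKEN
    size_bytes = tokens * CHARS_PER_TOKEN * BYTES_PER_CHAR
    return {
        "words": words,
        "bytes": size_bytes,
        "terabytes": size_bytes / 1e12,  # decimal terabytes (10**12 bytes)
    }


if __name__ == "__main__":
    estimate = estimate_text_volume(1_000_000_000_000)  # one trillion tokens
    print(f"{estimate['words']:.3g} words, {estimate['terabytes']:.1f} TB")
```

Real tokenizers vary, so the result is an order-of-magnitude estimate rather than a measurement.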

Modern AI models are therefore built on datasets that far exceed any traditional linguistic corpus and approach the scale of the public web itself.

Known data points from research sources
OpenAI’s GPT-3 was trained on roughly 300 billion tokens, equivalent to about 1.2 terabytes of text, according to Nvidia. Meta’s Llama 2 raised the bar to two trillion tokens, or eight terabytes, while Llama 3 went even further – over 15 trillion tokens, corresponding to at least 60 terabytes of raw text.

Open datasets such as Falcon RefinedWeb (about 5 trillion tokens, 20 TB) and The Pile (825 GB of mostly English text) are widely used in research. Common Crawl, the open web archive that underpins many of these models, itself amounts to several hundred terabytes of raw data per snapshot.

Secret numbers but clear patterns
OpenAI has not disclosed the exact number of tokens used to train GPT-4 or GPT-5, but industry analysts estimate that GPT-4 was trained on around 13 trillion tokens – roughly 52 terabytes of text. Although unconfirmed, this gives an indication of the magnitude involved in the most advanced systems.

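According to the so-called Chinchilla scaling laws, published by DeepMind researchers in 2022, a model’s parameter count and its number of training tokens should be scaled up in roughly equal proportion to make the best use of a given compute budget. This is why today’s AI developers are scaling not only their models but also their data to match.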
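
As a rough illustration of that balance, the sketch below uses two widely cited approximations associated with the Chinchilla work: about 20 training tokens per model parameter, and training compute of roughly 6 × parameters × tokens floating-point operations. Both constants are rules of thumb rather than exact values, and the function names are hypothetical.

```python
TOKENS_PER_PARAM = 20      # approximate compute-optimal ratio (rule of thumb)
FLOPS_PER_PARAM_TOKEN = 6  # common approximation: compute ≈ 6 * params * tokens


def compute_optimal_tokens(params: float) -> float:
    """Estimate the compute-optimal number of training tokens for a model size."""
    return params * TOKENS_PER_PARAM


def training_flops(params: float, tokens: float) -> float:
    """Estimate total training compute in floating-point operations."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


if __name__ == "__main__":
    for params in (7e9, 70e9):
        tokens = compute_optimal_tokens(params)
        flops = training_flops(params, tokens)
        print(f"{params:.0e} params -> {tokens:.1e} tokens, {flops:.1e} FLOPs")
```

By this rule of thumb, a 70-billion-parameter model calls for about 1.4 trillion tokens, which is the configuration of the Chinchilla model itself.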

Why tokens matter
Tokens are the fundamental units that determine how much language, factual knowledge, and context a model can absorb. The more tokens, the broader the model’s understanding of human expression – though this also increases demands on filtering, data quality, and energy efficiency.

In practice, a model trained on trillions of tokens has been exposed to a large share of the text publicly available in digital form – but not as a simple copy. Much of the material is filtered, deduplicated, and curated to optimise for quality rather than sheer volume.
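
To give a sense of what deduplication means in practice, here is a minimal sketch of exact deduplication: each document is normalised (lower-cased, whitespace collapsed) and hashed, and only the first document with each hash is kept. This is an illustration, not the pipeline of any particular lab. Real pipelines often add near-duplicate detection and quality filters, and the function names here are hypothetical.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace so trivial variants match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, compared after normalisation."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = ["Hello  world", "hello world", "Something else"]
    print(deduplicate(corpus))
```

Hashing the normalised text keeps memory use proportional to the number of unique documents rather than their total size, which matters at web scale.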

Newshub Editorial in Europe – 16 October 2025

    newshub

    © 2023-2025
    MSTRpay AB
    Legal & Disclosure
