Nancy Langdon

Trust and Sustainability through Data Spaces: Building the Foundation for an AI-Driven Future




Ever since the human brain developed memory, we have been generating data. Ever since the first pictographs were scratched onto rock walls, we have been generating ever more recorded data. The ability to record what is known and share it has been a cornerstone of human advancement. If humans could build empires with clay tablets and launch satellites with books and mimeographs, what promises await us when nearly all recorded knowledge is made easily accessible to tools like generative AI? Disease, climate change, and poverty could become as much a part of humanity's past as cave dwellings. Except the systems capable of moving us forward are, by current design, also pushing us back. In our current digital landscape, all data is treated as equal. But should it be? The absence of distinctions in data quality allows misinformation and sophisticated manipulation to thrive, making it difficult to distinguish truth from fiction.


Not only is all data treated as equal, but all data processing, the vast majority of it redundant, unnecessary, or even harmful, is granted equal, unfettered, and often wasteful use of energy resources. The emerging technology of data spaces offers many promises, including the ability to reliably curb misinformation and to conserve energy.


Data spaces are secure, collaborative environments designed to store, share, and validate data in a controlled and trusted manner. Unlike the traditional data-silo model, data spaces allow multiple stakeholders—such as businesses, government agencies, and research institutions—to pool and share data within a secure framework that emphasizes quality, accuracy, and transparency. This structure ensures that data is more reliable and free from the unchecked spread of misinformation, as it is vetted and maintained by trusted participants. By creating a system for verified data, data spaces serve as a backbone of accurate information that supports applications like artificial intelligence (AI), analytics, and machine learning.
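To make this a little more concrete, here is a minimal sketch of what a single record in a data space might look like, with provenance and a validation trail attached to the data itself. All of the names and fields below are illustrative assumptions, not part of any particular data-space specification.

```python
# A minimal, illustrative sketch of a "verified data asset" as it might be
# described inside a data space: the data's fingerprint plus provenance and a
# validation trail maintained by trusted participants. All names here are
# hypothetical and not taken from any specific data-space standard.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataAsset:
    asset_id: str                  # unique identifier within the data space
    publisher: str                 # the trusted participant who contributed it
    content_hash: str              # fingerprint used to detect tampering
    license_terms: str             # usage policy agreed by participants
    validated_by: list[str] = field(default_factory=list)  # reviewers who vetted it
    validated_at: datetime | None = None

    @property
    def is_trusted(self) -> bool:
        """An asset counts as trusted once at least one validator has signed off."""
        return bool(self.validated_by) and self.validated_at is not None

# Example: a research institute publishes a dataset; another participant validates it.
asset = DataAsset(
    asset_id="climate-obs-2024",
    publisher="research-institute-a",
    content_hash="sha256:2f7c...",
    license_terms="share-alike, non-commercial",
)
asset.validated_by.append("government-agency-b")
asset.validated_at = datetime.now()
print(asset.is_trusted)  # True
```

The point of the sketch is simply that in a data space, who published a piece of data, who vetted it, and under what terms it may be used travel with the data itself, rather than being lost the moment the data leaves its silo.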


Before the advent of the Internet, around 1990, all the world's recorded data, from hieroglyphics on obelisks, papyrus scrolls, and cuneiform clay tablets to books, newspapers, magazines, photographs, celluloid film, magnetic tape, broadsheets, phone books, card catalogs, and all other media including the era's floppy disks and punch cards, added up to roughly 1,000 to 1,500 terabytes. In today's information age, we generate that same volume, what had previously been the entirety of the world's recorded knowledge, about every 17 minutes. Within about five years, with the rise of quantum computing, we may be producing that much data every second.


Let's use grains of sand as a physical representation of data and imagine a 50-meter Olympic swimming pool as our container. Picture a single grain of sand representing 1 megabyte (MB) of data; one megabyte holds roughly the text of a 300- to 400-page book in a digital format such as PDF. Since a terabyte (TB) is roughly a trillion bytes, a terabyte would require about one million grains of sand. (There are currently only around 160 million unique books in existence, about 160 million grains, barely a dusting across the bottom of our pool.)


We currently produce around 1,500 terabytes of data, the equivalent of all recorded knowledge before 1990, every 17 minutes, roughly 85 times every day. In sand terms, that is about 1.5 billion one-megabyte grains every 17 minutes, well over 100 billion grains a day. As we project forward, data generation is expected to accelerate dramatically, potentially reaching 1,500 terabytes per second within five years, about 130 million terabytes, or on the order of fifty Olympic pools of millimetre-scale sand, every single day (a rough calculation is sketched below). Much of this data is redundant: it contains repeated, unnecessary, or misleading information that can skew public perception and lead to harmful conclusions. Without quality-control mechanisms, misinformation can spread uncontrollably, with significant societal impacts. Data spaces offer a mechanism to reliably navigate this sandstorm.
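For readers who want to check the sand math, here is a minimal back-of-the-envelope sketch in Python. The grain volume and pool dimensions are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope sketch of the sand analogy above.
# Assumptions (for illustration only): a grain of sand is ~1 mm^3,
# and an Olympic pool measures 50 m x 25 m x 2 m = 2,500 m^3.

GRAINS_PER_TB = 1_000_000              # 1 TB ~= 1,000,000 MB, and 1 grain = 1 MB
GRAIN_VOLUME_MM3 = 1.0                 # assumed grain volume
POOL_VOLUME_MM3 = 2_500 * 1_000**3     # 2,500 m^3 expressed in mm^3

def grains(terabytes: float) -> float:
    """Number of 1 MB grains needed to represent this many terabytes."""
    return terabytes * GRAINS_PER_TB

def pools(terabytes: float) -> float:
    """Olympic pools filled by that much sand, under the assumptions above."""
    return grains(terabytes) * GRAIN_VOLUME_MM3 / POOL_VOLUME_MM3

# Today: ~1,500 TB every 17 minutes, about 85 times a day.
daily_tb_now = 1_500 * (24 * 60 / 17)
print(f"~{grains(daily_tb_now):.2e} grains per day today")      # ~1.3e11 grains

# Projection: 1,500 TB per second, around the clock.
daily_tb_future = 1_500 * 86_400
print(f"~{pools(daily_tb_future):.0f} Olympic pools per day")   # ~52 pools
```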


Energy Consumption

In addition to reducing misinformation, data spaces hold the potential to conserve energy in the age of AI. Each AI query, whether simple or complex, demands significant processing power and thus a substantial amount of energy. It is estimated that a single AI query can consume somewhere between 0.9 and 8 kilowatt-hours (kWh). For the purposes of illustration, let's choose a middle value of 4 kWh, roughly the energy used by an average-sized swimming pool heater running for an hour. In a world where millions of queries are made per second, this demand grows quickly. Data spaces can help mitigate it by functioning as a shared, validated repository of information, enabling AI to "remember" basic, reliable knowledge rather than repeatedly reprocessing redundant or previously verified information. Think of a data space as AI's "subconscious" or "muscle memory," where established facts are stored, allowing AI to focus on new, complex inquiries instead of relearning foundational data with each query.

The reason large language models and generative AI are so energy intensive is that AI currently starts "from scratch" with each and every query. Whether the question is "What is 2 + 2?" or "At the quantum level, why is there a seemingly arbitrary distinction between the observer and the observed system?", the system starts from the same point, at the very bottom of the model, and works its way through layer after layer and billions of parameters to reach an answer. Compare this to human brains, which establish durable neural pathways once something is learned: AI lacks the "muscle memory" or optimization that makes the human brain so energy-efficient, even though it seems as if it should have that capability. The high energy demands of AI come down to the difference between the efficiency of human brains and the brute-force nature of current AI systems. Here's why this happens:


1. Massive Parallel Computation

When you ask a question, AI models like ChatGPT activate thousands of "neurons" across hundreds of layers to evaluate different combinations of words, phrases, and concepts. Unlike the human brain, which activates only the necessary pathways (synapses) needed for specific tasks, AI models engage large parts of the model regardless of the complexity of the question. This all-at-once approach consumes a lot of power because the AI is "searching" across millions of possible patterns every time, rather than sticking with a pre-set pathway like a brain with learned experience.
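As a loose illustration of that all-at-once behavior (a toy example, not a description of any real model's internals), the dense layer below touches every one of its weights for every input, so the amount of work depends only on the size of the model, never on the difficulty of the question.

```python
# Toy illustration: a dense layer does the same amount of work for any single
# input vector, because every weight participates in every forward pass.
# This is a simplification for intuition, not a model of any real LLM.
import numpy as np

d_model, d_hidden = 512, 2048
W1 = np.random.randn(d_model, d_hidden)
W2 = np.random.randn(d_hidden, d_model)

def forward(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Return the layer output and a count of multiply-accumulate operations."""
    h = np.maximum(x @ W1, 0)          # every entry of W1 is used
    y = h @ W2                         # every entry of W2 is used
    macs = W1.size + W2.size           # work done is fixed by model size alone
    return y, macs

easy = np.random.randn(d_model)        # stand-in for a trivial query
hard = np.random.randn(d_model)        # stand-in for a difficult query
print(forward(easy)[1] == forward(hard)[1])   # True: identical compute for both
```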


2. Lack of Specialized Memory and Learning Structures

Human brains develop "shortcuts," or muscle-memory pathways, especially for repetitive tasks. For example, after learning to drive, we no longer need to actively think about every action involved; an experienced driver can hold a conversation, brainstorm ideas, or follow a podcast while simultaneously keeping track of other cars, traffic signals, possible hazards, vehicle speed, and controls like the lights and windshield wipers.

Current AI models don't build or use memory in this efficient way. Every time an LLM answers a question, it computes everything from scratch using billions or even trillions of weights. This means each interaction requires full activation of the model, with no stored "knowledge" of what came before. Although some AI systems are exploring ways to integrate memory, true "muscle memory" would require a different type of model, one built to dynamically store frequently used patterns or knowledge for quicker retrieval.
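The simplest version of such a shortcut is a cache that remembers answers the system has already produced, so an identical question never triggers a second full model pass. The sketch below is a thought experiment: `ask_model` is a hypothetical stand-in for an expensive LLM call, not a real API.

```python
# A thought-experiment sketch of "muscle memory" as a simple answer cache.
# `ask_model` is a hypothetical placeholder for an expensive full model pass.
from functools import lru_cache

def ask_model(question: str) -> str:
    # Imagine this activating billions of weights and consuming real energy.
    print(f"[full model pass for: {question!r}]")
    return f"model answer to {question!r}"

@lru_cache(maxsize=10_000)
def ask_with_memory(question: str) -> str:
    """Identical questions are answered from memory; only new ones hit the model."""
    return ask_model(question)

ask_with_memory("What is 2 + 2?")    # triggers a full model pass
ask_with_memory("What is 2 + 2?")    # answered instantly from the cache
print(ask_with_memory.cache_info())  # hits=1, misses=1
```

Real systems would need far more than exact-match caching: paraphrased questions, changing facts, and nuanced queries all complicate matters, which is why this remains an open design problem.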


3. Data-Intensive Training

Before an AI model can answer questions at all, it needs massive amounts of training data to recognize patterns in language. Training these models requires processing terabytes of text repeatedly, which involves billions of calculations.

This is extremely energy-intensive, especially in the initial phases where the model “learns” by processing the same data thousands of times. By contrast, human learning is far more efficient; we typically only need a few repetitions to memorize information.


4. No Hierarchical Energy Use in AI Inference

For humans, most thought happens on an as-needed basis: if we're reading a book, we're not activating the parts of our brain used for complex spatial reasoning or physical coordination. When an LLM answers a question, it uses a single processing flow regardless of the question's difficulty. A simple question like "What is 2 + 2?" and a complex one about planetary science engage the same model at the same computational intensity. AI models lack the "adaptive bandwidth" that would allow them to conserve energy by scaling their processing power up or down based on question complexity.

Imagine ChatGPT as a highly complex calculator with billions of gears. When you ask it "2 + 2," it has to spin up all those gears, using the same computational steps as if you'd asked a vastly more complex question. Even after being asked "2 + 2" for the billionth time, ChatGPT will fully compute the answer from scratch, because the model doesn't have a separate, optimized pathway for simple questions versus complex ones. Every input runs through the same dense network of calculations, with each layer of the network interpreting and reinterpreting the input to produce an accurate response.
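One way to picture the missing "adaptive bandwidth" is a router that sends trivial requests down a cheap path and reserves the full model for everything else. Again, this is an illustrative sketch with hypothetical function names, not a description of how ChatGPT actually works.

```python
# Illustrative sketch of "adaptive bandwidth": route trivial queries to a cheap
# path and reserve the expensive model for everything else. Hypothetical names.
import re

def cheap_arithmetic(question: str) -> str | None:
    """Handle questions like 'What is 2 + 2?' without any model at all."""
    match = re.search(r"(\d+)\s*([+\-*/])\s*(\d+)", question)
    if not match:
        return None
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+": result = a + b
    elif op == "-": result = a - b
    elif op == "*": result = a * b
    elif b != 0:  result = a / b
    else:         return None            # leave tricky cases to the model
    return str(result)

def expensive_model(question: str) -> str:
    # Hypothetical stand-in for a full LLM forward pass.
    return f"[full model pass for: {question!r}]"

def answer(question: str) -> str:
    return cheap_arithmetic(question) or expensive_model(question)

print(answer("What is 2 + 2?"))                                  # "4", no model needed
print(answer("Why do observers matter in quantum mechanics?"))   # full model pass
```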


5. Hardware Limitations

Human brains use a highly efficient biochemical process that is unparalleled in its power efficiency. A single human brain runs on about 20 watts, roughly the power needed to light a small light bulb.

In contrast, the server farms running AI models like ChatGPT require thousands of high-power GPUs and CPUs. Each of these machines consumes hundreds of watts or more, multiplied by the thousands or tens of thousands of units needed to run a model of ChatGPT's scale. Cooling the hardware adds another layer of energy demand. Using our illustrative figure, a single ChatGPT query, again, whether it is 2 + 2 or translating James Joyce into ancient Sumerian, uses around 4 kWh, enough energy to keep one 20-watt LED bulb lit for around 8 days, or 200 such bulbs for one hour. Depending on many factors, the energy needed could be twice that. ChatGPT has an estimated 100 million weekly users. For every 100,000,000 ChatGPT queries, approximately 400,000,000 kilowatt-hours (kWh) of energy would be consumed.


To put this into perspective, 400,000,000 kWh is roughly the annual electricity use of almost 38,000 average U.S. homes, or a matter of weeks of electricity for a city the size of San Francisco (about 900,000 residents). To break it down further, if each of ChatGPT's 100 million weekly users submits just one query per week, meeting that demand would take roughly 235,000 barrels of oil, 163,000 tons of coal, or 38 million cubic meters of natural gas.
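Taking the illustrative figure of 4 kWh per query at face value, these comparisons can be reproduced with a few lines of arithmetic. The conversion factors below (average household consumption and the energy content of oil, coal, and gas) are rough reference values, so treat the outputs as order-of-magnitude estimates.

```python
# Reproducing the comparisons above from the illustrative assumption of
# 4 kWh per query. Conversion factors are rough reference values.
KWH_PER_QUERY = 4
QUERIES = 100_000_000                    # one query per weekly user

total_kwh = KWH_PER_QUERY * QUERIES      # 400,000,000 kWh

KWH_PER_US_HOME_YEAR = 10_500            # approximate average U.S. home
KWH_PER_BARREL_OIL = 1_700
KWH_PER_TON_COAL = 2_460
KWH_PER_M3_GAS = 10.5
LED_WATTS = 20

print(f"U.S. homes powered for a year: {total_kwh / KWH_PER_US_HOME_YEAR:,.0f}")
print(f"Barrels of oil:                {total_kwh / KWH_PER_BARREL_OIL:,.0f}")
print(f"Tons of coal:                  {total_kwh / KWH_PER_TON_COAL:,.0f}")
print(f"Cubic meters of natural gas:   {total_kwh / KWH_PER_M3_GAS:,.0f}")
# A single 4 kWh query could keep one 20 W LED bulb lit for:
print(f"Hours of one LED per query:    {KWH_PER_QUERY * 1000 / LED_WATTS:.0f}")  # 200 h, ~8 days
```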

Given ChatGPT's worldwide scale, millions of queries happen each day, translating into millions of kilowatt-hours and revealing both the resource intensity of generative AI and the substantial energy needed to sustain it at scale. This kind of metric illustrates not only the energy consumed per query but also the cumulative impact of millions of queries, making it clear why optimizing for efficiency is increasingly important in AI development. If LLM use continues to grow as anticipated, potentially into billions of queries daily, without further efficiency gains or renewable energy sourcing, there could be increased strain on power grids. Future-proofing energy infrastructure and building energy-aware AI solutions will be essential to sustain the exponential growth of AI usage without compromising energy stability or exacerbating environmental impacts. So far, no direct link has been established between the widespread use of LLMs and major infrastructure impacts like blackouts. The larger issue is that the massive server farms powering AI applications, cloud services, and data processing centers already account for a significant share of global electricity usage, projected to exceed 4% by 2030.


Why Don't LLMs Have "Muscle Memory"?

The technology that would allow LLMs to develop "muscle memory" for common responses or patterns is still in the experimental stages. Current AI models don't yet build permanent memories or specialized pathways the way human brains do, meaning they have no shortcuts or cached responses for frequently asked questions. Even if ChatGPT has answered "2 + 2" a billion times before, it will fully compute the answer from scratch each time it's asked.


Potential for Efficiency Improvements

Researchers are exploring ways to introduce memory structures into AI models, but these need to be carefully designed to ensure accuracy, especially for complex or nuanced queries. If AI could develop "shortcuts" for frequently repeated questions, it could significantly reduce energy demands by treating simple, familiar queries like "muscle memory" rather than needing full computation.


Using data spaces as a trusted, reusable, verified source of information could significantly reduce the energy footprint of AI. By leveraging pre-validated knowledge, AI could bypass the energy-intensive process of "learning to drive from scratch" for every question, just as a seasoned driver relies on muscle memory to navigate familiar roads, only expending extra mental energy for new challenges like driving in unfamiliar territory. The presence of such an "AI subconscious" within data spaces would streamline processing, reduce energy consumption, and enhance the efficiency of AI systems, creating a more sustainable digital ecosystem.
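Putting the pieces together, that "AI subconscious" can be pictured as a lookup into a shared, pre-validated store that runs before any expensive model computation. As with the earlier sketches, every name here is hypothetical; this illustrates the concept rather than an existing data-space API.

```python
# Sketch of the "data space first, model second" idea. All names are
# hypothetical; this is an illustration of the concept, not an existing API.

class DataSpace:
    """A stand-in for a shared store of facts vetted by trusted participants."""
    def __init__(self) -> None:
        self._verified: dict[str, str] = {}

    def publish(self, question: str, answer: str, validator: str) -> None:
        # Only store answers that a named participant has signed off on.
        self._verified[question.strip().lower()] = f"{answer} (validated by {validator})"

    def lookup(self, question: str) -> str | None:
        return self._verified.get(question.strip().lower())

def full_model_pass(question: str) -> str:
    # Hypothetical stand-in for an energy-hungry LLM call.
    return f"[expensive computation for: {question!r}]"

def answer(question: str, space: DataSpace) -> str:
    # Check the shared, verified knowledge first; compute only when necessary.
    return space.lookup(question) or full_model_pass(question)

space = DataSpace()
space.publish("What is 2 + 2?", "4", validator="standards-body-x")
print(answer("What is 2 + 2?", space))               # served from the data space
print(answer("Explain quantum decoherence", space))  # falls back to the model
```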


Data spaces offer a robust solution to two pressing challenges of the digital age: misinformation and energy consumption. By acting as centralized, trusted repositories, data spaces enhance the reliability of information available to AI, reducing the risk of error and the power needed to process each query.

As we navigate an era defined by rapid digital expansion, the integrity and sustainability of our information landscape are at stake. The unchecked proliferation of data—often redundant, unverified, or misleading—threatens trust in our systems, leading to societal confusion and divisiveness. At the same time, the massive energy demands of AI, exacerbated by reprocessing the same information repeatedly, push our planet's resources to the brink.


Data spaces present a solution to these urgent challenges. By creating secure, collaborative environments, data spaces enable trusted participants—researchers, government agencies, and businesses—to store, verify, and share high-quality data. This model ensures that only reliable information is fed into AI systems, effectively reducing the energy costs of AI queries and stemming the tide of misinformation. Imagine data spaces as the “muscle memory” of AI—a structured, reliable subconscious where established truths are stored, freeing AI to focus on what truly requires new computation.

