Decentralized Artificial Intelligence
is needed, closer than you think, and transforms its load profile
Post-bitcoin, decentralization became a popular meme for marketing a business masquerading as a technological breakthrough. Decentralized Uber. Decentralized LinkedIn. Decentralized… ugh… decentralization.
For the vast majority of products, services, and missions, concentration of responsibility and power into one discrete unit, whether that be an individual or a single organization, is generally preferable. Centralization is the path of least resistance towards incentive alignment, economies of scale, and legible accountability. The bogeyman in centralization is not the how, which is extremely efficient and effective, but the what and the who. What is being centralized, and who is at the center of this responsibility?
Something as pervasive and important as money is naturally looked upon with great concern. What is serving the role of money, and who is facilitating that role, are perennial questions every society has to engage with. Diffusion of responsibility, control, and discretion can, in the case of money, actually be a highly desired feature, especially when the monopolist has the power and force to crush any legally legible competition.
The reasoning behind the architecture of bitcoin gradually fizzled away, and decentralization became a synonym for “morally good” or “potentially successful” simply because something was being created with the same design principle in mind.
Decentralized AI (“DeAI”) has felt similar to these other cargo cults because everyone is exhausted by the decentralization meme and there wasn’t a clear mental path forward. But given the obvious power and pervasiveness of AI, centralization of its creation and management is a legitimate concern. State-chartered entities, and even States themselves, forming impenetrable moats around incomprehensibly powerful artificial intelligence understandably gives us all a sense of impending danger. The ability to radically shape public “common knowledge” as well as market and political outcomes is not too hard to imagine. To mitigate these risks, decentralization isn’t just a nice meme but a feature we’d want and might even need.
Given this backdrop, it’s been nice to see explosive growth in the space (DeAI) in terms of project and capital formation. Over the last year, many projects that tackle a piece of the traditional AI stack have been funded, reducing the immense problem space down to one component such as data, compute, training, or fine-tuning. Grass, IO net, Exo Labs, and many others are good examples of this. This approach makes a lot of sense, but there are also more ambitious projects tackling the entire stack, creating decentralized monoliths that are direct competitors to the corporate behemoths that control the whole stack. Bittensor, at the moment, is in the best position to challenge their hegemony as a coherent whole. I could see both approaches producing a handful of winners.
In addition, legitimate breakthroughs in communication and resiliency have been achieved that make truly decentralized AI more of a “when” question than it has ever been.
Nous, an AI research collective with some roots in the crypto-ecosystem, released a paper on their “DisTrO” family of optimizers, claiming there is a way to conduct decentralized pre-training of frontier models with no loss in performance. They do this by compressing the amount of data that needs to be passed from each GPU to every other GPU by several orders of magnitude, and according to them that reduction is the conservative estimate. Nous will be releasing a more detailed paper and code soon, for all to evaluate.
This is highly significant because, up to this point, all pre-training for the best performing models has been done in a single location packed with extremely expensive GPUs linked by very high-bandwidth interconnects. DisTrO opens up the possibility of training models over the public internet.
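Nous hasn’t published the full paper or code yet, so what follows is not their algorithm. It’s a minimal sketch of one generic way to shrink the payload each GPU exchanges (top-k gradient sparsification), just to give a feel for how large the savings can be; all names and numbers are my own illustrative assumptions.

```python
import numpy as np

def topk_compress(grad: np.ndarray, k_fraction: float = 0.001):
    """Keep only the largest-magnitude k fraction of gradient entries.

    Returns (indices, values): the sparse payload a worker would send to
    its peers instead of the full dense gradient.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * k_fraction))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k entries
    return idx.astype(np.uint32), flat[idx].astype(np.float32)

def topk_decompress(idx, vals, shape):
    """Rebuild a dense (mostly zero) gradient from the sparse payload."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = vals
    return flat.reshape(shape)

# Toy example: a 10M-parameter "layer" with a 0.1% top-k budget.
grad = np.random.randn(10_000_000).astype(np.float32)
idx, vals = topk_compress(grad, k_fraction=0.001)

dense_bytes = grad.nbytes
sparse_bytes = idx.nbytes + vals.nbytes
print(f"dense payload:  {dense_bytes / 1e6:.1f} MB")
print(f"sparse payload: {sparse_bytes / 1e6:.2f} MB "
      f"(~{dense_bytes / sparse_bytes:.0f}x less traffic per sync)")
```

Even this crude approach cuts the per-sync traffic by a few hundredfold; whatever DisTrO actually does under the hood, the claim is reductions of that magnitude or better without hurting convergence.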
There have also been advances that improve the fault tolerance of training frontier models. A group of academics created a system called Oobleck, which replicates pipeline templates throughout a network of GPUs. This is functionally similar to blockchains having their state and transaction data highly replicated across the nodes supporting the chain: the chain can tolerate large numbers of node failures and still make progress as a network. Oobleck does the same thing, but for frontier model training.
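Oobleck’s actual interfaces are more involved than this, but the core idea can be sketched in a few lines: plan several pipeline configurations (“templates”) up front, each sized for a different number of healthy GPUs, and fall back to the largest one that still fits when nodes drop out. Everything below (names, node counts) is my own illustrative assumption, not Oobleck’s API.

```python
from dataclasses import dataclass

@dataclass
class PipelineTemplate:
    nodes_required: int   # GPUs this configuration needs
    num_stages: int       # how the model is partitioned across them

# Pre-computed templates, sorted largest-first (hypothetical sizes).
TEMPLATES = [PipelineTemplate(n, num_stages=min(n, 8)) for n in (12, 10, 8, 6)]

def reconfigure(healthy_nodes: int) -> PipelineTemplate:
    """Pick the largest pre-computed template the surviving GPUs can still run."""
    for t in TEMPLATES:
        if t.nodes_required <= healthy_nodes:
            return t
    raise RuntimeError("too few healthy GPUs to continue training")

# A pipeline replica loses nodes mid-run; training continues on a smaller
# template instead of stalling while the cluster waits for repairs.
for healthy in (12, 11, 7):
    t = reconfigure(healthy)
    print(f"{healthy} healthy GPUs -> {t.nodes_required}-node, "
          f"{t.num_stages}-stage pipeline")
```

The point of the blockchain analogy is where recovery comes from: surviving replicas already hold the model state the damaged pipeline needs, so the network as a whole keeps making progress.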
High upfront capital costs, low fault tolerance, and technologically mandated spatial concentration are a recipe for high levels of centralization. And this is exactly what we’ve seen: the best performing models are owned and operated by transnational corporate behemoths. But if a high level of connectivity isn’t needed and fault tolerance is achievable, the door is open to Decentralized AI. The dream of the meme is within sight.
From Locust to Dung Beetle
There’s an interesting grid dimension to this development as well. After all, this is an energy and power substack. The current paradigm dictates that artificial intelligence be an inflexible load. My colleague, Tundranaut, has a piece about this very topic. But the prospect of DeAI makes an uninterrupted connection to power much less of a necessity and opens the possibility of AI-related compute acting as flexible loads that participate in demand response programs alongside bitcoin miners.
If a frontier model, equipped with DisTrO and Oobleck, is being pre-trained across 100,000 GPUs in 250 locations with even moderate geographic diversity, a handful of those locations reducing their electricity consumption to accommodate grid stress would, in theory, have very limited impact, if any, on overall performance. In the current regime, those 100,000 GPUs would sit in the same massive compute center and could not be as flexible without a considerable loss of performance. Plus, the public entity with 100,000 GPUs and public shareholders is a lot more sensitive to short-term setbacks. In short, for them any interruption is about as close to unacceptable as it is for many other commercial loads.
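A quick back-of-the-envelope check of that claim, using the numbers from the paragraph above (the number of curtailed sites is my own illustrative assumption):

```python
# How much training capacity is lost if a few sites curtail
# for a demand-response event?

total_gpus = 100_000
sites = 250
gpus_per_site = total_gpus // sites        # 400 GPUs per location

curtailed_sites = 5                        # "a handful" of stressed grid regions
offline_gpus = curtailed_sites * gpus_per_site

print(f"GPUs per site: {gpus_per_site}")
print(f"GPUs offline:  {offline_gpus}")
print(f"capacity lost: {offline_gpus / total_gpus:.0%}")  # -> 2%
```

Under those assumptions, roughly 2% of the fleet goes offline for the duration of the event, a haircut the rest of the network can absorb.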
The natural question that follows is: how do we coordinate and align the many different entities responsible for those geographically dispersed GPUs? We ultimately need a new set of incentives and tooling for the human and computational coordination required to give us options beyond the impressive utility of the modern artificial-intelligence-producing corporation.
Crypto-Convergence
This other set of tooling I'm describing has already been created and is in the process of being incorporated into the development and governance of artificial intelligence through projects like Bittensor. Outside the purview of headlines and multi-trillion dollar valuations are small but rapidly growing communities developing decentralized artificial intelligence(s) using the tools, techniques, and resources of the greater crypto-ecosystem.
Embedded within crypto infrastructure and its general design philosophy are the characteristics that aid in preserving and promoting trust and human autonomy at tremendous scale. Transparency and privacy deployed where appropriate. Open access and censorship resistance. Broad-based ownership, deep liquidity, and the ability to fork or clone any network in order to implement new rules. In combination, this constellation of characteristics enables DeAI to compete with the corporate Goliaths and makes it hard for any one artificial intelligence or consortium of artificial intelligences to dominate.
As I was writing this piece, another great DeAI project, Prime Intellect, announced the first successful decentralized training of a 10B parameter model. And Nvidia CEO Jensen Huang, asked in an interview about decentralized training, said “…async distributed computing is going to be discovered, and I am very enthusiastic and optimistic about that”.
DeAI arrives from the future.