
What role will AI play in the future of world energy?

AI-driven solutions could hold the key to resolving carbon emissions. (Image source: MBZUAI)

MBZUAI is exploring various ways in which hardware-software co-design can reduce the energy consumption of artificial intelligence.

AI has huge potential to reduce waste and enhance efficiency across many sectors, including power and water distribution. While these topics are under the spotlight at the World Future Energy Summit, they are likely to be partly overshadowed by concerns about the enormous energy demands of AI systems.

But as well as consuming energy, AI-driven solutions could hold the key to resolving one of the most fundamental questions of our age – how can we keep developing and utilising powerful AI models while still moving towards carbon-free, sustainable economies?

Research currently being undertaken at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) takes a multi-pronged approach toward this challenge.

Hardware specialisation

One area of focus is Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). Traditional computing architectures such as CPUs are not always the most efficient for AI tasks, so GPUs and TPUs have emerged as specialised tools designed specifically for the parallel processing requirements of AI. These architectures are the building blocks of AI, with tens of thousands at work in data centres right now, and many more required for the new generation of data centres currently in development. To continue fuelling the AI revolution, all these devices must conserve and limit energy usage, delivering the greatest possible processor performance per joule consumed.
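To make that parallelism concrete, here is a minimal sketch, using CPU-side NumPy as a stand-in for GPU or TPU kernels (the sizes and code are purely illustrative, not MBZUAI's implementation). Matrix multiplication, the dominant operation in AI workloads, decomposes into many independent dot products, and it is exactly this independence that specialised parallel hardware exploits:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = rng.random((64, 64))

# Sequential view: each output element C[i, j] is an independent
# dot product of a row of A with a column of B...
C_loop = np.empty((64, 64))
for i in range(64):
    for j in range(64):
        C_loop[i, j] = A[i, :] @ B[:, j]

# ...so all 64 * 64 dot products could run at the same time.
# Vectorised libraries on CPUs, and hardware kernels on GPUs/TPUs,
# execute this as one massively parallel operation:
C_parallel = A @ B

assert np.allclose(C_loop, C_parallel)
```

A CPU steps through work like the loop above largely one element at a time; a GPU or TPU dedicates thousands of simple arithmetic units to computing the independent pieces simultaneously, which is why the same mathematics can cost far fewer joules on specialised silicon.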

Hardware specialisation holds the key. By designing the specialised AI hardware pipelines inside GPUs and TPUs for energy efficiency, it will be possible to raise the energy efficiency of data centres at scale, even as they manage an ever-growing volume and complexity of calculations. The aim is to develop individual components in a co-designed way, so that energy consumption is reduced at the hardware level without impacting software performance.

Reliability and sustainability

As transistors continue to scale and become smaller, an important consideration is to ensure their reliability in the face of errors. Errors in hardware have been a thorn in the side of large-scale data centre companies such as Google and Meta. Yet building reliable processors not only addresses energy efficiency but also helps build a sustainable future. The longer a processor can remain in service in a data centre because of its reliability, the lower its lifetime carbon footprint, since the substantial upfront cost of manufacturing it is spread over more years of useful work.

Field-Programmable Gate Arrays (FPGAs) offer another alternative, allowing for customisable hardware solutions tailored to specific AI tasks. By enabling developers to optimise circuits for applications, FPGAs can significantly reduce energy consumption while maintaining high performance.

Reducing waste

In parallel, MBZUAI is looking at ways to reduce waste and deploy resources more efficiently in the upper layers, building sustainability in the development and application of AI models. System software used for large language models, both training (to build the models) and inference (to use them), needs to work closely with the hardware design to achieve better energy efficiency.

Current systems research by MBZUAI researchers pursues two lines of work toward AI sustainability. The first improves distributed model training by aggressively overlapping operators that draw on heterogeneous resources within a GPU server, for example running compute-intensive matrix multiplication operators alongside network-intensive communication operators. This improves both overall performance and resource utilisation.
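The idea behind such overlapping can be sketched in a few lines. The following is a simplified simulation, not MBZUAI's training system: the two functions are hypothetical stand-ins (using `time.sleep`) for a compute-intensive matrix multiplication on the GPU and a network-intensive gradient exchange, and the point is that running them concurrently hides one cost behind the other:

```python
import threading
import time

def compute_matmul():
    # Stand-in for a compute-intensive matrix multiplication kernel.
    time.sleep(0.2)

def communicate_gradients():
    # Stand-in for network-intensive communication (e.g. gradient exchange).
    time.sleep(0.2)

# Serial schedule: compute first, then communicate.
t0 = time.perf_counter()
compute_matmul()
communicate_gradients()
serial = time.perf_counter() - t0

# Overlapped schedule: communication uses the otherwise idle network
# while the compute units are busy with the next matrix multiplication.
t0 = time.perf_counter()
comm = threading.Thread(target=communicate_gradients)
comm.start()
compute_matmul()
comm.join()
overlapped = time.perf_counter() - t0

print(f"serial: {serial:.2f}s  overlapped: {overlapped:.2f}s")
```

Because the compute units and the network interconnect are separate resources, the overlapped schedule finishes in roughly the time of the longer of the two operations rather than their sum, so the same hardware completes more useful work per unit of energy.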

The other is to reduce the amount of computation in inference. Potential solutions in this direction include efficient, scalable deployment of smaller, specialised models, as well as caching for model inference, where responses (or the intermediate inference results leading to them) can be stored and reused when processing similar prompts.
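A minimal sketch of response caching follows; the `run_model` function here is a hypothetical placeholder for an expensive language-model call, and the whitespace/case normalisation is just one illustrative notion of "similar prompts", not the technique MBZUAI uses:

```python
CALLS = 0

def run_model(prompt: str) -> str:
    # Hypothetical stand-in for an expensive LLM inference call.
    global CALLS
    CALLS += 1
    return f"response to: {prompt}"

_cache: dict[str, str] = {}

def cached_inference(prompt: str) -> str:
    # Normalise the prompt so trivially similar requests share a cache entry.
    key = " ".join(prompt.lower().split())
    if key not in _cache:
        _cache[key] = run_model(prompt)
    return _cache[key]

cached_inference("What is the capital of France?")
cached_inference("what is  the capital of  france?")  # cache hit
print(CALLS)  # prints 1: only one real model call was made
```

Every cache hit is an inference that never touches the GPU, so for workloads with many repeated or near-duplicate prompts the energy saving compounds directly with the number of requests served.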

This piece was authored by Xiaosong Ma, acting department chair and professor of computer science at MBZUAI, and Abdulrahman Mahmoud, assistant professor of computer science at MBZUAI. It has been edited for brevity.