In the past, achieving a factor of 1000 in computational and data-handling throughput for real-world applications has taken roughly 20 years, and has been delivered largely by advances in computer chips alone: a doubling of performance every two years, as promised by Moore's law[1], compounds to 2^10 ≈ 1000 over two decades. With Moore's law nearing its end, a factor of 1000 is no longer achievable without considerable effort, even for those willing to wait another 20 years. Just as technology evolves to the point where breakthroughs come into focus, its development shifts course in ways that (almost cruelly) threaten to keep them out of reach, at least for those pursuing business as usual. Changing how we compute and deal with data, and dispensing with business as usual, is at the heart of ExtremeEarth.
Expanding the throughput of today's most performant systems by a factor of 1000 will require a radical re-design of current software frameworks and hardware systems. A full integration of computing, data and connectivity is envisaged, one where HPC and cloud technology converge around institutional, industrial and transnational initiatives such as the European Open Science Cloud (EOSC) and EuroHPC[2]. To meet that challenge, ExtremeEarth will adopt a contemporary approach to software, ingesting components that are widely supported in the information and communications technology (ICT) industry and that are part of the main roadmaps of digital technology development. In this way, ExtremeEarth will bring a new quality to the European HPC programme, one that is fundamentally different from the prestige-driven, exa-scale computing developments in East Asia and the USA, and one that is more likely to succeed.
ExtremeEarth will be application-driven. Only a new exa-scale system design for computing, data handling and user connectivity can achieve the required factor of 1000. The system will capitalise on joint HPC and cloud technologies to develop optimised hardware-software linkages[3] that deliver the required throughput for real-world applications, thus achieving the ExtremeEarth science objectives and making a new way of working possible for all users. The application-driven approach proposed by ExtremeEarth can only succeed through a concerted effort with substantial and sustained funding, and it requires a focused science-technology co-design and co-development over a long (10-year) timescale. Furthermore, this must take place in ways that integrate input from diverse communities, producing systems that effectively address their needs yet remain agile enough to adapt efficiently to future computational architectures.
Big data handling in ExtremeEarth represents a similarly extreme challenge in terms of both data volume and diversity. Current technologies and workflows for the kind of data ExtremeEarth envisages are either unavailable or severely limited in scope and flexibility. ExtremeEarth must therefore also embark on the co-design of information and communication technology solutions for the efficient and timely handling of the diverse data produced by high-definition models and observational sensor networks, at rates far in excess of what incremental advances to existing systems could accommodate.
To fully exploit this model-data fusion, ExtremeEarth introduces the idea of the ExtremeEarth science-Cloud (EEsC, pronounced “easy”). EEsC will take full advantage of modern software and communication technologies, exploiting extreme-scale computing and big-data capabilities to relieve users and communities of excessive technological burdens. EEsC will develop and provide the programming interface through which users interact with, and even steer, high-definition Earth-system simulations, observational data, applications and analysis systems. This capability goes well beyond current cloud technologies. ExtremeEarth will thus re-invent value chains as a collaborative space, one which allows for interactions between domain scientists and application communities at all levels. EEsC will be the ultimate technological and scientific game changer, as it requires ground-breaking developments in extreme computing and data technology to provide a radically different way for scientists and users to interact with simulations and data.
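To make the notion of a steering interface concrete, the short Python sketch below illustrates one way such user-level interaction could look. It is purely illustrative: the names (Simulation, advance, steer) and the mock diagnostics are assumptions of this sketch, not an API defined by ExtremeEarth or EEsC.

    # Purely illustrative sketch of the kind of user-level steering
    # interaction EEsC envisages. All names and the mock diagnostics
    # are assumptions of this sketch, not a defined ExtremeEarth API.
    from dataclasses import dataclass
    import random

    @dataclass
    class Simulation:
        """Stand-in for a running high-definition Earth-system simulation."""
        output_interval_hours: int = 6   # a user-steerable parameter
        step: int = 0

        def advance(self) -> dict:
            """Advance one model step and return a (mock) diagnostic record."""
            self.step += 1
            return {
                "step": self.step,
                "max_wind_speed_ms": random.uniform(10.0, 60.0),
                "output_interval_hours": self.output_interval_hours,
            }

        def steer(self, **params) -> None:
            """Adjust simulation parameters while the run is in flight."""
            for name, value in params.items():
                setattr(self, name, value)

    sim = Simulation()
    for _ in range(10):
        record = sim.advance()
        # User-side analysis reacts to the live stream and steers the run,
        # e.g. requesting denser output when a storm intensifies.
        if record["max_wind_speed_ms"] > 45.0 and sim.output_interval_hours > 1:
            sim.steer(output_interval_hours=1)
        print(record)

In a realised EEsC, the simulation object would proxy a remote exa-scale run and the diagnostic stream would arrive over the science cloud; the sketch conveys only the interaction pattern of consuming live output, analysing it, and steering the run in response.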