Building of the iToBoS Cloud: Advanced Services

A key challenge in computer science research is assembling the infrastructure that scientists need to carry out their work successfully.

One of the founding principles of cloud computing, fully upheld by ELKH Cloud [1] (the infrastructure on which the iToBoS Cloud is built), is that users should be able to provision substantial computing resources quickly and effortlessly. As one of Hungary's leading Infrastructure-as-a-Service (IaaS) providers, ELKH Cloud aims to help members of the scientific community establish the research environments they need.

However, the difficulties researchers face do not end with convenient access to computing resources: studies in computer science and research projects such as iToBoS often require complex system architectures and stacks of software and services. ELKH Cloud therefore aims to support researchers beyond the infrastructure level, and to further speed up the realization of new ideas by providing Platform-as-a-Service (PaaS) level components.

Reference architectures [2] are one such component, provided by ELKH Cloud to ease the deployment of complex software systems, a demanding task faced by many members of the scientific community. They are architectural blueprints composed of proven solutions, built with best practices in mind, and applicable in a variety of sub-fields of computer science. Reference architectures follow the principles of the Infrastructure-as-Code (IaC) methodology, which, combined with the orchestration and configuration management tools they use, enables scientists to deploy their research environments automatically in a matter of minutes. Like ELKH Cloud itself, reference architectures are built on open-source technologies, furthering the spread of Open Science in Hungary.
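The core idea of Infrastructure-as-Code is that the desired environment is declared as data, and a tool computes the actions needed to move the current state toward it. The toy sketch below illustrates only that principle; it is not how ELKH Cloud's reference architectures are implemented (they rely on dedicated orchestration and configuration management tools). The `VM` type, the `plan`/`apply_plan` functions, and flavor names such as `m1.large` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical, minimal description of one virtual machine."""
    name: str
    flavor: str  # hypothetical instance size label, e.g. "m1.large"


def plan(desired: set[VM], current: set[VM]) -> tuple[list[VM], list[VM]]:
    """Compute which VMs to create and delete so that current matches desired.

    A VM whose flavor changed appears in both lists, i.e. it is replaced.
    Results are sorted by name so the plan is deterministic.
    """
    to_create = sorted(desired - current, key=lambda vm: vm.name)
    to_delete = sorted(current - desired, key=lambda vm: vm.name)
    return to_create, to_delete


def apply_plan(current: set[VM], to_create: list[VM], to_delete: list[VM]) -> set[VM]:
    """Simulate applying a plan and return the resulting state."""
    return (current - set(to_delete)) | set(to_create)


if __name__ == "__main__":
    # Declared (desired) environment: one master and two GPU workers.
    desired = {
        VM("master", "m1.large"),
        VM("worker-1", "g1.gpu"),
        VM("worker-2", "g1.gpu"),
    }
    current = {VM("master", "m1.large")}  # only the master exists so far

    create, delete = plan(desired, current)
    print("create:", [vm.name for vm in create])
    print("delete:", [vm.name for vm in delete])

    # Re-planning after applying yields no further actions (idempotence).
    current = apply_plan(current, create, delete)
    print("second plan:", plan(desired, current))
```

Because the same declaration always converges to the same state, re-running a deployment is safe, which is what makes repeatable, minutes-long automated setups possible.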

ELKH Cloud is developed by researchers who work closely with other members of the scientific community and therefore have a good understanding of their needs and practices. The many reference architectures available on ELKH Cloud cover important areas such as deep learning, high performance computing, workload management, and container orchestration, among others. They are built on some of the most widely used tools, technologies, and platforms, so that researchers on ELKH Cloud can carry out their work smoothly with an up-to-date, suitable toolset.

While ELKH Cloud is a general-purpose infrastructure, reference architectures make it possible to use its resources highly efficiently in specific use cases. For instance, distributed deep learning is an increasingly important topic, and because of the considerable resource requirements of training, infrastructure is a crucial question in this area. Using the Horovod reference architecture, distributed training can be performed on ELKH Cloud with high scaling efficiency [3]. Beyond the existing reference architectures, ELKH Cloud also provides technical consultation services that help users work with the cloud efficiently, and it can even support their work with specifically granted resources, be it hardware or software. This can lead to new reference architectures tailored precisely to a given research project. Platform-as-a-Service level components, and the principles behind them, play a crucial role in increasing the usability of the platform and in empowering research projects such as iToBoS to use its resources to their full potential.
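Scaling efficiency is commonly defined as the measured throughput with N workers divided by N times the single-worker throughput, E(N) = T(N) / (N · T(1)), so 1.0 means perfectly linear scaling. The exact metric used in [3] may differ; the sketch below simply assumes this common definition. The function name and the throughput figures in the usage example are made up for illustration and are not measurements from [3].

```python
from __future__ import annotations


def scaling_efficiency(throughput: dict[int, float]) -> dict[int, float]:
    """Return E(N) = T(N) / (N * T(1)) for every measured worker count N.

    throughput maps the number of workers to measured training throughput
    (e.g. images per second) and must contain the single-worker baseline N=1.
    """
    if 1 not in throughput:
        raise ValueError("a single-worker baseline (N=1) is required")
    baseline = throughput[1]
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    # Ideal linear scaling would give N * baseline; compare against that.
    return {n: t / (n * baseline) for n, t in sorted(throughput.items())}


if __name__ == "__main__":
    # Illustrative, made-up numbers (images/s), not results from [3].
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for workers, eff in scaling_efficiency(measured).items():
        print(f"{workers} worker(s): efficiency {eff:.2f}")
```

Normalizing by the single-worker baseline makes results comparable across models and hardware: communication overhead shows up directly as efficiency dropping below 1.0 as workers are added.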

ELKH Cloud aims to provide the Hungarian scientific community with infrastructure that supports its research well enough to take part in international projects, and to establish itself as an essential part of the European research infrastructure. The Horovod reference architecture [3,4], originally developed for ELKH Cloud, served as the basis for a service published on the European Open Science Cloud (EOSC) Marketplace, as part of our goal of having ELKH Cloud join the EOSC and SLICES ESFRI initiatives.

You can find out more about this in Part III of this blog series.

References

[1] ELKH Cloud, Cloud services for national and international research projects. Available at https://science-cloud.hu/en

[2] ELKH Cloud, Reference architectures. Available at https://science-cloud.hu/en/reference-architectures

[3] A. Farkas, K. Póra, S. Szénási, G. Kertész, R. Lovas, “Evaluation of a distributed deep learning framework as a reference architecture for cloud environment,” in IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC 2022), 2022.

[4] Science Cloud, Horovod Reference Architecture. Available at https://git.sztaki.hu/science-cloud/reference-architectures/horovod