Uncertainty estimation through quantile forest for prescriptive scheduling of data processing at ALMA

Rodrigo A. Carrasco, Luis Aburto, Jorge García Yus, Alfredo De Rodt, Gianfranco Speroni

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

Abstract

The Atacama Large Millimeter/submillimeter Array (ALMA) is a prominent astronomical observatory known for its detailed imaging capabilities. Efficient scheduling of ALMA's data processing tasks, especially those involving complex pipeline executions, is crucial for maximizing operational productivity. This paper addresses the challenge by developing a predictive model that estimates the runtime of these tasks, enabling more effective scheduling and resource management. Our approach employs the Light Gradient Boosting Machine (LGBM) and Quantile Forest models to predict processing times and quantify uncertainties. The use of these models is innovative, as it not only provides accurate predictions but also offers insights into the variability of processing times. This is particularly beneficial for handling the dynamic nature of the data processing workload at ALMA. We enhance the model's performance and reliability by incorporating variable scaling and logarithmic transformations. To determine the best model, we comprehensively evaluated seven different machine-learning techniques. Our results show that the LGBM model and quantile estimation outperform traditional methods in predicting task durations. This leads to more efficient scheduling, as it allows the system to account for potential delays and optimize the sequencing of jobs. The quantile approach, in particular, offers a robust method for dealing with the inherent uncertainty in processing times. Our predictive tool has demonstrated a substantial reduction in overall flow time, decreasing it by 5.7%. Further improvements were achieved using stochastic scheduling techniques, which leverage the uncertainty estimates provided by our model. This research highlights the potential of machine learning to significantly enhance the operational efficiency of large-scale observatories like ALMA, providing a scalable and practical solution for managing complex data processing tasks.
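The abstract does not describe the scheduler itself, so the following is only a minimal sketch of the general idea it points to: sequencing jobs by a runtime estimate and measuring the resulting total flow time (the sum of job completion times). It assumes a single machine that runs jobs back to back and a shortest-estimate-first rule, neither of which is confirmed by the source. The job names, the predicted median and 90th-percentile runtimes, and the actual runtimes are all made-up illustrative values, not ALMA data or the authors' results.

```python
from typing import Dict, List, Sequence


def total_flow_time(durations: Sequence[float]) -> float:
    """Sum of completion times when jobs run back to back on one machine."""
    elapsed = 0.0
    total = 0.0
    for d in durations:
        elapsed += d      # completion time of the current job
        total += elapsed  # accumulate completion times
    return total


def order_by_estimate(estimates: Dict[str, float]) -> List[str]:
    """Shortest-estimate-first ordering (ties broken by job name)."""
    return sorted(estimates, key=lambda job: (estimates[job], job))


# Hypothetical jobs: invented numbers for illustration only.
predicted_median = {"job_a": 4.0, "job_b": 2.0, "job_c": 6.0}
predicted_q90 = {"job_a": 5.0, "job_b": 9.0, "job_c": 7.0}
actual = {"job_a": 4.5, "job_b": 8.0, "job_c": 6.0}

# Sequence once by the point estimate and once by the upper quantile.
order_median = order_by_estimate(predicted_median)
order_q90 = order_by_estimate(predicted_q90)

# Evaluate both sequences against the (hypothetical) realized runtimes.
flow_median = total_flow_time([actual[j] for j in order_median])
flow_q90 = total_flow_time([actual[j] for j in order_q90])

print(order_median, flow_median)
print(order_q90, flow_q90)
```

In this toy case, job_b has a short predicted median but a wide upper quantile, so ordering by the 90th percentile moves it to the end and gives a lower total flow time once the long runtime materializes. Whether a quantile-based rule helps in practice depends on the actual runtime distributions and on the scheduling model used.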

Original language: English
Title of host publication: Observatory Operations
Subtitle of host publication: Strategies, Processes, and Systems X
Editors: Chris R. Benn, Antonio Chrysostomou, Lisa J. Storrie-Lombardi
Publisher: SPIE
ISBN (electronic): 9781510675193
DOI
Status: Published - 2024
Externally published
Event: Observatory Operations: Strategies, Processes, and Systems X 2024 - Yokohama, Japan
Duration: 17 Jun 2024 to 20 Jun 2024

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13098
ISSN (print): 0277-786X
ISSN (electronic): 1996-756X

Conference

Conference: Observatory Operations: Strategies, Processes, and Systems X 2024
Country/Territory: Japan
City: Yokohama
Period: 17/06/24 to 20/06/24
