Introduction to Streaming Algorithms

Nivan Ferreira - Federal University of Pernambuco

Abstract: The wide availability of large data sets has had a significant impact on the design of algorithms. When working with big data, classical algorithms are often too slow or require too much space, which can be prohibitive in many scenarios. The stream model of data processing is one such scenario: we assume a data stream that is very large (or even unbounded) and that does not allow random access to the data. Given these constraints, streaming algorithms often employ probabilistic methods to compute summaries over a data stream while achieving low (often sub-linear) complexity in both space and time. This course introduces the main ideas behind the algorithms designed to compute statistical summaries (such as the number of distinct elements, heavy hitters, and quantiles) over data streams, and presents applications of these algorithms to the analysis of large datasets.

Short-bio: Nivan Ferreira is an Assistant Professor at Centro de Informática at Universidade Federal de Pernambuco in Brazil. He received his Ph.D. in Computer Science from New York University (2015). Nivan was a visiting professor at the Université Paris-Saclay (2022) and a postdoctoral scholar at the University of Arizona. He has served on several program committees, including IEEE SciVis, IEEE VIS, and the International Conference on Learning Analytics & Knowledge (LAK). His research focuses on interactive data visualization, including systems and techniques for analyzing spatiotemporal datasets and techniques for supporting scalability in interactive data analysis.

AI4EO: a mini course on machine learning applied to Earth observation

Laura Rosa - Wageningen University & Research

Abstract: The free availability of a continuously growing body of Earth observation (EO) images has created an urgent need for effective and efficient algorithms capable of processing vast amounts of data and extracting crucial information for applications ranging from environmental monitoring to precision farming. In this context, deep learning (DL) algorithms have proved effective at learning relevant features directly from data and have become the state of the art in many EO image analysis tasks, including land use and land cover classification, object detection, change detection, and domain adaptation. This mini course introduces students to the evolution of computer vision applied to EO data, focusing on breakthrough DL-based models and their application to real-world problems.

Short-bio: Laura Elena Cué La Rosa obtained her bachelor's degree in Biomedical Engineering at the Higher Polytechnic Institute “Jose Antonio Echeverria”, Havana, Cuba, in 2013. Subsequently, she joined the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) as a postgraduate student, obtaining her master's degree in Engineering in 2018 and her Ph.D. in 2022. During her studies, she was an intern at IBM Research Brazil and a Ph.D. guest at Helmholtz Institute Freiberg for Resource Technology, Germany. Her professional interests comprise deep learning (DL) methods applied to remote sensing (RS) image analysis. She is currently a postdoctoral researcher at the Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research. Her research project focuses on predicting near-future deforestation using multi-sensor time-series data and artificial intelligence.

From Algebraic Topology to Data Analysis

Raphaël Tinarrage - Getulio Vargas Foundation

Abstract: In this course, I will present Topological Data Analysis (TDA), and in particular persistent homology. This theory, born in the early 2000s, has now spread widely through computational geometry and data analysis in general. In a few words, TDA seeks to discover and understand the topology, that is to say the shape, of datasets. Rather than applying rigid models to the data, we preserve their inherent complexity, which we explore through topological invariants. By illuminating data analysis from a new angle, TDA opens the door to new insights and discoveries.
To present both the mathematical and practical aspects of TDA, the course will be divided into three sessions. In the first, I'll explain what topology means, what topological invariants are, and how they can help us understand datasets. The second session will focus on estimating these invariants in practice; this is where persistent homology and its famous persistence diagrams come into play. We'll finish with a programming session: I invite everyone to come with a laptop and the Python library Gudhi installed (available via pip).

Short-bio: I studied mathematics at Institut de Mathématiques d'Orsay and at École Normale Supérieure de Saclay, where I obtained my agrégation (teaching diploma) and master's degree. I did a Ph.D. in Topological Data Analysis (TDA), under the supervision of Frédéric Chazal and Marc Glisse, in the DataShape team (INRIA Saclay). I am now a postdoc at Escola de Matemática Aplicada FGV, working on the theoretical foundations and applications of TDA and, more generally, on algorithmic and algebraic aspects of combinatorial topology.

The Art of Modeling with Gaussian Processes

César Mattos - Federal University of Ceará

Abstract: Gaussian Processes (GPs) are nonparametric Bayesian models that enable uncertainty quantification in the context of probabilistic machine learning. Important theoretical and practical advances over the last decade have allowed researchers to exploit the expressiveness and efficiency of GP models in a wide range of disciplines. This mini course presents both the fundamental principles and some more modern developments in GP modeling. The exposition starts from the building blocks of a GP model, moves through the choice of kernel functions and inference strategies, and continues with non-Gaussian extensions and extensions that scale to large amounts of data. At the end, advanced topics such as latent variable models and deep representations will be presented to illustrate more recent research. The goal of the sessions is to inspire participants to incorporate GP models into their applications and to motivate study and research in this area.

Short-bio: César Lincoln Cavalcante Mattos holds a bachelor's degree (2009), a master's degree (2011), and a Ph.D. (2017) in Teleinformatics Engineering from Universidade Federal do Ceará (UFC), with a visiting doctoral period ("sandwich" Ph.D.) at the University of Sheffield, United Kingdom. Since 2018 he has been a professor (professor adjunto) in the Department of Computing at UFC, an associate researcher in the Logic and Artificial Intelligence research group (LOGIA), and a member of the Master's and Doctoral Program in Computer Science (MDCC-UFC). He has research experience in Gaussian Processes, Deep Learning, System Identification, and Anomaly Detection. He has applied machine learning methods in several research and development projects on tasks including dynamical system modeling, health risk analysis, computer vision, and failure prognosis.