Topic 5: Parallel and Distributed Data Management

1 Description

The proliferation of large and heterogeneous data sets poses a complex hierarchy of requirements, ranging from the integration and management of data to complex data analytics, in both cloud and high-performance computing environments. In addition, managing varied data requires solutions that integrate several data management paradigms. Data-intensive applications therefore require new approaches and efficient techniques to perform these tasks on locally stored or geographically dispersed data, and so cope with this data explosion and heterogeneity.

An important issue is the design of highly scalable distributed data platforms that offer consistency levels and programming models capable of simplifying the development of complex big-data applications, with the ultimate goal of shielding programmers from sources of complexity such as concurrency, distribution, and failures. The thorough understanding of applications and storage systems needed to build such scalable data platforms should be based on empirical evidence.

It is still necessary to improve the provisioning, staging, manipulation, continuous maintenance, and monitoring of data hosted in distributed and heterogeneous systems, including the interaction of object storage systems, key-value stores, and parallel file systems with batch systems and middleware environments. Self-tuning is also of paramount importance for distributed data platforms, which aim to minimize the infrastructure's operational costs or to provide quality-of-service levels by elastically adapting their scale to match dynamic shifts in the workload. These problems can be approached using interdisciplinary methodologies such as machine learning, analytical modelling, and control theory.

Parallel and concurrent execution at all levels remains key to enabling the development of scalable and effective data-intensive applications. This development is also affected by the enhanced capacities and extended functionality of IT infrastructures.

This topic seeks papers on all aspects of distributed and parallel data management and data-intensive applications that center on the notions of concurrency, parallelism, and distributed processing.

2 Focus

Parallel, replicated, and highly-available distributed databases
Data-intensive clouds and grids
Empirical evaluation of storage systems
Middleware for processing large-scale data
Distributed and parallel transaction and query processing over homogeneous and heterogeneous data management paradigms
Management of parallel and distributed data sources
Integration of large datasets on parallel systems
Internet-scale data-intensive applications
Sensor-network data management
Mobile data management
Parallel and distributed information retrieval
Data-intensive peer-to-peer systems
Cloud- and HPC-based storage architectures and file systems
Parallel data streaming and data stream mining
NoSQL data management and analysis: key-value stores, graph management, etc.
Parallel and distributed knowledge discovery and data mining
Algorithms for security and privacy in data management
New storage hierarchies in distributed data systems based on flash and NVRAM technologies

3 Topic Committee

3.1 Global chair

André Brinkmann, University of Mainz, Germany

3.2 Local chair

Harald Kosch, University of Passau, Germany

3.3 Additional members

Gabriel Antoniu, INRIA Rennes, France
Veronika Sonigo, FEMTO-ST, Besançon, France