Topic 5: Parallel and Distributed Data Management
1 Description
The proliferation of large and heterogeneous data sets poses a complex hierarchy of requirements, ranging from the integration and management of data to complex data analytics, in both cloud and high-performance computing environments. In addition, managing varied data requires solutions that integrate several data management paradigms. To cope with this data explosion and heterogeneity, data-intensive applications therefore require new approaches and efficient techniques for performing such tasks on locally stored or geographically dispersed data.
An important issue is the design of highly scalable distributed data platforms that offer consistency levels and programming models capable of simplifying the development of complex big-data applications, with the ultimate goal of shielding programmers from sources of complexity such as concurrency, distribution, and failures. The deep understanding of applications and storage systems that leads to such scalable data platforms should be grounded in empirical evidence.
It is still necessary to improve the provisioning, staging, manipulation, continuous maintenance, and monitoring of data hosted in distributed and heterogeneous systems, including the interaction of object storage systems, key-value stores, and parallel file systems with batch systems and middleware environments. Self-tuning is also of paramount importance for distributed data platforms, which aim to minimize the infrastructure's operational costs or to provide quality-of-service levels by elastically adapting their scale to match dynamic shifts in the workload. Interestingly, these problems can be approached using interdisciplinary methodologies, such as machine learning, analytical modelling, and control theory.
Parallel and concurrent execution at all levels remains key to enabling the development of scalable and effective data-intensive applications, a development that is also shaped by the enhanced capacities and extended functionality of IT infrastructures.
This topic seeks papers on all aspects of distributed and parallel data management and data-intensive applications that focus on the notions of concurrency, parallelism, and distributed processing.
2 Focus
- Parallel, replicated, and highly-available distributed databases
- Data-intensive clouds and grids
- Empirical evaluation of storage systems
- Middleware for processing large-scale data
- Distributed and parallel transaction and query processing over homogeneous and heterogeneous management paradigms
- Management of parallel and distributed data sources
- Integration of large datasets on parallel systems
- Internet-scale data-intensive applications
- Sensor-network data management
- Mobile data management
- Parallel and distributed information retrieval
- Data-intensive peer-to-peer systems
- Cloud- and HPC-based storage architectures and file systems
- Parallel data streaming and data stream mining
- NoSQL data management and analysis: key-value stores, graph management, etc.
- Parallel and distributed knowledge discovery and data mining
- Algorithms for security and privacy in data management
- New storage hierarchies in distributed data systems based on flash and NVRAM technologies
3 Topic Committee
3.1 Global chair
- André Brinkmann, University of Mainz, Germany
3.2 Local chair
- Harald Kosch, University of Passau, Germany
3.3 Additional members
- Gabriel Antoniu, INRIA Rennes, France
- Veronika Sonigo, FEMTO-ST, Besançon, France