Euro-Par 2015 Accepted Papers
Software consolidation as an efficient energy and cost saving solution for a SaaS/PaaS cloud model
Targeting the Parallella
A Multicore Parallelization of Continuous Skyline Queries on Data Streams
Automatic Data Layout Optimizations for GPUs
Fast Parallel Suffix Array on the GPU
Leveraging MPI-3 shared-memory extensions for efficient PGAS runtime systems
Systematic Fusion of CUDA Kernels for Iterative Sparse Linear System Solvers
Efficient Nested Dissection for Multicore Architectures
A Practical Transactional Memory Interface
Improving Performance of Convolutional Neural Networks by Separable Filters on GPU
Effective Barrier Synchronization on Intel Xeon Phi Coprocessor
PR-STM: Priority Rule Based Software Transactions on the GPU
Feature Extraction Multi-Level Hypergraph Partitioning Algorithm
Non-preemptive Throughput Maximization for Speed-Scaling with Power-Down
Accelerating Lattice Boltzmann Applications with OpenACC
Performance impacts with Reliable Parallel File Systems at Exascale level
High Performance Multi-GPU SpMV for Multi-component PDE-based Applications
Hardware Round-Robin Scheduler for Single-ISA Asymmetric Multi-Core
Data Layout Optimization for Portable Performance
Scalable Data-driven PageRank: Algorithms, System Issues & Lessons Learned
Moody Scheduling for Speculative Parallelization
Semi-Discrete Matrix-Free Formulation of 3D Elastic Full Waveform Inversion Modeling
A Duplicate-Free State-Space Model for Optimal Task Scheduling
Efficient Execution of Multiple CUDA Applications using Transparent Suspend, Resume and Migration
Automatic On-line Detection of MPI Application Structure with Event Flow Graphs
How many threads will be too many? On the scalability of OpenMP implementations
Parallelization of an advection-diffusion problem arising in edge plasma physics using hybrid MPI/OpenMP programming
Scheduling Trees of Malleable Tasks for Sparse Linear Algebra
A Connectivity Model for Agreement in Dynamic Systems
MPI Thread-level Checking for MPI+OpenMP Applications
VMPlaceS: A Generic Tool to Investigate and Compare VM Placement Algorithms
Optimizing Task Parallelism with Library-Semantics-Aware Compilation
Low-overhead detection of memory access patterns and their time evolution
DFEP: Distributed Funding-based Edge Partitioning
Event-Action Mappings for Parallel Tools Infrastructures
Iterative Sparse Triangular Solves for Preconditioning
Allocating jobs with periodic demands
Exploiting Task-Based Parallelism in Bayesian Uncertainty Quantification
Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems
High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters
Concurrent Priority Queues are not Good Priority Schedulers
On the Heterogeneity Bias of Cost Matrices when Assessing Scheduling Algorithms
10,000 performance models per minute - scalability of the UG4 simulation framework
A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures
Scheduling tasks from selfish multi-tasks agents
Behavioral Non-Portability in Scientific Numeric Computing
Load Balancing Prioritized Tasks via Work-Stealing
A Composable Deadlock-free Approach to Object-based Isolation
Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines
Rapid Tomographic Image Reconstruction via Large-Scale Parallelization
Elastic Tasks: Unifying Task Parallelism and SPMD Parallelism with an Adaptive Runtime