Description
The limitations of the standard collective relocalization
operations justify the need for extended versions that give the programmer
more flexibility. The MPI library provides a set of collective operations as
well as variants of them, and the proposed extensions for UPC closely follow
MPI's approach. However, there are subtle differences between the two
programming models, which naturally translate into syntax differences. It is
important to note that MPI does not provide anything analogous to UPC's
permute operation, nor is synchronization as big an issue for collective
operations in MPI as it is in UPC. For each standard UPC collective function,
there can be two variants. In the "vector" variant, each block of data can be
of a different size, and the function can pick distinct, non-contiguous
data blocks. The second variant is a further generalization of the first:
the programmer may explicitly specify each data block and its size.
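The difference between the two variants can be illustrated with a minimal sketch in plain C. This is not the API of the proposed extensions: the function names scatter_vector and scatter_general, their parameters, and the use of ordinary private buffers in place of UPC shared memory are all assumptions made for illustration. Each entry of dst stands in for memory with affinity to one thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of UPC or of the proposed extensions.
 * "Vector" variant: block i starts at src + offset[i] and is nbytes[i]
 * bytes long, so blocks may differ in size and need not be contiguous.
 * dst[i] stands in for memory with affinity to thread i. */
void scatter_vector(void *dst[], const char *src,
                    const size_t offset[], const size_t nbytes[],
                    int nthreads)
{
    assert(nthreads >= 0);
    for (int i = 0; i < nthreads; i++) {
        memcpy(dst[i], src + offset[i], nbytes[i]);
    }
}

/* Hypothetical helper. Generalized variant: instead of offsets into one
 * source array, the caller names each source block explicitly by pointer
 * and gives its size, so the blocks may live anywhere. */
void scatter_general(void *dst[], const void *src[],
                     const size_t nbytes[], int nthreads)
{
    assert(nthreads >= 0);
    for (int i = 0; i < nthreads; i++) {
        memcpy(dst[i], src[i], nbytes[i]);
    }
}
```

With a 10-byte source "abcdefghij", offsets {0, 4, 7} and sizes {2, 1, 3}, scatter_vector delivers "ab", "e" and "hij" to the three destinations. The generalized form reaches the same blocks, or blocks from unrelated arrays, by passing their addresses directly. A real UPC version would also take a synchronization mode argument, which this sketch leaves out.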
Notes on implementation
The first version of this implementation was written as part of the last
paper in the References list below. It was later extended to include the
MYSYNC synchronization mode for the first paper in that list.
Please look over the Makefile to make sure it contains the correct paths to
the compiler before compiling. Run 'make try_all' to compile and execute a test
program that verifies everything was compiled properly.
Source code
References
Implementing UPC's MYSYNC Synchronization Mode Using Pairwise
Synchronization of Threads
Technical Report 05-07, Michigan Technological
University, Department of Computer Science (2005)
Author(s): P. Dhamne and S. Seidel
Abstract (pdf | ps)
High Performance Unified Parallel C (UPC) Collectives for Linux/Myrinet
Platforms
Technical Report 04-05, Michigan Technological
University, Department of Computer Science (August 2004)
Author(s): A. Mishra and S. Seidel
Abstract: Unified Parallel C (UPC) is a partitioned shared memory
parallel programming language that is being developed by a consortium of
academia, industry and government. UPC is an extension of ANSI C. In this
project we implemented a high performance UPC collective communications
library of functions to perform data relocalization in UPC programs for
Linux/Myrinet clusters. Myrinet is a low latency, high bandwidth local area
network. The library was written using Myrinet's low level communication
layer called GM. We implemented the broadcast, scatter, gather, gather all,
exchange and permute collective functions as defined in the UPC collectives
specification document. The performance of these functions was compared to a
UPC-level reference implementation of the collectives library. We also
designed and implemented a micro-benchmarking application to measure and
compare the performance of collective functions. The collective functions
implemented in GM usually ran two to three times faster than the reference
implementation over a wide range of message lengths and numbers of threads.
(pdf | ps)
Last modified 12/8/4