We describe the use of reliable multicasts in existing 
message passing (MPI) programs to reduce collective 
communication bottlenecks. By broadcasting information 
through one point-to-multipoint message instead of a series 
of (tree-structured) point-to-point TCP messages, 
communication delays can be reduced by factors up to 
$log_2(N_{nodes})$, e.g., 6x for 64-node clusters. This 
reduction in the cost of collective communications broadens 
the class of problems appropriate for Beowulf-style 
supercomputing. We present raw communication and science 
application (Swift N-body integrator) speed-up results. 

In addition to impressive gains in scaling for some classes 
of existing programs, the broader benefit of these results 
is to reduce the importance of domain decomposition. The 
cost of sending a message to all nodes is the same as one 
node-to-node message. This allows Beowulfs to be applied 
effectively even to problems requiring every node to know 
the current state of all other nodes' particles/cells/.... 
This should significantly increase the range of problems 
approachable with relatively cheap, Ethernet-based Beowulf 
clusters. The reduced cost of collective communications can 
also make it easier to write adequate parallel programs, 
thereby increasing the pool of potential Beowulf 
programmers. Finally, we discuss as a future possibility how 
Internet2's ubiquitous support for multicasts and high 
reliability make it a promising medium for expanding these 
results to geographically distributed computing resources. 

This research is supported by NASA AISRP through grant 
NAG5-9320.