Professor Kai Hwang, University of Southern California, Los Angeles, California, USA
Abstract: This talk covers fault-tolerant architecture, single-system-image (SSI) services, and innovative applications of clusters built with commodity components. I will introduce several new R&D ideas through the architectural design and benchmark experiences of the Trojans PC/Linux Cluster built at the University of Southern California. The Trojans cluster is unique in its architectural features, which are supported by middleware consolidation and Linux/OS extensions. The major goal is to support critical SSI services and fault-tolerant operations of the cluster. A new hierarchical checkpointing scheme is presented for building large-scale clusters with a distributed RAID architecture.
On a worldwide scale, many innovative applications have explored the
supercomputing power of the PC/WS clusters. The talk illustrates several
new applications for distributed multimedia processing, intelligent software
agents, data-mining in E-commerce, and bio-informatics for health-care.
We present recent research results on parallel image rendering, video-on-demand
scheduling, financial and economic analysis, and parallel gene/DNA sequence
matching. Impacts on information technology and scalable commodity computing
trends will be discussed.
Dr. Anthony Skjellum, President, MPI Software Technology Inc., USA
Abstract: A lot of attention has been paid to so-called Beowulf/Avalon clusters, where PCs or Alphas are strung together with 100 Mbit/s Ethernet and portable programs from supercomputers are run on them, particularly when the example applications pose modest bandwidth and latency requirements. In addition, heroic efforts to scale clusters using early gigabit/s scalable fabrics have been made across the world, but these systems, like their Beowulf counterparts, have relied on software from the previous generation of multicomputers and supercomputing systems.
However, commercial-grade software tools (middleware and distributed
environments) for clusters have matured considerably since the initial
Beowulf-type experiments, as has the availability of easy-to-use cluster
interconnects. In this talk we review the technical achievements thus far
in production-grade environments for both message passing and cluster scheduling,
both for NT and Linux. This talk emphasises the option of having tools
and hardware that are scalable, to varying degrees, and presents a taxonomy
of hardware, software, and applications that divides the space of activities
and also seeks to establish areas where additional opportunities for new
software and other tools exist. Issues of security and scalability are
considered, as is the cost of ownership vs. freeware, with proposed economic
models for both company and university adopters of clusters. We discuss
pros and cons of open source vs. commercial products as viable options
moving forward.
Professor David Abramson, Monash University, Melbourne, Australia
Abstract: PC clusters have become an extremely inexpensive way to build high-powered parallel machines, and it is clear that this trend will continue for some time. Such clusters can provide enormous computational power for individual projects or departments within an organisation. However, as the number of clusters increases within an enterprise, and then globally, there is a need for a software architecture that can integrate them into larger "grids of clusters".
This talk will focus on the issues of integration, with particular
reference to "Globus", an international effort designed to deliver a toolkit
for programming on a global computational grid. I will cite our experience
using Nimrod (and its commercial counterpart, Clustor), a tool for performing
parametric modelling, and explain the design and development of a new tool
called Nimrod/G, which is "Grid Aware". Nimrod/G provides specific functions
for resource location and scheduling where the machine base is highly variable.
The talk will then discuss some of the related enhancements to Nimrod,
such as Nimrod/O, a tool for performing complex computational optimisation
using the grid.
Presentation Slides.