The aim of the panel session is for Australian researchers to present a brief summary of their current work in cluster computing, highlighting aspects such as important results, software made available and commercial products. In addition, we would like panel members to address some of the following questions:
Members:
Clusters are a very cheap way of gaining computing power. However, the challenge is to produce software environments that provide an illusion of a single resource, rather than a collection of independent computers. I will briefly address this issue.
Parallelising compilers -- Bill Appelbe
Twenty years of research and development in parallelizing compilers still leaves us far short of the "holy grail" of a tool that could automatically convert serial programs into efficient parallel programs. The open question is what progress is likely in the next decade, and how compiler technology and tools will affect the adoption of cluster computing by a broad user community beyond expert programmers.
Beowulf cluster systems -- Ken Hawick
Beowulf cluster systems are becoming prevalent at academic and industrial sites across the world. There are clear price-performance benefits over conventional supercomputer systems for many applications. Recent developments suggest that the Beowulf model is becoming viable not only for task farming applications but also for large-scale parallel programs with more intensive communication patterns. These issues are of great interest for the future of both cluster computing and parallel computing. Work is in progress to embody the research ideas in parallel computing from the last 20 years into libraries and templates of code to aid the use of Beowulf systems. Some links are given at: http://dhpc.adelaide.edu.au/projects/beowulf
DISCWorld -- Ken Hawick
DISCWorld is a long-term project to research the issues in building long-lived, robust, distributed metacomputing systems. A great deal of software technology, particularly from the last few years, enables "Problem Solving Environments" to be built. These help non-specialists, or rather application domain experts, to use distributed clustered resources, both on their local network and through services provided by specialist remote servers at other sites. DISCWorld is an attempt to assimilate state-of-the-art research ideas into a high-level concept framework, manifesting itself as an ongoing series of prototypes and technology integration studies. Some further description and links are available at: http://dhpc.adelaide.edu.au/projects/DISCWorld
Scheduling -- Albert Zomaya
In either sequential or parallel systems, the architecture is characterized by functional components, the communication topology and facilities, and control structures and mechanisms. However, there are several issues related to parallelization that do not arise in sequential programming. One of the most important is task allocation, that is, the breakdown of the total workload into smaller tasks assigned to different processors, and the proper sequencing of the tasks when some of them are interdependent and cannot be executed simultaneously. To achieve the highest level of performance it is important to ensure that each processor is properly utilized. This process is called load balancing or scheduling, and it is considered a formidable problem to solve. The scheduling problem belongs to a class of problems known as NP-complete.
See http://www.ee.uwa.edu.au/~paracomp/projects.html.
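As a rough illustration of the task-allocation problem described above, the following Python sketch applies greedy list scheduling, a common heuristic, to a small set of interdependent tasks. It is not taken from the panel member's work: the function name `list_schedule`, the task names and the durations are all hypothetical, communication costs are ignored, and the heuristic carries no guarantee of an optimal schedule.

```python
def list_schedule(durations, deps, num_procs):
    """Greedy list scheduling (illustrative heuristic, not an optimal solver).

    durations: {task: run time}; deps: {task: set of prerequisite tasks}.
    Returns ({task: (processor, start, end)}, makespan).
    """
    pending = set(durations)
    placed = {}                      # task -> (processor, start, end)
    proc_free = [0] * num_procs      # time at which each processor becomes idle

    while pending:
        # A task is ready once every prerequisite has been placed.
        ready = [t for t in pending
                 if all(d in placed for d in deps.get(t, ()))]
        if not ready:
            raise ValueError("dependency cycle: no task is ready")

        # Heuristic: longest ready task first, ties broken by name.
        task = min(ready, key=lambda t: (-durations[t], t))

        # The task cannot start before all of its prerequisites finish.
        deps_done = max((placed[d][2] for d in deps.get(task, ())), default=0)

        # Pick the processor that lets the task start earliest.
        proc = min(range(num_procs),
                   key=lambda p: (max(proc_free[p], deps_done), p))
        start = max(proc_free[proc], deps_done)
        end = start + durations[task]

        placed[task] = (proc, start, end)
        proc_free[proc] = end
        pending.remove(task)

    makespan = max((end for _, _, end in placed.values()), default=0)
    return placed, makespan


# Hypothetical workload: task "d" depends on both "a" and "b".
durations = {"a": 3, "b": 2, "c": 2, "d": 1}
deps = {"d": {"a", "b"}}
placed, makespan = list_schedule(durations, deps, num_procs=2)
print(makespan)  # 4
```

Here the total work is 8 time units on two processors, so a makespan of 4 happens to match the lower bound. On larger task graphs a greedy rule like this can fall short of the optimum, which is consistent with the NP-completeness noted above.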
Applications of the Swinburne Supercluster in Astrophysics -- Matthew Bailes
The Swinburne supercluster is a 65-node configuration of Alpha workstations used in several major projects involving large-scale processing. The applicability of clusters to observational astronomy will be discussed, along with tools to manage the processing and the proposed upgrade of the system to 256 nodes.
Parallel/cluster computing at ANU -- Chris Johnson
Parallel and cluster computing at ANU have a long history and a diversity of projects and interests. There is a history of programming systems and operating systems research and development on Fujitsu multiprocessor computers, ranging from the AP1000 (128 x SPARC 1) to the AP3000 (12 x 170 MHz UltraSparc) on fast proprietary networks, and from proprietary operating systems to Linux, along with some (applications) history with a Thinking Machines CM-5. There are currently at least two Beowulf clusters at ANU within the strict classification: 12 x 533 MHz Alphas on Fast Ethernet at the ANU Supercomputer Facility (Ben Evans), and 9 x 400 MHz Pentium IIs at the Research School of Information Sciences and Engineering (Jonathan Baxter). A project to purchase a 128-processor system is under way (combined RSISE and Dept of Computer Science), for a variety of applications and systems development projects.
In systems, the commodity cluster movement has been trading off faster communication against cost, and the proprietary machines' networks have also lost their previous balance of communications speed against processor speed. The two approaches have in common the need for operating systems developments to improve communications speeds through optimistic protocols, minimal buffering, user-process level calls, and coordinated scheduling of processes across the cluster. I expect that recent experience with the proprietary machines will feed across to developments in the Linux domain. It is also a concern to see that the software development environments for clusters are still at a very low level; simple, widely applicable, portable tools for the Linux cluster environment are a priority.
The Mianjin Parallel Programming Language -- Paul Roe
We have designed and implemented a parallel programming language, Mianjin, for programming non-dedicated clusters of workstations. Mianjin supports a virtual shared object space and uses type information to enforce safe communication in the presence of abstractions. I will outline some of the interesting features of Mianjin and our current cluster computing research at QUT; see www.plasrc.qut.edu.au/Gardens.
Operating Systems and Execution Environments Supporting Parallel Processing on Clusters -- Andrzej M. Goscinski
Over the last few years, many research groups from universities, research laboratories and industry have studied the development of software to enable efficient parallel processing on Clusters. Some results, in particular in the area of finding and expressing parallelism, are very good. However, parallelism management problems do not have satisfactory solutions, and lessons learnt in one area are not used to form a uniform vision and approach to developing software to support parallel processing on Clusters. Available parallel programming packages, parallel programming languages and parallelising compilers are only supported by classical network operating systems, e.g., Unix-based ones. Parallelism management is very limited and does not go beyond basic process and communication management. Thus, there is a need to identify and discuss the basic issues of, and solutions to, the management of parallel processing on Clusters, and to propose a new solution. It is claimed that parallelism management should be offered by an operating system that inherits some features of a distributed operating system and provides new services which address the needs of parallel processes, Cluster resources, and application programmers. This approach will allow high-performance parallel processing on Clusters; relieve the programmer of the error-prone and time-consuming work of allocating processes to workstations, interprocess communication and process synchronization; provide a single system image by supporting transparency; make the whole Cluster-based parallel system easy to use; and allow resources to be used efficiently.
Australian Partnership for Advanced Computing (APAC) Plans and Strategies -- John O'Callaghan
The Australian Partnership for Advanced Computing (APAC) has been established with a grant of $19.5m from the Federal Government to underpin significant achievements in Australian research, education and technology diffusion by establishing and supporting an effective advanced computing capability ranked in the top 10 countries. One of the roles for APAC is to provide users, particularly in the Higher Education sector, with 'peak' computing systems far beyond the capacity that is currently available. Another important role for APAC is to strengthen the expertise and skills necessary for the effective use and development of these facilities. The broader role for APAC is to form a partnership to lead the development of an Australia-wide computing and communications systems infrastructure supported by Centres of Expertise in advanced computing. APAC is in the process of selecting a peak computing system for a National Facility to be based at the ANU, and developing strategies for strengthening complementary infrastructure at other locations around Australia. The talk will outline the current state of the APAC plans and strategies.