Panel

Cluster Computing R&D in Australia

The aim of the panel session is for Australian researchers to present a brief summary of their current work in cluster computing, highlighting aspects such as important results, software made available, and commercial products. In addition, we would like panel members to address some of the following questions:

There will be a brief question and answer session at the end.

Moderator: Paul Roe, Queensland University of Technology

Members:

Software Tools for Cluster Computing -- David Abramson

Clusters are a very cheap way of gaining computing power. The challenge, however, is to produce software environments that provide the illusion of a single resource rather than a collection of independent computers. I will briefly address this issue.

Parallelising compilers -- Bill Appelbe

Twenty years of research and development in parallelising compilers still leaves us far short of the "holy grail" of a tool that could automatically convert serial programs into efficient parallel programs. The open question is what progress is likely in the next decade, and how compiler technology and tools will affect the adoption of cluster computing by a broad user community beyond expert programmers.

Beowulf cluster systems -- Ken Hawick

Beowulf cluster systems are becoming prevalent at academic and industrial sites across the world. There are clear price-performance benefits over conventional supercomputer systems for many applications. Recent developments suggest that the Beowulf model is becoming viable not only for task farming applications but also for large-scale parallel programs with more intensive communication patterns. These issues are of great interest for the future of both cluster computing and parallel computing. Work is in progress to embody the research ideas in parallel computing from the last 20 years into libraries and code templates that aid the use of Beowulf systems. Some links are given at: http://dhpc.adelaide.edu.au/projects/beowulf

DISCWorld -- Ken Hawick

DISCWorld is a long-term project to research the issues in building long-lived, robust, distributed metacomputing systems. A great deal of software technology, particularly from the last few years, enables "Problem Solving Environments" to be built. These help application domain experts who are not computing specialists to use distributed clustered resources, both on their local network and as services provided by specialist remote servers at other sites. DISCWorld is an attempt to assimilate state-of-the-art research ideas into a high-level conceptual framework, manifesting itself as an ongoing series of prototypes and technology integration studies. Some further description and links are available at: http://dhpc.adelaide.edu.au/projects/DISCWorld

Scheduling -- Albert Zomaya

In both sequential and parallel systems, the architecture is characterized by its functional components, its communication topology and facilities, and its control structures and mechanisms. However, parallelization raises several issues that do not arise in sequential programming. One of the most important is task allocation: the breakdown of the total workload into smaller tasks assigned to different processors, and the proper sequencing of those tasks when some of them are interdependent and cannot be executed simultaneously. To achieve the highest level of performance, it is important to ensure that each processor is properly utilized. This process is called load balancing or scheduling, and it is considered a formidable problem to solve. The scheduling problem belongs to the class of problems known as NP-complete. See http://www.ee.uwa.edu.au/~paracomp/projects.html.
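
As a small illustration of the task-allocation idea above, the following sketch uses a greedy "longest task first" heuristic to spread independent tasks across processors. It is not taken from the speaker's work: the function name lpt_schedule, the task costs, and the restriction to independent tasks (no interdependencies or sequencing constraints) are all assumptions made for the example. Because the exact problem is NP-complete, heuristics of this kind aim for a good allocation rather than a guaranteed optimal one.

```python
import heapq


def lpt_schedule(task_costs, num_processors):
    """Greedy longest-processing-time-first allocation (illustrative only).

    task_costs: list of non-negative run times, one per independent task.
    Returns (assignment, makespan), where assignment[p] lists the task
    indices given to processor p and makespan is the largest total load.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")

    # Min-heap of (current load, processor index): the least loaded
    # processor is always at the front.
    loads = [(0, p) for p in range(num_processors)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(num_processors)]

    # Place the most expensive tasks first, each on the least loaded processor.
    by_cost = sorted(enumerate(task_costs), key=lambda tc: tc[1], reverse=True)
    for task_id, cost in by_cost:
        load, proc = heapq.heappop(loads)
        assignment[proc].append(task_id)
        heapq.heappush(loads, (load + cost, proc))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical task run times (arbitrary units) on two processors.
    assignment, makespan = lpt_schedule([7, 5, 4, 3, 3, 2], 2)
    print(assignment, makespan)
```

Handling interdependent tasks would additionally require ordering them so that no task starts before the tasks it depends on have finished, which this sketch deliberately leaves out.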

Applications of the Swinburne Supercluster in Astrophysics -- Matthew Bailes

The Swinburne supercluster is a 65-node configuration of Alpha workstations used in several major projects involving large-scale processing. The applicability of clusters to observational astronomy will be discussed, along with tools to manage the processing and the proposed upgrade of the system to 256 nodes.

Parallel/cluster computing at ANU -- Chris Johnson

Parallel and cluster computing at ANU has a long history and spans a diversity of projects and interests. There is a history of programming systems and operating systems research and development on the Fujitsu multiprocessor computers, which range from the AP1000 (128 x SPARC 1) to the AP3000 (12 x 170 MHz UltraSparc) on fast proprietary networks, and from proprietary operating systems to Linux; there is also some (applications) history with a Thinking Machines CM-5. There are currently at least 2 Beowulf clusters at ANU within the strict classification: 12 x 533 MHz Alphas on Fast Ethernet at the ANU Supercomputer Facility (Ben Evans), and 9 x 400 MHz Pentium IIs at the Research School of Information Sciences and Engineering (Jonathan Baxter). A project to purchase a 128-processor system is under way (combined RSISE and Dept of Computer Science) for a variety of applications and systems development projects.

In systems, the commodity cluster movement has been trading off faster communication against cost, and the proprietary machines' networks have also lost their previous balance of communications speed against processor speed. The two approaches have in common the need for operating systems developments to improve communications speeds through optimistic protocols, minimal buffering, user-process level calls, and coordinated scheduling of processes across the cluster. I expect that recent experience with the proprietary machines will feed across to developments in the Linux domain. It is also a concern that the software development environments for clusters are still at a very low level; simple, widely applicable, portable tools for the Linux cluster environment are a priority.

The Mianjin Parallel Programming Language -- Paul Roe

We have designed and implemented a parallel programming language, Mianjin, for programming non-dedicated clusters of workstations. Mianjin supports a virtual shared object space and uses type information to enforce safe communication in the presence of abstractions. I will outline some of the interesting features of Mianjin and our current cluster computing research at QUT; see www.plasrc.qut.edu.au/Gardens.

Operating Systems and Execution Environments Supporting Parallel Processing on Clusters -- Andrzej M. Goscinski

Over the last few years, many research groups from universities, research laboratories and industry have studied the development of software to enable efficient parallel processing on clusters. Some results, in particular in the area of finding and expressing parallelism, are very good. However, parallelism management problems do not have satisfactory solutions, and lessons learnt in one area are not used to form a uniform vision and approach to developing software to support parallel processing on clusters. Available parallel programming packages, parallel programming languages and parallelising compilers are supported only by classical network operating systems, e.g., Unix-based ones. Parallelism management is very limited and does not go beyond basic process and communication management. Thus, there is a need to identify and discuss the basic issues of, and solutions to, the management of parallel processing on clusters, and to propose a new solution. We claim that parallelism management should be offered by an operating system that inherits some features of a distributed operating system and provides new services that address the needs of parallel processes, the cluster's resources, and application programmers. This approach will allow high-performance parallel processing on clusters; relieve the programmer of the error-prone and time-consuming work of allocating processes to workstations, interprocess communication and process synchronization; provide a single system image by supporting transparency; make the whole cluster-based parallel system easy to use; and allow resources to be used efficiently.

Australian Partnership for Advanced Computing (APAC) Plans and Strategies -- John O'Callaghan

The Australian Partnership for Advanced Computing (APAC) has been established with a grant of $19.5m from the Federal Government to underpin significant achievements in Australian research, education and technology diffusion by establishing and supporting an effective advanced computing capability ranked among the top 10 countries. One of the roles of APAC is to provide users, particularly in the Higher Education sector, with 'peak' computing systems far beyond the capacity that is currently available. Another important role is to strengthen the expertise and skills necessary for the effective use and development of these facilities. The broader role of APAC is to form a partnership to lead the development of an Australia-wide computing and communications systems infrastructure supported by Centres of Expertise in advanced computing. APAC is in the process of selecting a peak computing system for a National Facility to be based at the ANU, and of developing strategies for strengthening complementary infrastructure at other locations around Australia. The talk will outline the current state of the APAC plans and strategies.