LCMON 1.1 (L3DGEWorld Cluster-node Monitoring)

August 9th, 2007
By Carl Javier (CAIA winter intern, 2007)
(website co-authored with Grenville Armitage)

Overview

LCMON 1.1 (L3DGEWorld Cluster-node Monitoring) demonstrates the use of L3DGEWorld 2.2 to provide near real-time visualisation of the Swinburne supercomputer cluster. Cluster nodes are represented within an interactive 3D environment by unique entities floating in space. Each entity is animated to represent activity in the associated cluster node (such as current CPU load or memory usage). In a typical scenario an LCMON 1.1 server provides a virtual world into which LCMON clients connect and view activity in the Swinburne supercomputer cluster. Multiple LCMON clients may simultaneously connect to an LCMON server and independently move around the virtual world.

LCMON 1.1's core, L3DGEWorld, is being developed as a network monitoring and control application based on Open Arena (a game built on the GPL'd Quake III Arena engine). LCMON demonstrates how L3DGEWorld may be more broadly utilised to create interactive 3D virtual worlds within which arbitrary real-time state information is represented.

Building on Open Arena means that LCMON (and L3DGEWorld) are easy to install (and rebuild if desired) under Windows, Mac OS X, FreeBSD and Linux.

Swinburne Supercomputer

Run by the Centre for Astrophysics and Supercomputing, the Swinburne Supercomputer (as of July 2007) consists of over 1160 processors across 145 cluster nodes and has a theoretical peak processing capacity of 10 Teraflops. Each cluster node contains 2 quad-core Clovertown processors running at 2.33 GHz. The nodes are controlled by a head node which distributes jobs to the cluster via a queue system (itself controlled by Moab cluster management software). The head node is named 'green', and cluster statistics are currently provided through a web interface known as ganglia. Cluster nodes are named 'shrek001.ssi.swin.edu.au', 'shrek002.ssi.swin.edu.au', and so on. (Internal users may access more details about the cluster here.)
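
The quoted figures can be cross-checked with a little arithmetic. The sketch below is only a sanity check: the node count, sockets per node, cores per socket and clock rate come from the paragraph above, while the value of 4 floating point operations per core per cycle is an assumption that is not stated in the text. It also suggests that the "1160 processors" figure counts processor cores.

```python
# Figures quoted in the text above.
NODES = 145
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4
CLOCK_HZ = 2.33e9

# Assumption (not stated in the text): each core completes 4
# double-precision floating point operations per clock cycle.
FLOPS_PER_CYCLE = 4

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
peak_tflops = cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12

print(cores)                   # total core count
print(round(peak_tflops, 1))   # theoretical peak in Teraflops
```

Under that assumption the result lands close to the 10 Teraflops quoted above.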

Visual appearance, at a glance

The following screenshots capture the view seen by a single user after logging into an LCMON 1.1 server and flying around the virtual environment. When a user first logs into the LCMON server they are placed floating above the amphitheatre of stars representing cluster nodes in the Swinburne supercomputer. Figure 1 shows a screenshot of LCMON prior to being populated with real-time metrics from the supercomputer. Figure 2 shows a screenshot of LCMON after the nodes have been populated with metrics.

Different types of activity in the Swinburne supercomputer cause the entities in the 3D world to change their physical behaviour. The entities spin and change colour as a function of the CPU load. Changes in packet rate cause the entities to bounce on the spot, and memory consumption of the nodes is reflected by entities changing in size.

Screenshot legend (visual property and the cluster-node metric it represents):
  • Rotation rate & colour: CPU Load (%)
  • Scale size: Memory Usage (%)
  • Bounce height: Traffic in (PPS)

Figure 1: Supercomputer cluster node overview with no state information

Figure 2: Snapshot of cluster nodes populated with metrics from the Swinburne supercomputer
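
The metric-to-behaviour mapping described above can be sketched as a small function. This is an illustration only, not the L3DGEWorld implementation: the names `StarState` and `star_state`, the colour ramp, and every numeric range (maximum spin rate, scale range, bounce height, `max_pps`) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StarState:
    """Hypothetical visual state of one star (one cluster node)."""
    spin_deg_per_sec: float
    colour_rgb: tuple
    scale: float
    bounce_height: float


def clamp01(x):
    """Limit x to the range 0.0 .. 1.0."""
    return max(0.0, min(1.0, x))


def star_state(cpu_load_pct, mem_used_pct, pkts_in_per_sec, max_pps=10000.0):
    """Map node metrics to visual properties (all ranges are assumed)."""
    cpu = clamp01(cpu_load_pct / 100.0)
    mem = clamp01(mem_used_pct / 100.0)
    pps = clamp01(pkts_in_per_sec / max_pps)
    return StarState(
        spin_deg_per_sec=cpu * 360.0,       # CPU load drives rotation rate...
        colour_rgb=(cpu, 1.0 - cpu, 0.0),   # ...and colour (assumed green to red)
        scale=1.0 + mem,                    # memory usage drives size
        bounce_height=pps * 50.0,           # inbound packet rate drives bounce
    )


print(star_state(cpu_load_pct=50, mem_used_pct=25, pkts_in_per_sec=5000))
```

Clamping each metric before scaling keeps a misreported value (for example a load above 100%) from producing an entity that spins or grows without bound.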

Detailed images and examples

Additional images and examples of the LCMON 1.1 user interface can be found here.

Example video

The following video (hosted on YouTube) illustrates LCMON's dynamic representation of cluster state.

We begin with a single user's view of the cluster changing from idle to quite busy as more and more nodes become active. At about 1min20sec we show how the user can learn detailed information about particular cluster nodes either by flying up close to the node's star, or by 'shooting' a node's star from a distance. At 1min54sec we show how a second user (who is also inspecting cluster nodes) would appear within the virtual environment. (Note that the cluster behaviour shown here has been synthesised for demonstration purposes.)

Note that the video shows LCMON 1.0 in operation. However, there are no major visual differences between LCMON 1.0 and LCMON 1.1.



Functional overview and live demonstration server

As illustrated in Figure 3, the major components of an LCMON 1.1 system are:
  • LCMON Server
    • An instance of L3DGEWorld 2.2 server running a specific 'map' to represent the LCMON 1.1 virtual world.
    • Gpoll (ganglia poll) daemon 0.2 - A utility which parses the data obtained from the supercomputer and transmits the relevant metrics to the L3DGEWorld server.
  • LCMON Client
    • An instance of L3DGEWorld 2.2 client, allowing users to interact with the LCMON virtual world.

There is a publicly accessible LCMON server at l3dgeworld.caia.swin.edu.au:27960. This server actively polls the Swinburne supercomputer cluster every 2 minutes.

The gpoll.sh script is executed by a cron job every 2 minutes in order to fetch the current state of the cluster nodes and update their representation in the LCMON virtual world (step 1). The state of all cluster nodes is handed back in XML-encoded form (step 2), which Gpoll then parses (step 3) to extract relevant data (such as CPU load, memory usage, and network traffic in/out). Gpoll 0.2 uses a generalised interface to the L3DGEWorld 2.2 engine to update the entities, or stars (step 4).
Multiple LCMON clients running on different platforms may connect to the LCMON server to view activity in the supercomputer cluster (step 5).


Figure 3: Information flow and client-server relationship in LCMON 1.1
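
Steps 2 and 3 above (receiving XML-encoded cluster state and extracting per-node metrics) can be sketched as follows. This is not Gpoll's code, which is written in C and whose exact inputs are not described here. The sketch assumes the XML follows the usual ganglia layout of `HOST` elements containing `METRIC` elements with `NAME` and `VAL` attributes, and that the metrics `cpu_idle`, `mem_total`, `mem_free` and `pkts_in` are available. The sample document and its values are made up for illustration, and `parse_cluster_state` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed ganglia XML layout (values are illustrative).
SAMPLE_XML = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="shrek001.ssi.swin.edu.au">
      <METRIC NAME="cpu_idle" VAL="25.0" TYPE="float" UNITS="%"/>
      <METRIC NAME="mem_total" VAL="16000000" TYPE="uint32" UNITS="KB"/>
      <METRIC NAME="mem_free" VAL="4000000" TYPE="uint32" UNITS="KB"/>
      <METRIC NAME="pkts_in" VAL="120.5" TYPE="float" UNITS="packets/sec"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_cluster_state(xml_text):
    """Return {host name: {metric: value}} for every HOST in the XML."""
    root = ET.fromstring(xml_text)
    state = {}
    for host in root.iter("HOST"):
        raw = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        mem_total = float(raw.get("mem_total", 0))
        mem_free = float(raw.get("mem_free", 0))
        state[host.get("NAME")] = {
            # CPU load is approximated as everything that is not idle time.
            "cpu_load_pct": 100.0 - float(raw.get("cpu_idle", 100.0)),
            # Simplification: buffers and cache are counted as used memory.
            "mem_used_pct": (100.0 * (mem_total - mem_free) / mem_total
                             if mem_total else 0.0),
            "pkts_in_per_sec": float(raw.get("pkts_in", 0.0)),
        }
    return state


for name, metrics in parse_cluster_state(SAMPLE_XML).items():
    print(name, metrics)
```

Missing metrics fall back to values that leave a node looking idle, so a node that reports only partial data does not abort the whole update.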

(For people familiar with Quake III Arena, LCMON is essentially a custom map running on a modified Quake III Arena game engine. The client and dedicated server executables have both been modified to enable L3DGEWorld 2.2 to communicate with external daemons. As with a normal Quake III Arena game, one or more clients may be connected to a single server, each one rendering a separately controlled view of the virtual environment. Clients may connect and disconnect from the server at any time without disrupting the server's virtual environment, and may do so from wherever there is UDP/IP connectivity to the LCMON Server.)
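
Because the server is reached over UDP/IP, a quick reachability check is possible without starting a full client. The sketch below uses the stock Quake III Arena connectionless `getstatus` query. Whether the modified L3DGEWorld 2.2 server still answers this query is an assumption, and the function names are hypothetical.

```python
import socket

# Quake III out-of-band packets start with four 0xff bytes.
OOB_PREFIX = b"\xff\xff\xff\xff"


def build_getstatus():
    """Build the connectionless status query packet."""
    return OOB_PREFIX + b"getstatus\n"


def parse_status_response(packet):
    """Split a statusResponse packet into (server info dict, player lines)."""
    if not packet.startswith(OOB_PREFIX + b"statusResponse\n"):
        raise ValueError("not a statusResponse packet")
    lines = packet[len(OOB_PREFIX):].decode("latin-1").split("\n")
    # lines[1] is an info string of the form \key\value\key\value...
    fields = lines[1].split("\\")[1:]
    info = dict(zip(fields[0::2], fields[1::2]))
    players = [line for line in lines[2:] if line]
    return info, players


def query_server(host, port=27960, timeout=3.0):
    """Send getstatus over UDP and parse the reply (raises on timeout)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getstatus(), (host, port))
        data, _ = sock.recvfrom(65535)
    return parse_status_response(data)
```

Calling `query_server("l3dgeworld.caia.swin.edu.au")` would then return the server's settings and one line per connected client, provided the demonstration server is still reachable and responds to the stock query.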

LCMON 1.1 may be used to monitor other supercomputer clusters (or even entirely different systems) by modifying Gpoll (and optionally redesigning the Quake III Arena 'map' [2] used to represent the virtual environment shown in Figures 1 and 2).

System Requirements

LCMON 1.1's underlying L3DGEWorld 2.2 core (both client and server sides) has been verified to run on FreeBSD 6.2, Mac OS X 10.4.9, Linux (Ubuntu 7.04) and Windows XP platforms (with the addition of cygwin for server-side scripts). Gpoll 0.2 has only been verified to run on FreeBSD 6.2 (although we believe it should be portable to other platforms).

[Update 5 July 2010: LCMON 1.1 will run as-is under Mac OS X 10.6.4 Snow Leopard, and the win32 binaries will run under FreeBSD 8.x using Wine 1.2-rc4. However, the native FreeBSD binaries will not run under FreeBSD 8.x (they look for older versions of libm.so and libc.so), and unfortunately a bug (in OpenArena prior to version 0.7) prevents the underlying L3DGEWorld 2.2 source code from recompiling under FreeBSD 8.x. Nevertheless, it is possible to copy the LCMON 1.1 maps and config files into L3DGEWorld 2.3, which will recompile under FreeBSD 8.x. Gpoll will compile and run under FreeBSD 8.x.]

LCMON Client Requirements:
  • Video card supporting OpenGL acceleration.
  • libSDL and libOpenAL are required on FreeBSD and Linux.

Dedicated LCMON Server Requirements:
  • FreeBSD 6.2
  • BASH (Bourne Again SHell)
  • libSDL and libOpenAL are only required if you run a dedicated server using the ioquake3 binaries, because those binaries contain both the client and server components of L3DGEWorld. The ioq3ded binaries contain only the server component, which does not require SDL or OpenAL.

Licensing

L3DGEWorld 2.2 and LCMON 1.1 are copyright (C) 2007, the Centre for Advanced Internet Architectures, Swinburne University of Technology, and distributed under version 2 of the GNU General Public Licence.

Download

Download LCMON 1.1 package:

Authors and Acknowledgments

  • Adam Black and Lucas Parry contributed to gpoll 0.2 and the LCMON 1.1 release.
  • LCMON 1.0 was developed by Carl Javier and is based on L3DGEWorld 2.1. LCMON was developed under the supervision of Grenville Armitage.
  • We appreciate the co-operation of Dr Jarrod Hurley and Professor Matthew Bailes from the Centre for Astrophysics and Supercomputing.
  • L3DGEWorld 2.2 was developed by Lucas Parry.
  • The Gpoll input daemon was developed by Carl Javier and Lucas Parry.
  • We have received a lot of valuable feedback, website editing and system testing from Grenville Armitage.
  • Thanks to the OpenArena team - their free textures and artwork on the Quake III Arena codebase made it possible for us to distribute LCMON as a complete package.

References

  1. L. Parry, "L3DGEWorld 2.1 Input & Output Specifications", CAIA Tech Report 070808A, August 2007
  2. C. Javier, "Map & Entity Modeling for L3DGEWorld", CAIA Tech Report 070809A, August 2007

Change Log

Changes in LCMON 1.1
  • LCMON 1.1 is based on the L3DGEWorld 2.2 package.
  • Reduced the delay between sending messages to the L3DGEWorld server, so the entities are now updated more quickly.
  • Other bugfixes and improvements, mainly to gpoll.c.




Last Updated: Monday 5-Jul-2010 15:10:56 EST | Maintained by: Grenville Armitage (garmitage@swin.edu.au) | Authorised by: Grenville Armitage (garmitage@swin.edu.au)