Cloud Helptivity

Two researchers at an Australian university needed to perform a complex statistical analysis of their data. Using grant money, they bought two powerful computers to run MATLAB on. On the first run they found that the computers couldn’t handle their datasets. The computers were underpowered! The researchers were forced to iteratively upgrade those machines until finally one was able to perform the required analysis. It had taken 12 months from the date of purchase to get to this point, with well over $20,000 of grant money spent on computers!

If, instead of purchasing hardware, those researchers had turned to the cloud, they would have had their initial computer running within minutes. If it turned out not to be powerful enough, they would have been able to upgrade to a larger computer almost instantly. They probably wouldn’t even have had to re-install their software.
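For the technically curious, the “almost instant” upgrade described above is, on an OpenStack-based cloud such as NeCTAR, a single resize operation. The following is a minimal sketch using the openstacksdk Python library; the cloud entry, server name, and flavour are illustrative assumptions rather than NeCTAR-specific values.

```python
# A minimal sketch of resizing a cloud instance to a larger flavour with the
# openstacksdk library. The cloud name, server name, and flavour below are
# illustrative assumptions, not a prescription for the NeCTAR cloud.
import openstack

conn = openstack.connect(cloud="nectar")          # credentials taken from clouds.yaml

server = conn.compute.find_server("analysis-vm")  # the existing, underpowered instance
flavor = conn.compute.find_flavor("m2.xlarge")    # a larger flavour

# Request the resize, wait for the VERIFY_RESIZE state, then confirm it.
conn.compute.resize_server(server, flavor.id)
server = conn.compute.wait_for_server(server, status="VERIFY_RESIZE")
conn.compute.confirm_server_resize(server)
```

The same operation is available through the web dashboard and the command-line client; the point is that it takes minutes, not months of procurement.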

The really good news for researchers is that everyone with an AAF login can experience this game changer for themselves, at virtually no cost: through the NeCTAR cloud. But before you go rushing to the NeCTAR cloud you have to understand that the cloud environment is not the same as a dedicated computer! If you don’t take the time to learn the subtle differences you will experience pain along the way.

In order to help you get up to speed on your voyage to the cloud, we have organised a “Cloud Helptivity” day. You bring your research computing needs, and we will match you up, one on one, with a cloud computing expert. The goal is simple: you get a full day of hands-on help utilising the cloud for your research. Help with what you want to do.

But to match you with an expert, we need to know that you are coming. Register here (https://goo.gl/n8aYiX).

BYO computer and we’ll help with the rest!

Thursday, March 23, 2017, 10am – 5pm.


Auckland University of Technology Visit February 2017

A visit and presentation were conducted at the Auckland University of Technology on February 27th, 2017. AUT has a small HPC research laboratory and, like the rest of New Zealand, makes use of the NeSI national facilities, of which the “Pan” system hosted at the University of Auckland is the most local. As with other visits to New Zealand universities, there is a definite sense that whilst there is great virtue in having national facilities, there should also be local facilities for policy diversity, real-time processing, and especially storage and transfer matters. The AUT research centre has a particular interest in the Square Kilometre Array project, with its director, Dr. Andrew Ensor, also holding the position of Director of the NZ SKA Alliance.

As with other institutional visits, a presentation on the history, architecture, development, and future of Spartan was provided to a group of members of the HPC Centre. Two aspects were of particular interest to those present: the first being the optimisation of the system to account for large quantities of single-core jobs and job arrays (especially through the use of MATLAB), and the second being the proposed development of abstracted job submission for Virtual Laboratories. Continuing communication with the AUT team will inform them of further developments in that area.
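For readers unfamiliar with job arrays, the following is a minimal, hypothetical sketch of the general approach: packing many single-core MATLAB runs into one Slurm job array rather than submitting thousands of individual jobs. The partition, module, and function names are assumptions for illustration, not Spartan’s actual configuration.

```python
# A minimal, hypothetical sketch of packing 1000 single-core MATLAB runs into
# one Slurm job array. Partition, module, and function names are assumptions.
import subprocess

job_script = """#!/bin/bash
#SBATCH --job-name=matlab-array
#SBATCH --partition=cloud            # assumed partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-02:00:00
#SBATCH --array=1-1000               # 1000 independent single-core tasks

module load matlab                   # assumed module name
matlab -nodisplay -nosplash -r "process_sample(${SLURM_ARRAY_TASK_ID}); exit"
"""

with open("matlab_array.slurm", "w") as fh:
    fh.write(job_script)

# A single sbatch call queues the entire array; the scheduler then treats
# each array element as an independent single-core job.
subprocess.run(["sbatch", "matlab_array.slurm"], check=True)
```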


Nyriad Visit February 2017

In recent weeks a project has been established between University of Melbourne marine researchers, led by Dr. Eric Treml, and Nyriad, a startup New Zealand company specialising in GPU software, to optimise code for marine population samples. Nyriad’s main mission, however, is to resolve one of the biggest and fastest-growing issues in computation: the widening gap between data computation and data I/O. The technical solution, led by Alex St. John (whose previous work includes DirectX, the foundations for the first GPUs, and Google Maps), is to combine compute and I/O on GPUs.

This is, of course, a very simple description of a complex problem in development, and one which has been explored before (for example, see M. Silberstein, GPUfs: Integrating a File System with GPUs, 2013), but without the technological breakthrough needed for mass adoption. The organisation itself follows the model of an “agile startup done right”, which I discuss in much more detail on my personal ‘blog (this is a technical ‘blog relating to research computing, not organisational management). The company employs a wide range of young engineers and computer scientists, often sourced from the local Waikato University; a notable staff member is Bill Rogers, senior lecturer in Computer Science at said institution (will this finally be the home of Avoca?).

Whilst at Nyriad, time was spent with Andreas Wicenec, through whom Nyriad has established milestone agreements for the processing of data for the precursor of the Square Kilometre Array. From the University of Melbourne’s point of view – in addition to their ability to assist research projects with GPU programming – two extremely important considerations are the ability of Nyriad’s technologies to replace traditional RAID and storage services with GPUs whilst providing greater resilience, and, in operational terms, any potential GPU extensions to the existing Spartan cluster.


Multicore World 2017

It is difficult to describe the annual Multicore World conference with brevity. For the past six years it has operated out of New Zealand, the brain-child of Nicolas Erdody of Open Parallel, and for five of those six years your ‘blogger has had the honour of being MC for much of the proceedings. It is not a big conference by any stretch of the imagination, typically attracting around seventy participants. However, rather like New Zealand itself, what it lacks in size it makes up for in quality; Multicore World consistently manages to attract some of the most important names in computer science and the people dealing with the hardest of problems.

This year’s conference opened with Professor Satoshi Matsuoka giving an overview of Japan’s massive plans for developing High Performance Computing, especially for the processing of large datasets and artificial intelligence. It was followed by a keynote by Professor Tony Hey on the convergence of data and compute in scientific research. Professor Michelle Simmons, director of the Centre for Quantum Computation and Communication Technology, gave an overview of the theory, current practice, and future plans of quantum computing. The opportunity has been taken to connect with the University of Melbourne’s Quantum Error Correction and Quantum Information project, which is using a special Spartan partition to research the same. Spartan itself received special consideration, with your ‘blogger presenting on the first year of implementation and the alternative HPC/cloud hybrid model used at the University of Freiburg. As a precursor to the following day’s emphasis on the SKA, Juan Guzman discussed the ASKAP Science Data Processing system, followed by Nathan DeBardeleben from Los Alamos National Laboratory on resilience (or the lack thereof) in supercomputing. The concluding keynote of the first day, by Pete Beckman of the Northwestern Argonne Institute, covered the use of computation for collecting and analysing real-time city event metrics.

The second day of Multicore World opened with Dr. Happy Sithole describing the implementation and operation of the Cheetah HPC system in South Africa, and especially their education programme. This was followed by an address by the Honourable Paul Goldsmith, Minister for Tertiary Education, Skills and Employment, Minister of Science and Innovation, and Minister for Regulatory Reform. His shadow, Clare Curran, was also at the conference (which she regularly attends) and participated in a subsequent panel. This was followed by three presentations and a panel on the Square Kilometre Array, with the first presentation by Professor Andreas Wicenec on the Data Activated 流 (Liu) Graph Engine (DALiuGE), followed by Andrew Ensor on the status of the Square Kilometre Array in New Zealand, and finally Piers Harding on SKA-SDP middleware. Following this were two presentations on Linux kernel development, one from Balazs Gerofi on the IHK/McKernel multikernel and then one by Paul McKenney on Read-Copy-Update. The day was concluded by a presentation on vehicular automation and technological dividends by the ever-entertaining and insightful Paul Fenwick.

The final day witnessed another presentation by Professor Satoshi Matsuoka, on the physical limitations to Moore’s Law, with the observation that multicore was a change that can only occur once (however, see the Angstrom Project). After another panel, this time on budgets, John Gustafson, noting hardware implementation issues with unums, proposed variants – posits and valids – which have some of the advantages of unums whilst being easier to implement. In the afternoon, NZ ICT Professional of the Year, Victoria MacLennan, spoke on the continuing issue of retaining women in STEM subjects, followed by Ralph Highnam on technologies for the early detection of breast cancer. Finally, to wrap up the conference, there were two keynotes, one from Professor Michael Kelly of Cambridge University on the manufacturing of computational devices and the relationship with exascale computing, followed by ARM’s former head of architecture on the development of that reduced instruction set computing (RISC) architecture for computer processors.

This short ‘blog post can only do passing justice to the enormous scope, depth of detail, and general importance of this small conference. From the university perspective, the contacts, insights, and connections made here are essential for our own development and awareness. One can only look forward to future Multicore Worlds and the results of the many initiatives that are announced at these events.


Otago University Visit 2017

A visit to the University of Otago was conducted on February 16, 2017. Like the University of Canterbury, Otago University has now almost entirely outsourced its HPC facilities to the NeSI national facilities, although there are (small, aging) departmental clusters, and an argument for local installations for the real-time processing of streaming data. As with the visit to the University of Canterbury, encouragement was given for the institutions (or NeSI) to take up the offer of shipping the recently decommissioned (but still Top 500) machine, Avoca, across The Ditch from The West Island.

Otago University has made several impressive contributions to high performance computing in past years, most recently with the paper “Efficient Selection Algorithm for Fast k-NN Search on GPUs”, presented at the 2015 IEEE 29th International Parallel and Distributed Processing Symposium. There are strong hopes (and plans afoot) for the University of Melbourne and the University of Otago to collaborate further on fundamental algorithms for GPU processing.

Special thanks are given to the members of the various New Zealand facilities who took the time to accommodate my visit and provide tours of their facilities. This includes Dave Eyers and Jim Cheetham at Otago University.


University of Canterbury Visit 2017

A visit to the University of Canterbury was conducted on February 15, 2017. The University of Canterbury used to have its own impressive collection of HPC facilities. Alas, much of that has now been decommissioned (although Popper is still operational), with users largely moved to the national facilities, coordinated by NeSI and hosted at NIWA and the University of Auckland respectively. The NIWA system is a 3,200+ core P575/POWER6 system running AIX, whereas the University of Auckland system is a 6,000+ core system running Linux with over 40 GPU devices. Both use Infiniband as their interconnect (DDR for NIWA, QDR for Auckland). Canterbury is, however, heavily involved in the QuakeCORE project, building a national network of leading New Zealand earthquake resilience researchers.

The main issues confronting HPC at the University of Canterbury are familiar: the cost of operating such facilities, the level of user education required, and, almost paradoxically, their necessity for the processing of large datasets. With regard to the first issue, significant interest was expressed in “the Melbourne model”, where larger computational resources were made available on a limited budget through the use of RDMA over Converged Ethernet for multinode jobs and the use of cloud resources for single-node jobs. With regard to the second issue, the University is promoting a staged approach where users can move gradually from the problems of an overwhelmed desktop system to using cloud resources, to testing HPC on the cloud, to smaller departmental HPC systems, and finally to the peak facilities.

Special thanks are given to the members of the various New Zealand facilities who took the time to accommodate my visit and provide tours of their facilities. This includes Dan Sun, Sung Bae, Daniel Lagrava, and Francois Bissey at the University of Canterbury.


Barcelona Centro Nacional de Supercomputacion Visit 2016

The Centro Nacional de Supercomputacion (BSC/CNS) is the peak national HPC facility in Spain and is home to MareNostrum (1.1 Pflops, 48,896 Intel Sandy Bridge cores in 3,056 nodes, including 84 Xeon Phi 5110P cards in 42 nodes, and 2 PB of storage; the 29th system in the Top 500 in June 2013). They also have MinoTauro, a heterogeneous GPGPU cluster. MareNostrum is not the most powerful system in the world, but it is the most beautiful: it is housed in the Torre Girona chapel, a 19th century (deconsecrated) church.

The BSC/CNS has extensive PhD and Masters programmes (with the Polytechnic University of Catalonia), internships, and a diverse training programme with PRACE, including programming, performance analysis, data analytics, and HPC systems administration. The Centre also has a very active outreach programme, encouraging regular visits to their data centre, as well as an extensive training and lecture series.
[Image: MareNostrum]
The visit to the Centre included lectures on Research Platforms and NeCTAR, and was advertised as part of the Severo Ochoa Research Seminar Lectures. After the lectures, we had an extensive discussion on the state and distribution of IaaS cloud deployments and the internship programme that the BSC offers with other similar institutions, followed by a tour of the Torre Girona data centre. We were also treated to a very memorable lunch at a local Catalan restaurant, Restaurante Pati Blau. It was a fine way to conclude the 2016 tour of European HPC facilities.

Special thanks are given to the members of the various European facilities who took the time to accommodate my visit and provide tours of their facilities. This includes (my deepest apologies for names I’ve overlooked!): Vassil Alexandrov, Maria-Ribera Sancho, Fabrizio Gagliardi, and Javier A. Espinosa Oviedo at the Barcelona Supercomputing Centre.


OpenStack Summit Barcelona 2016

Several members of the Research Platforms team, as well as members of the NeCTAR Research Cloud, attended the OpenStack Summit in Barcelona from Sunday October 23rd to Friday October 28th. OpenStack has been big enough to hold major conferences, “summits”, twice a year since 2010, each correlating with an OpenStack software release. The Barcelona Summit was held at the Centre de Convencions Internacional de Barcelona (CCIB) and consisted of over 5,000 attendees, almost 1,000 organisations and companies, and 500 sessions, spread out over three days, plus one day of “Upstream University” prior to the main schedule and one day after the main schedule for contributor working parties. It coincided with the release of “Newton”.

[Image: Keynote (from openstack.org)]

The previous release of OpenStack, “Mitaka”, concentrated on integration and management, including the development of a client with a common command line across all projects. “Newton” includes a very long list of incremental updates, as well as improvements in security, container support, and networking. The security improvements include encrypted credentials in Keystone. There was strong joint development between the Neutron network project and the Kuryr container networking project, so that every system that implements the Neutron API can now be used for container networking. There is now full support for Neutron networking within the OpenStack client, and integration between Ironic bare-metal deployment and Magnum container orchestration.
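As a small illustration of what a unified client-side interface to the Neutron API looks like in practice, the following sketch uses the openstacksdk Python library; the cloud entry, names, and addresses are illustrative assumptions only.

```python
# A minimal sketch of driving the Neutron API through the unified openstacksdk
# client library; the cloud entry, names, and addresses are assumptions.
import openstack

conn = openstack.connect(cloud="mycloud")   # assumed entry in clouds.yaml

# Create a network and an IPv4 subnet; the same calls work regardless of
# which backend implements the Neutron API.
network = conn.network.create_network(name="demo-net")
subnet = conn.network.create_subnet(
    network_id=network.id,
    name="demo-subnet",
    ip_version=4,
    cidr="192.168.10.0/24",
)
print(f"Created {network.name} with subnet {subnet.cidr}")
```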

For members of the Research Platforms team, there was special interest in the presentations in the HPC stream. The University had a presentation accepted on “Spartan, a HPC-Cloud Hybrid: Delivering Performance and Flexibility”, which attracted a great deal of attention and an enthusiastic reaction from audience members: “Why isn’t everyone doing this?” was the first question (answer: they will!). Representatives from Red Hat wanted to know when we were going to visit the U.S., and met with a Research Platforms member on Monday October 31st to discuss the application of the model to other very large installations – in other words, Spartan achieved its purpose of being a small research cluster-cloud hybrid which could be expanded.

The conference was also timed with the release of a new OpenStack publication edited by Stig Telfer, The Crossroads of Cloud and HPC: OpenStack for Scientific Research (OpenStack, 2016).


Centre Informatique National de l’Enseignement Supérieur (CINES) Visit 2016

A visit to the Centre Informatique National de l’Enseignement Supérieur (CINES) in Montpellier was undertaken on Friday the 21st of October.

The Centre Informatique National de l’Enseignement Supérieur (CINES) is a peak public research facility in France, based in Montpellier and employing approximately sixty people (engineers, technical, and administrative staff). It was founded in 1999 from the CNUSC (Centre National Universitaire Sud de Calcul), itself created in 1980, and is administered and funded by the Ministry of Higher Education and Research (MESR). CINES is part of the Grand Equipement National de Calcul Intensif (GENCI), which implements the national strategy for equipping the national Tier-1 data centres with HPC resources, i.e., CCRT (CEA), IDRIS (CNRS), and CINES (MESR).

[Image: The CINES facility]

The computer rooms of CINES take up some 1,400 m², with two power lines (ERDF) providing a maximum of 12.5 MW. This infrastructure is used by a world-class supercomputer with several petabytes of storage capacity and 2 x 10 Gbit national network attachments. The two main systems are Cristal, for pre/post processing (13.1 TFLOPS, 224 cores plus GPGPUs), and Occigen (2.1 Pflops, 50,554 Intel Haswell cores, Infiniband interconnect, Lustre for scratch and Panasas for home). Occigen is currently being expanded by another 1,260 nodes and 35,280 Intel Broadwell cores.

CINES uses Slurm for workload management. However, unlike many other facilities, users make only minimal resource specifications: they are required to submit time and either nb_nodes or nb_tasks. The usual submission information (e.g., partition) is not set by users but rather by a job_submit plugin. Emphasis is orientated towards large (2,400-16,800 core), short (0 to 24 hour) jobs. The system is highly available (98.7% in February 2016) and busy (91.6% utilisation).

The operations and development staff at CINES are currently working on “virtual quotas”, an in-house development for the automatic management of storage spaces, which checks the usage rate (number of files, volume, age, etc.) before making decisions on archiving. This will be integrated with Slurm; in case of overflow, running jobs can complete but new job submissions are rejected.
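As a rough illustration of the general concept only (this is not CINES’s implementation, and the thresholds are invented), a virtual quota check might look something like the following sketch.

```python
# A rough, hypothetical sketch of a "virtual quota" check: summarise a storage
# space's usage (file count, volume, age) and decide whether new job
# submissions should be rejected while running jobs are allowed to finish.
# The thresholds and decision policy are illustrative assumptions only.
import os
import time
from dataclasses import dataclass

@dataclass
class Usage:
    n_files: int
    total_bytes: int
    oldest_age_days: float

def scan_usage(path: str) -> Usage:
    """Walk a directory tree and summarise its usage."""
    n_files, total_bytes, oldest_mtime = 0, 0, time.time()
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.stat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            n_files += 1
            total_bytes += st.st_size
            oldest_mtime = min(oldest_mtime, st.st_mtime)
    return Usage(n_files, total_bytes, (time.time() - oldest_mtime) / 86400)

def reject_new_jobs(usage: Usage) -> bool:
    """Return True if the space has overflowed: new submissions are rejected,
    while already-running jobs are left to complete."""
    return (usage.n_files > 1_000_000
            or usage.total_bytes > 20 * 1024**4   # 20 TB, assumed limit
            or usage.oldest_age_days > 365)       # assumed archiving trigger
```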

Special thanks are given to the members of the various European facilities who took the time to accommodate my visit and provide tours of their facilities. This includes (my deepest apologies for names I’ve overlooked!): Gérard Gil, Jean-Christophe Penalva, Johanne Charpentier, and Nicole Audiffren at CINES.


European Organization for Nuclear Research (CERN) Visit 2016

A visit to the European Organization for Nuclear Research (CERN) in Geneva was undertaken on Thursday the 20th of October, an event that was announced on CERN’s website with a programme of presentations.

CERN has its own large data centre at Meyrin, with two 100 GbE connections to the Wigner Physics Research Centre in Budapest. Since the production release of CERN’s OpenStack deployment in July 2013, CERN has been making increasingly heavy use of cloud compute operations. Currently there are 190K cores in production under OpenStack, and more than 90% of compute resources are virtualised. An additional 100K+ cores are to be added in the next six months, with a view towards increasing federation with other cloud providers. Data recording is expected to reach 400 PB/year by 2023 with the High Luminosity LHC upgrade (Facebook currently holds about 180 PB, and Google’s search database about 100 PB).
[Image: The CERN data centre]
For compute services there is a universal resource provisioning layer for bare metal, containers, and VMs, with the open-source HTCondor as the single end-user interface, moving from the proprietary LSF. There are 113K CPU cores (and increasing) for the high-throughput batch service with fair-share policies. High Performance Computing is used for MPI applications (theory lattice QCD, accelerator physics, beam simulation, plasma simulations, CFD), currently running MPI on lxbatch resources (SLC6). A new Theory QCD Infiniband cluster is planned (72 Quanta E4 nodes, 16 cores / 64 GB each, Infiniband), along with dedicated (recent) clusters for TE plasma simulations (16-core Quanta) and a Windows Engineering HPC service (Ansys, Comsol, etc.; IT-CDA).

Scientific Linux 5 and 6 are still supported; SL 6.8 was released in May 2016, and SL5 will receive security updates until 2019. CERN CentOS 7 has been in production since April 2016, with some additional CERN CentOS software. There are about 800 licenses for RHEL. Staged updates and internal snapshots are used for CERN CentOS, with a workflow integrated with ticketing/gitlab and QA testing of repositories enabled before production.

Special thanks are given to the members of the various European facilities who took the time to accommodate my visit and provide tours of their facilities. This includes (my deepest apologies for names I’ve overlooked!): Edward Karavakis, Tim Bell, Gavin McCance, Caroline Lindqvist, Philippe Ganz, and Nils Høimyr at CERN.
