Big Data File Transfer

MeerKAT serves as a precursor not only for the SKA telescope but also for the engineering tools needed by the global SKA at large. Transferring the very large data volumes produced by the telescope efficiently and reliably is one such engineering challenge.

The File Transfer Service (FTS) was originally developed at CERN, the international particle physics laboratory on the border between Switzerland and France. The service aims to provide a flexible solution for handling big data transfers. It offers several interfaces, including a graphical web interface, and can transfer data using a variety of data communication protocols.

Transferred data must arrive at its destination complete and uncorrupted, and integrity checks are implemented for this purpose. One way to increase data throughput is to transfer the data in parallel, that is, through several connections simultaneously.
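The two ideas above, integrity checking and parallel transfer, can be illustrated with a minimal Python sketch. It is not FTS code and does not use the FTS API: local file copies stand in for network transfers, Adler-32 is assumed as the checksum algorithm, and the function names (adler32_of, copy_and_verify, transfer_all) are hypothetical.

```python
import shutil
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def adler32_of(path, chunk_size=1 << 20):
    """Return the Adler-32 checksum of a file, reading it in chunks."""
    checksum = 1  # Adler-32 is defined to start at 1
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF


def copy_and_verify(source, destination):
    """Copy one file (a stand-in for a network transfer), then compare checksums."""
    shutil.copyfile(source, destination)
    return adler32_of(source) == adler32_of(destination)


def transfer_all(sources, target_dir, workers=4):
    """Copy several files concurrently and report, per file name, whether it verified."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Assumption: source file names are unique, so they can serve as dict keys.
        futures = {
            Path(src).name: pool.submit(copy_and_verify, src, target_dir / Path(src).name)
            for src in sources
        }
        return {name: future.result() for name, future in futures.items()}
```

Calling transfer_all(["a.dat", "b.dat"], "incoming") would return a dictionary such as {"a.dat": True, "b.dat": True} when every copy matches its source. A real transfer service would compare a checksum computed at the source site with one computed at the destination site rather than re-reading both files locally.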

FTS monitors the state of each transfer and handles any errors that occur. Its built-in optimiser automatically adjusts the number of connections on each path, aiming to maximise throughput without overloading the link.
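To give a feel for how such an optimiser might behave, here is a deliberately simple feedback rule in Python. It is an illustrative assumption, not the algorithm FTS actually uses: the function name, the thresholds, and the "add one, halve on errors" policy are all hypothetical.

```python
def next_connection_count(current, throughput, previous_throughput,
                          error_rate, minimum=1, maximum=64,
                          max_error_rate=0.05):
    """Suggest the number of connections for the next interval on one path.

    Hypothetical policy, not the FTS optimiser:
    - too many failed transfers: halve the connection count (back off fast)
    - throughput improved: add one connection (probe gently)
    - throughput dropped: remove one connection
    - otherwise: keep the current count
    """
    if error_rate > max_error_rate:
        return max(minimum, current // 2)
    if throughput > previous_throughput:
        return min(maximum, current + 1)
    if throughput < previous_throughput:
        return max(minimum, current - 1)
    return current
```

With eight active connections, a rise in measured throughput and no errors, the rule suggests nine connections for the next interval. If the error rate jumps above the threshold, it falls back to four. Backing off quickly and probing slowly is a common way to avoid overloading a shared link.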

In the context of MeerKAT, FTS was used to ship data from the MeerKAT archive, hosted in the Centre for High Performance Computing (CHPC) data centre in Cape Town, to IDIA, and from IDIA to ASTRON in the Netherlands.

The tests demonstrated that more than 90% of the 10G link capacity between the CHPC and IDIA could be utilised, but that long-distance transfers (i.e., connections with a high bandwidth-delay product) required a large number of parallel transfer connections to work optimally.
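The link between the bandwidth-delay product and the number of parallel connections can be made concrete with some rough arithmetic. A single TCP connection can have at most one window of data in flight per round trip, so filling a link requires the combined windows to cover the bandwidth-delay product. The sketch below assumes a 10 Gbit/s link; the round-trip times and the 4 MiB window size are illustrative assumptions, not measurements from the MeerKAT tests.

```python
import math


def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def connections_needed(bandwidth_bits_per_s, rtt_seconds, window_bytes):
    """Minimum number of connections whose combined windows cover the BDP."""
    bdp = bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_seconds)
    return max(1, math.ceil(bdp / window_bytes))


LINK = 10e9              # assumed link capacity: 10 Gbit/s
WINDOW = 4 * 1024 ** 2   # assumed per-connection TCP window: 4 MiB

# Hypothetical round-trip times: 1 ms for a short path, 200 ms for an intercontinental one
local = connections_needed(LINK, 0.001, WINDOW)
long_haul = connections_needed(LINK, 0.200, WINDOW)
```

Under these assumptions the short path has a bandwidth-delay product of about 1.25 MB, which a single connection can cover, while the 200 ms path has about 250 MB in flight and needs roughly 60 connections. The exact figures depend on the real round-trip time and window settings, but the scaling explains why long-distance transfers need far more parallelism.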

Over 90% bandwidth utilisation can be achieved, on average, when transferring datasets with medium (8 MB to 1 GB) and large (2 GB to 8 GB) file sizes. This demonstrates the feasibility of the IDIA implementation of an FTS system for the SKA regional science data centres.