GridFtpVFD-HDF5: GridFtp Virtual File Driver for the HDF5 library

Cactus uses HDF5 (Hierarchical Data Format version 5) as its standard I/O library and binary file format for the output of multi-dimensional datasets. Output files generated by the corresponding Cactus HDF5 I/O methods can then be imported into any visualization program that supports the HDF5 file format.

For the OpenDX Visualization Toolkit we provide an OpenDXutils package with various import modules that read datasets from local HDF5 files into an OpenDX network. One of these modules, ImportHDF5, can also import datasets from HDF5 files located on remote FTP servers. It uses the GridFtp Virtual File Driver of the HDF5 library to access HDF5 files transparently over the network. This lets you visualize remote datasets directly, without first staging them to a local filesystem.

The following sections describe how to obtain the GridFtp driver, how to build an HDF5 installation with it, and how to use it for remote HDF5 file access.

Downloading and Installing

The GridFtp driver is contained in the GridFtpVFD-HDF5 package, which can be downloaded here.
Note that HDF5 itself is distributed under the NCSA license (see the file COPYING in a standard HDF5 distribution for details), whereas the GridFtpVFD-HDF5 package is licensed under the GNU General Public License (GPL) (see the file COPYING.GridFtpVFD in the package for details). Mainly because of these licensing issues, the GridFtp driver has not been made part of the official HDF5 distribution. Instead, its source code is distributed in the GridFtpVFD-HDF5 package as a patch to be applied to a standard HDF5 source distribution. The patch file in the current version of the GridFtpVFD-HDF5 package was created for HDF5 version 1.6.0.

To build an HDF5 installation with GridFtp Virtual File Driver support, follow the steps below. As a prerequisite, you also need Globus with the globus_ftp_client library installed on your machine (refer to the Globus Toolkit installation pages for details).
The environment variable GLOBUS_LOCATION must point to your Globus installation directory. The environment variable GLOBUS_FLAVOR may also be set to a valid Globus build flavor to be used for building the GridFtp driver; if no flavor is specified, the HDF5 configure script automatically picks an appropriate one.

  # get the standard HDF5 source distribution and the GridFtpVFD-HDF5 package
  # and unpack them in some scratch directory
  tar xzf hdf5-1.6.0.tar.gz
  tar xzf hdf5-1.6.0.GridFtpVFD-HDF5.tar.gz

  # apply the patch to the standard HDF5 source distribution
  cd hdf5-1.6.0
  patch -p1 < ../hdf5-1.6.0.GridFtpVFD.patch

  # build and install as usual
  # (add any other configure options as needed)
  ./configure --enable-gridftp-vfd
  make
  make install
This will create an HDF5 installation with the GridFtp driver built-in.

Other software can now link against the HDF5 library and access remote HDF5 files through the GridFtp driver API (see the GridFtp Driver API section below). In addition, the HDF5 tools library already supports the GridFtp driver, so the utility tools built with the distribution can also examine remote HDF5 files directly, e.g.:

  h5ls   ftp://origin.aei.mpg.de:20/tmp/phi.h5
  h5dump gsiftp://194.94.224.100/tmp/phi.h5
In general, remote HDF5 files are uniquely identified by a URL of the following form:
  <protocol>://<FTP server name or IP address>[:<port>]/<full pathname of HDF5 file on that server>
where protocol is either ftp (for standard FTP using a login name and password to connect to the FTP server) or gsiftp (for secure FTP using a Globus certificate proxy with optional subject name and user account information).
The remote FTP server can be specified either by its hostname or by its IP address, optionally followed by a port number to connect to (the default port is 21 for standard FTP and 2811 for GSIFTP).
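As an illustration of the URL form above, the following sketch splits a remote file URL into its protocol, host, optional port and pathname. It is not part of the GridFtp driver: remote_url_t and parse_remote_url() are hypothetical names, and the parser simply leaves the port at 0 when the URL omits it, meaning the protocol's default port applies.

```c
#include <stdlib.h>
#include <string.h>

/* Components of <protocol>://<host>[:<port>]/<full pathname>.
 * Hypothetical type, for illustration only. */
typedef struct
{
  char protocol[8];   /* "ftp" or "gsiftp"                        */
  char host[256];     /* server hostname or IP address            */
  int  port;          /* explicit port, or 0 if none was given    */
  char path[1024];    /* full pathname of the file on the server  */
} remote_url_t;

/* Hypothetical parser: returns 0 on success, -1 if the URL does not
 * match the form above. */
static int parse_remote_url (const char *url, remote_url_t *out)
{
  const char *sep = strstr (url, "://");
  if (sep == NULL)
    return -1;

  /* protocol: everything before "://", must be ftp or gsiftp */
  size_t plen = (size_t) (sep - url);
  if (plen >= sizeof (out->protocol))
    return -1;
  memcpy (out->protocol, url, plen);
  out->protocol[plen] = '\0';
  if (strcmp (out->protocol, "ftp") != 0 &&
      strcmp (out->protocol, "gsiftp") != 0)
    return -1;

  /* the first '/' after the host starts the full pathname */
  const char *hoststart = sep + 3;
  const char *slash = strchr (hoststart, '/');
  if (slash == NULL || slash == hoststart)
    return -1;

  /* an optional ":<port>" may sit between host and pathname */
  const char *colon = memchr (hoststart, ':', (size_t) (slash - hoststart));
  const char *hostend = colon ? colon : slash;
  size_t hlen = (size_t) (hostend - hoststart);
  if (hlen == 0 || hlen >= sizeof (out->host))
    return -1;
  memcpy (out->host, hoststart, hlen);
  out->host[hlen] = '\0';

  out->port = 0;
  if (colon != NULL)
  {
    char *end;
    long port = strtol (colon + 1, &end, 10);
    if (end != slash || port <= 0 || port > 65535)
      return -1;
    out->port = (int) port;
  }

  if (strlen (slash) >= sizeof (out->path))
    return -1;
  strcpy (out->path, slash);
  return 0;
}
```

For the first h5ls example above, this yields protocol "ftp", host "origin.aei.mpg.de", port 20 and path "/tmp/phi.h5"; for the h5dump example the port stays 0.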

Available GridFtp servers

In order to serve individual requests to datasets in remote HDF5 files, the FTP daemon running on the remote FTP server has to provide commands for partial remote file access.

One server that implements such extended FTP operations is the wuftpd server with GridFTP patches, as distributed with the Globus Toolkit. This server offers extended get/put commands which, besides the filename, take an offset and a length as parameters to access a contiguous sequence of bytes in the remote file. Higher-level operations such as reading an individual dataset in a remote HDF5 file are then executed as a sequence of such raw remote reads. The drawback of this simple data protocol is that hyperslab requests which access non-contiguous regions of the remote file (e.g. due to downsampling) may result in a large number of individual raw remote read transactions, increasing the overall latency of a single HDF5 read call.
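To see why strided requests are costly with offset/length reads, the sketch below enumerates the contiguous byte ranges that a 2D hyperslab maps to, merging adjacent elements into one range. It assumes a contiguous (unchunked, uncompressed) dataset stored in row-major order starting at file offset base; byte_run_t and hyperslab_runs() are hypothetical names, not code from the GridFtp driver, which may issue its requests differently.

```c
#include <stddef.h>

/* One raw remote read: a contiguous byte range in the remote file. */
typedef struct
{
  size_t offset;
  size_t length;
} byte_run_t;

/* Hypothetical helper: count (and optionally store) the contiguous byte
 * runs needed to read count_y x count_x elements starting at (y0, x0)
 * with strides (sy, sx) from a row-major dataset with nx columns.
 * At most max_runs runs are written to runs (which may be NULL);
 * the return value is the total number of runs. */
static size_t hyperslab_runs (size_t base, size_t nx, size_t elem_size,
                              size_t y0, size_t x0,
                              size_t count_y, size_t count_x,
                              size_t sy, size_t sx,
                              byte_run_t *runs, size_t max_runs)
{
  size_t n = 0;
  byte_run_t cur = {0, 0};

  for (size_t j = 0; j < count_y; j++)
    for (size_t i = 0; i < count_x; i++)
    {
      size_t off = base + ((y0 + j * sy) * nx + (x0 + i * sx)) * elem_size;

      if (cur.length > 0 && off == cur.offset + cur.length)
        cur.length += elem_size;          /* element extends current run */
      else
      {
        if (cur.length > 0)               /* flush the finished run */
        {
          if (runs != NULL && n < max_runs)
            runs[n] = cur;
          n++;
        }
        cur.offset = off;                 /* start a new run */
        cur.length = elem_size;
      }
    }

  if (cur.length > 0)                     /* flush the last run */
  {
    if (runs != NULL && n < max_runs)
      runs[n] = cur;
    n++;
  }
  return n;
}
```

For example, reading a whole 1000 x 1000 dataset of 8-byte doubles maps to a single range, whereas reading every second element in both directions maps to 250,000 separate 8-byte ranges, each of which becomes its own raw remote read under the offset/length protocol.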

SFTPD-HDF5 is another, more efficient implementation of an FTP server which can be used for remote HDF5 file access. It is based on the Globus Striped Ftp Server (SFTPD) and has been enhanced by a plugin which can execute HDF5 read requests directly (without translating them into lower-level transactions).
More information on the SFTPD-HDF5 server can be found here.

GridFtp Driver API

Users select the virtual file driver that performs I/O on a file through file access property lists.
To select the GridFtp driver for a given file access property list, call the routine H5Pset_fapl_gridftp(). This routine takes a pointer to a structure containing additional file access properties specific to the GridFtp driver (excerpt from the header file H5FDgridftp.h):
/* GridFtp driver-specific file access properties */
typedef struct H5FD_gridftp_fapl_t
{
  const char *user;                  /* user login name                      */
  const char *password;              /* user login password                  */
  const char *account;               /* user account information             */
  const char *subject;               /* subject information                  */
  globus_ftp_control_mode_t mode;    /* file transfer mode                   */
  unsigned int num_streams;          /* number of streams for extended block */
                                     /* file transfer mode                   */
  int debug;                         /* enable/disable debug plugin module   */
} H5FD_gridftp_fapl_t;


/* prototypes of GridFtp driver API functions */
H5_DLL herr_t H5Pset_fapl_gridftp (hid_t fapl_id,
                                   const H5FD_gridftp_fapl_t *fapl);
H5_DLL herr_t H5Pget_fapl_gridftp (hid_t fapl_id,
                                   H5FD_gridftp_fapl_t *fapl /*out*/ );
The user and password fields are used for FTP connections, whereas account and subject may be used for GSIFTP access. For FTP access, user and password default to "ftp" and "anonymous"; for GSIFTP access, account and subject both default to NULL, in which case the GridFtp library routines obtain this information automatically from the proxy.
The file transfer mode determines the control mode that the GridFtp library routines use to access the remote file. The default mode is GLOBUS_FTP_CONTROL_MODE_STREAM (defined in the Globus header file "globus_ftp_control.h").
In GLOBUS_FTP_CONTROL_MODE_EXTENDED_BLOCK mode, multiple stream sockets can be bundled into a single FTP connection; the num_streams parameter specifies how many such streams to use.
The debug flag toggles the debug plugin of the GridFtp client library. When this flag is enabled, the debug plugin logs all FTP traffic between the client and the FTP server to stdout. By default (debug = 0) the debug plugin is deactivated.

Support and Copyright

Thomas Radke is the author and maintainer of the GridFtpVFD-HDF5 package. The development work has been supported by the Deutsches Forschungsnetz Verein through the GriKSL project under contract TK 602 - AN 200.

Please report bugs and send any comments to the maintainer of the GridFtpVFD-HDF5 package.

The software in the GridFtpVFD-HDF5 package is available under the GNU General Public License. Please refer to the files README.GridFtpVFD and COPYING.GridFtpVFD in the GridFtpVFD-HDF5 package for the full copyright notice and distribution conditions.
In addition to the conditions in the GNU General Public License, the author strongly suggests using this software for non-military purposes only.

Last modified: $Header: /cactus/CactusWebSite/VizTools/GridFtpVFD-HDF5.html,v 1.3 2004/01/05 11:38:01 tradke Exp $