GridFtpVFD-HDF5: GridFtp Virtual File Driver for the HDF5 library
Cactus uses HDF5 (Hierarchical
Data Format version 5) as its standard I/O library and binary file
format for the output of multi-dimensional datasets.
Cactus output files generated by the corresponding HDF5 I/O methods can
then be imported by any standard visualization program which supports the
HDF5 file format.
For the OpenDX Visualization Toolkit we
provide an OpenDXutils package with various import
modules to read datasets from local HDF5 files into an OpenDX network.
One of these modules, ImportHDF5,
can also import datasets from HDF5 data files located on
remote FTP servers. It uses the GridFtp Virtual File Driver of the HDF5
library to transparently access HDF5 files over the network.
This effectively enables you to visualize remote datasets
directly, without having to stage them to a local filesystem first.
The following sections describe how to obtain the GridFtp driver, how to build
an HDF5 installation with it, and how to use it for remote HDF5 file access.
Downloading and Installing
The GridFtp driver is contained in the GridFtpVFD-HDF5 package which can be downloaded here.
Note that HDF5 itself is distributed under the NCSA license (see the
file COPYING in a standard HDF5 distribution for details)
whereas the GridFtpVFD-HDF5 package is licensed under the GNU
General Public License (see the file COPYING.GridFtpVFD in the package
for details).
Mainly because of these software licensing issues, the GridFtp
driver has not been made part of the official HDF5 distribution.
Instead, its source code is contained in the GridFtpVFD-HDF5 package
as a patch to be applied to a standard HDF5 source code distribution.
The patch file in the current version of the GridFtpVFD-HDF5 package was
created for HDF5 version 1.6.0.
To build an HDF5 installation with GridFtp Virtual File Driver support,
follow the steps below. As a prerequisite, you also need
Globus with the globus_ftp_client library installed on your
machine (please refer to the
Globus Toolkit installation pages for details).
The environment variable GLOBUS_LOCATION must point to your
Globus installation directory. The environment variable GLOBUS_FLAVOR
may also be set to select the Globus build flavor used to
build the GridFtp driver; if no flavor is specified, the HDF5 configure script
will automatically pick an appropriate one.
# get the standard HDF5 source distribution and the GridFtpVFD-HDF5 package
# and unpack them in some scratch directory
tar xzf hdf5-1.6.0.tar.gz
tar xzf hdf5-1.6.0.GridFtpVFD-HDF5.tar.gz
# apply the patch to the standard HDF5 source distribution
cd hdf5-1.6.0
patch -p1 < ../hdf5-1.6.0.GridFtpVFD.patch
# build and install as usual
./configure --enable-gridftp-vfd [any other configure options as needed]
make
make install
This will create an HDF5 installation with the GridFtp driver built-in.
Other software can now link against the HDF5 library and access remote HDF5
files via the GridFtp Driver API (see section below).
In addition, the HDF5 tools library already uses the GridFtp driver,
so the utility tools built with the distribution can also
examine remote HDF5 files directly, e.g.:
h5ls ftp://origin.aei.mpg.de:20/tmp/phi.h5
h5dump gsiftp://194.94.224.100/tmp/phi.h5
In general, remote HDF5 files are uniquely identified by a URL of the following
form:
<protocol>://<FTP server name or IP address>[:<port>]/<full pathname of HDF5 file on that server>
where protocol is either ftp (for standard FTP using
a login name and password to connect to the FTP server) or gsiftp (for
secure FTP using a Globus certificate proxy with optional subject name and
user account information).
The remote FTP server can be specified either by its hostname or by its IP address,
optionally followed by a port number to connect to (the default port is 21 for
standard FTP and 2811 for GSIFTP).
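The URL form described above can be illustrated with a small parser. This is only a sketch: the type remote_url_t and the function parse_remote_url() are hypothetical names introduced here and are not part of the GridFtp driver, which does its own URL handling internally. A port of 0 means that the URL carries no port, so the protocol default would apply.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type for illustration; not part of the driver API. */
typedef struct {
    char protocol[8];   /* "ftp" or "gsiftp" */
    char host[256];     /* server name or IP address */
    int  port;          /* 0 if the URL does not specify a port */
    char path[1024];    /* full pathname on the server, with leading '/' */
} remote_url_t;

/* Split <protocol>://<host>[:<port>]/<path> into its parts.
   Returns 0 on success and -1 if the URL does not match that form. */
int parse_remote_url(const char *url, remote_url_t *out)
{
    const char *sep, *host, *slash, *colon;
    size_t len;

    sep = strstr(url, "://");
    if (sep == NULL) return -1;
    len = (size_t)(sep - url);
    if (len == 0 || len >= sizeof out->protocol) return -1;
    memcpy(out->protocol, url, len);
    out->protocol[len] = '\0';
    if (strcmp(out->protocol, "ftp") != 0 &&
        strcmp(out->protocol, "gsiftp") != 0) return -1;

    host = sep + 3;
    slash = strchr(host, '/');
    if (slash == NULL || slash == host) return -1;  /* need host and path */

    /* an optional ":<port>" may precede the first slash */
    colon = memchr(host, ':', (size_t)(slash - host));
    len = (size_t)((colon != NULL ? colon : slash) - host);
    if (len == 0 || len >= sizeof out->host) return -1;
    memcpy(out->host, host, len);
    out->host[len] = '\0';

    out->port = 0;
    if (colon != NULL) {
        char *end;
        long p = strtol(colon + 1, &end, 10);
        if (end != slash || p < 1 || p > 65535) return -1;
        out->port = (int)p;
    }

    if (strlen(slash) >= sizeof out->path) return -1;
    strcpy(out->path, slash);
    return 0;
}
```

Applied to the h5ls example URL, this yields protocol "ftp", host "origin.aei.mpg.de", port 20 and path "/tmp/phi.h5". For the h5dump example the port field stays 0.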
Available GridFtp servers
In order to serve individual requests to datasets in remote HDF5 files,
the FTP daemon running on the remote FTP server has to provide
commands for partial remote file access.
One server which implements such
extended FTP operations is the wuftpd server with GridFTP patches,
as distributed with the Globus toolkit. This server offers extended get/put
commands which, besides the filename, take an offset and a length as
parameters to access a contiguous sequence of bytes in a remote file space.
Higher-level operations such as reading an individual dataset in a remote HDF5
file are then executed as a sequence of such raw remote reads.
The drawback of this simple data protocol is that hyperslab requests which
access non-contiguous regions in the remote file space (e.g. due to
downsampling) may result in a large number of individual raw remote read
transactions, increasing the overall latency of a single HDF5 read call.
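To get a feeling for how quickly the number of raw reads can grow, the following sketch counts the contiguous byte ranges touched by a strided selection. It is a simplified model, not the driver's code: it assumes a two-dimensional dataset stored contiguously in row-major order, a selection starting at the first element, and one raw read per contiguous range. The function name count_raw_reads() is hypothetical, and the real number of transactions also depends on the storage layout and on how the HDF5 library groups its I/O requests.

```c
#include <stddef.h>

/* Count the contiguous byte ranges covered by a strided 2D selection.
   ncols is the number of columns of the full dataset, elem_size the size
   of one element in bytes. The selection takes nrows_sel x ncols_sel
   elements, stepping row_stride rows and col_stride columns at a time.
   The caller must make sure the selection fits inside the dataset. */
size_t count_raw_reads(size_t ncols, size_t elem_size,
                       size_t nrows_sel, size_t ncols_sel,
                       size_t row_stride, size_t col_stride)
{
    size_t i, j, runs = 0, prev_end = 0;
    int have_prev = 0;

    for (i = 0; i < nrows_sel; i++) {
        for (j = 0; j < ncols_sel; j++) {
            /* byte offset of the selected element in the file space */
            size_t off = (i * row_stride * ncols + j * col_stride) * elem_size;

            /* a new range starts unless it directly follows the previous one */
            if (!have_prev || off != prev_end) runs++;
            prev_end = off + elem_size;
            have_prev = 1;
        }
    }
    return runs;
}
```

In this model, reading a full 100 x 100 dataset of 8-byte values is a single range, count_raw_reads(100, 8, 100, 100, 1, 1). Taking every second row gives 50 ranges, and taking every second row and column gives 2500.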
SFTPD-HDF5 is another, more efficient
implementation of an FTP server which can
be used for remote HDF5 file access. It is based on the Globus Striped Ftp
Server (SFTPD) and has been enhanced by a plugin which can execute HDF5 read
requests directly (without translating them into lower-level transactions).
More information on the SFTPD-HDF5 server can be found
here.
GridFtp Driver API
A user chooses the virtual file driver to be used to perform I/O on files
via file access property lists.
To select the GridFtp driver for a given file
access property list, the routine H5Pset_fapl_gridftp() must be called.
This routine takes a pointer to a structure which contains additional file access
properties specific to the GridFtp driver (excerpt from the header file
H5FDgridftp.h):
/* GridFtp driver-specific file access properties */
typedef struct H5FD_gridftp_fapl_t
{
  const char *user;                /* user login name */
  const char *password;            /* user login password */
  const char *account;             /* user account information */
  const char *subject;             /* subject information */
  globus_ftp_control_mode_t mode;  /* file transfer mode */
  unsigned int num_streams;        /* number of streams for extended block */
                                   /* file transfer mode */
  int debug;                       /* enable/disable debug plugin module */
} H5FD_gridftp_fapl_t;

/* prototypes of GridFtp driver API functions */
H5_DLL herr_t H5Pset_fapl_gridftp (hid_t fapl_id,
                                   const H5FD_gridftp_fapl_t *fapl);
H5_DLL herr_t H5Pget_fapl_gridftp (hid_t fapl_id,
                                   H5FD_gridftp_fapl_t *fapl /*out*/ );
The user and password fields are used for connections over the FTP
protocol, whereas account and subject may be used
for GSIFTP access. The defaults are "ftp" and "anonymous" for FTP
access, and NULL for both the user account and the subject
information for GSIFTP (the GridFtp library routines will then obtain this
information automatically from the proxy).
The file transfer mode determines the control mode to be used by the GridFtp
library routines to access the remote file. The default mode is set to
GLOBUS_FTP_CONTROL_MODE_STREAM (defined in the Globus header
file "globus_ftp_control.h").
In the GLOBUS_FTP_CONTROL_MODE_EXTENDED_BLOCK mode it is
possible to bundle multiple stream sockets into a single FTP connection. The
num_streams parameter specifies the number of such streams to be used.
The debug flag toggles the debug plugin of the
GridFtp client library. When this flag is enabled, the debug plugin logs
all FTP traffic between the client and the FTP server to stdout.
By default (debug = 0) the debug plugin is deactivated.
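A minimal sketch of how an application might select the driver and open a remote file is given below. Several things here are assumptions rather than documented behaviour: that H5FDgridftp.h is installed next to the other public HDF5 headers, that the URL is passed to H5Fopen() as the file name (as the h5ls and h5dump examples suggest), that num_streams = 1 is acceptable in stream mode, and that the driver performs any required Globus initialisation itself. The function name open_remote_hdf5() is hypothetical. The sketch compiles to a stub when the patched headers are not found.

```c
#include <stdio.h>

/* Use the driver only if the patched HDF5 headers are available. */
#if defined(__has_include)
#  if __has_include(<H5FDgridftp.h>)
#    define HAVE_GRIDFTP_VFD 1
#  endif
#endif

#ifdef HAVE_GRIDFTP_VFD
#include <hdf5.h>
#include <H5FDgridftp.h>
#include <globus_ftp_control.h>

/* Open a remote HDF5 file read-only through the GridFtp driver.
   Returns 0 on success and -1 on failure. */
int open_remote_hdf5(const char *url)
{
    H5FD_gridftp_fapl_t props;
    hid_t fapl, file;

    /* fill in every field explicitly, using the documented defaults */
    props.user        = "ftp";
    props.password    = "anonymous";
    props.account     = NULL;
    props.subject     = NULL;
    props.mode        = GLOBUS_FTP_CONTROL_MODE_STREAM;
    props.num_streams = 1;   /* assumption: unused in stream mode */
    props.debug       = 0;   /* set to 1 to log the FTP traffic */

    fapl = H5Pcreate(H5P_FILE_ACCESS);
    if (fapl < 0) return -1;
    if (H5Pset_fapl_gridftp(fapl, &props) < 0) {
        H5Pclose(fapl);
        return -1;
    }

    file = H5Fopen(url, H5F_ACC_RDONLY, fapl);
    H5Pclose(fapl);
    if (file < 0) return -1;

    /* datasets can now be read with the usual HDF5 calls */
    H5Fclose(file);
    return 0;
}
#else
/* Stub used when the patched HDF5 headers are not installed. */
int open_remote_hdf5(const char *url)
{
    fprintf(stderr, "GridFtp driver not available, cannot open %s\n", url);
    return -1;
}
#endif
```

A call such as open_remote_hdf5("gsiftp://194.94.224.100/tmp/phi.h5") would then go through the same driver as the h5dump example, with account and subject taken from the proxy.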
Support and Copyright
Thomas Radke is the author and
maintainer of the GridFtpVFD-HDF5 package. The development work has been supported
by the Deutsches Forschungsnetz Verein through
the GriKSL project under contract TK 602 -
AN 200.
Please report bugs and send any comments to the
maintainer of the GridFtpVFD-HDF5 package.
The software in the GridFtpVFD-HDF5 package is available under the GNU General
Public License. Please refer to the files README.GridFtpVFD and
COPYING.GridFtpVFD in the GridFtpVFD-HDF5 package for the full
copyright notice and distribution conditions.
In addition to the conditions in the GNU General Public License,
the author strongly suggests using this software for non-military purposes
only.
Last modified: $Header: /cactus/CactusWebSite/VizTools/GridFtpVFD-HDF5.html,v 1.3 2004/01/05 11:38:01 tradke Exp $