ImportHDF5

Category

Import and Export

Function

Imports datasets from an external HDF5 data file.

Syntax

result, max_index = ImportHDF5 (filename, origin, thickness, stride, index,
                                reopen, single_precision, user, password,
                                subject, num_streams, vectorimport);

Inputs
Name              Type                    Default  Description
filename          string                  (none)   filename or URL of the HDF5 file to import datasets from
origin            integer list or vector  NULL     lower-left corner grid point of the slab to read
thickness         integer list or vector  NULL     thickness in grid points of the slab to read
stride            integer list or vector  NULL     include every <stride>-th grid point in the slab to read
index             integer or string       0        index or name of the dataset to import
reopen            flag                    0        reopen file on each execution
single_precision  flag                    1        import double-precision floating-point datasets as single precision
user              string                  (none)   user name for standard Ftp authentication during remote file access
password          string                  (none)   password for standard Ftp authentication during remote file access
subject           string                  (none)   subject name for GSI authentication during remote file access
num_streams       integer                 1        number of parallel streams to use during remote file access
vectorimport      flag                    0        import the dataset as a vector array

Outputs
Name       Type     Description
result     field    the dataset imported as a field with regular positions and connections
max_index  integer  largest possible dataset index

Functional Details

The module imports data from a single dataset of an HDF5 datafile and provides it as a field with a position-dependent data component described on a regular grid.

The imported data can comprise the full N-dimensional dataset in the file or just a slab of it. A slab is an orthogonal subregion within the full dataset, potentially with fewer than N dimensions, defined by its origin, thickness, and stride parameters. If the vectorimport flag is set, the last dimension of the dataset is treated as a vector index, and the dataset is imported as an (N-1)-dimensional array (or one with fewer dimensions, if a slab is selected), with each element being a vector.

An HDF5 datafile may contain several datasets, each identified by a unique index starting from 0. The ImportHDF5 module imports data from one dataset at a time, selected by its index or name.

The datafile can be either an HDF5 file on a local filesystem or a remote HDF5 file streamed via a live socket connection or provided by a remote GridFtp server. Each HDF5 datafile must satisfy the following conditions:

  1. there must exist at least one dataset in the file

  2. datasets must contain either integer or floating-point data (i.e. the data class of the datasets' data type must be H5T_INTEGER, H5T_FLOAT, or H5T_COMPOUND with two H5T_FLOAT member types to represent complex data)

    Since OpenDX fields require their data component to be either TYPE_INT, TYPE_FLOAT, or TYPE_DOUBLE, the ImportHDF5 module silently ignores any other types of datasets found within the HDF5 datafile.

    Depending on a dataset's floating-point precision the data component in the created OpenDX field will have a data type of either float (for single-precision data) or double (for double-precision data). Double-precision data can also be converted on-the-fly into single precision data during the import.

  3. a dataset should have attached to it two attributes named origin and delta

    The origin and delta attributes must be vectors of N floating-point numbers where N denotes the number of dimensions of the dataset. Their values should describe the dataset's underlying sample space as a regular grid.

    The ImportHDF5 module uses the attributes' values as N-dimensional regular arrays in order to construct the positions and bounding box components for the imported field. If a dataset has no origin or delta attribute, the module will use a default vector of N zeros (for origin) or N ones (for delta) respectively.

    Note: The connections component for the imported field will be constructed from the actual dimensions of the selected slab to read (not from the dataset's dimensions). If a slab is selected with a thickness of 1 in a given dimension, the connections between points in that dimension are eliminated (thus effectively decreasing the connections component's dimensions and changing the imported field's connections type accordingly).
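The role of the origin and delta attributes, and the effect of single-point dimensions on the connections, can be illustrated with a minimal Python sketch. The function names regular_positions and connection_rank are hypothetical and are not part of the module; the sketch only mirrors the defaults described above (N zeros for origin, N ones for delta) and counts the dimensions that still carry connections.

```python
from itertools import product


def regular_positions(dims, origin=None, delta=None):
    """Return the coordinates of all grid points of a regular grid.

    dims   -- number of grid points in each dimension
    origin -- coordinates of the first grid point (default: N zeros)
    delta  -- grid spacing in each dimension (default: N ones)
    """
    n = len(dims)
    origin = [0.0] * n if origin is None else list(origin)
    delta = [1.0] * n if delta is None else list(delta)
    if len(origin) != n or len(delta) != n:
        raise ValueError("origin and delta need one entry per dimension")
    # Position of grid point idx is origin + idx * delta, per dimension.
    return [
        tuple(origin[d] + idx[d] * delta[d] for d in range(n))
        for idx in product(*(range(size) for size in dims))
    ]


def connection_rank(slab_dims):
    """Count the dimensions that still have connections between points.

    A dimension holding a single grid point contributes no connections.
    """
    return sum(1 for size in slab_dims if size > 1)
```

For example, a slab of 10 x 1 x 5 grid points taken from a 3D dataset keeps connections in two dimensions only, so connection_rank([10, 1, 5]) yields 2.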

For efficiency reasons, the ImportHDF5 module keeps the HDF5 datafile open until a new filename is given or the reopen flag is set to reopen the current file on the next execution.

When an HDF5 datafile is opened, the module browses through its contents in order to build a list of all floating-point datasets available in the file. Individual datasets can then be addressed by their index into that list. Additionally, if datasets have a time attribute attached to them (as is usually the case for a time series HDF5 datafile), the list will be sorted by the values of that attribute. The value of the largest possible dataset index is available on the max_index output tab of the module.
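The list-building step can be sketched in Python as follows. This is an illustration only: the function build_dataset_list and its input format (pairs of dataset name and attribute dictionary) are hypothetical, and the sketch assumes that sorting happens only when every dataset carries a time attribute, which the text above does not state explicitly.

```python
def build_dataset_list(datasets):
    """Return (names, max_index) for a list of (name, attributes) pairs.

    datasets -- datasets in the order they were found in the file;
                attributes is a dict mapping attribute names to values
    """
    # Assumption: sort by time only if all datasets have a time attribute.
    if datasets and all("time" in attrs for _, attrs in datasets):
        datasets = sorted(datasets, key=lambda item: item[1]["time"])
    names = [name for name, _ in datasets]
    # The largest valid index is the number of datasets minus 1.
    max_index = len(names) - 1
    return names, max_index
```

With three datasets whose time attributes are 2.0, 1.0, and 3.0, the dataset with time 1.0 ends up at index 0 and max_index is 2.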

On each execution of the module, the data from a single dataset, as selected either by its index or name, is read from the HDF5 datafile, either completely as a full dataset or as a slab according to the origin, thickness, and stride slab parameters as explained below.

In addition to the four data, positions, connections, and bounding box components, the imported field will also get a copy of all the attributes attached to the selected dataset, with the same values, data types, and dimensions as the originals. Additionally, if no name attribute exists in the selected dataset, the ImportHDF5 module will create a string attribute with the name of the dataset as found in the HDF5 datafile and also add it to the imported field.

filename

A required string parameter to specify the name of the HDF5 file to import datasets from.

The HDF5 file can be

  1. a file on a local file system

    A local file is identified by its filename, optionally prefixed by a pathname. If the name contains an absolute path, the file will be loaded from there, otherwise the module will start searching for the file in the current working directory.

    Note: At the moment the DXDATA environment variable is not used, and the directories listed there are not searched for the given file.

  2. a remote HDF5 file streamed via a socket connection

    Streamed HDF5 files are identified by "<host>:<port>" where host is the hostname or IP address of the machine to receive the streamed HDF5 file from, and port is the TCP port to connect to on that host.

    Note: In order to support data import from streamed HDF5 files, the ImportHDF5 module must be compiled and linked against an HDF5 installation which has the Stream Virtual File Driver built in.

  3. a remote HDF5 file located on a GridFtp server

    Remote HDF5 files on a GridFtp server are identified by a URL "{ftp|gsiftp}://<host><pathname>" where host is the hostname or IP address of the GridFtp server, and pathname is the absolute pathname of the HDF5 file on that server. User authentication on the GridFtp server is done according to the prefix used in the URL, following either the standard Ftp protocol (using a user name and clear-text password) or the GSI protocol (using a GSI proxy and subject name) respectively.

    Note: In order to support HDF5 data import from remote GridFtp servers, the ImportHDF5 module must be compiled and linked against an HDF5 installation which has the GridFtp Virtual File Driver built in.
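The three forms of the filename parameter can be told apart by their syntax alone. The following Python sketch shows one way to do so; the function classify_filename and its regular expressions are assumptions made for illustration and do not reflect how the module itself parses the string.

```python
import re


def classify_filename(filename):
    """Return 'gridftp', 'stream', or 'local' for a filename string."""
    # "{ftp|gsiftp}://<host><pathname>" with an absolute pathname
    if re.match(r"^(ftp|gsiftp)://[^/]+/.+", filename):
        return "gridftp"
    # "<host>:<port>" with a numeric TCP port and no path separators
    if re.match(r"^[^/:\s]+:\d+$", filename):
        return "stream"
    # Anything else is treated as a (relative or absolute) local path.
    return "local"
```

Note that under these assumed patterns a local file literally named like "name:1234" would be classified as a stream; the sketch makes no attempt to resolve such ambiguities.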

origin

An optional parameter to specify the origin of a slab to read from a requested dataset. This parameter must be a list of integers or a vector of integer elements, giving the coordinates in grid points of the slab's start positions within the dataset. The positions are counted starting at 0 and must be in the range [0, dims_i - 1] where dims_i is the size of the dataset in dimension i.

For each dataset dimension, a separate position coordinate can be specified; 0 is taken as the default coordinate for unspecified dimensions. If origin has more elements than the number of dataset dimensions, the excess elements are ignored and a warning message is printed. If origin is given as NULL (which is the default value for this parameter) then the slab's origin is taken as all zeros (i.e. the slab starts at the dataset's origin).

thickness

An optional parameter to specify the thickness of a slab to read from a requested dataset. This parameter must be a list of integers or a vector of integer elements, giving the slab's size in grid points. The values for thickness must be in the range [0, dims_i - origin_i] where dims_i is the size of the dataset in dimension i, and origin_i is the slab's origin in that dimension.

For each dataset dimension, a separate thickness can be specified. If thickness has more elements than the number of dataset dimensions, the excess elements are ignored and a warning message is printed. For unspecified dimensions, for dimensions where thickness_i is given as zero, or if thickness is given as NULL (which is the default value for this parameter), the slab's thickness defaults to dims_i - origin_i.

stride

An optional parameter to specify downsampling factors for a slab to read from a requested dataset. The parameter must be a list of integers or a vector of integer elements, giving the number of grid points to move in each dimension to get to the next grid point to be included in the slab. The values for stride must be >= 1 so that at least one grid point will be included.

For each dataset dimension, a separate stride can be specified; 1 is taken as the default stride for unspecified dimensions. If stride has more elements than the number of dataset dimensions, the excess elements are ignored and a warning message is printed. If stride is given as NULL (which is the default value for this parameter) then the slab's stride is taken as all ones (i.e. the slab includes thickness_i grid points).
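The defaulting and range rules for origin, thickness, and stride can be summarized in a short Python sketch. The function resolve_slab is hypothetical and not part of the module. In particular, the number of grid points actually read per dimension is assumed here to be thickness divided by stride, rounded up; the text above only fixes this for a stride of 1.

```python
def resolve_slab(dims, origin=None, thickness=None, stride=None):
    """Fill in slab defaults and return (origin, thickness, stride, counts).

    dims   -- size of the dataset in each dimension
    counts -- grid points read per dimension (assumed: ceil(thickness/stride))
    """
    n = len(dims)

    def pad(values, default):
        # None (NULL) means "all defaults". Excess elements are dropped;
        # the module itself additionally prints a warning in that case.
        values = [] if values is None else list(values)[:n]
        return values + [default] * (n - len(values))

    origin = pad(origin, 0)
    thickness = pad(thickness, 0)  # 0 selects the default thickness
    stride = pad(stride, 1)

    counts = []
    for i in range(n):
        if not 0 <= origin[i] <= dims[i] - 1:
            raise ValueError("origin out of range in dimension %d" % i)
        if thickness[i] == 0:
            thickness[i] = dims[i] - origin[i]
        if not 1 <= thickness[i] <= dims[i] - origin[i]:
            raise ValueError("thickness out of range in dimension %d" % i)
        if stride[i] < 1:
            raise ValueError("stride must be >= 1 in dimension %d" % i)
        # Integer ceiling division without importing math.
        counts.append(-(-thickness[i] // stride[i]))
    return origin, thickness, stride, counts
```

For a 100 x 50 x 20 dataset, resolve_slab([100, 50, 20], origin=[10], thickness=[0, 1], stride=[2]) resolves to origin [10, 0, 0], thickness [90, 1, 20], and stride [2, 1, 1], giving 45 x 1 x 20 grid points under the rounding assumption stated above.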

index

A required integer parameter to specify the index of the dataset to import, or a string parameter to specify the name of the dataset to import. If index is an integer, its value must be in the range [0, max_index] where max_index is the total number of datasets found in the HDF5 file, minus 1. This value is available on the max_index output.
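Selecting a dataset by integer index or by name can be sketched as below. The function select_dataset is hypothetical, and the error handling (raising Python exceptions) is an assumption made for illustration; the module's actual behaviour for invalid values is not described here.

```python
def select_dataset(names, index):
    """Return the list position of the dataset selected by index or name.

    names -- dataset names in list order (position 0 .. max_index)
    index -- an integer position or a dataset name
    """
    max_index = len(names) - 1
    if isinstance(index, str):
        if index not in names:
            raise KeyError("no dataset named %r" % index)
        return names.index(index)
    if not 0 <= index <= max_index:
        raise IndexError("index must be in the range [0, %d]" % max_index)
    return index
```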

reopen

An optional hidden flag parameter to specify whether the HDF5 file as given by filename should be reopened at each module execution (disabled by default).

The reopen flag should be set for HDF5 files whose contents might change over time (as is usually the case for streamed HDF5 files).

single_precision

An optional hidden flag parameter to specify whether double-precision floating-point datasets should be converted into single precision during the import (enabled by default).

The single_precision flag must be cleared (set to 0) in order to preserve the full floating-point precision when importing double-precision datasets.
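The effect of converting double-precision values to single precision can be demonstrated with Python's standard struct module. The helper to_single_precision is hypothetical and only illustrates the rounding involved; it says nothing about how the module performs the conversion internally.

```python
import struct


def to_single_precision(value):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    # Pack as a 4-byte float, then unpack back into a Python float.
    return struct.unpack("f", struct.pack("f", value))[0]
```

Values that are exactly representable in single precision, such as 0.5, survive unchanged, whereas a value such as 0.1 comes back slightly altered. This is the loss that clearing the single_precision flag avoids.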

user

An optional hidden string parameter to specify the user name for standard Ftp authentication during remote file access to a GridFTP server. If no user name is specified, ftp (for the anonymous user) will be used.

password

An optional hidden string parameter to specify the password for standard Ftp authentication during remote file access to a GridFTP server. The password must be given in clear text. If no password is specified, anonymous (for the anonymous user) will be used.

subject

An optional hidden string parameter to specify the subject name for GSI authentication during remote file access to a GridFTP server. If no subject name is specified, the subject name of the user's GSI proxy will be used.

num_streams

An optional hidden integer parameter to specify the number of parallel streams to use during remote file access. By default only a single stream will be used for each connection to a GridFtp server.

vectorimport

An optional hidden flag parameter to specify whether the dataset should be imported as a vector array (disabled by default).

The vectorimport flag must be set in order to preserve the vector dimension when importing vector datasets (e.g., position, velocity, etc.).
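How the vectorimport flag changes the interpretation of a dataset's dimensions can be sketched in Python. The function split_vector_shape is hypothetical; returning a vector length of 1 for scalar imports is a convention of this sketch, not something the module documents.

```python
def split_vector_shape(dims, vectorimport=False):
    """Return (grid_dims, vector_length) for a dataset of shape dims.

    With vectorimport set, the last dimension is treated as the vector
    index, leaving an (N-1)-dimensional grid of vectors.
    """
    dims = list(dims)
    if not dims:
        raise ValueError("dataset must have at least one dimension")
    if not vectorimport:
        # Scalar import: every dimension is a grid dimension.
        return dims, 1
    return dims[:-1], dims[-1]
```

A dataset of shape 64 x 64 x 3 is thus read either as a 3D grid of scalars (flag cleared) or as a 64 x 64 grid of 3-vectors (flag set).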

See Also

Import, Slab

Example Visual Programs

SlabViz.net

The example program is contained in the net/ subdirectory of the OpenDXutils package. Please make sure that you change into this directory before running the program because it uses an HDF5 sample data file located relative to that directory.

Further Documentation

The ImportHDF5 data import module is provided by the OpenDXutils package. More information about this package can be found on the Cactus Code visualization page for OpenDX.

Last modified: $Header: /cactus/CactusWebSite/VizTools/ImportHDF5.html,v 1.6 2006/09/12 08:52:32 tradke Exp $