People use data compression to save storage space and to improve data transfer speed. We define Compressed I/O as the method of compressing a block of data and then writing the compressed image to storage; for reading, it reads in the compressed image and then decompresses it. We define Uncompressed I/O as the straightforward method of I/O without compression. We want to investigate the factors that determine when the throughput of Compressed I/O exceeds that of Uncompressed I/O.
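The two methods can be sketched as follows. This is a minimal illustration in Python, with zlib standing in for whichever compressor was actually used (the report does not name one); the function names are ours:

```python
import zlib


def compressed_write(path, data):
    """Compressed I/O write: compress the block, then write the
    compressed image to storage."""
    with open(path, "wb") as f:
        f.write(zlib.compress(data))


def compressed_read(path):
    """Compressed I/O read: read the compressed image back in,
    then decompress it to recover the original block."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read())


def uncompressed_write(path, data):
    """Uncompressed I/O: write the block as-is."""
    with open(path, "wb") as f:
        f.write(data)
```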
We used a machine with a 1.8 GHz processor and ran our tests first writing to a local disk and then to an NFS-mounted disk. To simulate different levels of compressibility, we filled the buffer with random data (from the /dev/urandom device), which is not compressible, and then overwrote a contiguous part of the buffer with null values.
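The buffer construction described above could look like the following sketch (the function name and the choice of zeroing the leading region are our assumptions; only the fraction of the buffer that is null matters):

```python
import os


def make_buffer(size, zero_fraction):
    """Build a test buffer with a controlled level of compressibility.

    Fill the buffer with incompressible random bytes (os.urandom reads
    from the same entropy source as /dev/urandom on Linux), then
    overwrite a contiguous region with null values. A larger
    zero_fraction yields a more compressible buffer.
    """
    buf = bytearray(os.urandom(size))
    n_zero = int(size * zero_fraction)
    buf[:n_zero] = b"\x00" * n_zero
    return bytes(buf)
```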
We wrote the buffer to disk without compressing it and recorded the time spent writing; we then compressed the buffer before each write and recorded the time that took. From these measurements we calculated the throughput of each method.
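The timing procedure can be sketched as below. This is our reconstruction, not the original benchmark code: it times compression (when enabled) plus the write, syncs so the data actually reaches storage, and reports throughput in terms of the uncompressed buffer size so the two methods are directly comparable:

```python
import os
import time
import zlib


def measure_throughput(path, buf, compress):
    """Write buf to path, optionally compressing it first, and return
    throughput in bytes of original data per second."""
    start = time.perf_counter()
    data = zlib.compress(buf) if compress else buf
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the write reaches the disk
    elapsed = time.perf_counter() - start
    # Throughput is measured against the uncompressed size, since that
    # is the amount of user data moved either way.
    return len(buf) / elapsed
```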
As the graphs show, when writing to a local disk there is little to be gained from compressing the data first unless it is highly compressible (compressing to 0-10% of the original size). The real benefit comes when writing to an NFS-mounted file: in that scenario we see better throughput at almost every level of compressibility.
We plan to implement the Compressed I/O method as a feature in HDF5. Applications may invoke it when accessing data files that reside on slow storage such as network disks or remote file servers.