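Scanning the gzip data once produces the structure below, which stores the random access points. Each access point records in, the offset in the gzip file of the first full byte of compressed data at that point (the compressed offset); out, the corresponding offset in the uncompressed data; bits, which handles points that do not fall on a byte boundary (deflate blocks are bit-aligned, not byte-aligned); and window, the 32K of uncompressed data preceding the point, which serves as the dictionary for decompressing from there. To decompress a range of data, find the access point with the largest out that does not exceed the start of the requested uncompressed range, seek to its in offset in the compressed file, restore the leftover bits and the dictionary, and inflate from there, discarding output until the requested offset is reached.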
#include <sys/types.h>  /* off_t */

#define WINSIZE 32768U  /* deflate's maximum back-reference distance: 32K */

/* Access point entry. */
struct point {
    off_t out;          /* corresponding offset in uncompressed data */
    off_t in;           /* offset in input file of first full byte */
    int bits;           /* number of bits (1-7) from byte at in-1, or 0 */
    unsigned char window[WINSIZE];  /* preceding 32K of uncompressed data */
};
The z_stream structure (as declared in zlib.h):
typedef struct z_stream_s {
z_const Bytef *next_in; /* next input byte */
uInt avail_in; /* number of bytes available at next_in */
uLong total_in; /* total number of input bytes read so far */
Bytef *next_out; /* next output byte will go here */
uInt avail_out; /* remaining free space at next_out */
uLong total_out; /* total number of bytes output so far */
z_const char *msg; /* last error message, NULL if no error */
struct internal_state FAR *state; /* not visible by applications */
alloc_func zalloc; /* used to allocate the internal state */
free_func zfree; /* used to free the internal state */
voidpf opaque; /* private data object passed to zalloc and zfree */
int data_type; /* best guess about the data type: binary or text
for deflate, or the decoding state for inflate */
uLong adler; /* Adler-32 or CRC-32 value of the uncompressed data */
uLong reserved; /* reserved for future use */
} z_stream;
The Z_BLOCK option assists in appending to or combining deflate streams. To assist in this, on return inflate() always sets strm->data_type to the number of unused bits in the last byte taken from strm->next_in, plus 64 if inflate() is currently decoding the last block in the deflate stream, plus 128 if inflate() returned immediately after decoding an end-of-block code or decoding the complete header up to just before the first byte of the deflate stream. The end-of-block will not be indicated until all of the uncompressed data from that block has been written to strm->next_out. The number of unused bits may in general be greater than seven, except when bit 7 of data_type is set, in which case the number of unused bits will be less than eight. data_type is set as noted here every time inflate() returns for all flush options, and so can be used to determine the amount of currently consumed input in bits.
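After a call to inflate() with Z_BLOCK, the data_type field of z_stream encodes three things: the low bits give the number of unused bits in the last byte consumed from next_in (this count is guaranteed to be less than 8, and so to fit in the low three bits, only when bit 7 is set); 64 is added if inflate() is currently decoding the last block of the deflate stream; and 128 is added if inflate() returned right after an end-of-block code, or right after the complete header (here, the gzip header), just before the first byte of the deflate data. We add an access point only at such a block boundary: there the decoder state reduces to the bit position plus the preceding 32K of output, which is exactly what struct point stores, whereas mid-block the current Huffman tables would also be needed. The boundary reached after the final block (bits 6 and 7 both set) is skipped, because there is nothing left to decompress from it. Index-building code therefore typically looks like this: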
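/* Checked after every inflate(&strm, Z_BLOCK) call while scanning the file.
   Bit 7: stopped at a block boundary (or just after the header);
   bit 6: in the last block, so at a boundary the stream has ended. */
if ((strm.data_type & 128) && !(strm.data_type & 64) &&
    (totout == 0 || totout - last > span)) {  /* first point, then more than span bytes apart */
    /* data_type & 7 is the unused-bit count, valid because bit 7 is set */
    index = addpoint(index, strm.data_type & 7, totin,
                     totout, strm.avail_out, window);
    if (index == NULL) {
        ret = Z_MEM_ERROR;
        goto deflate_index_build_error;
    }
    last = totout;  /* output offset of the newest access point */
}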
The index-building process is as follows:
The decompression process: