
I am having trouble understanding how this LZSS compression assigns dictionary match references. This particular LZSS algorithm has a 4096-byte dictionary that is initialized to all zeros. The first write index is 4078, and match lengths range from 3 to 18. Each data chunk is a flag byte followed by 8 objects, where an object is either a 1-byte literal or a 2-byte reference/length pair.

The flag byte determines, bit by bit, whether each of the following objects is a literal or a reference/length pair. For example, 0xEF (0b11101111) means the fifth object will be a dictionary reference and all the other objects will be literals.

For the reference/length pair, the bytes 0xEC 0xF0 mean the dictionary reference is 0xFEC and the length is 3 (0 + 3).
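
To be explicit about how I am reading these objects, here is a small Go sketch of the unpacking I am assuming (first byte of a pair = low 8 bits of the offset, high nibble of the second byte = top 4 bits of the offset, low nibble = length minus 3, flag bits consumed least-significant-bit first, which matches the 0xEF example). The function name unpackPair is just something I made up for illustration:

package main

import "fmt"

// unpackPair splits a two-byte reference/length object, assuming the usual
// LZSS layout: b1 holds the low 8 bits of the offset, the high nibble of b2
// holds the top 4 bits, and the low nibble of b2 holds (length - 3).
func unpackPair(b1, b2 byte) (offset, length int) {
	offset = int(b1) | int(b2&0xF0)<<4
	length = int(b2&0x0F) + 3
	return
}

func main() {
	// the flag byte from the example, read least-significant-bit first
	flags := byte(0xEF)
	for bit := 0; bit < 8; bit++ {
		kind := "literal"
		if flags&(1<<bit) == 0 {
			kind = "reference/length pair"
		}
		fmt.Printf("object %d: %s\n", bit+1, kind)
	}

	// the pair from the example: 0xEC 0xF0
	offset, length := unpackPair(0xEC, 0xF0)
	fmt.Printf("offset 0x%03X, length %d\n", offset, length) // offset 0xFEC, length 3
}

This prints offset 0xFEC with length 3 and marks the fifth object as a reference, which is how I arrive at the reading above.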

This set of data:

00 0A 7C 8D 00 00 00 03 00 5B

Compresses into

EF 00 0A 7C 8D EC F0 03 00 5B

This 100% makes sense to me: positions 0xFEC, 0xFED, and 0xFEE all contain zeros in the dictionary.
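
As a sanity check on where these bytes end up, here is a quick sketch (assuming writes start at 0xFEE and wrap modulo 4096) of the dictionary contents after the first chunk is decoded:

package main

import "fmt"

func main() {
	// the dictionary starts as 4096 zero bytes; decoded output is written
	// into it starting at index 4078 (0xFEE), wrapping modulo 4096
	dict := make([]byte, 4096)
	decoded := []byte{0x00, 0x0A, 0x7C, 0x8D, 0x00, 0x00, 0x00, 0x03, 0x00, 0x5B}

	r := 0xFEE
	for _, c := range decoded {
		dict[r] = c
		r = (r + 1) % 4096
	}

	// 0xFEC and 0xFED are still their initial zeros, and 0xFEE holds the
	// first literal (0x00), so the EC F0 reference reads three zeros
	fmt.Printf("dict[0xFEC:0xFF8] = % x\n", dict[0xFEC:0xFF8])
}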

The next set of bytes to be encoded is

00 5C 00 5D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 E0

Which it compresses into

CF 00 5C 00 5D DC FF E5 F7 01 E0

Why is it DC FF E5 F7 (reference 0xFDC length 18, reference 0xFE5 length 10) and not DD FF 0D 07? I thought LZSS was supposed to work backwards. The encoder is ignoring the zero stored at reference 0xFEE for the first 18 zeros. I have seen other cases where runs of 18 zeros do reference previously written positions. I have even run someone else's code and gotten the same result I did.
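
For reference, here is a minimal sketch of what I believe the decode side is doing (Okumura-style: 4096-byte zero-filled ring buffer, write pointer starting at 4078, LSB-first flag bits, pairs packed as described above; the decode function and constant names are mine). Decoding the second chunk with either pair sequence, the one my encoder produced or the one I expected, gives the same 28 zeros, which is part of why the encoder's choice confuses me:

package main

import (
	"bytes"
	"fmt"
)

const (
	n         = 4096 // dictionary size
	threshold = 2    // stored lengths are (real length - 3)
	start     = 4078 // 0xFEE, first write position
)

// decode expands a sequence of LZSS chunks against a shared zero-initialized
// dictionary: flag bits are read LSB-first, a set bit means literal, a clear
// bit means a two-byte reference/length pair.
func decode(chunks []byte) []byte {
	dict := make([]byte, n)
	r := start
	var out bytes.Buffer

	for p := 0; p < len(chunks); {
		flags := chunks[p]
		p++
		for bit := 0; bit < 8 && p < len(chunks); bit++ {
			if flags&(1<<bit) != 0 {
				// literal: copy the byte out and into the dictionary
				c := chunks[p]
				p++
				out.WriteByte(c)
				dict[r] = c
				r = (r + 1) & (n - 1)
			} else {
				// reference/length pair
				offset := int(chunks[p]) | int(chunks[p+1]&0xF0)<<4
				length := int(chunks[p+1]&0x0F) + threshold + 1
				p += 2
				for k := 0; k < length; k++ {
					c := dict[(offset+k)&(n-1)]
					out.WriteByte(c)
					dict[r] = c
					r = (r + 1) & (n - 1)
				}
			}
		}
	}
	return out.Bytes()
}

func main() {
	chunk1 := []byte{0xEF, 0x00, 0x0A, 0x7C, 0x8D, 0xEC, 0xF0, 0x03, 0x00, 0x5B}

	// second chunk as my encoder produced it, and as I expected it
	actual := []byte{0xCF, 0x00, 0x5C, 0x00, 0x5D, 0xDC, 0xFF, 0xE5, 0xF7, 0x01, 0xE0}
	expected := []byte{0xCF, 0x00, 0x5C, 0x00, 0x5D, 0xDD, 0xFF, 0x0D, 0x07, 0x01, 0xE0}

	fmt.Printf("actual:   % x\n", decode(append(append([]byte{}, chunk1...), actual...)))
	fmt.Printf("expected: % x\n", decode(append(append([]byte{}, chunk1...), expected...)))
}

Both lines print the same decoded bytes, so either encoding appears to be valid; my question is why the encoder picks one set of references over the other.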

UPDATE

This algorithm is completely baffling. The bytes to be encoded at 0xB36 are a run of 18 zeros, and the encoder pulls them from 0x1DF to 0x1F0. The very next bytes to be encoded (at 0xB48) are 5 zeros, and it goes all the way back to 0xFE9 through 0xFEE to pull them. Where the dictionary references are pulled from seems completely arbitrary.

1 Answer


I reproduced your case in Golang:

package main

import (
	"bytes"
	"fmt"

	"github.com/bovarysme/lzss"
)

func compressAndDisplay(name string, input []byte) {
	buffer := new(bytes.Buffer)
	writer := lzss.NewWriter(buffer)

	fmt.Println(fmt.Sprintf("%s:", name))
	fmt.Println(fmt.Sprintf("%x", input))

	writer.Write(input)
	writer.Close()

	// get rid of the null-termination byte
	hex := buffer.Bytes()
	hex = hex[0 : len(hex)-1]

	fmt.Println(fmt.Sprintf("%s compressed:", name))
	fmt.Println(fmt.Sprintf("%x", hex))
}

func main() {
	input := []byte{0x00, 0x0A, 0x7C, 0x8D, 0x00, 0x00, 0x00, 0x03, 0x00, 0x5B}
	compressAndDisplay("1st part", input)

	input = []byte{
		0x00, 0x5C, 0x00, 0x5D,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x01, 0xE0,
	}
	compressAndDisplay("2nd part", input)
}

It produces the same output for the first buffer:

EF 00 0A 7C 8D EC F0 03 00 5B

And, funnily enough, yet another sequence for the second buffer:

CF 00 5C 00 5D DD FF 03 07 01 E0

I'm no expert in compression algorithms, but this seems like an implementation-dependent behaviour (although the result only differs on a single byte from what you expected).

mimak