I have not experimented with audio steganography myself so far, but maybe these ideas are of some use to you.
Adding echoes to a signal is pretty easy, as the process just adds a delayed version of the signal back to itself. You probably want to use a gain factor $f_\text{gain}$ for the additional signal that is smaller than one, $f_\text{gain} < 1$, so that the echo is quieter than the original sound. In DSP terms this process corresponds to FIR filtering with an impulse response that consists of a delta impulse at time zero ($\delta[n]$) and a time-shifted, weighted delta impulse:
$$
\mathbf{b} = \delta[n] + f_\text{gain}\cdot \delta[n-L_\text{delay}],
$$
with $L_\text{delay}$ the delay time in samples
and
$$
\mathbf{a} = 1
$$
because it's an FIR filter. To get multiple echoes, simply use more than one delayed delta impulse, each with its own delay and gain.
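As a minimal sketch of the filter above (the function names `echo_impulse_response` and `add_echo` are made up for this illustration), the coefficient vector $\mathbf{b}$ can be built directly and applied by convolution. If SciPy is available, `scipy.signal.lfilter(b, [1.0], x)` should give the same result.

```python
import numpy as np

def echo_impulse_response(delays, gains):
    """Build b = delta[n] + sum_i gains[i] * delta[n - delays[i]]."""
    b = np.zeros(max(delays) + 1)
    b[0] = 1.0                      # the direct (undelayed) signal
    for d, g in zip(delays, gains):
        b[d] += g                   # one weighted, delayed impulse per echo
    return b

def add_echo(x, delays, gains):
    """FIR-filter x with the echo impulse response (a = 1)."""
    b = echo_impulse_response(delays, gains)
    # Truncate so the output has the same length as the input.
    return np.convolve(x, b)[:len(x)]

# Usage: a unit impulse makes the impulse response directly visible.
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y_single = add_echo(x, delays=[2], gains=[0.5])
y_double = add_echo(x, delays=[2, 4], gains=[0.5, 0.25])
```

With a unit impulse as input, `y_single` is the impulse response itself, so the echo shows up as a single value of `0.5` two samples after the direct sound.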
For detecting hidden echoes in the signal, it may be worth trying whether an adaptive filtering algorithm gives satisfactory performance. The (normalized) least mean-squares ([N]LMS) algorithm comes to mind, and you should be able to find implementations in a number of programming languages on the internet. Using the signal both as the input signal and as the desired signal of the adaptive filter would lead to a self-cancelling behaviour. By analyzing the filter coefficients you may then be able to tell whether there are echoes in the signal and, if so, what the delay times are.
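Here is a rough sketch of how that could look with NLMS. One assumption of mine: the input is the signal delayed by one sample (a linear predictor), because with an undelayed input the filter would simply converge to a single unit tap at lag zero. The function name `nlms_predictor`, the step size, and the white-noise test signal are all made up for illustration. Real speech or music is strongly correlated at short lags, so the short-lag taps will be large there and the echo tap may be harder to spot.

```python
import numpy as np

def nlms_predictor(y, num_taps, mu=0.1, eps=1e-8):
    """Predict y[n] from y[n-1], ..., y[n-num_taps] with NLMS.

    Returns the final coefficients; tap k belongs to lag k + 1.
    """
    w = np.zeros(num_taps)
    for n in range(num_taps, len(y)):
        u = y[n - num_taps:n][::-1]          # u[k] = y[n - 1 - k]
        e = y[n] - w @ u                     # prediction error
        w += mu * e * u / (u @ u + eps)      # normalized LMS update
    return w

# Illustrative test signal: white noise plus one echo (assumed values).
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
L_delay, f_gain = 25, 0.5
y = x.copy()
y[L_delay:] += f_gain * x[:-L_delay]

w = nlms_predictor(y, num_taps=64)
# The largest coefficient should sit at the echo delay.
estimated_delay = int(np.argmax(np.abs(w))) + 1
```

For this white-noise case the largest tap should land at the echo delay, with a smaller tap of opposite sign at twice the delay. The filter length has to exceed the longest delay you want to detect.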
If you experiment with the cepstrum method described on the web page that the link in your question points to, take care to make your signal blocks long enough to capture the echoes.
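To illustrate the block-length point, here is a small sketch using the real cepstrum (inverse FFT of the log magnitude spectrum). This is not necessarily the exact variant the linked page uses; the function name `real_cepstrum`, the test signal, and the `min_quefrency` cutoff are assumptions for the example.

```python
import numpy as np

def real_cepstrum(block):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    magnitude = np.abs(np.fft.fft(block))
    # The small constant guards against log(0).
    return np.real(np.fft.ifft(np.log(magnitude + 1e-12)))

# Illustrative test signal: white noise plus one echo (assumed values).
rng = np.random.default_rng(1)
x = rng.standard_normal(16384)
L_delay, f_gain = 300, 0.5
y = x.copy()
y[L_delay:] += f_gain * x[:-L_delay]

# Long block: the quefrency axis extends well past the echo delay.
c_long = real_cepstrum(y)
min_quefrency = 20                  # skip the low-quefrency region
search = c_long[min_quefrency:len(y) // 2]
peak_quefrency = min_quefrency + int(np.argmax(search))

# Short block: only 256 quefrency bins, so a delay of 300 samples
# cannot show up as a peak at all.
c_short = real_cepstrum(y[:256])
```

In the long block the peak should appear at a quefrency equal to the echo delay. The short block is shorter than the delay, so it cannot contain that peak. In practice you would want blocks several times longer than the longest expected delay.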