This is relatively straightforward with librosa.

If your beep is a known signal, cross-correlating the recording with the beep and picking the peaks works reasonably well:
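```python
import librosa
import numpy as np
from scipy import signal

# `file` is the path to the recording; 'beep.wav' holds the reference beep
audio, fs = librosa.load(file, mono=True)
beep, _ = librosa.load('beep.wav', sr=fs)  # load the beep at the same sample rate

# Cross-correlate, then normalize so the strongest match is 1.0
correlation = signal.correlate(audio, beep, mode='valid', method='fft')
correlation /= np.max(np.abs(correlation))

# Keep local maxima that exceed the local mean by delta, at least `wait` samples apart
peaks = librosa.util.peak_pick(correlation,
                               pre_max=512,
                               post_max=512,
                               pre_avg=512,
                               post_avg=512,
                               delta=0.5,
                               wait=2048)

ticks = librosa.samples_to_time(peaks, sr=fs)
for t in ticks:
    print(f'beep {t:0.3f}')
```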
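
To sanity-check the correlation idea without any audio files, here is a minimal sketch on synthetic data using only NumPy. The helper `find_beeps`, the 8 kHz sample rate, the 1 kHz tone and the beep positions are all made up for illustration, and the simple threshold-and-skip peak picker is a stand-in for `librosa.util.peak_pick`, not a reimplementation of it. It assumes `wait` is longer than the beep and that beeps are further apart than `wait`.

```python
import numpy as np

def find_beeps(audio, beep, threshold=0.5, wait=2048):
    """Hypothetical helper: sample indices where `beep` starts in `audio`."""
    # 'valid' keeps only full overlaps, so index i in the result
    # corresponds to the template starting at sample i of the recording.
    correlation = np.correlate(audio, beep, mode='valid')
    correlation = correlation / np.max(np.abs(correlation))
    peaks = []
    i = 0
    while i < len(correlation):
        if correlation[i] >= threshold:
            # Report the strongest sample in the next `wait` samples,
            # then skip ahead so each beep is counted once.
            window = correlation[i:i + wait]
            peaks.append(i + int(np.argmax(window)))
            i += wait
        else:
            i += 1
    return peaks

fs = 8000                                   # assumed sample rate
n = np.arange(400)                          # 50 ms beep
beep = np.sin(2 * np.pi * 1000 * n / fs)    # 1 kHz tone burst

rng = np.random.default_rng(0)
audio = 0.01 * rng.standard_normal(3 * fs)  # 3 s of low-level noise
starts = [2000, 10000, 18000]               # known beep positions (samples)
for s in starts:
    audio[s:s + len(beep)] += beep

detected = find_beeps(audio, beep)
for s in detected:
    print(f'beep {s / fs:0.3f}')
```

Because the beep positions are known here, you can compare `detected` against `starts` directly, which is a handy way to tune `threshold` and `wait` before running on real recordings.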
If your beep isn't a known signal, librosa's note onset detection is an alternative. It is less accurate than the correlation approach but works reasonably well on a signal with clearly defined transitions.

If the beeps are at regular intervals, you can use beat tracking:
```python
import librosa
import numpy as np

audio, fs = librosa.load(file, mono=True)
onset_env = librosa.onset.onset_strength(y=audio, sr=fs, aggregate=np.median)
tempo, frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=fs)
beats = librosa.frames_to_time(frames, sr=fs)
for beat in beats:
    print(f'beat @ {beat:0.3f}')
```
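
If you aren't sure whether your beeps count as regular, one rough check is to look at the gaps between detected times from any of the methods here. The sketch below is only an illustration: `is_regular` and its 5% tolerance are hypothetical choices, not part of librosa.

```python
import numpy as np

def is_regular(times, tolerance=0.05):
    """Hypothetical helper: True if every gap between events is within
    `tolerance` (as a fraction) of the median gap."""
    gaps = np.diff(np.asarray(times, dtype=float))
    if len(gaps) < 2:
        return False  # too few events to judge regularity
    median_gap = np.median(gaps)
    return bool(np.max(np.abs(gaps - median_gap)) <= tolerance * median_gap)

regular = is_regular([0.5, 1.5, 2.5, 3.5])    # evenly spaced beeps
irregular = is_regular([0.5, 0.9, 2.5, 2.7])  # unevenly spaced beeps
print(regular, irregular)
```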
If the beeps aren't at regular intervals then use the note onsets:
```python
import librosa
import numpy as np

audio, fs = librosa.load(file, mono=True)
onset_env = librosa.onset.onset_strength(y=audio, sr=fs, aggregate=np.median)
frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=fs)
onsets = librosa.frames_to_time(frames, sr=fs)
for onset in onsets:
    print(f'note onset @ {onset:0.3f}')
```
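
One likely reason the onset and beat approaches are coarser than correlation is that they report frame indices rather than sample positions. The sketch below shows the conversion that `librosa.frames_to_time` performs, assuming librosa's defaults: a hop length of 512 samples and the 22050 Hz rate that `librosa.load` uses when `sr` isn't given. `frames_to_seconds` is a hypothetical stand-in written in plain Python.

```python
HOP_LENGTH = 512     # librosa's default hop between analysis frames
SAMPLE_RATE = 22050  # librosa.load's default rate when sr is not passed

def frames_to_seconds(frames, sr=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Hypothetical stand-in for librosa.frames_to_time with default n_fft."""
    return [frame * hop_length / sr for frame in frames]

# Each frame step is one hop, so timestamps land on a grid this wide.
resolution = HOP_LENGTH / SAMPLE_RATE
print(f'one frame = {resolution * 1000:.1f} ms')

times = frames_to_seconds([0, 43, 86])
for t in times:
    print(f'frame time {t:0.3f}')
```

With these defaults the grid is roughly 23 ms wide, whereas the correlation peaks are located to the nearest sample. Passing a smaller `hop_length` to `onset_strength`, `onset_detect` and `frames_to_time` should tighten the grid at the cost of more computation.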