This question may sound like "why do we use window functions?", but it's not. So here it is:
What we currently do: suppose we have a function $g(t)$ and its Fourier transform $G(f)$, and a window $W(f)$ with its time-domain version $w(t)$, where $t$ is time and $f$ is frequency. The convolution theorem says that convolving $g(t)$ with $w(t)$ in the time domain is equivalent to multiplying their spectra. So, to apply a low-pass filter, we would ideally multiply the spectrum by a square window at some cut-off frequency. But this doesn't work directly, because then $w(t)$ would have to be infinitely long: the inverse Fourier transform of the square window is an infinitely long sinc. So to deal with this sinc, we truncate it and apply a window function to minimize the side lobes.
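To make the standard approach concrete, here is a minimal sketch of a windowed-sinc low-pass filter (the sampling rate, cut-off, tap count, and test signal are just illustrative choices of mine, not anything canonical):

```python
import numpy as np

def windowed_sinc_lowpass(cutoff, num_taps):
    """Truncate the ideal (infinite) sinc impulse response to num_taps
    samples and apply a Hamming window to suppress the side lobes.
    cutoff is normalized to the sampling rate (0 < cutoff < 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)  # truncated ideal sinc
    h *= np.hamming(num_taps)                 # window to tame the side lobes
    return h / h.sum()                        # normalize for unity gain at DC

# Filter a two-tone test signal by convolving with the windowed sinc
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
g = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 300 * t)
h = windowed_sinc_lowpass(cutoff=100 / fs, num_taps=101)
filtered = np.convolve(g, h, mode="same")
```

With these numbers, the 50 Hz tone passes almost unchanged while the 300 Hz tone lands well inside the stopband; because the sinc is truncated and windowed, the attenuation is large but finite rather than the infinite rejection of the ideal square filter.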
This whole story is fine, but here's my problem: it's all based on the assumption that we're doing everything in the time domain, and hence we use convolution. But consider this (which I know is not an innovation, but I want to understand why it doesn't work): I take my function $g(t)$, compute its discrete Fourier transform $G(f)$, multiply it by my perfect square window to filter the spectrum (in other words, zero out the frequencies I don't want), and then take the inverse transform. What's the problem with that?