as far as I can estimate that, several processing steps would be neccesary to achieve what you are looking for. In a first step, you should preprocess your land-only raster and throw away any channels and pixel values, to be able to convert it to a simple extents-shapefile without killing your PC in a second step:
gdal_calc.py --calc=1 --outfile=land-only-extents.tif -A land-only.tif
gdal_polygonize land-only-extents.tif land-only.shp
What you want, is the difference of land-only-extents and the extents of land-and-water.tif. The reason is, if you use gdal_warp with your land only extents like you proposed, you will get your land-only.tif – again. Therefore you also need the extents of your land-and-water.tif. If this is a rectangle, you can use gdaltindex as you already mentioned. If not, use the same process again. Having that, there exists the possibility to calculate the difference of both with ogr2ogr. This source indicates how to do that, and for your case it should look something like that (untested):
ogr2ogr difference.shp land-and-water.shp -dialect SQLite -sql "SELECT ST_Difference(land-and-water.geometry, land-only.geometry) AS geometry FROM land-and-water, 'land-only.shp'.land-only"
Some further information on ogr sql can be found here. This should give you a clipping layer that should work somehow like that:
gdalwarp -cutline difference.shp -dstalpha land-and-water.tif water-only.tif
If you think nodata-values might be a problem, or maybe have to fight with coordinate systems, you can always set that with gdal_translate:
gdal_translate -a_srs <source srid> -s_srs <dest srid> -a_nodata <nodata value> input_raster.tif output_raster.tif
Please let us know if that helped or worked for you.