There's a massive literature on sampling schemes for spatial interpolation. Here are some thoughts:
On 1: yes, they will generate different empirical variograms and different kriged maps. But if the underlying process is independent of your sampling scheme, then the outputs should agree within standard errors. If your data are somehow related to your sampling scheme (say, a geological feature runs along the transect, so you sample the same thing many times over), then all bets are pretty much off.
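To see this concretely, here's a minimal sketch in Python with numpy (R would do just as well). The field model, grid size, sample counts, and lag bins are all arbitrary choices for illustration; the variogram is the classical Matheron estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated field on a grid: white noise smoothed in the
# frequency domain (a purely illustrative model, not a fitted one).
n = 64
noise = rng.normal(size=(n, n))
kx = np.fft.fftfreq(n)[:, None]
ky = np.fft.fftfreq(n)[None, :]
kernel = np.exp(-(kx**2 + ky**2) * 200.0)  # arbitrary smoothing scale
field = np.real(np.fft.ifft2(np.fft.fft2(noise) * kernel))

def empirical_variogram(xy, z, bins):
    """Matheron estimator: 0.5 * mean squared difference per lag bin."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)  # unique pairs only
    d, sq = d[iu], sq[iu]
    return np.array([sq[(d >= lo) & (d < hi)].mean()
                     if np.any((d >= lo) & (d < hi)) else np.nan
                     for lo, hi in zip(bins[:-1], bins[1:])])

# Scheme A: simple random sampling over the grid.
ra = rng.integers(0, n, size=(60, 2))
za = field[ra[:, 0], ra[:, 1]]

# Scheme B: a single transect (one row of the grid).
cols = rng.choice(n, size=60, replace=False)
rb = np.column_stack([np.full(60, n // 2), cols])
zb = field[rb[:, 0], rb[:, 1]]

bins = np.linspace(0, 32, 9)
gamma_a = empirical_variogram(ra.astype(float), za, bins)
gamma_b = empirical_variogram(rb.astype(float), zb, bins)
print(gamma_a)
print(gamma_b)
```

Run it a few times with different seeds: the two estimates wobble around each other when the field is independent of the design, and diverge if you align the transect with a structured feature of the field.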
On 2: "better" depends on a multitude of things. Firstly, what are your criteria for "better"? Prediction error at some set of points? Integrated prediction error over regions? Inference on possible covariates? Better predictive probabilities of some threshold exceedance?
On 3: Yes, you can use both sampling strategies, combine the points, and do a single interpolation. More data generally helps.
General opinion on sampling schemes: structure your scheme to reflect what you think the underlying spatial correlation range is likely to be; include a mix of close points (to get a handle on short-range correlation) and far points (for long-range correlation); random sampling is generally better on most measures than sampling along transects, except on things like "time taken to do the survey" or "fuel cost to survey sites". Transect sampling is often a convenience measure because a researcher can walk in a straight line taking samples rather than wander randomly (and get lost...).
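One common way to get that mix of close and far points is to scatter points at random and then drop a close companion next to a subset of them. A sketch (the extent, counts, and pair separation are made-up numbers, and `hybrid_design` is just an illustrative name):

```python
import numpy as np

rng = np.random.default_rng(1)

def hybrid_design(n_spread, n_pairs, extent=100.0, pair_sep=2.0):
    """Random spread points for long-range structure, plus a close
    companion beside some of them for short-range structure."""
    spread = rng.uniform(0, extent, size=(n_spread, 2))
    anchors = spread[rng.choice(n_spread, size=n_pairs, replace=False)]
    angles = rng.uniform(0, 2 * np.pi, size=n_pairs)
    offsets = pair_sep * np.column_stack([np.cos(angles), np.sin(angles)])
    companions = np.clip(anchors + offsets, 0, extent)
    return np.vstack([spread, companions])

pts = hybrid_design(n_spread=40, n_pairs=10)
print(pts.shape)  # (50, 2)
```

The companion pairs populate the short-lag bins of the empirical variogram that purely random designs often leave sparse.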
R makes it easy to investigate this - generate an underlying raster (like the one in your image) and try various sampling schemes, generate variograms, compare, generate predictions, compare within standard errors. Play.
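If you'd rather play in Python than R, the same experiment is easy with numpy. Here's a sketch comparing prediction error for a random design versus a transect; it uses inverse-distance weighting as a cheap stand-in for kriging, on a smooth deterministic surface standing in for a simulated field (all sizes and the surface itself are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "underlying raster": a smooth deterministic surface, standing in
# for a simulated random field. Any raster would do for this game.
n = 50
yy, xx = np.mgrid[0:n, 0:n]
field = np.sin(xx / 8.0) + np.cos(yy / 11.0)

def idw_predict(xy, z, grid_xy, power=2.0):
    """Inverse-distance weighting: a cheap stand-in for kriging here."""
    d = np.linalg.norm(grid_xy[:, None, :] - xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-9) ** power  # huge weight at a sample point
    return (w * z).sum(axis=1) / w.sum(axis=1)

grid_xy = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)

def scheme_rmse(sample_xy):
    z = field[sample_xy[:, 0], sample_xy[:, 1]]
    pred = idw_predict(sample_xy.astype(float), z, grid_xy)
    return np.sqrt(np.mean((pred - field.ravel()) ** 2))

random_xy = rng.integers(0, n, size=(40, 2))
transect_xy = np.column_stack([np.full(40, n // 2),
                               np.linspace(0, n - 1, 40).round()]).astype(int)

rmse_random = scheme_rmse(random_xy)
rmse_transect = scheme_rmse(transect_xy)
print("random RMSE:  ", rmse_random)
print("transect RMSE:", rmse_transect)
```

Swap in different surfaces, sample sizes, and designs, and compare the errors across repeated runs; that's the "play" the paragraph above is suggesting.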