Is it that the sensor captures a circular image, and what we get is a cropped version of it?
Something like this:

Or did I get it completely wrong?
You are correct.
Photos are the size and shape they are simply because of the size and shape of the film or digital sensor that captures the image from the lens. The rest of the image, which falls above, below, and to the sides of the film or sensor, is just not recorded.
It's the way you've illustrated. The only exceptions are circular fisheye lenses, which work the other way around: the full 180° image circle sits in the center of your photo, and everything outside it is dark. (See pictures from the Peleng 8mm Fisheye.)
A lens always produces a circular image; the sensor, however, is rectangular and captures only the portion of the image that falls on it.
With some "digital optimised" lenses (whether third-party or something like Canon's EF-S mount), the image circle is much smaller, because they are optimised for an APS-C-sized sensor. Conversely, tilt/shift lenses project a much larger image circle so that the frame can be shifted within it.
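To make the coverage idea concrete, here is a minimal Python sketch. A rectangular sensor is fully lit only if its farthest corner falls inside the image circle, so the circle's diameter must be at least the sensor diagonal (and larger still if the sensor is shifted off-axis, as with tilt/shift). The function names are hypothetical, the 36 × 24 mm full-frame size is the standard 35 mm format, the APS-C size (about 22.3 × 14.9 mm, roughly Canon's) is an approximation, and the 12 mm shift is just an illustrative value.

```python
import math


def required_circle_mm(width_mm: float, height_mm: float,
                       shift_x_mm: float = 0.0, shift_y_mm: float = 0.0) -> float:
    """Smallest image-circle diameter that fully covers a sensor of the given size
    when the sensor is shifted off the optical axis by (shift_x_mm, shift_y_mm)."""
    # The farthest corner from the optical axis sits at half the sensor size
    # plus the shift in each direction; the circle must reach that corner.
    corner_x = width_mm / 2 + abs(shift_x_mm)
    corner_y = height_mm / 2 + abs(shift_y_mm)
    return 2 * math.hypot(corner_x, corner_y)


def circle_covers(image_circle_mm: float, width_mm: float, height_mm: float,
                  shift_x_mm: float = 0.0, shift_y_mm: float = 0.0) -> bool:
    """True if an image circle of the given diameter lights the whole sensor."""
    return image_circle_mm >= required_circle_mm(width_mm, height_mm,
                                                 shift_x_mm, shift_y_mm)


FULL_FRAME = (36.0, 24.0)  # 35 mm "full frame"
APS_C = (22.3, 14.9)       # approximate Canon APS-C size (assumption)

if __name__ == "__main__":
    print(f"full frame needs {required_circle_mm(*FULL_FRAME):.1f} mm")
    print(f"APS-C needs {required_circle_mm(*APS_C):.1f} mm")
    # Hypothetical lens whose circle just covers APS-C, used on full frame:
    aps_c_circle = required_circle_mm(*APS_C)
    print("APS-C circle covers full frame?", circle_covers(aps_c_circle, *FULL_FRAME))
    # Full frame shifted 12 mm sideways (illustrative shift amount):
    print(f"full frame + 12 mm shift needs "
          f"{required_circle_mm(*FULL_FRAME, shift_x_mm=12):.1f} mm")
```

Run as a script, this prints roughly 43.3 mm for full frame, 26.8 mm for APS-C, `False` for the coverage check, and 64.6 mm for the shifted case, which is why an APS-C-only lens vignettes on full frame and why a shift lens needs a much bigger circle.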
As your illustration shows, some of the image circle is always wasted because it misses the imaging area. You still have to pay for and carry the glass that creates those unseen parts, but a lens with a rectangular image would produce ugly, blocky bokeh. At least the cropped-off parts lie toward the edge of the circle, where image quality is lowest (comparable to what you get in the corners of the rectangular frame).
The square image from 6x6 medium-format cameras probably comes closest to making maximum use of the image circle (not counting lenses whose image circle is smaller than the frame, such as circular fisheyes or lenses designed for smaller sensors; for those, a square is also the most efficient use of the imaging area).
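The square-is-best claim follows from simple geometry: a rectangle inscribed in a circle of radius r, with its diagonal at angle θ to the long side, has area 2r² sin 2θ, which peaks at θ = 45°, i.e. a square. The minimal Python sketch below (function names are hypothetical) compares a few aspect ratios both ways: how much of the image circle a fully covered frame captures, and, for the circular-fisheye case, how much of the frame a circle that fits inside it lights.

```python
import math


def circle_utilization(aspect_w: float, aspect_h: float) -> float:
    """Fraction of the image circle's area captured by the largest frame of the
    given aspect ratio that fits inside it (frame diagonal = circle diameter)."""
    # Scale the frame so its diagonal equals the diameter of a unit-radius circle.
    scale = 2 / math.hypot(aspect_w, aspect_h)
    frame_area = (aspect_w * scale) * (aspect_h * scale)
    return frame_area / math.pi


def frame_utilization(aspect_w: float, aspect_h: float) -> float:
    """Fraction of the frame's area lit by the largest circle that fits inside it
    (the circular-fisheye case: circle diameter = shorter side of the frame)."""
    radius = min(aspect_w, aspect_h) / 2
    return math.pi * radius ** 2 / (aspect_w * aspect_h)


if __name__ == "__main__":
    for name, (w, h) in [("6x6 (1:1)", (1, 1)), ("4:3", (4, 3)), ("3:2", (3, 2))]:
        print(f"{name}: frame uses {circle_utilization(w, h):.1%} of the circle; "
              f"a circle inside it lights {frame_utilization(w, h):.1%} of the frame")
```

The square comes out ahead in both directions: 2/π (about 63.7%) of the circle versus about 58.8% for 3:2, and π/4 (about 78.5%) of the frame under a circular fisheye versus π/6 (about 52.4%) for 3:2.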
Not only is the film/sensor in a camera rectangular, but so are our display media: LCDs (and their pixels) and photo paper (largely because printers are designed around rectangles). Not to mention photo albums. We live in a very orthogonal society.
So, somewhere in the photographic chain, you're going to have to turn that circle into a rectangle.
As it happens, it's cheapest to do that early in the chain: both film and digital sensors are more economical to produce as rectangles. Circular versions would largely be a waste, and would be more expensive to boot.