That's really three questions, which breaks the Question/Answer model, but if you rewrite the question to ask only the third one, the answer could be:
All vector data is inherently georeferenced (only imagery can be
unreferenced). It might not be in a known reference system (and
therefore unusable with other data), but there's an origin and
scaling factor in play. The coordinate reference system in use is
normally recorded in a file with the same base name as the
shapefile but a .prj extension. It's not part of the shapefile
specification, but is usually present. If it isn't present, then
there are various brute-force methods for determining what it might be
(starting by looking at the range of values in the file), but that's
a whole set of other questions.
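
As a rough sketch of that first step (not a definitive tool), the Python below looks for the .prj sidecar and, failing that, reads the bounding box out of the 100-byte .shp header so you can eyeball the value range. The function names and the `looks_geographic` heuristic are my own illustration; only the header layout comes from the shapefile spec.

```python
import struct
from pathlib import Path


def read_prj(shp_path):
    """Return the WKT text from the sidecar .prj file, or None if it is missing."""
    prj_path = Path(shp_path).with_suffix(".prj")
    if not prj_path.exists():
        return None
    return prj_path.read_text().strip()


def read_bbox(shp_path):
    """Return (xmin, ymin, xmax, ymax) from the 100-byte .shp main file header."""
    with open(shp_path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file too short to be a shapefile")
    # The file code 9994 is stored big-endian in the first four bytes.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    # The bounding box is four little-endian doubles starting at byte 36.
    return struct.unpack("<4d", header[36:68])


def looks_geographic(bbox):
    """Heuristic only (an assumption of this sketch): extent fits in lon/lat range."""
    xmin, ymin, xmax, ymax = bbox
    return -180 <= xmin <= xmax <= 180 and -90 <= ymin <= ymax <= 90
```

A False from `looks_geographic` suggests projected coordinates (metres or feet); a True is only a hint that the data might be in degrees, not proof of any particular datum.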
There is no way to know the
scale at which vector data was collected and processed, unless that
information is passed as metadata (data about data). For shapefiles, Esri uses a
.shp.xml suffix as the standard location (that's not part of the
shapefile spec either). Without one there is no way of knowing the
processing history of the data without speaking to every person who
may have altered it (even with one, you still can't really know whether
some unlisted changes have occurred; beyond that, you depend on a chain of
trust in good data hygiene).
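
To check whether a shapefile carries any metadata at all, something like the following would do. It is a minimal sketch: the helper names are hypothetical, and it deliberately lists only the top-level XML element names rather than assuming any particular metadata schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def metadata_path(shp_path):
    """Return the path of the .shp.xml sidecar (roads.shp -> roads.shp.xml)."""
    shp = Path(shp_path)
    return shp.with_name(shp.name + ".xml")


def metadata_sections(shp_path):
    """Return the top-level element names in the metadata file, or None if absent."""
    path = metadata_path(shp_path)
    if not path.exists():
        return None
    root = ET.parse(path).getroot()
    return [child.tag for child in root]
```

A `None` result means there is no sidecar, so scale and processing history have to come from whoever supplied the data.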
Simple restriction (making a subset) does
not change data in any way, and therefore does not change the scale
of the data. It is only coordinate processing which can actually
alter scale (usually only for the worse, not better). Data does
not suddenly start showing details not previously present without
supernatural involvement (most mapping systems use multiple datasets
at different scales to handle this sort of magic).
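
A toy illustration of that difference, using made-up points and hypothetical helper names: subsetting returns the original coordinates untouched, while coordinate processing (here, crude rounding standing in for snapping or generalisation) can merge distinct vertices, and nothing brings that detail back.

```python
# Made-up sample coordinates, purely for illustration.
points = [
    (12.3456789, 45.6789012),
    (12.3460001, 45.6791234),
    (13.0000004, 46.0000007),
]


def subset(pts, xmax):
    """Restriction: keep points with x <= xmax; coordinates are not modified."""
    return [p for p in pts if p[0] <= xmax]


def snap(pts, decimals):
    """Coordinate processing: round every coordinate to a fixed precision."""
    return [(round(x, decimals), round(y, decimals)) for x, y in pts]


kept = subset(points, 12.5)    # two points, identical to the originals
coarse = snap(kept, 3)         # both collapse onto the same rounded point
```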
But the best solution would be for @q-shofwan-Muhammad to move the other 2 questions to a new question – Brad Nesom Nov 02 '13 at 15:22