
I'm plotting the geolocations (lon, lat) of tweets collected using the Twitter Streaming API within a 7 mi radius. The picture below shows two maps of the same data, but using a different alpha/transparency for each observation.

Looking at the map on the left, I noticed that the geolocations mostly follow a sort of imaginary "grid" with specific nodes. Can anyone help me understand why this pattern appears?

[Image: two maps of the same tweet geolocations, drawn with different alpha/transparency]

Of course, some of the coordinates are rounded, e.g.:

"loc: 42.7388,13.1798"         "loc: 42.6252,13.2948"         "loc: 42.6008,13.293"          "loc: 42.73,13.2028"          
"loc: 42.66918468,13.27893702"

but my question is why this rounding occurs, since I assumed that the Twitter API would guarantee the same level of precision for every tweet.

Taras
Dambo
  • It could be either the precision of the tweeter's GPS or other location data, or the Twitter client only sending coordinates to a certain precision. Can you see which client sent the tweet? Is the grid 1/10 of a degree? Or is it a precise number of metres? Hmmmm – Spacedman Jan 19 '18 at 18:23
  • I didn't scrape the data; it was collected using the Twitter API. The thing is, some of the points are recorded at a high level of precision, while others are rounded to the 4th decimal (hence the "grid" pattern). I was wondering why there is such variability in Twitter data, though. The accuracy is definitely higher close to cities and lower in rural areas. I wonder if anyone has had the same issue when dealing with Twitter geolocation. – Dambo Jan 19 '18 at 19:08
  • It doesn't seem like Twitter provides only four decimal places, at least it didn't a while back: https://gis.stackexchange.com/questions/119923/converting-twitter-tweets-into-points?rq=1 – underdark Jan 19 '18 at 19:23
  • It seems to depend on the version of the software that is used, and on the device: https://help.twitter.com/en/safety-and-security/tweet-location-settings – JGH Jan 19 '18 at 19:26
  • @underdark I'm not saying it does; otherwise there wouldn't be any difference between the two maps. But it looks like many records are being rounded. I added a sample of my data to show that I do in fact have records with different degrees of accuracy. – Dambo Jan 19 '18 at 19:26
  • They are being aggregated using a fishnet tessellation, or something similar. – atxgis Jan 19 '18 at 20:26
  • @atxgis any idea who does the aggregation? Is it the device that cannot communicate with a high enough level of accuracy? – Dambo Jan 19 '18 at 21:52
  • There is an option when tweeting to share your precise location, which is turned off by default. Could this be the reason for the rounding? – Techie_Gus Apr 01 '18 at 20:59
  • @Techie_Gus is that the same as sharing the location? How is "precise" defined? – Dambo Apr 05 '18 at 02:53
  • @Dambo I didn't get your first question. As for the second, I don't have data to compare precise vs. non-precise locations. – Techie_Gus Apr 05 '18 at 05:55
  • This could be linked to privacy concerns. If the location is too precise, you can find everybody's address. – radouxju Nov 06 '18 at 14:53
  • I'm surprised they have that many decimals. I also think it's about privacy. With the data you have, you can generate a pretty nice-looking heat map. You could also add a small amount of random jitter to your points before making the heat map. Of course, it depends on what your goal is. – user1269942 Nov 10 '18 at 20:03
  • @user1269942 I was interested in understanding the tweets' precision, and it turned out that ca. 30% are rounded to 4 decimals. – Dambo Nov 11 '18 at 00:46
  • @user1269942 Given a large enough data set, the average of truly random offsets is zero. In such a scheme you could e.g. find someone's home address by looking for the largest cluster (in terms of point count) and averaging the cluster's point locations. – Alex Hajnal Nov 11 '18 at 22:08
  • @AlexHajnal good point, but that is still possible to an extent even with the "rounded" locations, at least in areas that are not densely populated. I wonder: if privacy were really a concern, shouldn't we see measurements far less precise than 4 decimals? – Dambo Nov 11 '18 at 23:09
  • @Dambo Valid point. If you're going to provide location data at all, you're going to have trade-offs. Honestly, even rounding to the hemisphere level could cause privacy problems, e.g. someone from Mumbai claiming to be in London on business but posting from a beach in Hawai'i. – Alex Hajnal Nov 11 '18 at 23:17
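The averaging argument raised in the comments above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the "home" coordinate, the maximum offset, and the helper name `add_jitter` are all hypothetical, and the jitter is assumed to be uniform and independent for every point.

```python
import random
import statistics

def add_jitter(lat, lon, max_offset_deg, rng):
    """Shift a point by an independent uniform random offset on each axis."""
    return (
        lat + rng.uniform(-max_offset_deg, max_offset_deg),
        lon + rng.uniform(-max_offset_deg, max_offset_deg),
    )

rng = random.Random(0)            # fixed seed so the run is repeatable
true_lat, true_lon = 42.0, 13.0   # hypothetical "home" location
max_offset_deg = 0.01             # hypothetical maximum offset per axis (about 1 km north-south)

# Simulate many posts from the same place, each jittered independently.
jittered = [add_jitter(true_lat, true_lon, max_offset_deg, rng) for _ in range(10_000)]

# Averaging the jittered cluster pulls the estimate back towards the true point.
mean_lat = statistics.fmean(p[0] for p in jittered)
mean_lon = statistics.fmean(p[1] for p in jittered)

print(f"error in latitude:  {abs(mean_lat - true_lat):.6f} degrees")
print(f"error in longitude: {abs(mean_lon - true_lon):.6f} degrees")
```

With enough points the error of the averaged position becomes much smaller than the jitter itself, which is why zero-mean jitter alone does not hide a frequently used location.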

1 Answer


I have looked at tweets to some extent, but not in your area.

I suspect this is down to two types of tweets.

First, there are actual tweets, which fall into two categories: those with the correct geolocation at a specific precision (see the tweets in the urban area), and those with town-level geolocation (see the stacks in the urban area).

Then you have automated tweets. These are the ones that fall on the grids. For example for the data I am capturing:

[Image: map of the answerer's own tweet data, with a group of tweets highlighted above black lines]

The tweets highlighted above the black lines are all from an automated flood alert system.

I would check your data first to see if there are any trends.
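One way to run such a check is to tally how many decimal places each coordinate pair carries. The sketch below is a minimal Python example, assuming the locations are stored as "loc: a,b" strings like the sample in the question; `decimal_places` and `coordinate_precision` are hypothetical helper names, not part of any Twitter library. Because trailing zeros are dropped when a number is printed (13.293 may really be 13.2930), it takes the larger count of the two coordinates.

```python
from collections import Counter

def decimal_places(value: str) -> int:
    """Number of digits after the decimal point in a coordinate string."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)

def coordinate_precision(loc: str) -> int:
    """Largest decimal-place count of the two coordinates in a 'loc: a,b' string."""
    coords = loc.split(":", 1)[-1].split(",")
    return max(decimal_places(c) for c in coords)

# Sample strings copied from the question.
sample = [
    "loc: 42.7388,13.1798",
    "loc: 42.6252,13.2948",
    "loc: 42.6008,13.293",
    "loc: 42.73,13.2028",
    "loc: 42.66918468,13.27893702",
]

# Tally how many records fall at each precision level.
precision_counts = Counter(coordinate_precision(s) for s in sample)
for places, count in sorted(precision_counts.items()):
    print(f"{places} decimal places: {count} record(s)")
```

If your records also include the sending client or the account name, grouping these counts by that field would show whether the rounded points come from a single source such as a bot.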

HeikkiVesanto
  • You mean those are tweets from the same system but from different locations? What would the purpose be? I'll take a look and get back to you. I remember thinking about what you mentioned, but I actually have different levels of rounding (not just either 4 or 9 decimals). That suggests rounded geolocations are not necessarily produced from discrete geotags. – Dambo Nov 13 '18 at 17:10
  • I'm suggesting there is a Twitter bot that is tweeting the gridded locations. For example, in my tweets it's a flood alert bot. These are geolocated flood alerts, but the locations fall along a grid. – HeikkiVesanto Nov 13 '18 at 18:06
  • Yes, I got what you mean. I was just wondering what for... – Dambo Nov 14 '18 at 21:48
  • If I were creating a bot, I would probably do it to 4 decimal places as well. Otherwise you are suggesting a level of accuracy that just cannot be attained if you are not on the ground. 4 decimal places is down to around 10 meters: https://gis.stackexchange.com/questions/8650/measuring-accuracy-of-latitude-and-longitude. – HeikkiVesanto Nov 14 '18 at 22:00
  • Again, I get that, but what I am trying to say is: why would you need a bot that mimics different locations for tweeting flood-related content? What can the goal possibly be? – Dambo Nov 14 '18 at 23:15
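For reference, the "4 decimal places is around 10 meters" figure from the comments can be checked with a rough calculation. This is a sketch only: it assumes a spherical approximation of about 111,320 m per degree of latitude, and `grid_spacing_m` is a hypothetical helper name. The latitude of 42.7 is taken from the sample coordinates in the question.

```python
import math

# Approximate length of one degree of latitude; it varies slightly with latitude.
METERS_PER_DEGREE_LAT = 111_320

def grid_spacing_m(decimals, latitude_deg):
    """Approximate ground spacing (north-south, east-west) of coordinates rounded to `decimals` places."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

north_south, east_west = grid_spacing_m(4, 42.7)
print(f"north-south spacing: {north_south:.1f} m")
print(f"east-west spacing:   {east_west:.1f} m")
```

At this latitude the spacing comes out at roughly 11 m north-south and a little over 8 m east-west, so points rounded to 4 decimals snap to a grid of about that size.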