13

I have a database of GPS points. There aren't any tracks, only points. I need to calculate some value for every 100 meters, but sometimes the GPS gave wrong coordinates that lie far from the real GPS points, so instead of calculating values for a small square, I have to calculate them for a really big rectangular area.

What is the best algorithm to filter wrong GPS points?

I made a screenshot to help understand:

![gps_error.png](http://content.screencast.com/users/smirnoffs/folders/Jing/media/94624331-db6a-4171-bed9-e2183f953a1d/gps_error.png)

whuber
smirnoffs
  • 1
    I'd use a small multiple of the moving frame (say 10 last points) average distance between points as the criterion to detect such outliers. – lynxlynxlynx Aug 07 '12 at 12:02
  • Can you describe your method in more detail? I have a database of points, and they are not sorted in any way, so the distance between them could be 2 meters or 500 meters. But some of the points are very far away. I made a screenshot to help you understand – smirnoffs Aug 07 '12 at 12:51
  • 2
    I see. In this case my approach is not so good. I would instead calculate the nearest neighbouring point for each point and then shave off the outliers there. – lynxlynxlynx Aug 07 '12 at 13:09
  • 2
    The second approach suggested by @lynx would work well with the sample data, especially when the outlier detection method is a good one. See questions about outliers on our stats site for options. For instance, many creative (and valid) approaches are suggested at http://stats.stackexchange.com/questions/213/. – whuber Aug 07 '12 at 15:16
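The nearest-neighbour idea from the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a method given by anyone in the thread: points are assumed to be (lat, lon) pairs in degrees, distances use the haversine formula, and the cutoff is the median nearest-neighbour distance plus `k` times the median absolute deviation, with a floor of `min_threshold_m`. The function names and the default values of `k` and `min_threshold_m` are hypothetical and would need tuning for real data.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point (brute force, O(n^2))."""
    return [
        min(haversine_m(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def split_outliers(points, k=10.0, min_threshold_m=100.0):
    """Return (kept, outliers) using a median + k * MAD cutoff.

    k and min_threshold_m are illustrative defaults, not recommended values.
    """
    if len(points) < 2:
        return list(points), []
    nn = nearest_neighbour_distances(points)
    med = median(nn)
    mad = median(abs(d - med) for d in nn)
    threshold = max(med + k * mad, min_threshold_m)
    kept = [p for p, d in zip(points, nn) if d <= threshold]
    outliers = [p for p, d in zip(points, nn) if d > threshold]
    return kept, outliers


if __name__ == "__main__":
    sample = [
        (55.7500, 37.6100), (55.7501, 37.6101), (55.7502, 37.6100),
        (55.7501, 37.6099), (55.7503, 37.6102),
        (56.7500, 38.6100),  # deliberately far from the cluster
    ]
    kept, outliers = split_outliers(sample)
    print(len(kept), outliers)
```

Two caveats: two bad points that happen to lie close to each other will mask one another, in which case the distance to the k-th nearest neighbour may be a better score, and the brute-force search is quadratic, so a large table would probably need a spatial index instead.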

3 Answers

3

This might help to get a list of the outliers:

-- Points that have no other point within 10000 units
SELECT p1.point_id
FROM points AS p1
WHERE NOT EXISTS (
    SELECT 1 FROM points AS p2
    WHERE p1.point_id <> p2.point_id
      AND ST_Distance(p1.geom, p2.geom) <= 10000
);

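Here, point_id would be the primary key in your points table. The query is meant to find points whose nearest neighbour is more than 10000 meters away. (You can, of course, substitute any appropriate value.) Note that ST_Distance on geometry columns returns values in the units of the coordinate system, so the threshold is in meters only if the data is stored in a projected system with meter units.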

If the above works, then change it to a DELETE statement, something like:

DELETE FROM points WHERE point_id IN (
-- SELECT as above
SELECT ....
);
Micha
  • 1
    Maybe I did not understand. From your image, I see that almost all points are clustered in one area, and a very small number are very far away. Is that not the problem? If a point is only 150 meters away from another, how do you know it's an outlier? – Micha Aug 08 '12 at 05:24