Questions tagged [clustering]

Partitioning data points into subsets according to their mutual "similarity", without using preexisting knowledge such as class labels.

Overview

Clustering, or cluster analysis, is a statistical technique for uncovering groups of units in multivariate data. It differs from classification (clustering could be called "classification without a teacher") in that no units come with known labels; even the number of clusters is usually unknown and has to be estimated. Clustering is a key challenge in data mining, particularly when it is applied to large databases.
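
Because the number of clusters is usually unknown, one common heuristic is to run a centroid-based method for several candidate values of k and keep the k with the best mean silhouette coefficient. The sketch below is a minimal pure-Python illustration under that assumption: the helper names (`kmeans`, `silhouette`, `choose_k`), the deterministic farthest-point initialisation and the toy points are illustrative choices, not part of this tag description, and other criteria (the gap statistic, or BIC for model-based clustering) are equally common.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster index (0..k-1) for each point.

    No class labels are used: the groups come only from the coordinates.
    Initialisation is deterministic (farthest-point), an illustrative choice.
    """
    centroids: List[Sequence[float]] = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher = tighter, better-separated clusters."""
    groups = set(labels)
    if len(groups) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(same) / len(same)  # mean distance within its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == g) / labels.count(g)
            for g in groups if g != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: List[Point], k_max: int) -> int:
    """Try k = 2..k_max and keep the k whose k-means partition scores best."""
    return max(range(2, k_max + 1), key=lambda k: silhouette(points, kmeans(points, k)))


# Made-up toy data: three well-separated groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
print(choose_k(pts, 5))  # prints 3
```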

Although there are many clustering techniques, they fall into several broad classes: hierarchical clustering (in which a hierarchy is built from each unit forming its own cluster up to the whole sample forming a single cluster), centroid-based clustering (in which units are assigned to the cluster with the nearest centroid), distribution- or model-based clustering (in which clusters are assumed to follow a specific distribution, such as a multivariate Gaussian), and density-based clustering (in which clusters are identified as the regions of highest estimated density).
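
The hierarchical case can be sketched as naive agglomerative clustering with single linkage: start with every unit in its own cluster and repeatedly merge the two closest clusters until one remains. This is an illustrative pure-Python sketch (roughly cubic time, fine only for small inputs), not any library's implementation; the function names `single_linkage` and `cut`, the choice of single linkage (complete and average linkage are common alternatives) and the toy coordinates are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[int, int, float]


def single_linkage(points: List[Sequence[float]]) -> List[Merge]:
    """Agglomerative clustering with single linkage (naive, roughly cubic time).

    Returns the merge history as (id_a, id_b, distance), where each cluster id is
    the smallest point index it contains. Distances never decrease along the list.
    """
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    merges: List[Merge] = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[a].extend(clusters.pop(b))  # a < b, so a stays the smallest index
        merges.append((a, b, d))
    return merges


def cut(n: int, merges: List[Merge], threshold: float) -> List[int]:
    """Label n points by replaying only the merges at distance <= threshold."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b, d in merges:
        if d <= threshold:
            parent[find(b)] = find(a)
    roots = [find(i) for i in range(n)]
    # Renumber clusters 0, 1, 2, ... in order of first appearance.
    order = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [order[r] for r in roots]


# Made-up toy data: two tight pairs and one distant point.
pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
merges = single_linkage(pts)
print(merges)  # [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 5.0), (0, 4, 15.0)]
print(cut(len(pts), merges, 2.0))  # [0, 0, 1, 1, 2]
print(cut(len(pts), merges, 10.0))  # [0, 0, 0, 0, 1]
```

Cutting the same hierarchy at different thresholds yields different numbers of clusters, so the number of clusters can be chosen after the hierarchy is built rather than fixed in advance.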

608 questions
11 votes, 1 answer

Spatial analysis and clustering near features

I'm working on marking behaviour in group-living animals, and I'm interested in how marking behaviour is affected by certain characteristics of the neighbouring groups. I have plotted the territories of each group from 95% density isopleths created…
9 votes, 1 answer

Natural neighborhoods terminology

Sometimes the hardest part of an analysis is knowing what something is called. What R packages, and more importantly what terminology, should I be searching for to define contiguous city neighborhoods based on home price changes, basically a…
2 votes, 1 answer

Creating clusters of point data

I have a map of stores for a retailer. I am trying to form clusters (of these stores) such that each member of a cluster is at most 5 km away from any other member and any non-member is at least 10 km from any member. I would also like to create a…
umut
0 votes, 0 answers

Reference article for k-means clustering on multiple TIFF files

I have multiple TIFF files which represent different variables. Some of them are interpolated soil data, some are satellite imagery, and some are DEM, Slope, and TWI. I used all of them together to run k-means clustering to design management…
0 votes, 0 answers

Create hierarchical school clusters

I have a CSV containing UK County data for schools. For each school I have its location, name, size (number of pupils), and phase (primary/secondary/tertiary). I am trying to cluster the schools, associating the many smaller primary schools with…
strangecharm
0 votes, 1 answer

ST_ClusterDBSCAN in BigQuery: how to get a list of points for each cluster?

The DBSCAN algorithm should be able to group points into clusters based on their proximity. However, the example in GCP BigQuery only gives the number of clusters. WITH Geos as (SELECT 1 as row_id, st_geogfromtext('point empty') as geo UNION…
Thadeu Melo
0 votes, 2 answers

How to cluster points based only on location

I have profile data points which lie almost on a line (see picture below). Is there an easy way or plugin to add an attribute to each point which states which profile line it belongs to? It should not be very difficult from a mathematical point of view…
Mathiaas