I have a table which contains support tickets. Each ticket has a title and many of them have common phrases in them (XXX cannot connect to YYY, XXX requires a new YYY, XXX requires upgrading, etc. etc.)
I would like to do some analysis on the data and extract these common phrases along with a count of their usage.
Repeated character sequences would be acceptable, but ideally I would want to split the title into words and identify repeated sequences of words. Here is a simple example:
Title
------------------------------------------------
John cannot connect to SERVER01
The backup service cannot connect to the client
Mary cannot connect her laptop to the network
From this data I would like to be able to extract the following:
Phrase Count
-------------------------------------------------- -------
cannot connect 3
cannot connect to 2
I have a plan to do this using a c# application and brute force, but can anyone suggest a more elegant solution?
varcharfield. At the time I didn't even thought of doing it by querying; instead I extract the contents of the field, parse the data, count and remove duplicates on a '5-time' rule (less than 5 times doesn't count) and finally remove non related words (a, the, from,, etc). Would like to see if there's a way of doing it with SQL. – Nelson Apr 08 '15 at 11:14'to', 3,'to the', 2and'the', 3, right? – ypercubeᵀᴹ Apr 08 '15 at 12:52