I have a worksheet where column A has various names in varying formats:
A1 John Smith
A2 Jones, Mary
A3 Sally Gomez
A4 The Gonzalez family
Column B has similar data:
B1 The Smith Family Trust
B2 Bob and Mary Jones
B3 Blackwell, John
B4 Luz Gonzalez
I would like to identify the instances where the same last name is found in column A and column B. In the examples above, the formula, if placed in column C, would result in
C1 TRUE (because "Smith" is found in both A1 and B1)
C2 TRUE (because "Jones" is found in both A2 and B2)
C3 FALSE (because there are no common words between A3 and B3)
C4 TRUE (because "Gonzalez" is found in both A4 and B4)
Is this even possible?
- 31
4 Answers
Given your comments as well as your question, it seems you want to return TRUE if any word in one phrase matches a word in the adjacent phrase. One way to do this is with a User Defined Function (VBA). The following excludes any words that are in arrExclude, which you can add to as you see fit. It will also exclude any characters that are not letters, digits or spaces, and any words that consist of just a single character.
See if this works for you.
Another option would be take a look at the free fuzzy lookup add-in provided by MS for excel versions 2007 and later.
To enter this User Defined Function (UDF), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=WordMatch(A1,B1)
in some cell.
EDIT2: Find Matches segment changed to see if it works better on Mac
Option Explicit
Option Base 0
Option Compare Text
Function WordMatch(S1 As String, S2 As String) As Boolean
Dim arrExclude() As Variant
Dim V1 As Variant, V2 As Variant
Dim I As Long, J As Long, S As String
Dim RE As Object
Dim sF As String, sS As String
'Will also exclude single letter words
arrExclude = Array("The", "And", "Trust", "Family", "II", "III", "Jr", "Sr", "Mr", "Mrs", "Ms")
'Remove all except letters, digits, and spaces
'remove extra spaces
'Consider whether to retain hyphens
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "[^A-Z0-9 ]+|\b\S\b|\b(?:" & Join(arrExclude, "|") & ")\b"
.Global = True
.ignorecase = True
End With
With WorksheetFunction
V1 = Split(.Trim(RE.Replace(S1, "")))
V2 = Split(.Trim(RE.Replace(S2, "")))
End With
'Find Matches
If UBound(V1) <= UBound(V2) Then
sS = " " & Join(V2) & " "
For I = 0 To UBound(V1)
sF = " " & V1(I) & " "
If InStr(sS, sF) > 0 Then
WordMatch = True
Exit Function
End If
Next I
Else
sS = " " & Join(V1) & " "
For I = 0 To UBound(V2)
sF = " " & V2(I) & " "
If InStr(sS, sF) > 0 Then
WordMatch = True
Exit Function
End If
Next I
End If
WordMatch = False
End Function
EDIT: Here is a screenshot of the results, using both your original examples, and also the examples you gave in a comment below where you indicated you were having a problem.

- 8,673
The most difficult part of this exercise is determining what, in column A, constitutes a last name. In your example, it's either:
- The first word, if there's a comma in the whole name
- The second word
If that rule is true, then you can just do a formula like this:
=NOT(ISERROR(FIND(last_name, B1:B4)))
The formula to actually determine the last name is a little more complex. You essentially have to figure out what character positions the spaces are in, and then pull the letters in between. There's a good explanation on this thread:
- 143
-
Unfortunately there is no real pattern to what the last name is, as there are examples like "Bob and Mary Smith" or "Bob L Smith Et Al" or "Smith Bob L and Mary" with no comma.
But this gives me a direction to head in. Thank you.
– candez May 02 '15 at 09:04 -
1Actually in most cases, a TRUE value any time any single word is repeated, could work. It wouldn't be perfect, as it would incorrectly return a TRUE value when comparing "Bob and Mary Smith" and "John and Linda Jones," or "Bob L Smith" and "Jones, Bob" as TRUE. But in the majority of cases it could work. – candez May 02 '15 at 09:14
-
That isn't applicable to this problem. The solution you referenced is for an exact match of the entire cell contents. The problem here is for a match of any word that is a portion of the cell. – fixer1234 May 03 '15 at 20:07
Highlight both columns > conditional formatting (home tab) > highlight cell rules > duplicate values.
This will highlight all duplicates in both columns.
Make sure you are highlighting the columns and not the cells.
- 1,704
- 4
- 16
- 28
-
Doesn't this compare the entire cell contents rather than finding a matching word that is a portion of each cell? – fixer1234 May 03 '15 at 20:10
-
The other challenge with this is that I can't use the results in a formula. Ultimately I would like to create a SUMIF formula based on the TRUE or FALSE result in column C. – candez May 07 '15 at 05:36
#VALUE!error is being returned by UDF. – Ron Rosenfeld May 07 '15 at 10:49Is this what you are looking for? In the first row, the UDF should have returned TRUE, and in the second row FALSE.
I can also upload a screen shot if that would be more helpful. (and thank you for your help)
– candez May 08 '15 at 07:50Find Matchessegment withApplication.Matchbehaving differently in the two versions. I will edit my answer to provide a different method (using Regular Expressions) to check for matches, and see if that works better on your machine. – Ron Rosenfeld May 09 '15 at 11:07