How to Get the Second Sequence of Characters using Regular Expressions

Question

I have a field with an address in it (ex. 68 TIDAL BREEZE DR) and this regular expression gets everything before the last sequence (ex. 68 TIDAL BREEZE):

substring (address from '(.*) ')

My question is, how do I modify this expression to get everything after the first sequence (ex. 68) and everything before the last sequence (ex. DR) like so: TIDAL BREEZE?

I'm using PostgreSQL 9.5.

The title ('Second Word in an Attribute') asks for something different from the body of the question ('everything after the first group (ex. 68) and everything before the last group'). You might also define "group". A sequence of word characters? Non-space characters? Is the separator always a single space, or something else? A couple more examples would help if you have a hard time expressing the problem clearly. — Erwin Brandstetter, Mar 15 '17 at 18:33

score 1 · Accepted Answer · answered Mar 15 '17 at 12:46

postgres=# select substring('68 TIDAL BREEZE DR' from '\s+(.*)\s');
  substring
--------------
 TIDAL BREEZE
(1 row)

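The leading \s+ matches the first run of whitespace, which chops the first bit off. The greedy (.*) then captures everything up to the last whitespace character, and substring returns only that parenthesized part.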
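To see how this pattern behaves on a few different inputs, here is a small sketch. The sample addresses other than the one from the question are made up for illustration, and it assumes the default standard_conforming_strings = on, so the backslashes reach the regex engine unchanged.

```sql
-- Apply the answer's pattern to a few sample addresses.
-- 'MAIN ST' is a hypothetical two-word address added to show the edge case.
SELECT address,
       substring(address from '\s+(.*)\s') AS middle
FROM (VALUES
        ('68 TIDAL BREEZE DR'),
        ('W HOLLYWOOD BLVD'),
        ('MAIN ST')
     ) AS t(address);

--       address       |    middle
-- --------------------+--------------
--  68 TIDAL BREEZE DR | TIDAL BREEZE
--  W HOLLYWOOD BLVD   | HOLLYWOOD
--  MAIN ST            | (null)
```

The pattern needs at least two whitespace characters (one for the leading \s+ and one for the trailing \s), so an address with only two words does not match and substring returns NULL rather than an empty string.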

answered Mar 15 '17 at 12:46

Philᵀᴹ


how would I use this same approach to get the prefix of an address ex. 'W' from 'W HOLLYWOOD BLVD'? – Matt Mar 15 '17 at 13:57
select substring('W HOLLYWOOD BLVD' from '(.+?)\s'); – Philᵀᴹ Mar 15 '17 at 13:59
The regexp is pretty smart, I see no corner case or possible improvement. (For the question in the body of the Q.) – Erwin Brandstetter Mar 15 '17 at 18:37
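For the prefix case raised in the comments, two alternative ways to pull out the first word can be sketched as follows. These are not from the original thread: the anchored pattern '^(\S+)' and the split_part call are swapped-in variants, and split_part assumes the separator is a single space with no leading whitespace.

```sql
-- Two alternative ways to get the first word of an address.
SELECT address,
       substring(address from '^(\S+)') AS first_word_regex,  -- leading run of non-space characters
       split_part(address, ' ', 1)      AS first_word_split   -- text before the first single space
FROM (VALUES
        ('W HOLLYWOOD BLVD'),
        ('68 TIDAL BREEZE DR')
     ) AS t(address);

--       address       | first_word_regex | first_word_split
-- --------------------+------------------+------------------
--  W HOLLYWOOD BLVD   | W                | W
--  68 TIDAL BREEZE DR | 68               | 68
```

Note that none of these can tell a directional prefix such as 'W' apart from a house number such as '68'. They only return whatever comes first.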

score -1 · Answer 2 · edited Apr 13 '17 at 12:42

Not using a regex

I highly suggest you don't parse addresses with a regex. If you're looking for a better practice, there is a method of parsing addresses demonstrated and explained in this answer. It uses standardize_address from the PostGIS address_standardizer extension:

SELECT * FROM standardize_address(
  'us_lex','us_gaz','us_rules',
  '68 TIDAL BREEZE DR, Citytown Texas, 77346'
);
 building | house_num | predir | qual | pretype |     name     | suftype | sufdir | ruralroute | extra |   city   | state | country | postcode | box | unit 
----------+-----------+--------+------+---------+--------------+---------+--------+------------+-------+----------+-------+---------+----------+-----+------
          | 68        |        |      |         | TIDAL BREEZE | DRIVE   |        |            |       | CITYTOWN | TEXAS | USA     | 77346    |     | 
(1 row)

Or to get just certain values:

SELECT house_num || ' Other Stuff ' || suftype AS whatever
FROM standardize_address('us_lex', 'us_gaz', 'us_rules',
  '68 TIDAL BREEZE DR, Citytown Texas, 77346'
);
       whatever       
----------------------
 68 Other Stuff DRIVE
(1 row)

answered Mar 15 '17 at 15:33

Evan Carroll


how would I substitute '68 TIDAL BREEZE DR, Citytown Texas, 77346' with a column name that contained the following format for all addresses: '68 TIDAL BREEZE DR'? – Matt Mar 15 '17 at 16:35
@Matt I don't think you can parse just street addresses with it. You'd either have to give them a fictitious location (as I did), or this method wouldn't work. That said, if you've got it -- provide it. If not, this method may not work for you here. – Evan Carroll Mar 15 '17 at 16:40
I guess my question is, all of the examples I see using the standardize_address function are all using strings like '68 TIDAL BREEZE DR, Citytown Texas, 77346'. If I did add a new field to my table and concatenate it together with a bogus zip code and city name, how would I then use the column name in the 'from' statement instead of a string? – Matt Mar 15 '17 at 16:52
SELECT a.* FROM georgetown.addresses CROSS JOIN LATERAL parse_address(address || ', Citytown Texas, 77346') AS a; – Evan Carroll Mar 15 '17 at 17:07
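The same lateral-join idea can be sketched with standardize_address, the function used in the answer body, in place of parse_address. This assumes the address_standardizer and address_standardizer_data_us extensions are installed (the latter provides us_lex, us_gaz and us_rules), that georgetown.addresses has a text column named address as in the comment above, and that appending the fictitious ', Citytown Texas, 77346' is acceptable for your data.

```sql
-- Run the standardizer once per row, feeding it the column value
-- plus a placeholder city, state and zip code.
SELECT t.address,
       s.house_num,
       s.name,
       s.suftype
FROM georgetown.addresses AS t
CROSS JOIN LATERAL standardize_address(
    'us_lex', 'us_gaz', 'us_rules',
    t.address || ', Citytown Texas, 77346'
) AS s;
```

Here s.name is the street name without the house number or suffix, which for '68 TIDAL BREEZE DR' corresponds to the TIDAL BREEZE value shown in the answer's output.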
