
I know that $ is used to check whether a line end follows in a Java regular expression.

Consider the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);

The output is:

http://www.google.com
line2
line3

I assume that the pattern (\\.[^:/]+).*$? matches the first line, which is http://www.google.com/path, and that $1 is http://www.google.com. The ? makes the match reluctant (so it matches only the first line).

However, if I remove the ? from the pattern and run the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

The output is:

http://www.google.com/path
line2
line3

I expected the result to be http://www.google.com, because:

  1. (\\.[^:/]+) matches http://www.google.com
  2. .*$ matches /path\nline2\nline3

Where is my misunderstanding of the regex here?

Wiktor Stribiżew
chrisTina

2 Answers


You have multiline input and are trying to use the $ anchor to match at the end of each line, but you are not using the MULTILINE flag. All you need is the (?m) mode modifier in front of your regex:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

This will output:

http://www.google.com
line2
line3


Without the MULTILINE or DOTALL mode, your regex (\.[^:/]+).*$ fails to match the input because of .*$: the dot does not match newlines, and $ (end of input) comes only after two more newlines.
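
To see the failure directly, here is a minimal sketch that checks whether the pattern finds a match with and without the flag. The class and method names are made up for illustration; only Pattern, Matcher and Pattern.MULTILINE come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MultilineFlagDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";
    static final String REGEX = "(\\.[^:/]+).*$";

    // Returns the text captured by group 1, or null when the pattern finds no match.
    static String firstGroup(int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // No flags: '.' stops at the first newline and '$' only matches at the end of the input.
        System.out.println(firstGroup(0));                 // prints null
        // MULTILINE: '$' also matches right before each line terminator.
        System.out.println(firstGroup(Pattern.MULTILINE)); // prints .google.com
    }
}
```

Note that group 1 holds only .google.com. The http://www part in the output survives because it lies before the start of the match, not because it was captured.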

anubhava

Your regex does not match the input string. Without flags, $ matches at the end of the string (at the end of line3), and since you are not using the s flag, the . cannot cross the line breaks to get there.

Note that the $ anchor, even without the Pattern.MULTILINE option, can also match at the position before the final line feed character; see "What is the difference between ^ and \A, $ and \Z in regex?". This is easy to test with "a\nb\n".replaceAll("$", "X"), which results in "a\nbX\nX".
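
A small sketch of that test is shown here, with \z (the absolute end of input in java.util.regex) added for contrast. The \z comparison and the class and method names are my additions for illustration, not part of the original test.

```java
// Hypothetical demo class; the names are illustrative only.
class EndAnchorDemo {
    // Inserts "X" at every position where '$' matches (no flags set).
    static String markDollar(String s) {
        return s.replaceAll("$", "X");
    }

    // Inserts "X" only at the very end of the input.
    static String markAbsoluteEnd(String s) {
        return s.replaceAll("\\z", "X");
    }

    public static void main(String[] args) {
        String input = "a\nb\n";
        // Newlines are escaped so the positions are visible on one line.
        System.out.println(markDollar(input).replace("\n", "\\n"));      // prints a\nbX\nX
        System.out.println(markAbsoluteEnd(input).replace("\n", "\\n")); // prints a\nb\nX
    }
}
```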

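Also, putting a ? quantifier after the $ end of line/string anchor makes no sense. Java accepts it, but it only makes the anchor optional, so $? can match the empty string anywhere and no longer anchors anything. That is why your first pattern matched only up to the end of the first line.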
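
As a rough check, the sketch below compares the pattern ending in $? with the same pattern with the anchor dropped entirely. The class and method names are hypothetical; the input and patterns are the ones from the question.

```java
// Hypothetical demo class; the names are illustrative only.
class OptionalDollarDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Applies the same replacement as in the question with the given regex.
    static String strip(String regex) {
        return INPUT.replaceFirst(regex, "$1");
    }

    public static void main(String[] args) {
        String withOptionalAnchor = strip("(\\.[^:/]+).*$?");
        String withoutAnchor = strip("(\\.[^:/]+).*");
        // Both calls only remove "/path" from the first line.
        System.out.println(withOptionalAnchor.equals(withoutAnchor)); // prints true
        System.out.println(withOptionalAnchor);
    }
}
```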

If you just want to return http://www.google.com, use the s (DOTALL) flag so that . also matches line breaks:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

Output:

http://www.google.com

With the multiline (?m) flag, $ also matches at the end of each line. The regex looks for a literal . followed by a sequence of characters other than : and /. Once it reaches one of those characters, .*$ consumes the rest of that line, and the replacement drops it. Because replaceFirst is used, only the first match is replaced.

    String test_domain = "http://www.google.com/path\nline2\nline3";
    test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
    System.out.println(test_domain);

Output:

http://www.google.com
line2
line3
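
For a side-by-side comparison, the sketch below runs the same regex with no flags, with DOTALL and with MULTILINE. It uses the Pattern flag constants, which are equivalent to the inline (?s) and (?m) modifiers. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagComparisonDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";
    static final String REGEX = "(\\.[^:/]+).*$";

    // Replaces the first match with group 1 using the given Pattern flags.
    static String strip(int flags) {
        return Pattern.compile(REGEX, flags).matcher(INPUT).replaceFirst("$1");
    }

    public static void main(String[] args) {
        // No flags: no match, so the input comes back unchanged.
        System.out.println(strip(0));
        System.out.println("---");
        // DOTALL, same as (?s): '.*' runs to the end of the input, so only the first line's host remains.
        System.out.println(strip(Pattern.DOTALL));
        System.out.println("---");
        // MULTILINE, same as (?m): only "/path" is dropped; line2 and line3 stay.
        System.out.println(strip(Pattern.MULTILINE));
    }
}
```
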
Wiktor Stribiżew