I have a Java program that is supposed to read a file from a URL. (The URL location is a virtual directory under an IIS web site; below, and in my initial testing, I'm treating it like any other file system location.) Unfortunately, the path to every file that needs to be read includes a pound sign (#) in one of the directory names, and there’s nothing I can do to change that. The program works beautifully when (as a test) I point it to a location that doesn't have a pound sign in the path.
I started by creating a URL from a string passed to the program. For a file path like /Documents/#2012/09/11 (where Documents is a Windows share), I could get the program to process the file successfully if I passed it a path like this on the command line:
file://serverIPaddress/Documents/\%232012/09/07/16/DOC4671179.DOC
That is, with the pound sign manually encoded as %23, and a backslash escaping the % of the %23.
There was just one line to get that URL:
URL url = new URL(filePath); // filePath is passed in
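A minimal sketch, using the placeholder host and path from above (the class PoundSignDemo and its parse helper are hypothetical), shows why the pound sign has to be encoded before the string reaches URL: an unencoded # marks the start of the reference, so everything after it is split off the path.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: how java.net.URL treats a raw '#' versus "%23".
class PoundSignDemo {
    // Placeholder host and path taken from the question.
    static final String RAW =
        "file://serverIPaddress/Documents/#2012/09/07/16/DOC4671179.DOC";
    static final String ENCODED =
        "file://serverIPaddress/Documents/%232012/09/07/16/DOC4671179.DOC";

    // Wraps the checked exception so the helper can be used in expressions.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public static void main(String[] args) {
        URL raw = parse(RAW);
        // The raw '#' ends the path; everything after it becomes the reference.
        System.out.println(raw.getPath()); // /Documents/
        System.out.println(raw.getRef());  // 2012/09/07/16/DOC4671179.DOC

        URL encoded = parse(ENCODED);
        // With "%23" the whole path is kept and there is no reference.
        System.out.println(encoded.getPath()); // /Documents/%232012/09/07/16/DOC4671179.DOC
        System.out.println(encoded.getRef());  // null
    }
}
```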
But the program isn’t going to be spoon-fed an encoded path like that, so I had to figure out how to encode the pound sign programmatically. Following the good advice in the question "how to encode URL to avoid special characters in java", I created a URI using a multi-argument constructor (I split the parameter I had been passing to the program into three separate parameters to accommodate that change). Here’s what that looked like:
URI uri = new URI(protocol, host, filePath, null); // all values are passed in
That encoded the pound sign properly; my URI was:
file://serverIPaddress/Documents/%232012/09/07/16/DOC4671179.DOC
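For reference, here is that multi-argument constructor on its own, as a minimal sketch with the placeholder values from above (the class MultiArgUriDemo and its build helper are hypothetical). getRawPath() returns the encoded form, getPath() decodes %23 back to a literal #, and toURL() hands the encoded string to URL without encoding it again.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: the multi-argument URI constructor quotes characters
// that are illegal in a path, such as '#', and getPath() decodes them again.
class MultiArgUriDemo {
    static URI build(String protocol, String host, String filePath) {
        try {
            return new URI(protocol, host, filePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(filePath, e);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        URI uri = build("file", "serverIPaddress",
                "/Documents/#2012/09/07/16/DOC4671179.DOC");
        System.out.println(uri);              // file://serverIPaddress/Documents/%232012/09/07/16/DOC4671179.DOC
        System.out.println(uri.getRawPath()); // /Documents/%232012/09/07/16/DOC4671179.DOC
        System.out.println(uri.getPath());    // /Documents/#2012/09/07/16/DOC4671179.DOC
        // toURL() passes the already-encoded form to URL; the '#' stays in the path.
        System.out.println(uri.toURL());      // file://serverIPaddress/Documents/%232012/09/07/16/DOC4671179.DOC
    }
}
```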
But without the backslash in front of the %23, the program failed with "Connection refused", presumably because it misinterprets the path without the benefit of that backslash.
So I thought, OK, I’ll add the backslash myself. I created the same URI, extracted its raw path, and with a bit of string manipulation put a backslash in front of the %23. I then created a new URI from that new string:
URI uri = new URI(protocol, host, filePath, null); // all values are passed in
String rawPath = uri.getRawPath();
int pctPos = rawPath.indexOf("%");
String escaped = "\\"; // a single backslash character
String firstPart = rawPath.substring(0,pctPos);
String secondPart = rawPath.substring(pctPos);
String newPath = firstPart + escaped + secondPart;
URI uri2 = new URI(protocol, host, newPath, null);
However, predictably, that gave me a URI like this:
file://serverIPaddress/Documents/%5C%25232012/09/07/16/DOC4671179.DOC
with both the backslash and the % encoded. That makes sense, but it still doesn’t work at run time.
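The double encoding comes from the multi-argument constructors always quoting %, and also quoting \ because it is not a legal URI character. The single-argument form (new URI(String), or URI.create) instead parses a string that is already encoded and leaves existing escapes alone. A minimal sketch comparing the two (the class DoubleEncodingDemo and its helpers are hypothetical; the paths are the ones from above):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: multi-argument vs single-argument URI constructors
// when the input path is already percent-encoded.
class DoubleEncodingDemo {
    // The string built in the attempt above (backslash inserted before %23).
    static final String NEW_PATH = "/Documents/\\%232012/09/07/16/DOC4671179.DOC";
    // The path with '#' encoded exactly once.
    static final String ENCODED_PATH = "/Documents/%232012/09/07/16/DOC4671179.DOC";

    // Multi-argument constructor: quotes '\' as %5C and always quotes '%' as %25.
    static URI multiArg(String path) {
        try {
            return new URI("file", "serverIPaddress", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(path, e);
        }
    }

    // Single-argument form: parses the string as-is; existing escapes are kept.
    static URI singleArg(String path) {
        return URI.create("file://serverIPaddress" + path);
    }

    public static void main(String[] args) {
        System.out.println(multiArg(NEW_PATH).getRawPath());
        // /Documents/%5C%25232012/09/07/16/DOC4671179.DOC
        System.out.println(multiArg(ENCODED_PATH).getRawPath());
        // /Documents/%25232012/09/07/16/DOC4671179.DOC
        System.out.println(singleArg(ENCODED_PATH).getRawPath());
        // /Documents/%232012/09/07/16/DOC4671179.DOC
    }
}
```

In other words, once a path has been encoded, it should either go to a constructor that does not encode, or stay inside the URI object it came from, rather than being fed through a multi-argument constructor a second time.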
The URL API says:
The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC2396. It is the responsibility of the caller to encode any fields, which need to be escaped prior to calling URL
So I thought, ok, instead of creating a second URI, I’ll create a URL from that new string I generated in the last try:
URI uri = new URI(protocol, host, filePath, null); // all values are passed in
String rawPath = uri.getRawPath();
int pctPos = rawPath.indexOf("%");
String escaped = "\\"; // a single backslash character
String firstPart = rawPath.substring(0,pctPos);
String secondPart = rawPath.substring(pctPos);
String newPath = firstPart + escaped + secondPart;
URL url = new URL(protocol + "://" + host + newPath);
But with that approach, even though my new path looked right:
/Documents/\%232012/09/07/16/DOC4671179.DOC
the resulting URL comes back as:
file://serverIPaddress/Documents//%232012/09/07/16/DOC4671179.DOC
with an extra forward slash in front of the %23 instead of a backslash.
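One likely explanation, based on my reading of the JDK sources rather than on documented API (so treat it as an assumption): the file: protocol handler's parseURL replaces File.separatorChar with '/' in the spec before parsing it, as a backward-compatibility fix-up for Windows paths. On Windows, File.separatorChar is '\', so the backslash becomes a forward slash. The hypothetical class below only mimics that single step, with '\' hard-coded so it behaves the same on any OS; it does not call the real handler.

```java
// Hypothetical sketch: mimics the separator fix-up that (assumption) the JDK's
// Windows file: handler applies in parseURL. It is NOT the real handler.
class SeparatorFixupSketch {
    // On Windows, File.separatorChar is '\\'; hard-coded so the sketch is OS-independent.
    static final char WINDOWS_SEPARATOR = '\\';

    // Replaces every backslash with a forward slash, as the assumed fix-up does.
    static String fixup(String spec) {
        return spec.replace(WINDOWS_SEPARATOR, '/');
    }

    public static void main(String[] args) {
        String spec = "file://serverIPaddress/Documents/\\%232012/09/07/16/DOC4671179.DOC";
        // Prints file://serverIPaddress/Documents//%232012/09/07/16/DOC4671179.DOC
        System.out.println(fixup(spec));
    }
}
```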
And with that I’ve run out of ideas.
What makes the backslash in this last approach turn into a forward slash in the URL?
What can I do to get the URI/URL I need?
Or maybe I should ask: why does the program need the % in the %23 to be escaped in the first place, if that %23 is part of a legitimate URI or URL, and is there something I can do about that instead?
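Since the location is being treated like any other file system location, one alternative is to skip URL for the actual read and build a UNC path from the decoded URI components. The %23 is only URL syntax; the directory name on the share contains a literal #, which is what URI.getPath() returns. This is a sketch under the assumption that the program runs on Windows with access to the share; the class UncPathSketch and its toUncPath helper are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: turns file://host/path into \\host\path using the decoded path.
class UncPathSketch {
    static String toUncPath(URI uri) {
        // getPath() decodes "%23" back to '#'; the file system wants the literal character.
        String path = uri.getPath().replace('/', '\\');
        return "\\\\" + uri.getHost() + path;
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("file", "serverIPaddress",
                "/Documents/#2012/09/07/16/DOC4671179.DOC", null);
        String unc = toUncPath(uri);
        System.out.println(unc); // \\serverIPaddress\Documents\#2012\09\07\16\DOC4671179.DOC
        // On Windows, the file could then be opened with, for example,
        // java.nio.file.Files.newInputStream(java.nio.file.Paths.get(unc)).
    }
}
```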