AWS converts a space into + in the bucket/file URL, while a filename that already contains + is encoded as %2B. I am confused about how to handle this case.

When the input URL for an application is:

https://s3-us-west-2.amazonaws.com/mybucket/Pul0419_32_a+b.zip

how do I decide whether the file that actually exists is Pul0419_32_a+b.zip or Pul0419_32_a b.zip?

  • You can try to retrieve URI info `$_SERVER['QUERY_STRING']` or `$_SERVER['REQUEST_URI']` – Dolbik Apr 20 '16 at 05:30
  • @Dolbik The data is sent as `JSON` along with the `POST` request –  Apr 20 '16 at 06:06
  • If you `urldecode()` the filename from the query string, then whatever it outputs will be what the filename is called. It will convert the `%2B` back to `+` and the `+` to a space. – Rasclatt Apr 20 '16 at 06:14
  • @Rasclatt What if the file name originally had a plus sign `Pul0419_32_a+b.zip`? How do I know this? –  Apr 20 '16 at 06:39
  • Because `urldecode()` converts it for you. You don't have to worry about it. If there is a plus, the server converts it automatically using a function like `urlencode()`, so you are just left to decode it – Rasclatt Apr 20 '16 at 06:58
  • @Rasclatt `a b` is received as `a b` and `a+b` is received as `a+b`. That is the problem. I am sending a `POST` request with data as `JSON`. Example input: `{"file_path":"https://s3-us-west-2.amazonaws.com/pts/ds/MXF/TEST/a b.mxf"}` –  Apr 20 '16 at 07:09
  • Ok, I still don't see the problem here. It still seems straightforward. Can you show an example of code that illustrates further what you mean? – Rasclatt Apr 20 '16 at 07:15
  • I've been bitten by this too. We now rewrite all user uploads into a random filename. – ceejayoz Apr 21 '16 at 01:52

1 Answer

AWS enthusiast that I am, I have to concede that the original architects of S3 made an extremely unfortunate error when they decided that + in the path of a URL should be interpreted as if it were equivalent to ASCII 0x20 ("space").

The + character only carries this meaning when part of the query string. In the path, it should have been interpreted literally.

In the path of a correctly encoded and interpreted URL, + is equivalent to %2B.
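To make the difference concrete, here is a minimal sketch of the two decoding rules applied to the URL from the question. It uses Python's standard `urllib.parse` rather than PHP, on the assumption that `unquote` and `unquote_plus` behave like PHP's `rawurldecode()` and `urldecode()` respectively; the variable names are illustrative only.

```python
from urllib.parse import urlsplit, unquote, unquote_plus

url = "https://s3-us-west-2.amazonaws.com/mybucket/Pul0419_32_a+b.zip"
path = urlsplit(url).path  # "/mybucket/Pul0419_32_a+b.zip"

# Path decoding per the URL standard: "+" is an ordinary character.
literal_name = unquote(path).rsplit("/", 1)[-1]

# Form-style (query string) decoding: "+" is turned into a space.
# This is the reading S3 applies to the path, per the answer above.
form_name = unquote_plus(path).rsplit("/", 1)[-1]

# An explicitly encoded plus decodes to "+" under either rule.
encoded_plus = unquote("/mybucket/Pul0419_32_a%2Bb.zip").rsplit("/", 1)[-1]

print(repr(literal_name))
print(repr(form_name))
print(repr(encoded_plus))
```

The same input yields two different filenames depending on which rule the decoder follows, and nothing in the URL itself says which rule the sender had in mind.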

There is, then, no dependable answer to the question, because of the fundamental flaw that causes S3 to handle correct URLs incorrectly.

If a browser requested the example URL, S3 would treat the + as a space. Your interests are therefore probably best served by not transforming the URL to use %2B, and instead using it as-is when interacting with S3. The exception is if practical experience suggests that the original source of these URLs has actually interacted with S3 and did indeed transform + to %2B, but without storing the result for subsequent use with consistent encoding. In that case the argument could be made that the URLs are being provided to you wrong, but you may have to transform them anyway, for reasons that may be more political than technical.
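If you do need to guess, one pragmatic heuristic is to derive both candidate keys and check which one exists, trying S3's own reading first. The sketch below is only that, a heuristic. It assumes path-style URLs of the form `/<bucket>/<key>`, and `candidate_keys`, `resolve_key` and the `key_exists` callback are hypothetical names, with `key_exists` standing in for whatever existence check your S3 client provides.

```python
from urllib.parse import urlsplit, unquote, unquote_plus


def candidate_keys(url):
    """Return the possible object keys for a path-style URL, S3's reading first."""
    # Assumption: path-style URL, i.e. /<bucket>/<key>
    path = urlsplit(url).path.lstrip("/")
    _bucket, _, raw_key = path.partition("/")
    s3_reading = unquote_plus(raw_key)   # "+" treated as a space
    literal_reading = unquote(raw_key)   # "+" kept as a plus sign
    keys = [s3_reading]
    if literal_reading != s3_reading:
        keys.append(literal_reading)
    return keys


def resolve_key(url, key_exists):
    """Return the first candidate key for which key_exists(key) is true, else None.

    key_exists is a hypothetical callback supplied by the caller.
    """
    for key in candidate_keys(url):
        if key_exists(key):
            return key
    return None


if __name__ == "__main__":
    demo_url = "https://s3-us-west-2.amazonaws.com/mybucket/Pul0419_32_a+b.zip"
    # Pretend only the literal-plus object exists in the bucket.
    stored = {"Pul0419_32_a+b.zip"}
    print(resolve_key(demo_url, lambda key: key in stored))
```

This does not remove the ambiguity: if both `a b.zip` and `a+b.zip` exist in the bucket, the order of the candidates decides, and that order is a policy choice rather than something the URL can tell you.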

But, as it appears you already suspect, the answer is less than straightforward.

Michael - sqlbot