I am trying to load a JSON file and do some analysis in R.

The JSON file contains parts like this:

 '{"property":"blabla \"some goofy name\" more blabla"}'

So the string value of a property contains a pair of escaped double quotes. This should be valid JSON, unless I am mistaken.

The problem is that to parse it with jsonlite or any other library, I need to assign it to a string variable in R, like this:

 a = '{"property":"blabla \"some goofy name\" more blabla"}'

But then, if I type a and press Enter, I get this back:

[1] "{\"property\":\"blabla \"some goofy name\" more blabla\"}"

This means the existing \" sequences have become indistinguishable from the plain " characters, so I can't even replace them with a regular expression. If I feed this to any JSON parsing library, I get errors about invalid characters and the like.

Is there any way to 'catch' those nasty \" instances before R treats them the same as a plain ", so that I can eliminate the \" and continue with the JSON parsing?

The difference from a similar issue is that the inner quotes are already escaped, forming valid JSON. My ultimate challenge is to parse this JSON: http://next.openspending.org/api/3/cubes/ba94aabb80080745688ad38ccad9bfea:at-austria-at11-burgenland/facts?pagesize=30

  • This is not an exact duplicate, as the OP of the linked question has an input that is not escaped, or at least, it doesn't seem to be so. If you assign `[{"id":"484","comment":"They call me \"Bruce\""}]` to a variable, you can see the same problem. In the selected answer, it is suggested like `fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")` which is not practical for me, as I neither write the JSON by hand nor can I assign it to a variable to do the replacement. – ʌἁɀɑƿѻѕ ɪɷɑⱴⱴἱɗʜѕ Nov 21 '16 at 13:10
  • I'm not sure I've understood: do you need to delete the couple of characters \" ? or do you need to have them still after the parsing? :) – Ale Nov 21 '16 at 13:16
  • I would go either way. The original problem is that I can't find a way to parse this JSON: http://next.openspending.org/api/3/cubes/ba94aabb80080745688ad38ccad9bfea:at-austria-at11-burgenland/facts?pagesize=30 It is supposed to be a valid one, but there are two issues: 1. Carriage returns and new lines which I can easily replace with a regular expression ([\r\n]) 2. Already escaped double quotes. This is where I am out of luck so far. It would be acceptable to just get rid of the inner quotes, but if they can be left untouched that would be better – ʌἁɀɑƿѻѕ ɪɷɑⱴⱴἱɗʜѕ Nov 21 '16 at 13:26

1 Answer

Updated answer following the OP's update

I may still not have fully understood what you want to accomplish, so let me know if this is not your intended output. I didn't deal with the newline characters in your file since they don't seem relevant here. Your file contains strings such as "\"Bienenkorb\"", as you described.

url <- "http://next.openspending.org/api/3/cubes/ba94aabb80080745688ad38ccad9bfea:at-austria-at11-burgenland/facts?pagesize=30"
parsed <- jsonlite::fromJSON(url)
print(parsed$data$activity_project_id.project_name[3])
#[1] "Neugestaltung und\nModernisierung des\nRestaurants \"Bienenkorb\""
cat(parsed$data$activity_project_id.project_name[3])
#Neugestaltung und
#Modernisierung des
#Restaurants "Bienenkorb"

If you want to assign it to a string and then parse it, you can do s <- readLines(url); parsed <- jsonlite::fromJSON(s).
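To illustrate why reading the text in works while pasting it into a script does not, here is a minimal sketch. It assumes the jsonlite package is installed, and it uses a made-up one-line sample written to a temporary file in place of the real URL. The backslashes are only lost when the JSON is typed as an R string literal; when the same text arrives as data through readLines, they are kept and the parser accepts it.

```r
# Hypothetical sample mirroring the JSON in the question. It is written to a
# temporary file so that the backslashes reach R as data, not as escapes
# inside a string literal. In the literal below, \\ produces one backslash.
path <- tempfile(fileext = ".json")
writeLines('{"property":"blabla \\"some goofy name\\" more blabla"}', path)

# Typed directly as a literal, \" collapses to a bare double quote,
# so the backslashes never make it into the string.
lit <- '{"property":"blabla \"some goofy name\" more blabla"}'

# Read from the file, the backslashes are preserved.
s <- paste(readLines(path), collapse = "\n")

has_backslash_lit <- grepl("\\", lit, fixed = TRUE)  # no backslash survived
has_backslash_s   <- grepl("\\", s, fixed = TRUE)    # backslashes are intact

# The string read from the file is valid JSON and parses as expected.
parsed <- jsonlite::fromJSON(s)
cat(parsed$property, "\n")
```

If the JSON really does have to be pasted into a script, R 4.0.0 and later also accept raw string literals of the form r"(...)", which leave the backslashes untouched.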

konvas
  • This specific sample works, but you have manually added the escape back-slashes to the inner quotes. You have `\\'` instead of `\'` which is my input. Anyway, here is a test file (I have updated the original question to include that): http://next.openspending.org/api/3/cubes/ba94aabb80080745688ad38ccad9bfea:at-austria-at11-burgenland/facts?pagesize=30 Keep in mind that you also have to remove [\r\n] from this – ʌἁɀɑƿѻѕ ɪɷɑⱴⱴἱɗʜѕ Nov 21 '16 at 22:16
  • I've updated my answer, let me know if this works for you - if not can you also provide your desired output please. – konvas Nov 22 '16 at 09:08