I'm just getting started with web scraping. My first goal is to retrieve my personal list of rated movies from moviepilot.de.

To do this I need to access the following page: http://www.moviepilot.de/users/schlusie/rated/movies. However, the page is not accessible without authentication.

I've read that the httr package can handle this: you create a handle for the site with handle(), log in through that handle with your login information, and then request the desired page with the same handle. It should look something like this:

library(httr)
mp = handle("http://moviepilot.de")
# authentication step
GET(handle=mp, path="/users/schlusie/rated/movies")

This is the login-page: http://www.moviepilot.de/login

Can someone please give me any pointers?

schlusie
  • Technically this is not a dup of [How do I use cookies with RCurl?](http://stackoverflow.com/questions/2388974/how-do-i-use-cookies-with-rcurl), since you're using `httr` and not `RCurl` directly (`httr` is pretty much an `RCurl` wrapper). Take a look at that SO post and see if you can retrofit it for your needs. – hrbrmstr Apr 13 '14 at 10:54
  • With `httr`, you don't need to do anything to have cookies preserved across requests, it does it by default. To figure out what request you need to send to login, you'll need to inspect the html or use browser debug features. – hadley Apr 14 '14 at 14:56
  • Thanks. I'm kind of a newbie at this. Do you know of any tutorials or helpers that show how to inspect what I should `POST` to access the page? – schlusie Apr 14 '14 at 15:53
  • I suggest using the developer tools (e.g. in Google Chrome > View > Developer > Developer Tools). Under Network you can observe the requests that are sent when you log in. Did you have any luck with your project? I am also trying a similar task. – Verena Haunschmid Jan 23 '16 at 10:36
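
Putting the comments above together, here is a rough sketch of what the missing authentication step might look like with `httr`. It is not tested against moviepilot.de. The form field names `username` and `password` are pure assumptions, and `form_field_names()` and `fetch_rated_movies()` are hypothetical helpers written for this example. The first helper lists the `<input>` names found in a page so you can check what the real login form expects; it uses the `xml2` package for parsing.

```r
library(httr)
library(xml2)

# Hypothetical helper: return the `name` attribute of every <input> inside
# a <form> in the given HTML, so you can see which fields a login form expects.
form_field_names <- function(html) {
  doc <- read_html(html)
  inputs <- xml_find_all(doc, "//form//input")
  field_names <- xml_attr(inputs, "name")
  field_names[!is.na(field_names)]
}

# Hypothetical helper: log in, then request the protected page with the
# same handle so the session cookie set at login is sent along.
fetch_rated_movies <- function(username, password) {
  mp <- handle("http://www.moviepilot.de")

  # Step 1: look at the login form to find the real field names.
  login_page <- GET(handle = mp, path = "/login")
  print(form_field_names(content(login_page, as = "text")))

  # Step 2: submit the credentials.
  # ASSUMPTION: "username" and "password" are placeholders; replace them
  # with the names printed above. Some sites also require a hidden token
  # field from the form, which is not handled here.
  POST(handle = mp, path = "/login",
       body = list(username = username, password = password),
       encode = "form")

  # Step 3: the handle keeps the cookies, so this request is authenticated
  # if the login succeeded.
  GET(handle = mp, path = "/users/schlusie/rated/movies")
}
```

Calling `fetch_rated_movies("me", "secret")` would return an `httr` response object; checking `status_code()` on it is a quick way to see whether the login worked. If the printed field names differ from the placeholders, or the browser's Network tab shows additional form data in the login request, adjust the `body` list accordingly.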
