
Following the very clear explanations in this blog post about "Logging in With Requests" and the code snippet from this answer to the SO question 'How to “log in” to a website using Python's Requests module?', I have the following code (*) to log in to and navigate a website that requires authentication:

import requests, lxml.html

logurl = 'http://www.somesite.fr/subsite/'
url2 = 'http://www.somesite.fr/subsite/anotherpath/1135'

with requests.session() as s:
    login = s.get(logurl)
    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    form['email'] = 'myemail'
    form['password'] = 'mypassword'

    response = s.post(logurl, data=form)

    r2 = s.get(url2)

If I print form:

{'form_action': 'connexion', 
 'CSRFGuard_token': '762bd944c74e4194db5248279a80bc3eba8e417f0439af2701364e39c0e4b67376c0afc19ba05f2b8fd98ce3b14ac9625d59827b19f2134b4da98c43bef2b57a', 
 'password': 'mypassword', 
 'email': 'myemail'}

With r2 = s.get(url2), I am trying to navigate within this website after authentication. url2 is the URL I reach when I navigate "manually" after logging in at logurl, and the HTML (and appearance) of these two pages is quite different. But if I print response.text and r2.text, I get exactly the same HTML, namely that of the login page. I conclude that the login was not successful, or that the session does not keep the logged-in state...
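One way to check this programmatically rather than by comparing the HTML by eye is to test whether the returned page still contains a password field. The following is only a minimal sketch using the standard library; the class and function names are made up for illustration, and the heuristic assumes that pages behind the login do not themselves contain a password input.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Records whether the page contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a page that still shows a password field is probably the login form."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    return finder.found
```

With the session code above, `looks_like_login_page(r2.text)` returning `True` would suggest that the server still treats the session as anonymous.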

What am I doing wrong? Thanks!


EDIT

Running the code suggested by Brian M. Sheldon:

import logging
import requests

# enable debug logging with basic logging config
logging.basicConfig(level=logging.DEBUG)

with requests.session() as s:
    s.headers['user-agent'] = 'myapp'  # use non-default user-agent
    response = s.post(logurl, data={'email': 'myemail', 'password': 'mypassword'})
    print(response.headers)

DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.somesite.fr

DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "POST /subsite/ HTTP/1.1" 200 1415

and response.headers is:

{'Content-Length': '1415', 'Content-Encoding': 'gzip', 'Set-Cookie': 'PHPSESSID=741q7fj6pnkdl1ho4pr6s35cl1; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Vary': 'Accept-Encoding,Origin', 'Keep-Alive': 'timeout=5, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Date': 'Tue, 25 Apr 2017 14:57:52 GMT', 'Content-Type': 'text/html; charset=UTF-8'}

s.cookies is:

<RequestsCookieJar[<Cookie PHPSESSID=t9t9gvt7enp70v5mb2viebr8v0 for www.somesite.fr/>]>

and s.get(url2) gives:

DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "GET /subsite/anotherpath/1135 HTTP/1.1" 200 1378

Does it help to understand what I am doing wrong?


PS: apparently this field has moved fast in recent years, and some answers from a few years ago already appear obsolete or superseded by better options. From what I have read, Requests seems to be the best tool for what I want, but other solutions are welcome too. If I forgot some useful information, please let me know and I'll edit.

(*) I am sorry, but my problem concerns a website that requires authentication, so I cannot give a reproducible example.

ztl

1 Answer


Without more information, a more specific answer is not possible. The first thing I would check is whether the authentication is returned in a header; the headers are available in response.headers. The second request is most likely failing because the session isn't providing the required authentication, so the site is redirecting you to the login URL. If you enable debug logging, you can see whether the request is being redirected. Additionally, some websites block requests that use the default requests user-agent, so setting a custom user-agent may help. The whole lxml section is also likely unnecessary. Try the following to get more details on what is actually happening so we can assist further:

import logging
import requests

# enable debug logging with basic logging config
logging.basicConfig(level=logging.DEBUG)  

logurl = 'http://www.somesite.fr/subsite/'

with requests.session() as s:
    s.headers['user-agent'] = 'myapp'  # use non-default user-agent
    response = s.post(logurl, data={'email': 'myemail', 'password': 'mypassword'})
    print(response.headers)
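
Besides reading the debug log, the redirect chain can be inspected on the response object itself: requests stores intermediate redirect responses in `response.history`. The helper below is a small sketch (the name `describe_redirects` is made up here) that only relies on the `status_code`, `url` and `history` attributes.

```python
def describe_redirects(response):
    """Return one line per hop: each redirect in response.history, then the final response."""
    hops = list(response.history) + [response]
    return ["{} {}".format(r.status_code, r.url) for r in hops]
```

For example, `print("\n".join(describe_redirects(s.get(url2))))` inside the session block. A single `200` line would mean the server answered url2 directly without redirecting, so the login page would be coming from url2 itself.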
  • Thanks a lot for your help! Could there be a typo in `login.post` --> `s.post`? The `print response.headers` gives: `{'Content-Length': '1416', 'Content-Encoding': 'gzip', 'Set-Cookie': 'PHPSESSID=9mrmk3h131hflg40i1o0v5hc97; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Vary': 'Accept-Encoding,Origin', 'Keep-Alive': 'timeout=5, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Date': 'Tue, 25 Apr 2017 14:53:39 GMT', 'Content-Type': 'text/html; charset=UTF-8'} ` – ztl Apr 25 '17 at 14:58
  • Thanks - I edited my question to show what your code produces because I still don't understand the problem... – ztl Apr 25 '17 at 15:10
  • It looks like there's a PHP session ID cookie. See if the session contains the cookie (`s.cookies`). Also try a followup request to url2 and see if the logging shows any redirects. – Brian M. Sheldon Apr 25 '17 at 15:23
  • Thank you very much for your help. It seems there is a cookie, and I don't see a redirect for url2 (see the edited question for the output), unless I am failing to make use of the information provided? – ztl Apr 25 '17 at 15:32
  • Also it may not hurt to try printing the response headers and cookies from your original code to see if the action and token are necessary in the post data. If you notice the output from my code, the expiration time has already passed so the action and token may be required. – Brian M. Sheldon Apr 25 '17 at 15:34
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/142639/discussion-between-brian-m-sheldon-and-ztl). – Brian M. Sheldon Apr 25 '17 at 15:37
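
On the question of whether the action and token are needed: a common pattern is that the CSRF token is tied to the PHPSESSID cookie, so the token must be fetched and posted within the same session, and the form may post to its own `action` URL rather than to the page URL. As far as I know, the `Expires: Thu, 19 Nov 1981 08:52:00 GMT` header is the fixed value PHP sends with its default no-cache session settings, so by itself it does not indicate an expired session. The sketch below, standard library only, collects the hidden fields and resolves the form's action. The names `LoginFormParser` and `build_login_request` are hypothetical, and whether this site's form has a separate `action` is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collects the first form's action and the name/value pairs of its hidden inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif (tag == "input" and self._in_form and attrs.get("name")
              and (attrs.get("type") or "").lower() == "hidden"):
            # A hidden input without a value attribute is submitted as an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form on the page is considered


def build_login_request(page_url, html, credentials):
    """Return (post_url, payload) for the first form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    # A missing or empty action means the form posts back to the page itself
    post_url = urljoin(page_url, parser.action or "")
    payload = dict(parser.fields)
    payload.update(credentials)
    return post_url, payload
```

Inside the same `requests` session this could be used as `post_url, payload = build_login_request(login.url, login.text, {'email': 'myemail', 'password': 'mypassword'})` followed by `s.post(post_url, data=payload)`, so that the token and the session cookie come from the same visit.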