I am new to Python and web scraping, and I am trying to write a very basic script that gets data from a webpage that can only be accessed after logging in. I have looked at a bunch of different examples, but none of them fix the issue. This is what I have so far:

from bs4 import BeautifulSoup
import urllib, urllib2, cookielib

username = 'name'
password = 'pass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()

Right now, when I print the page, it just prints the contents as if I were not logged in. I think the issue has something to do with the way I am setting the cookies, but I am not sure because I do not fully understand what the cookie processor and its libraries are doing. Thank you!
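
On the cookie processor: `HTTPCookieProcessor` stores the cookies that responses set in the jar and sends them back on later requests made through the same opener. In the script above, `login_data` is built but never passed to `opener.open`, so the login page is only fetched with GET and the credentials never reach the server. The following is a minimal sketch of the same idea ported to Python 3, where `urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`. The helper names are made up for illustration, and the `username`/`password` field names are assumptions; a real login form may expect different or additional hidden fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_with_cookies():
    """Return (opener, jar); cookies set by any response are kept in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields as bytes; passing bytes as data= makes a POST."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_fetch(login_url, protected_url, username, password):
    """Post the credentials, then fetch a page with the same opener."""
    opener, _jar = build_opener_with_cookies()
    # Field names are assumptions; inspect the real form to confirm them.
    form = encode_form({"username": username, "password": password})
    # data= turns the request into a POST that carries the credentials.
    with opener.open(login_url, data=form) as login_response:
        login_response.read()
    # Same opener, so cookies stored during the login are sent automatically.
    with opener.open(protected_url) as page:
        return page.read().decode("utf-8", errors="replace")
```

Calling `login_and_fetch('WebpageWithLoginForm', 'WebpageIWantToAccess', username, password)` with real URLs would return the HTML that can then be handed to `BeautifulSoup`.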

Current Code:

import requests
import sys

EMAIL = 'usr'
PASSWORD = 'pass'

URL = 'https://connect.lehigh.edu/app/login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.session(config={'verbose': sys.stderr})
    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')

if __name__ == '__main__':
    main()
Aaron Rotem
  • Possible duplicate of [Login to website using python](http://stackoverflow.com/questions/8316818/login-to-website-using-python) – Harrison Aug 01 '16 at 18:36

1 Answer

You can use the requests module.

Take a look at the answer I've linked below.

https://stackoverflow.com/a/8316989/6464893
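
A minimal sketch of that approach, assuming the login form expects the `username`, `password`, and `LOGIN` fields used in the question (the real form may need other or hidden fields). The helper name `fetch_after_login` is hypothetical. A `requests.Session` keeps cookies between calls, and in current requests releases `requests.Session()` is created without arguments, so no `config=` keyword is passed.

```python
def fetch_after_login(session, login_url, form_data, protected_url):
    """Log in through `session`, then return the protected page's HTML."""
    login_response = session.post(login_url, data=form_data)
    login_response.raise_for_status()  # stop early on an HTTP error status
    page = session.get(protected_url)  # cookies from the login are reused
    page.raise_for_status()
    return page.text


def main():
    # requests is third-party (pip install requests). It is imported here so
    # the helper above can also be exercised with a stand-in session object.
    import requests

    # Placeholder credentials and field names taken from the question.
    form_data = {"username": "usr", "password": "pass", "LOGIN": "login"}
    with requests.Session() as session:  # note: no config= argument
        html = fetch_after_login(
            session,
            "https://connect.lehigh.edu/app/login",
            form_data,
            "https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl",
        )
    print(html)
```

To run it as a script, call `main()` under the `if __name__ == '__main__':` guard, as in the question. A successful HTTP status does not prove the login worked, so it is worth checking the returned HTML for something only a logged-in user would see.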

Harrison
  • Can I call the method main() without the lines `if __name__ = __main__`? – Aaron Rotem Aug 01 '16 at 18:43
  • You don't have any functions defined in your script so far, let alone a main function. It's good practice to group similar blocks of code into functions. Also, why don't you want to use `if __name__ == '__main__':`? – Harrison Aug 01 '16 at 18:44
  • What does the line mean exactly? `if __name__ == '__main__': main()` – Aaron Rotem Aug 01 '16 at 18:46
  • It means that anything found in that block will be executed first when you run your script. If my answer helped you, feel free to mark it as an answer. :) – Harrison Aug 01 '16 at 18:47
  • Once I get it working I will mark it as an answer AND upvote it. I keep getting this error: `session = requests.session(config={'verbose': sys.stderr}) TypeError: session() takes no arguments (1 given)`. What am I doing wrong? – Aaron Rotem Aug 01 '16 at 18:50
  • I don't know. I don't have your code in front of me to look at. – Harrison Aug 01 '16 at 19:02
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/118843/discussion-between-aaron-rotem-and-hleggs). – Aaron Rotem Aug 01 '16 at 19:06
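
To make the `if __name__ == '__main__':` exchange in the comments concrete: Python sets the module-level variable `__name__` to `'__main__'` when a file is run directly, and to the module's own name when the file is imported. The guarded block is not run "first"; it runs when execution reaches it, and only when the file is run directly. `main()` can be called without the guard, but then it also runs on every import. A small sketch, with illustrative function names:

```python
def main():
    """Entry point; returns a message so the behavior is easy to check."""
    return "main() ran"


def run_if_script(module_name):
    """Mimic the guard: call main() only when the name is '__main__'."""
    if module_name == "__main__":
        return main()
    return None


# The usual form of the guard. Without it, a bare main() call at the bottom
# of the file would also run whenever another module imports this file.
if __name__ == "__main__":
    print(main())
```

Here `run_if_script("__main__")` calls `main()`, while `run_if_script("some_module")` does nothing, which mirrors what happens when the file is run versus imported.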