I recently took up Python and decided to start my first project, which involves scraping my university's website. Right now I am stuck because I can't get past the login page. I am facing the exact same issue described in this question.
From my limited understanding, and per the last comment posted by @t.m.adam, it seems that I need to use Inspect Element on the login page, find the 11th tag, and parse the JS code with a regex. I am pretty much lost, though, since the 11th tag looks nothing like a hex string.
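To check that I understand the suggestion, here is a rough sketch of what I think it means: collect the inline scripts on a page and search them for a quoted hex string with a regex. It uses only the standard library. `ScriptCollector`, `find_hex_strings`, the 16-character minimum length, and the sample HTML are all my own placeholders, not anything taken from the real login page.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element, in document order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies can arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Assumption: the value of interest is a quoted run of 16 or more hex digits.
HEX_STRING = re.compile(r"""["']([0-9a-fA-F]{16,})["']""")


def find_hex_strings(html, script_index=None):
    """Return hex-looking string literals found in inline scripts.

    script_index is 0-based; None means "search every script".
    """
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    scripts = parser.scripts
    if script_index is not None:
        # Slicing yields an empty list when the index is out of range.
        scripts = scripts[script_index:script_index + 1]
    return [match for js in scripts for match in HEX_STRING.findall(js)]


# Made-up page for illustration only; the real login page will differ.
sample = """
<html><head>
<script>var a = 1;</script>
<script>document.forms[0].token.value = "3f9a7c0e5b2d48a1c6e0f4b79d12a3ce";</script>
</head></html>
"""
print(find_hex_strings(sample))
```

If the tag in question really is the 11th `<script>` tag (my assumption), the call would be `find_hex_strings(html, script_index=10)` on the login page's HTML. Searching every script with `script_index=None` might be more robust if the position of the tag changes.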
I am posting my code below for reference:
import requests
from bs4 import BeautifulSoup
# all cookies received will be stored in the session object
s = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://student.cc.uoc.gr',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Referer': 'https://student.cc.uoc.gr/login.asp?mnuID=student&autologoff=1',
    'Upgrade-Insecure-Requests': '1',
}
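data = {
    'userName': '*****',
    'pwd': '*****',
    # requests percent-encodes form values itself, so pass the raw bytes here;
    # the string '%C5%DF%F3%EF%E4%EF%F2' would be sent double-encoded ('%25C5%25DF...')
    'submit1': b'\xc5\xdf\xf3\xef\xe4\xef\xf2',
    'loginTrue': 'login',
}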
# apply the headers to every request made through the session
s.headers.update(headers)
# load the site first so the session picks up the initial cookies
page = s.get('https://student.cc.uoc.gr')
# submit the login form
login = s.post('https://student.cc.uoc.gr/login.asp', data=data)
home_page = s.get('https://student.cc.uoc.gr/studentMain.asp')
# the page I actually want to scrape
target = s.get('https://student.cc.uoc.gr/stud_CResults.asp')
soup = BeautifulSoup(target.content, 'lxml', from_encoding='utf8')
print(soup.text)