The HTML form code for the site:
<form class="m-t" role="form" method="POST" action="">
<div class="form-group text-left">
<label for="username">Username:</label>
<input type="text" class="form-control" id="username" name="username" placeholder="" autocomplete="off" required />
</div>
<div class="form-group text-left">
<label for="password">Password:</label>
<input type="password" class="form-control" id="pass" name="pass" placeholder="" autocomplete="off" required />
</div>
<input type="hidden" name="token" value="/bGbw4NKFT+Yk11t1bgXYg48G68oUeXcb9N4rQ6cEzE=">
<button type="submit" name="submit" class="btn btn-primary block full-width m-b">Login</button>
</form>
Simple enough so far. I've scraped a number of sites in the past without issue.
I have tried: selenium, mechanize (albeit I had to drop back to an earlier version of Python), mechanicalsoup, and requests.
I have read multiple posts here on SO, as well as https://kazuar.github.io/scraping-tutorial/, http://docs.python-requests.org/en/latest/user/advanced/#session-objects, and many, many more.
Sample code:
import requests
from lxml import html

# url / url3 and username / password are defined elsewhere (see note at the end)
session_requests = requests.session()

# GET the login page and scrape the hidden token out of the form
result = session_requests.get(url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='token']/@value")))[0]

# field names taken from the form above ('pass', not 'password')
payload = {
    'username': username,
    'pass': password,
    'token': authenticity_token,
    'submit': '',  # the button's name, in case the server checks for it
}
# POST the credentials on the same session, then fetch the page I actually want
result = session_requests.post(
    url,
    data=payload,
    headers=dict(referer=url)
)
result = session_requests.get(url3)
print(result.text)
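For debugging, I assume a check along these lines on the final response would show whether the session ever authenticated (the token-field test is just my guess at a tell-tale for "still on the login page"):

# rough sanity check on the last response (the GET of url3)
print(result.status_code)                   # 200 either way, so not conclusive on its own
print(result.history)                       # any redirects that GET followed (e.g. back to the login page)
print(session_requests.cookies.get_dict())  # did the server ever hand back a session cookie?
if 'name="token"' in result.text:
    print("still looking at the login form")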
I also tried mechanicalsoup:
import mechanicalsoup
import requests
from http import cookiejar

# share an explicit cookie jar with the session mechanicalsoup uses
c = cookiejar.CookieJar()
s = requests.Session()
s.cookies = c
browser = mechanicalsoup.Browser(session=s)

# fetch the login page and fill in the form (the field is 'pass', not 'password')
login_page = browser.get(url)
login_form = login_page.soup.find('form', {'method': 'POST'})
login_form.find('input', {'name': 'username'})['value'] = username
login_form.find('input', {'name': 'pass'})['value'] = password

# submit relative to the page URL; the hidden token input is already in the form
response = browser.submit(login_form, login_page.url)
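I understand the newer StatefulBrowser API is the recommended way to do this now, so for completeness this is roughly the equivalent I would expect to work (the hidden token should be carried over automatically because it is part of the selected form; I'm assuming the login form is the only form on the page):

import mechanicalsoup

# sketch using the newer StatefulBrowser API
browser = mechanicalsoup.StatefulBrowser()
browser.open(url)
browser.select_form('form')   # assumes the login form is the first/only form
browser['username'] = username
browser['pass'] = password
response = browser.submit_selected()
print(response.status_code)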
Try as I might, every attempt just returns the HTML of the login page, and I don't know where to look next to figure out what isn't happening and why.
url is a variable that holds the login page URL; url3 is a page I want to scrape.
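The next thing I was planning to try is dumping exactly what requests is about to send so I can compare it against the browser's POST in devtools; I assume something like this is the way to do that:

# build the POST by hand so the exact headers and body can be inspected
# before sending, then compare them with the browser's request in devtools
req = requests.Request('POST', url, data=payload, headers=dict(referer=url))
prepared = session_requests.prepare_request(req)
print(prepared.headers)
print(prepared.body)
result = session_requests.send(prepared)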
Any help would be much appreciated!