
What I'm trying to do is create a Beautiful Soup object from the HTML of a web page after logging into the site.

I'm using Selenium to log into the site, which works fine, but when I try to create a BeautifulSoup object I get the HTML of the login page instead.

Is it possible to create a soup object when the site requires a log in?

My code is below:

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import logging
import requests

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

logging.debug('Start of Program')

browser = webdriver.Chrome()
email = 'REDACTED'
password = 'REDACTED'
browser.get('http://tetra.delbridge.solutions/loginRequired.php')
ua = UserAgent()

def lovely_soup(url):
    r = requests.get(url, headers = {'User-Agent': ua.chrome})
    r.raise_for_status()
    return BeautifulSoup(r.text, 'html.parser')

# Log into tetra
loginButton = browser.find_element_by_css_selector('a')
loginButton.click()

emailElem = browser.find_element_by_css_selector('input')
emailElem.send_keys(email)
nextButton = browser.find_element_by_id('identifierNext')
nextButton.click()


passwordElem = WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.NAME,'password')))
logging.debug(passwordElem)
passwordElem.send_keys(password)
passwordNext = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.ID,'passwordNext')))
logging.debug(passwordNext)
passwordNext.click()


#TODO: Get the hours for the week and save those hours to a csv file
# Poll until the timesheet table is present (a WebDriverWait on the id
# 'maintable' would be the more idiomatic way to do this)
Elem = None
while not Elem:
    try:
        Elem = browser.find_element_by_id('maintable')
        current_url = browser.current_url
        logging.debug('TEST: ' + current_url)
    except Exception:  # element not rendered yet; keep polling
        continue

soup = lovely_soup(current_url)
test = soup.select('#maintable')
logging.debug('TEST' + str(soup))
logging.debug(current_url)
#TODO: Take the above csv file and change the format to match the upload to Vena

logging.debug('End of Program')
Stephen
  • Use Selenium to log in and then soupify the page's HTML from `driver.page_source`. Otherwise, BS4 sees the page as a completely different session/user who has not logged in yet (or an error). See this other SO answer: [How can I parse a website using Selenium and Beautifulsoup in python?](https://stackoverflow.com/a/13960516/1431750) – aneroid May 14 '20 at 17:56
  • So essentially, after the step `passwordNext.click()` and the try-except block, do `soup = BeautifulSoup(browser.page_source)`. You don't need the `lovely_soup()` function, since that attempts to get the page as a separate request. – aneroid May 14 '20 at 18:01
  • Instead of `soup = lovely_soup(current_url)` use `soup = BeautifulSoup(browser.page_source)` and you don't need `lovely_soup()`. – KunduK May 14 '20 at 18:20
  • Thank you that worked. – Stephen May 14 '20 at 18:22
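As the comments explain, the fix is to hand Selenium's own rendered HTML to Beautiful Soup instead of re-fetching the URL with `requests`, which starts a fresh, logged-out session. A minimal runnable sketch — the `page_source` string below is a stand-in literal for what `browser.page_source` returns from the logged-in driver:

```python
from bs4 import BeautifulSoup

# In the real script this string comes from browser.page_source, which
# returns the page HTML exactly as the logged-in Selenium session
# currently sees it -- no second HTTP request involved.
page_source = ('<html><body>'
               '<table id="maintable"><tr><td>40</td></tr></table>'
               '</body></html>')

soup = BeautifulSoup(page_source, 'html.parser')
hours = [td.text for td in soup.select('#maintable td')]
print(hours)  # ['40']
```

With the real driver, the single line `soup = BeautifulSoup(browser.page_source, 'html.parser')` placed after the `#maintable` polling loop replaces the `lovely_soup(current_url)` call.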
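If a separate `requests` call is ever genuinely needed (say, to download a CSV export directly), another option — not mentioned in the comments, so treat it as a sketch — is to copy the driver's cookies into a `requests.Session` so the second request shares the login. The helper name `soup_via_requests` is hypothetical:

```python
import requests
from bs4 import BeautifulSoup

def soup_via_requests(driver, url):
    """Hypothetical helper: fetch `url` with requests while reusing the
    cookies of an already-logged-in Selenium `driver`, so the server
    sees the same authenticated session."""
    session = requests.Session()
    # driver.get_cookies() returns a list of dicts with at least
    # 'name', 'value' and usually 'domain' keys.
    for cookie in driver.get_cookies():
        session.cookies.set(cookie['name'], cookie['value'],
                            domain=cookie.get('domain'))
    r = session.get(url)
    r.raise_for_status()
    return BeautifulSoup(r.text, 'html.parser')
```

This only works for server-rendered pages; content injected by JavaScript after load still requires `browser.page_source`.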

0 Answers