I am trying to use rvest to log in to a website that requires a username and password.
I am using this cheat sheet as a resource, since I found it very helpful: https://awesomeopensource.com/project/yusuzech/r-web-scraping-cheat-sheet#rvest7.5
When I submit the login form, I get an HTTP 404 warning and cannot go on to read any of the HTML on the page:
Submitting with 'NULL'
Warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode, :
Not Found (HTTP 404).
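An HTTP 404 means the server found nothing at the URL the request was sent to. That rules out wrong credentials as the direct cause: for a form submission, the POST target comes from the form's action attribute, resolved against the page URL. The sketch below shows how that target could be checked with xml2. The form markup, the /account/login path and the field names are hypothetical placeholders, not the real site's; for the real page you would parse xml2::read_html(url) instead of the literal string.

```r
library(xml2)

# Hypothetical login page markup; for the real site use read_html(url) instead.
page <- read_html('
<html><body>
  <form id="login" method="post" action="/account/login">
    <input type="text" name="email">
    <input type="password" name="passwd">
    <input type="submit" value="Log in">
  </form>
</body></html>')

form_node <- xml_find_first(page, "//form")

# The action attribute decides which URL the credentials are POSTed to,
# and the method attribute decides the HTTP verb.
action <- xml_attr(form_node, "action")
method <- toupper(xml_attr(form_node, "method"))

# Resolve a relative action against the page URL, as a browser would.
target <- url_absolute(action, "https://www.mywebsite.com/")
cat(method, target, "\n")
```

If the resolved target is not a URL that exists on the site, or the form has no action at all, that would be a likely place for the 404 to come from.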
Can anyone who understands HTML help me work out whether I am passing the right fields to the login form?
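One way to check the fields is to list the names rvest parsed from the form, because the arguments passed when filling the form must match the inputs' name attributes exactly (a page may call them email and passwd rather than username and password). The sketch below assumes rvest 1.0.0 or later (for minimal_html() and html_form_set()) and uses made-up markup, so email, passwd and csrf_token are illustrative placeholders only.

```r
library(rvest)

# Hypothetical markup: the real site's field names are unknown, so these are placeholders.
page <- minimal_html('
  <form id="login" method="post" action="/account/login">
    <input type="text" name="email">
    <input type="password" name="passwd">
    <input type="hidden" name="csrf_token" value="abc123">
    <input type="submit" name="submit" value="Log in">
  </form>')

login_form <- html_form(page)[[1]]

# The names used when filling the form must match these exactly.
field_names <- names(login_form$fields)
field_types <- vapply(login_form$fields, function(f) f$type, character(1))
print(data.frame(name = field_names, type = field_types, row.names = NULL))

# Fill in the visible fields by their real names; the hidden token keeps
# its pre-filled value and is normally submitted along with the form.
filled_form <- html_form_set(login_form, email = "myUsername", passwd = "myPassword")
```

Hidden inputs such as a token are worth noticing in this listing, since some sites reject a login that does not echo them back.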
My code looks as follows:
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
# LOAD LIBRARIES
pacman::p_load(rvest,purrr,xml2,dplyr,stringr)
# TARGET URL
url <- "https://www.mywebsite.com/"
# SPOOF THE USER AGENT TO LOOK LIKE A BROWSER
ua <- httr::user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36")
# CREATE A PERSISTENT SESSION
my_session <- rvest::html_session(url,ua)
# FIND ALL FORMS IN THE WEB PAGE
unfilled_forms <- rvest::html_form(my_session)
# SELECT THE FORM THAT YOU NEED TO FILL IN
login_form <- unfilled_forms[[1]]
# FILL IN THE FORM
filled_form <- set_values(login_form, username = "myUsername", password = "myPassword")
# SUBMIT THE FORM TO LOGIN
login_session <- submit_form(my_session, filled_form)
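For comparison, html_session(), set_values() and submit_form() were superseded in rvest 1.0.0 by session(), html_form_set() and session_submit(). The sketch below wraps the same steps in a hypothetical helper, login_with_rvest(), whose field-name defaults are guesses that must be replaced with the names found on the real page. It is only defined here, not called, because it needs network access.

```r
library(rvest)

# Hypothetical helper: user_field, pass_field and form_index must come from
# inspecting the real page; the defaults below are assumptions.
login_with_rvest <- function(url, username, password,
                             user_field = "username", pass_field = "password",
                             form_index = 1) {
  ua <- httr::user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")

  # session() replaces html_session() in rvest >= 1.0.0
  s <- session(url, ua)

  login_form <- html_form(s)[[form_index]]

  # Build the name = value pairs programmatically so the field names are parameters.
  values <- stats::setNames(list(username, password), c(user_field, pass_field))
  filled_form <- do.call(html_form_set, c(list(login_form), values))

  # session_submit() replaces submit_form(); it sends the form to its action URL.
  logged_in <- session_submit(s, filled_form)

  # A 4xx/5xx status points at the request itself (for example a wrong action
  # URL), not necessarily at the credentials.
  status <- httr::status_code(logged_in$response)
  if (status >= 400) {
    warning("Login request failed with HTTP ", status, " at ", logged_in$url)
  }
  logged_in
}
```

A call would then look like login_with_rvest("https://www.mywebsite.com/", "myUsername", "myPassword", user_field = "email", pass_field = "passwd"), with the two field names taken from the form rather than guessed.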