Intro:
I am pretty inexperienced, but recently I have been trying to access some data from a website using Google Apps Script (GAS). To access the data, I must be logged into that website. There have been many posts about similar issues, but none of them were very helpful until I came to this one: how to fetch a wordpress admin page using google apps script. The accepted answer gave a method for saving the cookies from the login response and sending them back in a second request. I basically copied and pasted the code into my own GAS file. Since the problem in that post was logging into WordPress, I tried that first, and it worked. I had to remove the if statement checking the response code because 200 was being returned even when I entered the correct combo. I don't know if that was just an error in the post's code or what. In any case, I verified that the second request I made returned information as if I were logged in.
Details about specific site:
The actual website that I am trying to log into has some kind of weird hashing method that I haven't seen on any other login pages. When you click submit, the password changes to something really long before the browser goes to another page. The opening form tag looks like this:
<form action="/guardian/home.html" method="post" name="LoginForm" target="_top" id="LoginForm" onsubmit="doPCASLogin(this);">
As you can see, it has an "onsubmit" attribute, which I believe will just run "doPCASLogin(this);" when the form is submitted. I decided to play around with the page by just entering javascript into the address bar. What I found was that doing a command like this (after entering in my username and password):
javascript: document.forms[0].submit();
didn't work. So I dug around and found the function "doPCASLogin()" in a javascript file called "md5.js". I believe md5 is some kind of hash algorithm, but that doesn't really matter. The important part of "doPCASLogin()" is this:
function doPCASLogin(form) {
    var originalpw = form.pw.value;
    var b64pw = b64_md5(originalpw);
    var hmac_md5pw = hex_hmac_md5(pskey, b64pw);
    form.pw.value = hmac_md5pw;
    form.dbpw.value = hex_hmac_md5(pskey, originalpw.toLowerCase());
    if (form.ldappassword != null) {
        form.ldappassword.value = originalpw;
    }
}
There is some other stuff as well, but I found that it didn't matter for my login. It is pretty obvious that this just runs the password through another function a few times using "pskey" (stored in a hidden input, different on each reload) as a key, and puts the results in inputs on the original form ("dbpw" and "ldappassword" are hidden inputs, while "pw" is the visible password entry input). After it does this, the form submits. I located this other "hex_hmac_md5()" function, which actually connects to a whole bunch of other functions to hash the password. Anyway, that doesn't matter, because I can just call "hex_hmac_md5()" from the javascript I type in the address bar. This is the working code that I came up with (I just broke the line up for readability):
javascript:
document.forms['LoginForm']['account'].value = "username";
document.forms['LoginForm']['pw'].value = hex_hmac_md5(pskey, b64_md5("password"));
document.forms['LoginForm']['ldappassword'].value = "password";
document.forms['LoginForm']['dbpw'].value = hex_hmac_md5(pskey, "password");
document.forms['LoginForm'].submit();
Wherever you see "username" or "password", this just means that I entered my username and password in those spots, but obviously I have removed them. When I discovered that this worked, I wrote a small Chrome extension that will automatically log me in when I go to the website (the login process is weird so Chrome doesn't remember my username and password). That was nice, but it wasn't my end goal.
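For reference, the hashing that "doPCASLogin()" performs can probably be reproduced in Apps Script without copying md5.js, using the built-in Utilities service. The following is only a sketch under assumptions: "hashLoginFields" and "bytesToHex" are hypothetical helper names, and I am assuming the site's "b64_md5()" emits base64 without "=" padding and that "hex_hmac_md5()" returns lowercase hex, which is the usual default for that md5.js library but is not confirmed for this site.

```javascript
// Convert the signed byte array returned by Utilities.computeDigest /
// Utilities.computeHmacSignature (values from -128 to 127) to lowercase hex.
function bytesToHex(bytes) {
  return bytes.map(function (b) {
    var v = (b + 256) % 256;                 // map -128..127 onto 0..255
    return (v < 16 ? '0' : '') + v.toString(16);
  }).join('');
}

// Hypothetical helper mirroring doPCASLogin(); only runs inside Apps Script,
// because it relies on the Utilities service.
function hashLoginFields(pskey, password) {
  var md5Bytes = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, password);
  // Assumption: b64_md5() on the site produces unpadded base64.
  var b64pw = Utilities.base64Encode(md5Bytes).replace(/=+$/, '');
  var hmac = function (value) {
    // pskey is the HMAC key, matching hex_hmac_md5(pskey, value) on the page.
    return bytesToHex(Utilities.computeHmacSignature(
        Utilities.MacAlgorithm.HMAC_MD5, value, pskey));
  };
  return {
    pw: hmac(b64pw),
    dbpw: hmac(password.toLowerCase()),
    ldappassword: password
  };
}
```

Comparing the output of "hashLoginFields()" against the values the browser actually posts for the same "pskey" would be the way to check whether these assumptions hold.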
Dilemma:
After discovering all this about the hashing, I tried just putting in all these values into the HTTP payload in my GAS file, though I was skeptical that it would work. It didn't, and I suspect that is because the values are just being read as strings and the javascript is not actually being run. This would make sense, because running the actual javascript would probably be a security issue. However, why would it work in the address bar then? Just as a side note, I am getting a 200 response code back, and it also seems that a cookie is being sent back too, though it may not be valid. When I read the actual response, it is just the login page again.
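To make the suspicion above concrete: a form payload is just text, so an expression placed in it is transmitted character for character and never evaluated. The snippet below is a rough illustration of form encoding, not the actual UrlFetchApp implementation; "encodeForm" is a hypothetical helper written only for this example.

```javascript
// Hypothetical helper: approximate application/x-www-form-urlencoded encoding.
function encodeForm(payload) {
  return Object.keys(payload).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(String(payload[key]));
  }).join('&');
}

// The "pw" value is a string that merely looks like a function call.
var body = encodeForm({
  account: 'username',
  pw: "hex_hmac_md5(pskey, 'password')"
});
// body now contains the literal text of the expression (percent-encoded),
// not a hash: nothing on the server side runs it as JavaScript.
```

In the address bar the situation is different, because the browser executes the expression inside the page (where "pskey" and "hex_hmac_md5" exist) before the form is submitted.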
I also considered trying to replicate the entire function in my own code after seeing this: How to programmatically log into a website?, but since "pskey" is different on each reload, I think the hashing would have to be done with the new key on the second UrlFetch. So even if I did copy all of the functions into my GAS file, I don't think I could successfully log in, because I would need to know the "pskey" that will be generated for a particular request BEFORE actually sending the request, which would be impossible. The only way this would work is if I could somehow keep one page loaded and read it before sending data, but I don't know how I would do this with GAS.
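One possible way around this, sketched here under assumptions, is to fetch the login page first, read the key out of the returned HTML, and only then build the POST. "extractInputValue" and "fetchLoginTokens" are hypothetical helpers; the input names "pskey" and "contextData" are taken from the description in this post and may not match the page's real markup. Whether the server accepts a key obtained this way probably also depends on sending back the cookie from that first response, which I have not verified.

```javascript
// Hypothetical helper: find <input ... name="NAME" ...> in raw HTML and
// return its value attribute ('' if it has none, null if no such input).
function extractInputValue(html, name) {
  var tags = html.match(/<input\b[^>]*>/gi) || [];
  for (var i = 0; i < tags.length; i++) {
    var nameMatch = /\bname\s*=\s*["']([^"']*)["']/i.exec(tags[i]);
    if (nameMatch && nameMatch[1] === name) {
      var valueMatch = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tags[i]);
      return valueMatch ? valueMatch[1] : '';
    }
  }
  return null;
}

// Step 1 of a two-step login (Apps Script only): GET the login page and
// collect the per-load values plus any cookie the server set.
function fetchLoginTokens(loginPageURL) {
  var response = UrlFetchApp.fetch(loginPageURL);
  var html = response.getContentText();
  return {
    pskey: extractInputValue(html, 'pskey'),             // assumed input name
    contextData: extractInputValue(html, 'contextData'),
    setCookie: response.getAllHeaders()['Set-Cookie']
  };
}
```

A regular expression is a fragile way to read HTML, so this is only meant to show the order of operations: read the key, hash with it, then post.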
EDIT: I have found another input, named "contextData", which has the same value as "pskey" when the page is loaded. However, if I log in once and look at the POST request in Chrome Developer Tools, I can copy all the input values, including "contextData", and send the same request a second time. Using javascript in the address bar, it looks like this:
javascript:
document.forms['LoginForm']['account'].value="username";
document.forms['LoginForm']['pw'].value="value in field that browser sent once";
document.forms['LoginForm']['ldappassword'].value="password";
document.forms['LoginForm']['dbpw'].value="value in field that browser sent once";
document.forms['LoginForm']['contextData'].value="value in field that browser sent once";
document.forms['LoginForm'].submit();
I can sign into the website as many times as I want in this manner, no matter what "pskey" is, because I am submitting everything directly and no hashing is being done. However, this still doesn't work for me, so I'm kind of stuck. I should note that I have checked the other hidden input fields and I can still log in successfully with the javascript above even after clearing every input in the form.
QUESTIONS:
- Was I correct in assuming that the code I was sending was being interpreted as a string?
- Why is the new code below that I just recently wrote not working?
- For future reference, how would I use GAS to sign into a site like Google where a randomly generated string is sent in the login form, and must be sent back?
function getData() {
    var loginURL = 'login page';
    var dataURL = 'page with data';
    var loginPayload = {
        'account':'same as in previous code block',
        'pw':"same as in previous code block",
        'ldappassword':'same as in previous code block',
        'dbpw':"same as in previous code block",
        "contextData":"same as in previous code block",
    };
    var loginOptions = {'method':'post','payload':loginPayload,'followredirects':false};
    var loginResponse = UrlFetchApp.fetch(loginURL,loginOptions);
    var loginHeaders = loginResponse.getAllHeaders();
    var cookie = [loginResponse.getAllHeaders()["Set-Cookie"]];
    cookie[0] = cookie[0].split(";")[0];
    cookie = cookie.join(";");
    var dataHeaders = {'Cookie':cookie};
    var dataOptions = {'method':'get','headers':dataHeaders};
    var dataResponse = UrlFetchApp.fetch(dataURL,dataOptions);
    Logger.log(dataResponse);
}
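Two details in the function above may be worth checking, though neither is a confirmed cause of the failure. First, the UrlFetchApp documentation spells the option "followRedirects"; assuming option keys are case-sensitive, the all-lowercase key would be ignored, the redirect would be followed, and the cookie set on the redirect response would be lost. Second, "getAllHeaders()" returns an array for "Set-Cookie" when the server sets several cookies, which "split" would not handle. The sketch below addresses both; "buildCookieHeader" and "fetchWithLogin" are hypothetical names.

```javascript
// Hypothetical helper: turn one Set-Cookie string, or an array of them,
// into a single Cookie request header, keeping only each name=value pair.
function buildCookieHeader(setCookie) {
  if (!setCookie) return '';
  var list = Array.isArray(setCookie) ? setCookie : [setCookie];
  return list.map(function (c) {
    return c.split(';')[0].trim();           // drop Path, HttpOnly, etc.
  }).join('; ');
}

// Apps Script only: log in without following the redirect, then reuse the
// cookies from the login response on the data request.
function fetchWithLogin(loginURL, dataURL, loginPayload) {
  var loginResponse = UrlFetchApp.fetch(loginURL, {
    method: 'post',
    payload: loginPayload,
    followRedirects: false                   // camel case, as in the docs
  });
  var cookie = buildCookieHeader(loginResponse.getAllHeaders()['Set-Cookie']);
  return UrlFetchApp.fetch(dataURL, {
    method: 'get',
    headers: { Cookie: cookie }
  }).getContentText();
}
```

If the site also sets a session cookie when the login page itself is loaded, that cookie would presumably need to be sent with the login POST as well.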