The returned page is only viewable in a text editor, and looks like this:
<html style="height:100%">
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<script type="text/javascript" src="/_Incapsula_Resource?SWJIYLWA=2977d8d74f63d7f8fedbea018b7a1d05"></script>
</head>
<body style="margin:0px;height:100%">
<iframe src="/_Incapsula_Resource?CWUDNSAI=23&xinfo=8-12690372-0 0NNN RT(1406173695342 164) q(0 -1 -1 -1) r(0 -1) B12(4,315,0) U10000&incident_id=257000050029892977-66371435311988824&edet=12&cinfo=4b6fe7bcc753855a04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 257000050029982977-66371435131988824</iframe>
</body>
</html>
I'm doing the following in Perl:
use strict;
use warnings;
use WWW::Mechanize;

# Suddenly web robot.
my $mech = WWW::Mechanize->new();
$mech->agent_alias('Mac Safari');
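One experiment I've sketched out to narrow things down (not a known fix): send the other headers a real Safari would send, in case the block is triggered by a sparse header set rather than the user-agent string alone. The URL below is a placeholder for my account's login page:
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();   # cookies are kept automatically
$mech->agent_alias('Mac Safari');

# Mimic the headers a real browser sends alongside the UA string.
$mech->add_header(
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
);

# Placeholder URL -- substitute the real login page.
$mech->get('https://example-retirement-site.com/login');

# The block page is easy to spot by its Incapsula resource URLs.
if ($mech->content =~ /_Incapsula_Resource/) {
    warn "Still got the Incapsula block page\n";
}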
How are they detecting it? I wouldn't think it could be just the user-agent string. Is there any way to bypass this? I'm not doing anything nasty; I'm just trying to download my retirement account data without having to do it manually.
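My current guess at the mechanism: the script tag pointing at /_Incapsula_Resource in the returned page presumably runs a JavaScript challenge and sets cookies, and a plain HTTP client like WWW::Mechanize never executes it, so every request looks like a first contact. If that's right, one workaround would be to drive a real browser that does run the JavaScript, e.g. with WWW::Mechanize::Firefox. A sketch I haven't tried yet (it needs Firefox with the MozRepl extension, and the URL is a placeholder):
use strict;
use warnings;
use WWW::Mechanize::Firefox;

# Talks to a running Firefox (with the MozRepl extension enabled),
# which executes Incapsula's JavaScript and stores its cookies.
my $mech = WWW::Mechanize::Firefox->new();

# Placeholder URL -- substitute the real login page.
$mech->get('https://example-retirement-site.com/login');

print $mech->content;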
I see several results on how to honor robots.txt, but nothing on how to avoid detection.
Looking through the page in Chrome, it appears the site is protected by Incapsula:
http://www.incapsula.com/website-security/
Anyone have any ideas?