
I am trying to build a simple webbot in Python, on Windows, using MechanicalSoup. Unfortunately, I am sitting behind a (company-enforced) proxy. I could not find a way to provide a proxy to MechanicalSoup. Is there such an option at all? If not, what are my alternatives?

EDIT: Following Eytan's hint, I added proxies and verify to my code, which got me a step further, but I still cannot submit a form:

import mechanicalsoup

proxies = {
    'https': 'my.https.proxy:8080',
    'http':  'my.http.proxy:8080'
}
url = 'https://stackoverflow.com/'
browser = mechanicalsoup.StatefulBrowser()
front_page = browser.open(url, proxies=proxies, verify=False)
form = browser.select_form('form[action="/search"]')
form.print_summary()
form["q"] = "MechanicalSoup"
form.print_summary()
browser.submit(form, url=url)

The code hangs on the last line, and submit doesn't accept proxies as an argument.

2 Answers


It seems that proxies have to be specified at the session level. They are then not needed in browser.open, and submitting the form also works:

import mechanicalsoup

proxies = {
    'https': 'my.https.proxy:8080',
    'http':  'my.http.proxy:8080'
}
url = 'https://stackoverflow.com/'
browser = mechanicalsoup.StatefulBrowser()
browser.session.proxies = proxies   # THIS IS THE SOLUTION!
front_page = browser.open(url, verify=False)
form = browser.select_form('form[action="/search"]')
form["q"] = "MechanicalSoup"
result = browser.submit(form, url=url)
result.status_code

returns 200 (i.e. "OK").


According to their doc, this should work:

browser.get(url, proxies=proxy)

Try passing the 'proxies' argument to your requests.
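For instance, a minimal per-request sketch might look like this (the proxy addresses are placeholders; as noted in the comments, browser.get is a wrapper around requests.Session.get, so it accepts proxies):

import mechanicalsoup

# Placeholder proxy addresses; replace with your actual proxy
proxies = {
    'https': 'my.https.proxy:8080',
    'http':  'my.http.proxy:8080'
}

browser = mechanicalsoup.StatefulBrowser()
# browser.get() wraps requests.Session.get(), so per-request options
# such as proxies can be passed here
response = browser.get('https://stackoverflow.com/', proxies=proxies)
print(response.status_code)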

  • I see: browser.get is a wrapper around requests.Session.get, which accepts proxies. However, still no luck. Now I get "HTTPSConnectionPool(host='stackoverflow.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))" – Igor F. Dec 19 '17 at 14:39
  • This is a separate issue. Add verify=False to the function arguments. This will give you a warning - but should work - there is also a way to suppress the warning (see the sketch below). -- browser.get(url, proxies=proxy, verify=False) – Eytan Avisror Dec 19 '17 at 14:56
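A minimal sketch of that suppression approach, assuming the proxy addresses are placeholders (verify=False triggers urllib3's InsecureRequestWarning, which can be silenced explicitly):

import urllib3
import mechanicalsoup

# Suppress the InsecureRequestWarning emitted when verify=False is used
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

proxies = {
    'https': 'my.https.proxy:8080',
    'http':  'my.http.proxy:8080'
}
browser = mechanicalsoup.StatefulBrowser()
response = browser.get('https://stackoverflow.com/', proxies=proxies, verify=False)
print(response.status_code)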