First, please excuse my naivety with this subject. I'm a retired programmer who started before DOS was around, and I'm not an expert on ASP.NET. Part of what I need to know is what I need to know. (If you follow me...)
So I want to log into a web site and scrape some content. After looking at the HTML source with Notepad and Fiddler2, it's clear to me that the site is implemented with ASP.NET technologies.
I started by doing a lot of Googling and reading everything I could find about writing screen scrapers in C#. After some investigation and many attempts, I think I've come to the conclusion that it isn't easy.
The crux of the problem (as I see it now) is that ASP.NET provides lots of ways for a programmer to maintain state: cookies, viewstate, session vars, page vars, GET and POST params, etc. Plus the programmer can divide the work up between server and client scripting. A rich web client such as IE, Safari, Chrome or Firefox knows how to handle whatever the programmer writes (and whatever the ASP.NET framework implements under the covers).
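For instance, here's roughly how I've been pulling the hidden state fields out of a login page in my experiments. The URL is made up, and the regex assumes the `id="..." value="..."` attribute order that WebForms seems to render; from what I can tell, any POST back to the page has to echo these values or the server rejects the post-back:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class ScrapeHelpers
{
    // Pulls the value of a hidden input (e.g. __VIEWSTATE) out of raw HTML.
    // Assumes the id="..." value="..." attribute order WebForms renders.
    public static string ExtractHidden(string html, string name)
    {
        Match m = Regex.Match(html, "id=\"" + name + "\" value=\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : "";
    }
}

class HiddenFieldDemo
{
    static void Main()
    {
        // Hypothetical URL -- substitute the real login page.
        string html = new WebClient().DownloadString("https://example.com/Login.aspx");
        Console.WriteLine(ScrapeHelpers.ExtractHidden(html, "__VIEWSTATE"));
        Console.WriteLine(ScrapeHelpers.ExtractHidden(html, "__EVENTVALIDATION"));
    }
}
```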
WebClient isn't a rich web client. Out of the box it doesn't even persist cookies from one request to the next.
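From what I've read, the usual workaround is to subclass WebClient and hang a CookieContainer on every request it creates, something like:

```csharp
using System;
using System.Net;

// WebClient subclass that carries one CookieContainer across requests,
// so the ASP.NET session cookie set at login survives to later pages.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = cookies;
        return request;
    }
}
```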
So I'm at an impasse. One way to go is to try to reverse engineer all the features of the rich client that the ASP.NET application is expecting, and write a "WebClient on steroids" class that mimics a rich client well enough to get logged in.
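Here's my rough sketch of that approach, reusing the CookieAwareWebClient and ScrapeHelpers.ExtractHidden pieces above. The URL and form field names are made up; Fiddler2 would show the real `ctl00$...`-style names the page actually posts:

```csharp
using System;
using System.Collections.Specialized;
using System.Text;

class LoginScrape
{
    static void Main()
    {
        var client = new CookieAwareWebClient();            // from the sketch above
        string loginUrl = "https://example.com/Login.aspx"; // hypothetical URL

        // GET first: picks up the session cookie and the current state blobs.
        string page = client.DownloadString(loginUrl);

        var form = new NameValueCollection();
        form["__VIEWSTATE"] = ScrapeHelpers.ExtractHidden(page, "__VIEWSTATE");
        form["__EVENTVALIDATION"] = ScrapeHelpers.ExtractHidden(page, "__EVENTVALIDATION");
        form["ctl00$Main$UserName"] = "me";       // field names are hypothetical;
        form["ctl00$Main$Password"] = "secret";   // Fiddler2 shows the real ones
        form["ctl00$Main$LoginButton"] = "Log In";

        // POST back with the same cookies and the echoed state fields.
        byte[] raw = client.UploadValues(loginUrl, "POST", form);
        string html = Encoding.UTF8.GetString(raw);
        Console.WriteLine(html.Contains("Logout")); // crude "am I logged in?" check
    }
}
```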
Or I could try embedding IE (or some other rich client) into my app and hope the exposed interface is rich enough that I can programmatically fill a username and password field and POST the form back. (And access the response stream so I can parse the HTML to scrape out the data I'm after...)
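Something like this WinForms sketch is what I have in mind for that route. The element ids are made up; I'd get the real ones from the page source:

```csharp
using System;
using System.Windows.Forms;

class BrowserScrapeForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();
    private bool loginSubmitted;

    public BrowserScrapeForm()
    {
        browser.Dock = DockStyle.Fill;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate("https://example.com/Login.aspx"); // hypothetical URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Note: this event fires once per frame on framed pages.
        if (!loginSubmitted)
        {
            // Element ids are hypothetical; use the ones from "view source".
            browser.Document.GetElementById("UserName").SetAttribute("value", "me");
            browser.Document.GetElementById("Password").SetAttribute("value", "secret");
            loginSubmitted = true;
            browser.Document.GetElementById("LoginButton").InvokeMember("click");
        }
        else
        {
            // The post-login page; its HTML is now available for parsing.
            string html = browser.Document.Body.InnerHtml;
            Console.WriteLine(html.Length);
        }
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new BrowserScrapeForm());
    }
}
```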
Or I could look for some 3rd party control that would be a lot richer than WebClient.
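Selenium WebDriver seems to be one example of that route: it drives a real browser, so cookies, viewstate and client script are all handled for me. Again, the ids and URL are made up:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class SeleniumScrape
{
    static void Main()
    {
        // Drives a real Firefox instance, so the browser itself handles
        // cookies, viewstate, redirects and any client-side script.
        using (IWebDriver driver = new FirefoxDriver())
        {
            driver.Navigate().GoToUrl("https://example.com/Login.aspx"); // hypothetical
            driver.FindElement(By.Id("UserName")).SendKeys("me");        // hypothetical ids
            driver.FindElement(By.Id("Password")).SendKeys("secret");
            driver.FindElement(By.Id("LoginButton")).Click();

            string html = driver.PageSource; // logged-in page, ready to parse
            Console.WriteLine(html.Length);
        }
    }
}
```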
Can anyone offer some keen insight into where I should focus my attention?
This is as much a learning experience as a project. That said, I really want to automate login and information retrieval from the target site.