Well, since there has been a bit of an uproar lately about how exactly to get and parse information from web pages, I figured I'd write a small tutorial about how to do so. (Using MMOwned as a test dummy for the URL.)
The first thing we need to do is create a new Windows Forms application. (This tutorial assumes it's named MMOwnedRepParser.)
First things first, we'll make it flashy: drop a textbox, a "Get Rep" button, and a "No User Selected!" label onto the form.
Very simple, right?
A few things to keep in mind, to make this easier to explain:
The form's name is "MainForm"
The "Get Rep" button's name is "btnGetRep"
The "No User Selected!" label's name is "lblUserRep"
The textbox's name is "txtUserName"
Now we need to add a simple HTTP GET method that returns a web page's source.
Create a new class named "Http" and change it to the following:
Code:
    using System.IO;
    using System.Net;
    using System.Text;

    namespace MMOwnedRepParser
    {
        public class Http
        {
            public static string GetHTTP(string url)
            {
                // Build a request for the URL provided when the method was called.
                var HttpWRequest = (HttpWebRequest)WebRequest.Create(url);

                // Set some specific things needed for certain web pages to be viewed.
                HttpWRequest.Credentials = CredentialCache.DefaultCredentials;
                HttpWRequest.UserAgent = "MMOwned Wins Hard";
                HttpWRequest.KeepAlive = true;
                HttpWRequest.Headers.Set("Pragma", "no-cache");
                HttpWRequest.Timeout = 300000; // In milliseconds (5 minutes).

                // We are only GETting the page information. We are not passing it any.
                HttpWRequest.Method = "GET";

                // This is in a try/catch block due to some pages going offline.
                // (If we didn't catch the error, we would crash the app)
                try
                {
                    // Get the response to the request we built above.
                    // The using blocks close the response and the reader for us,
                    // so we don't end up with some nasty bugs.
                    using (var HttpWResponse = (HttpWebResponse)HttpWRequest.GetResponse())
                    using (var sr = new StreamReader(HttpWResponse.GetResponseStream(), Encoding.ASCII))
                    {
                        // Read the page we got from the response, and pass it out as our return value.
                        return sr.ReadToEnd();
                    }
                }
                catch (WebException)
                {
                    // The page could not be viewed. So we return an ERROR string instead.
                    return "ERROR";
                }
            }
        }
    }

The code itself is documented fairly well, so I won't bother explaining it.
Just keep in mind that this is a very simple HTTP GET method. It does not handle POST requests.
Now that we have our way to grab the page information, let's create a way to find the rep using the user profile page of MMOwned.
First we need to see what kind of page source is generated. I'm using my own profile view (reached by clicking on my name, not by going to "User CP"). Right click the page and select "View Page Source". (The wording might be different in other browsers; you want to view the source of the page.)
Now we need to find where reputation is displayed. (Luckily, this page is mostly static, so the position of what we want is always in the same place.)
The bit of information we want to find is the following:
Code:
    <span class="smallfont" style="float:right"> 202 point(s) total <a href="/forums/members/apoc.html#top" onclick="return toggle_collapse('profile_reputation')">

All we really want is the "202 point(s) total" since we just want to see how much rep a given person has.
Now we're going to use a bit of regex (regular expressions) to find that single line so we can use it.
Create a new method. (I created it right in the MainForm code file. Just double click the form to open it, or select it from the Solution Explorer.)
We first need to add the following using directive:
Code:
    using System.Text.RegularExpressions;

This will allow us to use Regex.
Now we add the following method to parse the page we received and get our reputation points.
Code:
    private string Rep(string toSearch)
    {
        // \d = digit, \s = whitespace; the parentheses are escaped so they match literally.
        var rx = new Regex(@"\d*\spoint\(s\)\stotal");
        return rx.Match(toSearch).ToString();
    }

The "var rx" line initializes a new instance of Regex using the supplied regex string. It will return a match of "<any number of digits> point(s) total" if it finds it.
Then we just return the match from our page string we will be passing to it in a minute.
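To see what the pattern is meant to match, here is a small standalone sketch that runs it against the snippet of page source shown earlier. The RepRegexDemo class and the ExtractPoints helper are made-up names for this example only. The capture group around the digits (and + instead of *) is an addition that pulls out just the number, assuming the page keeps the "N point(s) total" wording.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepRegexDemo
{
    // The tutorial's pattern with a capture group around the digits.
    // \d+ requires at least one digit; \( and \) match literal parentheses.
    private static readonly Regex RepPattern = new Regex(@"(\d+)\spoint\(s\)\stotal");

    // Hypothetical helper: returns the point count, or -1 when the pattern is not found.
    public static int ExtractPoints(string pageSource)
    {
        Match m = RepPattern.Match(pageSource);
        return m.Success ? int.Parse(m.Groups[1].Value) : -1;
    }

    public static void Main()
    {
        string sample = "<span class=\"smallfont\" style=\"float:right\"> 202 point(s) total <a href=\"/forums/members/apoc.html#top\">";

        Console.WriteLine(RepPattern.Match(sample).Value); // 202 point(s) total
        Console.WriteLine(ExtractPoints(sample));          // 202
        Console.WriteLine(ExtractPoints("ERROR"));         // -1
    }
}
```

Returning a number instead of the raw text makes it easier to compare or sort rep later, but the plain string version used in the rest of this tutorial works just as well for display.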
Now, to make this do anything, we need to add some code for the button itself.
Back in the designer view for the form, double click the button to bring up the Click event handler. (Visual Studio generates it automatically for you when you double click.)
Code:
    private void btnGetRep_Click(object sender, EventArgs e)
    {
    }

So now we need to add in our code. First, let's make sure we have something typed in the text box for the user name.
Code:
    private void btnGetRep_Click(object sender, EventArgs e)
    {
        if (txtUserName.Text.Length == 0)
        {
            MessageBox.Show("Please enter a user name!");
        }
    }

Pretty simple, right?
Now let's actually make this thing work!
Update the method as follows:
Code:
    private void btnGetRep_Click(object sender, EventArgs e)
    {
        if (txtUserName.Text.Length == 0)
        {
            MessageBox.Show("Please enter a user name!");
        }
        else
        {
            var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
            lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, Rep(urlSource));
        }
    }

I split this up to make it easier to read. The first line of the else block grabs the source of our URL and stores it in a string variable. (The compiler infers that the implicitly typed "var" is a string by itself.)
Next we update our lblUserRep to show the name we entered in the username box, and the Rep that we parsed using our Rep method from earlier.
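One thing the call above glosses over is that whatever is typed in the textbox goes into the URL as-is. Below is a minimal sketch of building the profile URL with the name trimmed and escaped. ProfileUrl and Build are made-up names, and I'm assuming the forum accepts percent-encoded names in that path, which I have not checked against the site.

```csharp
using System;

public static class ProfileUrl
{
    // Hypothetical helper: builds the member profile URL used in this tutorial.
    public static string Build(string userName)
    {
        // Trim stray whitespace, then percent-encode anything that is not URL-safe
        // (a space becomes %20, for example).
        string safeName = Uri.EscapeDataString(userName.Trim());
        return String.Format("http://www.mmowned.com/forums/members/{0}.html", safeName);
    }
}
```

With this in place, the handler could call Http.GetHTTP(ProfileUrl.Build(txtUserName.Text)) instead of formatting the URL inline.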
Now, here's a little bit more for this tutorial: what if we searched for a member who doesn't exist?
Our label ends up saying "<SomeUser> has" with no rep. Well, we can make it look a bit prettier by making the following changes:
In the Rep method:
Code:
    private string Rep(string toSearch)
    {
        var rx = new Regex(@"\d*\spoint\(s\)\stotal");
        // Run the match once and reuse the result.
        var match = rx.Match(toSearch);
        return match.Success ? match.ToString() : null;
    }

Now we've changed the return statement to something you may not understand easily: the conditional (?:) operator. In short, if the regex match was successful, return the matching string; if not, return null.
Now in our button method we change it to the following:
Code:
    private void btnGetRep_Click(object sender, EventArgs e)
    {
        if (txtUserName.Text.Length == 0)
        {
            MessageBox.Show("Please enter a user name!");
        }
        else
        {
            var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
            // Parse the page once and reuse the result.
            var rep = Rep(urlSource);
            if (rep != null)
            {
                lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, rep);
            }
            else
            {
                lblUserRep.Text = String.Format("User {0} does not exist!", txtUserName.Text);
                MessageBox.Show("Invalid username!");
            }
        }
    }

Now, if our Rep method returns null, we'll get a message box, and our label will show that the user in question does not exist! (The same thing happens when the page could not be loaded at all, since the "ERROR" string from GetHTTP doesn't match the regex either.)
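If you'd rather tell "the page didn't load" apart from "no rep found on the page", one possible approach is to check for the "ERROR" string that GetHTTP returns before running the regex. This is only a sketch: RepMessages and Describe are made-up names, and the "Could not load" wording is my own, not something from the original program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepMessages
{
    private static readonly Regex RepPattern = new Regex(@"\d+\spoint\(s\)\stotal");

    // Hypothetical helper: turns a user name plus the fetched page source into label text.
    public static string Describe(string userName, string pageSource)
    {
        // Http.GetHTTP in this tutorial returns the literal string "ERROR" on a WebException.
        if (pageSource == "ERROR")
        {
            return String.Format("Could not load the page for {0}.", userName);
        }

        Match m = RepPattern.Match(pageSource);
        return m.Success
            ? String.Format("{0} has {1}", userName, m.Value)
            : String.Format("User {0} does not exist!", userName);
    }
}
```

Keeping this logic out of the button handler also means it can be tried with plain strings, without touching the form or the network.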
All done! You can use this method to do a lot of other types of web parsing as well.
This is almost the same method I use in the Account Check Aisle Four program.
Enjoy folks!
Edit: Source is below. (Written in Visual Studio 2008 Team Suite and .NET 3.5. If you have problems with it, too bad. I'm not re-doing it in another IDE or .NET version.)
MassMirror.com - Download MMOwnedRepParser.rar
Edit2: Since I know someone will come complaining, this tutorial does NOT touch on thread invoking to stop the GUI from freezing while the web request method is called. That's beyond the scope of this tutorial, and will be handled elsewhere, or google'd.