How to get and parse information from a web page.

  1. #1
    Apoc, Angry Penguin

    How to get and parse information from a web page.

    Well, since there has been a bit of an uproar lately about how exactly to get and parse information from web pages, I figured I'd write a small tutorial about how to do so. (Using MMOwned as a test dummy for the URL.)

    The first thing we need to do is create a new Windows Forms application. (This tutorial assumes it's named MMOwnedRepParser.)

    First things first, we'll make it flashy.



    Very simple, right?

    A few things to keep in mind, to make the rest easier to explain:

    The form's name is "MainForm"
    The "Get Rep" button's name is "btnGetRep"
    The "No User Selected!" label's name is "lblUserRep"
    The textbox's name is "txtUserName"


    Now we need to add a simple HTTP GET method that returns a web page's source.

    Create a new class named "Http" and change it to the following:

    Code:
    using System.IO;
    using System.Net;
    using System.Text;
    
    namespace MMOwnedRepParser
    {
        public class Http
        {
            public static string GetHTTP(string url)
            {
                // Create a request for the URL provided when the method was called.
                var HttpWRequest = (HttpWebRequest)WebRequest.Create(url);
    
                // Set some specific things needed for certain web pages to be viewed.
                HttpWRequest.Credentials = CredentialCache.DefaultCredentials;
                HttpWRequest.UserAgent = "MMOwned Wins Hard";
                HttpWRequest.KeepAlive = true;
                HttpWRequest.Headers.Set("Pragma", "no-cache");
                HttpWRequest.Timeout = 300000; // In milliseconds, so 5 minutes.
    
                // We are only GETting the page information. We are not passing it any.
                HttpWRequest.Method = "GET";
    
                // This is in a try/catch block because pages can go offline or return an error status.
                // (If we didn't catch the error, we would crash the app.)
                try
                {
                    // Get the response to the request from above. The using blocks close the
                    // response and the reader even if reading throws; an unclosed response
                    // keeps its connection busy, and later requests can hang waiting for it.
                    using (var HttpWResponse = (HttpWebResponse)HttpWRequest.GetResponse())
                    using (var sr = new StreamReader(HttpWResponse.GetResponseStream(), Encoding.UTF8))
                    {
                        // Read the whole page and pass it out as our return value.
                        // (UTF-8 keeps non-ASCII characters intact; Encoding.ASCII turns them into '?'.)
                        return sr.ReadToEnd();
                    }
                }
                catch (WebException)
                {
                    // The page could not be viewed, so we return an ERROR string instead.
                    return "ERROR";
                }
            }
        }
    }
    The code itself is documented fairly well, so I won't bother explaining it.

    Just keep in mind that this is a very simple HTTP GET method. It does not handle HTTP POST requests.
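    GetHTTP hands the response stream to a StreamReader with an explicit Encoding, and that choice decides what happens to any non-ASCII characters on the page. Here's a small sketch of the effect, decoding the same bytes from memory instead of the network. EncodingDemo and ReadAll are hypothetical names for illustration, not part of the project source.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of MMOwnedRepParser.
public static class EncodingDemo
{
    // Reads bytes through a StreamReader, the same way GetHTTP reads the
    // response stream, but with the encoding chosen by the caller.
    public static string ReadAll(byte[] pageBytes, Encoding encoding)
    {
        using (var stream = new MemoryStream(pageBytes))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

    With the UTF-8 bytes of "Caf\u00e9: 202 point(s) total", decoding as Encoding.ASCII gives "Caf??: 202 point(s) total" (each non-ASCII byte becomes '?'), while Encoding.UTF8 gives the text back unchanged. The rep text itself is plain ASCII, so the regex later in the tutorial finds it either way.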

    Now that we have a way to grab a page's source, let's find a user's rep on their MMOwned profile page.

    First, we need to see what page source is generated. I used my own profile (opened by clicking on my name, not by going to "User CP"): right-click the page and select "View Page Source". (The option may be named differently in other browsers; you want to view the source of the page.)

    Now we need to find where reputation is displayed. (Luckily, this page is mostly static, so the position of what we want is always in the same place.)

    The bit of information we want to find is the following:

    Code:
    <span class="smallfont" style="float:right">
    
    				202 point(s) total
    				&nbsp; &nbsp;
    				<a href="/forums/members/apoc.html#top" onclick="return toggle_collapse('profile_reputation')">
    All we really want is the "202 point(s) total" since we just want to see how much rep a given person has.

    Now we're going to use a bit of regex (regular expressions) to pull just that text out of the page source.

    Create a new method. (I created it right in the MainForm code file. Just double-click the form to open it, or open it from Solution Explorer.)

    We first need to add the following using directive:

    Code:
    using System.Text.RegularExpressions;
    This will allow us to use Regex.

    Now we add the following method to parse the page we received and get our reputation points.

    Code:
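            private string Rep(string toSearch)
            {
                // \d = a digit, \s = a whitespace character, \( and \) = literal parentheses.
                // Commas are allowed in the number in case larger totals are shown as e.g. 1,387.
                var rx = new Regex(@"\d[\d,]*\spoint\(s\)\stotal");
                return rx.Match(toSearch).ToString();
            }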
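    The "var rx" line creates a new instance of Regex from the supplied pattern string. It matches "<number> point(s) total": \d matches a digit, \s matches a whitespace character, and \( and \) match literal parentheses. (Without those backslashes, (s) would be a capture group and the pattern would look for "points" instead of "point(s)".) If nothing matches, Match(...).ToString() returns an empty string.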
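    A regex like this depends on its backslashes (\d, \s, \( and \)). Here's a small sketch that runs the escaped pattern, a version with every backslash removed, and a version with unescaped parentheses against a copy of the page snippet from earlier. RepRegexDemo and FirstMatch are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original project.
public static class RepRegexDemo
{
    // \d = digit, \s = whitespace, \( and \) = literal parentheses.
    public const string Escaped = @"\d+\spoint\(s\)\stotal";

    // Every backslash removed: this now looks for the letters "spoint" and
    // "stotal", with (s) as a capture group.
    public const string Stripped = @"d*spoint(s)stotal";

    // \d and \s kept, but the parentheses unescaped, so (s) is a group
    // matching a single "s": it finds "points", not "point(s)".
    public const string UnescapedParens = @"\d+\spoint(s)\stotal";

    // Returns the first match, or "" when there is none (Match.Value is never null).
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

    Against the snippet, only the escaped pattern finds "202 point(s) total"; the other two return "", and the unescaped-parentheses version would instead match text such as "5 points total".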

    Then we just return the match from the page string we'll be passing to it in a minute.

    Now, to make this do anything, we need to add some code for the button itself.

    Back in the designer view for the form, double-click the button to bring up its Click event handler. (Visual Studio generates it automatically when you double-click.)

    Code:
            private void btnGetRep_Click(object sender, EventArgs e)
            {
    
            }
    Now we need to add our code. First, let's make sure something has been typed in the user name text box.

    Code:
            private void btnGetRep_Click(object sender, EventArgs e)
            {
                if (txtUserName.Text.Length == 0)
                {
                    MessageBox.Show("Please enter a user name!");
                }
            }
    Pretty simple right?

    Now let's actually make this thing work!

    Update the method as follows:

    Code:
            private void btnGetRep_Click(object sender, EventArgs e)
            {
                if (txtUserName.Text.Length == 0)
                {
                    MessageBox.Show("Please enter a user name!");
                }
                else
                {
                    var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
                    lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, Rep(urlSource));
                }
            }
    I split this up to make it easier to read. The first line of the else block grabs the page source and stores it in a string variable. (Because GetHTTP returns a string, the compiler infers the implicitly typed "var" as string.)
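    The URL is built by putting whatever was typed straight into String.Format, so a name containing a space or a character like '/' or '?' produces a broken address. One option is to percent-encode the name first. This is a sketch: ProfileUrl.For is a hypothetical helper, and whether MMOwned resolves percent-encoded names to the right profile page is an assumption, not something the tutorial tested.

```csharp
using System;

// Hypothetical helper; the URL format is the one used in the tutorial.
public static class ProfileUrl
{
    public static string For(string userName)
    {
        // Trim stray spaces, then percent-encode anything that isn't
        // URL-safe (a space becomes %20, '/' becomes %2F, and so on).
        var safeName = Uri.EscapeDataString(userName.Trim());
        return String.Format("http://www.mmowned.com/forums/members/{0}.html", safeName);
    }
}
```

    In the click handler it would replace the inline String.Format: var urlSource = Http.GetHTTP(ProfileUrl.For(txtUserName.Text));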

    Next we update our lblUserRep to show the name we entered in the username box, and the Rep that we parsed using our Rep method from earlier.

    Now for a little extra: what happens if we search for a member who doesn't exist?

    Our label ends up saying "<SomeUser> has" with no rep after it. We can make that look a bit prettier and clearer with the following changes:

    In the Rep method:

    Code:
            private string Rep(string toSearch)
            {
                var rx = new Regex(@"\d[\d,]*\spoint\(s\)\stotal");
                // Run the match once and reuse the result.
                var match = rx.Match(toSearch);
                return match.Success ? match.Value : null;
            }
    The return statement now uses the conditional (ternary) operator, which you may not have seen before. In short: if the regex match was successful, return the matching string; if not, return null.
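    Rep returns text like "202 point(s) total". If you want the number itself, for example to compare two users, a capture group can pull out just the digits. This is a sketch: RepNumber.ParseRep is a hypothetical helper, and allowing commas assumes larger totals may be shown with thousands separators (e.g. 1,387).

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original project.
public static class RepNumber
{
    // Group 1 captures the number in front of " point(s) total".
    private static readonly Regex RepRx = new Regex(@"(\d[\d,]*)\spoint\(s\)\stotal");

    // Returns the reputation as an int, or null when the page has no rep line.
    public static int? ParseRep(string pageSource)
    {
        var match = RepRx.Match(pageSource);
        if (!match.Success)
        {
            return null;
        }

        // AllowThousands accepts "1,387"; InvariantCulture makes ',' the group separator.
        return int.Parse(match.Groups[1].Value, NumberStyles.AllowThousands, CultureInfo.InvariantCulture);
    }
}
```

    For the snippet from earlier this returns 202; for a page without a rep line (including the "ERROR" string from GetHTTP) it returns null.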

    Now in our button method we change it to the following:

    Code:
            private void btnGetRep_Click(object sender, EventArgs e)
            {
                if (txtUserName.Text.Length == 0)
                {
                    MessageBox.Show("Please enter a user name!");
                }
                else
                {
                    var urlSource = Http.GetHTTP(String.Format("http://www.mmowned.com/forums/members/{0}.html", txtUserName.Text));
                    // Parse the page once and reuse the result instead of running the regex twice.
                    var rep = Rep(urlSource);
                    if (rep != null)
                    {
                        lblUserRep.Text = String.Format("{0} has {1}", txtUserName.Text, rep);
                    }
                    else
                    {
                        lblUserRep.Text = String.Format("User {0} does not exist!", txtUserName.Text);
                        MessageBox.Show("Invalid username!");
                    }
                }
            }
    Now, if our Rep method returns null, we'll get a message box, and our label will show that the user in question does not exist! (The same happens when GetHTTP returns "ERROR", since that string contains no rep either.)
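    The handler above can't tell apart a user who doesn't exist and a page that failed to load, because GetHTTP's "ERROR" string also has no rep in it. Here's a sketch that keeps the two cases separate, pulled out into a plain method so it can be tried without the form. RepLabel.Describe and its messages are hypothetical; only the "ERROR" sentinel and the rep pattern come from the tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the downloadable project.
public static class RepLabel
{
    private static readonly Regex RepRx = new Regex(@"\d[\d,]*\spoint\(s\)\stotal");

    // Builds the label text from the user name and the page source returned by GetHTTP.
    public static string Describe(string userName, string pageSource)
    {
        // GetHTTP returns the literal string "ERROR" when the request failed.
        if (pageSource == "ERROR")
        {
            return String.Format("Could not load the profile page for {0}.", userName);
        }

        var match = RepRx.Match(pageSource);
        return match.Success
            ? String.Format("{0} has {1}", userName, match.Value)
            : String.Format("User {0} does not exist!", userName);
    }
}
```

    In btnGetRep_Click, lblUserRep.Text = RepLabel.Describe(txtUserName.Text, urlSource); would replace the label assignments; the "Invalid username!" message box would still need its own check.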

    All done! You can use this method to do a lot of other types of web parsing as well.
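    As one example of reusing the approach, Regex.Matches finds every occurrence instead of just the first. The sketch below collects member names from profile links shaped like the one in the page snippet earlier (/forums/members/apoc.html). MemberLinks.Find is a hypothetical helper, and the characters allowed in a name ([\w-]) are an assumption.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original project.
public static class MemberLinks
{
    // Matches href="/forums/members/<name>.html" and captures <name> in group 1.
    // Assumes names are made of letters, digits, '_' and '-'.
    private static readonly Regex LinkRx = new Regex(@"href=""/forums/members/([\w-]+)\.html");

    // Returns every linked member name, in order of appearance.
    public static List<string> Find(string pageSource)
    {
        var names = new List<string>();
        foreach (Match m in LinkRx.Matches(pageSource))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

    Passing it the page source from GetHTTP gives a list such as ["apoc", "2dgreengiant"] for a page linking to those two profiles.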

    This is almost the same method I use in the Account Check Aisle Four program.

    Enjoy folks!

    Edit: Source is below. (Written in Visual Studio 2008 Team Suite and .NET 3.5. If you have problems with it, too bad. I'm not re-doing it in another IDE or .NET version.)
    MassMirror.com - Download MMOwnedRepParser.rar


    Edit2: Since I know someone will come complaining, this tutorial does NOT cover running the web request on a separate thread (and invoking back to the UI thread) to stop the GUI from freezing while the request runs. That's beyond the scope of this tutorial; it will be handled elsewhere, or you can Google it.

    Last edited by Apoc; 04-14-2008 at 02:57 PM.

  2. #2
    2dgreengiant, ★ Elder ★
    gawd just what i needed :P

  3. #3
    -Lex, Banned
    cool ........

  4. #4
    Yeti, Banned
    apoc this is awesome!
    thank you for commenting the code too!

  5. #5
    slack7219, Member
    you don't invoke a thread, you just run that method on a different thread. If you have something to modify on the form's controls, you use a delegate and Control.Invoke it.

  6. #6
    Garosie, Active Member
    Awesome

  7. #7
    Apoc, Angry Penguin
    Originally Posted by slack7219
    you don't invoke a thread, you just run that method on a different thread. If you have something to modify on the form's controls, you use a delegate and Control.Invoke it.
    That's invoking a thread. -_-

    And I'm fully aware of how to use cross-thread calls. (I usually use AsyncCallback for these types of things.) But hey, whatever floats your boat.

  8. #8
    slack7219, Member
    you may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread, for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here; it does the same thing as a normal thread would, but with some bonuses.

  9. #9
    Apoc, Angry Penguin
    Originally Posted by slack7219
    you may use an async call, but there may be cases where you would like your thread to wait for those updates to the UI thread, for god knows whatever reason. On a side note, the BackgroundWorker class is a nice little helper that could be used here; it does the same thing as a normal thread would, but with some bonuses.
    And drawbacks on high-priority async threads. (Which a BackgroundWorker is not.)

    If you want to talk threading, please make a new thread in these forums. (Please excuse the pun)

All times are GMT -5.