C# Parse Meta Tags | HTML Meta Tag Parser | CodeAsp.Net

C# Parse Meta Tags

When working in C# and ASP.NET, you may need to download an external web page and parse its metadata: specifically the title, the meta description, and the meta keywords.

The method below will show you how to:

  • download an external webpage
  • parse the meta title
  • parse the meta description
  • parse the meta keywords

The parsing is done using regular expressions.

NOTE: Regular expressions are a brittle way to parse HTML, so this may not be the best way of doing this, but it is a solution that you can use for simple pages.

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.Text.RegularExpressions;
using System.IO;

namespace Tim.Examples.Classes
{
    public class WebMetaData
    {
        public string metaTitle;
        public string metaDescription;
        public string metaKeywords;

        public bool GetMetaTags(string url)
        {
            try
            {
                // Download the page's HTML into a string
                string html = AcquireHTML(url);

                // AcquireHTML returns "" when the download fails, so there is nothing to parse
                if (string.IsNullOrEmpty(html))
                {
                    return false;
                }

                return GetMeta(html);
            }
            catch (Exception)
            {
                // Log or otherwise handle the error here
                return false;
            }
        }

        private string AcquireHTML(string address)
        {
            try
            {
                // Create and initialize the web request
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
                request.UserAgent = "your-search-bot";
                request.KeepAlive = false;
                request.Timeout = 10 * 1000; // 10 seconds, in milliseconds

                // The using blocks close the response and the reader even if reading throws
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    // Read the whole response body into a string
                    return reader.ReadToEnd();
                }
            }
            catch (Exception)
            {
                // Download failed (bad URL, timeout, HTTP error): report it as empty content
                return "";
            }
        }

        private bool GetMeta(string strIn)
        {
            try
            {
                // --- Parse the title
                Match TitleMatch = Regex.Match(strIn, "<title>([^<]*)</title>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
                metaTitle = TitleMatch.Groups[1].Value;

                // --- Parse the meta keywords
                Match KeywordMatch = Regex.Match(strIn, "<meta name=\"keywords\" content=\"([^<]*)\">", RegexOptions.IgnoreCase | RegexOptions.Multiline);
                metaKeywords = KeywordMatch.Groups[1].Value;

                // --- Parse the meta description
                Match DescriptionMatch = Regex.Match(strIn, "<meta name=\"description\" content=\"([^<]*)\">", RegexOptions.IgnoreCase | RegexOptions.Multiline);
                metaDescription = DescriptionMatch.Groups[1].Value;

                return true;
            }
            catch (Exception ex)
            {
                // do something with the error
                return false;
            }
        }

    }
}
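
The patterns in GetMeta only match when the tag is written exactly as `<meta name="..." content="...">`, with double quotes and the name attribute first. Note also that RegexOptions.Multiline only changes how ^ and $ behave, so it has no effect on these patterns. Below is a minimal sketch of a slightly more tolerant variant that works on an HTML string already in memory. The class and method names (MetaTagSketch, ExtractTitle, ExtractMeta, GetAttribute) are hypothetical and not part of the original class. It is still regex-based, so it will miss unusual markup such as a `>` inside an attribute value.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original WebMetaData class.
public static class MetaTagSketch
{
    // Singleline lets "." match newlines, so a title spread over several lines is still found.
    private const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns the decoded, trimmed text inside <title>...</title>, or "" if there is none.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title\b[^>]*>(.*?)</title>", Options);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim() : "";
    }

    // Returns the content attribute of the first <meta> tag whose name matches, or "".
    public static string ExtractMeta(string html, string name)
    {
        // Find each <meta ...> tag first, then read its attributes separately,
        // so the order of name and content does not matter.
        foreach (Match tag in Regex.Matches(html, @"<meta\b[^>]*>", Options))
        {
            string nameValue = GetAttribute(tag.Value, "name");
            if (string.Equals(nameValue, name, StringComparison.OrdinalIgnoreCase))
            {
                return WebUtility.HtmlDecode(GetAttribute(tag.Value, "content")).Trim();
            }
        }
        return "";
    }

    // Reads one quoted attribute value from a single tag, or "" if it is absent.
    private static string GetAttribute(string tag, string attribute)
    {
        // Accepts single or double quotes; the backreference \1 requires the same quote to close.
        // The lookbehind keeps "name" from matching inside names such as "data-name".
        string pattern = @"(?<![\w-])" + Regex.Escape(attribute) + @"\s*=\s*([""'])(.*?)\1";
        Match m = Regex.Match(tag, pattern, Options);
        return m.Success ? m.Groups[2].Value : "";
    }
}
```

Inside GetMeta, one could then assign, for example, `metaKeywords = MetaTagSketch.ExtractMeta(strIn, "keywords");`. For anything beyond simple pages, a real HTML parser is likely the safer choice.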



If you know of a better or more efficient way, please share by commenting below.

Thanks!

Comments (9)

   
vivek_iit
wow, this is a very useful example!
3/19/2009
   
Anonymous
You do realize this does not work.....
4/2/2009
   
teisenhauer
Hmmm...ya know, it's possible that it doesn't. But I am using it in a number of small apps that I've built. Is there some bug somewhere?
4/3/2009
   
Anonymous
Doesn't work. There is nothing in the pattern field of the regex, so how can it work?
4/25/2009
   
teisenhauer
The WYSIWYG editor removed the regex patterns...  I HTML encoded them and re-added them.  Now it should work.  Sorry for the issues.  Many thanks!
5/11/2009
   
vlado
It still doesn't work... could you please help me out?
5/19/2010
   
john
Wow, that's wonderful, it's better than using the WebBrowser control.
Thanks a lot
9/26/2011
   
Jake
Adding either a string[] or List<string> output of the keywords, split by ',' with a Trim(), would help.

Still works great, thanks for the code
4/30/2012
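
The suggestion in the comment above could look roughly like the sketch below. The class and method names (KeywordSplitSketch, SplitKeywords) are hypothetical, and skipping empty entries is an assumption rather than something the comment asks for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original WebMetaData class.
public static class KeywordSplitSketch
{
    // Splits a comma-separated keywords string into trimmed, non-empty entries.
    public static List<string> SplitKeywords(string metaKeywords)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(metaKeywords))
        {
            return result; // nothing to split
        }

        foreach (string part in metaKeywords.Split(','))
        {
            string keyword = part.Trim();
            if (keyword.Length > 0)
            {
                result.Add(keyword); // assumption: blank entries are dropped
            }
        }
        return result;
    }
}
```

It could be called on the parsed field after GetMetaTags succeeds, for example `KeywordSplitSketch.SplitKeywords(data.metaKeywords)`, where `data` is a WebMetaData instance.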
   
Nishant Dave
Really good post
6/12/2012