Why your website needs a sitemap and how you should build it.

by Joshua Odmark

First and foremost, there is a website that describes in thorough detail how to build a sitemap. It is located at http://sitemaps.org/.

The purpose of this post is to boil their “protocol” (in other words, their instructions) down to a short and sweet version with commentary.

These instructions are intended for sites with fewer than 50,000 web pages, which is the most URLs a single sitemap file may contain.

You will be creating and/or editing three files.

  1. sitemap_index.xml
  2. sitemap.xml
  3. robots.txt

All of these files will be located in your website’s root directory. This is generally the same place that all of your public Internet files are located (commonly referred to as ‘public_html’).

First, we will create our sitemap_index.xml file.

The proper formatting for this file is as follows:

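<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
<sitemap>
<loc>http://www.your-website-domain.com/sitemap.xml</loc>
<lastmod>2009-02-23</lastmod>
</sitemap>
</sitemapindex>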

The above example is referred to as a Sitemap Index. Its purpose is to let you list multiple sitemaps for your website. This is useful for websites that keep separate sitemaps, such as a video sitemap alongside the regular one.
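If you would rather not hand-edit the index each time you add a sitemap, a short script can write it for you. The following is a minimal sketch, assuming Python 3.8 or newer; the function name build_sitemap_index and the video-sitemap.xml file name are made up for illustration and are not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Return sitemap index XML as bytes.

    entries is a list of (sitemap_url, lastmod) pairs, for example
    ("http://www.your-website-domain.com/sitemap.xml", "2009-02-23").
    """
    # Setting xmlns as a plain attribute puts every child element
    # in the sitemap namespace when the file is parsed.
    root = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        ET.SubElement(sitemap, "lastmod").text = lastmod
    # xml_declaration=True adds the <?xml ...?> line (Python 3.8+).
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Hypothetical example: a regular sitemap plus a video sitemap.
index_xml = build_sitemap_index([
    ("http://www.your-website-domain.com/sitemap.xml", "2009-02-23"),
    ("http://www.your-website-domain.com/video-sitemap.xml", "2009-02-23"),
])
```

Writing index_xml to a file opened in binary mode ("wb") and named sitemap_index.xml should give you a file equivalent to the hand-written one, minus the optional schemaLocation attribute.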

Open your favorite HTML editor and paste the above into it (I prefer Dreamweaver on PC and Mac, and TextMate on Mac). Make sure to change “www.your-website-domain.com” to your own website domain.

Once you have done this, save the file as “sitemap_index.xml”.

Now you are ready to create your sitemap.xml file.

The formatting for this file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.your-website-domain.com/</loc>
<lastmod>2009-01-01</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.your-website-domain.com/contact.php</loc>
<lastmod>2009-01-01</lastmod>
<changefreq>daily</changefreq>
<priority>0.6</priority>
</url>
</urlset>

You will notice that there are two <url></url> entries in the above code. This indicates there are two URLs in this sitemap. One has a priority of 0.8, and the other has a priority of 0.6. Priority ranges from 0.0 to 1.0, and the higher the priority, the more important you are telling the search engine that URL is relative to the other pages on your site (it is rumored, and commonly believed in the industry, that this has no effect on SERPs).

Duplicate the <url></url> block (changing the appropriate information) until all of your website pages have been included in this file. The file can also be generated automatically by a script.

Save this file as “sitemap.xml”.
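As a rough illustration of the scripted route, here is a minimal sketch in Python (3.8 or newer). It assumes you already have your pages in a list; the function name build_sitemap and the dictionary layout are my own choices for the example, not anything the protocol prescribes.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Child tags of <url>, in the order used in the example file.
URL_FIELDS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return sitemap XML as bytes.

    pages is a list of dicts. Each dict needs a "loc" key and may also
    carry "lastmod", "changefreq" and "priority".
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in URL_FIELDS:
            if field in page:
                # ElementTree escapes characters such as & for us.
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# The same two pages as in the hand-written example.
pages = [
    {"loc": "http://www.your-website-domain.com/",
     "lastmod": "2009-01-01", "changefreq": "daily", "priority": 0.8},
    {"loc": "http://www.your-website-domain.com/contact.php",
     "lastmod": "2009-01-01", "changefreq": "daily", "priority": 0.6},
]
sitemap_xml = build_sitemap(pages)
```

To save the result, write sitemap_xml to a file opened in binary mode ("wb"). In practice the pages list would come from your database or a crawl of your own site rather than being typed in by hand.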

Now create a “robots.txt” file, which is a simple text file.

You can use Notepad (PC) or TextEdit (Mac) for this.

If your website already has a robots.txt, then you will simply need to add the following line to your file.

Sitemap: http://www.your-website-domain.com/sitemap_index.xml

Not sure if you already have a robots.txt file? Type this into your web browser: http://www.your-website-domain.com/robots.txt

If you do not receive a 404 error from your website, then you do have a robots.txt file.

The Sitemap line tells the search engines where your sitemap index file is located. From there, they will pull all of your sitemaps in one shot.
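If you manage several sites, adding the line by hand gets tedious. The sketch below shows one way to add it to the text of an existing (or empty) robots.txt without creating a duplicate. The function name add_sitemap_line is hypothetical, and matching the word "Sitemap" regardless of case is an assumption on my part.

```python
def add_sitemap_line(robots_text, sitemap_url):
    """Return robots.txt content that contains a Sitemap line for sitemap_url.

    If the line is already present, the text is returned unchanged.
    """
    lines = robots_text.splitlines()
    for line in lines:
        # Split "Sitemap: http://..." at the first colon only.
        name, _, value = line.partition(":")
        # Assumption: the directive name is matched case-insensitively.
        if name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text
    # Separate the new line from existing rules with a blank line.
    if lines and lines[-1].strip():
        lines.append("")
    lines.append("Sitemap: " + sitemap_url)
    return "\n".join(lines) + "\n"


existing = "User-agent: *\nDisallow:\n"
updated = add_sitemap_line(
    existing, "http://www.your-website-domain.com/sitemap_index.xml"
)
```

Here updated holds the original two rules followed by a blank line and the Sitemap line. Calling the function a second time on updated returns it as-is, so it is safe to run repeatedly.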

The last step is an optional step for those who are familiar with file compression.

Search engines allow you to compress your large sitemap files with GZIP. The command is simple: in Terminal (Mac), type “gzip sitemap.xml” from the directory the sitemap is in. Windows users will need to install a GZIP program first.

This will create sitemap.xml.gz, a compressed version of your sitemap. You must then update the <loc> entry in your sitemap_index.xml file to use the new file name, changing “sitemap.xml” to “sitemap.xml.gz”.

The obvious benefit to this is less bandwidth usage when search engine spiders download your sitemap. For one of my websites, I compressed a 7MB file down to 400KB using GZIP.
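If you do not have the gzip command handy (on Windows, for example), Python's standard library can do the same job. This is a minimal sketch; the function name gzip_file is my own, and unlike the command-line tool it leaves the original file in place.

```python
import gzip
import shutil


def gzip_file(src_path):
    """Compress src_path into src_path + ".gz" and return the new path.

    The original file is left untouched.
    """
    dst_path = src_path + ".gz"
    # Stream the data so large sitemaps are not loaded into memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Calling gzip_file("sitemap.xml") from the directory that holds your sitemap should leave you with sitemap.xml.gz next to it, ready to upload.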

Once you have done this, upload your files to the “public_html” folder, and you are all done!

To verify that they are there, load these files in a web browser (use sitemap.xml.gz in place of sitemap.xml if you compressed it):

http://www.your-website-domain.com/robots.txt
http://www.your-website-domain.com/sitemap_index.xml
http://www.your-website-domain.com/sitemap.xml

