Steve Smith, host of your TQA Weekly, explains the process of creating a full website sitemap, complete with a sitemap index and robots.txt integration.
Episode #2-48 released on August 26, 2012
Websites are more than just the pages we see as we navigate. In order to be found, the path must be clear, the navigation impeccable, and everything easy to find. This is easy to develop for human traffic, but what about the robots, you know, the bots that scan our websites to index the content for the many search engines, not just Google? Well, today we have another hard coder episode and a new way to get your website and its many pages found.
What you need to remember is that a website is like a cake: it has many layers, and you should treat it as such. In an age where we can automatically generate content from a database, and the pages we use are basically empty until a specific article is selected, making your pages searchable can be difficult. If your articles have their own pages but the pages used to serve the links do not, parts of your website could be unsearchable.
There are two tools you can use together to make all your pages searchable, and better yet, they require little effort on your side: the sitemap and the robots.txt file. You use the robots.txt file to allow full or limited indexing and to link to your master sitemap file, which in turn links to all the folder-specific sitemaps.
Before I show you how to program your own sitemaps, I need to explain something, because I see this error made over and over again. Even though most search engines will understand a sitemap that lists every page in a single file, the correct method is to build sitemaps folder by folder. Each sub-folder needs its own sitemap file, even for folders nested within other folders. This is why the master sitemap file exists. Sitemaps are written in XML, so anyone who knows HTML will be comfortable with them.
We will start with the folder-specific sitemap, then create a full website sitemap index, and then learn how to make a basic robots.txt file with sitemap integration. This way we work from the least important skill to the most important skill, and it gets easier in this order.
Much like in HTML, you open the XML file responsible for the in-folder sitemap with a urlset tag, whose xmlns value is http://www.sitemaps.org/schemas/sitemap/0.9. You close the urlset tag at the very end of the file, the same way you close the html tag in an HTML page. Within this file you list every page in the directory that you want easily searchable. For each page, open a url tag. Open a loc tag, enter the URL of the file in question, one per tag, and close the loc tag. Open a changefreq tag, enter a change-frequency value such as daily, weekly, monthly, yearly, or never, and close the changefreq tag. Open a priority tag, which helps search engines judge the value of each page. The maximum value is 1.0, but do not rate everything 1.0 or it will simply be ignored. Homepages, landing pages, and the like get higher values; articles may also rate highly, but at most around 0.8; resource pages, about pages, and so on may get lower values. Of course, this all depends on the actual content of your website: if a page is not important to your website, rate it lower. Close the priority tag, close the url tag, and repeat the process for each file in the directory. Save the file with an .xml extension, and do not forget to close the urlset tag when the file is complete. Make an XML file like this for every directory you want publicly searchable.
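Putting the steps above together, a folder-specific sitemap might look like the following sketch. The example.com URLs, file names, frequencies, and priority values are placeholders; substitute your own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One url block per page in this directory -->
  <url>
    <loc>http://www.example.com/articles/my-article.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/articles/about-the-author.html</loc>
    <changefreq>yearly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
```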
Our next creation is the sitemap index. Its job is to index all the sitemaps of the website, making it easier for bots to index the related content.
<?xml version="1.0" encoding="UTF-8"?>
This time you open with a sitemapindex tag; copy it from the show notes, it will be easier. You should add the preceding XML version declaration to make the file easier for bots to read. Then, for each sitemap you created, open a sitemap tag, open a loc tag, enter the URL of that folder's sitemap, close the loc tag, and close the sitemap tag. Once you have entered all the sitemaps in your index, close the sitemapindex tag. You now have a fully functioning sitemap; of course, there are more options you can discover by following the sources in my show notes.
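Assembled, a sitemap index might look like this sketch, with one sitemap entry per folder-specific sitemap file. The example.com URLs and file names are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Sitemap for the root folder -->
  <sitemap>
    <loc>http://www.example.com/sitemap.xml</loc>
  </sitemap>
  <!-- Sitemap for the articles sub-folder -->
  <sitemap>
    <loc>http://www.example.com/articles/sitemap.xml</loc>
  </sitemap>
</sitemapindex>
```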
The question, though, is how to make this usable. Having a sitemap does not by itself make your website searchable; it is merely a step. You can go to Google, Bing, and the other search engines and manually enter the location of the sitemap index file in your webmaster account. And you can make a robots.txt file. A robots.txt file isn't hard to make; in fact, it is a lot simpler than making sitemaps.
A robots.txt file contains a user-agent field, an allow field, and, in our case, a sitemap field.
Enter User-agent: * on one line, then Allow: / on another line, and finally Sitemap: followed by the URL of your sitemap index. Save the file as robots.txt and store it in the root of your website, and that's it. You submit the sitemap index manually to any search engine you want to be listed in; other engines that find your site will scan for the robots.txt file and index your website automatically. You now have the best of both worlds. Now, work on the content, and keep the sitemaps up to date.
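The finished robots.txt would then be three lines, along these lines; the sitemap URL is a placeholder for wherever you stored your own sitemap index.

```text
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemapindex.xml
```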
Next week, I'll teach you how to make your own RSS feed for your articles, and some best practices, as well.
Remember to like this episode if you were interested in today's topic, share if you think someone else could benefit from the topic, and subscribe if you want to learn more. For the show notes of this episode and others, for more information on other ways to subscribe to our show, to subscribe to our weekly newsletter, and how to participate by submitting your questions, comments, suggestions, and stories, head over to TQAWeekly.com.
Host : Steve Smith | Music : Jonny Lee Hart | Editor : Steve Smith | Producer : Zed Axis Productions