Xydw Dot Com

Valid URL encoding with Googlebot

Posted on August 8th, 2009 by luckymouse

If you use non-alphanumeric characters in a URL, you should encode them. This is easy to do in many programming languages. I use PHP to do this:

<?php

$url_arg = urlencode('一将功成万骨枯@www.xydw.com');

?>

You will get an encoded URL argument: "%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com". It is a valid URL in most browsers, such as IE, Firefox, and Opera. But if you care about SEO, you will wonder whether it is also a valid URL for Googlebot. The answer is NO!

I recently designed a non-English site. I used encoded URLs as arguments to fetch content. I tested the site in IE, Firefox, and Opera, and everything ran well. I made a sitemap and submitted it to Google. Soon I found errors for many URLs with 4xx status in Google Webmaster Tools. I checked these URLs, and they looked like valid urlencoded URLs. I copied them into a browser and tried to visit them, and I got 404 errors too. I logged in to my site control panel, found these pages, and visited them. I was very surprised that I could visit them, and that the URL shown in the browser looked the same as the one I had copied. Why?

I checked these URLs and my code very carefully. Then I found something very interesting: Googlebot does not handle encoded URLs the same way PHP's urlencode and urldecode do. Googlebot turns percent-encoded punctuation back into the literal character, so %40 becomes @, while the other encoded bytes stay encoded. Example:

URL encoded by PHP:

http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com

Googlebot will treat it as:

http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com
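The difference between the two forms can be checked with a minimal sketch. It only compares the two query values from the example above; the variable names are my own and are not part of the site's code. The two strings are not equal as text, but they decode to the same value, and encoding that value again gives back the PHP form.

```php
<?php
// The query value as produced by PHP's urlencode() (from the example above).
$php_form = '%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com';

// The same value as it appears in the Googlebot URL, with a literal @.
$googlebot_form = '%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com';

// As plain strings the two forms differ.
var_dump($php_form === $googlebot_form);                        // bool(false)

// After decoding, both give the same original text.
var_dump(urldecode($php_form) === urldecode($googlebot_form));  // bool(true)

// Decoding and encoding again restores the PHP form.
var_dump(urlencode(urldecode($googlebot_form)) === $php_form);  // bool(true)
```

So any code that compares or looks up the encoded string directly will miss the Googlebot form unless it is normalized first.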

Finally, if you want to use an encoded URL as an argument in your program and have it work with Googlebot, you should encode it again in your program.


PHP Example:

<?php

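// PHP has already decoded $_GET values, so urldecode() changes nothing for this example.
// urlencode() then restores the form PHP produced (%40 instead of @).
$googlebot = urlencode(urldecode($_GET['googlebot']));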

?>
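The same idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name canonical_arg is hypothetical, and it takes the query as an array so it can be tried without a web request. Because PHP already decodes the values in $_GET, the sketch skips the extra urldecode() call and encodes once. parse_str() is used here only to imitate how PHP fills $_GET.

```php
<?php
// Hypothetical helper: return the encoded form of one query argument,
// or null when the argument is missing.
function canonical_arg(array $query, $name)
{
    if (!isset($query[$name])) {
        return null;
    }
    // Values in $query are already decoded (like $_GET), so encode once.
    return urlencode($query[$name]);
}

// Imitate the two requests: one with %40, one with a literal @.
parse_str('googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com', $from_browser);
parse_str('googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com', $from_googlebot);

// Both requests end up with the same encoded argument.
var_dump(canonical_arg($from_browser, 'googlebot') === canonical_arg($from_googlebot, 'googlebot')); // bool(true)
```

In a real page you would pass $_GET as the first argument.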


If you copy my article, you must keep the link to my site: http://www.xydw.com

This entry was posted on Saturday, August 8th, 2009 at 10:29 am and is filed under Web Design & SEO.