Valid URL encoding with Googlebot
If you use non-alphanumeric characters in a URL, you should encode them. This is easy to do in many programming languages. I use PHP:
<?php
$url_arg = urlencode('一将功成万骨枯@www.xydw.com');
?>
You will get an encoded URL argument: "%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com". It is a valid URL in most browsers, such as IE, Firefox, and Opera. But if you care about SEO, you will wonder whether it is also a valid URL for Googlebot. The answer is NO!
I recently designed a non-English site. I used the encoded URL as an argument to fetch content. I tested the site in IE, Firefox, and Opera, and everything ran well. I made a sitemap and submitted it to Google. Soon I found errors for many URLs with a 4xx status in Google Webmaster Tools. I checked these URLs, and they looked like valid urlencoded URLs. I copied them into the browser and tried to visit them, and I got a 404 error too. Then I logged in to my site control panel, found these pages, and visited them from there. I was very surprised that they loaded, and that the URL in the browser looked the same as the one I had copied. Why?
I checked these URLs and the code very carefully. Then I found a very interesting thing: Googlebot does not decode URLs the same way as PHP's urldecode. Googlebot turns encoded punctuation such as %40 back into the literal character (@), while leaving the non-ASCII characters percent-encoded. Example:
URL encoded by PHP:
http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com
Googlebot will treat it as:
http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com
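One way to see the difference is to compare the two URLs both as raw strings and as decoded arguments. The sketch below assumes PHP 7.1 or later; query_arg() is a hypothetical helper that is not part of my site's code. It suggests that any lookup keyed on the raw encoded string will miss, even though both URLs carry the same decoded value.

```php
<?php
// The two URL forms from the example above.
$php_url = 'http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com';
$bot_url = 'http://www.xydw.com/?googlebot=%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com';

// Hypothetical helper (an assumption for illustration):
// return one decoded query argument from a URL, or null if it is missing.
function query_arg(string $url, string $name): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params); // parse_str() percent-decodes the values
    return (isset($params[$name]) && is_string($params[$name])) ? $params[$name] : null;
}

// As raw strings the URLs differ (%40 versus @) ...
$raw_differs = ($php_url !== $bot_url);
// ... but the decoded argument is identical.
$decoded_same = (query_arg($php_url, 'googlebot') === query_arg($bot_url, 'googlebot'));

var_dump($raw_differs, $decoded_same);
```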
Finally, if you want to use an encoded URL as an argument in your program and have it work with Googlebot, you should encode the argument again in your program.
PHP example:
<?php
// PHP has already decoded $_GET values, so encoding once restores the %40 form.
$googlebot = urlencode($_GET['googlebot']);
?>
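The same idea can be checked on raw, still-encoded strings, for example values read from a sitemap or a log. This is only a sketch: canonical_arg() is a hypothetical name, and the two inputs are the argument forms shown earlier.

```php
<?php
// Hypothetical helper (an assumption for illustration):
// bring a still-encoded argument to the form PHP's urlencode() produces.
function canonical_arg(string $raw): string
{
    // Decode whichever form arrived, then encode it again.
    return urlencode(urldecode($raw));
}

$from_php = '%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF%40www.xydw.com';
$from_bot = '%E4%B8%80%E5%B0%86%E5%8A%9F%E6%88%90%E4%B8%87%E9%AA%A8%E6%9E%AF@www.xydw.com';

// Both forms end up as the same string.
$same = (canonical_arg($from_php) === canonical_arg($from_bot));
// Applying the helper twice changes nothing.
$stable = (canonical_arg(canonical_arg($from_bot)) === canonical_arg($from_bot));

var_dump($same, $stable);
```

Note that urldecode() turns a literal + into a space, so a value that really contains + or % would be changed by the extra decode step.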
If you copy my article, you must keep my site link: http://www.xydw.com