Google’s new search engine (called “Caffeine”) is showing promise for some website owners. However, duplicate content and unoriginal page text are now filtered more rigorously by Google’s algorithm. If your website pages aren’t being indexed by Google, it doesn’t necessarily mean the pages are unoriginal or duplicated. Instead, try a new Google tool labeled “Fetch as Googlebot” in your webmaster tools. Using this tool, you can uncover coding and hosting server errors that may be causing Googlebot to disregard and de-index your pages. The advantage of this tool is the ability to view exactly what Googlebot retrieves when it crawls your website.
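If you want a rough sense of what Googlebot receives even before opening the tool, you can approximate the fetch yourself. The short Python sketch below (standard library only; the URL is a placeholder you would replace with your own page) requests the page while sending Googlebot’s published User-Agent string and prints the status code and raw HTML that come back. It is only an approximation, not the official tool, and it won’t reveal anything a server hides through user-agent cloaking.

# Rough approximation of a Googlebot fetch (hypothetical script, not Google's tool):
# request the page with Googlebot's User-Agent and print what the server returns.
import urllib.request

URL = "http://www.example.com/"  # placeholder: replace with the page you want to test
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

request = urllib.request.Request(URL, headers={"User-Agent": GOOGLEBOT_UA})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        print("HTTP status:", response.status)
        print(response.read().decode("utf-8", errors="replace"))
except Exception as error:
    # Timeouts and connection failures here often mirror the errors Googlebot hits.
    print("Fetch failed:", error)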
Google’s Webmaster Tools
If your website isn’t registered with Google’s webmaster tools, you’re already behind your competitors. These tools help you upload a sitemap, add your URL to Google’s index, and detect malware or a compromised website. They also include a list of experimental features open to public beta testing, found under the “Labs” section of your webmaster tools console. Within the list of lab tools is a link labeled “Fetch as Googlebot.” Enter a URL from your website, and the tool queries the page and displays results within a few minutes.
Identifying Web Page Issues and Crawl Errors
After several minutes, a “Success” or “Fail” status is shown next to the web page you entered. If no status appears, click the “Refresh” button on your browser’s toolbar.
If the page status shows a failure, you may have DNS server issues, poor coding, or a web host that cannot keep up. DNS servers are the machines that translate a friendly domain name into its numeric IP address; if the DNS entries are wrong, Googlebot cannot locate your website at all. If your web host cannot handle the traffic sent to the website, Googlebot receives timeout errors, and these problems can prompt Google to remove pages from the index. DNS problems and slow server performance need to be addressed with your host, and in some cases you may need to move your website to a more capable hosting provider.
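If you suspect DNS trouble, a quick sanity check from your own machine can confirm whether the domain resolves at all. Here is a minimal sketch in Python (standard library only; swap in your own domain for the placeholder) that looks up the domain and reports how long the lookup took or why it failed. If this fails or is very slow, Googlebot is likely having the same trouble reaching the site.

# Minimal DNS sanity check (hypothetical domain placeholder).
import socket
import time

DOMAIN = "www.example.com"  # placeholder: replace with your own domain

start = time.time()
try:
    ip_address = socket.gethostbyname(DOMAIN)
    print(f"{DOMAIN} resolves to {ip_address} in {time.time() - start:.2f}s")
except socket.gaierror as error:
    print(f"DNS lookup failed for {DOMAIN}: {error}")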
Even if the page status is a success, you can still have coding errors that lead to pages being rejected from the Google index. Click the “Success” link next to the web page URL. Clicking the link opens a new page filled with the code Googlebot sees when it crawls your web page. If you aren’t an HTML expert, it may be difficult to identify problems, but simple coding errors can be spotted by studying the page for common mistakes. For instance, hidden elements, unclosed HTML tags, or duplicate title tags could be the reason Googlebot is rejecting the page for the search engine index.
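As one concrete example, duplicate title tags are easy to check for programmatically. The sketch below assumes you have saved the HTML from the “Fetch as Googlebot” result to a local file (the file name is just a placeholder) and counts the title tags it finds; anything other than exactly one is worth fixing.

# Count <title> tags in the HTML saved from the fetch result (standard library only).
from html.parser import HTMLParser

class TitleCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.title_count += 1

with open("fetched_page.html", encoding="utf-8") as page:  # placeholder file name
    counter = TitleCounter()
    counter.feed(page.read())

print("title tags found:", counter.title_count)
if counter.title_count != 1:
    print("Check the page: there should be exactly one <title> tag.")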
Identifying why Google rejects a web page can take several days and more than one attempt at a fix. However, the “Fetch as Googlebot” tool helps website owners find mistakes more quickly. Take advantage of this tool if you find that your web pages aren’t being indexed by the search engine.