Prevent Staging Site From Being Listed in Google

Technical SEO Jul 6, 2017

 

Generally, a staging server should be seen only by the developers and the client. Unfortunately, that doesn’t always happen: when a staging site fails to block Google properly, it can end up listed in Google’s search results.

A staging site that accidentally shows up in Google’s search results is bad for several reasons:

  • Staging websites contain unfinished designs and incomplete content.
  • Once the production site goes live, the staging copy duplicates its content, which can confuse search engines about which version should rank.
  • It’s really embarrassing when the client Googles themselves and sees the staging site in the results.
  • Public access to these staging websites could even damage a business if it leads to premature exposure of a new campaign or business decision.

Many people think that blocking Google from indexing a site also guarantees it won’t be listed in search results, so they rely on robots.txt alone to block the staging site. That conflates two different things. Here’s why:

Being indexed means that Google has crawled your pages and stored a copy of them in its index.

Being listed means that Google displays a link to your site or page in its search results. Blocking Google from crawling your site doesn’t guarantee it won’t list your pages: if someone else links to a blocked URL, Google can still show that bare URL in its results. It just won’t keep a copy of the page’s content on its servers.

How To Protect Your Staging Sites

Protecting staging environments is pretty simple, so there really isn’t an excuse to get it wrong.

Step 1: Restricting crawling via robots.txt

Every staging environment should block search engine access in its robots.txt file. To do that, create a robots.txt file in the root folder of the website and add the following lines:

User-agent: *
Disallow: /
NoIndex: /
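The Disallow rule can be sanity-checked locally with Python’s standard-library robots.txt parser before you deploy. A quick sketch; the staging URL is a placeholder:

```python
import urllib.robotparser

# The same rules as the staging robots.txt above (NoIndex is ignored
# by the parser; only the crawl rules are checked here).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Every path should be off-limits to every crawler.
for agent in ("Googlebot", "Bingbot", "*"):
    print(agent, rp.can_fetch(agent, "https://staging.example.com/any/page"))
```

Each line should print `False` for `can_fetch`, confirming that compliant crawlers are asked to stay out entirely.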

The Disallow rule tells compliant crawlers, including Googlebot and Bingbot, not to fetch any page on the site. The NoIndex rule was never an officially documented robots.txt directive, so don’t count on it. More importantly, robots.txt is only a request: it stops well-behaved bots from crawling, but if another site links to a blocked page, Google can still list the bare URL in its results. That’s why the next two steps matter.
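For pages that crawlers can still fetch, a more dependable way to keep them out of the index is an X-Robots-Tag response header. A minimal sketch, assuming Apache with mod_headers enabled, added to the staging site’s .htaccess:

```apacheconf
# Ask crawlers not to index or follow anything this site serves.
# Requires mod_headers to be enabled.
Header set X-Robots-Tag "noindex, nofollow"
```

Note the trade-off: Google only sees this header on pages it is allowed to crawl, so a blanket Disallow in robots.txt can hide the noindex signal. The password protection in the next step avoids that problem entirely.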

Step 2: .htaccess login

The next step, and the most effective way to keep your staging site out of Google’s search results, is to require a username and password via .htaccess. If Google can’t access the site at all, it can’t crawl it, index it, or list it.

In the staging site’s .htaccess file, add the following:

AuthType Basic
AuthName "Protected Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
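The AuthUserFile line points at a password file that has to exist. It is normally created with the htpasswd utility from Apache’s tools (e.g. `htpasswd -c /path/to/.htpasswd dev`). If that utility isn’t available, a valid entry in Apache’s {SHA} format (the same output as `htpasswd -s`) can be generated with Python’s standard library. A sketch; the `dev` username and password are placeholders, and bcrypt entries (`htpasswd -B`) are stronger when you can use them:

```python
import base64
import hashlib

def htpasswd_sha_entry(user: str, password: str) -> str:
    """Build a .htpasswd line in Apache's {SHA} format (same as `htpasswd -s`)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"

# Append the printed line to the staging site's .htpasswd file.
print(htpasswd_sha_entry("dev", "s3cret"))
```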

This will make your staging environment much more secure and will prevent unauthorized access.

Step 3: IP address restriction

You can also restrict access to your staging sites to specific IP addresses. By limiting access to requests coming from your own office network and the client’s network, you lock the environment down even further.

Simply add the following to the staging site’s .htaccess file, substituting the IP addresses you want to allow (203.0.113.10 below is a placeholder):

order deny,allow
deny from all
allow from 203.0.113.10

With this configuration, the web server denies access by default and allows only the specified IP addresses.
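One caveat: order/allow/deny is Apache 2.2 syntax, which Apache 2.4 only honors through the mod_access_compat module. On a 2.4+ server, the native equivalent (again with a placeholder address) is:

```apacheconf
# Apache 2.4+: allow only the listed address; everyone else gets 403.
Require ip 203.0.113.10
```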

Conclusion

If you follow the steps above, your staging site won’t end up listed in Google’s search results. Additionally, it’s wise to use Google Search Console (formerly Google Webmaster Tools), where you can test your robots.txt rules against individual URLs.

Be pro-active in securing your clients’ information, and block access to your staging sites for everyone except those who need it.

 
