Prevent Staging Site From Being Listed in Google

Generally, a staging server should only be viewed by the developers and the client. Unfortunately, that doesn’t always happen: if the staging site isn’t properly blocking Google, the site ends up listed in Google’s search results.

A staging site accidentally getting listed in Google’s search results is very bad, for several reasons:

  • Staging websites contain unfinished designs and incomplete content.
  • The production site risks being penalized for duplicate content once it goes live.
  • It’s embarrassing when the client Googles themselves and sees the staging site in the results.
  • Public access to a staging website can even damage a business if it leads to premature exposure of a new campaign or business decision.

Many people think that blocking Google from indexing a site means it won’t be listed in the search results, so they end up using robots.txt to block Google from indexing the staging site. That is very wrong. Here’s why:

Being indexed means that your site or web pages have been downloaded and added to Google’s index – a copy stored on their servers.

Being listed means that Google is displaying a link to your site or page in their search results. So, blocking Google from indexing your site doesn’t mean they won’t still list your site in their search results. It just means they won’t keep a copy of its content on their servers.

How To Protect Your Staging Sites

Protecting staging environments is pretty simple, so there really isn’t an excuse to get it wrong.

Step 1: Restricting crawling via robots.txt

All your staging environments should block search engine access in their robots.txt file. If you don’t know how to keep Google away with robots.txt, just create a robots.txt file and place it in the root folder of your website.

In the robots.txt file, simply add these lines:

User-agent: *
Disallow: /
NoIndex: /

These directives mean Google will not only be blocked from crawling the site, it will also be told not to include any page in its index – even if someone else links to it.

This is generally sufficient to prevent the Google and Bing bots from indexing your website and from showing it in search results.
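If you want to confirm that the robots.txt is actually doing its job, a small script can check it for you. Below is a minimal Python sketch (the staging.example.com host is a hypothetical placeholder for your staging server) that fetches the live robots.txt and reports whether common crawlers would be allowed to fetch the site:

# Minimal sketch: check that a staging site's robots.txt blocks crawlers.
# "staging.example.com" is a hypothetical placeholder for your staging host.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://staging.example.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt

for agent in ("Googlebot", "Bingbot", "*"):
    allowed = parser.can_fetch(agent, "https://staging.example.com/")
    print(agent, "allowed" if allowed else "blocked")

With Disallow: / in place, every agent should print “blocked”; if anything prints “allowed”, the file probably isn’t being served from the root of the staging host. (Note that urllib.robotparser only evaluates the User-agent and Disallow/Allow rules; it ignores the NoIndex line.)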


Getting Dropped By Google? Do Something About It.

Ever wonder why some people seem to be immune to Google fluctuations? It’s because they’re not overly optimized on a small set of words. Instead, they’re diversely optimized. We’ve seen this effect on Connors clients for over 6 years now. The rest of the world is only just starting to discover that a diversity of carefully targeted content effectively hedges your bets against Google algorithm tweaks. Here’s one person who saw a 40% GROWTH in traffic while many of his counterparts were falling out. Guess what he attributes his success to.

Yahoo SEO

So, how quickly do the search engines respond to a deluge of in-bound non-reciprocal links and blog-posts being created on a semi-popular keyword? Since we got Netscape’d the other day (sort of like getting SlashDotted), A LOT more eyes have been on HitTail. User registrations are doubling every week, and we’re considering cutting off new beta tester sign-ups. So if you’re interested, sign up now.

Anyway, a few days ago I was happy that the HitTail site was coming up in position #140 in Google on the term “long tail” (with the space, but without double quotes). Today, it’s in position #29. That’s a stellar jump, putting it onto the third page of results (for now). This is consistent with the Google patent information of last month, stating how Google is sensitive to the RATE AT WHICH new in-bound links are established. This can also account for the rolling window of opportunity that newly discovered content often experiences. The rate of new in-bound links decreases as it becomes old news, and so relevancy and position in search results follow.

So, everyone please tell your bestest buddy about HitTailing, lest someone else (maybe a competitor) fill the limited beta tester slots.

So, what about MSN? We’re in position #50. That’s the bottom of the fifth page of results. Not the first three pages, but it’s a start.

Yahoo? Nowhere to be found (yet).

Anyway, with the exception of Yahoo, I think we can infer that given some clue left about the Internet recently – be it the pure Web content, the new rash of inbound links, or all the blog pings – Google and MSN are quick to promote a site. With Yahoo, while the pages are indexed and the site is “known”, the site has not received a similar relevancy boost of the kind the meme-chasing engines have given it.

That doesn’t necessarily make Yahoo better or worse than the other two engines–only different. It takes longer to do well with Yahoo with long tail writing AND sudden linking. The clues that Yahoo follows to rank sites in the short term are simply more subtle. The pages are in there. They’re just not ranking well yet. I’ll keep you posted here as we develop evidence of the effectiveness of HitTailing with Yahoo. But for those with little patience, only expect to see changes during that magical 2-week cycle in MSN and Google.

How Quick Into Google Search Results?

So, how quickly will a brand new domain show up on Google? I should have been checking day-to-day, but today is January 31, and the site was technically launched on January 16th. That’s just over two weeks, the period of time traditionally quoted as the minimum for a Google sweep. So, now’s a good time to do a quick review. Thankfully, the domain name is a totally made-up name, and I can do some very insightful tests here.

I decide to search blogsearch.blogger.com first to ensure that at least the blog posts are in the specialized blog-content engine. It produces 28 results, including the first test post made on HitTail.blogspot.com (now defunct). Every post was made since January 25. So, in one week, every blog post has been included in the blog search.

Next, I search on HitTail in the default search, and I see one result. It’s the domain with no title or description. This is what’s often referred to as the Google sandbox. We can see that Google is aware that the domain exists, but is not producing any of the site’s content in the results. We see in the spiderspotter app that the first visit by GoogleBot was January 25th, the same day I started blogging.

From the 25th to today is exactly one week (7 days). In seven days, we have gone from a previously unknown site to the domain being findable, but collapsed down to one page, with no actual page content in the results. How recent is it? A quick search on a couple of different Google datacenters reveals that even this one-page listing is only on a couple of datacenters, and non-existent in others. So I am indeed catching it during the process of propagation, and we have our undisputed evidence that a site can go from zero to listed in some form in one week. Have I avoided the Google sandbox penalty altogether?

And finally, we check for specific quoted content from the first blog post. I know it won’t show, but I’m at least running the test for the sake of completeness. So, it’s one week to show up at all, and it’s some time longer before content appears. After content appears, the results tend to “dance around” – the so-called “Google Dance” – until the data has propagated across all data centers.

Another factor affecting the results settling down is something people don’t talk about much. The Google patent from March of last year revealed that Google is very sensitive to the amount that a site has changed from one visit to the next. That is to say, how much of the site has changed? How many new links have been established to the site? When a site is brand new, every few pages you add constitute a significant percentage of the overall site. So, Google is seeing a very volatile site, and the results are correspondingly volatile. Therefore, when a page is first discovered, it goes into what I think of as a moving window of opportunity. I believe pages get this extra relevancy boost to see if they have the potential for gangbusters, fad-like success.

Fad-like success? Fads, I believe, overrule traditional rules of slow organic growth. These are pages that somehow become massively popular and everyone starts linking to, passing around in email, finding due to events in the news, etc. If a page does suddenly become massively popular, Google sees this, because they’re quietly recording click-through data, similarly to how DirectHit did back in the day. But DirectHit’s system, subsequently merged with Ask Jeeves, was ultimately defeated, because by touting that they were doing this, they invited click-stuffing abuse. Google, on the other hand, not only doesn’t advertise click-through tracking, but uses very clever JavaScript to keep it from even looking like it’s occurring. It’s not evil. It’s just smart. And if a site goes gangbusters, there is a totally organic pattern created that is difficult to fake, because there are hundreds of links from non-related sites, and thousands of click-throughs from disparate IPs that couldn’t possibly be under one person’s control. This fad traffic pattern then “buoys” that page’s relevancy in future searches. This is just speculation based on observation, but it stands to reason that certain relevancy criteria can outweigh the others when those criteria are both particularly difficult to fake and out of balance with the rest.

Anyway, what are my conclusions? This test proves…

  • How long does it take to go from zero to being in Google results at all? One week.
  • How long does it take to go from zero to being in Google results in a meaningful way? Verdict not in, but expected soon. Stay tuned.
  • How long does it take to go from zero to being in Google results in a stabilized, decent fashion necessary to drive sales? We won’t know for three to six months.
  • How long does it take to go from zero to being viewed as a healthy, growing site worthy of regular, predictable inclusion of new content? Well, that’s the purpose of HitTail!

Benchmark Keywords Spanning Many Years

This post is about keyword benchmarking for search optimization. After a recent update to the Google search results, nicknamed “Jagger” last November, my personal domain dropped off the first page of results for my name, “Mike Levin”. The photographer of the same name maintained his position #1. My site went four pages in, but a bunch of other pages that also referred to me moved onto the first page of results, including my Blogger profile, the profile on my employer’s site, and my SearchEngineForums staff profile. In other words, I went from holding position #2 with my personal domain, which has been around for a very long time, to holding three lower positions on the main results page while losing my personal one.

Admittedly, I haven’t kept my personal site very updated, and the sites linking to it might be somewhat dubious, since I’ve been in SEO circles since day one. SearchEngineForums is not only one of the oldest search-oriented forums on the Web, but it’s one of the oldest Web forums, period. It was started by Jim Wilson, who has since passed. Webmaster World has mostly taken its place in spirit. Over the years, I’ve worked as something of an intrapreneur (rather than an entrepreneur) at companies like Prophet 21 (now bought by Activant) and Scala Multimedia. There have been certain benchmark keywords over the years that have helped me gauge what was going on with search.

Searching on my own name was, of course, one of them. And the Google Jagger update was significant in that newer sites, but not too new, suddenly had an edge over long-standing sites, which you might call stale. But another benchmark I occasionally monitor is the term “distribution software”. It was relatively easy to conquer across all the engines of the time, and has sustained itself remarkably over time. So, it was with great interest that I watched when the new purchasers of Prophet 21, and the awesome 3-letter domain p21.com, forwarded the Prophet site to an Activant third-level domain. I don’t think the third-level domain had been around for very long, but the Activant site had. So, would it incur the sandbox penalty? Would it maintain its across-the-board top positions? Was Activant unwittingly walking away from one of its potentially most valuable acquisitions and assets?

The answer is that the Google juice transferred over from www.p21.com to distribution.activant.com very smoothly, at least for the benchmark keyword that I still monitor. The sandbox penalty had been evaded by using a sub-domain of a long-standing second-level domain. If you search on distribution software on Google, Yahoo, MSN and Ask Jeeves, you will find Activant as the VERY TOP result in 3 of the 4 sites, and #3 in Ask Jeeves (which still shows the old domain).

This teaches us several lessons. Across-the-board fortified results of the sort I achieved (with help from a fellow named Steve Elsner) are transferable. The transfer can occur in a relatively short period of time (a matter of months). A sub-domain can quickly acquire a great deal of clout—probably more quickly than a newly registered domain, given the new Jagger reality. And when I left P21 back in 1999, I left the Web pieces in some very good hands, and someone at Activant took a gamble that paid off and taught me some important lessons about the SEO landscape as it exists at this particular instant.

Over time, a great deal of evidence mounts up that such-and-such a site is relevant on such-and-such a topic. These breadcrumb trails (mostly link topology) point back to hardwired domain names. So, changing a domain name is serious business.

I have another situation similar to the one above, but the transfer of considerable existing-site clout was to a brand-new domain name. This was December of 2004, before anyone knew newly registered domains were about to have the wind taken out of their sails. Their site appeared in the top Google results for their keywords within four months of site launch, right on schedule and in line with our time estimate to the client. But then it dropped out and didn’t come back.

The client cringed. We cringed. We applied about as much “upward pressure” as we possibly could without crossing ethical boundaries. I was convinced we were worse than stuck in the sandbox, because we had held the positions for quite some time and lost them. Then, news broke of the Jagger update. I totally understood the reasoning, and did what I always do when that happens. I metaphorically climbed into the heads of the Google engineers and rummaged around in there for a while, and realized that if a domain was registered specifically for spamming, the spammers would only register it for a year. If a site survives past that 1-year boundary, then bam! You’re out of the sandbox.

So, I gave the client the time estimate based on the new domain launch. I laid out their options, and the risks of sticking it out or bailing to the old domain name too soon. They took our advice and stuck it out to get past that 1-year point, and it paid off. I nailed the time estimate of when the sandbox/Jagger penalty would lift down to the week. It was 1 year and 3 weeks after they launched the new site.

One of my claims to fame in SEO circles in the early years was my mission to conquer a 2-word keyword combo that landed squarely in the crosshairs of Macromedia, Apple, and a number of other companies: “multimedia software”. I achieved similar fortified results on this 2-keyword combo as I did with “distribution software”, and over the years, it has continued to hover around position #5. And although the term multimedia is so ’80s, it is also highly competitive—maybe not in bidding, but certainly in how many products I had to push down. So, after I nailed the 2-word combo, I moved on to just the single term “multimedia”. I drove that sucker almost up to page one before I moved on to my next ventures. Also here, I worked the term “digital signage”, which was MUCH easier, since it was a bit more off the beaten track. It has remained one of my benchmark keywords for taking the pulse of the search landscape.

At Connors Communications, my job is really cut out for me. It’s a PR firm, and doesn’t have the ace up its sleeve that both P21 and Scala had—a product and a user base. Yes, a product and a user base are two of the most valuable tools for SEO. With a product, you can offer a free downloadable version, which triggers the viral marketing thing like little else. Everyone adds you to the download sites, and you suddenly have both inbound links AND buzz. But you also have a user base who, for better or for worse, are going to talk about you in forums, and blog about you, and link to you on their websites (sometimes on other corporate sites, if your product is corporate). It’s even better if you have a network of dealers, distributors and legacy users, which Scala did. It was mostly a matter of directing momentum—or as Sun Tzu would say—throwing rocks on eggs. SEO for Scala was quite easy.

But Connors is a PR firm, which is a service. By nature, it can only serve a small number of clients at any one time. And no matter how talented the Connors crew is (and they are VERY talented, having launched Amazon, Priceline, and most recently, Vonage), it is still just a PR company without the advantages of a product or installed user base. So what is the hook I can hang my hat on from an SEO perspective? What will my benchmark keywords be for Connors? And how do I leverage all the zillions of search hits I’ll be generating for them with SEO if we can’t take everyone onboard simultaneously as clients?

The answer to “what keywords” is “pr firm”, for which we’ve risen to page one in MSN, page two in Google and page two in Yahoo. This serves as a beachhead for other keyword combos (more on the beachhead concept in later posts), and shows that the methodologies I developed are not only fortified across time (P21 and Scala), but also work across industries. So, the next step is to productize an aspect of the PR industry that is exciting to everyone, and can seem in many ways like a downloadable product. Once again, enter HitTail.