Benchmark Keywords Spanning Many Years

This post is about keyword benchmarking for search optimization. After a recent update to the Google search results, nicknamed “Jagger” last November, my personal domain dropped off the first page of results for my name, “Mike Levin”. The photographer of the same name maintained his #1 position. My site went four pages in, but a bunch of other pages that also referred to me moved onto the first page of results, including my Blogger profile, the profile on my employer’s site, and my SearchEngineForums staff profile. In other words, I went from holding position #2 with my personal domain, which has been around for a very long time, to holding three lower positions on the first page of results, but losing my personal site from it.

Admittedly, I haven’t kept my personal site very up to date, and the sites linking to it might be somewhat dubious, since I’ve been in SEO circles since day one. SearchEngineForums is not only one of the oldest search-oriented forums on the Web, but one of the oldest Web forums, period. It was started by Jim Wilson, who has since passed away. WebmasterWorld has mostly taken its place in spirit. Over the years, I’ve worked as something of an intrapreneur (rather than an entrepreneur) at companies like Prophet 21 (since bought by Activant) and Scala Multimedia. There have been certain benchmarks over the years that have helped me gauge what was going on with search.

Searching on my own name was, of course, one of them. And the Google Jagger update was significant in that newer sites, but not too new, suddenly had an edge over long-standing sites, which you might call stale. But another benchmark I occasionally monitor is the term “distribution software”. It was relatively easy to conquer across all the engines of the day, and it has held its positions remarkably well over time. So, it was with great interest that I watched when the new purchasers of Prophet 21, and its awesome 3-letter domain, forwarded the Prophet site to an Activant third-level domain. I don’t think the third-level domain had been around for very long, but the Activant site had. So, would it incur the sandbox penalty? Would it maintain its across-the-board top positions? Was Activant unwittingly walking away from one of its potentially most valuable acquisitions and assets?

The answer is that the Google juice transferred over to the new location very smoothly, at least for the benchmark keyword that I still monitor. The sandbox penalty had been evaded by using a sub-domain of a long-standing second-level domain. If you search on “distribution software” on Google, Yahoo, MSN and Ask Jeeves, you will find Activant as the VERY TOP result on 3 of the 4 engines, and #3 on Ask Jeeves (which still shows the old domain).

There are several lessons here. Across-the-board fortified results of the sort I achieved (with help from a fellow named Steve Elsner) are transferable. The transfer can occur in a relatively short period of time (a matter of months). A sub-domain can quickly acquire a great deal of clout, probably more quickly than a newly registered domain, given the new Jagger reality. And when I left P21 back in 1999, I left the Web pieces in some very good hands, and someone at Activant took a gamble that paid off and gave me some important SEO lessons for the SEO landscape as it exists at this particular instant.

Over time, a great deal of evidence mounts up that such-and-such a site is relevant on such-and-such a topic. These breadcrumb trails (mostly link topology) point back to hardwired domain names. So, changing a domain name is serious business.

I have another situation similar to the above one, but the transfer of considerable existing-site clout was to a brand-new domain name. This was December of 2004, before anyone knew newly registered domains were about to have the wind taken out of their sails. Their site appeared in the top results in Google for their keywords within four months of site launch, right on schedule and in line with our time estimate to the client. But then it dropped out and didn’t come back.

The client cringed. We cringed. We applied about as much “upward pressure” as we possibly could without crossing ethical boundaries. I was convinced we were worse than stuck in the sandbox, because we had held the positions for quite some time and then lost them. Then, news broke of the Jagger update. I totally understood the reasoning, and did what I always do when that happens. I metaphorically climbed into the heads of the Google engineers and rummaged around in there for a while, and realized that a domain registered specifically for spamming would only be registered for a year. Survive past that 1-year boundary, and bam! You’re out of the penalty.

So, I gave the client the time estimate based on the new domain launch. I laid out their options, and the risks of sticking it out or bailing to the old domain name too soon. They took our advice and stuck it out to get past that 1-year point, and it paid off. I nailed the time estimate of when the sandbox/Jagger penalty would lift down to the week. It was 1 year and 3 weeks after they launched the new site.

One of my claims to fame in SEO circles in the early years was my mission to conquer a 2-word keyword combo that landed squarely in the crosshairs of Macromedia, Apple, and a number of other companies: “multimedia software”. I achieved similar fortified results on this 2-keyword combo as I did with “distribution software”, and over the years, it has continued to hover around position #5. And although the term multimedia is so ’80s, it is also highly competitive, maybe not in bidding, but certainly in how many products I had to push down. So, after I nailed the 2-word combo, I moved on to just the single term “multimedia”. I drove that sucker almost up to page one before I moved on to my next ventures. While at Scala, I also worked the term “digital signage”, which was MUCH easier, since it was a bit more off the beaten track. It has remained one of my benchmark keywords for taking the pulse of the search landscape.

At Connors Communications, my job is really cut out for me. It’s a PR firm, and it doesn’t have the ace up its sleeve that both P21 and Scala had: a product and a user base. Yes, a product and a user base are two of the most valuable tools for SEO. With a product, you can offer a free downloadable version, which triggers the viral marketing effect like little else. Everyone adds you to the download sites, and you suddenly have both inbound links AND buzz. But you also have a user base who, for better or for worse, are going to talk about you in forums, blog about you, and link to you on their websites (sometimes other corporate sites, if your product is corporate). It’s even better if you have a network of dealers, distributors and legacy users, which Scala did. It was mostly a matter of directing momentum, or as Sun Tzu would say, throwing rocks on eggs. SEO for Scala was quite easy.

But Connors is a PR firm, which is a service. By nature, it can only serve a small number of clients at any one time. And no matter how talented the Connors crew is (and they are VERY talented, having launched Amazon, Priceline, and most recently, Vonage), it is still just a PR company without the advantages of a product or installed user base. So what is the hook I can hang my hat on from an SEO perspective? What will my benchmark keywords be for Connors? And how do I leverage all the zillions of search hits I’ll be generating for them with SEO if we can’t take everyone onboard simultaneously as clients?

The answer to “what keywords” is “pr firm”, for which we’ve risen to page one in MSN, page 2 in Google and page 2 in Yahoo. This serves as a beachhead for other keyword combos (more on the beachhead concept in later posts), and shows that the methodologies I developed are not only fortified across time (P21 and Scala), but also work across industries. So, the next step is to productize an aspect of the PR industry that is exciting to everyone, and can seem in many ways like a downloadable product. Once again, enter HitTail.

Added WiFi Hotspots, and Paying Less

So today I joined the ranks of the wireless warriors. I was on a $60/mo T-Mobile plan that got me 600 peak hours, unlimited nights & weekends, plus unlimited Internet and downloads (over the phone only). But now with my shiny new Averatec that everyone thinks is an Apple iBook, I have the itch to walk NYC, sitting down in any Starbucks to do my work. So after about a half hour of talking to a helpful T-Mobile rep named Sidney, I came up with a combo that gets me everything I want.

I’m now spending $10 less per month, and I have unlimited T-Mobile WiFi hotspot access from my laptop as well. What I gave up were 90% of my peak hours and evenings. After I got off the phone with them, I suspended my XP laptop at home, walked over to the closest Starbucks, and connected. A remote desktop session that I had running to my PC at the office came right back, even though it was a different WiFi network and the laptop had been in suspend mode. Things have really improved.

Blog pinging and Pingomatic

I plan on understanding a lot more about how pinging works in blogging systems. I’ve built blogging systems, but that was before all this pinging stuff was going on, so nothing on that system becomes part of the blogosphere proper. I just submitted HitTail at the Pingomatic site. And a lot can be learned just from that process, not the least of which is simply the names of the different pinging services. Pingomatic even shows you the feedback of the ping.

I don’t know if Pingomatic is using Web Services or simulating a webpage submit. But if this isn’t an application built for Web Services, I don’t know what is. For posterity, and for later review, I did a screen capture. Now that I did this ping, HitTail is really going to be in the blogosphere, because who knows what happens as a result of a one-time ping. There will very likely be discovery-bots sent out, and automatic revisits without pinging by proactive news gatherers. And for the sake of interest, here’s the spider visitation within the first half-hour of doing this blog-ping. Some of these are crawlers that I haven’t seen on HitTail before. Hmmmm…

TrackBack, Link Farms, Jagger Update and Blogger

How much do I miss the TrackBack feature by going with Blogger? Not at all. Why? New information shows just how unhelpful, and perhaps even damaging, it can be. Perhaps Blogger never implemented TrackBack intentionally, knowing what was coming down the pike from parent Google, especially in light of the Jagger update from last November. Reciprocal links were penalized, or at least stopped delivering as much value as they used to. If every link you receive is reciprocated, you’re a link farm, at least as far as the Web topology you’re creating is concerned.

Dealing with this aspect of TrackBack is generally all-or-nothing: you can use TrackBack or turn it off. The second strike against TrackBack is, of course, spam. People link to you and send a TrackBack ping specifically to get a link from your page, even if their site is totally unrelated. Reciprocal links with unrelated sites go even further to create that terrible link farm topology. There’s a thin line between the organic pattern created by a genuinely popular site, and the pseudo-organic pattern created by link farms.

My money is on Google getting better and better at recognizing these automatic cross-link patterns. And like every other spam trap, there’s some sort of threshold. Stay below that threshold, and you’re golden. Go over that threshold, and your site is flagged for human review, or possibly even automatic banning.

The real way these blogging software companies should implement TrackBack is to get rid of the silly pinging and TrackBack codes. Blog posts don’t need a unique identifier. The permalink page has a URL! That’s unique enough. The code system is too geeky, and it can be automated. Analytics-like tracking systems built into blogs should simply recognize people following links. If it’s a first-time referrer, the system should send a crawler out to check the validity of the page (not all referrers are accurate), and put that link into an inbox queue in the blog user interface. The person running the blog can then visit each of the sites and make a human evaluation of whether it’s worthy of receiving a link back. If it is, they checkbox it.

This has a number of advantages. First, the human checking process will block spam. Second, it will pick up many more referrers than the TrackBack system in its current form, which requires action on the part of the person linking to your blog. This information is already being passed back and forth. Why not use what’s already there? Third, it serves as a sort of competitive intelligence gatherer for the blogger, who gets to see all referring links to their blog as a matter of interest, without being obliged to give a link back.
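
Just to make the idea concrete, here is a minimal sketch of what such a referrer-driven queue might look like in classic ASP. QueueForReview is a hypothetical helper standing in for whatever would actually write to the blog’s inbox queue, and fetching the referring page inline on every view is only to keep the sketch short.

  <%
  ' Hypothetical sketch: treat a first-time referrer as a TrackBack candidate.
  Dim referrer, myHost, http
  referrer = Request.ServerVariables("HTTP_REFERER")
  myHost = Request.ServerVariables("HTTP_HOST")

  If Len(referrer) > 0 And IsEmpty(Application("seen_" & referrer)) Then
      Application("seen_" & referrer) = Now()   ' remember that this referrer has been handled
      Set http = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
      http.Open "GET", referrer, False
      http.Send
      ' Only queue the link if the referring page really does link here
      ' (not all referrers are accurate).
      If http.Status = 200 And InStr(1, http.ResponseText, myHost, vbTextCompare) > 0 Then
          QueueForReview referrer   ' hypothetical helper: drops the URL into the inbox queue
      End If
  End If
  %>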

The time has come, the Walrus said, to speak of many things. The do’s and don’ts of SEO, of tracking-back and pings.

An addendum to this post, moments after I published it: in going into Blogger’s settings, I discovered the “Backlink” feature. It sounds like it’s implemented much like I imagined. No codes are necessary. You just turn it on. So, I did (to get the experience). If I think it’s starting to create a link-farm pattern, it gets turned off, pronto. It will be interesting to see what happens. It says that it uses the “link:” feature, which makes me think that the referring site has to be in the Google index, and perhaps even have passed whatever criteria they use to reduce the number of actual results reported by it. That would perhaps deal with the spam issue, if the site linking to the post needs to have, say, a PR of over 4.

Under-Promise, Over-Deliver

I woke up remarkably early, all things considered, and sent out an email informing my team I’d be taking Monday for the HitTail project. I have such momentum, and am so close to wiring up the main homepage for its first visitors, that I’d be crazy to go into the office, engage, and risk putting it off for another week. My to-do list for today looks like this…

  1. Create the template files.
  2. Put placeholder files in location for FAQ, SEO Best Practices, Why Sign Up.
  3. Fix the navigational links to point to these new placeholder files.
  4. Put the new navigational links into the Blogger templates.
  5. Figure out how the babystep tutorials are going to be linked in.
  6. Link in the first tutorial, and the respective spiderspotter app.
  7. Connect the submit form to lead management.
  8. Start putting content on the placeholder pages.

The work that I need to do in my second round of intensive focus includes…

  1. Final thought-work on the actual HitTail app.
  2. Creating the HitTail app.
  3. Giving a flavor of its power directly on the MLT homepage.
  4. Starting to communicate with the people who signed up early, probably by creating a public forum, so I can efficiently communicate with all of them at once, and they can communicate with each other.
  5. Ensuring that the conversations that are developing into Connors new client prospect opportunities are being handled properly.

One pitfall to avoid is actually acting on the information that the spider spotter app is revealing to me. For example, Bitacle bot has been trying to retrieve my atom.xml file from the wrong location. I realized the path I set to the XML feed in my Blogger settings was incorrect. I fixed it, but I also realized I had an absolutely fascinating app to write: one where I could measure the time between my submitting a blog entry and spiders requesting that page or the data feed.

I think I’ll make a list of lists that I need on the HitTail site.

  • Apps that I need to write (which will also become tutorials)
  • Markets, industries and technologies that I want to target
  • People that I need to reach out to (the influencers), and the message I need to deliver to each, based on their interests
  • Topics for the SEO Best Practices section
  • Questions for the FAQ section
  • Topics that I intend to blog about
  • Pitfalls to avoid

Perhaps the biggest pitfall of all is over-promising. There is little as damaging as building up expectations, only to be let down. I stand the danger of over-promising to two different audiences: Connors (specifically, Connie), and the people who sign up early for HitTail. I have to start with a small, but potent kernel. HitTail will be modest in how far it’s reaching, but designed to strike a fundamental chord—one that’s in the tornado’s path. There’s no need to over-promise, because that small kernel is totally enough—and I have to focus on over-delivering that one small piece.

That’s very Web 2.0 thinking, by the way. Because everything interoperates relatively easily, people can write mash-ups based on your app. Each person writing their mash-up is likely to have way more expertise in their problem domain than I do, so what they write USING my service is better than what I could write alone. My role then becomes to put out a few sample mash-ups to stimulate everyone’s imaginations.

The HitTail app will be one of the first Web Services for SEO. Hopefully, it will have the same attractiveness as Tag Clouds, Blogrolls, Bookmarks, and all the other things that are serving as mash-up fodder and material for blog templates. Of course, Google Maps is the ultimate mash-up service, and I will continue to use it for inspiration. But no over-promising!

NYC PR Firm and SEO

OK, here I am at the Hollywood Diner on Sunday night at 1:00AM, evaluating how I did over these past four days. There’s still more I’d like to do tonight, but a reasonable person would put it aside, get some good sleep, and be in the office tomorrow morning to catch up. I took 4 continuous days, Thu-Sun, to focus on this project. I basically ignored all emails (and that made all the difference), and bore down on the work.

Am I happy with my progress? Does it match what I visualized? What I finished has indeed matched what I visualized very closely. I just haven’t finished as much as I would have liked. The baby-step tutorial markup project took two full days. But I knocked a lot of foundational issues out of the way. I’ve committed myself down the Microsoft, VBScript route in order to get the project finished. I have the first full tutorial done. I have the spider-spotter application finished. I have the homepage designed and implemented.

I just don’t have Lead Management wired up, don’t have placeholder pages for the different top-level navigation pages, and don’t have the tutorial or spider-spotter app actually linked in. I did not achieve my objective of having this site operational as an opportunity generator before the weekend was out. But it’s all set up, just waiting to be hit home. Lead Management is working on the Connors site, and I could move it over quite easily. And it’s still early.

I’m having coffee and getting some food at the Diner. So, I should be set until 5:00AM again. But I can’t do that if I’m committed to meetings tomorrow morning. It actually looks clear enough. I’ll have to send out an email that I’ll be taking another day. I really shouldn’t have to feel guilty about focusing on this. This is the value I have to bring to Connors—much more so than client management. Our people are very good, and self-sufficient. I’m mostly there for high-level guidance, a backup net, and for new business development. I want to be in on Tuesday for an on-site client meeting on one of the more detailed SEO projects that we do (URL re-writing).

Once finished, the HitTail website will elevate Connors’ role in the PR industry from being one of NY’s top PR firms specializing in emerging technologies, to being an emerging technology company itself. At the very least, it will be a PR firm that can demonstrate its technical chops in a very public, very glitzy way.

So, how to get from here to there?

Building the actual HitTail application, which I still haven’t really talked about yet, is the biggest part. The next step is to practice what I preach. By making the HitTail site massively successful, documenting as I go, I’ll be spelling out the HitTail formula and process. What I’ll be doing will go beyond the prescribed MLT formula, but those extra things will be part of the playbook under SEO Best Practices.

Even SEO Best Practices is a misnomer, but it’s the best label right now for the audience-building task. We will address all aspects of online marketing, publicity and promotion that are unpaid. Of course, employees’ salaries are going into the work, along with electricity, rent, and all the other burden costs. But what is not going into it is a large marketing budget. It is quite possible for a single, passionate individual to outperform an entire marketing department through word-of-mouth evangelism. The Internet and Web simply bring automation and persistence to old-fashioned word of mouth. Search is a special part of the equation, because it’s the wildcard, and the one on which fortunes can flip-flop. It’s the area that has an amplifying effect you don’t have to pay for, resulting in getting more out than what you put in.

So, isn’t Connors setting up its own competition with the HitTail site? In some cases, yes. We will be spelling out a process whereby a dedicated individual within a company can create a lot of publicity for themselves without hiring an outside company, and with much less investment than a traditional marketing budget full of advertising and events. In a very real way, we will be teaching them how to do a very advanced form of high-tech PR—exactly our specialty.

Then, why is Connors doing this? Because the total number of people needing this far exceeds what we can service, and we would rather have this relationship with you than not. We would rather be the ones ushering in this next evolution of search marketing than not. How can it result in anything but good for Connors, and for those we will have the privilege of serving? We believe in sowing a thousand seeds and seeing what blossoms.

UPDATE: Connors has evolved from traditional PR to high end search engine marketing.

Foundational Design and SEO Considerations

Now, it’s time to consider aesthetics and search optimization. As many people in the SEO field will tell you, there is a balance to strike between SEO best practices and design perfection. If you were just going after perfect search optimization and usability, everything would look like Jakob Nielsen’s site (talk about a dated look). But to go after uncompromising design, you would do the entire thing in Macromedia Flash, making the entire site invisible to search and defeating a primary purpose of a website: generating the sales opportunity in the first place. But just as a skilled poet can communicate perfectly while maintaining pentameter and rhyme, a skilled Web developer can seamlessly combine optimization and design. I have three extreme advantages going in my favor…

  1. I’m creating the site from scratch, so all my decisions are foundational.
  2. I’m a V.P. of the company AND the entire art/programming team on this project, so I have no artist to satisfy.
  3. I’m proceeding with a very clean and sparse Google-like look, so art’s not a large project.

I should not use the Google sparse look as a license to go boring. Remember my comment about Jakob’s site? I don’t want to be a hypocrite. So, I do indeed plan on spicing up the look of the site. I’m quite partial to the look of too-biased, the blog site for which the Ruby on Rails Typo program was developed. So, what design parameters do I have to work with?

  • Logo
  • Value proposition in the form of a tagline
  • Navigational elements
  • Pervasive Sign Up form
  • Space to push out message du jour—when the MLT app is done, this space will be where we give a preview of the sizzling visual of HitTail.

The logo placement is already decided. The sign-up form will initially be just a single line positioned like a search box. The initial tagline is already written. So, I need to nail down the navigational elements. I think I’ll make them very Google-like in that they’re plain text links, easily changeable, and implying tabs even though the tabs aren’t really there. Plain text links used as if they were tabs (and maybe made to look like tabs wholly through CSS) are a perfect example of the 80/20 rule in design. I could spend a whole weekend just designing a cool tab look that someone, somewhere would hate (or that would break some browser). Design is so subjective and pitfall-ridden that you have to choose your battles carefully. I’m sure to do a dedicated post on that topic later. But for now, I’m starting out with these navigational elements:

  • Home
  • Why Sign Up?
  • SEO Best Practices
  • Blog
  • FAQ

The visual proportions and weighting when these words are laid out are perfect. I would like to add the terms “PR” and “Pod” to the navigational links, but it really throws off the balance right now, and I won’t have content to add to the Pod section right away. Everything from the PR link could be put under the FAQ link. FAQ feels a little old school, but PR is too obscure. Everyone understands what an FAQ is these days. And everyone understands Blog. But even Blog is getting to feel a bit old school. Pod is the way to go, but I can’t right now. I’ll be able to populate the FAQ quite easily using the CMS.

But I will be able to do Pods soon. I’m going to set up a tiny but adequate audio/video production facility. Talk about humanizing a site. I can video-document the birth of a Web 2.0 company, and my becoming a part of the Manhattan scene. I’ll try to produce something that could maybe be picked up by Google Current (Google’s cable TV channel). They call Pods VC^2, for viewer contributed content. I’ll attend the iBreakfasts and more conferences, helping generate buzz for my own site by promoting them. Maybe I’ll pitch the idea to my neighbor, Fred Wilson, who publishes submitted Pod-format elevator pitches on his Union Square Ventures website. I’m not going to make an extensive PodCasting post here, but it does merit mention. Just as developing public speaking skills is necessary for certain types of careers, being able to speak well is turning into an optional, but compelling, part of Web publishing.

Another important aspect is that I’m putting all my content, with the exception of the logo, into plain text. The headline and tagline would definitely look better as graphics, but I need every scrap of SEO power I can muster in constructing this site. As is the overwhelming trend these days, I’ll be using div’s to apply style. But unlike today’s trends, I’ll very deliberately be using tags like p (for paragraph) and b (for bold) to keep the semantics in place. Nothing has set back the Semantic Web like the proliferation of meaningless div id’s, and the stripping out of all conventional document context. HTML tags like h1, p, b, i, blockquote, and many others are still very much worth using, because they are part of the clues you’re leaving search engines about what’s important. Just use div’s to block together elements for stylization.

Span’s are an interesting question, because they are inline. On the one hand, you can avoid them completely by putting id’s on elements such as bold or italics. But then you change the conventional presentation of these tags and risk the search engines’ parsers not knowing what they are at all. A compromise solution is to continue to use bare-bones tags like b or i, and just put a span tag AROUND the conventional HTML tag. It’s a bit of extra code, but it purges all ambiguity. You know with certainty that div’s and span’s will be parsed for attributes. It’s also very likely that a lot of dumb parsers are not expecting parameters on p’s and i’s. So, this combination removes all ambiguity, and forces search engines to accept the meaning that you intend.

May be misinterpreted…

  <b id="stylize-me">stylize me</b>

Withholds information from the Semantic Web…

  <span id="stylize-me">stylize me</span>

A little bit of extra code, but cannot be misinterpreted…

  <span id="stylize-me"><b>stylize me</b></span>

There are a few more facts to consider. If you are willing to make all your “b” tags look alike, you can just create a style that applies to all your bolds. This is how so many blogs change their anchor-text style from a solid underline to a dotted underline. If you’re able to do this, you don’t need the extra span tags, and you don’t need id’s on your bold tags. That’s another best-case scenario. But what I’m considering here is the main homepage of the HitTail site, and main homepages always have a different set of rules. You need to stylize elements on a case-by-case basis without affecting the whole rest of the site.

Now the issue to keep in mind here is the “C” in CSS. C stands for cascading, meaning that how things are nested controls which style wins. Last style wins. Inline elements like span cannot/should not contain block element attributes. So, you can’t use margins and padding on a span element. Use div’s when it’s like a blockquote, and use span’s when it’s like a bold.

Styles are rendered outside-in. That is, the definition of span will override the b tag. This is great for inline text, and really helps with the Semantic Web. But when you’re using the above bling example with a paragraph tag, it doesn’t hold up. Div’s and span’s are only meta-container tags. That is, they only exist to contain other elements, and add some meta-data such as ID’s and class names, and imply nothing about content relevancy or importance. Everything belonging to such a unit belongs INSIDE the container—especially if you’re using the container to move things around, such as you do with div’s. So you see, you can get away with the bold tag outside a span, because it will cascade properly, and you never MOVE a span. You get the semantic value of the bold tag, but the span tag wins in applying style, because it’s working outside-in.

But you can’t do that with div’s, because a bare-bones paragraph tag inside a div tag will override the div’s style with the paragraph’s default style. And you can’t change the default paragraph style without affecting the rest of the site (or page), and you can’t add an id to the p without throwing off a perfect document structure for the Semantic Web. It’s something of a conundrum, and those who can solve it get a sliver of potential SEO advantage. Many things in SEO are not about whether they definitely provide a boost today, but rather about whether they may ever produce a boost someday, and can never be interpreted as bad form or spamming. Often, the solution is to stick to the bare-bones HTML code in order to get the semantic advantage, but to use a second style definition for just that page that overrides the global style. Practices like this may seem over-the-top, but it’s the weakest-link-in-the-chain principle. Much more on that later.

Well, this has been quite a post. I could break it into smaller posts, but it really was part of one unit of thought, so I’ll keep it intact. But that leads to the SEO issue of what to name the post. The title transforms into the title tag, the headline of the permalink page, the words used in anchor text leading to the page, and the filename. This all combines to make it the single most search-influential criterion for the page. If I were going wholly for optimization, I would break this post into many smaller posts, using the opportunity to create more titles, and consequently more sniper-like attempts at Web traffic on those topics. But more on that later!

Short-term Objectives

OK, I’m effectively done with the first of the two spider spotting projects, and I think I’m not going to do the second one today. The first project has given me the structure for stepping through all my log files and extracting what I need for the second project, so there is no urgency. No data is being lost.

Where there is urgency is in getting the HitTail site a little more ready for prime time. Not that it will be a complete app, or even make a lot of sense to people right away. But it can no longer look like a work in progress. Thanks to the popularized “clean” Google main homepage look, it’s quite easy to make a site look finished when it’s not even close.

That’s what I’m going to do this weekend. But I need to guide the precious fleeting focus time left with a plan. Keep in mind the 80/20 principle, because it really has to be applied here. What are some of your objectives?

Create the template files from the CMS system, so you can wrap any of your ASP files in the rest of the site’s look. I do something like Server Side Includes (SSI), but I don’t physically break the master templates into header and footer files. I find it more powerful to keep template files in one piece, and just mark them up with content begin and end tags, as sketched below.
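
As a rough illustration of the one-piece-template idea, here is a short classic ASP sketch. The file name and the begin/end marker strings are placeholders for whatever the CMS actually uses.

  <%
  ' Sketch: wrap a page's output in a single master template using content
  ' begin/end markers, rather than separate header and footer include files.
  Dim fso, ts, template, parts, header, footer, pageContent
  Set fso = Server.CreateObject("Scripting.FileSystemObject")
  Set ts = fso.OpenTextFile(Server.MapPath("/templates/master.html"), 1)  ' 1 = ForReading
  template = ts.ReadAll
  ts.Close

  pageContent = "<p>This page's own content, built up elsewhere.</p>"

  ' Everything before the begin marker is the header; everything after the end marker is the footer.
  parts = Split(template, "<!-- CONTENT BEGIN -->")
  header = parts(0)
  parts = Split(parts(1), "<!-- CONTENT END -->")
  footer = parts(1)

  Response.Write header & pageContent & footer
  %>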

Put the first babystep tutorial onto the HitTail site. You went through all this effort to produce the first one. So, you need to plug it in. Also, add the spider spotter application and cross-link it with the tutorial. I should also think about cross-linking with blog posts. I have sort of a structure going here:

– Thought-work
– Baby-step Tutorial
– The Application

Too many words. Can I abbreviate it?

– Thoughts
– Baby-steps
– The App

They each have a strong, distinct identity. That’s good. I don’t think this is something that will really last once HitTail starts to go mainstream, because it’s a little too tech-geeky. I’m not recommending with HitTail that the average marketing person go through these tutorials. But whenever a marketing person wants to give their group a competitive advantage, I would like to provide him/her with a convenient link to forward to their tech person.

OK, another objective of HitTail is actually to find prospective clients for Connors Communications. As the world is getting more technical, many public relations firms are getting left in the dust. They’ve been able to catch on to blogging in great part because blogging software is just so simple. But that’s not enough. You need to know how to give your clients cutting-edge advice regarding their corporate blogging strategies. Now consider that this site started getting spider visits within days of being created, and a search engine submit never even occurred. Why? Because I planted the blog on the same domain as the main site, transferring Google juice to the “main” corporate site. And the blog search hits are still valuable from a sales standpoint, because the blog is wrapped in the main site’s navigation. You are always presented with the company’s encapsulated message (logo, tagline, etc.), and are one click away from the main homepage.

This is not always the direction you want to go, but how many PR firms can speak to these issues with authority? What’s more important, search engine optimization or having a corporate blog? What is the relationship between the two? Should the blog be on the main corporate site, or its own separate domain and entity? How can search optimization be done without risking future banning? What can we do if we’ve committed to a particular Web technology infrastructure that prevents us from performing search optimization?

So, that secondary objective is generating new prospective client opportunities for Connors, and my goal for today is to have an easily-applied template look, and to activate the sales lead acquisition and management system. Such a system worked like gangbusters for me in the past, because I had a unique and differentiated product. But now, I’m in the PR industry. So, HitTail will be Connors’ unique and differentiated product that you have to sign up to get. But it won’t be done in a week, so I will need some simple explanation of what HitTail is, enough to entice people to volunteer contact info. And there should be two different ways to capture contact data:

1. Email-only, which is just enough to do some sort of follow-up.
2. Full contact info, necessary for more thorough follow-up.

I want to make a very strong value proposition and teaser to get people to sign up. But I also want to start putting the right sort of content here (and on the Connors site) to draw in promising prospective clients. For a little while, the HitTail site is going to be a bit of a playground. I want a way to mention all the various industries that could benefit from using HitTail. I also need to talk about a lot of marketing principles and how they apply in the evolving online landscape.

I should really nail down the main navigational elements, because they’re going to inform, guide and influence the rest of the development of the site. They will also carry an implied version of the site’s value proposition.

– SEO Best Practices

OK, I’m saying that HitTail is a better way. So, why not…

– SEO Good Practices
– SEO Best Practices

Is it an SEO site? Yes, for now. But it will also be a public relations site. And I want to keep the message VERY simple.

– Why Sign Up?
– SEO Best Practices
– PR
– Blog

Then, there are the items beneath the surface of the iceberg (more on that philosophy later). Those include…

– Emerging markets, industries and technologies
– The Baby-step tutorials
– Marketing principles, traditional and new
– Geek issues, like watching spider activities

So, the to-do list reads like this…

1. Adjust the navigational links.
2. Create the template pages.
3. Put place-holder pages in for each link.
4. Turn the main homepage into an email address collector.
5. Make the email response page offer to start sales lead process.

Issues to keep in mind
– I’m going to want to plug in the first babystep tutorial.
– I want the main homepage to feature the latest thing: tutorial, blog post, etc.
– I need to make it compatible with lead management on the Connors side.

Caffeine is My Drug of Choice

OK, let’s get the first little nested project out of the way. Find a post that has babystep code when the immediately previous post doesn’t, but where there is more babystep code a little further back in the discussion. It’s a recursive app, going back in time, feeding the most recently considered post ID as a parameter, plus the master message ID. The function looks at the immediately prior post in that same discussion to see whether it is a babystep post. If it finds a match, it returns that post’s ID. If it doesn’t find one, it calls itself. This relies on the newly found ID bubbling up through the recursion. But of course, I don’t trust that in VBScript, so I’m going to use a global variable. The recursion automatically ends when it reaches the master ID, which is the first post in the discussion.
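
Here is roughly the shape I have in mind, sketched in VBScript. GetPreviousPostID and HasBabystepCode are hypothetical stand-ins for the real data-access calls, and g_FoundID is the global variable I mentioned using instead of trusting the result to bubble up through the recursion.

  ' Sketch of the recursive lookup; the two helper functions are hypothetical.
  Dim g_FoundID
  g_FoundID = 0   ' stays 0 if no earlier babystep post exists

  Sub FindPreviousBabystep(currentID, masterID)
      Dim priorID
      priorID = GetPreviousPostID(currentID)      ' immediately prior post in the same discussion
      If HasBabystepCode(priorID) Then
          g_FoundID = priorID                     ' found it; record it in the global
      ElseIf priorID <> masterID Then
          FindPreviousBabystep priorID, masterID  ' keep walking back through the thread
      End If
      ' Recursion ends on its own once the master (first) post has been checked.
  End Sub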

That project is out of the way. It’s 12:30 on a Saturday night in Manhattan, and I’m just getting underway with a programming project. Sad. But that’s my choice. It’s only with this sort of mad dedication that truly inspired projects come to fruition. I’ve had too much time feeling like I was just spinning my wheels, not getting anywhere. It’s time now for that drive that gets wasted on term papers in college. I often think how much greater the world would be if the youthful energy that gets dumped into diplomas to hang on the wall, and stupid rites of passage, actually got funneled into entrepreneurial projects with a positive social impact. The world would be a much better place. Anyway, to build and keep the momentum for the spider-spotting project, I need caffeine. Time to run out.

This site is called HitTail, because it is going to focus on the long tail of search, and ways to tap into the power of unpaid search without resorting to shadowy practices. But I’m thinking I may also want to call it MyFullLifecycle, in how it’s addressing two full lifecycles. First, the birth of the site itself: from the creative parts, to the first spider visits, to the first search hits, to the first user feedback, to the first user of the service, to the de-geekifying of the site once it starts to catch on, to the site’s rise to popularity. But it also will be very concerned with the lifecycle of the customer: from getting into their head to know what type of searches they’re going to perform, to finding HitTail, to eventually providing contact info, to signing up for the service, to productively using the service, to measuring this user as a win or a loss based on them getting the next person in (more on that later). But you can see, I’m thinking in depth about both the lifecycle of the site, and the lifecycle of customers using the site.

OK, the re-engagement process is important for maintaining focus. I went to grab a bite to eat and pick up some caffeine. When I got back, I immediately wanted to plop in front of the TV and veg. I see that I am in constant need of stimulation. TV provides it way too easily. I’ve got to switch to radio and music, so I can keep doing it even while I’m working. But I’ve never much been one for music. Nothing ever pulled me in enough to really make a fan of me. You can count on one hand the number of CDs I’ve bought. And the things I like are usually so offbeat that they don’t even constitute a genre. So, I’m using Pandora to find more music I might like based on the handful of things I really enjoy. But the Animaniacs and Eric Idle haven’t made it into the Music Genome Project. I really like novelty music. My best luck so far has come from putting in the seed song “The Lime in the Coconut”. It describes the station as mild rhythmic syncopation, heavy use of vocal harmonies, acoustic sonority, extensive vamping and paired vocal harmony. It chose “Caffeine” by Toxic Audio, which I enjoyed and found appropriate, so I guess it works.

OK, let’s really get started with spider spotter project #1. It’s 1:20AM. It seems like I piddled away hours since I started, but not really. I actually made most of the design decisions in my head. I can jump into this thing head-first. I’m really excited about creating my first publicly consumable baby-step tutorial. This is one that will actually be of great use to some people.

MSWC.IISLog or the TextStream Object to Parse Logfiles

OK, the first step in the first spider spotter project is choosing which technology to use to open and manipulate log files. There are basically two choices: the TextStream object, and the MSWC.IISLog object. Both would be perfectly capable, but they bring up different issues. The power of manipulating the log files as raw text comes in using regular expression matching (RegEx). But doing RegEx manipulation directly within Active Server Pages requires dumping the contents of the log file into memory and running RegEx on the object in memory. And log files can grow to be VERY large. One way to control how much goes into memory is to encase the ReadLine method of the TextStream object in logic to essentially create a first-pass filter. So, if you were looking for GoogleBot, you could pull in only the lines of the logfile that  mention GoogleBot. Then, you could use RegEx to further filter the results.
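
A minimal sketch of that first-pass-filter idea, written as a WSH/VBScript script with a made-up log file path: only lines that mention Googlebot ever reach the RegEx.

  ' Sketch: read the raw IIS log line by line with the TextStream object,
  ' using a cheap InStr check before doing any RegEx work. Path is illustrative.
  Dim fso, ts, line, re
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set ts = fso.OpenTextFile("C:\LogFiles\W3SVC1\ex060115.log", 1)  ' 1 = ForReading

  Set re = New RegExp
  re.Pattern = "GET\s+(\S+).*Googlebot"   ' rough pattern for Googlebot page requests
  re.IgnoreCase = True

  Do While Not ts.AtEndOfStream
      line = ts.ReadLine
      If InStr(1, line, "Googlebot", vbTextCompare) > 0 Then   ' first-pass filter
          If re.Test(line) Then WScript.Echo line
      End If
  Loop
  ts.Close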

The other approach is to use MSWC.IISLog. I learned about this from the O’Reilly ASP book. It essentially parses the log file into fields. And I’m sure it takes care of a lot of the memory issues that come up if you try using the TextStream object. One problem is that it’s really a Windows 2000 Server technology, and I don’t even know if it’s in Server 2003. It uses a dll called logscrpt.dll. So, first, to see if it’s still even included, I’m going to go search for that on a 2003 server. OK, found in the inetsrv directory. So, it’s still a choice. The next thing is to really think about the objectives of this app. It’s going to have a clever aspect to it, so the more you use it, the less demanding it is on memory. And I’ll probably create a dual ASP/Windows Scripting Host (WSH) existence for this program. One will be real-time on page-loads. And the other will be for scheduled daily processing.

Even though it’s really not worth pulling the entire logfile into a SQL database, it probably is worth pulling in the entire spider history. Even a popular site only gets a few thousand hits per day from GoogleBot, and from a SQL table perspective, that’s nothing. So, why write an app that loads the log files directly? It’s the enormous real-time nature of the thing, and the fact that you’ll usually be looking at the same day’s logfiles for up-to-the-second information. So, the first criterion for the project is to work as if it were just wired to the daily log files. But lurking in the background will be a task that, after the day’s log file has cycled, will spin through it, moving information like GoogleBot visits into a SQL table. It will use the time and IP (or UserAgent) as the primary key, so it will never record the same event twice. You could even run it over and over without doing any damage, except maybe littering your SQL logs with primary key violation error messages.
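
Here is the kind of thing I have in mind for the nightly piece, sketched as WSH/VBScript with ADO. The connection string, table and column names are placeholder guesses; the point is that the primary key on visit time plus IP makes re-runs harmless.

  ' Sketch of the nightly archiving step. Because (VisitTime, ClientIP) is the
  ' primary key, inserting a visit we've already recorded just fails quietly.
  Dim conn
  Set conn = CreateObject("ADODB.Connection")
  conn.Open "Provider=SQLOLEDB;Data Source=(local);Initial Catalog=HitTail;Integrated Security=SSPI;"

  Sub ArchiveSpiderVisit(visitTime, clientIP, userAgent, uriStem)
      On Error Resume Next   ' a primary key violation just means we've seen this visit before
      conn.Execute "INSERT INTO SpiderVisits (VisitTime, ClientIP, UserAgent, UriStem) " & _
          "VALUES ('" & visitTime & "', '" & clientIP & "', '" & _
          Replace(userAgent, "'", "''") & "', '" & Replace(uriStem, "'", "''") & "')"
      On Error GoTo 0
  End Sub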

MSWC.IISLog has another advantage. Because it automatically parses the log file into fields, I will be able to hide the IP addresses on the public-facing version of this app if I deem it necessary. Generally, it will only be showing GoogleBot and Yahoo Slurp visits, but you never know. I’d like the quick ability to turn off the display of the IP field, so I don’t violate anyone’s privacy by accidentally giving out their IP addresses. OK, it sounds like I’ve made my decision. I don’t really need the power of RegEx for spotting spiders. IISLog has a ReadFilter method, but it only takes a start and end time. It doesn’t let you filter based on field contents. OK, I can do that manually, even with RegEx at this point. If it matches a pattern on a line-by-line basis, then show it. Something else may be quicker, though.
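
For the record, here is roughly how I expect the MSWC.IISLog version of the public page to read. The method and property names are from memory of the component’s documentation, so treat the exact OpenLogFile signature as an assumption, and the log file name is made up.

  <%
  ' Sketch: step through today's log record by record and show only spider visits.
  ' The showIP flag lets me suppress the ClientIP column on the public-facing page.
  Dim iisLog, ua, showIP
  showIP = False
  Set iisLog = Server.CreateObject("MSWC.IISLog")
  iisLog.OpenLogFile "ex060115.log", 1, "W3SVC", 1, 0   ' 1 = ForReading (assumed signature)

  Do While Not iisLog.AtEndOfLog
      iisLog.ReadLogRecord
      ua = iisLog.UserAgent
      If InStr(1, ua, "Googlebot", vbTextCompare) > 0 Or InStr(1, ua, "Slurp", vbTextCompare) > 0 Then
          Response.Write iisLog.DateTime & " " & iisLog.URIStem
          If showIP Then Response.Write " " & iisLog.ClientIP
          Response.Write " " & ua & "<br>"
      End If
  Loop
  iisLog.CloseLogFiles 1
  %>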

OK, it’s decided. This first spider spotter app will use MSWC.IISLog. I’m also going to do this entire project tonight (yes, I’m starting at 11:00PM). But it doesn’t have nearly the issues of the marker-upper project. And it is a perfect time to use the baby-step markup system. I do see one issue.

There are two nested sub-projects lurking that are going to tempt me. The first is a way to make the baby-step markup able to get the previous babystep code post no matter how far back it occurred in the discussion. That’s probably a recursive little bit of code. I think I’m going to get that out of the way right away. It won’t be too difficult, and will make the tutorial-making process even more natural. I don’t want to force babystep code into every post. If I want to stop and think about something, post it, and move on, I want to feel free to do that.

The other nested project is actually putting the tutorial out on the site. I’ve got an internal blogging system where I actually make the tutorials. But deciding which ones to put out, how, and onto what sites is something that happens in the content management system. Yes, the CMS can assemble Web content for sites, pulling it out of blogging systems. In short, the CMS can take XML feeds from any source, map them into the CMS’s own data structure, apply the site’s style sheet, and move the content out to the website. But the steps to do this are a little convoluted, and I have the itch to simplify them. But I’ll avoid this nested sub-project. It’s full of other sub-projects of its own.