The Pen Salesman

July 17 2011 // Marketing + Web Design // 5 Comments

If you work with me for any amount of time you’ll likely hear some of my stories and analogies. One of my favorites is an old direct marketing story passed down to me when I was just getting started.

The Pen Salesman

[Image: pen from the pen salesman story]

There once was a pen salesman who had two types of pens. One was a very nice but basic model and the other was a fancier, more expensive, high-end model.

The pen salesman was doing a pretty brisk business but he had a problem. He wasn’t selling enough of the high-end model. This was troubling because the margin on his high-end pen was … higher. People seemed to like the high-end model but, by and large, most wound up buying the basic model instead.

So what did the pen salesman do?

He decided to create a new premium pen. It would be even fancier and more expensive than his high-end pen. Now the pen salesman had a selection of three pens from which to choose. The secret was that the pen salesman didn’t really want to sell the premium pen! In fact, he wasn’t even really stocking them. But a funny thing happened, customers began to select the high-end (now the middle) model in droves.

When presented with three choices (good, better and best), the middle pen suddenly became far more attractive and looked like a better value. Had the pen changed? No. But the context in which it was presented did, and that made the difference.

That doesn’t mean you can go on forever adding more and more models to your product line and expect similar results. No, I can also talk your ear off about The Paradox of Choice by Barry Schwartz, some of which is based on work by Sheena S. Iyengar, author of When Choice is Demotivating (PDF).

In short, consumer behavior is fascinating and powerful.

Internet Marketing Maxima

[Image: cat trapped in invisible box]

I sometimes wonder if we as Internet marketers are using these old school techniques and stories when implementing our campaigns. The ability to conduct A/B and multi-variate tests has soared but the root of most successful campaigns is in understanding context and consumer behavior. Don’t get me wrong, I love numbers and am all about data-driven decision making. But not in isolation.

I worry that the technology we rely upon creates local maxima issues, which is a highfalutin way of saying that we constrain ourselves to the best of a limited set of outcomes instead of seeking a new (and better) solution altogether. Harry Brignull of 90% of Everything and Joshua Porter of 52 Weeks of UX explain this far better than I could, so go off and do some reading and then come back to finish.

The pen salesman could have tried different colors (of pen or ink), or a different pitch, or added features or cut prices or offered a gift box with purchase or any number of other typical marketing techniques to help increase sales of his high-end pen. But it’s unlikely any of them would have achieved the monumental shift in sales he saw by introducing that premium pen.

So I hold on to the story of the pen salesman as a way to remind me to think (really think) about context and consumer behavior.

Panda and Big Data

July 15 2011 // SEO // 2 Comments

I’ve been thinking a lot about Google’s Panda update. Who in the SEO community hasn’t, right? But there’s one thing in particular that continues to bug me.

Why is Panda applied at the site level?

A site wide quality metric seems very un-Googly. It was one of my major complaints when Panda (then called Farmer) was rolled out. It treats lousy content the same as great content. This seems to run contrary to Google’s mission to return the best and most relevant search results.

You might argue that the great content will continue to rank well based on other signals, but there’s little doubt that it will be negatively impacted. And the content that now outranks it may not be better at the page level.

Panda Mechanics

At this point I believe we have a fairly good idea of how Panda is applied. Bill Slawski (here and here), Danny Sullivan and Eric Enge have all provided great insight into how Panda might have been constructed and implemented.

In general, the conclusion seems to be that Panda acts as a type of quality filter that sits on top of the algorithm, placing a penalty of sorts on those sites it deems to be unworthy.

My Panda Theory

It seems probable that Panda is a document based classifier that evaluates and scores the quality of a page. But why not integrate that page score as a true signal so lousy content would be demoted and quality content would rise? Let each piece of content compete on its own merits.

Could it be that the confidence interval for the Panda classifier isn’t high enough on any single document?

But if you sample enough pages from a site, the Panda classifier might reach an acceptable confidence level, allowing Google to pass judgment on the site as a whole, though not on an individual URL basis.

[Image: panda document scores]

If the Panda score is on a scale of 100, you might wind up with something like this. So while document three scores very well with a 94, the rest of the documents on the site drag the aggregate Panda score down.

This could explain why removing certain thin content pages might impact your Panda status, since the aggregate score might rise substantially.
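This aggregation idea is easy to sketch. The Python below is purely illustrative: the scores, the minimum sample size and the quality cutoff are invented numbers for the sake of the theory, not anything Google has published.

```python
# Hypothetical sketch of a site-level Panda score: average the
# per-document classifier scores, but only trust the aggregate once
# enough pages have been sampled. All thresholds are invented.

MIN_SAMPLE = 4       # assumed minimum pages before the aggregate is trusted
QUALITY_CUTOFF = 60  # assumed score below which a site-wide penalty applies

def site_panda_score(doc_scores):
    """Return (aggregate_score, is_valid) for a site's sampled pages."""
    if len(doc_scores) < MIN_SAMPLE:
        # Too small a content corpus: no site-wide judgment
        # (the "small sites escaped Panda" case).
        return None, False
    return sum(doc_scores) / len(doc_scores), True

site = [94, 88, 25, 20, 72, 18, 30, 65]
score, valid = site_panda_score(site)
print(score, score < QUALITY_CUTOFF)   # 51.5 True -- penalized despite the 94

# Pruning the thin pages lifts the aggregate above the cutoff.
pruned = [s for s in site if s >= 40]
score, valid = site_panda_score(pruned)
print(score, score < QUALITY_CUTOFF)   # 79.75 False -- penalty lifted
```

Under this toy model, one 94-point page can’t rescue a site full of thin content, but removing the thin pages changes the site’s fate, which is consistent with both the thin-content and small-corpus observations above.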

It could also explain why many saw sites with a small content corpus escape the wrath of Panda, since the lack of a viable sample made the aggregate Panda score invalid.

Panda and Big Data

Part of this theory draws from Mike Cohen’s presentation at the Inside Search event. He stated that the way in which Google was improving the accuracy of voice search was through “massive amounts of data.”

And if you read In The Plex (which you should) you also come away with the feeling that it is sometimes less about tweaking the algorithm and more about feeding that algorithm more data. Machine learning requires big data.

Danny Sullivan reported from SMX Advanced that “the Panda filter isn’t running all the time. Right now, it’s too much computing power to be running this particular analysis of pages.”

Again, this seems to indicate that Panda isn’t part of the normal evaluation process and that it requires a substantial effort to recompute, even by Google’s standards.

Constraint or On Purpose?

[Image: Pee Wee Herman “I meant to do that”]

Perhaps the iterative nature of the Panda updates will result in a more accurate document classifier that could be applied on the document level. Or maybe Google simply believes that a site’s overall content corpus should have an impact on all of the content on that site.

Maybe Google could apply Panda on the document level but instead believes site level application is more expedient.

There’s a small bit of logic there. If I buy products from a store and have them break again and again, I might not want to patronize that store even if a few of their other products were well-crafted and solid. So, the store (aka site) develops a reputation and the once bitten, twice shy adage kicks in. This dovetails nicely into the idea that your brand equity can have an impact on perceived relevance.

Of course, none of this may be even remotely true because I’m not a data scientist or statistician. I’m just a guy who reads a lot, experiments and enjoys uncovering patterns. And it doesn’t change the facts, nor how to get out of Panda Jail.

What do you think? Is Panda’s site level application a product of constraint or done by design?

Google+ Review

July 07 2011 // Social Media + Technology // 19 Comments

(This post is an experiment of sorts since I’m publishing it before my usual hard core editing. I’ll be going back later to edit and reorganize so that it’s a bit less Jack Kerouac in style. I wanted to publish this version now so I could get some feedback and get back to my client work. You’ve been warned.)

I’ve been on Google+ for one week now and have collected some thoughts on the service. This won’t be a tips and tricks style post since I believe G+ (that’s the cool way to reference it now) will evolve quickly and what we’re currently seeing is a minimum viable product (MVP).

In fact, while I have enjoyed the responsiveness that the G+ team has shown, it echoes what I heard during Buzz. One of my complaints about Buzz was that they didn’t iterate fast enough. So G+, please go ahead and break things in the name of speed. Ignore the howling in the interim.

Circles

Circles is clearly the big selling point for G+. I was a big fan of the presentation Paul Adams put together last year that clearly serves as the foundation to Circles. The core concept was that the way you share offline should be mirrored online. My family and high school friends probably don’t want to be overwhelmed with all the SEO related content I share. And if you want to share a personal or intimate update, you might want to only share that with family or friends.

It made perfect sense … in theory.

I’m not sure Circles works in practice, or at least not the way many thought it would. The flexibility of Circles could be its Achilles heel. I have watched people create a massive ordered list of Circles for every discrete set of people. Conversely, I’ve seen others just lump everyone into a big Circle. Those in the latter camp seem unsettled, thinking that they’re doing something wrong by not creating more Circles.

Of course there is no right or wrong way to use Circles.

But I believe there are two forces at work here that influence the value of Circles. First is the idea of configuration. I don’t think many people want to invest time into building Circles. These Circles are essentially lists, which have been tried on both Facebook and Twitter. Yet both of these social giants have relegated lists to the margins of their user interfaces. Was this because people didn’t set them up? Or that once they set them up they didn’t use them?

I sense that Facebook and Twitter may have realized that the stated need for lists or Circles simply didn’t show up in real life usage. This is one of those problems with qualitative research. Sometimes people say one thing and do another.

As an aside, I think most people would say that more is better. That’s why lists sound so attractive. Suddenly you can really organize and you’ll have all these lists and you’ll feel … better. But there is compelling research that shows that more choice leads to less satisfaction. Barry Schwartz dubbed it The Paradox of Choice.

The Paradox of Choice has been demonstrated with jam, where sales were higher when consumers had three choices instead of thirty. It’s also been shown with 401(k) participation: the more mutual fund choices available, the lower the participation in the 401(k) program.

Overwhelmed with options, we often simply opt-out of the decision and walk away. And even when we do decide, we are often less satisfied since we’re unsure we’ve made the right selection. Those who scramble to create a lot of lists could fall prey to the Paradox of Choice. That’s not the type of user experience you want.

The second thing at work here is the notion that people want to share online as they do offline. Is that a valid assumption? Clearly, if you’re into cycling (like I am) you probably only want to share your Tour de France thoughts with other cyclists. But the sharing dynamic may have changed. I wrote before that Google has a Heisenberg problem in relation to measuring the link graph. That by the act of measuring the link graph they have forever changed it.

I think we may have the same problem in relation to online sharing. By sharing online we’ve forever changed the way we share.

If I interpret what FriendFeed (which is the DNA for everything you’re seeing right now), and particularly Paul Buchheit envisioned, it was that people should share more openly. That by sharing more, you could shine light on the dark corners of life. People could stop feeling like they were strange, alone or embarrassed. Facebook too seems to have this same ethos, though perhaps for different reasons – or not. And I think many of us have adopted this new way of sharing. Whether it was done intentionally at first or not becomes moot.

So G+ is, in some ways, rooted in the past, of the way we used to share.

Even if you don’t believe that people are now more willing to share more broadly, I think there are a great many differences in how we share offline versus how we share online. First, the type and availability of content is far greater online. Tumblr quotes, LOLcats, photos and a host of other types of media are quickly disseminated. The Internet has seen an explosion of digital content that runs through a newly built social infrastructure. In the past, you might share some of the things you’d seen recently at a BBQ or the next time you saw your book group. Not anymore.

Also, the benchmark for sharing content online is far lower than it is offline. The ease with which you can share online means you share more. The share buttons are everywhere and social proof is a powerful mechanism.

You also can’t touch and feel any of this stuff. For instance, think about the traditional way you sell offline. The goal is to get the customer to hold the product, because that greatly increases the odds they’ll purchase. But that’s an impossibility online.

Finally, you probably share with more people. The social infrastructure built over the last five years has allowed us to reconnect with people from the past. We continue to share with weak ties. I’m concerned about this since I believe holding onto the past may prevent us from growing. I’m a firm believer in Dunbar’s number, so the extra people we choose to share with wind up being noise. Social entropy must be allowed to take place.

Now Circles might support that since you can drop people into a ‘people I don’t care about’ Circle that is never used. (I don’t have this Circle, I’m just saying you could!) But then you simply wind up with a couple of Circles that you use on a frequent basis. In addition, the asynchronous model encourages people to connect with more people which flies in the face of this hardwired number of social connections we can maintain.

Lists and Circles also rarely work for digesting content. Circles is clearly a nice way to segment and share your content with the ‘right’ people. But I don’t think Circles are very good as a content viewing device.

You might make a Circle for your family. Makes perfect sense. And you might then share important and potentially sensitive information using this Circle. But when you look at the content feed from that Circle, what do you get? It would not just be sensitive family information.

If your brother is Robert Scoble you’d see a boatload of stuff there. That’s an extreme example, but let’s bring it to the more mundane example of, say, someone who is a diehard sports fan. Maybe that family member would share only with his sports buddies, but a lot of folks are just going to broadcast publicly and so you get everything from that person.

To put it more bluntly, people are not one-dimensional.

I love bicycling. I also have a passion for search and SEO. I also enjoy books, UX, LOLcats and am a huge Kasabian fan. If you put me in an SEO Circle, there’s a good chance you’ll get LOLcats and Kasabian lyrics mixed in with my SEO stuff. In fact, most of my stuff is on Public, so you’ll get a fire hose of my material right now.

Circles is good for providing a more relevant sharing mechanism, but I think it’s a bit of a square peg in a round hole when it comes to digesting content. That’s further exacerbated by the fact that the filtering capabilities for content are essentially on and off (mute) right now.

Sure, you could segment your Circles ever more finely until you found the people talking about just the topic you were interested in. But that would probably be a small group, and if you have more than one interest (which is, well, pretty much everyone) you’ll need lots of Circles. And with lots of Circles you run into the Paradox of Choice.

Conversation

I’ve never been a fan of using Twitter to hold conversations. The clipped and asynchronous style of banter just doesn’t do it for me. FriendFeed was (is?) the place where you could hold real debate and discussion. It provided long-form commenting ability.

G+ does a good job fostering conversation, but the content currently being shared and some of the feature limitations may be crushing long-form discussions and instead encouraging ‘reactions’.

I don’t want a stream of six-word YouTube-style comments. That doesn’t add value. I’m purposefully using this terminology because I think delivering value is important to Google. Comments should add value and there is a difference in comment quality. And yes, you can influence the quality of comments.

Because if the comments and discussion are engaging you will win my attention. And that is what I believe is most important in the social arms race we’re about to witness.

Attention

There is a war for your attention and Facebook has been winning. G+ must fracture that attention before Facebook really begins to leverage the Open Graph and provide search and discovery features. As it stands Facebook is a search engine. The News Feed is simply a passive search experience based on your social connections and preferences. Google’s talked a lot about being psychic and knowing what you want before you do. Facebook is well on their way there in some ways.

User Interface

If there’s one thing Google got right, it’s the Red Number user interface. It is by far the most impressive part of the experience, feeding your G+ addiction and retaining your attention.

The Red Number sits at the top of the page on G+, Google Reader, Google Search and various other Google products. It is nearly omnipresent in my own existence. (Thank goodness it’s not on Google Analytics or I really wouldn’t get any work done.) The red number indicator is a notifier, navigation and engagement feature all in one. It is epic.

It is almost scary though, since you can’t help but want to check what’s going on when that number lights up and begins to increment. It’s Pavlovian in nature. It inspired me to put together a quick LOLcat mashup.

[Image: OMG WTF Red Number LOLcat]

It draws you in (again and again) and keeps you engaged. It’s a very slick user interface and Google is smart to integrate this across as many properties as possible. This one user interface may be the way that G+ wins in the long-run since they’ll have time to work out the kinks while training us to respond to that red number. The only way it fails is if that red number never lights up.

I’ll give G+ credit for reducing a lot of the friction around posting and commenting. The interactions are intuitive but are hamstrung by Circles as well as the display and ordering of content.

Content

There is no easy way to add content to G+ right now. In my opinion, this is hugely important because content is the kindling for conversation. Good content begets good conversation. Sure, we could all resort to creating content on G+ through posting directly, but that’s going to get old quickly. And Sparks, as it now stands, is not effective in the slightest. Sorry, but this is one feature that seems half-done (and that’s being generous). Right now the content through Sparks is akin to a very unfocused Google Alert.

I may be in the minority in thinking that social interactions happen around content, topics and ideas far more often than they do around people. I might interact with people I’m close to on a more personal level, responding to check-ins and status updates but for the most part I believe it’s about the content we’re all seeing and sharing.

I really don’t care if you updated your profile photo. (Again, I should be able to not see these by default if I don’t want to.)

Good content will drive conversation and engagement. The easiest way to effect that is by aggregating the streams of content we already produce. This blog, my YouTube favorites, my Delicious bookmarks, my Google Reader favorites, my Last.fm favorites and on and on and on. Yes, this is exactly what FriendFeed did and it has, in many ways, failed. As much as I love the service, it never caught on with the mainstream.

I think some of this had to do with configuration. You had to configure the content streams and those content streams didn’t necessarily have to be yours. But we’ve moved on quite a bit since FriendFeed was introduced and Google is adhering to the Quora model, and requiring people to use their real names on their profiles.

Google is seeking to create a better form of identity, a unified identity it can then leverage for a type of PeopleRank signal that can inform trust and authority in search and elsewhere. But identity on the web is fairly transparent, as we all have learned from Rapleaf and others who still map social profiles across the web. Google could quite easily find those outposts and prompt you to confirm and add them to your Google profile.

Again, we’ve all become far more public and even if email is not the primary key, the name and even username can be used with a fairly high degree of confidence. Long story short, Google can short-circuit the configuration problem around content feeds and greatly reduce the friction of contributing valuable content to G+.

By flowing content into G+, you would also increase the odds of that red number lighting up. So even if I haven’t visited G+ in a day (heck I can’t go an hour right now unless I’m sleeping) you might get drawn back in because someone gave your Last.fm favorite a +1. Suddenly you want to know who likes the same type of music you do and you’re hooked again.

Display

What we’re talking about here is aggregation, which has turned into a type of dirty word lately. And right now Google isn’t prepared for these types of content feeds. They haven’t fixed duplication detection so I see the same posts over and over again. And there are some other factors in play here that I think need to be fixed prior to bringing in more content.

People don’t quite understand Circles and seem compelled to share content with their own Circles. The +1 button should really do this, but then you might have to make the +1 button conditional based on your Circles (e.g. – I want to +1 this bicycling post to my TDF Circle.) That level of complexity isn’t going to work.

At a minimum they’ll need to collapse all of the shares into one ‘story’, with the dominant story being the one that you’ve interacted with or, barring prior interaction, the one that comes from someone in your Circle and if there are more than one from your Circle then the most recent or first from that group.
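That preference order maps naturally onto a sort key. The Python below is a speculative sketch of the collapsing rule just described; the field names and data are invented for illustration.

```python
# Speculative sketch of picking the "dominant" share when collapsing
# duplicate posts into one story. Preference order: a share I've
# interacted with, then one from someone in my Circles, then the
# most recent. Field names are invented.

def dominant_share(shares, my_circles):
    """Pick the share to display as the face of the collapsed story."""
    def rank(share):
        return (
            share["interacted"],            # True sorts above False
            share["author"] in my_circles,  # circle members beat strangers
            share["timestamp"],             # newest wins remaining ties
        )
    return max(shares, key=rank)

shares = [
    {"author": "stranger", "interacted": False, "timestamp": 300},
    {"author": "friend_a", "interacted": False, "timestamp": 100},
    {"author": "friend_b", "interacted": False, "timestamp": 200},
]
print(dominant_share(shares, {"friend_a", "friend_b"})["author"])  # friend_b
```

Here the stranger’s share is the newest, but the most recent share from someone in my Circles still wins, and any share I had already interacted with would trump both.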

In addition, while the red number interface does deliver the active discussions to me, I think the order of content in the feed will need to change. Once I interact on an item it should be given more weight and float to the top more often, particularly if someone I have in my Circles is contributing to the discussion there.

Long-term it would also be nice to pin certain posts to the top of a feed if I’m interested in following the actual conversation as it unfolds.

The display of content needs to get better before G+ can confidently aggregate more content sources.

Privacy

One of the big issues, purportedly, is privacy. I frankly believe that the privacy issue is way overblown. (Throw your stones now.) As an old school direct marketer I know I can get a tremendous amount of information about a person, all from their offline transactions and interactions.

Even without that knowledge, it’s clear that people might talk about privacy but they don’t do much about it. If people truly valued privacy and thought Facebook was violating that privacy you’d see people shuttering their accounts. And not just the few Internati out there who do so to prove a point but everyday people. But that’s just not happening.

People say one thing, but do another. They say they value privacy but then they’ll give it away for a chance to win the new car sitting in the local mall.

Also, it’s very clear that people do have a filter for what they share on social networks. The incidents where this doesn’t happen make great headlines, but the behavioral survey work showing a hesitance to share certain topics on Facebook make it clear we’re not in full broadcast mode.

But for the moment let’s say that privacy is one of the selling points of G+. The problem is that the asymmetric sharing model exposes a lot more than you might think. Early on, I quipped that the best use of G+ was to stalk Google employees. I think a few people took this the wrong way, and I understand that.

But my point was that it was very easy to find people on G+. In fact, it is amazingly simple to skim the social graph. In particular, by looking at who someone has in their Circles and who has that person in their Circles.

So, why wouldn’t I be interested in following folks at Google? In general, they’re a very intelligent, helpful and amiable bunch. My Google circle grew. It grew to 300 rather quickly by simply skimming the Circles for some prominent Googlers.

Over the next few days I did this every once in a while. I didn’t really put that much effort into it. The interface for finding and adding people is quite good – very fluid. So, I got to about 700 in three or four days. And during that time the suggested users feature began to help out, providing me with a never-ending string of Googlers to add.

But you know what else happened? It suggested people who were clearly Googlers but were not broadcasting that fact. How do I know that? Well, if 80% of your Circle are Googlers, and 80% of the people who have you in Circles are Googlers, there’s a good chance you’re a Googler. Being a bit OCD I didn’t automatically add these folks to my Google Circle, but their social graph led me to others (bonus!) and if I could verify through other means – their posts or activity elsewhere on the Internet – then I’d add them.

How many people do I have in my Google circle today?

[Image: Google Employees G+ Circle count]

Now, perhaps people are okay with this. In fact, I’m okay with it. But if privacy is a G+ benefit, I don’t think it succeeds. Too many people will be upset by this level of transparency. Does the very private Google really want someone to be parsing the daily output of its employees? I’m harmless but others might be trolling for something more.

G+ creates this friction because of the asymmetric sharing model and the notion that you only have to share with the people in your circles. Circles ensures your content is compartmentalized and safe. But it exposes your social graph in a way that people might not expect or want.

Yes, I know there are ways to manage this exposure, but configuration of your privacy isn’t very effective. Haven’t we learned this yet?

Simplicity

Circles also has an issue with simplicity. Creating Circles is very straightforward, but how content in those Circles is transmitted is a bit of a mystery to many. So much so that there are diagrams showing how and who will see your content based on the Circle permutations. While people might make diagrams just for the fun of it, I think these diagrams are an indication that the underlying information architecture might be too complex for mainstream users. Or maybe they won’t care. But if sharing with the ‘right’ people is the main selling point, this will muddy the waters.

At present there are a lot of early adopters on G+ and many are hell bent on kissing up to the Google team at every turn. Don’t get me wrong, I am rooting for G+. I like Google and the people that work there and I’ve never been a Facebook fan. But my marketing background kicks in hard. I know I’m not the target market. In fact, most of the people I know aren’t the target market. I wonder if G+ really understands this or not.

Because while my feed was filled with people laughing at Mark Zuckerberg and his ‘awesome’ announcement, I think they missed something, something very fundamental.

Keep it Simple Stupid

Yes, hangouts (video chat) with 10 people are interesting and sort of fun. But is that the primary use case for video chat? No, it’s not. This idea that 1 to 1 video chat is so dreadful and small-minded is simply misguided. Because what Facebook said was that they worked on making that video chat experience super easy to use. It’s not about the Internati using video chat, it’s about your grandparents using video chat.

Mark deftly avoided the G+ question but then, he couldn’t help himself. He brought up the background behind Groups. I’m paraphrasing here, but Zuckerberg essentially said that Groups flourished because everyone knew each other (that’s an eye poke at the asymmetric sharing model) and that ad hoc Groups were vitally important since people didn’t want to spend time configuring lists. Again, this is – in my opinion – a swipe at Circles. In many ways, Zuck is saying that lists fail and that content sharing permissions are done on an ad hoc basis.

Instead of asking people to configure Circles and manage and maintain them Facebook is making it easier to just assemble them on the fly through Groups. And the EdgeRank algorithm that separates your Top News from Recent News is their way of delivering the right content to you based on your preferences and interactions. I believe their goal is to automagically make the feed relevant to you instead of forcing the user to create that relevance.

Sure there’s a filter bubble argument to be made, but give Facebook credit for having the Recent News tab prominently displayed in the interface.

But G+ could do something like this. In fact, they’re better placed than Facebook to deliver a feed of relevant information based on the tie-ins to other products. Right now there is essentially no tie in at all, which is frustrating. A +1 on a website does not behave as a Like. It does not send that page or site to my Public G+ feed. Nor does Google seem to be using Google Reader or Gmail as ways to determine what might be more interesting to me and who really I’m interacting with.

G+

I’m addicted to G+ so they’re doing something right. But remember, I’m not the target market.

I see a lot of potential with G+ (and I desperately want it to succeed) but I worry that social might not be in their DNA, that they might be chasing a mirage that others have already dismissed and that they might be too analytical for their own good.

How To Implement Rel=Author

July 01 2011 // SEO // 322 Comments

Overshadowed by the Google+ launch was the implementation of the rel=author markup in search results. Once implemented, authors are given a very prominent treatment on search results.

[Image: Google search results with rel=author treatment]

It doesn’t reorder the results (yet) but it certainly highlights that result and likely drives a much higher click through rate. I was already interested in rel=author, but this was enough to get me off the proverbial couch and try it out myself.

Unfortunately the authorship directions provided by Google, while probably comprehensive, are confusing.

Thankfully, Louis Gray got me into Google+ and it was there that I put out the bat signal for a rel=author expert. Three Google employees quickly responded and set me straight on how exactly to implement rel=author.

A big thank you to Googlers Pierre Far, Daniel Dulitz and Jeremy Hylton for their assistance. Here’s what I learned from them.

Three Link Monte

The TL;DR version for implementing rel=”author” is that it requires three specific links.

  • A link from your blog post or article to your author page using rel=”author”
  • A link from your author page to your Google profile page using rel=”me”
  • A link from your Google profile page to your author page using rel=”me”

Read on for specific directions on how to get rel=”author” up and running on your own site or blog.

[Update 12/15/2011] While I still prefer the method described in this post, Google does allow you to verify authorship via an email address. Directions for this method can be found on the new Authorship home page.

Blog Post

The first link is from your blog post to an author page on the same domain. This is essentially a link that tells Google about the authorship of the posts on that domain. That’s why you use rel=”author” on this link. A blog with multiple authors will have multiple author pages, with the posts each author has written pointing to their own author page using rel=”author”.

But the author page does not have to be a dedicated author page. For a solo blogger, you can simply use your about page, since that is about the author of the site.

Most templates will have the author of the post in the byline. This is where you want to place the rel=”author” link.

Here’s what I did. In WordPress I navigated to Appearance > Editor and then chose to edit my Single Post file. I then looked for the byline section, updated the link destination and added the rel=”author” attribute. (Use the quotes!)

rel=author code

Now every one of my posts will have a link in the byline from my name to my about page using the rel=”author” attribute.
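Pieced together, the byline markup winds up looking something like this. (A hypothetical sketch: the URL and name are placeholders for your own about page and byline.)

```html
<!-- Hypothetical byline markup; swap in your own about page URL and name -->
<p class="byline">
  Written by <a href="http://example.com/about/" rel="author">Your Name</a>
</p>
```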

[Update] You can actually use the root domain as your author page if a) it has a rel=”me” link to your Google profile and b) it is not on a free host domain such as WordPress or Blogger. (Credit: @pedrodias)

Frankly, I think this makes it a bit more confusing but it is another option if you really don’t have a true author or about page.

Author Page

The second link is from the author page to your Google profile page. This link tells Google that the author of that domain is the same person as the one in the Google profile. You’re essentially claiming that Google profile as your own, which is why you use rel=”me” on this link.

The best practice is to link to the base URL of your Google profile.

https://plus.google.com/115106448444522478339

This might be a bit confusing because the base URL will default to a /posts suffix. It is further complicated by the launch of Google+, which changes the subdomain from profiles to plus.

Don’t worry. If you’re not using Google+ yet use your current Google profile URL. Google will put in the proper 301 redirect from the old profile to the new once you’re using Google+.
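On the author page itself, the second link might look something like this. (Another sketch: the profile ID shown is just the example URL from above.)

```html
<!-- Hypothetical about-page markup; use your own Google profile URL -->
<a href="https://plus.google.com/115106448444522478339" rel="me">My Google Profile</a>
```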

[Update] Another way you can link to your Google profile is by creating a G+ button. Just make sure you select the ‘author’ option when generating the code and it will insert the rel=”me” attribute. (Credit: @pedrodias)

Google Profile

The third link is from your Google profile to your author page. A link to your domain is not going to cut it.

Linking to the actual author page makes sense in light of multi-author sites and blogs. You might not be the author of all the content on that domain, but you want to show that you’re the author of those few guest posts.

Edit Google Profiles for Rel=Author Markup

Go to your Google profile and select edit profile. Then click on the Links section and click Add custom link (it’s at the bottom). Then enter your label and author page URL and make sure you check the ‘This page is specifically about me.’ box. That will put a rel=”me” on this link.

Now you have a rel=”me” attribute pointing from your Google profile to your author page. Add a link for each author page. This would include your own blog but also the author pages for any sites or blogs to which you’ve contributed content.

My OCD kicks in here since every other link I have is a very clean link to my places on the web. In the future I’m hoping Google could create a separate link list for these author pages.

[Update] Ask and you shall receive! Google has created a separate set of links labeled ‘Contributor to’ for authorship purposes. It is now recommended to use this section to complete the authorship loop.

Google Contributor to Links

This new option is displayed when you edit links on your profile. The one major change here is that you’ll no longer see the ‘This page is specifically about me’ box.

Google Contributor to Link Interface

Link to your author page if you’re using the three-link method, particularly for work on a multi-author site or blog. Link to the home page if you’re using the two-link method which I haven’t described here (yet). There is some indication that a home page link here might work for either method, but I’d err on the side of caution until that is confirmed.

How To Check Your Work

The last step is checking your work. To see if you’ve done everything correctly, run a sample blog post or article through the rich snippet testing tool. You can use my time saving Rich Snippets Testing Tool Bookmarklet for this task.

In my first attempt, I kept getting “verified = Author link is not verified.”

What did that mean?

Was Google just waiting to verify the links? No. It meant I’d screwed up the implementation. This is what you want to see instead.

Rich Snippet Test Tool Results for Rel=Author

If this isn’t what you’re seeing, go back and check your work again. A missed quote or placing the attribute outside of the link element could be the culprit. In addition, you will not see the author image in the tool. Nor will you see the image immediately in search results. That might be disconcerting but it’s expected and nothing to worry about.

[Update] There’s been some chatter about whether you need to submit your site via Authorship Request Form or Rich Snippets Interest Form to enable rel=”author”.

The answer is NO. Here’s what Google Webmaster Trends Analyst John Mueller confirmed.

We will pick up the authorship information automatically as we recrawl and reindex the pages involved, but this can take a bit of time until it’s visible. You do not need to submit either of these forms for authorship information. That said, the form linked from the help center article is useful to fill out, since it gives our team a contact person on your side should we notice something amiss with the markup on your side.

[Update] Google Product Manager Sagar Kamdar reports that there is a bug in the Rich Snippet Testing Tool for those trying to verify authorship mark-up, particularly for those using the alternate two-link method. This method can be used for blogs with one author and entails a rel=”author” link from the home page to the Google Profile with a rel=”me” link from the Google Profile back to the home page.

The bug means that you’ll get a negative response even when you’ve set it up properly. Google is working quickly to fix this bug, hoping to have it deployed by next week.

I’ll update this post once more when I know how long it takes between implementation and having the profile image displayed in search results.

[Update] On July 7th, about one week after I implemented rel=”author” on this blog, my smiling mug is being displayed on Google search results. (Thanks to the JoshMeister for the heads up.)

rel author search results example

The week between implementation and display is only for those URLs that have been recrawled recently. So this post and my recent Google+ review both display the rel=”author” treatment, but an older post on SEO and UX does not. To be clear, it’s not about the age of the URL, it’s simply what Google is crawling again. I have older posts, such as my Facebook SEO post, that do have the rel=”author” markup in place.

So your mileage may vary depending on how often your site and individual URLs are crawled.

[Update 12/15/2011] You can now also check Author stats in Google Webmaster Tools to see statistics for pages for which you are the verified author.

That’s how I implemented rel=author. Let me know if it works for you and if you’ve found other ways to implement it on WordPress or other platforms like Blogger.

How To Get Out of Panda Jail

June 29 2011 // SEO // 9 Comments

Did Google put you in Panda Jail?

Panda behind bars

Image credit: Alex Pilchin

Many of the hundreds of Panda blog posts contain theories and advice on how to get out of Panda Jail. I’m going to review some of the more popular recommendations to show you why they’re both completely right but absolutely pointless at the same time.

Noindex Low Quality Content

Remove the cancerous content and you’ll escape from Panda Jail, right? Heck, even Google suggests you noindex duplicate and thin content pages.

The process is pretty straightforward. Look for the offending content using certain metrics. Isolate and noindex the pages on your site which have a high bounce rate, a high exit rate (careful with that one) or haven’t received any search traffic over a long period of time.

This isn’t a bad thing to do, but it won’t get you out of Panda Jail.

Instead: Grade your content corpus like your high school English teacher would.

Lower Your Bounce Rate

One of the more popular theories is that it’s all about bounce rate or pogosticking. In essence, reducing your bounce rate is a signal of user satisfaction. Sounds good right? But it’s really easy to artificially reduce bounce rate through some clever user interaction design.

In addition, bounce rate is often not a signal of user satisfaction. The bounce rate for a Q&A site is going to be very different from that of an eCommerce site. Do you believe that Google measures all sites using the same benchmark? Not a chance.

The goal isn’t to lower your bounce rate but to increase user satisfaction. I can easily see a situation where user satisfaction would go up, but so would bounce rate.

Instead: Make sure your pages match query intent with relevance and value.

Lower Your Ad to Text Ratio

I’m sure many of you have fired up the Google Browser Size Tool and applied it against your website. The idea is that Panda is tripped if there is more advertising than content on pages. Google seems to understand what is content and what is advertising or chrome (e.g. – navigation and masthead).

It should be a red flag if the ads on a page actually make it difficult to read the content. But focusing on the actual percentage and trying to figure out when you cross some magical algorithmic line will be a waste of time.

Instead: Make sure your site passes some basic usability and readability tests.

Fix Your Link Profile

Do you have a lot of links from low quality sites or sites in Panda Jail? Some believe that an overabundance of links from these sites could put you in Panda Jail by association. It’s more likely that those links simply got neutralized and don’t pass as much trust and authority as they used to.

Trying to shape your anchor text or removing yourself from bad neighborhoods won’t do you much good. Mind you, make sure you’re not in a ring of porn sites but overall this isn’t why you’re in Panda Jail. In fact, where you link out to is far more important.

Instead: Grow your links organically by building your reputation and expertise.

How To Get Out of Panda Jail

Too many people are putting the cart before the horse and focusing on the trees and not the forest.

Birch tree trunks in forest

Image credit: Tom Stanley Janca

I love numbers and metrics but getting out of Panda Jail is not about optimizing for each specific metric. The lock on Panda Jail isn’t picked through a combination of simple numbers.

Instead, it’s about changing those numbers by understanding query intent and matching it with relevance and value. It’s about evaluating your site for usability and readability. It’s about delivering quality content which should not be confused with keyword matched content or a lot of content.

Getting out of Panda Jail requires you to understand mental models, information architecture, user experience, interaction design and conversion rate optimization.

The numbers will change, but they’ll change for the right reasons. And that just might get you paroled from Panda Jail.

What is a C Block IP Address?

June 23 2011 // SEO // 34 Comments

What do Matt Cutts and Black Hat SEO have in common?

C Block IP Address for mattcutts.com

They share a C Block IP address.

OMG! Is this bad or dangerous? Here’s a fully non-technical explanation of C Block IP addresses and why you should or shouldn’t care about them.

What is an IP address?

Home Sweet Home Sign

An Internet Protocol (IP) address is essentially the computer address of a machine, in this case your site. It’s expressed as a series of four numbers (each between 0 and 255) separated by dots.

It might look something like 64.62.209.175

This is where you live. It’s home. Your little corner of the Internet.

What’s a C Block IP address?

You might get some conflicting results if you search for information on C Block IP because it is often confused with C Class IP addresses. For our purposes, the IP address is split into lettered sections (or blocks).

AAA.BBB.CCC.DDD

64.62.209.175 is in the same C Block as 64.62.209.10

The C Block is your neighborhood and those in the same C Block are your neighbors.

C Class IP addresses is a reference to the assignment of IP addresses. I’m not going to explain it further here because it just confuses the issue at hand. (If interested, here’s some basic information on IP Classes.)
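In code terms, the comparison is just the first three octets. (A minimal sketch; `same_c_block` is a hypothetical helper, not part of any tool mentioned here.)

```python
def same_c_block(ip_a: str, ip_b: str) -> bool:
    """Two IPv4 addresses share a C block when their first three octets match."""
    return ip_a.split(".")[:3] == ip_b.split(".")[:3]

print(same_c_block("64.62.209.175", "64.62.209.10"))   # True: same neighborhood
print(same_c_block("64.62.209.175", "64.63.209.10"))   # False: different neighborhood
```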

C Block Links

The question that often comes up is whether links from the same C Block are some sort of red flag. For the most part, links from the same C Block aren’t a problem. It’s okay to get links from your neighbors.

But if you’re only getting links from your neighbors, things start to look a little fishy. Google might suspect that those neighbors aren’t entirely legitimate. A nefarious type might set up a slew of domains and have them all link to each other.

Links from 50 domains but just one C Block will look very strange. Links from 50 domains across 45 C Blocks will look just fine.

Bad C Blocks

Boarded Up Apartment Complex

Sometimes a bad element can move into your neighborhood. If you’re using a shared hosting provider you’re essentially living in an apartment building. Other people live at that address. If every one of them except for you is a notorious porn site, it might not look so great.

This could even happen on a larger level where an entire C Block has been used for some unsavory purpose. In this instance it’s not just one building that’s dilapidated, it’s the entire neighborhood.

Here’s what Matt Cutts had to say about it recently.

Check Your Neighborhood

Use the Majestic SEO Neighbourhood Checker if you’re moving into a new neighborhood or if you’re just interested in checking up on your neighbors. That’s how I stumbled upon this SEO C Block.

Sure enough mattcutts.com shares a C Block with seoblackhat.com, home of QuadsZilla. C Block neighbors also include Sphinn, Search Engine Land, Search Engine Journal, Wolf-Howl, Search Marketing Expo and Daggle.

None of these sites are getting dinged by association or even by the fact that they link to each other quite frequently.

TL;DR

An IP address is your home. The C Block is your neighborhood. C Block IP is really only important for SEO if you find out your IP is in a C Block slum.

The Future of Search and SEO

June 16 2011 // SEO // 5 Comments

What did the announcements made at Inside Search really mean for the future of search and SEO? More than meets the eye.

Mobile Search

The key to Google’s mobile announcements is that the human computer interface is being streamlined based on the platform and query intent.

Google developed shortcuts for specific types of mobile searches, essentially providing ways to make searches with certain types of intent super efficient. These searches are now performed through a combination of location and icon selection.

No words necessary.

Google Mobile Shortcuts Example

Making it easier to search is important, but predicting query intent is even more powerful. If you’re not thinking about query intent, you’re not doing your job.

Voice Search

Voice search on desktop devices is another change in the human computer interface. Unlike mobile, this time it’s encouraging more words, not fewer.

Twiki and Gary Coleman

The actual voice search capability is impressive but I’m more interested in how it might change keyword targeting and keyword research. User syntax is a crucial part of keyword research. Do people search on your topic using this word or that word? What words are they using to modify or reformulate their queries?

The answers to those questions are bound to change as voice search becomes a greater part of the equation. In short, you speak differently than you write. This change in user syntax may not happen overnight, but I believe that as voice search becomes more prevalent we’ll be looking at different types of user syntax and, potentially, a shift in how long tail content is accessed.

Image Search

Images are becoming a connective tissue on the Internet. Whether it’s Flickr, mlkshk, Color, Instagram, Photobucket, iStockphoto, deviantART, We Heart It, Pinterest or any number of other sites and services, images are ubiquitous.

I’m an unabashed LOLcats fan and can envision a future where we communicate through a combination of memes. Think hieroglyphs for the Internet age. But I digress.

a picture is worth 1000 words

The trite phrase is that a picture is worth a thousand words. Yet, I don’t think Google’s at the point where images alone are meaningful. The pixel matching algorithm may be quite good but those pictures still need words to describe them. File names, captions, alt text, anchor text, page content and other descriptors are necessary to provide context and results.

The drag and drop image search interface is a big change. But basic image optimization will remain important and likely increase in importance in the future.

Instant Pages

Speed. Google is consumed with reducing the time it takes between seeking and finding knowledge. Reducing the load time of a SERP click by pre-rendering the page certainly supports this mission and improves user experience. Yet, I think there’s an ulterior motive.

Google wants to get better at predicting quality.

speed of google search

We know that Google measures and likely uses the pogostick rate as a benchmark for satisfaction, which plays an important part in determining search quality. The problem with the pogostick rate was the variable load speed of the target page.

Was the pogostick rate a measure of the quality of the content or on how quickly that content was delivered?

The first way Google looked to solve this was by convincing sites to get faster. So they talked about speed as a signal and built us all that pretty graph in Google Webmaster Tools with the crazy benchmark of 1.5 seconds. Most of us made headway on speed but not enough to satisfy Google.

Google figured out a way to do it themselves and meet their own 1.5 second benchmark. Now the pogostick rate for these pre-rendered clicks is pure. Google has removed the speed of delivery as a mitigating factor, giving them more confidence in the pogostick rate as a measure of content satisfaction.

Instant Pages should help Google improve overall search quality.

TL;DR

Google is changing the way we search through vastly different human computer interfaces. They’re looking to make it easier and faster to match query intent with value. If the way we search is changing, you better believe SEO is changing too. Monolithic, text only, cookie-cutter SEO devoid of intent analysis will become less and less effective.

SEO isn’t dead, it’s just getting more interesting.

Google Related Searches

June 13 2011 // SEO // 18 Comments

I love a good spreadsheet full of numbers and columns that I can filter and manipulate. But when I begin keyword research it’s not about numbers, it’s about user intent and query syntax. It’s about finding the right modifiers.

Google Related Searches

Google related searches is a powerful way to gather keyword intelligence. You’ll find a subset of related searches at the bottom of search results but you’ll want to use the advanced search option on the left hand menu to get full value from this feature.

You may need to click the More search tools link to reveal this option.

More search tools

One more housekeeping note. I’m going to split this post up into two sections, the first addressing tactical ways to use Google related searches and the second straying into a more theoretical examination that might appeal to algorithm geeks.

Find Root Modifiers

Related searches can help you quickly identify top root term modifiers.

google related searches for heart attack

For the root term ‘heart attack’ you get a nice collection of modifiers. Then enter a space after the term in the search bar to trip Google Instant.

google instant while in related searches

That’s right, you get double the suggestions when using Google Instant with related searches. Not only that, but these are different from the suggestions offered in the normal Google Instant interface.

Time to use your brain and collect the modifiers that make sense. I usually copy and paste the root modifier combinations to a text pad. Then drop the list into the Google Keyword Tool to find additional keyword opportunities and benchmark query volume.

Find Modifier Classes

Locating a few strong modifiers is nice but identifying a modifier class is even better. I define a modifier class as a set of structured terms. In this instance a picture really is worth 1,000 words.

find modifier classes through related searches

Here you quickly confirm that song is a modifier class for Kasabian lyrics and probably is for any band+lyrics combination. (This is a shameless plug for the amazing Kasabian.)

Another example uncovers an easy brand modifier class.

google related searches identify modifier classes

Modifier classes are great ways to understand query patterns and, should you have the content to support it, expand your footprint.

Find Term Synonyms

Usually you’ll start the research process with a target keyword. However, this term may or may not be the way your customers are actually searching for your product or service.

google related searches for tv repair

In this instance ‘tv repair’ might be synonymous with ‘tv troubleshooting’ and ‘tv problems’. It’s up to you to figure out what the query intent is for each of these variations. Don’t just gather up modifiers willy-nilly!

I recommend using Google Insights for Search to see the category profile for each of these terms. In this case, you’ll find that the local category is more prevalent for those searching for ‘tv repair’ than ‘tv troubleshooting’. I interpret this to mean the intent behind ‘tv repair’ may lean toward finding a local tv repairman while the intent behind ‘tv troubleshooting’ may lean toward finding a do-it-yourself solution.

As a safeguard you’ll want to perform a synonym query on many of these modifiers.

Find Competitors

Google related searches can also bring up potential competitors.

find competitors in google related searches

For the term ‘sweatshirt’, retailers like American Eagle, American Apparel, Old Navy and Hollister are all presented. If you’re selling sweatshirts you might add these sites to your list of competitors. What’s interesting is that none of these sites rank on the first page of organic results for this term.

A quick analysis leads me to believe these sites are optimizing for ‘hoodie’ and ‘sweater’ instead of ‘sweatshirt’. That might not be a bad idea based on comparative query volume.

Find Semantic Keywords

You’ll probably have noticed by now that sometimes Google related searches don’t include the target keyword at all. In these instances Google is returning a type of semantic keyword.

find semantic keywords using google related searches

Bananas foster is my favorite dessert (and I was lucky enough to have it at Brennan’s.) Here you can see that Google is returning other desserts related to bananas foster. To be fair this probably isn’t true semantics but simply a measure of closely related queries. Nevertheless it can be an interesting way to find potential keyword targets for SEO or PPC programs.

Speaking about PPC, what about pay per click?

google related searches for pay per click

Sure enough Google related searches does a nice job of bringing up related terms without the keyword term being present.

You Are A Tool

Sounds like a put down but it’s actually a compliment. Google related searches is a powerful keyword research tool, allowing you to explore and find modifiers, modifier classes, synonyms, competitors and semantic terms. Yet the most critical part is to apply your own analysis and to intelligently validate assumptions with other tool sets.

(This ends the tactical part of the post. You’re now entering the theoretical side that might only appeal to algorithm geeks. You’ve been warned.)

Bigram Breakage

What struck me as I examined related searches is how it reveals the application of bigram algorithms. Loosely interpreted, it’s a way to model where to split word pairs or break a query into component parts.

Admittedly, I’m a neophyte in this realm so I’ll let Steven Levy, author of In The Plex, explain it.

The key to understanding … was the black art of “bigram breakage”; that is, how should a search engine parse a series of words entered into a query field, making the kind of distinctions that a smart human being would make?

Related searches puts bigram breakage front and center.

Example of Bigram Algorithm

The related searches for ‘social network sites’ returns modifiers on the entire term, but also for component terms: ‘sites’ and ‘social network’. This indicates that Google has learned where best to split this phrase. There aren’t any ‘network sites’ modifiers displayed.
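To make the idea concrete, here’s a toy sketch of the candidate splits for a three-word query. (`candidate_splits` is a hypothetical illustration; Google’s actual model scores splits using corpus statistics, not simple enumeration.)

```python
def candidate_splits(query: str):
    """Enumerate every way to break a query into two contiguous chunks."""
    words = query.split()
    return [(" ".join(words[:i]), " ".join(words[i:])) for i in range(1, len(words))]

print(candidate_splits("social network sites"))
# [('social', 'network sites'), ('social network', 'sites')]
```

The interesting question is which of these candidates Google keeps; for ‘social network sites’ it apparently keeps ‘social network’ + ‘sites’ and discards ‘network sites’.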

Here’s another interesting example.

Google Wildcard Bigrams

Once again Google returns the whole term with modifiers. But this time it’s identified ‘social contact’ and ‘social ___ integration’ as the component terms. I, for one, am fascinated by how Google determines how to split word phrases.

Speed of (Machine) Learning

Not only can you get a sense for how Google is splitting word phrases but also how long it takes them to learn when to do so.

Angry Birds Related Searches

Related searches for angry birds only returns modifiers. It seems like Google has learned that it should not split these two words. Yet, it hasn’t yet learned to return related searches without that keyword (e.g. – Fruit Ninja).

Contrast this to another angry animal.

Angry Dogs Related Searches

The contrast is pretty stark. Here Google does split the phrase, producing modifiers for ‘dogs’ and ‘angry’.

Watch and Learn?

Understanding the science behind search can be interesting and, at times, useful. In this case, I also wonder if watching how the related searches change over time for certain valuable keywords might be instructive.

What happens when the bigram breakage for a term changes? How long does it take for Google to recognize when not to break a word pair? How long does it take before Google develops semantic terms for that word pair? How do these things impact normal day-to-day optimization efforts?

I don’t know the answers. Heck, I might not even be asking the right questions! But I believe observation can be a great teacher. So I’ll be keeping an eye on when and how certain related searches evolve over time.

Google Scribe SEO Hints

June 05 2011 // SEO + Technology // 6 Comments

Lost in the Google +1 button launch and Schema.org announcement was the release of a new version of Google Scribe. In typical Google fashion, this unassuming product may be more important than both Google +1 and Schema.org.

Google Scribe

What is Google Scribe?

Google Scribe is one of a number of Google Labs experiments.

Google Scribe helps you write better documents. Using information from what you have already typed in a document, Google Scribe’s text completion service provides related word or phrase completion suggestions. It also checks your documents for incorrect phrases, punctuations, and other errors like misspellings. In addition to saving keystrokes, Google Scribe’s suggestions indicate correct or popular phrases to use.

Think of it as an intelligent version of Google Docs.

Language Research Lab

But what is Google Scribe really about? Look no further than the engineer working on the project.

Google Scribe Engineer

That’s right, Google Scribe is about language models, something at the core of how Google interprets and evaluates web content.

Since Google Scribe’s first release on Google Labs last year, we have been poring over your feedback and busy adding the top features you asked for. Today, we’re excited to announce a new version of Google Scribe that brings more features to word processing.

Poring over your feedback might seem like they’re reading comments and suggestions submitted by users, but in actuality I’m guessing it’s the complex analysis of usage. Google Scribe is about language research. The kind of research helping Google refine algorithmic signals.

Every time you use Google Scribe you’re helping to refine the language model by choosing from one of many text completion suggestions. Google is getting smarter about language.

Semantic Proofreading

One of the new features seems to be a direct result of this analysis: semantic proofreading.

Semantic Proofreading Example

Normal spell check would not catch the words in this example because both words are correctly spelled. Yet, the language model has learned that the word awesome is rarely ever preceded by the word quiet.

That’s quite awesome.

Good Writing Matters

Unless you’ve been living under a rock you probably know that Google is using spelling and grammar as a way to determine content quality. Any analysis of Amit Singhal’s Panda questions would indicate that grammar and spelling are gaining in algorithmic importance.

I’d recently discussed Google’s potential use of spelling and grammar on reviews with Bill Slawski. I wasn’t convinced it was a good idea.

But then Barry Schwartz reported on a Google Webmaster Forum response by Google employee John Mueller regarding spelling in user generated content.

This was noteworthy enough to prompt an official Google tweet.

Google Good Spelling Tweet

Is that clear enough for you?

Anchor Text Suggestions

This new version of Google Scribe creates automatic anchor text for a URL. That in itself is pretty interesting, but Google Scribe also gives alternate anchor text suggestions and the ability for the user to create their own.

Here are two examples using my fellow Sphinn editors: Michael Gray and Jill Whalen.

Google Scribe Link Suggestions for Wolf Howl

Google Scribe Link Suggestions for High Rankings

Clearly Google Scribe is already seeing and using backlink profiles. But Google will learn about the validity of the anchor text every time someone changes the anchor text from the automated, or primary, suggestion to one of the other suggestions or creates something entirely new.

What happens when Google Scribe determines that the primary suggestion for a URL is rarely used? The implication is that link suggestions could provide a feedback mechanism on overly optimized or ‘off-topic’ anchor text.

In other words, a paid link signal.

High Quality Documents

I’m convinced Google Scribe is helping to improve Google’s ability to interpret and analyze language. But there are indications that Google could be thinking even bigger.

Google Scribe Labs Description

Sure enough the description of Google Scribe starts with that succinct elevator pitch. “Write high-quality documents quickly.” The last word tells me it’s meant to support the new digital content work flow.

Scribe Bookmarklet and Extension

You can take Google Scribe on the go using the bookmarklet or Chrome extension. I’m using the bookmarklet right now as I’m writing this post.

Google Scribe WordPress Integration

It’s a bit clunky from a UX perspective but I see a lot of potential. A more refined product might help sites ensure their users are producing well written user generated content.

Flipping The Funnel

Why limit yourself to the output of content when you can influence the input of content?

The explosion of digital content has been made possible, in large part, by blogging platforms. Yet, the quality of the content has been uneven, and that’s probably being generous. So why not attack the problem at the top of the funnel? Help people write better content.

I like the idea. In fact, I like it so much I’m exploring a side project that does the same thing in a different yet complementary way.

Google Scribe and SEO

Data and Spot Star Trek LOLcat

Like it or not, Google is using spelling and grammar to determine content quality. Google Scribe is one method being used by Google to better understand and evaluate language and anchor text. It’s not about the actual product (right now) but about the data (feedback) Google Scribe is producing.

Instead of obsessing over the specifics of the Panda update, the SEO community can look to Google Scribe and take the hint. It’s not just what you say, it’s also how you say it.

So if you’re responsible for content, take a few more minutes and proofread your work. Google will.

Mechanical Turk Tips

June 03 2011 // SEO + Technology // 7 Comments

Amazon Mechanical Turk is a great way to do a wide variety of tasks, from content creation to image tagging to usability. Here are 15 tips to get the most out of Mechanical Turk.

Mechanical Turk Logo

Learn The Lingo

What’s a HIT? Mechanical Turk can be a bit confusing at first glance. In particular, you’ll need to understand this one important acronym.

A HIT is a Human Intelligence Task, the work you’re asking workers to perform. It can refer to the specific task you’re asking them to perform, but it also doubles as the name of the actual job you post in the community.

Select A+ Workers

95 percent or more approval rate for HITs

The long and the short of it is that reputation matters and past performance is a good indicator of future performance. Limit your HITs to those with at least a 95% approval rate.

It may shrink your pool of workers and could increase the time to completion but you make up for it in QA savings.
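If you post HITs through the API rather than the web interface, that 95% cutoff is expressed as a qualification requirement. A minimal sketch using boto3’s MTurk client (which postdates this post); the ID below is MTurk’s built-in approval-rate qualification, and the guarded-actions value is an assumption about how strictly you want to gate the HIT:

```python
# Restricting a HIT to workers with at least a 95% approval rate.
# "000000000000000000L0" is MTurk's built-in
# Worker_PercentAssignmentsApproved qualification type.
APPROVAL_RATE_QUAL = "000000000000000000L0"

def approval_requirement(min_percent=95):
    """Build one QualificationRequirements entry for create_hit()."""
    return {
        "QualificationTypeId": APPROVAL_RATE_QUAL,
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [min_percent],
        # Keep unqualified workers from even previewing the HIT.
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }

# Passed as:
# client.create_hit(..., QualificationRequirements=[approval_requirement()])
print(approval_requirement()["IntegerValues"])  # [95]
```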

Segment Your Workers

Match the right workers to the right task. In my experience, you get better results from US based workers when you’re doing anything that requires writing or transcription. Conversely, international workers often excel in tasks such as data validation and duplicate detection.

Give Workers More Time Than They Need

The time you give is the time workers have before the HIT disappears. Imagine starting a job and when you come back to turn in your work and collect payment the shop has closed and left town. This can really frustrate workers.

Mechanical Turk Reward Tip

I think Amazon creates this problem with the messaging around the hourly rate calculation. My advice: don’t get too hung up on the hourly rate and err on the side of providing more time for your HITs.

Provide Specific Directions

Remember that you are communicating work at a distance to an unknown person. There’s no back-and-forth dialog to clarify.

In addition, workers are looking to complete work quickly and to ensure they fulfill the HIT so their approval rate remains high. The latter, in particular, makes specificity very important.

Tell workers exactly what to do and what type of work output is expected.

Make It Look Easy

While the directions should be specific you don’t want a 500 word paragraph of text to scare folks off. Make sure your HIT looks easy from a visual perspective. This means it’s easily scanned and understood.

Take advantage of the HTML editor to build in a proper font hierarchy and appropriate input fields, and use a pop of color when you really want to draw attention to something important.

Give Your HIT a Good Title

Make sure your HIT title is the appropriate length (not too short or long) and that it’s descriptive and appealing.

Mechanical Turk HIT Title examples

A good title is a mixture of SEO and marketing principles. It should be relevant and descriptive but also interesting and alluring.

Bundle The Work

If you can do it, bundle a bunch of small tasks into one HIT. For instance, have workers tag 10 photos at a time.

This helps because you can set a higher price for your HIT. You’ll attract a larger pool of workers since many don’t seek out ‘penny’ HITs.

Mind Your Email

Workers will email you – frequently. Do not ignore them.

You are joining a community. Just take a peek at Turker Nation. As with any community, you get and build a reputation. Don’t make it a bad one. Respond to your email, even if the response isn’t what workers want to hear.

In addition, you learn how to tweak your HIT by listening to and interacting with the workers.

Pay Fast

A lot of the email you may receive is around a familiar refrain: “When will you pay?” This gets tedious, so I generally recommend paying quickly. It reduces the amount of unproductive email and gives you a good reputation within the community.

Pay Mechanical Turk HITs Fast

That means setting your automatic approval for something like 2 or 3 days.
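In API terms the auto-approval window is just a count of seconds on the HIT. A sketch of the relevant create_hit() parameters in boto3’s MTurk client (which postdates this post); the title, reward, and durations are made-up values:

```python
# A 2-day auto-approval window: workers get paid automatically after
# 172,800 seconds even if you never click approve.
TWO_DAYS = 2 * 24 * 60 * 60

hit_params = {
    "Title": "Tag 10 photos",                # made-up example HIT
    "Reward": "0.25",                        # dollars, as a string
    "MaxAssignments": 1,
    "LifetimeInSeconds": 7 * 24 * 60 * 60,   # how long the HIT stays posted
    "AssignmentDurationInSeconds": 60 * 60,  # generous working time per worker
    "AutoApprovalDelayInSeconds": TWO_DAYS,  # pay fast
}
print(hit_params["AutoApprovalDelayInSeconds"])  # 172800
```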

Develop a QA System

To pay fast you need a good QA system. You can either do this yourself or, alternatively, put the work out as a separate HIT. That’s right, you can use Mechanical Turk to QA your Mechanical Turk work. Insert your Inception or Yo Dawg joke here.

Bonus Good Work

10 Dollar Bill

Give a bonus when you find workers who have done an excellent job on a number of HITs. It doesn’t have to be a huge amount, but take the top performers and give them a bonus.

Not only is this the right thing to do, it’ll go a long way to establishing yourself in the community and developing a loyal pool of quality workers.

Build a Workforce

Once you find and bonus good workers, continue to give them HITs. You can do this by creating a list of those workers and limiting HITs to just that list.

If you do this you probably want to keep the ‘Required for preview’ box checked so workers not on that list aren’t frustrated by previewing a HIT they don’t have any chance of working on.

Download the worker history (under Manage > Workers) and use Excel to find high volume and high quality workers. Then create your list (under Manage > Qualification Types) so you can use it in your HIT.
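That Excel pass can also be scripted. A hedged sketch, assuming the worker history export has been parsed into (worker_id, HITs completed, approval rate) tuples — the IDs and thresholds here are made up:

```python
# Find high-volume, high-quality workers from a parsed history export.
def top_performers(history, min_hits=50, min_approval=0.98):
    """history: list of (worker_id, hits_done, approval_rate) tuples."""
    return [worker for worker, hits, rate in history
            if hits >= min_hits and rate >= min_approval]

history = [
    ("A1EXAMPLE", 120, 0.99),  # high volume, high quality: keep
    ("A2EXAMPLE", 30, 1.00),   # too few HITs to judge
    ("A3EXAMPLE", 200, 0.95),  # approval rate below the bar
]
print(top_performers(history))  # ['A1EXAMPLE']
```

The resulting list is what you’d attach to your qualification type so only those workers see your HITs.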

Block Bad Apples

Just as you build a list of good workers, you also need to block a few of the bad ones. They might have dynamite approval ratings, but for different types of tasks. Some people are good at some things and … not so good at others.

Coaching workers is time consuming and costly, so it’s probably better for you and the worker to simply part ways. You ensure the approval rate on your HITs remains high and the worker won’t put their approval rate in jeopardy.

Understand Assignments

Finally, understand and use assignments wisely. Each HIT can be assigned to a certain number of workers.

Warning on Assignments per HIT

So if your HIT is about getting feedback on your new homepage design, you might assign 500 workers to that HIT. That means you’ll get 500 reactions to your new homepage. It’s one general task that requires multiple responses.

But if your HIT is about validating phone numbers for 500 businesses, you will assign 1 worker to each HIT. That means you’ll get one validation per phone number. Do not assign 500 workers or you’ll get 500 validations per phone number. That’s wasteful and likely to irk those businesses too.
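The two patterns look like this as create_hit() parameter dicts (boto3 MTurk field names, which postdate this post; the titles and phone numbers are made up and the question XML is omitted):

```python
# One general task, many opinions: a single HIT with 500 assignments.
feedback_hit = {
    "Title": "React to our new homepage design",
    "MaxAssignments": 500,  # 500 different workers each answer once
}

# Many distinct tasks, one answer each: one HIT per business,
# 1 assignment apiece.
phone_numbers = ["555-0100", "555-0101"]  # one entry per business
validation_hits = [
    {
        "Title": "Validate this phone number",
        "MaxAssignments": 1,            # only one worker checks each number
        "RequesterAnnotation": number,  # stash the record being validated
    }
    for number in phone_numbers
]
print(len(validation_hits), validation_hits[0]["MaxAssignments"])  # 2 1
```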

Mechanical Turk Tips

These tips are the product of experience (both mine and that of the talented Drew Ashlock), of trial and error, of stubbing toes during the process.

I hope this helps you avoid some of those pitfalls and allows you to get the most out of a truly innovative and valuable service.
