Thursday, June 04, 2009

Biographies of living persons: An ingenious compromise?

So what do we do about the people who don't want their biographies written up on Wikipedia? The dead trees standard (see last month's post) has some support but not consensus. Well, here, for the first time in two years, is a really new solution:

Suppose we noindexed biographies of living persons, upon the subject's request.

First the basics, then the details. Suggestions welcomed.

Basics: A subject who doesn't want his or her biography to show up on Google could email OTRS with a request to have the biography noindexed. The biography itself would remain on Wikipedia, but the search engines wouldn't index it anymore.

Details: The ability to noindex in mainspace would be tightly controlled. Policy would restrict its use to BLP articles only, and the developers would toggle the ability only for select users. Possibly this ability would be restricted to functionaries. A log would be generated of all noindexed biographies, which would be reasonably accessible (viewable by all administrators?). The specifics on access would need to be nailed down, but the scope would be something along those lines.
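
For readers who want to see the moving parts, here is a minimal sketch of the controls described above: only select users can set the flag, only BLP articles qualify, and every change lands in a log. Everything in it is an assumption made for illustration (the class name NoindexRegistry, its methods, and the user and article names are hypothetical); it is not how MediaWiki works or would implement this. The one real-world piece is the standard robots meta tag, which a search engine reads when it fetches the page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class NoindexRegistry:
    """Hypothetical registry of noindexed biographies (illustration only)."""

    authorized_users: set                        # e.g. functionaries (assumed)
    noindexed: set = field(default_factory=set)  # titles currently noindexed
    log: list = field(default_factory=list)      # (timestamp, user, title)

    def request_noindex(self, user: str, title: str, is_blp: bool) -> None:
        # Only select users may set the flag.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not noindex mainspace pages")
        # Policy restricts the flag to biographies of living persons.
        if not is_blp:
            raise ValueError("noindex is restricted to BLP articles")
        self.noindexed.add(title)
        # Every change is logged so administrators can review it.
        self.log.append((datetime.now(timezone.utc), user, title))

    def robots_meta(self, title: str) -> str:
        # Standard robots meta tag; empty string means "index as usual".
        if title in self.noindexed:
            return '<meta name="robots" content="noindex">'
        return ""


registry = NoindexRegistry(authorized_users={"ExampleFunctionary"})
registry.request_noindex("ExampleFunctionary", "Jane Example", is_blp=True)
print(registry.robots_meta("Jane Example"))
```

Note that the tag only takes effect once a search engine fetches the page again and sees it, so a noindexed biography would drop out of results gradually rather than at once.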

Generally speaking, most biography subjects' chief complaint has to do with the high ranking given to Wikipedia articles by search engines--combined with the difficulty in preventing malicious vandalism. Most of the objections to deleting biographies upon request have to do with the desire for encyclopedic completeness. So this looks like a viable compromise. At the very least, it's worth discussing.

Many thanks to the folks at WikiVoices for the brainstorm that led to this idea.


Sam said...

This is an interesting proposal. Noindex could potentially alleviate some of the problems that come with BLP coverage on Wikipedia.

Rather than having it simply at the subject's request, however, I might be inclined to see it as a third option in the otherwise-binary keep-delete debates at AfD. On BLPs of questionable notability, Noindex could be introduced as a compromise in the borderline cases to alleviate the concerns of subjects.

The main issue I see with this would be the behaviour of mirrors -- we can't ensure that mirrors follow our noindex directives. Often, mirrors don't rank highly in search engine rankings, but where Wikipedia is one of the few resources available, they often do.

Joshua said...

This is a really bad idea.

First, it rests on what is essentially an external concern (search engines). Second, insofar as there is that external concern, it would damage the search ranking of other pages. Third, if a page has serious problems then it shouldn't exist. If it doesn't have any serious problems we shouldn't have any reservations about it being easily searchable.

The vandalism problem has a solution: flagged revisions. The sooner flagged revisions gets approved the better.

main said...

Nice thought, but I think it should only be available for people not meeting the dead tree standard. Whatever Sarah Palin or Gordon Brown think of their Wikipedia biographies, they should be indexed by Google.

The technical infrastructure isn't too onerous, and would take a day or two at most.

jeem said...

I haven't done an exhaustive survey, but based on what I've seen on OTRS, most BLP subjects complain about either a) the vandalism itself, b) the fact that the article was created or edited without their explicit permission, c) the fact that our "editors" "allowed" such activity to take place, or d) some combination of the above. The few complaints I've seen about search results deal more with cache lag than anything else.

Joshua said...


Regarding mirrors, my understanding is that mirror sites will either a) be more or less directly slurping HTML, in which case the noindex will get dragged along with it, or b) be more respectable and have reasonable control over content, and thus would likely be willing to listen to requests to import our noindex tags.

This came up earlier when we added nofollow tags, and the devs more or less confirmed what I said above. I don't know enough technically to be sure that the situation would be identical, but given my limited understanding I suspect it would be the same.

Someone could make a point of deliberately mirroring our noindexed pages (as deletionpedia mirrors our deleted pages).

Someone could make an argument that this is a good idea even if some mirrors are going to be present. (I think this is wrong because, among other reasons, it could make users more reckless about material they think isn't going to be googlable but actually is. Still, the argument can be made.)

greg park avenue said...

The keyword is compartmentalization. If someone already in Wikipedia doesn't want to be listed in the Google search engine, fine with me, let him/her out, whatever his/her motives are. Some people like to play hide and seek, some people don't care whether they're found or not. Depends on some phobia, I guess, real or imagined. Claustrophobia would be my diagnosis. Won't work with spies, CIA assets and persons involved in FBI Eyewitness Protection Program.

John Broughton said...

I like the concept, though - as someone else has commented - it's problematical to do this for any bio, whenever it is requested. I'm sure that Bernie Madoff would be happier without his Wikipedia bio showing up high on search engines, as would anyone else so notorious. We shouldn't cede that much control.

On a sort of related note: perhaps, for BLP articles, there could be permanent semi-protection upon request? That would seem to have minimal downside, and help show that we're not totally indifferent to concerns of those for whom we have bios.

Joshua said...

John, permanent semi on request isn't inherently a bad idea. However, long-term semiprotection in many ways makes it harder, not easier, to protect an article. When there is no semiprotection the IP addresses are nicely visible. This means that a) people looking at recent changes can easily see that an article was modified by a relatively new user and b) if there's a serious problem everyone can see what IP address was used. Also, c) semiprotection makes it more difficult for interested parties to remove simple vandalism from their own articles.

Semiprotection creates almost as many problems as it solves.

Lise Broer said...

It's important to consider the scenarios a solution would address. Semiprotection is a good tool against high volume casual vandalism, or against short spurts of impulsive interest.

What semiprotection doesn't address are the areas where we are already weakest: long-term grudges and clever joe jobs.

Semiprotection upon request might be worth considering--there would likely be fierce debates about the balance between open editing and harm reduction.

Usually a good approach is to take a careful look at the shape of a problem, identify and prioritize the problem's elements, and then search for solutions that address the highest priorities and have the fewest downsides.

Joshua said...

Another issue with semi on request is that pretty much no one would do this preemptively. They'd only request semi after there's been a serious problem. And once that happens there will be a lot of eyes on the article anyway.

Durova is correct that the most serious problems simply aren't helped by semiprotecting.

Lise Broer said...

That's one way to summarize it, although my opinion on semiprotection upon request isn't as firmly defined as Joshua's.

Suppose a more limited term of semiprotection were available upon request. People sometimes know in advance when their biographies are likely to get vandalized: a nasty romantic breakup, etc.

Perhaps two weeks' protection upon request, with one renewal available? Should be enough to ward off a furious ex-girlfriend.

Joshua said...

Adding a heads-up notice on ANI or BLPN seems more effective. I added a note to BLPN during Harold Koh's confirmation after a representative of his expressed concerns to me. And that seemed to work fine.