Thursday, May 29, 2008

Arbitration Committee performance review, Part 1

This is the first of a series of blog posts to discuss Wikipedia's Arbitration Committee. If you're not already familiar with what that is, it's sort of Wikipedia's supreme court. Wikipedians call it ArbCom. It's not a fun place--it's where you go to solve problems that haven't gotten fixed anywhere else.

A little background: in Wikipedia's earliest years Jimbo Wales was the only person who could ban anyone from the site. He was the final stop in dispute resolution. By 2003 that was getting too cumbersome for one person to do alone, so new solutions got brainstormed. ArbCom went into operation in April 2004. It's a panel of 15 people who are sort-of-elected by the community, but ultimately selected by Jimbo (the process is complex). Jimbo reserves the right to dissolve ArbCom, but it's largely a theoretical right and has never been exercised. Sometimes arbitrators resign before their terms end, and when a vacancy occurs Jimbo appoints a replacement. Usually he chooses from the pool of also-ran candidates from the last election, but not always. The most (in)famous instance of that was the short-lived tenure of Essjay.

A growing number of experienced editors within Wikipedia have expressed concerns about ArbCom lately. Complaints per se are nothing new--nearly everyone who gets sanctioned thinks their own case was handled badly--and because of this, discussions about arbitration problems generally get bogged down in the intricacies of this or that particular case.

It is my opinion that the Arbitration Committee was never a very scalable concept, and that its responsibilities have expanded past the point of diminishing returns. In order to present this in an impartial manner I'll focus on trends, and present tangible numbers. Wikipedia's Arbitration Committee has never had a performance review before. Considering the important role it serves in what has become the world's seventh most popular website, it's time it had one.

First, a review of Wikipedia's growth:
Apr 2004: 250,000 articles
Mar 2005: 500,000 articles
Mar 2006: 1,000,000 articles
Sep 2007: 2,000,000 articles
May 2008: 2,391,955 articles (as of this writing)

Following is my research on the Arbitration Committee's caseload. The research method is simple: I counted the number of open cases on the last day of each month since ArbCom began. Each example is linked to the source where I gathered the data. Requested cases and recently closed cases are not counted toward these figures. Cases in formal review do count. Please report any errors so that I can correct them promptly.

My finding is a significant and sustained dropoff in recent monthly cases, reaching its nadir during the three most recent months (March-May 2008), when the Arbitration Committee has heard the fewest cases in its history, fewer even than during its first three months of existence in 2004, when the Committee had no preexisting cases on its docket.

2004
Apr: 4 cases
May: 6 cases
Jun: 6 cases
Jul: 9 cases
Aug: 12 cases
Sep: 12 cases
Oct: 14 cases
Nov: 13 cases
Dec: 13 cases

2005
Jan: 6 cases
Feb: 7 cases
Mar: 12 cases
Apr: 4 cases
May: 12 cases
Jun: 12 cases
Jul: 15 cases
Aug: 13 cases
Sep: 15 cases
Oct: 18 cases
Nov: 16 cases
Dec: 20 cases

2006
Jan: 26 cases
Feb: 14 cases
Mar: 12 cases
Apr: 12 cases
May: 11 cases
Jun: 14 cases
Jul: 19 cases
Aug: 18 cases
Sep: 12 cases
Oct: 11 cases
Nov: 11 cases
Dec: 13 cases

2007
Jan: 11 cases
Feb: 12 cases
Mar: 7 cases
Apr: 12 cases
May: 12 cases
Jun: 13 cases
Jul: 12 cases
Aug: 12 cases
Sep: 12 cases
Oct: 7 cases
Nov: 7 cases
Dec: 4 cases

2008
Jan: 7 cases
Feb: 6 cases
Mar: 5 cases
Apr: 4 cases
29 May: 4 cases
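
For readers who would like to check the trend themselves, here is a minimal Python sketch that averages the month-end counts by year. It assumes the five blocks above run from 2004 (starting in April, when ArbCom began) through 2008. The names OPEN_CASES and yearly_summary are made up for illustration and are not part of any Wikipedia tool.

```python
from statistics import mean

# Month-end open-case counts transcribed from the tallies above.
# Assumption: the five blocks correspond to 2004 (from April) through 2008.
OPEN_CASES = {
    2004: [4, 6, 6, 9, 12, 12, 14, 13, 13],
    2005: [6, 7, 12, 4, 12, 12, 15, 13, 15, 18, 16, 20],
    2006: [26, 14, 12, 12, 11, 14, 19, 18, 12, 11, 11, 13],
    2007: [11, 12, 7, 12, 12, 13, 12, 12, 12, 7, 7, 4],
    2008: [7, 6, 5, 4, 4],
}


def yearly_summary(counts_by_year):
    """Return {year: (months_counted, total, average)} for month-end counts."""
    return {
        year: (len(counts), sum(counts), mean(counts))
        for year, counts in counts_by_year.items()
    }


if __name__ == "__main__":
    for year, (months, total, avg) in yearly_summary(OPEN_CASES).items():
        print(f"{year}: {months} months, average {avg:.1f} open cases at month-end")
```

An average of month-end snapshots is only a rough summary. It says nothing about how long each case stayed open or how many new cases were filed.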

Several interpretations are possible based upon the data presented thus far. The introduction of community banning in mid-2005 and the 2007 expansion of community sanctions to include lesser remedies such as topic banning and revert parole are factors worthy of consideration. In my opinion they do not fully explain the dynamic observed. Blog format is better suited to short presentations than long ones, and more relevant data will follow in future posts.

Matthew Jude Brown said...

Good to see the prospect of sensible discussion on these issues. One issue, as you point out, is simply that many of the easy cases now don't get as far as the arbcom; with solid precedent, the community and admins handle them.

Arbcom thus is handed fewer cases with obvious answers (in which we can look good), and more cases that aren't as amenable to quick or satisfactory resolution.

Many of the issues that now come before arbcom are ones where there is no community consensus as to the right thing to do. This firstly increases the odds of people being unsatisfied with our decision no matter what we do, and secondly increases the chance that the arbcom decision will be a typical product of a committee: timid and reflecting the points a plurality of the committee can agree upon, rather than decisive measures.

Simple user conduct cases are pretty cut and dried by now, if they even have to come to us, but I think the project as a whole hasn't reached good conclusions about several issues.

One of these is how vested contributors who add value to the site but behave problematically should be treated. Most every case like this has people arguing vociferously in both directions, and the odds of an arbcom decision that pleases everyone are small.

Another hard area is how issues fundamentally concerning content rather than behavior should be handled. The arbcom, historically, is not supposed to handle content disputes, just the behavior aspects. This has increasingly meant a tendency to try and win content disputes by provoking the other side into doing something they'll get restricted for, and has meant that cases that at root are a content dispute have been at times badly handled.

These are far from the only issues, of course - will be interested in seeing what you come up with.

David Gerard said...

Yeah. Essentially, as simpler types of cases get resolved - in 2004, it took an arbcom case to get a persistent troublesome IP blocked - the arbcom is left with the shittiest and most difficult cases to deal with. And remember that since 2004, Wikipedia has gone from #500 website in the world to #7. So just comparing numbers between 2004 and 2008 is comparing apples and the European Orange Mountain.

And it's a depressing job: wading through and cleaning out the sewers of stupidity, all the time. Hence arbitrator burnout.

John Broughton said...

Durova - what you SHOULD have been counting, at month-end, was cases OPENED during that month. Consider a situation where 4 cases per month are opened, and each takes exactly two months to resolve. Then at month-end, there will always be 8 open cases. Now consider if ArbComm becomes less efficient/effective, and takes exactly 3 months per case (50% more time) - but the caseload remains at 4 new cases per month. By your logic, since there would now be 12 cases open at each month-end, the committee's workload has increased by 50%. But it has not increased at all - there are still only 4 new cases per month.
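
The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration under the comment's own simplifying assumptions (a fixed number of cases opened each month, every case taking exactly the same number of months); the function name open_at_month_end is hypothetical.

```python
def open_at_month_end(opened_per_month, months_to_resolve, months=12):
    """Count the cases still open at the end of each month.

    Assumes the same number of cases opens every month and every case
    stays open for exactly `months_to_resolve` month-ends.
    """
    open_counts = []
    for month in range(1, months + 1):
        # A case opened in month m is counted at month-ends
        # m, m + 1, ..., m + months_to_resolve - 1.
        still_open = sum(
            opened_per_month
            for opened_in in range(1, month + 1)
            if month - opened_in < months_to_resolve
        )
        open_counts.append(still_open)
    return open_counts


if __name__ == "__main__":
    print(open_at_month_end(4, 2, months=6))  # two-month cases
    print(open_at_month_end(4, 3, months=6))  # three-month cases
```

Once the first few months have passed, the month-end count settles at the number opened per month times the months each case takes. The snapshot therefore mixes filing rate and case duration into a single figure.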

Lise Broer said...

And you are welcome to follow up on this work, with the diffs I conveniently provided, and refine it.

No one has attempted this avenue of research before in a quantitative manner. It's conventional netiquette to offer at least perfunctory appreciation, and refrain from shouting in all caps.