Thursday, September 25, 2008

A work in progress

One of the goals of the media restoration project is to help coach more people who want to learn restoration. There are more valuable photographs that deserve restoration than I could ever do alone, and it's just common sense to share skills and develop best practices. We've got other people who work with sounds, etchings, and video.

So today's post is an excerpt from a session with one of the editors who's getting started. He's restoring a much higher resolution version of the file at right: the Turkish surrender of Jerusalem to the British in 1917. Tonight we were troubleshooting. The most challenging restoration issue in this photograph is a band of discoloration that runs across the flag. It isn't the only issue that needs to be addressed, but it's a critical one: after all, this is the white flag in a surrender photograph.

So we were both on Skype. He was using voice and I was replying in text. At one point I sent him a detail image to convey some ideas. It would have been nearly impossible, or at least drastically less efficient, to conduct the conversation entirely on wiki. He's consented to my posting my end of the conversation here. It's a typical example of media content collaboration.

[10:03:53 PM] Durova says: remind me the filename?
[10:04:10 PM] Durova says: nm
[10:04:11 PM] Durova says: got it
[10:04:20 PM] Durova says: library of congress
[10:04:28 PM] Durova says: Jerusalem surrender.tif
[10:05:07 PM] Durova says: do you mean the band that intersects with the flag?
[10:05:32 PM] Durova says: right
[10:05:49 PM] Durova says: but doing stuff that crosses color boundaries isn't easy.
[10:05:57 PM] Durova says: and it's two different tones.
[10:06:13 PM] Durova says: not likely
[10:06:21 PM] Durova says: I think it's a type of decomposition
[10:06:25 PM] Durova says: sec while I look in
[10:06:36 PM] Durova says: previewing at low resolution now
[10:06:43 PM] Durova says: my guess is that the bottom tone is correct
[10:06:46 PM] Durova says: let me look closer though
[10:07:12 PM] Durova says: right
[10:07:19 PM] Durova says: oh, definitely the bottom half is correct.
[10:07:22 PM] Durova says: I have a guess here.
[10:07:30 PM] Durova says: this was scanned from a paper print
[10:07:35 PM] Durova says: I'll bet that's a crease
[10:07:48 PM] Durova says: and the difference in tone was that this section didn't lie flat on the scanner bed
[10:08:13 PM] Durova says: so the light wasn't hitting it quite straight on
[10:08:22 PM] Durova says: and one side reflected more onto the machine
[10:08:37 PM] Durova says: Well
[10:08:46 PM] Durova says: other than the section right on the flag itself
[10:08:56 PM] Durova says: most of that is fairly compatible with the natural cloud patterns
[10:09:08 PM] Durova says: the big challenge is to get the flag straightened out
[10:09:09 PM] Durova says: chance
[10:09:30 PM] Durova says: now there are actually two sections you need to think about
[10:09:36 PM] Durova says: two creases on that flag
[10:09:43 PM] Durova says: one is up near the upper left
[10:09:46 PM] Durova says: see that smaller one?
[10:09:57 PM] Durova says: hm?
[10:10:02 PM] Durova says: but do you see what I'm talking about?
[10:10:22 PM] Durova says: ok
[10:10:44 PM] Durova says: If you don't see it, I could cut out a detail and circle it for you
[10:10:47 PM] Durova says: then send you the file
[10:10:53 PM] Durova says: ok you got it
[10:11:28 PM] Durova says: now here's the solution
[10:11:29 PM] Durova says: yes
[10:11:38 PM] Durova says: the bottom half of that flag is correct
[10:11:45 PM] Durova says: it's the top half that has issues
[10:11:52 PM] Durova says: and it basically has issues in two sections
[10:12:00 PM] Durova says: a short band above the big crease
[10:12:08 PM] Durova says: and a smaller area near the small crease
[10:12:16 PM] Durova says: so let's start with the easy stuff
[10:12:22 PM] Durova says: try the healing brush on the small crease
[10:12:34 PM] Durova says: sampling from the large stable area at the bottom 20% of the flag.
[10:12:56 PM] Durova says: A basic approach that works with a lot of these things is to take on the small problems first
[10:13:07 PM] Durova says: then once the simple stuff is solved, the bigger problems are easier.
[10:13:29 PM] Durova says: Let me know when you've got that area fixed
[10:13:39 PM] Durova says: now the second step is the right border of that flag
[10:13:46 PM] Durova says: you'll want to go in at high resolution
[10:13:52 PM] Durova says: I'd use the clone stamp
[10:14:05 PM] Durova says: not at full hardness though
[10:14:07 PM] Durova says: about 70%
[10:14:26 PM] Durova says: and I'd sample that area just beneath the crease
[10:14:30 PM] Durova says: that's a healthy area
[10:14:37 PM] Durova says: and carefully stamp it
[10:14:51 PM] Durova says: want me to draw it up for you
[10:14:57 PM] Durova says: you see that sharpish white line?
[10:15:02 PM] Durova says: that's the crease itself
[10:15:11 PM] Durova says: okay, yes
[10:15:15 PM] Durova says: ah you're still on the first one
[10:15:21 PM] Durova says: right
[10:15:23 PM] Durova says: okay
[10:15:29 PM] Durova says: well you see how at the bottom of that band
[10:15:34 PM] Durova says: there's a sharp white line?
[10:15:40 PM] Durova says: that's the crease
[10:15:47 PM] Durova says: it isn't straight
[10:15:54 PM] Durova says: but that's the point where the light hits it differently
[10:16:07 PM] Durova says: so everything beneath that is basically healthy
[10:16:26 PM] Durova says: okay so I'd fix that border with either the clone stamp or a mask.
[10:16:33 PM] Durova says: Your choice. I'm partial to clone stamping.
[10:16:50 PM] Durova says: Healing brush doesn't work very well for the border though.
[10:17:02 PM] Durova says: yes it's a fuzzy border
[10:17:23 PM] Durova says: but you'd likely lose even more detail by attempting to healing brush there
[10:17:39 PM] Durova says: basically there are three sections that need either clone stamping or masks
[10:17:47 PM] Durova says: the easiest one is at far right
[10:17:55 PM] Durova says: yes
[10:18:01 PM] Durova says: I'll tell you the others once you get there.
[10:18:09 PM] Durova says: or I could draw this up and send you a detail image.
[10:18:15 PM] Durova says: know what?
[10:18:17 PM] Durova says: I'll do that
[10:18:18 PM] Durova says: ok
[10:18:34 PM] Durova says: a picture's worth a thousand words.
[10:18:40 PM] Durova says: I'll just be a moment.
[10:26:51 PM] Durova says: sorry, need to send you the other version
[10:27:13 PM] Durova says: there; much smaller

[10:30:11 PM] Durova says: So, you have that file open?
[10:30:20 PM] Durova says: yes
[10:30:24 PM] Durova says: see what I mean?
[10:30:39 PM] Durova says: once you get those three areas the rest is easy
[10:30:50 PM] Durova says: the rest is just healing brush work, really
[10:31:05 PM] Durova says: sure thing :)
[10:31:07 PM] Durova says: ack
[10:31:10 PM] Durova says: sure
[10:31:11 PM] Durova says: of course
[10:31:15 PM] Durova says: well, I'll be here.
[10:31:26 PM] Durova says: I knew this flag would be a problem
[10:31:33 PM] Durova says: not surprised you wanted advice
[10:31:39 PM] Durova says: actually, may I mention something?
[10:31:46 PM] Durova says: I'm proud you're taking this on yourself
[10:31:57 PM] Durova says: I was kinda thinking you'd shoot this problem back to me to fix
[10:32:06 PM] Durova says: this will teach you a lot :)
[10:32:17 PM] Durova says: aw shucks

Wednesday, September 24, 2008

Idol Curiosity

Breaking news: Clay Aiken is gay. As of this post, 957 news sources are covering the story. It's good for the LGBT community, I suppose, that another celebrity discusses his orientation in public. This sort of thing breaks down barriers and prejudices. So I'm balancing an urge to snark ("thank you, Clay, we never would have guessed") against recollections of how Liberace's manager tried to conceal the obvious weight loss of late-stage AIDS by calling it a watermelon diet. It wasn't so long ago that things were very different.

Still, 957 articles? Do you ever wonder whether really important news is getting shoved aside for celebrity stories?

Google News lists 19 stories right now for something else that's harder to summarize in a punchy headline. It turns out that medical doctors may be getting misled into overprescribing expensive new medicines because drug companies have been selectively publishing the research data that makes their new products look best. It's one thing when that happens in company advertising (everyone expects it), but this has been happening in medical journals: pharmaceutical firms have been cherry-picking the most favorable studies for publication in readily indexed journals and burying the tests that showed less effectiveness or major side effects. That's troubling because mainstream scientific literature is what good doctors are supposed to be reading to keep up with developments in their field.

So when you've got a problem and your doctor looks up the scientific research to decide on the right treatment, glowing results turn up for expensive new drugs. That means you pay more for medicine that may not be better. Even if your pharmacy fee is paid or fixed, that's still coming out of your premiums. In fact it's coming out of your premiums to pay for other people's overpriced drugs whether you're taking medication or not. It's coming out of your tax dollars too, to pay for all those people on Medicaid and Medicare (or for your national health service if you aren't in the United States). It's not the patients' fault or the doctors' fault--the drug companies have been filtering the information.

Does that make you angry? I sure don't like it.

Actually, those older medications are sometimes the wiser choice. The medical profession knows the long-term health effects of medicines that have been available for a long time. So if two drugs are equally effective and all the other factors are about the same, with an established medication you can find out more about interactions with other drugs or what health effects you might face 10 or 20 years from now. Even if the drug companies were publishing all the information they have, they wouldn't be able to tell you that much about new medicines.

This brings us to Wikinews and the power it puts into ordinary people's hands, because one of those 19 Google-indexed health stories is mine. There's also good news in it: Congress and the President got wise to what was happening and passed new legislation that goes into effect soon. So the companies that get FDA drug approvals will have to start entering summaries of all the relevant research into a public database--not just the part of it that could boost their firm's stock value.

After years of digging beneath celebrity gossip to get to the meaty news, it's a pleasant change to have a place like Wikinews where the stories are chosen by volunteers who care instead of by firms that are in the business of selling eyeballs to advertisers. With all due respect to Mr. Aiken, this hits closer to home.

Music to my ears

Wikipedia's song articles need help. Twice this week I've seen seasoned site volunteers react as though they couldn't imagine the situation being as bad as it is. So if you think Wikipedia has too many articles already, read on.

The gentleman whose portrait appears on today's blog is James Scott, one of the three top composers of classic ragtime. Out of the dozens of compositions he wrote, how many do you suppose have their own Wikipedia article?

None of them.

Fortunately that's about to change. I've been creating a set of Wikisource pages for his work. So far that's got complete musical scores for 22 of his compositions. It's a fortunate accident for wiki purposes that he basically stopped publishing in 1922--just within the window for United States public domain.

Another challenge is to locate recordings of his work. Searches of public domain archives have come up dry, but a couple of Wikipedians may help out. We'll probably (fingers crossed) have MIDIs soon. Looking for pianists! Yes, that means you, nimblefingers. This is a wiki. Want a featured sound credit? Here's your chance.

Now here's the part I'm really looking forward to. Normally by now I'd have started creating articles for these songs. I'm holding off a few days because Not the Wikipedia Weekly is going to be doing something new this Friday: real-time editing. We're going to write an article for Wikipedia's main page at Did You Know, and I've proposed a James Scott song as our project. Newcomers are welcome.

Tuesday, September 16, 2008

The more things change, the more they stay insane

It may be apt that "I'm Just Wild About Harry" (originally a Broadway tune, later a Harry Truman campaign song) got promoted to Featured Sound overnight. Ever since "lipstick on a pig" became this week's political catchphrase, I've been wondering whether someone would dig up an old Photoshop gag I created half a year ago. I hadn't been watching the article itself (which is quite the Wikipedian hotbed), but somebody linked me to an edit...

Sure enough, it's in the article now. And admin Jossi Fresco was the editor to place it. Now I try to keep mum about politics, or at least log out of my own party when I log into Wikipedia. On September 11, 2008 Wikipedia ran its first featured sound on the main page: George W. Bush's September 11, 2001 address ran in conjunction with a featured picture of the 9/11 rubble. So I have no opinion about whether that image belongs in the article. Someone on the talk page has already objected. I'm just sitting back, shaking my head, waiting to quietly enter the voting booth in November and do my thing. I've already decided who's getting my vote and that's nobody's business.

A few hours before that wiki-storm hit I started a featured picture nomination that's surprisingly timely: footage of the cleanup after the 1900 Galveston hurricane. Doesn't it just seem like nature is reminding us that there are certain places where humans aren't supposed to live?


Thursday, September 11, 2008

When good people disagree

A few words for people who wonder what makes me tick. This comes from an old quote in the Wikipedia Signpost:
With respect for the editors who’ve contributed these pages, it’s always been my belief that ethical decisions where good people disagree should be placed in the hands of the people who live with the consequences. No one could have more at stake in this request than these articles’ subjects.
That was my explanation for why I nominated Daniel Brandt's biography for deletion in June 2007, and it's reasoning that drives a lot of my actions. I'm not big on paternalism.

Within the realm of reasonable choices, I look to the individuals who are most affected by the outcome. And as long as I saw reasonable aspects to Mr. Brandt's position, I honored it. (Regular readers of this blog know why and when I changed my mind in his case.) I've also nominated several other living people's biographies for deletion on the same basis: can we make do without it, and does the subject want it gone?

That reasoning guides my actions in a lot of situations. And when editors come to me with a personal security concern, I take that seriously. Most of all, when the concern is credible, I take that person's choices seriously. That individual knows their own life situation far better than I ever will, and if there's a mistake to be made, it ought to be their own, as long as it doesn't hurt anyone but themselves. I won't cross the line of law or ethics in order to honor their wishes, but if the thing they choose to do falls within the realm of reason, I won't presume to take the decision out of their hands. I may offer advice if they're willing to listen.

Friday, September 05, 2008

Palin by comparison

As of this writing, 48 editors have weighed in on the request for arbitration for the Sarah Palin biography wheel war. The underlying dispute is about whether the article should have been full protected or semiprotected. If you're not versed in the intricacies of Wikipedia debates, that means administrators reversed each other's actions over whether to leave the article open to editing by experienced users or freeze it entirely. There ought to be a better way to resolve that without involving so many people or taking up so much time.

One of the arbitrators, TheBainer, posted a request for statistical data on vandalism to the Sarah Palin article during that time. That struck me as a very interesting question, although I wasn't entirely comfortable with the idea of judging administrative actions according to statistical data that wasn't readily available to those administrators at the time when they acted. Anybody could read the article history, but that was a very active article in the first days after the announcement of Palin's selection as the vice presidential nominee. It just wasn't a practical idea to sort through that edit history manually with any sort of rigor while the vandalism problem continued to unfold. But this isn't the last time Wikipedia is ever likely to get a burst of attention due to breaking news, so it seems to me we could write a tool to parse that information in real time in a way that's useful to administrators.

I've gotten in touch with a coder who has some very smart ideas, and what I'm looking for right now is someone with formal training in statistics. Basically the idea is this: create a tool that parses the recent history of actively edited articles and estimates what percentage of vandalism comes from autoconfirmed users. Automated analysis won't be perfect, so it would give reports based upon two search techniques:

  • High figure: counts all reversions within a time frame.
  • Low figure: counts bot-reversions, rollbacks, and edit summary notations such as "rvv"

From there, the tool would determine which editors were being reverted and report on what percentage were autoconfirmed. So if 85% of the vandalism to an article is coming from non-autoconfirmed editors, then semiprotection is the obvious solution. The tool would only report on articles that have a certain baseline of recent activity, in order to screen out low traffic articles where the report would be statistically meaningless.
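For anyone curious how the two figures fit together, here's a minimal Python sketch. It is not the tool itself, just one way to express the idea under stated assumptions: the `Revision` record, the `vandalism_report` function, the list of "rvv"-style summaries, and the default baseline of 50 edits are all hypothetical, and it assumes the article history has already been fetched with each reversion tagged by whether the reverted editor was autoconfirmed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    """One edit from an article's recent history (hypothetical record shape).

    Assumes reversions were identified upstream, e.g. by matching content
    hashes against earlier revisions.
    """
    timestamp: float                  # seconds since an arbitrary epoch
    summary: str = ""                 # the editor's edit summary
    by_bot: bool = False              # made by a bot account
    is_rollback: bool = False         # made with the rollback tool
    # None if this edit is not a reversion; otherwise whether the editor
    # whose work was reverted was autoconfirmed at the time.
    reverted_autoconfirmed: Optional[bool] = None


# Edit-summary notations treated as explicit vandalism reverts (assumed list).
VANDALISM_NOTE = re.compile(
    r"\b(rvv|rv vandalism|revert(ed)? vandalism)\b", re.IGNORECASE)


def share_non_autoconfirmed(reverts):
    """Fraction of reverted edits that came from non-autoconfirmed editors."""
    if not reverts:
        return None
    return sum(1 for r in reverts if not r.reverted_autoconfirmed) / len(reverts)


def vandalism_report(revisions, window_start, min_edits=50):
    """High and low estimates of where an article's vandalism comes from.

    Returns None when the article had fewer than `min_edits` edits since
    `window_start`, because the figures would mean little on a quiet page.
    """
    recent = [r for r in revisions if r.timestamp >= window_start]
    if len(recent) < min_edits:
        return None
    # High figure: every reversion inside the time frame.
    high = [r for r in recent if r.reverted_autoconfirmed is not None]
    # Low figure: only bot reverts, rollbacks and "rvv"-style summaries.
    low = [r for r in high
           if r.by_bot or r.is_rollback or VANDALISM_NOTE.search(r.summary)]
    return {
        "high_count": len(high),
        "low_count": len(low),
        "high_share_non_autoconfirmed": share_non_autoconfirmed(high),
        "low_share_non_autoconfirmed": share_non_autoconfirmed(low),
    }
```

On a busy article, a low-figure share of 0.85 or more would correspond to the 85% case above, where semiprotection looks like the obvious remedy; the gap between the high and low figures gives a rough sense of how much the automated guesswork matters, which is exactly where a statistician's eye would help.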

I'd love to bring in someone who has the skills in statistics to add rigor to the endeavor, so please get in touch if you have those skills or know someone who does.

A second idea (thanks to Xavexgoem) is for something we'd call a dramabot. Instead of crawling all recent changes for vandalism, dramabot would concentrate on articles that have gotten flurries of recent edits. Dramabot would scan those articles frequently and revert obvious vandalism until things calm down. If you're a coder who thinks dramabot would be a good idea, let's touch base.
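The part of dramabot that decides which articles deserve attention might look something like the Python sketch below. It assumes a stream of (article title, timestamp) pairs from recent changes arriving in time order; the `FlurryDetector` class, its ten-minute window and its thresholds are hypothetical choices, and it only maintains the watch list, leaving the actual vandalism reverting to the bot.

```python
from collections import defaultdict, deque


class FlurryDetector:
    """Flags articles receiving a burst of edits (hypothetical dramabot helper).

    window: length of the sliding window in seconds.
    start_threshold: edits inside the window that put an article on watch.
    stop_threshold: at or below this many edits, the article is released.
    Timestamps are assumed to arrive in non-decreasing order.
    """

    def __init__(self, window=600.0, start_threshold=20, stop_threshold=5):
        self.window = window
        self.start_threshold = start_threshold
        self.stop_threshold = stop_threshold
        self.edits = defaultdict(deque)  # title -> edit timestamps in window
        self.watched = set()             # titles dramabot should scan often

    def _expire(self, title, now):
        """Drop timestamps that have slid out of the window; return the count."""
        q = self.edits[title]
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)

    def record_edit(self, title, timestamp):
        """Register one edit; start watching the article if it is in a flurry."""
        self.edits[title].append(timestamp)
        if self._expire(title, timestamp) >= self.start_threshold:
            self.watched.add(title)

    def tick(self, now):
        """Release watched articles whose activity has calmed down."""
        for title in list(self.watched):
            if self._expire(title, now) <= self.stop_threshold:
                self.watched.discard(title)
```

Using a higher threshold to start watching than to stop watching keeps an article from flickering on and off the list when its edit rate hovers near a single cutoff.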