Friday, February 20, 2009

Failure to thumbnail

This is a featured picture on English Wikipedia. But you can't see it because the Wikimedia Foundation software has a problem. The problem is evidently a longstanding one, because the picture has been replaced at the article about sumo wrestling with a non-featured image. And that neglect does not demonstrate respect for the volunteer Victor Rocha who did the restoration work.

In order to see this featured picture, it's necessary to go to the hosting page and click on the link where the thumbnail ought to be. So I did a check on how many people are actually viewing this page. Turns out that for the month of January 2009, the hosting page at English Wikipedia received only 50 page views, and the corresponding hosting page at Wikimedia Commons received only 56 views. So at most--if every one of those views represents a unique hit and every one of those visitors clicked through to the full version--106 people set eyes on one of the site's best images. And we can be pretty darn sure the actual number was lower.

It's time to prioritize the proper display of images. I'm not certain what bug caused the sumo failure, since it's a JPEG image and JPEGs normally display, but WMF software consistently fails to display PNG images larger than 12.5 MB. What you get instead is this at right, which ought to look like an eighteenth century engraving of American Revolutionary War sea captain John Paul Jones. He cuts a dashing figure topside on the deck of a warship in battle--or he would if you could see him. Since WMF software doesn't accept TIFF files, we're stuck with PNGs if we want to upload uncompressed formats. This is a serious detriment to utility and to collaborative editing, and editors who live in parts of the world that have slow Internet access are effectively shut off by this bug.

These problems inhibit the use and creation of featured content.

For shame.


Andrew said...

Have you considered uploading a somewhat smaller version of the file and using that in the article? We don't have unlimited resources to thumbnail gargantuan images of 13 MB+. While we want to store the larger images for posterity, a few megapixels is enough for even the hugest screen.

Durova said...

Your suggestion--to compress or downscale the image--would actually defeat the main purpose of using the PNG format at WMF. Foundation software accepts files up to 100MB and they usually display properly in other formats.

In the realm of image restoration 12.5MB is actually rather small. 10MB is what I'd call the minimum for serious restoration work, and some of the files in my workpile right now are over half a gigabyte. These are the parameters for serious restoration work.

So what you're suggesting is that restorationists should quietly accept a software bug that hamstrings serious collaboration on historic images. That type of thinking is fundamentally at odds with wiki collaboration.

Anthony said...

Does it really take that much processing power for the Mediawiki software to shrink a file *once*?

Mike.lifeguard said...

No, that's not what Andrew suggested. Obviously.

As with video, you should upload a small version for display and keep the large version to work from. This is the best of both worlds: you can display the image at acceptable screen-resolution and also have a version which is acceptable for further restoration or derivatives. If there's a downside, I've missed it.

Durova said...

Suppose an equivalent bug affected text, Mike. The wiki software worked fine on stubs, but past that it doesn’t display properly. If you want to write a featured article you’ve got two options: either store it in a form that almost nobody sees because of the bug, or else store it in a format that degrades each time someone edits it. First the punctuation gets mangled. Then wikilinks break. Then whole words drop off the page. And diffs don’t work properly either so it isn’t feasible to undo the damage.

You’ve got a couple of workarounds, all of which are clunky. One is to upload to the version where other people can edit safely but almost nobody can access, and each time you do that you also upload a second copy to the degrading version that does display. Well, you try to work with this. Knowing it’s wrong. You contribute ten percent of Wikipedia’s featured articles and you try to bring other volunteers up to speed. But the bug is really hampering growth of the volunteer base. After a while you decide the situation is ridiculous. So you post to your blog and call it shameful.

But when you do, other Wikimedians remind you that most readers only browse a few paragraphs anyway. People don’t like to scroll through an entire article, they say. And they suggest you upload a stub as a ‘viewing copy’ that they think readers would enjoy more than your featured article.

Mike.lifeguard said...

No, that comparison is totally off the mark. A less hyperbolic comparison might be the following:

You can write normal text with no problem. But then you want to write a template that includes lots of {{#ifexist calls (like the archive template until recently), or a page with tons of transclusions (like WM:SPAM on a bad day or the steward elections until Pathoschild split things up, or Commons' DRs and archives which even use multilevel transclusion), but the system can't handle it.

Again, the analogy to "Here's a reader's digest version of my featured article" is flawed. Instead, it's more like "Sorry, that template won't work, you have to use this version with less functionality for technical reasons" or "No, that page won't render properly, split it up"

Well guess what? These limitations with text are real problems too.

Of course, it'd be nice to have it solved - and maybe someday it will. Until then, we have to use workarounds. So too with displaying very large images. I never said there wasn't a problem. What I said was there are acceptable workarounds for use until a proper solution can be found. For example making ifexist less expensive (an analogue with large images would be making the processing more efficient), or figuring out how not to multiply the transclusion size with each level. Until that's done, labelling limitations of the software as "shameful" is, as I say, hyperbolic (and ultimately unhelpful).

Durova said...

But the system is designed to handle files up to 100MB, Mike. Of course those files are going to be larger on average in an uncompressed format. And the uncompressed format is of premium value to serious editors.

You're comparing featured content contributors to spammers. That's hyperbolic.

Mike.lifeguard said...

If you're going to continue distorting my words beyond all reason, let me know and I'll stop wasting my time and yours.

What I compared featured content contributors to was actually:
1) talented template programmers who have run up against a brick wall with the abilities of the parser to work with the input they give it;
2) one of the most dedicated stewards we have who has done a lot of maintenance with the pages we're using for the steward elections due to the limitations in transclusion size; and
3) those of us who process Commons' deletion requests (and DRBot who handles archiving) who encounter the multilevel transclusion bug with astonishing regularity.

And of course the examples I gave are only a small portion of the limitations of the wikitext parser - others who are more familiar with it could provide more examples.

The point is that you are not experiencing the only limitation of the software. Furthermore, that limitation may not be the most important one (shock! horror! keeping the site up is a higher priority - notice today's downtime, for example). Furthermore, some problems are actually very difficult to solve (all the problems with both text and images mentioned fall in that category, AFAIK). Until the issues with image scaling are addressed, there are concrete & acceptable workarounds you can use, just as with other limitations in the software. While I don't ask that you stop raising consciousness or advocating for your favourite software limitations, I do ask that you do it in a reasonable manner.

Mike.lifeguard said...

It's also worth pointing out that while /uploading/ supports up to 100MB, /scaling/ doesn't necessarily. There are still specific limitations, which you've discovered.

So, you can now upload up to 100MB but only scale up to 12.whateverMB in certain cases. That's still better than previously.

Durova said...

Nothing I'm asserting is hyperbolic. That part about contributing ten percent of the site's featured content in a category: slight understatement, actually. I've contributed 10.1% of en:wiki's featured pictures. And that's not a good thing. Suppose I get hit by a bus?

Your suggested workarounds aren't scalable, Mike. I've seen how they create hassles that hamper volunteer base growth.

Suppose Wikipedia had a featured article that practically no one could access and read? People would be up in arms.

And a fellow who's running for steward wouldn't greet their complaints with disdain.

Durova said...

Regarding your second comment (we post conflicted) the bug with PNG display was apparent when the site's upload capacity was 20MB, and wasn't fixed then.

The new upload cap is welcome, but unrelated. And if I'd carried the article analogy farther I'd have included that really meaty articles still can't upload in any format.

There's an image I did just today that was originally a 192MB TIFF. Why would anyone need such resolution? It happens to be a rare early color photo of downtown Detroit, from 1942.

When people look at high resolution restorations of their hometowns they pick out individual buildings--landmarks they remember--including some that don't exist anymore. The fellow I did this one for is delighted. He's sending it to his father.

And even though the fellow I did this for is sitebanned, he gets the point.

llywrch said...

Durova, I'd like to point out one problem with large files, be they image, video, text, or whatnot: they are difficult to download -- especially if the viewer does not have a broadband connection. (Yes, there are still people in the Developed World who access the web with a 14.4K or 28.8K modem.) Forgetting their needs only deepens the digital divide.

Which is one reason that the images of various Bible manuscripts I uploaded are compressed -- yet still readable. I was trying to keep the people on the other side of the digital divide in mind -- the same folks for whom access to print copies of these manuscripts, even in facsimile, would be difficult.

As praiseworthy as your intent to provide detailed images is, it excludes an important segment of the Wikimedia audience -- the same people for whom Wikipedia & its sister projects are crucial informational resources.


Durova said...

You're absolutely right, Geoff. That's one reason why it's especially problematic that we have a longstanding display bug in the PNG format. WMF software doesn't accept TIFF files, so if we want to do serious work within the wiki environment we're stuck with PNGs. But since the thumbnailing fails at the format which--by its nature--tends to get populated with larger files, that bug creates an extra hardship for editors who have slow connections, because then an editor doesn't get to preview the files they're considering working on.

The other day I blogged about Wikipedia's first featured pictures from five years ago. Some of them were under 100K. A file that size wouldn't stand a chance at FPC today. Most of the pictures that get promoted now are 1MB or more.

So five years down the road--ten years down the road--what size files will we consider the norm? Editors who don't need the extra resolution don't have to make use of it; there are several simple workarounds for that.

But if we accept the deprioritization of image handling, several things happen:
1. Actual collaboration for serious restoration migrates out of the WMF environment.
2. Would-be restorers get put off.
3. Too few people end up with too large a share of the featured content credits.

For a year I've done outreach to bring more people into this. They've got plenty of built-in challenges learning how to restore material: woodcuts and collodion glass negatives are profoundly different media. The extra hassle of working around software shortcomings and uncorrected bugs is inhibiting the growth of our volunteer pool.

It is not a good thing that I've contributed ten percent of en:wiki's featured pictures. At a site with over 9 million registered accounts, and a featured content program about to enter its fifth year, it's evidence of a problem.

pfctdayelise said...

Good on you Durova.

Chad H. said...


Once again, I am afraid that we have to disagree (although I never fault either of us for it :)

As Mike points out, there is a certain limit to what can physically be done. The upload limit is now 100MB, yes. The ability to render such an image (especially quickly!) is just a simple block that hasn't yet been overcome. It's not so much a matter of flipping a switch or adding a feature to do so. When you're attempting to thumbnail an image of that size, you're pushing against the limits of the computer itself--or at least the limits we set to them so they don't die trying. As mentioned, the Parser faces the same limits. At some point, you simply hit a wall on what is physically possible.

The only two solutions are A) throw more memory/processing/disk space at the problem, and hope it goes away temporarily, or B) improve the process to use fewer resources. The former is faster; the latter takes more time and more developer resources. In the long run, the latter is preferred.

Ideally, I would love to see MediaWiki support 100+MB TIFF files and be able to display them inline. However, like Mike's text problem, eventually you hit physical limits (be it POST size, physical memory, disk space, or raw processing power). One day maybe we won't have to think of such realistic concerns :)
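
To put a rough number on the memory side of that limit, here is a minimal Python sketch that reads the dimensions from a PNG's IHDR chunk and estimates the size of the fully decoded pixel buffer. It is an illustration only: the function name `decoded_size_bytes` is hypothetical, the estimate covers just the raw pixels (a real scaler may need more or less), and nothing here is taken from MediaWiki's own code or reflects how it measures an image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Samples per pixel for each PNG color type defined in the PNG specification.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def decoded_size_bytes(header: bytes) -> int:
    """Estimate the uncompressed pixel-buffer size from the first 26 bytes of a PNG."""
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR data starts at offset 16: width, height, bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    bits_per_row = width * CHANNELS[color_type] * bit_depth
    row_bytes = (bits_per_row + 7) // 8  # each row is padded to a whole byte
    return row_bytes * height
```

With a real file, something like `decoded_size_bytes(open("restoration.png", "rb").read(26))` would do (the filename is made up). The point of the sketch is that the size on disk and the size in memory are different quantities: a PNG that compresses well can still expand to a very large buffer once decoded.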


Durova said...

Update: turns out the first issue was an unannounced inability of the software to handle progressive JPEG compression. Now that we've found out what was causing that error the site can display that featured picture of a sumo wrestler.
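
For anyone who wants to check a file before uploading, here is a minimal Python sketch that reports whether a JPEG is progressive. It walks the marker segments and looks at the start-of-frame marker: in the JPEG standard, SOF0 (`0xFFC0`) and SOF1 (`0xFFC1`) indicate sequential encoding and SOF2 (`0xFFC2`) indicates progressive. The function name `jpeg_encoding` is hypothetical, rarer frame types are simply reported as unknown, and this is not how the WMF software itself inspects files.

```python
import struct


def jpeg_encoding(data: bytes) -> str:
    """Return 'baseline', 'progressive', or 'unknown' for raw JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers with no payload
            pos += 2
            continue
        if marker == 0xC2:
            return "progressive"
        if marker in (0xC0, 0xC1):  # baseline / extended sequential
            return "baseline"
        if marker in (0xDA, 0xD9):  # scan data or end of image before any SOF
            break
        # Every other segment carries a 2-byte big-endian length that includes itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return "unknown"
```

Calling `jpeg_encoding(open("photo.jpg", "rb").read())` on a local file (the filename is made up) shows which kind it is; a file reported as progressive would need to be re-saved with standard compression in an image editor.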

The other issue turns out to be a DOS issue deliberately set by Tim Starling three and a half years ago. See this post. It so happens that the hardware has been upgraded within the last three and a half years, so there's an ingenious solution that might be possible here.

You might ask the developers to consider reprioritizing in light of the increased capacity and the needs of featured content contributors.

TJ said...

Hello Durova, I just wanted to thank you for posting the details behind the 'failure to thumbnail' issue - I was encountering the very same problem with a high resolution shot which wouldn't work until I saved it using standard, not progressive, jpg compression. Very helpful of you - cheers!

Lise Broer said...

What a pleasant surprise to receive that feedback months after writing the post. Very glad that helped you. Happy editing! :)