24 April 2002

Yat-Kha. And Reality Film

Posted at 12:00 AM

When these Tuvans play live in your neighbourhood, go.

I first saw them perform an improvised soundtrack to the Russian silent classic Storm Over Asia, a recording of which I'm still hoping they will release. But last night, performing their trademark rock/throat-singing mixture in an intimate venue, this extremely charismatic band rocked.

Side note: I was intrigued to discover that they're neighbours, in a manner of speaking, on uklinux :)

Reality Film, which organised Storm Over Asia, was also responsible for another unforgettable London concert this year (a good year so far), Acid Mothers Temple doing Legend Of The Overfiend.

19 April 2002

A small step for web typography

Posted at 12:21 AM

A Textpad macro that converts quotation marks and dashes with one keypress. Towards a foolproof route from Word to HTML.

By far the most common special characters, in my experience, that need to be converted for use on the Web are

  • apostrophes ’
  • “double quotation marks”
  • ‘single quotation marks (which includes the apostrophe)’
  • en-dashes –
  • em-dashes —

But almost no web sites bother. With just these 4 characters correctly displayed, typography on the web would already be vastly improved.

Stuck with MS Word

For all its faults, Word is still the only editor I know of that makes it easy for anyone to actually use these special characters correctly without having to memorise cryptic keyboard shortcuts. Apostrophes, single and double quotes, and en-dashes are automatically inserted during normal typing. The em-dash can be inserted with the shortcut key Ctrl+Alt+Num- (that’s the minus on the numeric keypad). Word also happens to be the most popular (as in most commonly used) word processor. Nearly all third-party content I receive for web publishing originates in Word. So it’s pretty much impossible to remove it from the equation.

Reaching for low-hanging fruit

At its simplest (forgetting HTML tags for now), my wish was to be able to Copy from Word and Paste into a plain text editor (either an HTML editor or a web form) with the above 4 characters converted in such a way that all browsers display them correctly. Sounds simple, doesn’t it? Well, no plain text editor I know does this automatically. And Dreamweaver’s design view doesn’t do it right either. So a conversion step is necessary. This is unavoidable anyway if the text is destined for a web form, e.g. in a CMS.

So here’s what you do:

  1. Download this macro,
  2. install it in Textpad,
  3. assign it the keyboard shortcut (say) F10,
  4. Copy your text in Word,
  5. Paste it into Textpad,
  6. press F10,
  7. and then it’s ready to be copied and pasted into your HTML code.
  8. Repeat steps 4-7.

I know this will make a great difference to me. I don’t know whether others will agree. Please let me know what other conversions you’d like me to include in the next version of this macro. Likewise, if you find bugs, if you have a better method, or know why this is a waste of time, please let me know.
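For anyone without Textpad, the conversion itself is simple enough to sketch in Python. This is not the macro, only a minimal illustration of the same idea, assuming the pasted text arrives as Unicode. The names `TYPOGRAPHIC_ENTITIES` and `convert_typography` are made up for the example.

```python
# Hypothetical stand-in for the Textpad macro: map each typographic
# character to its decimal numeric character reference.
TYPOGRAPHIC_ENTITIES = {
    "\u2018": "&#8216;",  # left single quotation mark
    "\u2019": "&#8217;",  # right single quotation mark, also the apostrophe
    "\u201C": "&#8220;",  # left double quotation mark
    "\u201D": "&#8221;",  # right double quotation mark
    "\u2013": "&#8211;",  # en-dash
    "\u2014": "&#8212;",  # em-dash
}

# str.translate accepts a table that maps one character to a whole string.
_TABLE = str.maketrans(TYPOGRAPHIC_ENTITIES)


def convert_typography(text: str) -> str:
    """Replace curly quotes and dashes with numeric entities; leave the rest alone."""
    return text.translate(_TABLE)


print(convert_typography("It\u2019s a \u201Csmall step\u201D \u2014 pages 4\u20137."))
# It&#8217;s a &#8220;small step&#8221; &#8212; pages 4&#8211;7.
```

Everything outside those six characters passes through untouched, so the result can be pasted straight into existing HTML without disturbing the tags.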

15 April 2002

Updates: Entities and bookmarklets

Posted at 12:25 AM

IE 5.0 PC and Opera 6 PC added to the entity support chart (thanks Nate and Branko), and lots of improvements to the bookmarklets page.

11 April 2002

Wanted: A foolproof route from word processor to HTML

Posted at 11:51 PM

As I admitted at the end of my entity support test article, knowing which special characters display correctly is a hollow victory.

Sure, now we can easily use the correct typographical marks when working in our favourite HTML editor, but only a small fraction of text on the web is generated like that. The great majority originates in word processors wielded by journalists or editors, and is transformed into HTML either by the word processor itself, or via a CMS, or by an HTML “programmer”.

Even if the writer was versed in the correct use of special characters, the deck is stacked against the typography surviving the transfer to HTML. The most popular word processor in the world generates HTML so bloated that it’s practically worthless as a route to web publishing, and doesn’t convert special characters to the correct numeric entities. Copying and pasting formatted text from Word into a WYSIWYG web editor like Dreamweaver doesn’t convert characters correctly either. And few HTML coders have the time or the knowledge to convert these characters by hand.

What we need

Now, I’m lazy, and I’m writing this in the hope that someone can proffer a (preferably automated) system that ensures the preservation of special characters from the word processing environment to HTML.

This could be

  • a means of converting Word or RTF documents to clean HTML with correct numeric entities;
  • software or a macro that automatically corrects Word’s HTML;
  • an alternative word processor to Word that offers a smoother route to HTML, and can realistically be provided to one’s journalists and clients, for example a simple RTF editor;
  • an editorial workflow that describes who takes responsibility, at what stages and by what means, for the preservation of typographical quality;
  • Word processor templates and/or editorial guidelines that enforce typographical standards and ease conversion to HTML.

For bonus points

  • correctly converting numbered or bulleted lists to HTML lists;
  • correctly converting Headings to HTML <h1> to <h6>, paragraphs to <p>; optionally removing empty paragraphs;
  • optionally converting Bold and Italic to <strong> and <em>, or stripping out character styles altogether;
  • correctly converting simple tables.

Note that I am not looking for a converter that attempts to mimic in HTML the look of the Word document. The HTML will be styled by the site’s style sheet. I’m just seeking to preserve special characters and hopefully also document structure. I’ll happily settle for plain text with just the special characters correctly converted.
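To make that fallback concrete, here is a rough Python sketch of the "plain text with just the special characters correctly converted" case. It assumes paragraphs are separated by blank lines, and the function name `text_to_html` is invented for illustration. It is a sketch under those assumptions, not a finished converter.

```python
import html
import re


def text_to_html(plain: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> and emit numeric entities."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", plain.replace("\r\n", "\n")):
        # Collapse line breaks and runs of whitespace inside the paragraph.
        body = " ".join(block.split())
        if not body:
            continue  # drop empty paragraphs
        # Escape &, < and > first, then turn every non-ASCII character
        # into a decimal numeric character reference.
        escaped = html.escape(body, quote=False)
        numeric = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
        paragraphs.append(f"<p>{numeric}</p>")
    return "\n".join(paragraphs)


print(text_to_html("Don\u2019t panic.\n\n\n\nFish & chips"))
# <p>Don&#8217;t panic.</p>
# <p>Fish &amp; chips</p>
```

The `xmlcharrefreplace` error handler is what does the entity work: anything that cannot be written as ASCII comes out as `&#NNNN;`, which covers the quotes and dashes without needing a lookup table.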

What I know so far

As a modest recompense, I can offer what I’ve learned so far:

  • HTMLTidy can be customised to automatically convert lots of HTML in many, many ways. I admit I haven’t attempted it yet. Could this be trained to clean Word HTML down to the structural bones and convert special characters to the correct numeric entities? Can anyone share this knowledge?
  • Dreamweaver and Textism both ruthlessly clean up Word HTML, but neither converts special characters correctly, and both fail on a great deal of Word formatting, for example lists.
  • “The Office HTML Filter is a tool you can use to remove Office-specific markup tags embedded in Office 2000 documents saved as HTML.” This adds the Export To: Compact HTML option to Word’s File menu.
  • wvWare is a set of *nix libraries that allow access to Microsoft Word files. The wvHtml utility converts Word documents into HTML 4.0.
  • NuxDocument, an extension of Eric Barroca's MSWordDocument, is a Zope product that represents generic documents by using plugins to convert native productivity suite formats to HTML. Plugins include MSOffice, OpenOffice.org, RTF and PDF.
  • David McRitchie, a Microsoft MVP, has a very good MS-HTML resource, with the emphasis on converting Excel spreadsheets to HTML. He mentions on this page, incidentally, that he’s not impressed by the Office HTML Filter mentioned above. Other Microsoft MVPs or forums are good sources of information.
  • Michael Mell is writing RTF2HTML, an extendable Python tool for converting RTF to HTML. He is eager to hear suggestions for further improvements.
  • It is possible to build a good RTF editor into the browser, although this is probably not a solution for migrating content from word processor to HTML.
  • I’ve heard many people sing the praises of the freeware HTML editor HTML-Kit. It’s very customisable, has tons of plug-ins, and comes with lots of built-in HTML Tidy functionality and presets. Maybe it does a better job of converting Word HTML?
  • A discussion on this issue, with many recommendations and links.
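None of the tools above is shown here, but the "structural bones" idea can be sketched with nothing more than Python's standard `html.parser`. Everything in this example is an assumption made for illustration: the whitelist of tags, the mapping of `<b>` and `<i>` to `<strong>` and `<em>`, and the names `WordHtmlCleaner` and `clean_word_html`. It does not attempt lists built from styled paragraphs, nested tables or any of Word's harder cases.

```python
import html
from html.parser import HTMLParser

# Assumed whitelist of structural tags; adjust to taste.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
        "table", "tr", "th", "td", "strong", "em"}
RENAME = {"b": "strong", "i": "em"}          # presentational -> structural
SKIP_CONTENT = {"style", "script", "title"}  # dropped together with their text


class WordHtmlCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Re-escape markup characters, then emit numeric entities for non-ASCII.
        escaped = html.escape(data, quote=False)
        self.out.append(escaped.encode("ascii", "xmlcharrefreplace").decode("ascii"))


def clean_word_html(source: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(source)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


sample = ('<p class=MsoNormal style="margin:0"><span lang=EN-GB>'
          'It&rsquo;s <b>bold</b> \u2013 ok<o:p></o:p></span></p>')
print(clean_word_html(sample))
# <p>It&#8217;s <strong>bold</strong> &#8211; ok</p>
```

Unknown tags such as `<span>` and `<o:p>` are simply not echoed, while their text is kept, which is roughly the behaviour asked for above: structure and special characters survive, Word's styling does not.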

10 April 2002

Jared Spool in London

Posted at 09:02 AM

A very entertaining lecture (“Beyond Common Sense”) Monday night by New York’s foremost usability expert.

For me, the most important point he makes is that today’s design process does not have a feedback loop built in. We build, and then move on to the next job (“throw it over the fence”), never listening to the users or learning from the mistakes we made. Or for that matter, learning from what works.

This ties in with his vision for a design process that boasts the same degree of empirical knowledge that engineers draw on: Design patterns¹. Documented designs that have been proved suitable for facilitating certain behaviours. In the Q&A Tim Ostler of Scient asked whether IP law wouldn’t get in the way, but Jared says clever design gets around that. (I hope that doesn’t mean such similar-but-different solutions as Adobe, Macromedia and Microsoft’s different takes on the “tear-off palette”, or a hundred different ways of saying “one-click”.) But Jared’s message is a valuable one in an era when “creativity” and “innovation” (often just self-expression by another name) have become such incontestable touchstones. “Until you’ve got a better answer, you copy”, I just happened to read today in Ogilvy on Advertising (quite a Jakob himself).

Another very useful insight, regarding user testing, was “user-created tasks”. Find out by interview what the realistic on-line tasks are for each respondent, and test those. Arbitrary, forced tasks will not reveal typical behaviour.

Despite the abundant (very funny) Jakob-bashing, it has to be said that Jared makes just as many controversial (or, frankly, dubious) blanket statements. Yes, he’ll quote the user-testing figures to prove his case, but often I’d draw different conclusions from them. Take Search, for example, which he claims never results in a successful transaction, since people only turn to it after failing to spot the appropriate “trigger word”, usually a category. Firstly, this just stresses the necessity of keyword-mapping and misspelling-tolerant search engines. Furthermore, how does he think anyone uses Amazon or eBay if not by searching? And how can you compare that with shopping for clothes, where there simply isn’t an exact term to describe a unique article, even if you wanted to shop for clothes that way? He did, effectively, retract his blanket ban on Search by acknowledging its use for “indexable” goods like books, but now I have to deal with people who just remembered “Search is bad.” In the same way, I’ve seen designers interpret his “Back button is the Button of Doom” mantra to mean that they should put Back buttons (by whatever name) in the middle of their pages.

I also found some of his analogies debatable. The “perfect sale situation” he sketches — where you have supply, demand and cash, which he likens to buying milk at 7-Eleven (and yet e-tailers still manage to lose the sale) — may be good for rolling your eyes at, but completely ignores the lifetime’s store of accumulated knowledge that allows people to shop at stores. Can anyone remember the first time, as a child, you were sent to buy the milk? You may well have failed in your mission (because of the scary shopkeeper, not finding it but being too embarrassed to ask, or whatever), and that’s notwithstanding the number of times you’d already watched your mother shop. If you’ve never shopped on-line before, you are like a child.

By harping on faults I could find, I may have given the impression that the lecture wasn’t worthwhile. The only reason why I don’t state all the excellent points he made is that you can read them for yourself.

1. Further reading on the concept of Design Patterns: Christopher Alexander, Martijn van Welie’s Web Design Patterns, The Interaction Design Patterns Home Page, Hypermedia Design Pattern Repository, Common Ground.

04 April 2002

Browser support for extended characters

Posted at 11:42 PM

HTML forced many typographical limitations onto designers. Some, like exact font specification, are unavoidable. Others, like extended characters, are not, but are so inconvenient few people bother using them. As a result, many features of fine typography like curly quotation marks, dashes and ellipses are all but extinct on screen and many designers are no longer even aware of their existence, or resort to semantically meaningless GIFs for special characters. These things matter.

If clumsy editing tools weren’t enough of a problem, browser manufacturers compounded it through their incomplete support for character entity references. The former problem is modestly addressed by my Dreamweaver and Textpad modifications.

As for the browser support, aardvark created the invaluable Character Entity Chart (required reading before any of this will make any sense), and advises that “to be sure a particular browser supports the entities (both named and numeric), simply open your browser to this page and view the charts”. Unfortunately, without a reference you cannot be entirely sure whether a character is displaying correctly, and not all of us have the means or time to test them. For that purpose I’ve started the numeric entity browser support table [200kb]. Update: Anyone wishing to contribute reports can use this Excel file: table.xls.

Here are some conclusions, observations and questions I’ve drawn from it so far:

  • Use numeric entities if unsure — I have not yet found any case where the character entity is supported, but not the numeric. I have found some instances of the opposite case.
  • At the very least, get into the habit of using curly single and double quotation marks, and en- and em-dashes.
  • It’s a whole different kettle of fish for non-English writers, but I haven’t looked into this yet.
  • I put spaces either side of an em-dash — even though it is not correct, the dashes are so short in Opera that it would look ridiculous without spaces. It’s a small sacrifice.
  • I’m very disappointed with Opera 5, whose entity support is almost as bad as the v.4 browsers.
  • I’d like to use the multiplication sign, which occurs in my writing quite often — e.g. 800×600, which looks much better than 800x600, especially if using a serif font. But can I sacrifice NN4.7Mac, for users of which it’ll be meaningless?
  • Ditto for vulgar fractions ¼ ½ and ¾.
  • Ditto for ellipsis, for Opera’s sake this time…
  • Anybody got any recommendations for arrows? Can we rely on Wingdings or ZapfDingbats?
  • Mathematical characters are very badly supported. Looks like maths sites would have to stick to GIFs.
  • What’s the deal with the Diamond card?
  • I'm very interested in support for the soft hyphen. Please get in touch if you’ve tested it.
  • Using these entities is still an uphill task, due to editing tools, content provision and job descriptions. A subject for a future rant.
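The first conclusion, preferring numeric entities over named ones, is easy to automate. The sketch below uses the named-entity table that ships with Python (`html.entities.name2codepoint`). The choice to leave `&amp;`, `&lt;`, `&gt;` and `&quot;` alone is my own assumption, on the grounds that those four are needed for the markup itself, and the function name `named_to_numeric` is invented for the example.

```python
import re
from html.entities import name2codepoint

# Assumed safe in every browser, so they stay as named references.
SAFE_NAMED = {"amp", "lt", "gt", "quot"}

_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite named character entities as decimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in SAFE_NAMED or name not in name2codepoint:
            return match.group(0)  # leave safe or unknown references untouched
        return f"&#{name2codepoint[name]};"
    return _NAMED_REF.sub(replace, markup)


print(named_to_numeric("800&times;600 &mdash; &ldquo;A &amp; B&rdquo;"))
# 800&#215;600 &#8212; &#8220;A &amp; B&#8221;
```

Unknown names are passed through unchanged rather than guessed at, so a typo like `&mdsh;` stays visible in the output instead of silently disappearing.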

02 April 2002

I’ve updated the Textpad clip

Posted at 11:56 PM

I’ve updated the Textpad clip library to include many more useful numeric entities. More details here, or download it now.