Aftermarket Pipes

Friday, April 27, 2007

Your Memex is here. Are you using it?

Microsoft Research has put significant effort into implementing a near-literal version of Vannevar Bush's "memex" in its MyLifeBits project. I think we have the memex already: we just don't realize it.

In 1945, The Atlantic Monthly published "As We May Think", in which Vannevar Bush speculated that in the future, a machine--the "memex," or "memory extender"--would assist researchers by storing, indexing, and retrieving every piece of information they could possibly need. A user could also add his own text, images, or recordings, and could record notes and comments on the content. And it all fit within a large desk.

This was strong stuff for the time: understand that the state of the art was the Harvard Mark I: a 50-foot-long, 10,000-pound, four-function calculator that could divide at the blinding speed of four operations per minute. To put it in perspective, Bush's prediction was made when my grandparents were not yet old enough to drive a car.

Since 2002, Microsoft Research has been working on implementing MyLifeBits, their version of the memex. After five years of effort, they have a one-user prototype to show for it. So don't expect to be shelling out for the Microsoft Memex anytime soon.

But a few weeks ago, I had a realization. I went back to the original, 60-year-old article, and read over the description of the memex again:

A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility....

It consists of a desk, and while it can presumably be operated from a distance, it is primarily the piece of furniture at which he works....

In one end is the stored material. The matter of bulk is well taken care of by improved microfilm. ...

Most of the memex contents are purchased on microfilm ready for insertion. Books of all sorts, pictures, current periodicals, newspapers, are thus obtained and dropped into place....

All this is conventional, except for the projection forward of present-day mechanisms and gadgetry. It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another....It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. It is more than this, for any item can be joined into numerous trails.... And his trails do not fade.

There are more parallels, but that's a good start.

So on the one hand, we have a research project to create a literal implementation of the memex that might exist sometime in the future; on the other, we have a distributed, chaotic mashup of individual technologies that together get about 90% of the way there today.

Any bets on which will get there first?

More importantly, what are you waiting for that isn't there yet?

Monday, April 23, 2007

Slightly less perpetually behind

Adam Rifkin says, "Having a blog means feeling perpetually behind." Well, today I'm slightly less behind. The Pipes have been blocked for a few months (you might notice that this happens every time a project at work goes into the home stretch), but now that pressure is off, and I'll be able to (finally) polish and post some items that have been languishing in the Drafts folder for a while.

Friday, September 15, 2006

TurboGears decorator madness: linkify

One of the neat features in TurboGears is the @jsonify decorator. It uses RuleDispatch to define generic functions to convert data model objects into JSON notation for use in AJAXish applications. For example, TurboGears provides this default converter for the User identity class:

@jsonify.when('isinstance(obj, User)')
def jsonify_user(obj):
   result = jsonify_sqlobject( obj )
   del result['password']
   result["groups"] = [g.group_name for g in obj.groups]
   result["permissions"] = [p.permission_name for p in obj.permissions]
   return result

The first line lets the default JSONifier rules handle the object; after that, it removes the "password" field for security reasons, and then adds support for fields that the default rules can't handle (like joins). The @jsonify.when decorator handles mapping the default jsonify() function to the type-specific version, so when you want to return a User object converted to JSON, you just return "jsonify(myUser)" and you're done.

This approach can be used for other purposes. In one project, I kept running across the need to render references to objects as links to view that object. For example, say you have an app that renders the text

Last updated at 12:00 by Joe

with the template snippet:

<p>Last updated at ${thing.last_update_time} by ${thing.update_user.display_name}</p>

Easy and straightforward. But if you want to link "Joe" to Joe's user profile page, then every time you want to do this, you end up writing something like:

<p>Last updated at ${thing.last_update_time} by 
  <a href="${'/users/%d' % thing.update_user.id}" 
    title="User profile for ${thing.update_user.display_name}"
    ${thing.update_user.display_name}
  </a>
</p>

Then hours later you kick yourself because you find one place out of 20 where you made a typo in this monstrosity (see if you can find the one in the example above!).

So I "borrowed" jsonify's approach and created linkify.py for the project:

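import dispatch
import model

from elementtree import ElementTree

# Linkify generic methods... modeled after jsonify

# The generic function itself; type-specific rules are attached
# to it with @linkify.when(...)
@dispatch.generic()
def linkify(obj):
   raise NotImplementedError

# Rule for User objects: build an <a> element that points at the
# user's profile page and shows the display name as the link text
@linkify.when('isinstance(obj, model.User)')
def linkify_user(user):
   link = ElementTree.Element('a',
                              href='/user/%d' % user.id,
                              title='User Profile for "%s"' % user.display_name)
   link.text = user.display_name
   return link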

Then, in your controllers.py, you can make this available to templates:

# Add linkify to tg namespace
import turbogears.view
import linkify

def provide_linkify(vars):
   # Each variable provider receives the dict that templates see as "tg"
   vars['linkify'] = linkify.linkify

turbogears.view.variable_providers.append(provide_linkify)

And now, in your template, you just write:

<p>Last updated at ${thing.last_update_time} by ${tg.linkify(thing.update_user)}</p>

Much, much nicer.
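
If you don't have RuleDispatch handy, the same dispatch-on-type idea can be sketched with nothing but the standard library. To be clear, this is an illustration and not the code from the project: it swaps RuleDispatch for functools.singledispatch, uses the ElementTree that now ships with Python, and the User class is a hypothetical stand-in for model.User.

```python
from dataclasses import dataclass
from functools import singledispatch
from xml.etree import ElementTree


@dataclass
class User:
    """Hypothetical stand-in for the project's model.User class."""
    id: int
    display_name: str


@singledispatch
def linkify(obj):
    # Reached only when no type-specific rule has been registered
    raise NotImplementedError("no linkify rule for %s" % type(obj).__name__)


@linkify.register(User)
def linkify_user(user):
    # Build an <a> element pointing at the user's profile page
    link = ElementTree.Element(
        "a",
        href="/user/%d" % user.id,
        title='User Profile for "%s"' % user.display_name,
    )
    link.text = user.display_name
    return link


def render_link(obj):
    # Serialize the element to markup, as a template engine would
    return ElementTree.tostring(linkify(obj), encoding="unicode")
```

Calling render_link(User(7, "Joe")) produces an anchor element for Joe's profile, and supporting another model class is just one more @linkify.register function.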

Wednesday, August 02, 2006

Yes, games ARE different.

In the late 1990s, the game industry was starting to tackle "best practices". Basic techniques common in the rest of the software world (and mostly taken for granted now) weren't getting much traction in the game business. I often heard fellow game developers complain that those practices wouldn't work for games, because the industry was "just different." A decade later, I think they were right--but not in the way they thought they were.

When will Duke Nukem Forever be released? When it's done. DNF is the most popular whipping boy for a game development schedule gone horribly wrong, but there have been plenty of similar examples (including the original Unreal). Game developers are notorious for scope creep and gold-plating, as well as massive mid-stream changes in design and architecture. Why is that?

I've become convinced it's not solely due to "unprofessionalism" or lack of effective project management. You see, if you're creating a business application, or a device driver, or control software, you can specify the requirements, from which you can create a design, from which you can create code, which you can trace back to requirements. It's not hard to write effective, measurable requirements for this type of software, which means it's not hard to justify or drop a given feature with objectivity.

But games have a damnable, hidden, unwritten Requirement Zero:

Thy Game Shall Be Fun.

Requirement Zero is a real killer. Because it's not objectively measurable, it's not easy to manage. You can pick on Daikatana because of its schedule slips, or because you don't like its designer, but the bottom line was it just wasn't a fun game, and for more reasons than the obvious "sidekicks get squished by doors" bugs.

I once had a conversation with Danielle Bunten Berry (of M.U.L.E. fame) and a colleague about a particular game. It was gorgeous, free of technical glitches, pushed the envelope on technology, and had a decent storyline. But my colleague summed it up this way: "There's a big hole where the fun should be".

You see, users don't expect their word processor, or their spreadsheet, or the software that dispenses their soda at the local mini-mart to be fun. It's sufficient for it to be "fun-neutral": so long as it isn't so badly designed, defective, or poorly-performing as to be "un-fun", it's acceptable.

You can create a game that completely satisfies the spec, is totally free of bugs, finishes on-time, stays within its budget, and has a flawless marketing program--and it can still be a miserable failure if it isn't fun.

Yes, games ARE different.

Thursday, June 15, 2006

this test app can break

Wow. I didn't expect a post about a little-used Windows API function to generate 30,000 page views. In any event, some folks still doubt the "IsTextUnicode()" explanation, so I'm putting up the test app that I used to validate my theory before I blogged it.

Just run the app, and enter a string into the edit control. As you type, the app repeatedly calls IsTextUnicode() and shows both the result (Unicode/not Unicode) and the flags that IsTextUnicode() returns to indicate which tests it used.

Updated: I had pasted in the relevant chunk of the app source code, but it appears this blog template chokes on 70-column preformatted text. If you really want it, drop me a line.

Wednesday, June 14, 2006

this api can break

Over at WinCustomize, someone thought they'd found an Easter Egg in the Windows Notepad application. If you:

Open Notepad
Type the text "this app can break" (without quotes)
Save the file
Re-open the file in Notepad

Notepad displays seemingly-random Chinese characters, or boxes if your default Notepad font doesn't support those characters.

It's not an Easter egg (even though it seems like a funny one), and as it turns out, Notepad writes the file correctly. It's only when Notepad reads the file back in that it seems to lose its mind.

But we can't even blame Notepad: it's a limitation of Windows itself, specifically the Windows function that Notepad uses to figure out if a text file is Unicode or not.

You see, text files containing Unicode (more correctly, UTF-16-encoded Unicode) are supposed to start with a "Byte-Order Mark" (BOM), which is a two-byte flag that tells a reader how the following UTF-16 data is encoded. Given that these two bytes are exceedingly unlikely to occur at the beginning of an ASCII text file, it's commonly used to tell whether a text file is encoded in UTF-16.
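
As a rough sketch of that check (plain Python, not anything Notepad or Windows actually runs), a reader can look at the first two bytes and compare them against the two possible UTF-16 marks. The function name is made up for the example, and it deliberately ignores other encodings that also use a BOM.

```python
import codecs


def sniff_utf16_bom(data):
    """Return 'utf-16-le' or 'utf-16-be' if data starts with a UTF-16 BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None                                # no mark: the reader has to guess
```

The last branch is the interesting one: with no mark present, the file might be ASCII, or it might be UTF-16 written by an application that skipped the BOM.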

But plenty of applications don't bother writing this marker at the beginning of a UTF-16-encoded file. So what's an app like Notepad to do?

Windows helpfully provides a function called IsTextUnicode()--you pass it some data, and it tells you whether it's UTF-16-encoded or not.

Sorta.

It actually runs a couple of heuristics over the first 256 bytes of the data and provides its best guess. As it turns out, these tests aren't terribly reliable for very short ASCII strings that contain an even number of lower-case letters, like "this app can break", or more appropriately, "this api can break".
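
You can see the ambiguity without touching the Windows API. The sketch below is plain Python, not a call to IsTextUnicode(): it simply reinterprets the same 18 ASCII bytes as UTF-16 little-endian, which is the misreading you get once the guess goes the wrong way. The helper name is invented for the example.

```python
def misread_as_utf16(text):
    """Encode text as ASCII, then decode those same bytes as UTF-16LE."""
    raw = text.encode("ascii")
    # Every pair of ASCII bytes becomes one 16-bit code unit,
    # e.g. 't' (0x74) + 'h' (0x68) -> U+6874
    return raw.decode("utf-16-le")


misread = misread_as_utf16("this app can break")
code_points = ["U+%04X" % ord(ch) for ch in misread]
```

Every resulting code point lands in the CJK Unified Ideographs block, so the bytes are perfectly valid as either encoding. That is why a heuristic has nothing solid to go on.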

The documentation for IsTextUnicode says:

These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through. For example, if lpBuffer points to the ASCII string 0x41, 0x0A, 0x0D, 0x1D (A\n\r^Z), the string passes the IS_TEXT_UNICODE_STATISTICS test, though failure would be preferable.

Indeed.

As a wise man once said, "In the face of ambiguity, refuse the temptation to guess."

Competency and Layers

Larry Osterman has yet another great network programming post on his blog. To sum up, he declares his second rule of "making things go fast on the network":

You can't design your application protocol in a vacuum. You need to understand how the layers below your application work before you deploy it.

An excellent rule. Actually, I've often heard (and used) a more general form:

You can't be competent doing computer work at level N unless you have a good grasp of level N-1.

Programming is all about abstractions, and we as programmers are fond of thinking that our abstractions mean you "don't need to know" what's under the covers. But abstractions aren't perfect, and if you don't know what's under your current level of abstraction, then you're simply not competent.

For example, if you want to be a good MFC programmer, you need to have a decent grasp of Win32 API fundamentals. If you want to work in Python, you don't need to be a Python core hacker, but you'd better know enough about the implementation to know why, for example, repeated string concatenation is slow. And if you're using an object-relational mapper on top of a relational database, you still need to know your way around SQL.
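
To make the Python example concrete, here is a small sketch (an illustration, not a benchmark). Python strings are immutable, so building a long string by repeated concatenation can end up copying the accumulated text on every step, while str.join builds the result in one pass. The function names are made up for the example, and how much the difference matters depends on the interpreter and the size of the input.

```python
def build_by_concatenation(parts):
    result = ""
    for part in parts:
        # Creates a new string each time; the text built so far may be copied
        result = result + part
    return result


def build_by_join(parts):
    # Builds the final string in a single pass over the parts
    return "".join(parts)
```

Both functions return the same string; knowing what happens at level N-1 is what tells you which one to reach for when parts is large.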

I first heard this from Dr. Ralph Droms, one of my professors at Bucknell (who also invented DHCP). If I recall correctly, he was quoting one of the "elder statesmen" of computer science (Dijkstra, maybe?), but I can't recall just who it was.

Friday, May 26, 2006

Marketing Megaframeworks

A few months ago I started experimenting with TurboGears, an all-in-one web framework for Python. It's had some rough patches, but it's finally coming together for a 1.0-quality release. It's sufficiently featured and stable that I'm using it for an internal project at work (a nightly build server), and the developers I've shown it to at work have been pretty enthusiastic about learning it and getting involved.

In preparation for the 1.0 launch, Kevin Dangoor has put together some pretty cool swag on the TurboGears site:

The "Ultimate TurboGears" DVD, with archives of all the previously published screencasts, some new, exclusive 1.0 screencasts, and an offline copy of the whole TurboGears site
A steel toolbox with the TurboGears logo
A TurboGears-branded marble-racetrack desk toy
A "squishy" foam mini-toolbox

This is an interesting way to market (and fund!) a project like TurboGears. Surprisingly, there aren't any TurboGears shirts available. Maybe it's time for a TG companion for Choose Python...

Friday, March 03, 2006

Nowak vs. Wozniak: IT Journalism Continues its Slide

You know, sometimes I think that IT journalism can be its own worst enemy.

Last week, Peter Nowak published an interview with Apple co-founder Steve Wozniak, in which Woz appeared to make some pretty sweeping (if tactless) statements about Apple's recent strategy. Someone on an Apple-oriented mailing list noted the interview in his newspaper, and Woz, busy unpacking from his trip, sent off a quick email saying that a couple of the syndicated headlines were "way off base", and clarified the comments he made, implying that Nowak took some of his statements out of context.

Now Nowak has gone ballistic. In a response, he states, "Apple co-founder Steve Wozniak has posted a statement online," which he calls "a serious attack – not only on my credibility, but also on that of the press in general."

Lovely. First of all, it isn't an official "statement" that Woz "posted online". Woz responded to an email on a mailing list. The mailing list software archived the email, and it's now available on the list's web archive. Either Nowak doesn't know the difference, or he's being rather deceptive about the source of Woz's comment.

But at least Nowak was nice enough to provide a link to the archive message, a transcript of the original interview, and an MP3 recording of the same, "so that readers and listeners can decide for themselves whether Mr Wozniak was pushed or used."

Let's do just that.

On Intel

Nowak's Article:

The change in processor, for one, is something Wozniak never imagined.

"It's like consorting with the enemy. We've had this long history of saying the enemy is the big black-hatted guys, and they kind of represent evil. We are different and by being different we're better," he says.

"All of sudden we're the same in this hardware regard, so it's a little hard to swallow your words."

The Original Transcript:

Q: So there are two interesting things going on with Apple these days. First is the switch to Intel processors. What do you think about that?

A: Even from when it was first announced, I was kind of bored with it. The reasoning for it was correct. [...] No, Intel just did a very good logic design to not turn on more than needed at any time on the chip and it keeps the power lower, so we'll have higher-speed Macintoshs. And we switched before to a Power PC. Anyone who went through that transition of going from one processor to another with emulators to make the old stuff work, this one actually should be simpler and easier because we've developed for so long on Intel hardware anyway.

Q: Do you think on a philosophical level though there's a good many people out there who think, oh I can't believe Apple has switched to Intel, it's kind of like consorting with the enemy?

A: Absolutely. And you said it exactly right, it's like consorting with the enemy. We have had this long, long history of saying the enemy is the big black-hatted guys, they kind of represent evil, and we are different and by being different we're better. All of a sudden we're the same in this hardware regard, it's a little hard to swallow your own words from the past. And if it wasn't needed, I would say we shouldn't do it, and I have some questions as to how much it's needed. But I don't really have any fears or it's not going to bother me that some software isn't going to work for a while. I mean, anybody who jumps into it real early still has their old computer anyway.

First off, I don't know where "something Woz never imagined" comes from. It's unsupported by anything in the transcript. In fact, Woz talks about where PowerPC was going wrong, what he would rather have seen, why Intel was the right technical choice, and how it's not worse than the move to Power PC.

But read the original transcript again. Nowak asks, "What do you think about it?" Woz replies, "Not a big deal, it was the right thing to do technically, and after all, we've changed CPUs before." Nowak follows up with "Do you think some people will think this is terrible?", Woz agrees and explains why--in the context of what those "some people" will think.

Then in the article, Nowak inverts the order of the comments, leads with Woz's explanation of why some users will react badly, and passes it off as Woz's own opinion. When Woz notes that Apple can't play the "Apple good; Intel evil" card anymore--"it's a little hard to swallow your own words from the past"--Nowak conveniently elides "from the past". After all, Apple used to compare IBM to Big Brother from Orwell's 1984, but noticing that won't sell copy, I guess.

When someone asks you what you think, gets you to agree that some people will have a different opinion than you do, and then passes off that agreement as your own opinion, that's more than leading.

On iPods

Nowak's Article:

As for iPods, Wozniak has mixed feelings. The success of the devices has been fantastic for Apple, in that they have diversified a company previously dependent on one product. But they are distracting Apple from its focus, and the company may be better served by spinning off the business.

"We're a computer company, and we really think computers," he says.

The iPods have their own operating systems, software and processor, so "there's a different group working on it anyway".

The Original Transcript:

Q: The other thing with Apple these days is what about iPods? Obviously they have a growing importance in the business, what do you think about the whole phenomenon?

A: That one totally surprises me. I'm just blown away by the number of stores I go into that never really carried big consumer electronics, music-type products for ages anyway, since Walkmans. And they just got these huge areas of you know, so many little carrying cases and headsets and this entire iPod auxiliary world. It just totally amazes me, and now it's getting to the point that everybody has an iPod and how do you sell them two, and once they have two, how do you sell them three?

Q: Is it a good move for the company to be putting more emphasis on that aspect of the business?

A: It's a good move in the sense that it's ... not diversion. What do you call it when you put your eggs in more than one basket?

Q: Diversity?

A: Diversity, right. So diversification that the company no longer resides on one product, its fortunes with up and down markets and up and down competition and security flaws and bad press, we aren't subject to one product driving the whole company's financial stake. So it's very, very good for Apple. Maybe it should be a separate division.

Q: You think so?

A: We're a computer company, we really think computers. Of course every product nowadays has a computer inside every technical product, so it's not too hard. I think spinning off a separate division for iPods makes an awful lot of sense.

Q: In what sense?

A: It doesn't have any Macintosh software in it really, it interfaces with the Macintosh's iTunes, and the PC iTunes, but really it's got its own processor, its own operating system, so there's a different group working on it anyway.
[...]
Also, as an example, Apple has long, long believed we should be a hardware and software company, and I've got to say there's a lot of people - myself, even Steve Jobs - have had doubts on occasion as to how we should run this, but by being a hardware and software company we have the integration - the hardware knows about the software, the software knows about the hardware, and they take advantage of each other. The funny thing is, we even did that back in the Apple II.

So here we go, we got the iPod and the iTunes - it's a satellite to your computer. Only by one company having their feet in both camps could the job have been done so well.
[...]
Q: So when you say divide it, are you suggesting perhaps a separate public company that deals with iPods in and of itself?

A: You know, I wouldn't go so far as to suggest how it's spun out, but one thing I believe... at Hewlett-Packard, we had divisions out in very many, different, nice-environment cities of the country - Colorado Springs, Santa Rosa, you know, we had some up in Portland. These divisions all kind of had their nice little living entity areas, and it makes the people work together more as a family, as a community. I believe in that, and Apple's [unknown word - world, perhaps] development is in one campus. This is the only time we've had two such huge products at once, so maybe one should be somewhere else. Even when we had Apple IIs and Macintoshs, the two groups weren't in the same building. The two groups didn't really interface.

So Nowak says, "what do you think of iPods?", Woz gushes over how amazing it is, how it's the right thing for Apple, and Nowak calls that "mixed feelings." Then, when Woz suggests that maybe Apple should move the iPod group to a different physical location (as they did when developing the Macintosh), Nowak spins it into "the company may be better served by spinning off the business"--even though Woz specifically said that's not what he meant.

A Serious Attack?

So what about Nowak's assertion that Woz's email accused Nowak of "pushing" him and having "an agenda", and constitutes an "attack on the press"?

In the email to which Nowak himself links, Woz makes precisely five remarks about the interview itself:

"a couple of headlines... were way off base". Different syndication channels put different headlines over the article. I don't know which ones Woz refers to, but it's hardly an attack on either Nowak or the press in general.
"I did NOT say that the iPod division should be spun off and I feel used in that regard." The first part is factually true, and the second part is a reasonable reaction.
"The reporter again pushed me to say I was negative [on the Intel transition]." Ok, that one's probably an error on Woz's part--I see only two questions about Intel in the transcript.
"That statement [that some Mac fans will be upset because of Apple's previous 'good vs. evil' message] must have been stretched into being one about my own thinking." Looks pretty accurate to me.
"The problem with thinking is that if you think out a 30 second explanation, it passes over the 5 second sound-byte crowd." This is especially apparent in Woz's iPod division comments. In the flow of conversation, he goes back and forth a bit, obviously (if you listen to the recording) thinking out loud. Nowak distills this into a few nice, tight, but misleading sound bites. Maybe this is the "serious attack on the press" Nowak's howling about?

So, there you have it: Nowak publishes an article that misrepresents the original interview, Woz clarifies it (from memory, not a transcript or recording) in an email message to a third party, and Nowak misrepresents both the medium and the content of Woz's clarification.

But what really mystifies me is why Nowak went to the trouble of posting a transcript and recording that prove Woz was right.

Thursday, January 19, 2006

The State of IT Journalism?

A blurb on Slashdot sent me over to an article on a site called IT Observer, entitled "Linux Users May Be Violating Sarbanes-Oxley." If you're not familiar with it, Sarbanes-Oxley is a piece of US legislation, passed in the aftermath of the Enron scandal, which requires public companies to generate reams of documentation in order to prove that they're not the next big scandal waiting to happen.

Anyway, the article notes:

Companies using Linux for embedded applications may be unwittingly violating the Linux license and even breaking federal securities laws, according to a research published by Wasabi Systems.
...
According to the study, the problem lies with the requirements of the Sarbanes-Oxley Act that companies disclose ownership of intellectual property to their shareholders. The study indicates that dozens of companies are discovered each year to have violated the terms of GPL, and if they are public companies, they are violating Sarbanes-Oxley.

Ahem. So the IT Observer headline, while it's not technically a lie, is grossly misleading. It's about as accurate as saying "Mormons May Be Violating Sarbanes-Oxley", or "Democrats May Be Violating Sarbanes-Oxley"--after all, some violators might be Mormons or Democrats.

You see, "Linux developers who work for public companies and who also include GPL'ed source code in distributed products without complying with the GPL" is, by any measure, a small subset of "Linux Users." And on the other hand, GPL violations aren't only a problem on Linux, as programmers can illegally use GPL'ed code to write closed software for BSD, Mac OS, or Windows.

Trust me--I've worked with some of them.

So the headline is not only overly inclusive, it's also overly exclusive. It's a distortion that serves only to snare you into reading the article, and it's irresponsible journalism.

Now, I'm far from your average Linux zealot. I make my living by writing proprietary code for custom hardware using a Microsoft OS. But this is rubbish--I'd expect it from Slashdot, but it's unacceptable for a source that claims its mission is "to deliver insightful cutting-edge news reports, in-depth and unbiased reviews and opinions, digital downloads and information relating to computing and technology."

But maybe I'm being too hard on the authors of the article, who are listed as "IT Observer Staff". Maybe it's not intentional deception. Maybe they're just parroting what they've been told. Maybe we should see where the study originates... on a whim, I Googled for "Wasabi Systems" and found them--first link on the page, as a matter of fact. It looks like they're not a market research group (as I'd incorrectly assumed). Wasabi Systems is an OS developer, whose flagship product is:

Wasabi Certified BSD, a certified, tested, and optimized version of the BSD operating system, offers the rich functionality of BSD Unix without Linux's troublesome GPL License.

Ahem. No mention of that in the article. Maybe the "IT Observer Staff" couldn't be bothered to make one Google query for their sources. Or maybe they've never heard of checking sources.

So, which is it... duplicity, apathy, or incompetence?

Wednesday, December 28, 2005

Coming Up for Air

They say that when a Microsoft blogger goes quiet, it means there's something big coming up. Well, that's not just for Microsoft. I'm back to blogging after two major, major releases at work: Vocollect Voice for Handhelds and the Vocollect Talkman T5 wearable computer.

The T5 was a pretty exciting project--a high-performance, extremely ruggedized, Bluetooth- and 802.11b-enabled voice-controlled wearable about the size of a large mouse. Unfortunately, I can't talk about the internals of which I'm most proud, but it's a damned slick piece of hardware, and probably the most fun product of my career.

So now that I have a little breathing room, I'm going to be able to think about non-work-related geekery a little more. I've become fairly smitten with TurboGears over the last few months, and it's been frustrating not to have enough time to properly follow what's developing there. With any luck, I'll be blogging more about that in the near, near future.

Edit 17 Jan 2006: finally got "real" links for VVH and T5...

Thursday, September 01, 2005

Bad Marketing Theatre presents...

Amongst my pet peeves are such diverse elements as bad science, acceptance of innumeracy, and marketing-without-thinking. Marketing-without-thinking means going through all the motions of what marketing textbooks or white papers say you should do, without really knowing why you're doing them. "Checklist-based" software design and meaningless "Feature/Benefit" tables are among the most common examples.

Recently, I found this chart on the Alienware web site. What's wrong with this picture?

Helpfully, the chart explains that lower numbers are better. Unhelpfully, it doesn't provide numbers. (Not to mention that percentage-based comparisons of temperature aren't terribly meaningful to start with.)

Here's another, from Logitech's web site. I was looking around at their "digital pen" technology, and noticed that they had two different versions, which differ in price by $100. Fortunately, there's a "Compare Products" link. Unfortunately, this is the chart it gives you.

Six products are listed: three cradles (all with identical "features"), two pens (each with identical "features"), and ink. The two pens each include "PC", "USB", "Optical Sensor", and "Ballpoint Pen". The ink refills, however, don't include "PC", "USB", or "Optical Sensor", and the cradles don't have "Optical Sensors" or "Ballpoint Pens". That's a bit of a relief--I would be a little concerned if I had to attach my ink to the PC in some way.

A little intelligence goes a long way...

Friday, July 15, 2005

A first taste of Spyce

Like many programmers, I have a tendency to accumulate unfinished side projects. One of these is a web application. Over the years it's gone from WebWare, to Woven, to CherryPy 1, to Quixote + Cheetah, to Nevow, and finally landed on CherryPy 2 + HTMLTemplate + SQLObject + SQLite.

Yes, continually rewriting the infrastructure code is probably the biggest reason it's a continually unfinished project, but on the other hand, it's become a fun sandbox for trying out different app servers, persistence layers, and template engines.

I've written a pretty large chunk of CherryPy-based plumbing, and I've liked it pretty well, but lately the HTMLTemplate-based code started getting a little too hairy. CherryTemplate didn't look particularly better, and I didn't like my previous experience with Cheetah, so I started looking around again. A few conversations with Jonathan Ellis convinced me to give Spyce a try.

Now, Spyce is more than a templating language--it implements both the app server and templating bits of the stack, and it doesn't look like CherryPy+Spyce would be a clean or efficient separation. Fortunately both frameworks make it easy to isolate non-presentation logic into pure, non-framework modules (which I have done), so much of it is probably salvageable.

So far, I like what I see. I didn't really grok tag libraries the first time I looked at it, but after getting my head around them by reading the source, they're pretty elegant. The templating language is nicely structured and expressive (which was the whole point of the exercise) and it fits my brain better than Cheetah's.

The one gripe I have with Spyce is its lack of URL abstraction. Given the multiple framework changes, I've become a total convert to the RESTful/"Cool URI" paradigm, which avoids exposing the site implementation structure in your URLs. CherryPy and Quixote do this very well, but Spyce is centered around the "one template file=one URL" paradigm.

If and when I move beyond Spyce's internal server and onto Apache, mod_rewrite will provide an inelegant but useable way to do it, but I'd rather my framework didn't force me to hack around it at that level.
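
To make the "Cool URI" idea concrete, here is a minimal sketch of the translation layer I'd want: a table that maps public URLs onto template files, so the file layout never shows up in a link. This is plain Python, not Spyce code or mod_rewrite configuration, and the route patterns, template filenames, and the resolve() helper are all made up for illustration.

```python
import re

# Hypothetical route table: public URL pattern -> (template file, parameter names).
# The .spy filenames are illustrative only.
ROUTES = [
    (re.compile(r"^/projects/(\d+)$"), "project.spy", ("id",)),
    (re.compile(r"^/projects/(\d+)/notes/(\d+)$"), "note.spy", ("project", "id")),
]


def resolve(path):
    """Translate a public URL path into (template, params), or None if unmatched."""
    for pattern, template, names in ROUTES:
        match = pattern.match(path)
        if match:
            # Pair each captured group with its parameter name.
            return template, dict(zip(names, match.groups()))
    return None


print(resolve("/projects/42"))
print(resolve("/projects/42/notes/7"))
print(resolve("/admin"))
```

The point of keeping the table in one place is that a framework change only touches the right-hand side; the URLs people have bookmarked stay put.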

Wednesday, May 25, 2005

Unsung Heroes of Python: asynchat/asyncore

Python is known for its "batteries-included" nature. One of the batteries that gets too little attention, I think, is the asynchat module.

One of my pastimes is playing a certain web game, which features live chat. In a previous incarnation of the game, one of the players wrote a very useful "bot"--an automated pseudo-player that sat around in chat, and provided useful information when queried. She quit before the newest revision of the game was released, though, and some of the players were missing the bot.

So, I pulled out the asynchat module, launched Ethereal, and started reverse-engineering the chat protocol (source and some documentation are available, but the version I was talking to seems to be somewhat customized).

45 minutes later, I had a fully-functional bot. Another hour, and I had a nicely-factored module from which you could build a whole new ultra-wizzy chat client. (Teaching the bot all about the game took another eight hours, of course, but I don't think any libraries would help with that...)

Now the cooler-than-thou Pythonistas out there are probably saying, "Bah! Twisted Rules!". That's nice. Twisted may be sexy, but asynchat/asyncore has some advantages:

It's simple. Two modules, under 1k lines of code (as opposed to a raft of modules and 80k lines of code). No surprises.
It's documented, so I don't have to hang out in an IRC room or grovel through thousands of lines of code to figure out what's wrong.
It's included with Python, so I know it's tested--no surprises when I try it on a new machine.
It's written in everyday bog-standard Python, not its own framework on top of Python, so there's no prerequisite learning to do.
I'm reasonably sure that it's not going to be drastically changed.

So, for grinding out TCP/IP-based tools quickly, nothing beats asynchat. It probably would have taken me three times as long in C or C++, and if I'd started with Twisted, I'd still be reading Twisted source code...
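
For anyone who hasn't used it, the core pattern asynchat gives you is simple: collect whatever bytes arrive, and call a handler once per complete, terminator-delimited message. The sketch below imitates that pattern in plain Python rather than importing asynchat (which was removed from the standard library in Python 3.12), so it runs anywhere. It loosely borrows the found_terminator name, though here the handler receives the line as an argument. The bot commands are hypothetical, not the real game's protocol.

```python
class LineChannel:
    """Buffers raw bytes and emits complete messages, asynchat-style."""

    def __init__(self, terminator=b"\r\n"):
        self.terminator = terminator
        self.buffer = b""

    def feed(self, data):
        """Accept whatever the socket delivered, however it was chunked."""
        self.buffer += data
        # Hand off every complete message; keep any partial tail buffered.
        while self.terminator in self.buffer:
            line, self.buffer = self.buffer.split(self.terminator, 1)
            self.found_terminator(line.decode("utf-8", "replace"))

    def found_terminator(self, line):
        """Called once per complete message; override in a subclass."""
        raise NotImplementedError


class GameBot(LineChannel):
    # Hypothetical commands, purely for illustration.
    ANSWERS = {"!help": "Try !help or !ping", "!ping": "pong"}

    def __init__(self):
        super().__init__()
        self.replies = []

    def found_terminator(self, line):
        words = line.split()
        if words and words[0] in self.ANSWERS:
            self.replies.append(self.ANSWERS[words[0]])


bot = GameBot()
bot.feed(b"!pi")                         # a message split across two reads
bot.feed(b"ng\r\nhello\r\n!help\r\n")    # the rest, plus two more messages
print(bot.replies)
```

The subclass never sees partial reads, which is most of what makes this style of protocol code quick to write.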

Thursday, May 12, 2005

Businesses can take hiring tips from Open Source

Dave Friedman and Doc Searls had an interesting discussion on the difference between "hiring" in open source and business. Doc railed against hiring based on flawed measures like academic degrees and IQ tests; Dave concluded that businesses and open source projects are so different that the same practices can't work in both. I think there's a better way.

Dave admits that academic degrees and IQ tests are imperfect, but businesses have to use them to evaluate potential employees because there's no other way to do it. The "anarchic world of open-source coding" doesn't use them, because there's no evaluation to be done: "any person can contribute to the code, at any time, regardless of qualification".

On the contrary, no serious Open Source project would ever think of letting Joe Random contribute changes at will. Sure, anyone can download the source and make their own changes, but "commit privileges"--the ability to make those changes to the official codebase--are tightly controlled. That's a big distinction.

Open Source contributors start by having to submit every change as a "patch" to the existing code, rather than changing the code directly. A current developer examines the patch, then either rejects it or commits it on the contributor's behalf. After slowly establishing a track record of both good patches and the ability to work with the rest of the team, the contributor may receive the ability to directly commit his own changes.

On most projects, even programmers with years of experience and spotless reputations have to go through this process. Some projects are so conservative with commit privileges that even valued, long-time contributors still have to submit patches.

So granting commit privileges to a contributor is the Open Source equivalent of hiring an employee. Both represent serious commitments--an incompetent contributor with commit privileges is as dangerous to the project as an incompetent employee is to a business. And revoking commit privileges carries the same political and psychological baggage as firing an employee.

Businesses try to predict whether a candidate will be a good employee, while Open Source projects say, "show us you're good by doing work at no risk to us, and then maybe we'll offer you a position." It's unlikely that the software industry can get away with this--the media and medical industries do, but only for entry-level positions. So what can we do?

I think the solution is to increase our use of true "contract-to-hire" positions. Contract-to-hire gives the company the ability to bring a candidate on at low risk, then hire the candidate or decline with no repercussions. It's also far better at handling the unfortunate case of a competent employee who simply isn't a good fit for the company, because it limits the company's liability (both legal and emotional) while letting the employee avoid a resume-busting dismissal.

Yes, some companies abuse contract-to-hire. I know one programmer who was assured he would be "converted" in six months, only to spend two years in "headcount limbo" before being released with no warning. To be fair to both parties, the contract has to specify both the duration of the contract and a deadline for exercising or declining the option to hire.

The lack of benefits like health insurance for contractors is an issue, too, but it's hardly insurmountable. Contractors already command higher rates than they would get as full-time salary in order to pay for the missing benefits. When negotiating the contract terms, negotiate the proposed full-time salary (and thus the contractor's "benefit allowance") up front.

Fair and honest contract-to-hire is a win for both employers and individuals, and it's the only way I can see to achieve the hiring benefits Open Source projects enjoy. So what am I missing?

Wednesday, April 27, 2005

The Googlemonster strikes again...

...and now Alex Martelli has been consumed.

The recent influx of "name" talent at Google reminds me of the projections we used to make in the mid-nineties of how many years, at current growth rates, it would take until Microsoft employed all of Washington State (where I was living at the time).

Friday, April 15, 2005

"Inexcusable" failures of technology

David Berlind at ZDNet has listed his opinion of "Technology's 10 most inexcusable failures". His insights are valid, but I think he's missing the bigger issue.

He cites problems like "why doesn't my phone automatically remember a number I get via 411?" and "why can't my email system automatically read contact information from emails without an a priori, standardized format like vCard?" The complaints have a common theme: the user thinks, "why doesn't it do this very logical thing?", and the technology provider looks at the technology and thinks "given what I already know how to do, this is what I can do".

When the user says, "What I really want is to have my email automatically pull website addresses out of mail messages," the programmer's initial reaction is to write some code that pulls "http://" out of mail messages, and ship it as a feature.

Great--it did just what the user asked for. But what the user really wanted was "how can I make my email program and my browser share a brain?" The developer's logical but incorrect response leads to surgical, limited, and frustrating "fixes" for life's problems.
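
The surgical fix described above might look something like this. It's a minimal sketch, not anyone's shipping code: the pattern, the extract_urls() name, and the sample message are all assumptions for illustration.

```python
import re

# Matches only addresses that literally start with http:// or https://.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message_body):
    """Return the http(s) addresses found in a message, minus trailing punctuation."""
    urls = []
    for match in URL_PATTERN.findall(message_body):
        urls.append(match.rstrip(".,;:!?)"))
    return urls


body = "Notes are at http://example.com/notes. Also see www.example.org/faq today."
print(extract_urls(body))
```

It finds the first address and silently misses the second, because nobody asked for addresses without "http://" in front. That's the feature working exactly as specified.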

An example: Microsoft introduced the Start menu in Windows 95. They found that a large segment of users didn't know where to begin without Windows 3.1's visible Program Manager, so they added an animation that slid across the taskbar and pointed to the Start button. They correctly reasoned that the animation would get annoying after seeing it a few dozen times, so after the user uses the Start menu a few times, the animation no longer occurs.

The close-up problem is "how do we tell new users to use the Start menu, and then stop telling them when they know how?" The surgical solution is a bit of text, a chunk of code, and a Registry setting.

Later, they discovered that people who wanted to get rid of the Office Assistant altogether typically just hid it every time it came up. Another close-up problem, with another close-up solution: if you hide the assistant right away several times in a row, it asks if you really want to get rid of it permanently. Another small chunk of code and a Registry setting.

These are really two examples of the same problem: help the users until they don't need it anymore, then go away. But we keep solving the specific problem again and again, each time we see it come up, with specific one-off fixes.
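
To show how little separates the two cases, here is a sketch of the shape they share: a hint that stays active until the user has signaled, some number of times, that it isn't needed. The FadingHint class and its names are hypothetical, it keeps its count in memory rather than in the Registry, and it is not a claim about how either product was actually written. The real Office Assistant also asked before going away, which this leaves out.

```python
class FadingHint:
    """Shows a hint until the user has demonstrated they no longer need it."""

    def __init__(self, message, retire_after=3):
        self.message = message
        self.retire_after = retire_after
        self.signals = 0

    def record_signal(self):
        """Call when the user does something that suggests the hint is unneeded."""
        self.signals += 1

    @property
    def active(self):
        return self.signals < self.retire_after

    def show(self):
        """Return the hint text while active, or None once it has retired."""
        return self.message if self.active else None


start_hint = FadingHint("Click here to begin", retire_after=3)
for _ in range(3):
    start_hint.record_signal()   # user opened the Start menu
print(start_hint.show())         # the hint has retired by now

assistant = FadingHint("Need help with that?", retire_after=2)
assistant.record_signal()        # user hid the assistant once
print(assistant.show())          # one dismissal isn't enough to retire it
```

The only differences between the two uses are the message, the threshold, and what counts as a signal.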

On the other hand, after seeing this one too many times, the programmers bring out the big guns: they design a large, rigid, all-singing, all-dancing specification that attempts to predict and address all possible isomorphs of the problem.

Dwight Eisenhower is credited with saying, "If a problem cannot be solved, enlarge it." I'd add, "Enlargement is a strategy—not a goal!"

This is COM. This is OLE. This is MFC's document architecture, the vCard spec, J2EE and every other "boil the ocean" design that aims to enumerate all possible needs and create a design that addresses everything in one perfect system. It solves the original problem—sort of—and then acquires a life of its own. Now, when a different problem comes up, the programmer tries to force the solution into the amazing problem-solving framework. This works great if the new problem is truly identical to the old one. That's usually not the case.

So, instead of solving real, root problems in a sustainable way, we continue to choose between doing as little as we can to solve the facet we're staring at and designing a massive architecture that's too fragile to adapt to the next challenge that arrives.

The first person who understands this, addresses it in a way that adapts to new challenges instead of trying to predict them, and then packages it in a useable and attractive form, will be a true pioneer.

Thursday, April 07, 2005

Phoneblogging with RMS

Two firsts today: a first attempt at blogging from my new phone, and the first time hearing Richard Stallman speak.

Phoneblogging was easier than I had feared, although my phone insists I am clogging.

RMS spoke at Pitt. On the one hand I was a little disappointed in the content of the talk: I was expecting more "current affairs" and state-of-the-FSF, but the bulk of the talk was right out of Free as in Freedom. My disappointment was tempered about halfway through the talk, when I realized that two-thirds of the audience weren't alive when the events he was explaining occurred. The reactions from the crowd implied that for many of them, this was the first time they were hearing about the beginnings of the Free Software movement. The audience was roughly an even mix of free software cynics ("how can I ever earn a living if I give my work away for free?!") and free software advocates ("how will issue X affect free software?"), and a blessedly-small number of red-faced "oh-my-God-it's-really-him" fanboys.

I recorded the talk on an ancient minicassette recorder. It's not good enough to Oggify and put up, but I know some other people were recording; hopefully one of them will podcast it. He said a few interesting and new things--most interesting, he rattled off a list of a half-dozen or so changes that are being discussed for GPL 3, some of which I haven't heard before.

The question-and-answer period was fun. When I introduced myself as "one of the bad guys", who had the temerity to use Emacs to develop non-Free software, the room groaned, but Richard was gracious ("Using Free software to develop non-Free software doesn't make it any worse.").

What struck me most was his honesty and, surprisingly, his humility. That's a word I haven't previously associated with him. He acknowledged when people pointed out "grey areas" in the philosophy (for example, the dichotomy between free software and non-free content). Before the talk began, I wanted to snap a picture of him, and asked his permission. He looked at me, paused, and said, "You know, your freedom is more important than me. Go ahead, but I'm not the important one here."

Unfortunately, that picture is the only one my phone has munged so far. Fitting, given his statement.

Fun with Google Maps

Ok, Google Maps, you know where here is... and you know where there is (although someone needs to tell you the Pitt university website is not at http://utexas.edu).

So why can't you get there from here?

Wednesday, April 06, 2005

The Starbucks Delocator

The Starbucks Delocator is an interesting concept: enter your zip code, and get back a non-chain coffeeshop near your place.

On the other hand, I'm unlikely to drive 26 miles for a cup of coffee.