There’s a lot of work being done on so-called 3D desktops. I think it’s worth developing some finer-grained terminology. There are 1-, 2-, and 3-D, and the fractional 2.25-D and 2.5-D. And there are the non-spatial dimensions T-D, G-D, A-D, C-D, and O-D.
The traditional 2D/3D distinction is whether the model being viewed is defined in two spatial dimensions or three. Either way, it is typically projected onto 2-D for display on a flat screen. This projection might involve rotation or scaling, so it’s worth remembering that a projection is involved even when the model is 2-dimensional.
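To make the point about projection concrete, here is a minimal sketch, assuming the simplest possible cameras: a 2-D model point that is rotated, scaled, and offset onto the screen, and a 3-D point pushed through a basic pinhole perspective divide. The function names `project_2d` and `project_3d` and their parameters are hypothetical, chosen only for illustration.

```python
import math

def project_2d(point, angle=0.0, scale=1.0, offset=(0.0, 0.0)):
    """Project a 2-D model point to the screen: rotate, scale, then translate."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + offset[0],
            scale * (s * x + c * y) + offset[1])

def project_3d(point, focal_length=1.0):
    """Project a 3-D model point with a simple pinhole camera (divide by depth)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    return (focal_length * x / z, focal_length * y / z)
```

Either way, the screen only ever receives (x, y) pairs; the difference is whether the model itself carries a third coordinate before projection.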
Most computer applications have a 1-dimensional model: text. But here, too, it is typically projected onto a 2-D display.
While 3D geometry models were used early on in Computer Aided Design (CAD), the first CAD systems on weak PCs were called 2.5-D. In 2-D, each point of interest has X and Y coordinates that can be specified independently of each other. True 3-D systems add a third orthogonal coordinate. But 2.5-D systems allow 2D geometry to be extruded through a third dimension. The third dimension cannot be specified independently for each point. Instead, a whole group or layer of points shares a depth. This is fine for architectural work, where people normally think of spaces as vertical extrusions of 2D plans.
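The 2.5-D restriction is easiest to see as a data structure. The sketch below assumes a hypothetical `Layer` type that stores plain (x, y) points plus a single depth for the whole layer, and a hypothetical `extrude` helper that turns a 2-D plan into a floor layer and a ceiling layer. No individual point can carry its own depth; it can only inherit its layer’s.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """A group of 2-D points that all share one depth (the 2.5-D restriction)."""
    z: float
    points: list = field(default_factory=list)  # (x, y) pairs, no per-point depth

    def vertices_3d(self):
        # Every point inherits the layer's depth; it cannot specify its own.
        return [(x, y, self.z) for x, y in self.points]

def extrude(plan, bottom, top):
    """Extrude a 2-D plan between two depths, e.g. a floor plan into a room."""
    return [Layer(bottom, list(plan)), Layer(top, list(plan))]
```

For example, `extrude([(0, 0), (4, 0), (4, 3), (0, 3)], 0.0, 2.5)` yields a 4-by-3 room whose ceiling corners all sit at depth 2.5. A true 3-D model would instead store (x, y, z) for every vertex independently.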
Recently, we’ve seen some user interfaces that introduce a similar but even weaker variation: a limited third dimension. Microsoft’s new Vista operating system has so-called 3-D windows in which the third coordinate cannot be independently specified. Sun’s Project Looking Glass animates through a similarly non-independent third dimension. (Be sure to wait or fast forward to the demo in this link.) The same goes for Beryl on Linux and for the leafing technique made famous by the iPhone demo.
I view all of these techniques, in which multiple views of a 2D model are displayed with layering or animated transitions, as fractional dimensions. It is 2.25-D if it’s mere animation between fixed values, or 2.5-D if the third dimension is continuously (but not independently) variable. The value of these fractional techniques (besides just looking cool) is that they reinforce the illusion that you’re dealing with a manipulatable object.
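One way to read the 2.25-D versus 2.5-D distinction in code is sketched below, under stated assumptions. In the 2.25-D case, a window only ever rests in one of a few fixed depth slots (the `FIXED_DEPTHS` values are invented for illustration) and merely animates between them. In the 2.5-D case, a flat window rotated about its vertical axis gets a continuously varying depth, but every point’s depth is dictated by the one shared angle rather than chosen independently. Both function names are hypothetical.

```python
import math

FIXED_DEPTHS = [0.0, 1.0, 2.0]  # assumed stacking slots, for illustration only

def depth_2_25(start_slot, end_slot, t):
    """2.25-D: animate between two fixed depth values as t runs from 0 to 1."""
    a, b = FIXED_DEPTHS[start_slot], FIXED_DEPTHS[end_slot]
    return a + (b - a) * t

def window_depths_2_5(xs, angle):
    """2.5-D: rotate a flat window about its vertical axis.

    xs are horizontal offsets of points from the axis. Depth varies
    continuously with the angle, but every point's depth follows from
    that single shared angle; no point has an independent depth.
    """
    return [x * math.sin(angle) for x in xs]
```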
Animation brings us to another kind of dimension: time. A movie is fractional time: the 2D display varies over time, but each pixel is fixed by the playback clock and does not vary independently of it. A 2-D Web site is richer still if it is implemented in Flash and has different parts that respond to our actions in different and continuously variable ways. Let’s call these 2T-D. 3-D Machinima is 3T-D.
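The contrast between fractional time and 2T-D can be sketched as follows, assuming a hypothetical `movie_frame` function for playback and a hypothetical `InteractiveWidget` with a made-up damping factor. The movie’s output is a pure function of the clock; the widget’s output depends on the clock and on whatever the user has done along the way.

```python
def movie_frame(frames, t, fps=24):
    """Fractional time: what is shown depends only on the playback time t."""
    index = max(0, min(int(t * fps), len(frames) - 1))
    return frames[index]

class InteractiveWidget:
    """2T-D: state evolves continuously over time and responds to user actions."""

    def __init__(self):
        self.position = 0.0
        self.velocity = 0.0

    def on_drag(self, dx):
        # User input changes the dynamics, not just the current frame.
        self.velocity += dx

    def tick(self, dt):
        # Advance the simulation by dt seconds.
        self.position += self.velocity * dt
        self.velocity *= 0.9  # assumed damping so the motion eases out
```

Replaying the same movie at the same time always gives the same frame; two users who drag the widget differently see different positions at the same moment.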
The newest thing in user interfaces is to represent the users explicitly as avatars. Even in a single-user application, the A-D allows us to distinguish between ourselves and the applications. 2A-D slide shows with talking heads are simply more engaging. In a collaborative environment, an avatar lets us easily see who is doing what. Consider, for example, a conferencing application that doesn’t represent the users at all. It’s difficult or impossible to know who is in the conference, who is speaking, or who is manipulating the central presentation. Some 2-D conference applications do attempt to represent users with 1-D text, typically in a listing in the upper right. But these are not associated with actions, so it is hard for me to accept them as true avatars. I can decode the text and see who has entered the room, possibly remember who has left, and maybe notice that the name of the participant who controls the main display is in color, but that’s a lot of cognitive overhead. In Croquet, it is not just the fact that the avatars are 3-D that matters. Croquet is truly in the A-D because we can see (through their movement and their laser pointers) what they are doing without thinking about it. Would you rather look at a Web-site access log to see who is doing what, or would you rather have a seat at the mall and get an instant assessment of who is going in and out of each store?
Another non-spatial dimension is the degree to which an environment is generative. At one end of the G-D, we have bit-map sounds, images, and movies. The model is a sea of bits which must all be transmitted to the application. It can be quite difficult to extract a more meaningful model from these bits. (Where are the edges of the teapot? What is its volume or orientation?) The move to vector and other parametric formats has been driven primarily by bandwidth considerations: MIDI can be much smaller than an MP3, SVG much smaller than a JPEG, Flash much smaller than an MPEG. By increasing how much a model is parameterized over a predefined class, we achieve huge compression. But I feel the G-D’s real power for user interfaces comes when it is combined with high values in the other dimensions. Compared with a talking-head video (2A-D), generative 2AG-D or 3AG-D avatars are more flexible.
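The two ends of the G-D can be compared with a toy shape. The sketch below assumes a circle as the "predefined class": the bitmap end stores every pixel of a 64-by-64 raster, while the generative end stores four fields from which the raster can be regenerated on demand. All function names here are hypothetical. Note that the parametric model also answers the "meaningful model" questions directly (its area comes straight from the radius), whereas the bitmap can only approximate them by counting pixels.

```python
import math

def rasterize_circle(cx, cy, r, width, height):
    """Bitmap end of the G-D: every pixel must be stored and transmitted."""
    return [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0
             for x in range(width)]
            for y in range(height)]

def circle_model(cx, cy, r):
    """Generative end: a few parameters over a predefined class ('circle')."""
    return {"shape": "circle", "cx": cx, "cy": cy, "r": r}

def render(model, width, height):
    """Regenerate the bitmap from the parameters whenever it is needed."""
    return rasterize_circle(model["cx"], model["cy"], model["r"], width, height)

def circle_area(model):
    """Questions about the shape are answered from the parameters directly."""
    return math.pi * model["r"] ** 2
```

Here the raster holds 4,096 values while the model holds four fields, and the pixel count of the rendered circle only approximates `circle_area`.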
Croquet, of course, is all about C-D: coordination (or collaboration, or cooperation). Regardless of whether the model is 2D or 3D, what I do to the model ought to be immediately visible to anyone else using the same model. Several other systems do this, too, of course. Some of these are even 3A-D.
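One common way to get "what I do is immediately visible to everyone" is to keep a replica of the model at each participant and feed every replica the same operations in the same order. The sketch below is a deliberately tiny, hypothetical version of that idea: a `Router` that sequences operations and a `Replica` that applies them as key/value updates. It is not Croquet’s actual protocol, and it ignores latecomers (who would need a snapshot of the current model) and network delivery entirely.

```python
class Router:
    """Hypothetical sequencer: stamps each operation with a global order number."""

    def __init__(self):
        self.replicas = []
        self.seq = 0

    def join(self, replica):
        # Assumes replicas join before any operations are submitted.
        self.replicas.append(replica)

    def submit(self, op):
        self.seq += 1
        # Every participant receives the same operation in the same order.
        for replica in self.replicas:
            replica.apply(self.seq, op)

class Replica:
    """One participant's copy of the shared model."""

    def __init__(self):
        self.model = {}
        self.last_seq = 0

    def apply(self, seq, op):
        if seq != self.last_seq + 1:
            raise RuntimeError("operations must be applied in order")
        key, value = op
        self.model[key] = value
        self.last_seq = seq
```

Because each replica applies an identical, ordered stream of deterministic updates, all copies of the model stay equal without anyone shipping the whole model around.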
Well, that gives us the C, G, A, and T bases that form DNA. A fifth base, U, takes the place of T in RNA, but I can’t quite make that work. My fifth base for user interfaces is O: the Object-Dimension.
Although all of this is, in my opinion, done to make things easier to use by making them seem like real objects, it is certainly possible to get carried away implementing these techniques without consistently presenting specific and predictable objects to the user. BumpTop is a 2.25-D desktop that does begin to treat user interface elements as objects. But we can go much further. For example, consider the “undo” operation we often see in applications. The undo button or hot-key lets us revert the global state of the application one step at a time, or redo it going forward. But what if I want to select three problematic paragraphs and a sentence out of a document I’ve been working on, and undo only those? Surely each real object has its own history that can be manipulated independently of the history of other objects, or that can be merged with the histories of other specifically selected objects to form a group history. Now that’s object-oriented programming and an o-o user interface! And shouldn’t every single object be directly manipulatable, including the user interface itself? Indeed, I would like to select a group of objects and get a direct-manipulation timeline slider as an object. When I go back in time and do something, I want to see both possible futures from that point displayed as a railroad-track fork, and to be able to easily pass my slider over each path. This is thinking in the O-D!
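Per-object undo with forking futures can be sketched with one small history tree per object. In the sketch below, a hypothetical `History` class stores each state with a pointer to its parent, so undoing and then editing adds a sibling branch instead of discarding the old future; `branches` exposes the railroad-track fork, and `goto` is what a timeline slider might call. The paragraph strings are invented sample data, and merging several selected histories into one group timeline is not shown.

```python
class History:
    """Per-object history kept as a tree, so editing after an undo forks a branch."""

    def __init__(self, initial):
        self.nodes = [(None, initial)]  # list of (parent index, state)
        self.current = 0

    @property
    def state(self):
        return self.nodes[self.current][1]

    def edit(self, new_state):
        # The new state hangs off the current node; older futures stay reachable.
        self.nodes.append((self.current, new_state))
        self.current = len(self.nodes) - 1

    def undo(self):
        parent = self.nodes[self.current][0]
        if parent is not None:
            self.current = parent

    def branches(self, index=None):
        """Children of a node; more than one means the timeline forks there."""
        index = self.current if index is None else index
        return [i for i, (parent, _) in enumerate(self.nodes) if parent == index]

    def goto(self, index):
        # A timeline slider could jump to any node on any branch.
        self.current = index

def undo_selection(objects):
    """Undo only the selected objects, leaving every other object's history alone."""
    for obj in objects:
        obj.undo()
```

For example, with two paragraphs edited once each, undoing a selection containing only the first reverts that paragraph while the second keeps its latest text; editing the first again then leaves two futures hanging off its original state.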
O-D is kind of the glue that brings the others to life. For example, while Flash is 2T-D, Laszlo is 2TO-D.
So, Croquet is about a lot more than being 3-D or A-D or even C-D. It is, perhaps, unique in the extent to which it is also highly T-D, G-D, and O-D.