One of the general internal themes of Croquet is that everything ought to just work, and work well. Most practicing software developers aren't fortunate enough to create artifacts like this, because their software is aimed at a very specific problem. That tends to lead to tools of limited scope that don't interact well with anything else.
Consider sound. If you only want to make voice chat work, you can use a low fidelity encoding on a lossy transport. It will do what it does well, but only that. Now suppose you and someone else are watching a movie and discussing it, using separate programs for the movie and VoIP. Either program might work well, but use them together and everything is likely to go to hell.
To support Croquet's general distributed collaboration model for everything, we use a reliable delivery mechanism (TCP/IP) in which no data is lost, which also means that a delayed packet holds up everything behind it. To keep things practical in secure settings, we do all transport through a single encrypted port. And to contribute to the immersive 3D effect, all sound is spatialized to the speaker's location in the virtual world. That all makes it tough to get good sound.
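To make the spatialization part concrete, here is a minimal sketch in Python. It is not taken from the Croquet code; the function name, the flat-plane coordinates, and the inverse-distance rolloff are illustrative assumptions. The idea is just that each voice's gain falls off with distance from the listener, and its left/right balance follows the speaker's bearing.

    import math

    def stereo_gains(speaker_pos, listener_pos, listener_facing):
        """Return (left, right) gains for one remote speaker.

        Positions are (x, z) coordinates on the virtual-world floor plane;
        listener_facing is the listener's heading in radians (0 = along +z,
        positive x to the right). These conventions are assumptions for the
        sketch, not Croquet's.
        """
        dx = speaker_pos[0] - listener_pos[0]
        dz = speaker_pos[1] - listener_pos[1]
        distance = math.hypot(dx, dz)
        gain = 1.0 / max(1.0, distance)          # simple inverse-distance rolloff
        bearing = math.atan2(dx, dz) - listener_facing
        pan = math.sin(bearing)                  # -1 = hard left, +1 = hard right
        return gain * (1.0 - pan) / 2.0, gain * (1.0 + pan) / 2.0

    # A speaker a couple of meters away, off to the listener's right, comes out
    # quieter and mostly in the right channel.
    print(stereo_gains((2.0, 1.0), (0.0, 0.0), 0.0))

The actual sound libraries presumably handle full 3D positions and more careful attenuation curves; the point is only that every voice gets mixed according to where its owner stands in the world, which is exactly what makes the audio pipeline harder than plain voice chat.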
Well, we've been working on that. Our last release of the Qwaq Forums implementation has terrific headset sound over reasonably good Internet connections. In meetings of more than 20 people, with several 2D legacy applications running in-world, the sound is much better than a telephone or speakerphone, whether over a dedicated line or the Internet.
A lot of this was good old-fashioned hard engineering by Josh and Andreas and Brad. We use a well-designed, high-quality encoding. (E.g., we capture a lot of fidelity and compress the sound well. When there's no sound, we don't waste bandwidth.) We integrate a dynamically sized jitter buffer with the spatialized sound libraries. (E.g., suppose your network slows down. Instead of having gaps in the voice, we buffer everything. If the network gaps exceed the amount of buffering, we synthesize some sound to stretch the voice and increase the buffer size. As the network improves, we shrink the buffer size so that you don't have delays like you're talking to an astronaut on the moon.) We prioritize traffic through the Croquet router. (E.g., suppose no one is talking but there's a megabyte of other activity to send. If someone starts talking in the middle of that, the other stuff is interrupted in an appropriate way to keep the voice flowing.) We've tuned thread priorities and adjusted the operating-system interface so that we can control the speaker and mic devices directly. There's been a hell of a lot of testing and measuring and profiling. (Some of it done smartly and efficiently. Some of it incredibly boring and painful and wasteful. But overall quite necessary.)
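To give a feel for how that jitter buffer behaves, here is a simplified sketch in Python. It is not the Qwaq Forums code; the class name, thresholds, and frame sizes are all made up for illustration. The shape is the one described above: queue incoming voice frames, grow the buffer and synthesize stretched audio when the network falls behind, and shrink it again once the network recovers.

    from collections import deque

    class AdaptiveJitterBuffer:
        """Toy model of a dynamically sized jitter buffer."""

        def __init__(self, min_depth=2, max_depth=20):
            self.frames = deque()      # decoded voice frames awaiting playout
            self.target = min_depth    # how many frames we try to keep queued
            self.min_depth = min_depth
            self.max_depth = max_depth
            self.healthy_pulls = 0     # consecutive pulls with a comfortable queue

        def push(self, frame):
            """Called whenever a voice frame arrives from the network."""
            self.frames.append(frame)

        def pull(self):
            """Called by the audio output every frame period (e.g. every 20 ms)."""
            if not self.frames:
                # Underrun: the network gapped for longer than we had buffered.
                # Grow the target depth and hand back synthesized audio so the
                # voice is stretched rather than simply cutting out.
                self.target = min(self.target + 1, self.max_depth)
                self.healthy_pulls = 0
                return self._synthesize()
            if len(self.frames) > self.target:
                self.healthy_pulls += 1
                # The network has been healthy for a while: shrink the buffer
                # so the conversation doesn't stay astronaut-delayed after a
                # bad patch. Dropping one frame (ideally during silence) lets
                # playout catch up to the incoming stream.
                if self.healthy_pulls > 100 and self.target > self.min_depth:
                    self.target -= 1
                    self.healthy_pulls = 0
                    self.frames.popleft()
            return self.frames.popleft()

        def _synthesize(self):
            # Placeholder: a real implementation stretches recent audio rather
            # than inserting silence. One 20 ms frame of 16-bit, 16 kHz silence:
            return b"\x00" * 640

The real buffer has to coordinate with the decoder and the spatializer, and its grow-and-shrink heuristics are certainly more refined than a fixed pull count, but the trade-off is the same: a deeper buffer rides out network hiccups, a shallower one keeps the conversation immediate.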
There are limitations. We don't do echo cancellation, so you are better off using a headset (or at least the built-in push-to-talk) rather than an open mic. The system will tolerate a lot of problems with the Internet connection, including delivery variations, bandwidth throttling by your ISP, and even disconnections, but there are limits and nothing good will happen on a completely shitty connection.
It is amazing to me that we were able to make it work so well. I'm not terribly happy with Vonage, and I'm sure that I can tell when someone is using it. But then again, cell phones and landlines suck too, and maybe Vonage and Skype are good enough. If stand-alone consumer-grade VoIP works well enough on worse connections and hardware, then I would say that, given a setup good enough for virtual worlds, our sound is much better than dedicated VoIP. (Josh, Andreas and Brad are just flat out amazing.) It is starting to push into the territory of megabuck virtual-presence executive-conferencing systems. And given that it does work, it isn't at all surprising that it works well in conjunction with other stuff going on. That's the way it was designed.