Imagine we are in the NASA operations center, which is filled with people attending to different aspects of a space launch. The operations director checks in with the different domain specialists: “Communications?” “Go.” “Environment?” “Go.” “Transport?” “Negative! We have a problem.” Within that room, the specialists are focused on the complex applications and information in front of them. They all need to hear and speak to each other, and to see some common data on the big board at the front of the room. As the situation changes, some of the problematic applications are examined by related specialists who were not looking at them previously.
Now imagine a distributed virtual operations center.
The data needs of scores of applications could easily overload any one computer or network. It would be ridiculous to ask each user to turn the various applications on and off as some tense and costly situation evolves. Here’s our first trick. Teleplace does not ask each user to watch the same huge screen showing all the possible applications. Just like the physical operations center, the virtual space has clusters of specialty displays in various locations, and each user looks toward one of these, or stands back to see a group of two or more. Some users look only at the big board. For each user, Teleplace displays only the part of the scene that the user is looking at. When the user moves to some other position with a different point of view, Teleplace renders that new viewpoint. In our implementation, each user automatically subscribes to the data feed for each application in the space, but the system automatically and instantly pauses and resumes that user’s feed as the app leaves or enters the user’s viewing frustum. We run the application (e.g., on a server) as long as anyone is subscribed, so in application time it runs constantly. But each subscriber starts and stops time for their own view of the data: we pump data to a subscriber only when that subscriber is not paused. The user walks around – either by moving an avatar as in a game or by clicking on predefined locations – and the system delivers only the traffic that is actually needed for that location.
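The per-viewer pause and resume can be sketched as below. This is a minimal illustration, not Teleplace's code: it assumes a 2D layout and replaces the real 3D frustum test with a horizontal viewing cone, and the names `App`, `Viewer`, `recipients` and the 90 degree field of view are hypothetical. Each viewer stays subscribed to every app; only a per-viewer paused flag changes as the avatar turns.

```python
import math
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    x: float
    y: float


@dataclass
class Viewer:
    name: str
    x: float
    y: float
    heading: float                        # radians: the way the avatar is facing
    fov: float = math.radians(90)         # hypothetical horizontal field of view
    paused: set = field(default_factory=set)

    def sees(self, app: App) -> bool:
        # 2D stand-in for a frustum test: is the app within half the FOV of the heading?
        bearing = math.atan2(app.y - self.y, app.x - self.x)
        diff = (bearing - self.heading + math.pi) % (2 * math.pi) - math.pi
        return abs(diff) <= self.fov / 2

    def update_subscriptions(self, apps):
        # Stay subscribed to every app; only flip this viewer's paused flag.
        for app in apps:
            if self.sees(app):
                self.paused.discard(app.name)
            else:
                self.paused.add(app.name)


def recipients(app: App, viewers):
    # The app keeps running; each update goes only to viewers not paused on it.
    return [v.name for v in viewers if app.name not in v.paused]


apps = [App("big_board", 10, 0), App("telemetry", 0, 10)]
alice = Viewer("alice", 0, 0, heading=0.0)          # facing the big board
bob = Viewer("bob", 0, 0, heading=math.pi / 2)      # facing telemetry
for v in (alice, bob):
    v.update_subscriptions(apps)
print(recipients(apps[0], [alice, bob]))
```

Turning Alice toward the telemetry display and calling `update_subscriptions` again would pause her big-board feed and resume her telemetry feed, while the applications themselves never stop.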
There’s something very subtle here: the applications were not written to be in a three-dimensional operations center. They were not meant to be shared among many users, or to be started and stopped by an ever-changing subset of those users. Writing them to do so would have made them hopelessly complex, and combining them into one working system would have been a nightmare if different programmers had taken different approaches. There are network systems for sharing such apps, such as VNC and RDP, and Teleplace makes use of them, but these have no provisions for starting and stopping based on what is being rendered within an individual user’s 3D viewing frustum. The network protocols do not have the full context to do it all. The individual applications do not have the full context to do it all. Only the full application environment – the virtual operations center in Teleplace – has enough information to seamlessly pause and resume everything based on something as simple as which way a given user is facing.
To more fully exploit this kind of magic, we try to break things into smaller pieces and let the overall Teleplace environment recombine them appropriately. Consider a movie playing on one of these display panels. A user may look away at something else and so no longer need the video feed, but still want the audio to continue. However, if the user walks over to a conference area, they no longer want the sound from the movie outside. Simply embedding a conventional movie player in the scene would not achieve that. Instead, we have to break the video and audio into separate feeds. Like the app and display protocols above, a simple video player just doesn’t have enough context to know when to pause or resume the audio separately from the video. The full Teleplace environment does.
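Once audio and video are separate feeds, each can follow its own rule. The sketch below assumes, hypothetically, that the space is divided into named zones and that the environment already knows which sources are in a user's view; `MovieSource`, `UserView`, and `feed_states` are illustrative names, not Teleplace's API.

```python
from dataclasses import dataclass, field


@dataclass
class MovieSource:
    name: str
    zone: str                                   # hypothetical: area the screen sits in


@dataclass
class UserView:
    zone: str                                   # area the avatar currently stands in
    visible: set = field(default_factory=set)   # sources inside the user's view


def feed_states(user: UserView, source: MovieSource) -> dict:
    """Decide the video and audio feeds independently (True = flowing, False = paused)."""
    return {
        "video": source.name in user.visible,   # video only while the screen is in view
        "audio": user.zone == source.zone,      # audio while in the same area
    }


movie = MovieSource("launch_replay", "lobby")
print(feed_states(UserView("lobby", set()), movie))       # looking away, still listening
print(feed_states(UserView("conference", set()), movie))  # walked into a meeting
```

A single combined movie stream could only offer "both on" or "both off"; splitting it lets the environment produce the "audio only" state.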
Pausing and resuming data streams saves a lot of network and computer power, but even if it reduces such loads to a tenth, a naive implementation still cannot handle everything in our operations center, or even in simpler conference room scenarios. We know, because we tried. Our solution is to start and stop the flow of data at a finer scale. One part of this is to stop and start dynamically based on the overall latency of the system. We have done this for webcam frame rates and for the resolution of mouse movement updates since the first versions of the system. The key is that we look at overall latency from all traffic, not just webcam or mouse movement traffic.
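One way to picture latency-driven throttling is a single governor fed by latency samples from every kind of message. This is a hedged sketch: the exponential moving average, the 150 ms budget, the 15 fps ceiling, and the proportional back-off are all assumptions for illustration, not the policy Teleplace uses.

```python
class LatencyGovernor:
    """Scale optional traffic by the smoothed latency of *all* traffic (a sketch)."""

    def __init__(self, target_ms=150.0, alpha=0.2, max_fps=15, min_fps=1):
        self.target_ms = target_ms      # hypothetical latency budget
        self.alpha = alpha              # EWMA smoothing factor
        self.max_fps = max_fps
        self.min_fps = min_fps
        self.smoothed_ms = None

    def observe(self, latency_ms):
        # Audio, app updates, video... every message feeds the same estimate.
        if self.smoothed_ms is None:
            self.smoothed_ms = latency_ms
        else:
            self.smoothed_ms += self.alpha * (latency_ms - self.smoothed_ms)

    def over_budget(self):
        return self.smoothed_ms is not None and self.smoothed_ms > self.target_ms

    def webcam_fps(self):
        if not self.over_budget():
            return self.max_fps
        # Back off in proportion to how far over budget the whole system is.
        return max(self.min_fps, int(self.max_fps * self.target_ms / self.smoothed_ms))

    def mouse_interval_ms(self, base_ms=20):
        # Send pointer updates less often (coarser resolution) as latency grows.
        if not self.over_budget():
            return base_ms
        return int(base_ms * self.smoothed_ms / self.target_ms)


gov = LatencyGovernor()
for sample in (100, 600):       # latencies measured on any traffic, in ms
    gov.observe(sample)
print(gov.webcam_fps(), gov.mouse_interval_ms())
```

Because the estimate is shared, a burst of screen-sharing traffic slows the webcam too, which is the point of measuring all traffic rather than each stream on its own.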
We also interleave data in a special way. For example, audio must be sent in short bursts, because if we buffer too much, there will be delays in the conversation. But with very little buffering, we have no slack for transient network delays. It drives people absolutely nuts when they are listening to someone talk and there’s a little gap or stutter in the voice while the network hiccups. There are techniques for alleviating this, such as dynamic jitter buffers that build up a longer delay when the network is bad, yet deliver sound more immediately when network timing doesn’t vary much. Teleplace does that. But no matter how much you buffer, you can always get caught with no audio data to play. You can extrapolate the last sound a bit to stretch out the utterance, and if this is short enough and infrequent enough, it works pretty well. We do that, too.
If audio is all you are doing, you can use protocols like UDP that may fill your pipe better. The problem comes when we are also doing other things. Video, for example, is encoded as large image frames called keyframes, with smaller difference frames in between. Transmitting a keyframe is much like transmitting a JPEG image. We do not want people to wait for the next tiny bit of audio while a large keyframe image downloads. Some streaming video implementations shift to a different image quality to meet changing network conditions, but that is not acceptable for people doing high-resolution work, and it doesn’t solve the problem of avoiding audio glitches. It actually makes the glitches worse as the system switches resolution. Real office networks have terrible variations. Wi-Fi is worse. Mobile networks are worse still. How can we possibly deal with that?
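The dynamic jitter buffer and the "stretch the last sound" fallback described above can be sketched together. The jitter estimate uses the interarrival-jitter formula from RFC 3550 (J += (|D| - J) / 16); the 20 ms frame size, the "cover about 3x the jitter" depth policy, and repeating the previous frame at half volume as concealment are assumptions for illustration, not Teleplace's actual algorithm.

```python
from collections import deque


class JitterBuffer:
    """Adaptive playout buffer for fixed-length audio frames (a sketch)."""

    def __init__(self, frame_ms=20, min_frames=1, max_frames=10):
        self.frame_ms = frame_ms          # hypothetical frame duration
        self.min_frames = min_frames
        self.max_frames = max_frames
        self.jitter_ms = 0.0
        self.last_transit = None
        self.frames = deque()
        self.last_frame = None

    def on_arrival(self, frame, sent_ms, arrived_ms):
        # RFC 3550-style interarrival jitter: J += (|D| - J) / 16
        transit = arrived_ms - sent_ms
        if self.last_transit is not None:
            d = abs(transit - self.last_transit)
            self.jitter_ms += (d - self.jitter_ms) / 16
        self.last_transit = transit
        self.frames.append(frame)

    def target_frames(self):
        # Hypothetical policy: hold enough audio to cover about 3x the current jitter.
        wanted = 1 + int(3 * self.jitter_ms / self.frame_ms)
        return max(self.min_frames, min(self.max_frames, wanted))

    def can_start_playout(self):
        # Shallow buffer on a steady network, deeper one on a jittery network.
        return len(self.frames) >= self.target_frames()

    def play(self):
        """Return the next frame; on underrun, repeat the last frame at half volume."""
        if self.frames:
            self.last_frame = self.frames.popleft()
            return self.last_frame
        if self.last_frame is None:
            return None                   # nothing to extrapolate from yet
        self.last_frame = [int(s / 2) for s in self.last_frame]
        return self.last_frame
```

Repeated underruns fade the repeated frame toward silence, so a short gap is stretched over rather than heard as a click, while a long outage still goes quiet.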
Our answer is to slice time even more finely, individually for each kind of data. Instead of handing the large keyframe to the operating system socket for transmission, we break it into smaller data packets and keep them in a list. We give the head of that list to the socket only when the socket is ready for more data. When audio data is ready to be transmitted, we put it ahead of the video data. Meanwhile, on the other end, each participant plays audio as soon as it arrives, even if the corresponding video frame has not arrived yet. People perceive discontinuities in time differently for sound and sight: we tolerate a tiny freeze or out-of-sync behavior in video, but not an audio glitch.
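The keyframe slicing can be sketched with two queues. In a real sender, `next_packet` would be called only when the socket reports it is writable (for example via Python's `selectors` module with `EVENT_WRITE`); here that loop is left out so the sketch stays self-contained. The 1200-byte chunk size and the `Sender` class are hypothetical.

```python
from collections import deque

CHUNK = 1200  # hypothetical payload size, chosen to stay under a 1500-byte Ethernet MTU


class Sender:
    def __init__(self):
        self.audio = deque()
        self.video = deque()

    def queue_keyframe(self, frame: bytes):
        # Split a large keyframe into small packets instead of handing it to the socket whole.
        for i in range(0, len(frame), CHUNK):
            self.video.append(("video", frame[i:i + CHUNK]))

    def queue_audio(self, samples: bytes):
        self.audio.append(("audio", samples))

    def next_packet(self):
        """Call only when the socket is ready for more data; audio always goes first."""
        if self.audio:
            return self.audio.popleft()
        if self.video:
            return self.video.popleft()
        return None


s = Sender()
s.queue_keyframe(bytes(5000))    # five packets: four of 1200 bytes, one of 200
s.next_packet()                  # first keyframe slice goes out
s.queue_audio(b"\x01\x02")       # audio arrives mid-keyframe...
print(s.next_packet()[0])        # ...and is sent before the remaining slices
```

Since at most one small slice is ever in the socket ahead of new audio, the audio delay is bounded by one chunk rather than by the whole keyframe.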
Teleplace manipulates time not just for the sight and sound of video, but for all data within our system. Audio, video imagery, application updates, inputs to applications, and general coordination activity all go into the same socket packet list, but all have different priorities. High priority data always goes through the pipe ahead of lower priority data. Note that this could not be done by someone using one audio program that transmits sound as fast as possible, and a separate program for screen sharing. There needs to be one application-level packet list for the prioritization to be meaningful, or else large data from one program can randomly squeeze out high-priority data from another.
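The single prioritized packet list can be sketched with a heap. The priority table below is an assumption: the text only says audio goes ahead of video, so the placement of inputs, coordination, and app updates is illustrative. A running sequence number keeps packets of equal priority in first-in, first-out order.

```python
import heapq
import itertools

# Hypothetical priority table: lower number = sent sooner.
PRIORITY = {"audio": 0, "input": 1, "coordination": 2, "app_update": 3, "video": 4}


class PacketList:
    """One shared outgoing list for every stream in the session (a sketch)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # preserves FIFO order within a priority

    def push(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def pop(self):
        if not self._heap:
            return None
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = PacketList()
q.push("video", b"v1")
q.push("app_update", b"u1")
q.push("audio", b"a1")
print(q.pop())   # the audio packet, even though it was queued last
```

This only works because every stream shares one list; two independent programs each with their own list would have no way to order packets against each other.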
This exploitation of application context for better network data flow is an instance of what is called the “End to End” principle. Before starting the work that became Croquet and Teleplace, David Reed had described the circumstances in which one can save overall time by moving protocols into the application end-points rather than relying on “optimizations” in individual lower-level infrastructure components. Josh Gargus developed these ideas over several years, with heavy midwifing by Andreas Raab. Instead of having every user connect directly to every other user, which makes the number of connections grow quadratically with the number of users, each computer makes just one persistent connection per session to a distribution node at the server, which acts and scales like a linear broadcast network. Each different piece of an application or media resource becomes a logical data stream carried over this common “dial-tone” of an overlay network, with all data streams feeding just one socket packet list. This gives us several other benefits:
- In addition to the coarse time magic (on and off) and the fine-grained time magic (sliced and prioritized), we also make the best use of the network socket by combining small frames into larger ones when data is available, even if they come from different “applications” within the whole scene. Thus, when possible, each network packet is neither so large that it gets fragmented nor so small that per-packet overhead wastes bandwidth. This is yet another application of End to End: separate programs sending to separate addresses could not do this, but we can.
- This combined traffic is encrypted at the connection, so we have a simple and verifiable security model. It works with or without a Virtual Private Network (VPN). The single connection also simplifies proxy and firewall management, compared with, e.g., asking a user’s IT department to open different ports and protocols so that each application can make its own connections.
- Within Teleplace, queuing app-tagged data in both directions lets us set up separate logical endpoints for each application data stream, and these can all operate without blocking on each other, making the overall application much more efficient.
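The shared connection described above carries many logical streams. A full mesh of n users needs n(n-1)/2 connections (45 for ten users), while a hub needs only n (ten). The sketch below shows the framing side: frames tagged with a stream id are coalesced into packets up to a size budget, and the receiver splits them back into per-stream inboxes that can be drained independently. The 4-byte header layout (`!HH`: stream id, payload length), the 1400-byte budget, and the function names are hypothetical; encryption is not shown. A single frame larger than the budget still goes out alone, which is why large payloads are sliced first, as in the keyframe example.

```python
import struct
from collections import defaultdict, deque

HEADER = struct.Struct("!HH")   # hypothetical frame header: stream id, payload length
MAX_PACKET = 1400               # hypothetical packet payload budget


def coalesce(frames, max_packet=MAX_PACKET):
    """Pack (stream_id, payload) frames into as few packets as fit under max_packet bytes."""
    packets, current = [], bytearray()
    for stream_id, payload in frames:
        frame = HEADER.pack(stream_id, len(payload)) + payload
        if current and len(current) + len(frame) > max_packet:
            packets.append(bytes(current))   # packet is full: start a new one
            current = bytearray()
        current += frame
    if current:
        packets.append(bytes(current))
    return packets


def demultiplex(packet, inboxes):
    """Split a received packet back into frames and queue each on its stream's inbox."""
    offset = 0
    while offset < len(packet):
        stream_id, length = HEADER.unpack_from(packet, offset)
        offset += HEADER.size
        inboxes[stream_id].append(packet[offset:offset + length])
        offset += length


frames = [(1, b"a" * 10), (2, b"b" * 20), (1, b"c" * 5)]   # two logical streams
inboxes = defaultdict(deque)                                # one endpoint per stream
for packet in coalesce(frames):
    demultiplex(packet, inboxes)
print({sid: list(q) for sid, q in inboxes.items()})
```

Here three small frames from two streams travel as one 47-byte packet, and each stream's inbox receives only its own frames, so a slow consumer on one stream does not block the others.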