Video Conferencing, Telepresence & Reality
Peter Cochrane, Mike Mathews, Keith Cameron, David McCartney
BT Research Laboratories, Martlesham Heath, Ipswich, England

Abstract
The present development of videoconferencing and video telephony is conditioned by the technologies of the past, in which bandwidth and distance were expensive and television formed the basis of the display techniques. We challenge this view on the basis that the planet is now dominated by optical fibre transmission, which is very low cost and rapidly getting cheaper. We further propose that bandwidth should be used (wasted?) to simplify terminal equipment and provide new display and interactive environments. In our view we should now turn our attention to humanising the interfaces and adding realism by using more bandwidth, rather than continuing on the accepted trajectory of higher levels of compression and coding. We present a view of the facilities that could result from the development of current and emerging technologies.

Introduction
Throughout the development of the video telephone, videoconferencing and telepresence systems, a single assumption has prevailed. This can generally be classified as the "copper mindset", in which bandwidth and distance are assumed to be expensive commodities both today and in the future. The majority of effort has therefore been expended on signal compression and coding for networks that are presumed to be of poor performance, restricted bandwidth, low utility and high price. In reality, the advances made in optical fibre transmission and wide band switching over the same period are largely negating these constraints, and we can look forward to switched, wide band services at a low price before the end of the century. Extreme coding schemes that reduce full PAL TV 625-line systems (or similar) down to 2 Mbit/s or 384 kbit/s (or less) will then become unnecessary.
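To give a feel for how extreme such coding is, the short sketch below works out the reduction needed to fit a 625-line picture into the two channel rates quoted above. The uncompressed figure of 216 Mbit/s is an assumption that does not appear in this paper: it is the rate of 4:2:2 component video at 8 bits per sample under ITU-R BT.601. The function name is purely illustrative.

```python
# Assumption (not from the paper): uncompressed digital 625-line video,
# sampled per ITU-R BT.601 (4:2:2, 8 bits per sample), runs at 216 Mbit/s.
UNCOMPRESSED_BIT_RATE = 216_000_000  # bit/s


def compression_ratio(target_bit_rate, source_bit_rate=UNCOMPRESSED_BIT_RATE):
    """Return how many times the source stream must be reduced to fit the target rate."""
    if target_bit_rate <= 0:
        raise ValueError("target_bit_rate must be positive")
    return source_bit_rate / target_bit_rate


if __name__ == "__main__":
    # Nominal channel rates taken from the text.
    for label, rate in [("2 Mbit/s", 2_000_000), ("384 kbit/s", 384_000)]:
        print(f"{label}: roughly {compression_ratio(rate):.1f} to 1")
```

Under this assumption the coder has to discard all but about one part in a hundred of the original signal at 2 Mbit/s, and considerably more at 384 kbit/s.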

The focus of development effort on coding and signal compression has led to an almost total neglect of user requirements and the user interface. Whilst developers see standard TV cameras and screens, with straightforward acoustic coupling between locations, as fit for purpose, users generally find them lacking and inadequate. From an engineering viewpoint, it is easy to become very enthusiastic about the advances made in compression coding and about the detailed phenomena relating to screen displays. However, to the user, these issues can appear distracting and limiting in terms of effective communication. In this paper we approach the topic of videoconferencing and telepresence from the stance of the user and what is required, and then go on to elucidate what is actually possible with modern terminal technology and telecommunications based on optical fibre, in contrast to the limitations of the rapidly disappearing copper and radio networks.

What Is Wrong?
Existing videoconferencing and video telephone technology presents us with images of another human that are the wrong size and the wrong colour, and that generally become blurred, jerky and distorted with movement. The images lack good synchronisation between speech and lip movement, have a voice that emanates not from the lips but from a loudspeaker to the side, do not permit eye contact or body language, and do not approach the illusion of "being there" from either the speaker's or the recipient's point of view. Moreover, in videoconferencing, these limitations are compounded by the need for more than one screen and the lack of any shared workspace. Also, everyone appears to look over each other's heads. All of this adds up to an unnatural and somewhat sterile workplace, to which it is difficult to become acclimatised unless all of the people in a communications session have previously been acquainted in real life.

These shortcomings are compounded by equipment that is expensive relative to that readily available on the domestic entertainment market, and by the general unavailability of a dial-up circuit under customer control, with the consequent need to request service from a telco on each occasion. Videoconferencing may therefore be seen to fall somewhat short of users' expectations. Another disconcerting feature is the "self view", generally presented on an auxiliary screen so that you can make sure you sit square on, and in frame, with the static terminal camera.

The video telephone has even more problems than videoconferencing, with drastic signal compression, very small picture size and gross distortion in all respects. What is not clear is what the users will put up with. But it is perhaps not unreasonable to suppose that they may well expect to see at least a PAL TV 'living room standard' presentation!

The "Virtual" Conference Room
Studying human activity in a real (single location) conference facility reveals a number of important requirements for maximising the usefulness and chances of success of any meeting. In order to maximise the effectiveness of videoconferencing services, it would therefore appear sensible to mimic this environment as closely as possible to provide a facsimile representation - a "virtual" conference room - and, as far as possible, to humanise the interfaces and workspace. Specifically, the following developments seem both logical and necessary:

Video Window
Video window experiments and developments have been conducted over the last 15 years and have now arrived at a point where all the components are available as commercial products. The principle of operation is to present human beings in real size representation on a high definition projection TV screen, as shown in Fig 1. By suitably arranging furniture and decor, the illusion of a continuous room or meeting-place can be created. Moreover, using electronic processing and steerable microphones, it is possible to focus the acoustics on any one speaker and arrange for the speaker's voice to emanate from the appropriate part of the image. With the camera mounted in front of the screen, it is difficult to achieve the illusion of eye contact, and ideally a behind-the-screen camera is required. One method of achieving this is shown in Fig 2, where a liquid crystal shutter of wall size is used to present a full size room image whilst at the same time allowing behind-the-screen cameras to operate unseen.
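The paper does not say how the microphones are steered. One common technique, offered here only as an illustrative assumption, is delay-and-sum beamforming: each microphone signal is delayed so that sound from the chosen talker lines up across the array before the signals are added. The sketch below computes those alignment delays; the function name, the array geometry and the speed of sound value are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def steering_delays(mic_positions, talker_position):
    """Return per-microphone delays in seconds that time-align sound from the talker.

    Positions are (x, y) coordinates in metres. Summing the delayed signals
    then reinforces the chosen talker relative to sound from other directions.
    """
    distances = [math.dist(mic, talker_position) for mic in mic_positions]
    farthest = max(distances)
    # The farthest microphone hears the talker last, so it needs no delay;
    # nearer microphones are held back by the difference in travel time.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]


if __name__ == "__main__":
    mics = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]   # hypothetical 3-element array
    talker = (0.0, 2.0)                            # hypothetical seat position
    for mic, delay in zip(mics, steering_delays(mics, talker)):
        print(mic, f"{delay * 1e3:.3f} ms")
```

The same talker position could, in principle, also be used to choose which loudspeaker (or which part of the screen) the voice is reproduced from at the far end.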

Electronic White Board
The white board or flip-chart is now an integral part of any actively-used conference room and a facility that is usually lacking, or poorly realised, in the teleconference environment. In an ideal system it would be a natural expectation for any of the members of the meeting, in any of the dispersed locations, to be able to walk over to a white board and start writing or drawing whilst the rest observe. This poses some significant difficulties in the videoconferencing environment. The answer to this problem is the electronic white board, which allows people located at remote sites to interact as if they were sharing the same board (Fig 3). Writing and drawing on one electronic white board appear on all other boards to which it is linked. Erasure by any of the users of linked electronic white boards is also possible. In one implementation [1], an image of the human being at the distant end is superimposed on the screen to increase the sense of personal interaction.
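The linked-board behaviour described above can be sketched as a small model in which every drawing or erasure made at one site is relayed to all boards in the same session. This is a minimal illustration only: the class names, the stroke identifiers and the central relay are hypothetical and are not taken from any of the systems cited.

```python
class WhiteBoardSession:
    """Hypothetical hub that relays every change to all linked boards."""

    def __init__(self):
        self.boards = []

    def join(self, site):
        board = WhiteBoard(site, self)
        if self.boards:
            # A late joiner starts with a copy of what is already on the boards.
            existing = self.boards[0].strokes
            board.strokes = {sid: list(pts) for sid, pts in existing.items()}
        self.boards.append(board)
        return board

    def relay_draw(self, stroke_id, points):
        for board in self.boards:
            board.strokes[stroke_id] = list(points)

    def relay_erase(self, stroke_id):
        for board in self.boards:
            board.strokes.pop(stroke_id, None)


class WhiteBoard:
    """One board at one site; its content mirrors every other linked board."""

    def __init__(self, site, session):
        self.site = site
        self.session = session
        self.strokes = {}  # stroke id -> list of (x, y) points

    def draw(self, stroke_id, points):
        self.session.relay_draw(stroke_id, points)

    def erase(self, stroke_id):
        # Any user may erase any stroke, whoever drew it.
        self.session.relay_erase(stroke_id)


if __name__ == "__main__":
    session = WhiteBoardSession()
    site_a = session.join("site A")
    site_b = session.join("site B")
    site_a.draw("s1", [(0, 0), (10, 10)])
    site_b.erase("s1")
    print(site_a.strokes, site_b.strokes)
```

A real system would also have to deal with transmission delay and with two users changing the same stroke at once, which this sketch ignores.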

Instant Fax
The ability to pass documents across the table to each member in a meeting is not only an established practice but an essential one for effective communication. In the tele-environment this may be realised in the manner shown in Fig 4 with a wide band fax able to relay details effectively across the table instantaneously. The bandwidth requirement for this facility is only 240 kbit/s. A similar facility for business cards might also be anticipated and could similarly be realised in an instantaneous fax form requiring only 2.4 kbit/s.
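How quickly a page crosses the "table" depends on the scanning resolution and the degree of compression as well as the link rate. The sketch below relates these quantities. Only the 240 kbit/s rate comes from the text; the page size, the 8 dots/mm bilevel scan and the 10:1 compression ratio are assumptions chosen purely for illustration.

```python
PAGE_LINK_BIT_RATE = 240_000  # bit/s, the rate quoted in the text


def transfer_seconds(width_mm, height_mm, dots_per_mm, compression_ratio,
                     link_bit_rate, bits_per_dot=1):
    """Estimate the time to send one scanned document over a link."""
    if min(width_mm, height_mm, dots_per_mm, compression_ratio, link_bit_rate) <= 0:
        raise ValueError("all sizes, ratios and rates must be positive")
    columns = round(width_mm * dots_per_mm)
    rows = round(height_mm * dots_per_mm)
    raw_bits = columns * rows * bits_per_dot
    return raw_bits / compression_ratio / link_bit_rate


if __name__ == "__main__":
    # Assumed: A4 page (210 x 297 mm), bilevel scan at 8 dots/mm, 10:1 compression.
    seconds = transfer_seconds(210, 297, 8, 10, PAGE_LINK_BIT_RATE)
    print(f"A4 page: about {seconds:.1f} s")
```

Changing the assumed resolution or compression ratio changes the result in direct proportion, so the figures should be read as an order-of-magnitude guide rather than a specification.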

Computer Workspace
Whilst multimedia developments seek to place a small picture of the human being in a corner of the screen and have the PC screen dominated by computer data, a more realistic approach might be to reverse the process. What is actually required is real-size human beings and small size (actual computer screen size?) computer generated data. Switching between people and data is one obvious solution, but a subliminal overlay of one on the other, with an increasing density of representation for any person speaking, would also be possible and would probably enhance the process. Alternatively, a third large screen (video window plus active white board) solely for the representation of computer data and diagrams could be realised as shown in Fig 5.
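One way to picture the overlay idea is as a blend of the person image with the data image, in which the weight given to a person rises while they are speaking and falls back when they stop. The sketch below is an assumption about how this might be done, not a description of a built system; the function names, step size and opacity limits are hypothetical.

```python
def blend_pixel(person, data, person_opacity):
    """Mix one greyscale person pixel over one data pixel (values 0 to 255)."""
    if not 0.0 <= person_opacity <= 1.0:
        raise ValueError("person_opacity must lie between 0 and 1")
    return round(person_opacity * person + (1.0 - person_opacity) * data)


def ramp_opacity(current, speaking, step=0.1, floor=0.2, ceiling=0.9):
    """Raise the opacity towards the ceiling while speaking, else fade to the floor."""
    if speaking:
        return min(ceiling, current + step)
    return max(floor, current - step)


if __name__ == "__main__":
    opacity = 0.2
    for frame, speaking in enumerate([True, True, True, False, False]):
        opacity = ramp_opacity(opacity, speaking)
        print(frame, round(opacity, 2), blend_pixel(200, 100, opacity))
```

Applying `ramp_opacity` once per frame for each participant gives a gradual change rather than an abrupt switch between people and data.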

A further alternative for computer data display and manipulation involves the use of a large screen with a direct hand interface. Instead of a mouse or keyboard, the hand (or a wand) allows the user to activate and manipulate displayed data. The principle of operation is shown in Fig 6 in a rudimentary realisation. Further levels of sophistication include a direct written input to text using a stylus, with the prospect of direct voice activation within a very few years. Again, all parties in the diverse locations can share the workspace and participate.

Three Dimensional Presentation
3D imaging systems offer additional impact, extra information and an improvement in realism compared with conventional 2D systems. We are already seeing real time 3D imaging systems finding uses in high value applications where it is essential to obtain depth information, particularly for defence applications, in subsea systems and in the remote handling and hazardous waste industries. As we move into the 21st Century we can expect these high value 3D services to permeate through into the CAD and CAM arena. Architects, car constructors and aeronautical engineers have all indicated a need to 'walk through' new designs of buildings, cars and aeroplanes to experience the novelty and to examine the design merits. 3D imaging systems offer an attractive solution. Further in the future we might anticipate the acceptance of 3D for applications in video telephone and videoconferencing systems.

Previous attempts to produce commercial, cinema based, 3D imaging systems in the 1950s and 60s failed for a variety of reasons, one of which was the requirement to use spectacles. In recent years new 3D display techniques have evolved, including spectacles-free (autostereoscopic) systems which are better suited to video telephone and videoconferencing applications. One such technique is based on the use of a lenticular projection screen consisting of a linear array of cylindrical lenses which separate out the left- and right-eye views of the image. A large (2.5 m wide) screen video system has been demonstrated using this technique [2], which indicates its potential for application in videoconferencing.
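The principle of sharing one screen between two views can be illustrated by interleaving the columns of the left-eye and right-eye pictures, so that each cylindrical lens sits over one column from each view and directs them towards different eyes. This is a simplified sketch of the general idea and is not a description of the projection system in [2]; the function name and the left-then-right ordering under each lens are assumptions.

```python
def interleave_views(left, right):
    """Column-interleave two equal-sized images given as lists of pixel rows.

    The result is twice as wide: columns alternate left view, right view,
    so that each lens of the array covers one column from each view.
    """
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("both views must have the same dimensions")
    combined = []
    for left_row, right_row in zip(left, right):
        row = []
        for left_pixel, right_pixel in zip(left_row, right_row):
            row.extend((left_pixel, right_pixel))
        combined.append(row)
    return combined


if __name__ == "__main__":
    left_view = [["L0", "L1"], ["L2", "L3"]]
    right_view = [["R0", "R1"], ["R2", "R3"]]
    for row in interleave_views(left_view, right_view):
        print(row)
```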

Optical Illusions
An alternative approach to enhancing the realism of transmitted images is the use of optical illusion to cheat the eye. One such illusion is based on the simple optical system shown in Fig 7. The image on a TV monitor is focussed by a large concave reflector to form an image in space which has a three dimensional quality. This quasi-3D technique has been commercially exploited by one manufacturer of arcade computer games, but it may also find niche applications in visual telecommunications.

Holographic Projection
To date the most realistic 3D images have been of still scenes using either lenticular based or holographically based systems. Until recently the prospect of creating a moving holographic display has proved elusive. However, work at the Media Lab at the Massachusetts Institute of Technology (MIT) [3] has shown the feasibility of such an approach. The MIT work, still in the early stages of development, requires an acousto-optic crystal in which to create the hologram, and a large computer and custom made hardware to produce the effect. Initial images were monochromatic and small (30 x 30 x 50 mm) and were of wireframe models. Recent developments have extended this to shaded images, in colour and of a larger size. The field of view remains a little limited, but this is an impressive piece of work and has demonstrated, for the first time, that holographic video is possible. There is still a long way to go (possibly 20 years or more) before the technology becomes a realistic option for use in applications such as videoconferencing.

Telepresence
The desire of people "to be there" reaches its ultimate expression in the science fiction concept of teleportation in which people are moved between remote locations by the disassembly and reassembly of their bodies at the molecular level. Although true teleportation will remain the domain of science fiction for the foreseeable future, technology is already available to create a sense of "being there" - so called telepresence.

A sector of telepresence that is fundamentally simple but highly effective is based on mounting a miniature TV camera (or cameras for stereoscopic vision) on a conventional audio headset of the form shown in Fig 8.

A remote operator experiences the illusion of "being there" by effectively "sitting on the shoulder" of the human platform. This technology can be used to address the often experienced problem which is summed up by the comment "if only I could see what you can see - then I might be able to help you". In all manner of service, maintenance, engineering and other human activities, the need to "be there" is prevalent. The availability of micro-miniature TV cameras, coupled with modern day telecommunication facilities, now makes this a feasible proposition, even for the most mundane of tasks and the most remote of locations.

The addition of a head-up display and pointing system also allows the wearer and/or remote user to target particular objects with accuracy, while at the same time the remote viewer can send graphical indications and data in support of their joint vision and interaction. Voice communication is facilitated by the same headset to combine vision and dialogue. Potential applications of this technology include remote maintenance, equipment installation, telemedicine, surveillance and news gathering.

Virtual Reality
Virtual Reality (VR) is a rapidly evolving technology in which the user, as in telepresence, experiences a sense of "being there", but this time in a computer generated virtual (synthetic) world rather than the real world.

The user is able to move within the virtual world and interact with it. The technology is expected to have applications in a wide range of activities including education and training, computer aided design and manufacturing (CAD-CAM), surgery and, most immediately, entertainment through the electronic games industry.

VR systems can be either desktop or immersive types. Desktop VR uses a conventional display monitor and the operator interacts with the computer generated images using a mouse or more sophisticated interface devices. Immersive VR requires the user to wear a headset in which miniature displays project slightly different views of the virtual world onto each eye, thereby creating a 3D image. Physical interaction with the computer generated images can be achieved using gloves or shadow systems that detect the movement of hands and fingers. More sophisticated means of physical interaction can be expected in the future, including full body sensing.
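The "slightly different views" presented to each eye can be illustrated with a simple pinhole projection from two viewpoints separated horizontally. The difference between the two projected positions (the disparity) shrinks as an object moves further away, which is what conveys depth. This is a minimal sketch under stated assumptions: the 65 mm eye separation, the unit focal length and the function names are illustrative and are not taken from any particular headset.

```python
def project_for_eye(point, eye_x, focal_length=1.0):
    """Project a 3D point (x, y, z in metres, z pointing away) for an eye at (eye_x, 0, 0)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the viewer (z > 0)")
    return (focal_length * (x - eye_x) / z, focal_length * y / z)


def stereo_pair(point, eye_separation=0.065, focal_length=1.0):
    """Return the (left, right) image positions of a point for two eyes."""
    half = eye_separation / 2.0
    left = project_for_eye(point, -half, focal_length)
    right = project_for_eye(point, +half, focal_length)
    return left, right


def disparity(point, eye_separation=0.065, focal_length=1.0):
    """Horizontal offset between the left and right views of the same point."""
    left, right = stereo_pair(point, eye_separation, focal_length)
    return left[0] - right[0]


if __name__ == "__main__":
    for depth in (0.5, 2.0, 10.0):
        print(f"depth {depth} m: disparity {disparity((0.0, 0.0, depth)):.4f}")
```

For a point straight ahead the disparity reduces to focal length times eye separation divided by depth, so distant objects look almost identical in the two views.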

The use of an extended version of immersive VR, based on combining real images of people with computer generated images of a "virtual" conference room, may at first sight appear to offer the ultimate form of videoconferencing. However, the requirement to use headsets will create an unnaturalness in human interactions, particularly because they inhibit eye contact, which may offset the advantages gained from the use of VR. More thought needs to be given to such issues, and to how VR technology is likely to evolve in the future, before the role of VR in videoconferencing can be fully assessed. However, suitably lightweight and miniature elements such as active spectacles and contact lenses are already being investigated and may be within five years of practical realisation.

Conclusion
As we move into the 21st Century, today's videoconferencing and video telephones will look very quaint and functionally inadequate. New display and projection systems are expected to enable us to create a new sense of realism. Attaining the feeling that "you are there", teleported to a new location with humanistic interfaces and facilities, should be our ultimate objective. The key benefits of these new technologies will include:

  • A global reduction in the need for physical transport and travel.
  • More efficient and reduced use of raw materials and energy.
  • More efficient operations for companies and organisations.
  • More efficient and effective work practices.
  • More productive and less stressful lifestyles.

All of this is now possible through the removal of the capacity bottleneck imposed by copper and radio transmission infrastructure, which has already been overtaken and outmoded by optical fibre. For the most part, all of the other technologies required are available or coming to fruition. Probably the most demanding requirement is the change of mindset required in the associated industries, regulators and governments.

References
[1] H Ishii and M Kobayashi, "ClearBoard: A seamless medium for shared drawing and conversation with eye contact," Proceedings of ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'92), May 1992.
[2] R Börner, "3D TV projection," Electronics and Power, June 1987.
[3] The MIT Report, Dec/Jan 1990/91.