TELEPRESENCE - VISUAL TELECOMMUNICATIONS INTO THE NEXT CENTURY
Peter Cochrane, David J T Heatley, Keith H Cameron
1. Introduction
Whilst there have been dramatic changes in telecommunications transmission and switching technology, the telephone service has seen little change since it was first introduced well over a century ago. However, there is a latent human desire to enhance this basic service with vision and other senses, as is evidenced by the growth in teleconferencing and the introduction of video telephones and multi-media. Environmental and economic pressures are bringing such developments into sharper focus, just as the base technologies are becoming available to meet the challenge of "being there" via telepresence. In concert with this is the migration towards all-optical transmission and networking, which will provide the level of bandwidth transparency and routing flexibility necessary to facilitate these services.
In this paper we describe a selection of telepresence systems and services that are currently being investigated and developed. Each can be fully realised with present day technology, and most can operate, albeit at a reduced service level, over the existing network. Our aim is not to present detailed descriptions of the practicalities of these services, nor of the demands they will place on networks, but to draw attention to their potential and likely importance in future telecommunications.
2. CamNet - Mobile Telepresence
A form of telepresence that is fundamentally simple to realise utilises a miniature video camera mounted on a conventional audio headset of the form shown in Fig 1. The remote operator experiences the illusion of "being there" by effectively "sitting on the shoulder" of the human platform. This is the concept behind CamNet, the realisation of telepresence through the merging of Camera and Network technologies [1]. The addition of a laser pointer or a cursor linked to a head-up viewfinder/display enables the wearer and/or remote user to point to particular objects with accuracy, whilst graphical illustrations and data can be displayed in support of their joint interaction. By mounting twin cameras on the headset and providing the remote user with a form of "virtual reality" headset incorporating two video displays mounted close to the eyes, stereoscopic telepresence is also possible [2,3]. In all manner of service, maintenance, engineering and other human activities, the need to "be there" is prevalent. The availability of miniature video cameras, coupled with modern day telecommunication facilities, now makes this a feasible proposition, even for the most mundane of experiences and the most remote of locations.
2.1 Some Applications
Equipment Maintenance/Installation: During the installation and maintenance of modern equipment, a high level of communication between the remote site and supplier is generally required, together with frequent visits by experts to view the situation first hand. Much of this can be avoided with CamNet, which allows the supplier/designer to "be" at the site at all times without leaving the factory, office or laboratory (Fig 2). In addition, this presents numerous opportunities for cost savings, for example in on-site training, and does away with the need for extensive technical documentation to be stored at remote sites, while at the same time affording instant updates of all information in real time. Most importantly, CamNet provides the field personnel not just with a remotely accessible data base, but also with the assistance and confidence derived from interacting directly with a more knowledgeable person.
Mobile News-gathering: A mobile TV news team typically has three crew members - the camera-man, the sound-man, and the interviewer. CamNet can reduce the crew size to one (Fig 3) and thereby significantly reduce operational costs whilst increasing overall mobility.
Telemedicine: Throughout the Western world the cost of health care is increasing rapidly, while elsewhere the ratio of patients to medical practitioners is becoming alarmingly high. CamNet could be used to transport the "presence" of the MD, surgeon and consultant to rural surgeries, hospital operating theatres or the scenes of accidents to assist paramedics. This would facilitate the more effective use of an increasingly limited resource - the medical specialist. A logical extension would be to support the joint viewing of, for example, X-rays, CAT scans and views from endoscopes [4] (Fig 4).
Remote Surveillance: Security forces throughout the world are increasingly resorting to video recording as a means of crime certification and prevention. Telepresence via CamNet is one of the logical developments in this trend, with each security officer linked visually and orally at all times to HQ. This is merely an extension of the mobile communicators currently in use, but with the added advantage of the telepresence equipment being active without the knowledge of anyone other than the officer and HQ. The gathering of evidence and summoning-up of assistance could therefore become both automatic and covert.
2.2 Some Practical Considerations
Lightweight: It is essential that any head-mounted equipment (i.e., camera(s), display(s), laser(s)) fit securely on the wearer's head, while being comfortable and of minimal weight. For example, a camera with automatic focus and aperture would be the preferred option but would intrinsically be bulkier than one with manually adjustable or even fixed settings. However, trials have shown that the latter option can produce acceptable images under most operational conditions.
Head Wobble: Whilst our own perception when viewing an object is that of stability, the reality is that our body and head do move, and this becomes visible (and possibly disturbing) to the remote operator when viewing certain images. This effect can be minimised with the electronic frame-by-frame processing techniques used in modern camcorders.
Illumination: Certain environments require the wearer to move from high to low lighting situations, and vice versa. Cameras with automatic aperture adjustment can cope with reasonable changes in lighting levels, but further development may be needed to meet the extremes encountered in practice and to further optimise the performance. The addition of a simple video light to the headset could be used to locally illuminate objects in very low light level conditions.
Pointing accuracy: Using the video display on the headset to give the wearer a camera-eye view would ensure that the remote operator's field of view is optimum at all times.
Tether-free operation: In many applications the presence of an umbilical link from the headset to floor/ground mounted transmission equipment, while affording a secure connection, is a considerable hindrance. Radio and microwave afford the desired mobility, but are unlikely to support the bandwidth required by CamNet without resorting to data compression, which in turn adds undesirably to the bulk of the equipment. Optical wireless is emerging as an attractive option [5], affording secure broadband mobility at a low cost. However, its propagation susceptibilities may limit its use to indoor environments such as open plan offices, warehouses, hospital wards and the like.
Linking to the network: In the near future the ISDN network will be used to convey the CamNet service. However, in order to achieve a good quality moving image over this network, suitable video codecs will be required, such as those utilised in the BT videophone (type VC7000). Such equipment uses an ISDN-2 link (2 x 64 kbit/s) which can be configured to accept data transfer and/or high definition still frame capture. For some applications, particularly in remote locations, connection to the ISDN network may be difficult, in which case radio links or portable satellite communication equipment could provide access by an alternative route.
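The need for a video codec on such a link can be illustrated with some simple arithmetic. The sketch below compares the raw bit rate of an uncompressed video stream with the capacity of an ISDN-2 link. Only the 2 x 64 kbit/s figure is taken from the text; the picture size, frame rate and bits per pixel are illustrative assumptions and are not drawn from the VC7000 specification. Audio and signalling overheads are ignored.

```python
# Rough estimate of the compression ratio a codec must achieve to carry
# moving video over an ISDN-2 link. Only the 2 x 64 kbit/s capacity comes
# from the text; the video format used below is an assumption.

ISDN2_CHANNELS = 2
CHANNEL_RATE_BPS = 64_000  # 64 kbit/s per channel


def isdn2_capacity_bps():
    """Total capacity of the ISDN-2 link in bits per second."""
    return ISDN2_CHANNELS * CHANNEL_RATE_BPS


def raw_video_rate_bps(width, height, frames_per_second, bits_per_pixel):
    """Bit rate of an uncompressed video stream."""
    return width * height * frames_per_second * bits_per_pixel


def required_compression_ratio(raw_bps, link_bps):
    """Factor by which the raw stream must shrink to fit the link."""
    return raw_bps / link_bps


if __name__ == "__main__":
    # Assumed format: 352 x 288 pixels, 15 frames/s, 12 bits per pixel.
    raw = raw_video_rate_bps(352, 288, 15, 12)
    link = isdn2_capacity_bps()
    ratio = required_compression_ratio(raw, link)
    print(f"raw video:  {raw / 1e6:.1f} Mbit/s")
    print(f"ISDN-2:     {link / 1e3:.0f} kbit/s")
    print(f"compression required: about {ratio:.0f}:1")
```

Under these assumptions the codec must reduce the data rate by more than two orders of magnitude, which is why image quality over ISDN-2 depends so heavily on the codec.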
3. Telepresence in the Office
For economic, environmental and operational reasons there is mounting pressure on businesses to more fully exploit modern telecommunications. For instance, it is estimated that the cost of business travel from the UK alone annually exceeds £1 billion [6]. It may be possible to reduce some of this travel by making more frequent and widespread use of teleconferencing. However, studies have shown that the effectiveness of such interactions can be severely compromised by the unnaturalness of today's teleconferencing environment. For example: the images of the remote participants are not life size and can be of a poor visual quality; the voices of the remote participants do not emanate from their image; eye contact and body language are absent because of the placement and field of view of the cameras and displays. This significantly reduces the effectiveness of video conferencing since, depending upon the information content of the conversation, up to 60% of the interactions are non-verbal. To remedy this, teleconferencing technology is required which can closely mimic the real conference room, as well as humanise the interfaces and workspace through the use of telepresence. The following example developments seem both logical and necessary in this context.
3.1 Video Wall
Two remote offices can be "merged" in a life-like manner by arranging for one wall in each office to be a high definition display, giving a life-size "window" into the other office (Fig 5). By suitably arranging furniture and decor, the merge can be rendered apparently seamless thereby creating the illusion of a continuous room. To add to the illusion, electronic processing could focus the acoustics on any one speaker and arrange for his voice to emanate from the appropriate part of the display. Eye contact would be facilitated by placing the camera(s) behind the screen in each office, the screen itself being a liquid crystal shutter which rapidly alternates between its display mode, when it presents a room-size image, and its transparent mode during which the behind-the-screen camera can see into the room.
3.2 Electronic White Board
It is natural during a meeting to be able to walk over to a white board and start writing or drawing while the others present observe. In the context of teleconferencing an electronic white board (Fig 6) can be used to allow people in each of the remote offices to interact as if they were sharing the same physical space. Writing and drawings produced on one electronic white board instantly appear on all the other boards to which it is linked. Similarly, erasures made on one board are replicated on the others. The white board itself could be a dedicated installation, as is the case in normal meeting rooms. In another implementation it could occupy a region of the video wall, thereby allowing the images of the people in the remote offices to be superimposed to aid interaction. The presence and placement of the white board within the video wall would be under the full control of the local participants.
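The replication behaviour described above can be sketched in a few lines. The following is a minimal illustration only, not a description of any actual product: the class name WhiteBoard, its methods and the stroke identifiers are hypothetical, and direct method calls stand in for the messages that would in practice travel over the network between offices.

```python
# Minimal sketch of linked electronic white boards. All names here are
# hypothetical; direct method calls stand in for network messages.

class WhiteBoard:
    def __init__(self, name):
        self.name = name
        self.strokes = {}  # stroke_id -> list of (x, y) points
        self.peers = []    # boards that mirror this one

    def link(self, other):
        """Link two boards so that changes on either appear on both."""
        if other is self or any(peer is other for peer in self.peers):
            return
        self.peers.append(other)
        other.peers.append(self)

    def draw(self, stroke_id, points):
        """Draw locally, then replicate the stroke on every linked board."""
        self._apply("draw", stroke_id, points)
        for peer in self.peers:
            peer._apply("draw", stroke_id, points)

    def erase(self, stroke_id):
        """Erase locally, then replicate the erasure on every linked board."""
        self._apply("erase", stroke_id, None)
        for peer in self.peers:
            peer._apply("erase", stroke_id, None)

    def _apply(self, operation, stroke_id, points):
        # Apply one operation to this board only (no further forwarding).
        if operation == "draw":
            self.strokes[stroke_id] = list(points)
        elif operation == "erase":
            self.strokes.pop(stroke_id, None)


if __name__ == "__main__":
    office_a = WhiteBoard("office A")
    office_b = WhiteBoard("office B")
    office_a.link(office_b)
    office_a.draw("line-1", [(0, 0), (10, 10)])
    print(office_b.strokes)  # the stroke drawn in office A
    office_b.erase("line-1")
    print(office_a.strokes)  # empty again after the remote erasure
```

Each operation is applied locally and forwarded once to every directly linked board, so a group of more than two offices would need each pair of boards to be linked.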
3.3 Instant fax
The ability to pass documents across the table to each member in a meeting is an established practice and an essential part of effective communication. In teleconferencing this may be realised with a high speed fax machine and/or document scanners linked to paper-sized desk mounted displays (Fig 7), each able to relay details across the "boundary" of the table instantaneously. This facility could also be used to perform the time-honoured ritual of exchanging business cards. The transmission capacity required by such equipment might typically be only 240 kbit/s. Indeed, a reduced but still highly acceptable rendition of this service could be achieved over today's ISDN-2 links.
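To give a feel for the difference between the two link rates mentioned above, the sketch below estimates the time taken to send one scanned page. The 240 kbit/s and ISDN-2 (2 x 64 kbit/s) rates come from the text; the page dimensions in pixels, the one bit per pixel encoding and the 10:1 compression ratio are assumptions chosen purely for illustration.

```python
# Estimated time to transfer one scanned page over two link rates.
# The page format and compression ratio are illustrative assumptions.

def scanned_page_bits(width_px, height_px, bits_per_pixel=1, compression_ratio=1.0):
    """Number of bits to transmit for one scanned page."""
    return width_px * height_px * bits_per_pixel / compression_ratio


def transfer_time_s(payload_bits, link_rate_bps):
    """Transfer time in seconds, ignoring protocol overheads."""
    return payload_bits / link_rate_bps


if __name__ == "__main__":
    # Assumed: an A4 page scanned at roughly 200 dots per inch, bi-level,
    # compressed by a factor of ten.
    page = scanned_page_bits(1654, 2339, bits_per_pixel=1, compression_ratio=10.0)
    for label, rate in (("240 kbit/s", 240_000), ("ISDN-2 (128 kbit/s)", 128_000)):
        print(f"{label}: {transfer_time_s(page, rate):.1f} s per page")
```

Whatever page format is assumed, the ISDN-2 link takes a little under twice as long per page as the 240 kbit/s link, which is consistent with a reduced but still usable service.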
4. Telepresence at the Desk
Technology has made the modern desk a battle zone of computer hardware, cabling, telephones, and the like, none of which work easily with one another because they have their own proprietary interfaces. The wiring alone makes reconfiguration laborious, whilst the integration of diverse software and hardware is becoming increasingly difficult. One solution to this is an active desk incorporating, for example: ergonomically built-in multi-function displays and input/output devices; new forms of interfacing such as "hands in the screen" and "eye plus voice tracking"; full linkage to other desks in the same office or in remote offices via an all-optical backplane.
Optical wireless lends itself to un-tethering items on the desk such as the computer keyboard and mouse, and the telephone. Furthermore, inductive loops below the surface of the desk could charge anything placed on its surface. For example, a lap-top computer or active organiser placed on the desk could be trickle charged at the same time as communicating with the desk.
4.1 Teleconferencing
Teleconferencing is one of the uses that the integral display on the desk (Fig 8) could be put to, producing a life-size head-and-shoulders image of the remote user. The large size of the screen ensures that peripheral vision is substantially filled, thus creating the illusion of "being there", but now on a one-on-one basis, in contrast to the group-on-group nature of the video wall described earlier. As in the latter, eye contact is again facilitated by arranging a camera to look through the screen from behind, achieved either via back projection (Fig 9) or by operating the LCD screen as a shutter.
The high definition nature of the display allows it to serve simultaneously as the main teleconferencing viewer and as a computer monitor. By using an infrared light pen, the screen can also function as an electronic white board, allowing multiple participants to interact in the same media-space in real time. People sitting at desks thousands of miles apart can thus come together to realise a real time working environment that closely mimics reality.
4.2 Hands In The Screen
An overhead camera scanning the desk's surface to produce a positional image of the user's hand, or miniature inductive sensors on the finger tips (as part of a 3-dimensional positioning system), can be used to realise a viable "hands-in-the-screen" interface (Fig 10). Hand motions can then be used to manipulate objects on the screen and control the operations of the computer, thereby removing the need in this application for conventional keyboard or mouse control. Such a facility lends itself to, for example, the modelling and manipulation of data and virtual objects. These can be placed in the virtual medium viewed through the screen and directed by a combination of voice and hands. Once again the large screen size creates the illusion of personally being in the virtual medium together with the objects being manipulated.
To further enhance the lifelike and intuitive nature of this interface, a 3-dimensional rendition can be introduced to add depth of vision, reality, and personality to the environment. Objects can be humanised to react emotionally and give heuristic guidance during interactive sessions with movement, stance, colour and/or audio to convey reactions. For example, icons try to avoid your hand if the action on them being attempted is questionable, or they become defensive if you are about to initiate a damaging action.
5. Virtual Reality
Virtual Reality (VR) is a rapidly evolving technology in which the user experiences the "being there" essence of telepresence, but this time in a computer generated virtual world rather than the real world [7]. The user is able to move within the virtual world and interact with it. The technology is expected to have applications in a wide range of activities including education and training, computer aided design and manufacturing (CAD-CAM), surgery, and most immediately entertainment through the games industry. VR systems can be either desktop or immersive. Desktop VR uses a conventional display monitor and the operator interacts with the computer generated images using a mouse or more sophisticated interface devices such as motion/position sensing gloves. Immersive VR requires the user to wear a headset in which miniature displays project slightly different views of the virtual world onto each eye, thereby creating a 3D image with little if any unfilled peripheral vision. Physical interaction with the computer generated images can be achieved using sensor gloves or shadow systems that use video cameras to "see" the movement of hands and fingers. More sophisticated means of physical interaction can be expected in the future, including full body sensing.
The use of an extended version of immersive VR, based on combining real images of people with computer generated images of a "virtual" conference room, may at first sight appear to offer the ultimate form of teleconferencing. However, the requirement to use headsets will create an un-naturalness in human interactions, particularly because they inhibit eye contact, which in turn may offset the advantages gained from the use of VR. Nevertheless, suitably lightweight and miniature elements such as active spectacles are already well developed and other technologies such as active contact lenses are being investigated.
6. Three Dimensional Presentation
3D imaging systems may be used to improve upon the "being there" illusion that conventional 2D systems are capable of producing. We are already seeing real time 3D imaging systems finding uses in high value applications where it is essential to obtain information on depth or distance, particularly for defence applications, undersea systems and in the remote handling and hazardous waste industries. As we move into the 21st century we can expect high value 3D services to permeate through into the CAD and CAM arena. Architects, car manufacturers and aeronautical engineers, to cite but a few, have all indicated a need to "walk through" their new designs, to experience their novelty first hand and to examine the design merits up close. 3D imaging systems offer an attractive means of achieving this. We might also anticipate the acceptance of 3D imaging for video-telephone and teleconferencing systems.
6.1 3D Past and Present
Previous attempts to produce commercial, cinema based, 3D imaging systems in the 1950s and 60s failed for a variety of reasons, one of which was the requirement to use special spectacles. In recent years new 3D display techniques have evolved [8], including spectacle-free (auto-stereoscopic) systems which are better suited to video-telephone and teleconferencing applications. One such technique is based on the use of a lenticular projection screen, consisting of a linear array of cylindrical lenses which separate out the left and right-eye views of the image. A large (2.5m wide) video screen has been demonstrated using this system, which gives a clear sign of its potential for teleconferencing applications.
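The principle behind a two-view lenticular display can be sketched as a column interleave: the left and right-eye views are combined so that each cylindrical lens sits over one left-eye column and one right-eye column, and the lens directs each column to the appropriate eye. The following is a simplified illustration under that assumption. It is not a description of the projection system reported in [8], and the function name and the image representation (lists of rows of pixel values) are hypothetical.

```python
# Simplified two-view column interleave for a lenticular screen.
# Images are lists of rows; each row is a list of pixel values.
# Which view belongs under which side of each lens depends on the
# optics; placing the left-eye column first is an assumption.

def interleave_views(left, right):
    """Interleave two equally sized views column by column.

    The output has the same number of rows and twice the number of
    columns: left[0], right[0], left[1], right[1], and so on.
    """
    same_height = len(left) == len(right)
    if not same_height or any(len(l) != len(r) for l, r in zip(left, right)):
        raise ValueError("views must have the same dimensions")
    combined = []
    for left_row, right_row in zip(left, right):
        row = []
        for left_px, right_px in zip(left_row, right_row):
            row.extend((left_px, right_px))
        combined.append(row)
    return combined


if __name__ == "__main__":
    left_view = [["L00", "L01"], ["L10", "L11"]]
    right_view = [["R00", "R01"], ["R10", "R11"]]
    for row in interleave_views(left_view, right_view):
        print(row)
```

Each lens then covers one pair of adjacent output columns, so the horizontal resolution seen by each eye is half that of the combined image.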
6.2 Optical Illusions
An alternative approach to enhancing the realism of transmitted images is the use of optical illusions to fool the eye. One such illusion is achieved with a TV monitor focused by a large concave reflector to form an image in space which has a three dimensional quality. This quasi-3D technique has been commercially exploited in arcade games but it may also find niche applications in information technology.
6.3 Holographic Projection
To date the most realistic 3D images have been of still scenes using either lenticular based or holographically based display systems. Until recently the prospect of creating a moving holographic display had proved elusive. However, work at the MIT Media Lab has now demonstrated the feasibility of such an approach [9]. They use an acousto-optic crystal to create the hologram, and a large computer linked to custom made hardware to produce the effect. Initial images were monochromatic and small (around 30 x 30 x 50 mm) and merely showed wireframe models. Recent developments have extended this to shaded images, in colour and in a larger size. The field of view remains limited, but this is nevertheless an impressive piece of work and has demonstrated for the first time that holographic video is possible. However, there is still a long way to go (possibly 20 years or more) before the technology becomes a realistic every-day option for use in applications such as teleconferencing.
6.4 3D Fax
The creation of 3D objects at a distance is also now feasible through the use of laser based machining and/or molecular deposition. In the first technique an object is scanned in 3D by a laser beam and the reflected beam encoded into space coordinates. After transmission over a telecommunication link, the decoded coordinates are used to guide a laser cutter. Raw material is thus shaped at a distance to produce a complete facsimile in 3D. In the second case the artefact is reproduced in a chemical bath where coincident UV light sources cure the fluid, layer by molecular layer. At present both of these techniques are at an R&D stage and suffer long time delays in completing relatively simple shapes - say 3 hours for a cup and 6 hours for a jet engine turbine blade. However, given time for refinement, such techniques could have important implications for "just-in-time manufacturing" and "customised retail".
In order for this type of service to be viable it is necessary that the artefact replicated at the remote end have as many of the physical attributes of the original as possible. For instance, the ability to pick it up and handle it is clearly essential. In certain situations it may also be necessary to replicate the precise dimensions, stress and thermal properties of the original.
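The scan, encode, transmit and decode chain of the first technique can be illustrated with a simple coordinate format. The sketch below packs a list of scanned (x, y, z) points into a byte string suitable for sending over a link and recovers them at the far end. The wire format (a point count followed by 32-bit floating point coordinates) and the function names are assumptions made for illustration; the text does not specify how the coordinates are encoded.

```python
# Sketch of encoding scanned 3D coordinates for transmission and
# decoding them at the remote end. The wire format is an assumption.
import struct

_HEADER = struct.Struct("<I")   # number of points, 32-bit unsigned
_POINT = struct.Struct("<3f")   # x, y, z as little-endian 32-bit floats


def encode_points(points):
    """Pack (x, y, z) tuples into bytes for transmission."""
    parts = [_HEADER.pack(len(points))]
    parts.extend(_POINT.pack(x, y, z) for x, y, z in points)
    return b"".join(parts)


def decode_points(data):
    """Recover the list of (x, y, z) tuples from a received payload."""
    (count,) = _HEADER.unpack_from(data, 0)
    expected = _HEADER.size + count * _POINT.size
    if len(data) != expected:
        raise ValueError("payload length does not match point count")
    return [
        _POINT.unpack_from(data, _HEADER.size + index * _POINT.size)
        for index in range(count)
    ]


if __name__ == "__main__":
    scanned = [(0.0, 1.5, -2.25), (10.0, 20.0, 30.0)]
    payload = encode_points(scanned)
    print(len(payload), "bytes on the link")
    print(decode_points(payload))
```

The decoded coordinates would then drive the laser cutter at the remote end. Using 32-bit floats limits the precision of each coordinate, so a real system would need a format matched to the dimensional accuracy required of the facsimile.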
7. Conclusions
Most of the enabling technology required in the user equipment discussed in this paper is available today. This, coupled with the bandwidth transparency offered by optical networking, will bring about the realisation of radically new telecommunication services in the genre of telepresence. However, to ensure success and maximum utility, these services must realise humanised rather than artificial interfaces. This paper has drawn attention to some novel realisations of these services that meet this objective, together with operational limitations and difficulties yet to be overcome.
8. Acknowledgements
The authors wish to thank Kim Fisher and Rob Taylor-Hendry for producing the diagrams used in this paper.
9. References
1) Cochrane, P., et al., "CamNet - The first Telepresence system", INTERLINK-2000 Journal, Vol. 1, No. 4, August 1992, pp. 38-41.
2) Hirose, M., "RWC related technologies (1), Virtual Reality", Japan Computer Quarterly, No. 89, 1992, pp. 38-44.
3) Miller, S.K., "Video goggles give second sight to ageing eyes", New Scientist, 23rd May 1992, p. 17.
4) Williams, C.B., Baillie, J., Gillies, D.F., Borislow, D. and Cotton, P.B., "Teaching gastrointestinal endoscopy by computer simulation: a prototype for colonoscopy and ERCP", Gastrointestinal Endoscopy, Vol. 36, 1990, pp. 49-54.
5) McCullagh, M.J., et al., "Optical Wireless LAN's: Applications and systems", IEE Colloquium on "Cordless Computing - Systems and User Experience", Digest No. 1993/003, London, 12th January 1993, Paper 8.
6) Annual Abstract of Statistics 1991, Central Statistical Office, HMSO.
7) Krueger, M.W., Artificial Reality II, Addison-Wesley, 1991.
8) Borner, R., "3D TV projection", IEE Electronics and Power, June 1987.
9) The MIT Report, Dec/Jan 1990/91.