
Hands on: building an HTML5 photo booth with Chrome’s new webcam API


Experimental support for WebRTC has landed in the Chrome developer channel. The feature is available for testing when users launch the browser with the --enable-media-stream flag. We did some hands-on testing and used some of the new JavaScript APIs to make an HTML5 photo booth.

WebRTC is a proposed set of Web standards for real-time communication. It is intended to eventually enable native standards-based audio and video conferencing in Web applications. It is based on technology that Google obtained in its 2010 acquisition of Global IP Solutions and subsequently released under a permissive open source software license.

Implementations of WebRTC will consist of two parts: a wire protocol for network communication and an assortment of JavaScript APIs that will allow the WebRTC functionality to be used in Web applications. The WebRTC JavaScript API proposal is being drafted by the W3C Web Real-Time Communications Working Group, building on a specification that was originally written by Google's Ian Hickson. The underlying network protocol standard is being drafted separately through the IETF.

One of the key features defined in the WebRTC specification is the MediaStream object, a generic JavaScript interface for interacting with live audio and video streams. This functionality can be used for a broad range of potential applications beyond audio and video conferencing.

As we reported on Thursday, Mozilla is drafting a specification called MediaStream Processing that defines JavaScript APIs for real-time programmatic manipulation of MediaStream instances. Mozilla's proposed standard would make it possible for Web developers to use MediaStream in the browser for tasks like audio mixing and motion detection on live video. It's important to note that MediaStream Processing is a separate standard from WebRTC, though it relies on the JavaScript APIs that are defined in the WebRTC specification.

In order to facilitate audio and video conferencing, the WebRTC JavaScript APIs have to provide a mechanism through which Web applications can access the end user's webcam and microphone. The specification defines a function called getUserMedia, which does precisely that. If the relevant hardware is present and available for use, getUserMedia will trigger a callback function and pass along a MediaStream instance that mediates live access to a stream from a webcam or microphone.

The getUserMedia feature is especially significant, partly because such functionality was previously only available through proprietary browser plug-ins. Used in conjunction with MediaStream Processing, the ability to take a live MediaStream from a webcam offers some compelling opportunities. As we wrote in our coverage of MediaStream Processing on Thursday, one example is that it will allow Web developers to build standards-based augmented reality experiences that run entirely within the browser.

The getUserMedia function is among the WebRTC features that are now available in the Chrome developer channel when the browser is launched with the --enable-media-stream flag. We started by throwing together a really simple demo so that we could see how it works in action:

<!DOCTYPE html>
<html>
  <head>
    <title>HTML5 Webcam Test</title>
  </head>
  <body>

    <h2>The Thing cannot be described&mdash;there is no
    language for such abysms of shrieking and immemorial
    lunacy, such eldritch contradictions of all matter,
    force, and cosmic order</h2>

    <video id="live" autoplay></video>
    <script type="text/javascript">
      var video = document.getElementById("live")

      // Request webcam access; on success, wire the resulting
      // MediaStream into the video element through a Blob URL
      navigator.webkitGetUserMedia("video",
          function(stream) {
            video.src = window.webkitURL.createObjectURL(stream)
          },
          function(err) {
            console.log("Unable to get video stream!")
          }
      )
    </script>
  </body>
</html>

The getUserMedia function takes three parameters. The first is a string indicating whether audio or video is desired; in this case, we specify "video" so that we can access the user's webcam. The second is a callback function that is invoked when the webcam stream is successfully obtained, and the third is a callback that is invoked upon failure.

The success callback is passed a single parameter: a MediaStream instance that provides access to a live video stream from the user's webcam. In the callback function, we call createObjectURL to create a Blob URL for the stream. When we set the Blob URL as the video element's source, the element will display the contents of the webcam MediaStream in real time.
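The File API draft that defines createObjectURL also provides a companion revokeObjectURL function for releasing a Blob URL once it is no longer needed. A minimal cleanup sketch, assuming the same prefixed webkitURL object used above:

// Release the Blob URL when the stream no longer needs to be
// displayed (revokeObjectURL comes from the same File API draft)
window.webkitURL.revokeObjectURL(video.src)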

The getUserMedia function is intended to have a security prompt that asks users for permission before making the webcam accessible to a Web application. This prompt will likely be similar to the one that the browser already uses when a Web application calls upon the standard geolocation APIs to request the user's position.
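For comparison, the geolocation API already follows this prompt-then-callback pattern: the page never receives any data until the user approves the request and the success callback fires. A minimal sketch using the standard geolocation API:

<script type="text/javascript">
  // The browser shows a permission prompt before this
  // success callback is ever invoked
  navigator.geolocation.getCurrentPosition(
      function(position) {
        console.log("Latitude: " + position.coords.latitude)
        console.log("Longitude: " + position.coords.longitude)
      },
      function(err) {
        console.log("Unable to get position!")
      }
  )
</script>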

The getUserMedia security prompt has not been implemented yet in Chrome, so the browser provides immediate webcam access without user intervention. This security weakness will almost certainly be remedied before Google makes the feature available without a launch flag. For now, remember to use caution when browsing with the flag enabled. (For an overview of WebRTC security considerations, you can refer to this IETF slide deck.)

I tested the example above in Chrome Canary on a 2011 MacBook Air connected to a Thunderbolt display. The browser was able to pick up a live video stream from the webcam built into the monitor. It worked exactly as expected, though it was a bit CPU-intensive. As you can see in the screenshot, I enlisted Cthulhu's help to test the demo.

Displaying live video from a webcam is a good starting point, but it's hardly enough for a good demo. I decided to go a step further and expand it into a simple photo booth that captures snapshots when the user clicks a link. It accomplishes this by painting a single frame of the video onto a hidden canvas element, extracting the image data, and plopping that data into a new image element that is appended to the film roll at the bottom of the page.

<!DOCTYPE html>
<html>
  <head>
    <title>HTML5 Photo Booth</title>
  </head>
  <body>
    <h2>HTML5 Photo Booth</h2>

    <video id="live" autoplay></video>
    <canvas id="snapshot" style="display:none"></canvas>

    <p><a href="#" onclick="snap()">Take a picture!</a></p>
    <div id="filmroll"></div>

    <script type="text/javascript">
      var video = document.getElementById("live")

      // Request webcam access and display the live stream
      navigator.webkitGetUserMedia("video",
          function(stream) {
            video.src = window.webkitURL.createObjectURL(stream)
          },
          function(err) {
            console.log("Unable to get video stream!")
          }
      )

      function snap() {
        var live = document.getElementById("live")
        var snapshot = document.getElementById("snapshot")
        var filmroll = document.getElementById("filmroll")

        // Make the canvas the same size as the live video
        snapshot.width = live.clientWidth
        snapshot.height = live.clientHeight

        // Draw the current frame of the live video onto the canvas
        var c = snapshot.getContext("2d")
        c.drawImage(live, 0, 0, snapshot.width, snapshot.height)

        // Create an image element with the canvas image data
        var img = document.createElement("img")
        img.src = snapshot.toDataURL("image/png")
        img.style.padding = "5px"
        img.width = snapshot.width / 2
        img.height = snapshot.height / 2

        // Add the new image to the film roll
        filmroll.appendChild(img)
      }
    </script>
  </body>
</html>

The WebRTC standard is still evolving, so the API will likely undergo changes before it is finalized. The Chrome developer channel offers a great test environment for Web developers who want to start experimenting with MediaStream functionality. Opera also has a custom test build available with getUserMedia enabled.
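Developers who start experimenting now may want to insulate their code from renames as vendor prefixes come and go. A minimal feature-detection sketch; the unprefixed navigator.getUserMedia is an assumption about where the finalized standard will land:

<script type="text/javascript">
  // Use whichever variant of getUserMedia the browser exposes;
  // the unprefixed name is speculative until the spec is final
  var getUserMedia = navigator.getUserMedia ||
                     navigator.webkitGetUserMedia

  if (getUserMedia) {
    // Invoke with navigator as the receiver so the method
    // keeps its expected this binding
    getUserMedia.call(navigator, "video",
        function(stream) { console.log("Got a MediaStream") },
        function(err) { console.log("Unable to get video stream!") }
    )
  } else {
    console.log("getUserMedia is not supported in this browser")
  }
</script>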

Mozilla is working to add support for WebRTC to Firefox. As we demonstrated yesterday, the browser already has basic MediaStream support implemented, but it does not yet support getUserMedia. It's worth noting that Mozilla is also developing an independent Camera API standard specifically for capturing from webcams and built-in cameras on mobile devices.

Ericsson Labs has also been doing a lot of work with WebRTC. They have a fairly sophisticated implementation that is built on top of WebKitGtk+, the WebKit port that is used by the GNOME desktop environment and many popular Gtk+ applications on Linux. Ericsson's WebRTC-enabled version of WebKitGtk+ can be used with GNOME's Epiphany Web browser to test WebRTC capabilities on Linux. You can see it running a full-blown, browser-based video conferencing demo on Ubuntu in this video.

WebRTC is clearly on track to deliver interactive browser-based audio and video conferencing with Web standards. Popular tools like WebEx, Google+ Hangouts, and Facebook video chat could all eventually be rebuilt to run natively in the browser without requiring plug-ins. Even more compelling is the prospect of having WebRTC and MediaStream Processing available in mobile Web browsers. Imagine having the kind of functionality you get from Instagram, Layar, and FaceTime available in mobile Web applications.
