Recent advances in communication and networking, web services, semiconductor devices, and signal processing have led to a proliferation of outlets for delivering rich multimedia content, including the Internet, mobile broadband networks, and Cable/Satellite/IPTV. This has encouraged the creation and sharing of a growing amount of user-generated media content through the web (e.g., YouTube), providing an enriched, participatory multimedia experience in keeping with current trends like social networking. At the same time, there are also efforts to leverage new delivery systems such as web video for professionally produced content (e.g., Hulu, or TV channels' websites), which have elicited a strong response from users due to their convenience. We observe a trend of growing demand for more user control, accompanied by a continuing appetite for professional-quality content.
Compared to earlier video broadcasting systems, there has already been an increase in user control in the multimedia presentation systems of today; common features like pausing, forward/backward scrolling, etc., are available not only for videos on the web but also in Cable and Satellite media through Digital Video Recorder functionality. To explore possibilities beyond such rudimentary levels of interactivity, we need to reconsider the video production and presentation system in use today, which produces videos containing a single linear sequence of scenes or segments. In this method of presentation, multiple threads of events that may be happening in parallel in real life (e.g., in live shows, sports, reality television) have to be shown to the viewer in a single sequence, often requiring the director to compromise on quality or the amount of content. A few extensions have been made to media presentation and delivery systems to accommodate parallel events, e.g., PiP (picture-in-picture) or multiple camera angles. However, these serve only very specific applications and offer too little granularity of control.
In our research, we propose to develop new techniques for capturing, delivering and presenting concurrent content that will allow users to experience multimedia content in a much more realistic and participatory manner. We will also develop techniques to automatically identify concurrency in already published content. We believe the media viewing experience can be significantly enriched by allowing the user to control, for example, which subset among five parallel events to focus on, how views of parallel events are interleaved, or which parts of the original footage to watch based on prior knowledge of their content/emphasis/events. We will examine the various algorithmic and systems-level challenges that must be resolved at the content production, delivery and presentation stages in order to enable these capabilities. Some of these challenges include:
- Content annotation and semantic analysis for new, concurrently published content as well as for legacy content, to preserve and expose the multi-threaded character of the original content and inter-segment dependencies.
- Dynamic composition and presentation of personalized content based on real-time user inputs and preset preferences.
- Delivering the content over heterogeneous access networks with varying degrees of multicast and broadcasting support.
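To make the first two challenges concrete, the following is a minimal sketch of one possible data model and composition step. All names here (`Segment`, `thread_id`, `prerequisites`, `compose_playout`) are hypothetical illustrations, not part of any existing system: each segment is annotated with the parallel event thread it covers, its real-world time interval, and any producer-specified prerequisite segments; a composition routine then interleaves the viewer's chosen threads into a single linear playout while honoring those semantic constraints.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass(frozen=True)
class Segment:
    """Hypothetical annotation for one piece of multi-threaded content."""
    segment_id: str
    thread_id: str                          # which parallel event this covers
    start: float                            # real-world start time (seconds)
    end: float                              # real-world end time (seconds)
    prerequisites: frozenset = frozenset()  # producer-specified dependencies

def compose_playout(segments: List[Segment],
                    chosen_threads: Set[str]) -> List[Segment]:
    """Interleave segments from the viewer's chosen threads into one
    linear sequence, ordered by real-world start time, while ensuring
    every segment's prerequisites have already been played."""
    pool = sorted((s for s in segments if s.thread_id in chosen_threads),
                  key=lambda s: s.start)
    playout, seen = [], set()
    pending = pool[:]
    while pending:
        # Earliest segment whose producer constraints are already satisfied.
        ready = [s for s in pending if s.prerequisites <= seen]
        if not ready:
            raise ValueError("producer constraints unsatisfiable "
                             "for the chosen threads")
        nxt = ready[0]
        playout.append(nxt)
        seen.add(nxt.segment_id)
        pending.remove(nxt)
    return playout

# Example: two parallel threads ("main" stage and "backstage"), where the
# producer requires m1 to be seen before m2.
catalog = [
    Segment("m1", "main", 0, 10),
    Segment("b1", "backstage", 5, 15),
    Segment("m2", "main", 10, 20, frozenset({"m1"})),
]
sequence = compose_playout(catalog, {"main", "backstage"})
# → segments in order m1, b1, m2
```

A real system would of course need richer constraints (e.g., mutual exclusion, recommended orderings, per-device variants) and would re-run composition as the viewer changes focus in real time; this sketch only illustrates how exposing thread identity and inter-segment dependencies enables personalized interleaving.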
We envision the results of our work being applicable to a wide variety of use cases. A few examples are:
- Personalized views of multi-threaded content and events, offering fine-grained user control while satisfying semantic constraints specified by the producer.
- Personalized views of produced TV programs, including relevant content from archives.
- Multiple personalized versions for the same viewer, customized for specific viewing contexts (home, mobile, etc.) and programs.
- User-controlled personalized live coverage.