Chetan Verma


Department of Electrical and Computer Engineering
University of California, San Diego
Note: My website has moved. Please visit the updated page: Here




Key research projects:

1. Modeling file access patterns

The data that knowledge workers need to conduct their work is stored across an increasing number of repositories and is growing significantly in size. It is therefore unreasonable to expect knowledge workers to efficiently search for and identify what they need across a myriad of locations where upwards of hundreds of thousands of items can be created daily. This work describes a system that observes user activity and trains models to predict which items a user will access, in order to help knowledge workers discover content. We specifically investigate network file systems and determine how well we can predict future access to newly created or modified content. Using file metadata to construct access prediction models, we show how the performance of these models can be improved for shares demonstrating high collaboration among their users. Experiments on ten enterprise shares reveal that models based on file metadata can achieve F scores upwards of 99%. Furthermore, on average, collaboration-aware models can correctly predict nearly half of new file accesses by users while ensuring a precision of 75%, thus validating that the proposed system can be used to help knowledge workers discover new or modified content.
This work was done at, and in collaboration with, Symantec Research Labs, Mountain View.

Collaborators:
Sandeep Bhatkar, Michael Hart, Aleatha Parker-Wood, Sujit Dey

This work has been accepted as a full paper at the ICEIS 2015 conference (acceptance rate ~15%). An extended version is under preparation. Please see the Publications section for details.
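
To make the precision and F-score figures quoted above concrete, here is a minimal sketch of how such metrics are computed for one user's predicted file accesses. It is illustrative only and is not the evaluation code from the paper: the function name, the file names, and the choice of the balanced F1 variant of the F score are all assumptions.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted file accesses against the files actually accessed.

    Both arguments are iterables of file identifiers (hypothetical names here).
    Returns (precision, recall, f1), using the balanced F1 score as an assumption.
    """
    predicted = set(predicted)
    actual = set(actual)
    true_positives = len(predicted & actual)

    # Precision: fraction of predicted accesses that really happened.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real accesses that the model predicted.
    recall = true_positives / len(actual) if actual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the model predicts 4 files, 3 of which are among
# the 6 files the user actually opened.
predicted_files = ["a.doc", "b.xls", "c.ppt", "d.txt"]
accessed_files = ["a.doc", "b.xls", "c.ppt", "e.pdf", "f.csv", "g.doc"]

precision, recall, f1 = precision_recall_f1(predicted_files, accessed_files)
print(precision, recall, round(f1, 3))
```

In this made-up example, precision is 75% and recall is 50%, which mirrors the shape of the reported trade-off: about half of the new accesses are caught while three out of four predictions are correct.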


2. Learning structures between tag annotations

As humans, we understand the relations between Soccer and Ronaldo, and between Germany and Berlin. How can machines be made to understand and use such relations? Taxonomies and ontologies have been studied as ways to represent relations between different entities. WordNet is a popular lexical ontology, constructed manually by experts. While such a graph captures semantic relations between tags, it fails to encode the information present in a given corpus of images about how the tags interact. In this project, we study a data-driven approach to constructing an ontological graph for a set of image tags obtained from a large corpus of images, where each image in the corpus is annotated with zero or more tags. With certain simplifying assumptions to help in the construction, we formulate the graph construction as an optimization problem and provide an approximate solution.

Evaluating ontologies or taxonomies is often difficult. While most research focuses on manual evaluation or comparison with a manually built gold-standard ontology, we propose evaluating the ontological graphs with novel data-driven tasks that assess how well the tree structures capture tag statistics in images.

This work was done in collaboration with Yahoo Labs, Bangalore.

Collaborators:
Vijay Mahadevan, Nikhil Rasiwasia, Gaurav Aggarwal, Ravi Kant, Alejandro Jaimes, Sujit Dey

This work was presented as a poster paper at the WWW 2014 conference (acceptance rate = 29.4%). An extended version is under review. Please see the Publications section for details.
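
As a rough illustration of the kind of "tag statistics in images" this project builds on, the sketch below counts how often tags occur and co-occur in a small annotated corpus and estimates a conditional probability from those counts. It is not the graph-construction or evaluation method from the paper; the function names and the toy corpus are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(corpus):
    """Count single-tag and tag-pair occurrences over a corpus.

    `corpus` is a list of tag lists, one per image; an image may have no tags.
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in corpus:
        unique = sorted(set(tags))  # ignore duplicate tags on one image
        tag_counts.update(unique)
        # Sorted input means each pair is stored in one canonical order.
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def conditional_probability(tag_counts, pair_counts, given, target):
    """Estimate P(target is present | given is present) from the counts."""
    if tag_counts[given] == 0:
        return 0.0
    pair = tuple(sorted((given, target)))
    return pair_counts[pair] / tag_counts[given]


# Hypothetical toy corpus of image annotations.
corpus = [
    ["soccer", "ronaldo"],
    ["soccer", "stadium"],
    ["soccer", "ronaldo", "stadium"],
    ["berlin", "germany"],
    [],
]

tag_counts, pair_counts = tag_statistics(corpus)
print(conditional_probability(tag_counts, pair_counts, "ronaldo", "soccer"))
print(conditional_probability(tag_counts, pair_counts, "soccer", "ronaldo"))
```

In this toy corpus every image tagged "ronaldo" is also tagged "soccer", but not the other way round. Asymmetries of this kind are one signal a data-driven approach could use when deciding which tag should sit above the other.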


3. Obtaining training videos for arbitrary categories

Personalization applications such as content recommendations, product recommendations and advertisements, and social network related recommendations can be quite beneficial for both service providers and users. Such applications need to understand user preferences in order to provide customized services. As user engagement with web videos has grown significantly, understanding user preferences based on watched videos looks promising. However, this requires being able to classify web videos into a set of categories appropriate for the personalization application. Such categories may be substantially different from the common categories (such as Comedy, Entertainment, Pets, etc.) that are used by video sharing websites. Hence, training videos for classifying web videos into the required set of categories, appropriate to the personalization application, might be unavailable.

In this project, we study the feasibility and effectiveness of a fully automated framework for obtaining training videos to enable classification of web videos into any arbitrary set of categories, as desired by the personalization application. We investigate the properties of training data that lead to high performance of the trained classification models. We then develop an approach to identify and score keywords based on their suitability for retrieving training videos with the desired properties, for the specified set of categories. Experimental results using YouTube videos indicate the feasibility of the proposed approach for obtaining high classification performance. Comparisons with retrieving training videos using category names reveal that our approach performs significantly better.

More information on this work is available at the Project Webpage.
This work was presented at the WI 2013 conference (acceptance rate = 25.3%). An extended version is under preparation. Please see the Publications section for details.



Previous projects:

1. Tweet extraction and processing for recommendation system
To supplement movie/TV series recommendation systems, Twitter was explored as a source for learning public opinion. This work was done during an internship with Samsung Labs.

2. Wireless mesh networks
In this work, we explored how the addition of a few shortcut links to a wireless mesh network makes it behave as a small-world network, through a reduction in average path length. This work was done during my initial quarters at UCSD. See older publications for related papers.
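
The effect of shortcut links on average path length can be seen in a small sketch. It assumes a ring of nodes as a simplified stand-in for a mesh topology and places two shortcut links by hand; the function names, topology, and link placement are illustrative assumptions, not the setup used in the related papers.

```python
from collections import deque


def ring_lattice(n):
    """Adjacency sets for n nodes in a ring, each linked to its two neighbours."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


def add_link(adjacency, u, v):
    """Add an undirected link between nodes u and v."""
    adjacency[u].add(v)
    adjacency[v].add(u)


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance


def average_path_length(adjacency):
    """Mean hop count over all ordered node pairs (graph assumed connected)."""
    total = 0
    pairs = 0
    for source in adjacency:
        distance = hop_distances(adjacency, source)
        for target in adjacency:
            if target != source:
                total += distance[target]
                pairs += 1
    return total / pairs


network = ring_lattice(20)
before = average_path_length(network)

# Two hand-picked shortcut links across the ring (an arbitrary choice).
add_link(network, 0, 10)
add_link(network, 5, 15)
after = average_path_length(network)

print(f"average path length before: {before:.2f}, after: {after:.2f}")
```

Even two extra links shorten the average route noticeably in this toy network, which is the qualitative behaviour the project studied in wireless mesh networks.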


More details on the projects are available in my resume.