Tamara L. Berg
Currently At Yahoo! Research
Berkeley, CA

Starting as an Assistant Professor at
SUNY Stony Brook University - Fall 2008


Photos

Wedding Photos

  Research

My main research area is Digital Media, specifically organizing large collections of images with associated text using techniques from Natural Language Processing and Computer Vision. Today, billions of images with associated text are available on web pages, in captioned photographs from news sources, in video with speech or closed captioning, and elsewhere. To organize, search, and exploit these enormous collections, we have developed methods that effectively combine information from both the visual and textual sources. Past projects include automatically identifying people in news photographs, classifying images from the web, and finding iconic images in consumer photo collections. I am generally interested in bringing together people and expertise from various areas of Digital Media, including digital art, music, and cultural geography.

I am currently working as a research scientist at Yahoo! Research, where I am developing various digital media projects, including the automatic annotation of consumer photographs. In Fall 2008 I will start as an assistant professor at Stony Brook University and will be looking for excited, motivated graduate students.

-------------------------------------------------------------------------------------------------------------------------------------------------


  Projects
 

Faces In the Wild

We show that a large and realistic face dataset can be built from news photographs and their associated captions. This dataset is more realistic than the usual face recognition datasets because it contains faces captured "in the wild": in a variety of configurations with respect to the camera, with a variety of expressions, and under illumination of widely varying color. We obtain 44,773 faces from approximately half a million captioned news images. We then automatically link names, obtained by running a named entity recognizer on the captions, with faces, obtained by running a face detector on the images. Initially we use a simple clustering method, which produces fair results. However, the context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We therefore improve our results significantly by linking the clustering process with a language model that learns the probability that an individual is depicted given the context of their name within the caption. Once the training procedure is over, we have a large, accurately labeled set of 30,281 faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation. We also produce a face dictionary of news photographs, organized according to the people present, that can be searched by individual.
Demo: Face Dictionary
Dataset: Faces In the Wild
Dataset: Labeled Faces In the Wild
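
As a rough illustration of the "simple clustering" step described above (not the system itself), the sketch below assigns each detected face to one of the names found in its own caption, then re-estimates a mean appearance vector per name and repeats. The function names, the two-dimensional toy features, and the squared-distance measure are all assumptions made for the example; the caption language model is left out entirely.

```python
# Toy sketch of caption-constrained clustering. Each item pairs one face
# feature vector with the candidate names extracted from its caption.
# All names and feature values here are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def link_names_to_faces(items, iterations=10):
    """Return one name per face, chosen only among that face's caption names."""
    # Initialise: every face contributes to every name in its caption.
    members = {}
    for face, names in items:
        for name in names:
            members.setdefault(name, []).append(face)
    means = {name: mean(faces) for name, faces in members.items()}

    labels = []
    for _ in range(iterations):
        # Assignment step: pick the closest mean among the allowed names.
        labels = [
            min(names, key=lambda n: sq_dist(face, means[n]))
            for face, names in items
        ]
        # Update step: recompute means from the current assignment.
        members = {}
        for (face, _), label in zip(items, labels):
            members.setdefault(label, []).append(face)
        for name, faces in members.items():
            means[name] = mean(faces)  # names with no faces keep their old mean
    return labels

items = [
    ([0.0, 0.1], ["Alice"]),
    ([0.1, 0.0], ["Alice", "Bob"]),
    ([1.0, 0.9], ["Bob"]),
    ([0.9, 1.0], ["Alice", "Bob"]),
]
print(link_names_to_faces(items))
```

Faces whose caption mentions a single name act as anchors, which pulls the ambiguous faces toward the right cluster. A context model of the kind described above would add a per-name prior to the assignment step rather than relying on appearance alone.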
 

Animals On the Web

We have built a set of classifiers to recognize several animal categories: Alligator, Ant, Bear, Beaver, Dolphin, Frog, Giraffe, Leopard, Monkey and Penguin. Using Google Web Search, we identify a pool of candidate images for a given query. These images are then re-ranked by our system using information extracted from both the surrounding text and the images themselves. This gives us quite a good pool of images for each class. We also demonstrate, for the monkey class, that we can extend this pool of images quite easily using a set of related queries. We produce a startlingly good set of results for complex web data.
Demo: Animals on the Web
Dataset: Animals on the Web Dataset
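
The re-ranking idea can be sketched in a few lines, assuming each candidate image already carries a score from the surrounding text and a score from the image content. The weighted sum, the `text_weight` parameter, and the scores below are assumptions for illustration only; they are not the scoring used by the classifiers described above.

```python
# Minimal sketch: re-rank search candidates by combining two cues.
# Scores are hypothetical values in [0, 1].

def rerank(candidates, text_weight=0.5):
    """candidates: list of (image_id, text_score, image_score) tuples.
    Returns image ids sorted from most to least relevant."""
    def combined(candidate):
        _, text_score, image_score = candidate
        return text_weight * text_score + (1 - text_weight) * image_score
    ranked = sorted(candidates, key=combined, reverse=True)
    return [image_id for image_id, _, _ in ranked]

candidates = [
    ("a", 0.9, 0.1),  # strong text match, weak visual match
    ("b", 0.6, 0.7),  # moderate on both cues
    ("c", 0.2, 0.3),  # weak on both cues
]
print(rerank(candidates))
```

An image that is only moderately supported by each cue can outrank one that is strongly supported by text alone, which is the point of using both sources.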

 

Ranking Iconic Images

We define an iconic image for an object category (e.g., Eiffel Tower) as an image with a large, clearly delineated instance of the object in a characteristic aspect. We show that such iconic images exist for a variety of objects and argue that these are the images most relevant to that category. Given a large set of images noisily labeled with a common theme, say a Flickr tag, we show how to rank these images according to how well they represent a visual category. We also generate a binary segmentation for each image indicating roughly where the subject is located. The segmentation procedure is learned from a small set of iconic images from a few training categories and then applied to several other test categories. We rank the segmented test images according to shape and appearance similarity against a set of 5 hand-labeled images per category. We compute three rankings of the data: a random ranking of the images within the category, a ranking using similarity over the whole image, and a ranking using similarity applied only within the subject of the photograph. We then evaluate the rankings qualitatively and with a user study.
Demo: Ranked Iconic Images
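
To illustrate why similarity restricted to the subject can rank differently from whole-image similarity, here is a toy sketch. Images are flat lists of pixel values with a binary subject mask, and similarity is the negative mean absolute difference over pixels inside both masks. This representation and measure are assumptions chosen for brevity, not the shape and appearance similarity used in the project.

```python
# Toy sketch: rank images against hand-labeled exemplars, comparing
# only pixels that fall inside both subject masks. All data is hypothetical.

def masked_similarity(pixels, mask, exemplar, exemplar_mask):
    """Negative mean absolute difference over pixels inside both masks."""
    shared = [i for i, (m, e) in enumerate(zip(mask, exemplar_mask)) if m and e]
    if not shared:
        return float("-inf")  # no overlapping subject region to compare
    return -sum(abs(pixels[i] - exemplar[i]) for i in shared) / len(shared)

def rank_images(images, exemplars):
    """images: dict of id -> (pixels, mask); exemplars: list of (pixels, mask).
    Each image is scored by its best match among the exemplars."""
    def score(item):
        pixels, mask = item[1]
        return max(masked_similarity(pixels, mask, ex, ex_mask)
                   for ex, ex_mask in exemplars)
    return [image_id for image_id, _ in
            sorted(images.items(), key=score, reverse=True)]

subject = [1, 1, 0, 0]   # subject occupies the first two pixels
everything = [1, 1, 1, 1]  # mask covering the whole image

exemplars = [([1, 1, 0, 0], subject)]
images = {
    "iconic": ([1, 1, 5, 5], subject),     # subject matches, background differs
    "cluttered": ([0, 0, 0, 0], subject),  # background matches, subject differs
}
print(rank_images(images, exemplars))

# Whole-image comparison: same pixels, but masks cover every pixel.
whole_images = {k: (p, everything) for k, (p, _) in images.items()}
whole_exemplars = [(p, everything) for p, _ in exemplars]
print(rank_images(whole_images, whole_exemplars))
```

In this toy case the subject-only ranking prefers the image whose subject matches the exemplar, while the whole-image ranking is dominated by the background.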


 

-------------------------------------------------------------------------------------------------------------------------------------------------

Refereed Publications

-------------------------------------------------------------------------------------------------------------------------------------------------

  Technical Reports and Theses

-------------------------------------------------------------------------------------------------------------------------------------------------