Comparative Exploration of Document Collections:
a Visual Analytics Approach

NEWS: HONORABLE MENTION @ EuroVis 2014

NEWS: SOURCE CODE available

D. Oelke, H. Strobelt, C. Rohrdantz, I. Gurevych, and O. Deussen

Abstract

We present an analysis and visualization method for computing what distinguishes a given document collection from others. We determine topics that discriminate a subset of collections from the remaining ones by applying probabilistic topic modeling and subsequently approximating the two relevant criteria, distinctiveness and characteristicness, algorithmically through a set of heuristics. Furthermore, we suggest a novel visualization method called DiTop-View, in which topics are represented by glyphs (topic coins) that are arranged on a 2D plane. Topic coins are designed to encode all information necessary for performing comparative analyses, such as the class membership of a topic, its most probable terms, and the discriminative relations. We evaluate our topic analysis using statistical measures and a small user experiment, and present an expert case study in which political science researchers analyze two real-world datasets.

BibTeX

    @article{Oelke2014,
      author    = {Oelke, Daniela and Strobelt, Hendrik and Rohrdantz, Christian and Gurevych, Iryna and Deussen, Oliver},
      title     = {Comparative Exploration of Document Collections: a Visual Analytics Approach},
      journal   = {Computer Graphics Forum},
      volume    = {33},
      number    = {3},
      publisher = {Blackwell Publishing Ltd},
      issn      = {1467-8659},
      url       = {http://dx.doi.org/10.1111/cgf.12376},
      doi       = {10.1111/cgf.12376},
      pages     = {201--210},
      year      = {2014},
    }