Individuals and organizations must cope with massive amounts of unstructured text: individuals sifting through a lifetime of e-mail and documents, journalists tracing the activities of government organizations, companies reacting to what people say about them online, and scholars making sense of digitized documents from the ancient world.
Our project’s research goal is to bring together two previously disconnected components of how users make sense of this deluge of data: algorithms that sift through the data and interfaces that communicate the algorithms’ results. Our project lets users give feedback to algorithms that have typically been employed on a “take it or leave it” basis: if an algorithm makes a mistake or misunderstands the data, users can correct the problem through an intuitive user interface and thereby improve the underlying analysis. We aim to improve the algorithms and the interfaces jointly, leading to deeper understanding of, faster adoption of, and greater trust in the algorithms we rely on to understand massive textual datasets. This work is done in collaboration with the University of Maryland.
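The correct-and-improve loop described above can be sketched in miniature. The class, the keyword-based labeler, and the update rule below are illustrative assumptions for exposition only, not this project’s actual algorithms or interface:

```python
# Hypothetical sketch of a human-in-the-loop correction cycle.
# A toy labeler assigns each document the label whose keyword set
# overlaps most with the document's words; user corrections update
# the keyword sets so the underlying analysis improves.

class KeywordLabeler:
    def __init__(self, keywords):
        # keywords: dict mapping label -> set of indicative words
        self.keywords = {label: set(words) for label, words in keywords.items()}

    def label(self, text):
        words = set(text.lower().split())
        # Pick the label with the largest keyword overlap.
        return max(self.keywords, key=lambda lab: len(words & self.keywords[lab]))

    def correct(self, text, right_label):
        # User feedback: move this document's words to the right label
        # so that similar future documents are labeled correctly.
        words = set(text.lower().split())
        for lab, kw in self.keywords.items():
            if lab != right_label:
                kw -= words
        self.keywords[right_label] |= words


labeler = KeywordLabeler({
    "sports": {"game", "team", "score"},
    "politics": {"vote", "senate", "bill"},
})

doc = "the senate will vote on the stadium bill for the team"
print(labeler.label(doc))        # the algorithm's initial (mistaken) guess
labeler.correct(doc, "sports")   # the user fixes it through the interface
print(labeler.label(doc))        # the corrected model now agrees
```

Real systems in this space update statistical models (e.g., topic models or classifiers) rather than keyword sets, but the interaction pattern is the same: the interface exposes the algorithm’s decision, and the correction flows back into the model.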
Understanding the complex and increasingly data-intensive world around us relies on constructing robust empirical models, i.e., representations of real, complex systems that enable decision makers to predict behaviors and answer “what-if” questions. Today, building such models is largely a manual process requiring a team of subject matter experts and data scientists.
The Data-Driven Discovery of Models (D3M) program aims to develop automated model-discovery systems that let users with subject matter expertise but no data science background create empirical models of real, complex processes. This capability will free subject matter experts from depending on data scientists and will increase the productivity of expert data scientists through automation.