Poonam Kumari personal website

Poonam Kumari

PhD Candidate     Resume     LinkedIn     GitHub

Online Data Interaction Lab
University at Buffalo

Contact:
12 Flickinger Ct, Apt F, Amherst, NY 14228
poonamku at buffalo dot edu

Research

My research focuses on visualizing uncertainty in incomplete databases to help users make informed decisions. This page describes my current research directions.

I am fortunate to work with exceptional collaborators Oliver Kennedy, William Spoth, Gourab Mitra, and Lisa Lu.

Current Research

Visualizing uncertainty in incomplete databases

The process of going from a raw dataset to an analytical answer involves an extensive data preparation process where analysts identify problems or unexpected structural features of the data. Some 'optimistic' data analytics tools (e.g., Pandas) automate aspects of this process through simple heuristics optimized for the common case (e.g., ignore malformed source data). Ironically, this automation often requires more from users, as they must now manually inspect the source data for potential errors that might be obscured by the system's heuristics. In this paper, we explore the design of an interface that carries the benefits of both optimistic systems (i.e., easy access to answers assuming common-case heuristics are valid) and pessimistic systems (i.e., greater trust knowing that values shown to the user are error-free). We fit the resulting interface into a recently proposed data preparation and exploration system called Vizier, which links documentation (e.g., of potential errors) to fragments of affected data. This additional information reduces the analyst's workload by focusing attention on assumptions relevant to the analysis, but risks biasing any decision the user makes.

We explore how different visualization techniques affect perceived data quality, accuracy, and decision confidence through IRB-approved user studies and interviews with users and analysts.

Loki: Streamlining Integration and Enrichment

Data scientists frequently transform data from one form to another while cleaning, integrating, and enriching datasets. Writing such transformations, or "mapping functions," is time-consuming and often involves significant code re-use. Unfortunately, when every dataset is slightly different from the last, finding the right mapping functions to re-use can be equally difficult. In this paper, we propose "Link Once and Keep It" (Loki), a system that maintains a repository of datasets and mapping functions and relates new datasets to those it already knows about, helping a data scientist quickly locate and re-use mapping functions she developed for other datasets in the past. Loki represents a first step towards building and re-using repositories of domain-specific data integration pipelines.