Visualizing uncertainty in incomplete databases
Going from a raw dataset to an analytical answer involves extensive data preparation, during which analysts identify problems or unexpected structural features of the data. Some 'optimistic' data analytics tools (e.g., Pandas) automate aspects of this process through simple heuristics optimized for the common case (e.g., ignore malformed source data). Ironically, this automation often requires more from users, who must now manually inspect the source data for potential errors that the system's heuristics might obscure. In this paper, we explore the design of an interface that carries the benefits of both optimistic systems (i.e., easy access to answers, assuming the common-case heuristics are valid) and pessimistic systems (i.e., greater trust, knowing that values shown to the user are error-free). We fit the resulting interface into a recently proposed data preparation and exploration system called Vizier, which links documentation (e.g., of potential errors) to fragments of affected data. This additional information reduces the analyst's workload by focusing attention on the assumptions relevant to the analysis, but risks biasing any decision the user makes.
We explore how different visualization techniques affect perceived data quality, accuracy, and decision confidence through IRB-approved user studies and interviews with users and analysts.
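To make the contrast between optimistic and annotated loading concrete, the following is a minimal sketch using only the Python standard library. It is not Vizier's API or Pandas' behavior: the function names `load_optimistic` and `load_annotated`, the note format, and the three-row toy dataset are all assumptions made for illustration. The first loader silently drops a malformed row; the second applies the same heuristic but records which source line was affected, so that a derived answer can be shown together with the assumptions behind it.

```python
import csv
import io

# Toy data, invented for illustration: the second record has a malformed value.
RAW = """city,population
A,100
B,n/a
C,250
"""

def load_optimistic(text):
    """Parse what we can and silently drop rows whose population is not an integer."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((record["city"], int(record["population"])))
        except ValueError:
            continue  # common-case heuristic: ignore malformed source data
    return rows

def load_annotated(text):
    """Apply the same heuristic, but keep a note for every row it touched."""
    rows, notes = [], []
    # The header is line 1 of the source, so the first record is on line 2.
    for line_no, record in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            rows.append((record["city"], int(record["population"])))
        except ValueError:
            notes.append((line_no, f"skipped {record['city']}: "
                                   f"{record['population']!r} is not an integer"))
    return rows, notes

rows, notes = load_annotated(RAW)
total = sum(population for _, population in rows)
print(total)
for line_no, message in notes:
    print(f"line {line_no}: {message}")
```

Both loaders yield the same total. Only the annotated one lets an interface flag that the total rests on a skipped row, which is the kind of information whose presentation the user studies examine.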
- Poonam Kumari. Make Informed Decisions: Understanding Query Results from Incomplete Databases. Proceedings of the VLDB 2019 PhD Workshop.
- Poonam Kumari and Oliver Kennedy. The Good and Bad Data. Proceedings of the VLDB Endowment 2017.
- Poonam Kumari, Said Achmiz and Oliver Kennedy. Communicating Data Quality in On-Demand Curation. Proceedings of the 11th VLDB Workshop on Quality in Databases 2016, VLDB 2016.