Tool for nonstatisticians automatically generates models that glean insights from complex datasets

ScienceDaily | 1/15/2019 | Staff
Democratizing data science is the notion that anyone, with little to no expertise, can do data science if provided ample data and user-friendly analytics tools. Supporting that idea, the new tool ingests datasets and automatically generates the kind of sophisticated statistical models that experts typically use to analyze, interpret, and predict underlying patterns in data.

The tool currently lives on Jupyter Notebook, an open-source web framework that allows users to run programs interactively in their browsers. Users need only write a few lines of code to uncover insights into, for instance, financial trends, air travel, voting patterns, and the spread of disease.

In a paper presented at this week's ACM SIGPLAN Symposium on Principles of Programming Languages, the researchers show their tool can accurately extract patterns and make predictions from real-world datasets, and even outperform manually constructed models in certain data-analytics tasks.

"The high-level goal is making data science accessible to people who are not experts in statistics," says first author Feras Saad '15, MEng '16, a PhD student in the Department of Electrical Engineering and Computer Science (EECS). "People have a lot of datasets that are sitting around, and our goal is to build systems that let people automatically get models they can use to ask questions about that data."

Ultimately, the tool addresses a bottleneck in the data science field, says co-author Vikash Mansinghka '05, MEng '09, PhD '09, a researcher in the Department of Brain and Cognitive Sciences (BCS) who runs the Probabilistic Computing Project. "There is a widely recognized shortage of people who understand how to model data well," he says. "This is a problem in governments, the nonprofit sector, and places where people can't afford data scientists."

The paper's other co-authors are Marco Cusumano-Towner, an EECS PhD student; Ulrich Schaechtle, a BCS postdoc with the Probabilistic Computing Project; and Martin Rinard, an...
(Excerpt) Read more at: ScienceDaily