Analytic Sandboxes

Analytic Sandboxes offer the biggest gains available from In-Database Processing. So what is an Analytic Sandbox, anyway?

Simply put, the analytic sandbox is physical space carved out of the production enterprise data warehouse (EDW) reserved for business analysts to build, test and run ad-hoc analytics. Within the sandbox analysts are given developer access to create/drop tables & views. A semantic layer or view of production data might be pre-built within the sandbox, but typically business analysts build their own depending on the particular ad-hoc analytics they’re focused on. Initial gains from analytic sandboxes are speed. Business analysts no longer depend on massive data dumps to workstations and then building analytical data sets to run ad-hoc analytics. Literally, hours of effort drop to seconds. Aside from the in-database speed gained in building analytical datasets, there is also speed gained from in-database mining. The ability to join and aggregate data or to explore data relationships, interactions, distributions, outliers, etc. are much faster within a data warehouse than on a workstation. After all, data warehouses are designed for this type of work.

Another sandbox gain is the ability to audit data and analytic processes. First by migrating user spreadmarts or desktop data marts into the production sandbox. Business users notoriously accumulate rogue desktop data marts or spreadmarts. These datasets are collected from outside sources and stored in Excel, Access, SAS datasets or even flat files. They’re inefficient, loaded with data errors, produce analytics that can’t be audited and generate confusing reports . Migrating rogue data marts away from desktops and into the production sandbox doesn’t initially remove the problem; however, migrating data marts to the sandbox gives IT visibility to rogue datasets. Effort can then be placed into weighing the business value of these data marts and determining if and how to integrate them into the production EDW. A second auditing capability is with the actual analytic processes. Have you ever wondered how a report was generated but the explanation just made your eyes glaze over? As analytics migrate from the workstation to the sandbox, business rules and processes also migrate. Analytic rules are encoded into semantic views, SQL, SAS pass-through code or x-ref tables. Over time complete analytic processes can be observed and audited for accuracy and consistency by independent groups in IT or business — just as semantic layers for BI reporting & dashboards are audited.

Analytic sandboxes are not for all users and not for all business analytics. Traditional BI reports, dashboards, and most ad-hoc analyses belong to the production EDW reporting environment built for casual users of day-to-day reports. Similarly, few business users, and not all power users, would have access to the sandbox — at most 1% or 2% of all business analysts. The power user who also has analytic sandbox access has skills not normally identified with a typical business or IT analyst, but a mixture of both. He/she knows how the business operates and understands corporate strategic direction. He/she understands the data, data relationships and EDW table structures. He/she can follow an entity relationship diagram and is comfortable navigating through normalized structures. He/she is adept in SQL, in complex joins and aggregations and in create/drop table procedures. This analyst is accomplished in an analytic scripting language (e.g., BASE/SAS) and in a statistical package (e.g., SAS/STAT). Further knowledge of macros, pass-through facilities, stored procedures and UDFs help. Finally, they posses data management etiquette— these analyst can mine, manage and explore data within the sandbox without dimming the lights in the data center or impacting response times for the other 98%. Data Artist, rather than Data Scientist, might best describe this person, but this analyst also has solid mathematics, econometrics or statistics academic credentials.

There are many reasons for and even against building an analytic sandbox inside a production EDW. On the one hand, a poorly managed sandbox without sound workload management will culminate into a DBA’s nightmare and can lead to dramatic performance degradation for other EDW users. On the other hand a well managed sandbox preserves the single-view of data, minimizes rogue data marts and speeds up data mining and analytics. New analytics can be developed, tested and converted into production applications quickly, giving a wider user group access to newer analytics quickly. The process of analytics, from data mining to building analytical datasets to publishing final reports can be audited by both IT and business, giving everyone greater confidence in the analytics. Analytic Sandboxes become especially more valuable with big data: enterprise data volumes explode, data structures become more complex and data velocities increase.

Analytic Sandboxes are not for all organizations, but more and more, both large and small are adopting the sandbox solution to speed up and streamline their analytic processes.