A Small Business Guide to Self-Service Data Preparation

By: on July 25, 2016

When people talk about what makes self-service business intelligence (BI) tools different from their traditional counterparts, the focus is usually on interactive data visualizations for business users. However, data visualization is only part of the story.

The real technical challenge to enabling “DIY” analytics for business users resides in data integration and preparation, not visualization.

Small businesses frequently deal with multiple data sources that aren’t very well integrated. For instance, data about customers in a CRM system may be walled off from data about customers’ purchases in an accounting system, or separate locations may use separate databases.

In order to make informed decisions based on analysis of the business’s operations, the business’s data sources must be integrated. Otherwise, businesses will only be able to spot trends in a single data source (e.g., revenue dropped in Q3), as opposed to spotting trends across multiple data sources (revenue dropped in Q3 right around the time that branch stores one and two began having issues with late shipments from local distributors).

There are a number of options for self-service data integration that enable business users to mix and match data sources on the fly, instead of relying on the IT department to bridge data silos. Many are also useful for smaller organizations that are short on IT resources.

We’ll discuss these options and the many benefits businesses can achieve with them by answering the following questions:

(Click on a link below to jump to that section.)

How Does Self-Service Data Preparation Differ From Traditional Data Preparation?
What Are the Benefits of Self-Service Data Preparation for Small Businesses?
What Are the Most Useful Features of Data Preparation Tools for Small Businesses?
What Are the Different Types of Self-Service Vendors on the Market?

How Does Self-Service Data Preparation Differ From Traditional Data Preparation?

Traditionally, preparing data for analytics involved a time-consuming and complex process known as extract, transform and load (ETL), which required specific tools and specialized skills.

In the ETL process, the IT team extracts data from one or more sources, typically in a scheduled batch job, transforms it into a normalized format, and loads it into a centralized data warehouse. The warehouse stores the business’s historical data in ways that make it easy to generate standardized reports and dashboards that can be used throughout the organization.

Traditional, IT-governed ETL and Reporting


Traditional, IT-governed ETL and Reporting
 
Data warehouses remain a viable strategy for supporting analytics. Perhaps their greatest benefit resides in the fact that the data models that govern how they’re loaded into a data warehouse are centrally controlled by IT.

For instance, once the business settles on the attributes that meaningfully define a high-value customer for its purposes (i.e. at least one purchase within the last six months, purchase over X dollars etc.), IT uses these attributes to normalize data and to set up standardized reports that track the business’s key performance indicators (KPIs).

However, there are also three major issues with ETL and data warehousing:

  1. Data warehouses are inflexible. Once the data model has been defined and the data has been loaded into the warehouse, the paths for analyzing the data get frozen into place, limiting the number of potential insights that can be derived from it.
  2.  

  3. Only IT can prepare data. ETL tools aren’t designed for business users, even business analysts. Moreover, since one of the points of the data warehouse is to normalize the organization’s data in standardized formats, it makes organizational sense to task a single unit with preparing it.
  4.  

  5. Data warehouses emphasize reporting over ad hoc exploration. Data warehouses are architected to support scheduled reports or real-time dashboards created by the IT team. The rigidly structured ways in which they store data lend themselves to reports and dashboards that track pre-defined KPIs, but aren’t as well suited to exploratory analysis. For instance, the particular data attributes in which a business analyst is interested in order to answer a one-time inquiry from the CEO may not have been considered when the data was transformed into a normalized format and loaded into the warehouse.

Moreover, smaller organizations may simply lack the IT and budgetary resources to architect and manage data warehouses altogether.

This situation left small businesses in an analytics “stone age” before the emergence of self-service options for data preparation, since wringing insights from an Excel sheet is about as much fun as cutting animal hides without a knife.

And, without insights about customers’ purchasing habits, supply chain interruptions, new market opportunities, out-of-control expenditures and other issues that affect mission-critical operations, how will small businesses see the path ahead to growth?

As we’ve explained in a previous report, analytics is the key to achieving these insights. But, the first step to moving to data-driven decision making is ensuring that your data is prepared for analysis in the first place.

Self-service data preparation tools allow business users to blend data from various sources themselves, sometimes just by a few clicks or drag-and-drop gestures in the user interface.

Drag-and-drop data manipulation in Alteryx
 
According to Gartner, a research and advisory firm (the full report is available to Gartner clients), these tools enable “users to more easily mash up data from different data sources, load this combined data set into memory and provide an intuitive drilling experience through interactive visualization.”

This type of approach is well suited to both inexperienced business users from large organizations and small businesses that haven’t been able to attract top analytics talent.

Exploratory Analysis With Self-Service Data Preparation


Exploratory Analysis With Self-Service Data Preparation

Because the data set is accessed in memory (i.e., in RAM) instead of being read from disk-based storage, applications can keep up with the speed of the thought processes of their human users. The actions of the user (e.g., clicking on a segment of a pie chart to drill down) change how the data is processed in real-time, going beyond visualization and into exploration.

What Are the Benefits of Self-Service Data Preparation for Small Businesses?

The benefits of these tools are numerous and significant:

Benefits of self-service data preparation

Faster time to insight

 
A business user may have to wait weeks or even months for an IT team to integrate data sources, whereas self-service data preparation tools allow analytics to become a dialogue with data by allowing users to integrate data sources on the fly.

Eliminating the need for upfront data modeling

 
Moreover, with self-service tools, data models no longer limit analytical possibilities, which vastly expands the range of questions analysts can ask of data.

As Gartner explains:

“The traditional semantic layer-based architecture that dominated the market until recently required a formal data model and dimensional hierarchies. This architecture works well in a reporting use case, where the information is already well understood and modeled. Alternatively, the data discovery architecture is able to facilitate exploration of data sets (hence, the name) without requiring upfront modeling as a prerequisite.”

For small businesses without IT personnel and analysts trained in the complex discipline of data modeling, self-service tools can be a godsend.

Even though the costs for a single user license can be significant, you’ll quickly see the return on your investment if you compare the license price (which hovers at the low end around $1,000 per user and according to Gartner averages around $1,700) to the annual salary that a database administrator commands.

Drag-and-drop UI for business users

 
Self-service data preparation tools make data integration as easy as dragging or clicking on an icon in a user interface. They thus work well alongside or as components of visual analytics tools that are also designed for users without technical literacy in languages such as SQL.

Again, this is a major part of the appeal of data visualization tools both for large organizations aiming to extend analytics to more users and to small businesses without highly trained employees.
 


Expanding the range of data sources analysts can access and blend

 
By simplifying the work of integration, self-service data preparation tools make it possible for business analysts to work with new types of data sources such as massive data sets stored in Hadoop (a distributed architecture for storing and processing big data). Analysts can easily blend these sources with more familiar data sources such as customer or sales records to arrive at new insights.

What Are the Most Useful Features of Data Preparation Tools for Small Businesses?

As you might expect, data preparation solutions offer a broad range of capabilities. We’ve listed some of the most important in the following table:

Data mashups/blending Perform ad-hoc analyses blending together multiple types of data sources, or normalize multiple data sources into common formats for integrated analysis.
Interactive UI Combine data sources through an intuitive, visually oriented user interface.
Data discovery According to Gartner: Allow users to understand and organize data sources via “functionalities such as searching, sampling, profiling, cataloging/inventorying data assets and tagging/annotating data for future exploration, discovering/suggesting sensitive attributes, identifying commonly used attributes (such as geo-data, product ID)” etc.
Data modeling Eliminate the need for upfront modeling. Their UI and data discovery features can assist with the task of developing logical data models.

 

Modeling inventory data in Looker
 

What Are the Different Types of Self-Service Vendors on the Market?

Data preparation tools can exist as stand-alone solutions or as integrated components of broader analytics platforms.

As Gartner explains in the Market Guide for Self-Service Data Preparation for Analytics, “the integrated data preparation vendor offerings focus on enriching the data preparation capabilities as part of the end-to-end offering. Alternatively, the stand-alone vendor offerings focus on enabling tighter integration with the downstream processes, such as API access and support for multiple BI tools.”

This means that stand-alone products are more suited to businesses that have already heavily invested in analytics tools and that have in-house IT resources.

Stand-alone products are also more suited to businesses that need a broad range of capabilities to solve for niche data preparation issues.

Data preparation modules within integrated BI suites, on the other hand, are better options for businesses without existing platforms or the technical expertise for DIY application integrations via APIs.

Most smaller organizations should consider end-to-end solutions, as managing and integrating an arsenal of analytics tools in addition to a data preparation tool will likely overburden their technical resources.

The following table lists vendors offering integrated data preparation tools in end-to-end platforms.

Integrated Data Preparation in End-to-End BI Platforms

View profile
View profile
View profile
View profile
View profile

If you’re looking for stand-alone solutions, consider vendors such as Informatica, Paxata and Teradata.

You can also take a look at our comprehensive listing of end-to-end solutions for data preparation and visualization. We have user reviews, pricing details and product descriptions for more than 100 solutions.

Finally, if you’re still feeling lost, you can call one of our software advisors for a free consultation at (855) 998-8505. They can offer custom recommendations based on your budget, business size, deployment strategy (on-premise, cloud-based, virtualization etc.), industry vertical and functionality requirements.

You may also like:

What Is Data Blending, and Which Tools Make It Easier?

Traditional vs. Modern BI Buyer Trends Report

Pivot From Spreadsheet Anarchy to Analytics Governance

Compare end-to-end and standalone solutions for data preparation