Many online “guides” to data visualization tools just lump together solutions for business analysts, web developers, software engineers, graphic designers and others into huge lists.
That’s why we’ve created a guide to open source data visualization solutions specifically intended for data analytics, rather than graphic design or web design. We’ll also look at major vendors such as Tableau that offer free versions of their proprietary solutions.
We’ll see that open source solutions tend to be geared toward data science, whereas proprietary solutions are more suited to the needs of business analysts.
We’ll cover the following topics and solutions:
How Visual Analytics Go Beyond Mere Data Visualization
Tableau Public (Free)
ParaView (Open source/Free)
Gephi (Open source/Free)
Weave (Open source/Free)
Conclusions and Next Steps
How Visual Analytics Go Beyond Mere Data Visualization
In the business intelligence (BI) market, open source is often a highly complex laboratory environment for Fortune 500 companies. These companies have vast budgets to burn on projects that are situated at the bleeding edge of big data analytics.
Thus, you need to understand that in the BI market, “open source” is not a synonym for “free,” “easy” or “DIY.”
This is important to note, because analytics platforms are evolving in response to the need for greater ease of use. Vendors such as Tableau and Qlik have cornered the BI market by making analytics accessible to end users without much technical knowledge, as opposed to traditional BI solutions that could only be used by the IT department.
Ajay Chandramouly, Global Director of Analyst Relations at Tableau, explains that “To do BI in the past, you had to know how to script in SQL, or you had to rely on someone in IT who knew SQL to run those queries for you. Then you had to wait for the answer, and by the time you got it you had a new round of questions.”
Tableau’s graphical user interface (GUI) eliminates this work for the user, because it “translates drag and drop gestures into SQL queries on the back end.”
Chandramouly explains that this means that Tableau’s capabilities go beyond data visualization into the territory of “visual analytics, which doesn’t just involve churning out compelling charts and graphs in the back end, but also interacting with and querying the data using a visual interface in the front end.”
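To make the idea of translating drag-and-drop gestures into SQL more concrete, here is a minimal Python sketch. It is not Tableau’s implementation: the `Shelf` class, the `build_query` function and the `orders` table are hypothetical names invented for illustration. The sketch assumes the user has dropped zero or more dimensions and exactly one measure onto a chart.

```python
from dataclasses import dataclass, field


@dataclass
class Shelf:
    """Hypothetical record of the fields a user dragged onto a chart."""
    table: str
    measure: str                                          # field to aggregate
    dimensions: list[str] = field(default_factory=list)   # fields to group by
    aggregate: str = "SUM"                                # aggregation function


def build_query(shelf: Shelf) -> str:
    """Turn the drag-and-drop state into a SQL aggregation query string."""
    allowed = {"SUM", "AVG", "MIN", "MAX", "COUNT"}
    if shelf.aggregate not in allowed:
        raise ValueError(f"unsupported aggregate: {shelf.aggregate}")
    columns = shelf.dimensions + [f"{shelf.aggregate}({shelf.measure}) AS value"]
    sql = f"SELECT {', '.join(columns)} FROM {shelf.table}"
    if shelf.dimensions:
        # Each dimension on the chart becomes a GROUP BY column.
        sql += f" GROUP BY {', '.join(shelf.dimensions)}"
    return sql


# Simulate dragging "region" and "sales" onto a bar chart.
query = build_query(Shelf(table="orders", measure="sales", dimensions=["region"]))
print(query)
```

A real product would also have to handle filters, joins, identifier quoting and the SQL dialect of each data source, which is where much of the complexity hidden by the GUI lives.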
Visual analytics is where many open source data visualization products fall short. Preparing data for analysis and querying data are highly complex processes, especially if you’re connecting to complex data sources, such as a relational database used by your organization.
If a data visualization tool focuses on the visualization step of analysis and not the data preparation or processing steps, it probably won’t address your needs for a comprehensive analytics solution.
You can use a dedicated platform for analytics alongside a tool for creating charts and graphs. However, as Chandramouly explains, this takes you away from the dialogue with your data that solutions such as Tableau enable. This is because visual analytics enable users to “spend their time in the flow of asking and answering questions of their data,” rather than “spending time tweaking graphs and formatting.”
To make your search easier, we’re going to focus here on open source products and free offerings from vendors that do support advanced data preparation and processing. We’ll start with a look at Tableau Public, a free vendor offering, which is more suited to business usage than open source products are.
Tableau Public (Free)
Tableau is one of the biggest names in data analytics and visualization. Its offerings have helped define the market for self-service BI, i.e., BI that doesn’t require heavy assistance from an IT department.
Tableau is an enterprise solution used by Fortune 500 companies. A single license for the Professional edition is $1,999, which is quite costly for small businesses. Fortunately, Tableau also offers a free version of its desktop client, known as Tableau Public.
Tableau Public offers many of the same powerful visualization capabilities as the paid versions of Tableau. Users can analyze data from sources such as Excel sheets through geographical visualizations, Gantt charts, treemaps and other templates.
While Tableau Public offers robust data preparation and visualization features, you’ll need the Professional and/or Server edition in order to:
- Share interactive visualizations across your organization.
- Save files to your computer (Tableau Public only allows you to save files to your Tableau Public account, which is limited to 10 GB of storage).
- Connect to advanced data sources (Hadoop, Oracle databases, Microsoft SQL Server, SQL Server Analysis Services, etc.).
With Tableau Public, you can only connect to flat files such as Excel or .CSV spreadsheets. There are web data connectors for connecting to databases that publish in web formats such as HTML, but most businesses will need the Professional or Server editions to connect to their databases.
Tableau Public is, however, a great way to get started with visual analytics, especially if you’re focusing on flat files such as Excel workbooks. There are extensive training resources available for free on the Tableau Public site.
The biggest strength of Tableau Public resides in its user interface.
As Chandramouly explains, Tableau’s interface is the product of years of research led by the visualization guru Jock Mackinlay: “Mackinlay’s team led the effort in directly implementing visualization best practices into the product. His team figured out the best ways that the human mind can process and represent data visually.
“For example, it’s better to show contrasts in a blue-orange gradient than a red-green gradient, since 10 percent of the male population is colorblind and can see blue-orange contrasts more easily than red-green contrasts.”
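As a rough illustration of the blue-orange idea, the following Python sketch builds a small color ramp by linear interpolation in RGB space. The two hex values and the function names are assumptions chosen for this example, not colors or code taken from Tableau.

```python
def hex_to_rgb(color: str) -> tuple:
    """Convert '#rrggbb' into an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """Convert an (r, g, b) tuple back into '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def gradient(start: str, end: str, steps: int) -> list:
    """Return `steps` colors evenly spaced between `start` and `end`."""
    if steps < 2:
        raise ValueError("a gradient needs at least two steps")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start color, 1.0 at the end color
        colors.append(rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b))))
    return colors


BLUE = "#1f77b4"    # assumed blue endpoint
ORANGE = "#ff7f0e"  # assumed orange endpoint
palette = gradient(BLUE, ORANGE, 5)
print(palette)
```

Production palettes are usually designed in a perceptually uniform color space and often pass through a light neutral midpoint, so treat this as a starting point only.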
Let’s turn now to some open source alternatives for visual analytics. There aren’t many options, because most open source projects focus either on visualization libraries for web and software developers or on advanced analytics tools for data science tasks such as statistical modeling.
ParaView (Open source/Free)
ParaView is one of the most robust open source platforms for visual analytics. This is not surprising, given that the Los Alamos National Laboratory had a hand in its development.
ParaView is a powerful tool for seasoned data scientists. It supports distributed file storage and parallel processing for analytics on huge data sets at the petabyte scale. Moreover, the solution is platform-agnostic, so it can run on systems ranging from individual workstations to server clusters and supercomputers.
The visualization capabilities of ParaView are powerful enough for the solution to have found applications in data-intensive natural sciences that rely heavily on statistical modeling, such as fluid dynamics and astrophysics.
ParaView excels at creating animations as well as interactive visualizations from vast data sets. Additionally, users can extend the solution’s functionality through Python scripting.
While it does offer a rich GUI, the interface will be confusing to business analysts without quantitative backgrounds and experience with solutions such as MATLAB.
Gephi (Open source/Free)
Gephi excels at visualization and exploration of complex networks. The platform is used for analysis of both biological and social networks. It also handles link analysis, which is a more general form of network analysis that clarifies relations between object classes by treating them as nodes within a network.
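To show what treating objects as nodes within a network looks like in practice, here is a small Python sketch that computes degree centrality, one of the simplest network measures. It does not use Gephi; the names and the sample edge list are made up for illustration.

```python
from collections import defaultdict

# Hypothetical "who is connected to whom" records.
edges = [("Ana", "Ben"), ("Ana", "Cho"), ("Ben", "Cho"), ("Cho", "Dev")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbors (undirected graph)."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly linked to."""
    others = len(adjacency) - 1
    if others < 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / others for node, neighbors in adjacency.items()}


adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Tools such as Gephi compute measures like this at scale and pair them with interactive layouts, which is what makes the network structure easy to explore visually.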
The business value of Gephi resides in its advanced features for creating and interacting with network visualizations, particularly social networks. Given that some of the most successful data science initiatives at companies such as LinkedIn have focused on network analysis, data scientists should definitely give Gephi a spin.
One thing to note about Gephi is that it’s used to further explore existing graphs in standard graph file formats. While users can import spreadsheets, Gephi lacks extensive data integration and preparation capabilities. It’s definitely a niche solution rather than an all-purpose visualization platform.
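Because data preparation has to happen outside Gephi, a short script can turn raw interaction records into an edge list spreadsheet. The Python sketch below assumes that Gephi’s spreadsheet importer recognizes `Source`, `Target` and `Weight` column headers; confirm this against the documentation for your Gephi version. The sample records and the `to_edge_csv` function are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical raw records: one row per observed interaction.
interactions = [("Ben", "Ana"), ("Ana", "Ben"), ("Ana", "Cho")]


def to_edge_csv(pairs):
    """Collapse repeated undirected pairs into weighted edges, as CSV text."""
    # Sorting each pair makes ("Ben", "Ana") and ("Ana", "Ben") the same edge.
    weights = Counter(tuple(sorted(pair)) for pair in pairs)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


edge_csv = to_edge_csv(interactions)
print(edge_csv)
```

Writing `edge_csv` to a file with a `.csv` extension gives you something to load through the spreadsheet import mentioned above.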
Weave (Open source/Free)
Weave is an open source data visualization platform developed by the Institute for Visualization and Perception Research at the University of Massachusetts Lowell. Weave differs from ParaView and Gephi in that it intends to be an all-purpose visualization platform, spanning both business and scientific use cases.
Weave is cloud-based, meaning that users access the user interface within their browsers. Weave offers a number of visualization templates, ranging from standard bar charts and pie charts to thermometers and maps. It is particularly useful for geographic visualizations.
Like Gephi, Weave is somewhat limited in the data sources that it can be connected to. The most widely used formats it supports are .CSV and Excel files, but there are a few other options.
Conclusions and Next Steps
We’ve seen that open source visual analytics are a viable alternative for advanced data science applications.
You should consider open source if:
✓ You already have solutions for data preparation and processing in place.
✓ You employ data scientists comfortable with statistical methodologies for advanced analytics.
✓ Your IT personnel and/or data scientists are comfortable with extending and customizing the platform’s capabilities.
You should consider a free or paid version of a proprietary solution if:
✓ You need the visualization platform to assist with data preparation.
✓ You’re connecting to a variety of data sources, particularly relational and/or non-relational databases.
✓ Your end users will primarily be business analysts.
We’ve covered Tableau Public in this guide, but there are other proprietary offerings you may want to consider. We offer the following resources in this space: