The rise of citizen data science II – Embracing the role within organizations


In my previous blog, I explained that the scarcity of data scientists, the advent of data science and machine learning platforms (DSMLP) and the lack of business translators are making the role of the citizen data scientist very important. To embrace this role, organizations have to define the expertise, skills, tasks and leadership decisions needed to implement the role of citizen data scientists (CDSs) successfully. This blog addresses two questions:

1. What expertise is required?
2. What involvement from data and analytical leadership is required to reap the benefits from CDSs?

Kartik Patel (Gartner) pointed out that CDSs create models for predictive and prescriptive analytics, but they are not trained in computer science or data analysis. Their main job function therefore lies outside the field of analytics or statistics. With the help of advanced data technologies and tools, however, CDSs analyze the provided data and make data-driven business decisions.

CDS capabilities are a combination of basic business analyst and data scientist skills: CDSs understand statistics, but not at the level of a data scientist. For example, they are capable of:

• mining data for information to present findings;
• doing exploratory analysis and data visualization;
• applying programming knowledge and quantitative and data interpretation skills.
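To make the level of these skills concrete, here is a minimal sketch of the kind of exploratory summary a CDS might produce. It uses only the Python standard library; the sales figures, region names and the summarize helper are hypothetical and serve purely as illustration.

```python
from statistics import mean, median, stdev

# Hypothetical monthly sales per region (illustrative data only).
monthly_sales = {
    "north": [120, 135, 128, 150],
    "south": [90, 95, 110, 105],
}


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": round(stdev(values), 2),  # sample standard deviation
    }


# One summary per region, ready to be presented or visualized.
summary = {region: summarize(values) for region, values in monthly_sales.items()}

for region, stats in summary.items():
    print(region, stats)
```

In practice an augmented analytics platform would generate such summaries automatically; the point of the sketch is only that a CDS should be able to read and question them.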

Besides the technical abilities described above, problem-solving skills are key for CDSs. Understanding the business problem and specifying the model that solves it is crucial before using an augmented analytics platform* to implement the model. In other words, knowing how to tackle and eliminate problems enables CDSs to support the organization in demonstrating the business value of data.

*Augmented analytics is the use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation and insight explanation to augment how people explore and analyze data in analytics and BI platforms (From Gartner Glossary).

How data and analytical leaders reap the benefits

CDSs can give organizations an efficient boost in the artificial intelligence (AI) and machine learning (ML) domains, but their success is highly dependent on leadership encouragement. Organizations that promote the role as a legitimate way to close the data science gap enable themselves to create better analytics products. According to Gartner, data and analytics (D&A) leaders should take the following four actions to make citizen data scientists successful within the organization.

1.   Build a CDS compatible ecosystem

Professionals do not become citizen data scientists just by having the right knowledge and skills, nor can they succeed while working in an isolated environment. They have to be able to access data, apply ETL (Extract, Transform, Load) and process the data in order to conduct a robust and advanced analysis. A holistic ecosystem (including people, data, processes and tools) must be leveraged to empower citizen data scientists to add value to organizations.

In other words, a comprehensive ecosystem should provide complementary roles (the people component), such as business translators, developers, data engineers and machine learning architects, who support citizen data scientists in getting the right insights out of data. Constructive cooperation between these complementary roles and citizen data scientists closes the skills gap within the organization.

Moreover, to account for the process component of the ecosystem and to build strong data governance, leadership must have basic data literacy in place so that it understands the data context and needs of the organization before setting data policies. Data literacy is a prerequisite for governing data and paves the way for the citizen data scientist role. Efficient processes are also a must for a motivated community of complementary roles: well-defined processes smooth the flow of analytics content creation and the fulfillment of customer demand.

2.   Embed augmented analytics and add capabilities gradually

The capabilities of analytical tools should be extended regularly and in small steps, rather than through sudden, large additions. This means that D&A leaders should make sure tools do not become outdated. Gradually extending the tools used by citizen data scientists as new capabilities become available prevents CDSs from being overwhelmed by huge changes in tool capabilities. Such an abrupt switch could create inefficiencies.

To meet this requirement, the added capabilities should bridge the gap between the skills of citizen data scientists and the skills of the other complementary roles. For example, the tools can help CDSs to boost their abilities in data storytelling, feature engineering, direct querying using natural language queries, operationalization of analytics models, and so on.

In addition, it is suggested that an augmented analytics workflow be added to the current toolkits of CDSs. As mentioned in the previous blog, data science and machine learning platforms (DSMLP) are tools that automate augmented analytics. A DSMLP can facilitate an augmented analytics workflow that includes augmented data preparation, augmented data discovery and augmented data science, as shown in the visual below.

In other words, the synergy between CDSs and augmented analytics tools makes both more powerful: their combined effect is greater than the sum of their separate effects. Augmented analytics tools do not by themselves produce better, problem-solving analytics products. It is with the capabilities of CDSs that organizations become able to leverage these tools successfully.

3.   Involve CDS in the business

Empowering CDSs means allowing them to participate in projects, so that they create value for the organization. D&A leaders should take four issues into account when they open up business projects to CDSs:

► Prioritization: Prioritize projects that relate to existing business processes or products and that address well-known, broad opportunities.
► Communication: Share the developed models and the analytical results.
► Utilization: Utilize created insights and avoid the creation of shelf work.
► Announcement: Announce and introduce transformational projects into the mix, working in close collaboration with expert data scientists.

4.   Facilitate collaboration between citizen data scientists and expert data scientists

Citizen data scientists are never substitutes for expert data scientists. Data scientists leverage CDSs for simpler tasks and model development processes, and automate those tasks where possible. Data scientists themselves focus on complex tasks and on validating the models developed by CDSs before these are put into production.

Therefore, D&A leaders should build and facilitate efficient and suitable collaboration channels between expert and citizen data scientists, and enable all roles to be involved across the analytical process in a collaborative way.

To summarize, embracing CDSs in an organization requires more than having CDSs with sufficient capabilities: D&A leaders should also provide a holistic ecosystem and infrastructure that helps the role to be accepted by the organization and to create value for the business.