Author: Antoine Grignard
Many experts repeatedly mention that “data is the new oil.” The idea behind this catchphrase is that data can be a major asset for companies. The equation is simple: data must contribute to profit growth, either by boosting revenues or by cutting costs. Like oil, raw data produces little value on its own and requires a defined value chain. This process consists primarily of data processing, storage, governance, and use.
This article examines a crucial step in value creation that starts with the question: "how can I boost the value of my data by adding external sources?" Data Enrichment is the answer.
But what is data enrichment?
Understanding clients with socio-demographic data, mitigating real estate risk with geospatial data, and segmenting potential leads with revenue estimates are all examples that rely on data enrichment. In other words, data enrichment is the process of integrating new information into an existing database.
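Concretely, this integration often boils down to joining external attributes onto internal records by a shared key. A minimal sketch with pandas, assuming a hypothetical customer table and an external socio-demographic source both keyed on postal code (the column names and values are illustrative, not from any real provider):

```python
import pandas as pd

# Internal customer records (hypothetical columns)
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "postal_code": ["1000", "1050", "9000"],
})

# External socio-demographic source keyed on postal code
demographics = pd.DataFrame({
    "postal_code": ["1000", "1050"],
    "median_income": [42000, 55000],
})

# Enrichment = merging external attributes onto existing records.
# A left join keeps every customer, even when no external match exists.
enriched = customers.merge(demographics, on="postal_code", how="left")
```

The left join matters: records without an external match survive with missing values rather than silently disappearing, which keeps the primary dataset intact.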
The first step towards data enrichment is defining the business goals: what do you want to achieve with your data (e.g., more personalised customer service, a more efficient sales process, …)? The benefits of data enrichment are many and varied, so it is essential to define clear objectives.
The next step is to identify what is required (e.g., cleaning the data, adding datapoints, …). The essence of a data enrichment strategy is making a business impact by improving data quality, which means cleaning existing elements or adding new information. The resulting augmented dataset generates added value for the company.
The third step is to define the information required to meet the business objectives, as many external sources can complement and enrich your primary data. In an ideal world, the information available matches your requests perfectly, but this is rarely the case. For this reason, evaluating data sources allows you to select the most appropriate data provider.
The following elements form the basis for a consistent evaluation:
The price of the data determines direct acquisition costs. Evaluating the different pricing models is essential: a price per qualification or a bucket fee can result in significantly different costs depending on the volume of information needed.
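The volume effect is easy to quantify. A short sketch comparing the two pricing models named above, with purely hypothetical prices (€0.05 per record versus €400 per bucket of 10,000 records):

```python
def per_record_cost(records, price_per_record):
    """Pay-per-qualification: cost scales linearly with volume."""
    return records * price_per_record

def bucket_cost(records, bucket_size, bucket_price):
    """Bucket pricing: you pay for whole buckets, even partially used ones."""
    buckets = -(-records // bucket_size)  # ceiling division
    return buckets * bucket_price

# Hypothetical prices for illustration only
for volume in (1_000, 50_000, 500_000):
    print(volume,
          per_record_cost(volume, 0.05),
          bucket_cost(volume, 10_000, 400))
```

Under these assumed prices, per-record pricing is cheaper at low volumes (one partially used bucket costs €400 either way), while bucket pricing wins at scale, which is exactly why the expected volume should drive the model choice.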
It can be tricky to find a perfect match for the information defined. In most cases, proxies are used to get close to the desired criteria, and accuracy can vary from one source or proxy to another. For instance, a proxy for company maturity can combine the founding year and the number of employees. Several alternatives should be investigated and compared.
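The company-maturity proxy mentioned above could be sketched as a simple scoring rule. The thresholds below are illustrative assumptions, not industry standards; the point is that a proxy encodes judgment calls that should be validated against known cases:

```python
from datetime import date

def maturity_proxy(founding_year, employee_count, today=None):
    """Rough maturity label from two proxies: company age and headcount.

    Thresholds are hypothetical, chosen only to illustrate the idea
    of approximating a hard-to-observe criterion with available data.
    """
    today = today or date.today()
    age = today.year - founding_year
    if age >= 10 and employee_count >= 250:
        return "established"
    if age >= 3 or employee_count >= 50:
        return "growing"
    return "early-stage"
```

Comparing how different proxy definitions classify a sample of companies you already know well is a cheap way to evaluate a source's accuracy before committing to it.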
Evaluating the scalability of the source is also a key element, especially when large volumes are involved. Update frequency and response time can both impact scalability.
The integration method (API, CSV, SQL query, …) directly impacts data processing. An automated process limits manual input, saving time and increasing efficiency.
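An automated pipeline of this kind can be sketched in a few lines. Everything here is an assumption for illustration: the provider endpoint, the response fields, and the injected `fetch` callable (which stands in for a real HTTP client, so the pipeline stays testable without network access):

```python
import csv
import io

# Hypothetical provider endpoint; the URL and returned fields are
# illustrative assumptions, not a real API.
API_URL = "https://api.example.com/enrich?postal_code={code}"

def enrich_rows(rows, fetch):
    """Merge provider attributes into each row.

    `fetch` is injected (e.g. an HTTP call in production, a stub in
    tests), keeping the pipeline logic independent of the transport.
    """
    for row in rows:
        row.update(fetch(row["postal_code"]))
    return rows

def enrich_csv(csv_text, fetch):
    """Read CSV text, enrich every row, and return the enriched CSV."""
    rows = enrich_rows(list(csv.DictReader(io.StringIO(csv_text))), fetch)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Separating the fetch step from the merge step is a deliberate design choice: swapping CSV drops for API calls then only means replacing `fetch`, not rewriting the pipeline.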
The legal and licensing aspects also need to be considered: usage rights, redistribution restrictions, and privacy regulations can all constrain how external data may be used.
Evaluating potential security threats helps prevent the corruption or disruption of your organisation’s systems. For example, a pre-built data-collection script found on a forum could hide malicious content.