Author: Antoine Grignard
Many experts repeatedly mention that “data is the new oil.” The idea behind this catchphrase is that data can be a major asset for companies. The equation is simple: to count as an asset, data must result in profit growth, either by boosting revenues or by decreasing costs. Like oil, raw data produces little value on its own and requires a defined value chain. This chain consists primarily of data processing, storage, governance, and use.
This article examines a crucial step in value creation that starts with the question: "How can I boost the value of my data by adding external sources?" Data enrichment is the answer.
But what is data enrichment?
Understanding clients with socio-demographic data, mitigating real estate risk with geospatial data and segmenting potential leads with revenue estimates are all examples that rely on data enrichment. In other words, it is the process of integrating new information into an existing database.
The first step towards data enrichment is defining the business goals: what do you want to achieve with your data (e.g., more personalised customer service, a more efficient sales process, …)? The benefits of data enrichment are many and varied, so it is essential to define clear objectives.
The next step is to identify what is required (e.g., cleaning the data, adding datapoints, …). Making an impact on the business by improving data quality is the essence of a data enrichment strategy. Improving data quality consists of cleaning existing elements or adding new information. This augmented dataset generates added value for the company.
The third step is to define the information required to meet the business objectives, as many external sources can complement and enrich your primary data. In an ideal world, the information available matches your requests perfectly, but this is rarely the case. For this reason, evaluating data sources is what allows you to select the most appropriate data provider.
The following elements form the basis for a consistent evaluation:
The price of the data determines direct acquisition costs. Evaluating the different pricing models is essential: a price per qualification or a bucket fee can result in significant differences depending on the volume of information needed.
It can be tricky to find a source that perfectly matches the information you defined. In most cases, proxies are used to get close to the desired criteria. The accuracy can vary from one source or proxy to another. For instance, the start-up year combined with the number of employees can serve as a proxy for company maturity. Different alternatives should be investigated.
Evaluating the scalability of the source is also a key element, especially when large volumes are involved. Update frequency and response time can both impact scalability.
The integration method (API, CSV, SQL query…) directly impacts data processing. An automated process limits manual input, saving time and increasing efficiency.
The legal and licensing aspects also need to be considered.
Evaluating potential security threats helps prevent your organisation’s systems from being corrupted or disrupted. For example, a pre-built script found on a forum and used to collect data could hide malicious code.
Analysing the similarities between data sources allows you to limit overlap and optimise costs and data processing.
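To make the pricing-model criterion above concrete, here is a minimal sketch comparing a price per qualification with a bucket fee at different volumes. The function names and all prices are illustrative assumptions, not figures from any real data provider.

```python
def per_record_cost(volume: int, price_per_record: float) -> float:
    """Total cost when every qualified record is billed individually."""
    return volume * price_per_record


def bucket_fee_cost(volume: int, fee: float, records_per_bucket: int) -> float:
    """Total cost when a flat fee buys a fixed bucket of records.

    Buckets are assumed to be sold whole, so the volume is rounded up
    to the next full bucket.
    """
    buckets = -(-volume // records_per_bucket)  # ceiling division
    return buckets * fee


# Illustrative prices only (assumptions, not real provider rates).
for volume in (1_000, 10_000, 50_000):
    per_record = per_record_cost(volume, price_per_record=0.05)
    bucket = bucket_fee_cost(volume, fee=400.0, records_per_bucket=10_000)
    cheaper = "per record" if per_record < bucket else "bucket fee"
    print(f"{volume:>6} records: per record {per_record:8.2f}, "
          f"bucket fee {bucket:8.2f} -> {cheaper}")
```

With these assumed prices, per-record billing is cheaper for a small pilot, while the bucket fee wins once the volume fills most of a bucket. The break-even point depends entirely on the provider's actual terms.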
Finally, proper evaluation and correction of parameters are necessary throughout the project.
At BrightWolves, we support clients across various sectors on their data enrichment journey, for a broad range of use cases.
For instance, we collaborated with a company active in the food and beverage industry. The sales representatives complained that the poor quality of the target lists harmed prospecting. Closed, misnamed or poor-quality restaurants made the process long and inefficient. Considering these shortcomings, we defined data quality improvement and qualitative criteria collection as the key drivers for limiting visit time and targeting higher quality leads. To achieve these objectives, we selected ratings and reviews as the qualitative criteria and opening status as a cleaning parameter. We reviewed several data sources such as Facebook Pages, Yelp and Google Places. The latter provides information about restaurants with good coverage at a reasonable cost. In addition, its API connector facilitates data integration automation and ensures scalability for larger volumes. After enriching the dataset, the project cleaned up 10% of the prospect list and qualified 85% of the restaurants based on ratings and reviews, optimising the sales representatives' visit time and increasing the strike rate.
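The cleaning and qualification logic described in this case can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual implementation: the field names (`place_id`, `is_open`, `rating`), the sample records and the rating threshold are all hypothetical, and the external records stand in for whatever a provider would return.

```python
# Hypothetical prospect list; "place_id" is an assumed join key.
prospects = [
    {"place_id": "a1", "name": "Chez Marie"},
    {"place_id": "b2", "name": "Pasta Corner"},
    {"place_id": "c3", "name": "Le Vieux Port"},
    {"place_id": "d4", "name": "Sushi Bar 21"},
]

# Hypothetical external records, keyed by the same identifier.
external = {
    "a1": {"is_open": True, "rating": 4.5},
    "b2": {"is_open": False, "rating": 3.9},
    "c3": {"is_open": True, "rating": 3.2},
    # "d4" has no match in the external source.
}


def enrich(prospects, external, min_rating=4.0):
    """Drop prospects reported as closed and attach rating information."""
    enriched = []
    for prospect in prospects:
        extra = external.get(prospect["place_id"])
        if extra is not None and not extra["is_open"]:
            continue  # cleaning step: remove closed restaurants
        row = dict(prospect)
        row["rating"] = extra["rating"] if extra else None
        # Qualification step: flag well-rated restaurants as priority leads.
        row["priority"] = extra is not None and extra["rating"] >= min_rating
        enriched.append(row)
    return enriched


for row in enrich(prospects, external):
    print(row)
```

Prospects without a match are kept but left unqualified, so the sales team can still see them. Whether to keep or drop unmatched records is a design choice that depends on the business goal.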
Another example is a project we conducted at an international private equity fund. The company's objective was to map its target market to find the best investment opportunities. Market mapping was achieved by selecting the key criteria for segmenting and prioritising the candidates. We faced two main challenges during this case: (i) data acquisition was expensive and (ii) the data sources overlapped to varying degrees. The evaluation of data sources required access to the content, which was not freely available. For this reason, we conducted a first pilot phase in a specific and small area. This sample allowed us to compare sources at a limited cost. During this project, we combined 5+ data sources and enriched the dataset with additional web scraping. The output was a clean list of a few thousand high-quality investment opportunities.
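One simple way to compare sources on a pilot sample is to measure how many records they share. The sketch below uses the Jaccard index on normalised company names. The matching rule (lowercasing and collapsing whitespace) and the sample names are assumptions for illustration. Real projects usually need stronger matching, such as registration numbers or fuzzy matching.

```python
def normalise(name: str) -> str:
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def overlap_report(source_a, source_b):
    """Count shared and unique records between two lists of names."""
    a = {normalise(name) for name in source_a}
    b = {normalise(name) for name in source_b}
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared records divided by all distinct records.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical pilot samples from two providers.
provider_a = ["Acme Tools", "Nordic Pumps", "Delta Foods"]
provider_b = ["acme  tools", "Delta Foods", "Omega Labs", "Zeta Print"]

print(overlap_report(provider_a, provider_b))
```

A high Jaccard index suggests two sources are largely redundant, so paying for both adds little. A low one suggests they complement each other.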
Are you currently considering a data enrichment opportunity? Would you like to better understand what is feasible for your company and how BrightWolves can help you in this process?