Making the move from paper to digital documents (2/2)

Author: Simon Knudde


In the previous article, we discussed how search engines changed the way we consume information. We then looked at the digitalisation of corporate documentation, and more specifically at Intelligent Character Recognition (ICR), which enables the move from a paper-based to a digital way of working. In this second article, we complete the journey by looking at how to implement a search engine.

ICR is only the first step of your digitalisation: your documents are now in a database, but you still have to search through them yourself. Fortunately, other open-source tools can cover the rest of what you need, namely a search engine such as Elasticsearch. Elasticsearch is built on Apache Lucene, an open-source search library that powers many of the search engines you’ll find across the web (but no, not Google). It is designed for keyword search with extensive customisation: single-word and multi-word queries, typo tolerance (a search for “hllo” will also return documents containing “hello”), and many other filtering features.
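
To make typo tolerance more concrete, here is a minimal Python sketch of the underlying idea: an inverted index (each word mapped to the documents that contain it) combined with edit-distance matching, so that “hllo”, which is one edit away from “hello”, still finds the right document. This is an illustration under simplifying assumptions, not how Elasticsearch or Lucene implement it: the function names and sample documents are hypothetical, it uses plain Levenshtein distance, and it compares the query against every indexed word, whereas production engines rely on far more efficient data structures.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical inverted index: each lower-cased word maps to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def fuzzy_search(index: dict[str, set[str]], query: str, max_edits: int = 1) -> set[str]:
    """Return IDs of documents containing a word within max_edits of the query term.
    Brute force over all indexed words, for illustration only."""
    term = query.lower()
    hits: set[str] = set()
    for word, doc_ids in index.items():
        if levenshtein(term, word) <= max_edits:
            hits |= doc_ids
    return hits


docs = {
    "doc-1": "hello and welcome to the supplier contract",
    "doc-2": "annual report on contract renewals",
    "doc-3": "help desk procedures",
}
index = build_index(docs)
print(sorted(fuzzy_search(index, "hllo")))  # ['doc-1']
```

With `max_edits=1`, “hllo” matches “hello” but not “help”, which is two substitutions away; raising the threshold makes the search more forgiving at the cost of more false matches.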


Why is this important to mention? Because it shows that a custom search engine can meet your specific needs and, for internal documentation, perform better than Google’s own algorithm. Yes, you read that right! When you use Google, you typically need one relevant answer, and one is enough. With internal documentation, your needs vary: do I need just one document, or all documents matching a certain criterion? How many filters do I want to search on? The answers to these questions determine how you configure your internal search engine. It will probably not be a single search bar, but rather a form with several topic-specific fields and filters that you can use, or ignore, depending on your needs at the moment. In short, while you can build a search engine for your internal documentation from open-source technology, you need to think carefully about your use cases to get the most out of it.


Let’s make it specific. Imagine that new regulations affect your business: to stay compliant, you need to update contracts and internal processes. A wide range of documents will have to be modified, and you will have to notify the affected clients. How do you make sure you don’t miss any of them? A Google-style search would give you millions of results, starting with the most relevant. But where does relevance stop? How can you be sure you have retrieved all the affected documents, without also modifying documents that are related but not affected? This is where the careful design of your search engine makes the difference: with appropriate tags and filters, you can retrieve all of the relevant documents, and only those.
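
The difference between ranked relevance and exhaustive retrieval can be sketched in a few lines of Python. In this minimal example, each document carries metadata assigned during indexing; the field names (`doc_type`, `client`, `tags`), the tag value `data-retention` and the sample documents are all hypothetical. Instead of returning the top-ranked hits, the filter returns every document that meets all criteria, which is the guarantee you need when a regulation changes. In Elasticsearch, the comparable mechanism would be non-scoring filter clauses inside a `bool` query, but the sketch uses plain Python rather than the Elasticsearch API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    doc_id: str
    doc_type: str          # hypothetical field, e.g. "contract" or "process"
    client: Optional[str]  # hypothetical: the client a document concerns, if any
    tags: set[str] = field(default_factory=set)  # hypothetical topic tags set at indexing time


def filter_documents(docs: list[Document], doc_types: set[str],
                     required_tags: set[str]) -> list[Document]:
    """Exact filtering: return every document whose type is allowed and that
    carries all required tags. Nothing is ranked or truncated."""
    return [d for d in docs
            if d.doc_type in doc_types and required_tags <= d.tags]


docs = [
    Document("C-001", "contract", "Acme", {"data-retention", "gdpr"}),
    Document("C-002", "contract", "Globex", {"pricing"}),
    Document("P-017", "process", None, {"data-retention"}),
    Document("N-104", "newsletter", None, {"data-retention"}),
]

affected = filter_documents(docs, {"contract", "process"}, {"data-retention"})
print([d.doc_id for d in affected])  # ['C-001', 'P-017']

# Clients linked to affected documents are the ones to notify.
clients_to_notify = sorted({d.client for d in affected if d.client})
print(clients_to_notify)  # ['Acme']
```

Note that the newsletter N-104 carries the same tag but is excluded by the document-type filter: it is related to the topic, yet it is not one of the documents you need to modify.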


The best advice is to audit your internal needs carefully and define your data strategy before implementing a document management system, be it an open-source system like the one described here or a more complete software suite from a vendor. Commercial tools typically cover what we discussed and add further, often industry-specific, features such as contract generation, topic identification, and date identification to generate notifications.


I hope these articles were insightful, and I wish you a fulfilling digitalisation journey and plenty of value creation. Feel free to contact me if you would like to chat about the possibilities for your organisation.


If you want to explore more possibilities of Intelligent Automation, have a look at our video series!