Smart Archive Generation Using Computer Vision, NLP and Big Data

Title

Smart Archive Generation Using Computer Vision, NLP and Big Data

Subject

Decision making
Big data
Digital storage
Long short-term memory
Image enhancement
Data Analytics
Optical data processing
Knowledge management
Computer vision
Optical character recognition

Description

The high complexity of E&P projects results in a dense network of interrelated documents that are produced to cover the various aspects and details of each project. Gaining insights from this legacy data requires experience, knowledge, and awareness that the required data exists. Accordingly, the knowledge accumulated over time from various projects can be considered a key asset, since it can be leveraged to make more informed decisions. This paper presents a framework that aims to capture organizational knowledge locked in paper-based datasets and store it in a structured digital format that facilitates its retrieval and enables analyses that help uncover valuable insights. The framework aims to facilitate the decision-making process in less time and at lower cost, without sacrificing the accuracy of the data, while decreasing the probability of human error. This research aims to generate valuable data from existing archives while causing minimal disturbance to existing business processes and workflows. The framework performs four main functions: image processing, text recognition, data analytics, and data storage. First, the text recognition module performs image processing to enhance the quality of the scanned files, followed by optical character recognition using LSTM to extract the text contained in the images. The data analytics module then cleanses and mines the extracted text using big data analytics tools. Text matching and searching is performed on the Spark DataFrame using regular expressions to identify the different attributes and their types. Finally, the data is stored in a SQL database. To measure the workflow's accuracy, a manual baseline was generated for a sample project. Accuracy is measured using field-level verification, which was found to be the most fit for purpose because it measures the accuracy of the workflow at the level of each individual field. Copyright 2021, Society of Petroleum Engineers
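The regular-expression matching and field-level verification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python `re` stands in for the Spark DataFrame operations, and the field names (`well_name`, `depth_m`, `report_date`), the patterns, and the sample text are all hypothetical, since the record does not list the actual attributes or expressions.

```python
import re

# Hypothetical attribute patterns. The record does not list the real ones.
PATTERNS = {
    "well_name": re.compile(r"Well\s*Name\s*[:\-]\s*([A-Za-z0-9\-]+)", re.IGNORECASE),
    "depth_m": re.compile(r"Depth\s*[:\-]\s*(\d+(?:\.\d+)?)\s*m\b", re.IGNORECASE),
    "report_date": re.compile(r"Date\s*[:\-]\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_fields(text):
    """Return one value per attribute, or None when the pattern does not match."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def field_level_accuracy(extracted, baseline):
    """For each field, the fraction of documents whose extracted value
    equals the manually produced baseline value."""
    if not baseline or len(extracted) != len(baseline):
        raise ValueError("extracted and baseline must be non-empty and equal in length")
    scores = {}
    for field in PATTERNS:
        correct = sum(
            1 for got, expected in zip(extracted, baseline)
            if got.get(field) == expected.get(field)
        )
        scores[field] = correct / len(baseline)
    return scores


# Hypothetical OCR output for two scanned pages; the second contains a
# typical OCR confusion (letter O instead of digit 0) in the depth value.
ocr_texts = [
    "Well Name: AB-12\nDepth: 2350.5 m\nDate: 2019-03-07",
    "Well Name: CD-7\nDepth: 18O0 m\nDate: 2020-11-30",
]

# Hypothetical manual baseline for the same two pages.
manual_baseline = [
    {"well_name": "AB-12", "depth_m": "2350.5", "report_date": "2019-03-07"},
    {"well_name": "CD-7", "depth_m": "1800", "report_date": "2020-11-30"},
]

extracted = [extract_fields(text) for text in ocr_texts]
scores = field_level_accuracy(extracted, manual_baseline)
```

Reporting one score per field, rather than a single document-level score, shows which attributes are affected by OCR errors, which is in line with the abstract's rationale for choosing field-level verification.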

Creator

Marzouk, Mohamed Mahdy
ElZahed, Mahmoud Mohamed

Publisher

2021 Abu Dhabi International Petroleum Exhibition and Conference, ADIP 2021, November 15, 2021 - November 18, 2021

Date

2021

Type

conferencePaper

Identifier

10.2118/207365-MS

Citation

Marzouk, Mohamed Mahdy and ElZahed, Mahmoud Mohamed, “Smart Archive Generation Using Computer Vision, NLP and Big Data,” Lamar University Midstream Center Research, accessed May 4, 2024, https://lumc.omeka.net/items/show/29404.
