D-Tale for Fast and Easy Exploratory Data Analysis of Well Log Data

Author:Murphy  |  View: 28349  |  Time: 2025-03-23 19:50:31
Image by Photo Mix from Pixabay

Exploratory Data Analysis (EDA) can be a time-consuming but crucial part of the data science and machine learning workflow. It is through this process that we become familiar with our datasets, understand their contents, get an overview of their statistics and much more. On many projects, this is the stage where we spend the majority of our time; in some cases, it can take up to 90% of the available project time.

When carrying out EDA in Python, we often rely on libraries like pandas and matplotlib to explore our data. Oftentimes, this can lead to writing significant pieces of code to get a plot to display how we want it to. For example, creating well log plots with matplotlib takes time to work out and display the data the correct way.

There are several libraries out there for Python which can speed up the EDA phase of a project significantly. One of these is D-Tale.

D-Tale is a powerful exploratory data analysis Python library that makes it easy for you to interactively view, analyse, and edit data contained within pandas dataframes. If you want to explore the functionality of D-Tale without downloading it, check out this live example.

To get started with D-Tale on your own system, install it by running the following command in your terminal:

pip install dtale

If you are using Anaconda, you can install it from the conda-forge channel instead:

conda install dtale -c conda-forge

Importing Libraries and Loading Data

The first step with any Python project is to import the libraries we will be working with. In this case, all we need is pandas to load our data from a CSV file and D-Tale to carry out the analysis.

import pandas as pd
import dtale

Once the libraries have been imported, we can import our data. For this example, we will be using well log measurements that have been acquired from numerous oil and gas wells off the Norwegian coast. The dataset we are using for this tutorial is a subset of a training dataset used as part of a machine learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). It is released under an NLOD 2.0 licence from the Norwegian Government, details of which can be found here: Norwegian Licence for Open Government Data (NLOD) 2.0. The full dataset can be accessed here.

To read the data, we call upon the following:

df = pd.read_csv('Data/Xeek_Well_15-9-15.csv')

Loading Data into D-Tale

Once the data has been loaded, we can start exploring the dataset. Normally, when working with data, we would use several pandas functions. With D-Tale, however, all we have to do is call the following:

dtale.show(df)

Once we call dtale.show(df), we will be presented with an interactive data table similar to the one below. This provides a much friendlier experience than the basic pandas functions we are used to.

Interactive dataframe created by D-Tale of well log measurements. Image by the author.
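The call above can also be kept as a handle for later use. The following is a minimal sketch, assuming D-Tale is installed; the helper name launch_dtale and the three-row dataframe are made up for illustration and simply stand in for the well log CSV.

```python
import pandas as pd


def launch_dtale(frame: pd.DataFrame):
    """Start a D-Tale session for `frame` and return the instance handle."""
    # Imported inside the function so this sketch still loads if dtale is missing.
    import dtale

    return dtale.show(frame)


# Made-up stand-in for the real well log data (column names taken from the article).
demo = pd.DataFrame({
    "WELL": ["WELL-A", "WELL-A", "WELL-B"],
    "DTC": [85.0, 90.0, 157.0],
})
```

Calling d = launch_dtale(demo) should start the session. From there, d.open_browser() opens the same grid in a browser tab and d.kill() shuts the session down once you are finished.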

In the top left of the display, we can see two numbers. The bottom one (17717) represents the number of rows present in the dataframe and the right one (12) represents the number of columns.

D-Tale row and column count info. Image by the author.

If we click on the triangle that points to the right, we will see the following menu, which is packed with features to make data analysis easier.

Menu options contained within D-Tale. Image by the author.

As there are so many options available within D-Tale, we are only going to focus on a few of them.

Data Summaries with D-Tale

When working with pandas, we often call on the .describe() method to get some summary statistics of our dataset. With D-Tale, we can also do this, but at the same time, we will get significantly more information.

From the menu shown above, all we need to select is Describe. This will open up a new page with a summary.

In the example below, we can see that when the WELL column is selected, we get information about the content of that column. As this column is of data type string, we have information about the characters, the length of the strings, and even details about the unique values contained within that column.

This is especially handy if you are dealing with a CSV file that contains information from multiple wells, and you need to know which wells they are.

Well column summary created using D-Tale. Image by the author.

If we go to the LITH column, we will see the same kind of summary, but we will now see all of the lithologies that are present within that column.

Again, this is handy when exploring well log data, as we are often interested in certain lithologies for our petrophysical and well log analysis.

Lithology column summary created using D-Tale. Image by the author.

We can go one step further in exploring the LITH data by clicking on the Value Counts button. This will present us with a nice bar graph illustrating the occurrence of each lithology within the dataset.

Bar charts within D-Tale showing the occurrence of different lithologies within a well. Image by the author.
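For reference, the same kind of categorical summary can be approximated in plain pandas. This is only a sketch of the idea, not what D-Tale runs internally. The rows below are made up, and only the WELL and LITH column names come from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from the CSV loaded earlier.
df = pd.DataFrame({
    "WELL": ["WELL-A", "WELL-A", "WELL-A", "WELL-B", "WELL-B"],
    "LITH": ["Shale", "Sandstone", "Shale", "Shale", "Limestone"],
})

# Which wells are present in the file?
unique_wells = sorted(df["WELL"].unique())

# How often does each lithology occur, as a count and as a percentage of rows?
lith_counts = df["LITH"].value_counts()
lith_share = df["LITH"].value_counts(normalize=True) * 100

print(unique_wells)
print(lith_counts.to_dict())
print(lith_share.round(1).to_dict())
```

The counts are what the Value Counts bar chart displays graphically; the percentages are an extra convenience added here.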

If we do the same with numerical data, we get a few extra options for visualising our dataset.

For example, if we look at the DTC (acoustic compressional slowness) column in our dataset, we get the key statistics about the data, including the mean, percentiles, standard deviation etc.

We also get a nice boxplot showing the distribution of our data.

At the bottom of the summary, we have information about the unique values within that column and the option to view any outlier values.

Data summary of a numeric curve created with D-Tale. Image by the author.
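To make the numeric summary concrete, here is a small pandas sketch of the same statistics together with one common way of flagging outliers, the 1.5 × IQR rule. Whether D-Tale uses exactly this rule is an assumption on my part, and the DTC values below are made up for illustration.

```python
import pandas as pd

# Made-up DTC readings (us/ft), with one deliberately extreme value.
dtc = pd.Series([80.0, 82.0, 85.0, 86.0, 88.0, 90.0, 91.0, 150.0], name="DTC")

# Count, mean, standard deviation, min, quartiles and max in one call.
summary = dtc.describe()

# Flag values lying more than 1.5 interquartile ranges outside the quartiles.
q1, q3 = dtc.quantile(0.25), dtc.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = dtc[(dtc < lower) | (dtc > upper)]

print(summary)
print(outliers.tolist())
```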

If we click on the Histogram button, we can view the distribution of values within the DTC column on a combined histogram and KDE plot. This is a very common data visualisation and allows us to view the distribution of our data. In the example below, we have a bimodal distribution with one peak at around 85 us/ft and a second at around 157 us/ft.

On this page, we also have the option to change the number of bins for our histogram. This is great if we want to increase the level of detail we want to see or reduce it.

Interactive histogram within the Describe module of D-Tale. Image by the author.
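The effect of the bin setting can be sketched outside of D-Tale with NumPy. The data here is synthetic, drawn from two normal distributions centred near the two peaks mentioned above purely to mimic a bimodal curve; it is not taken from the well.

```python
import numpy as np

# Synthetic bimodal sample (not real well data): two clusters of DTC-like values.
rng = np.random.default_rng(seed=0)
dtc = np.concatenate([
    rng.normal(loc=85.0, scale=5.0, size=600),
    rng.normal(loc=157.0, scale=8.0, size=400),
])

# The same data binned coarsely and finely.
coarse_counts, coarse_edges = np.histogram(dtc, bins=10)
fine_counts, fine_edges = np.histogram(dtc, bins=50)

print("bin width with 10 bins:", round(coarse_edges[1] - coarse_edges[0], 2))
print("bin width with 50 bins:", round(fine_edges[1] - fine_edges[0], 2))
```

More bins give narrower intervals and more detail, at the cost of a noisier shape; fewer bins smooth the picture but can hide features such as a second peak.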

One great feature within the Describe section of D-Tale is the ability to view values by category.

When carrying out petrophysical or well log analysis, we often want to see how values differ between different geological formations or between lithologies. This plot provides some of that functionality and enables us to have a quick understanding of the values encountered within each group.

Easily visualise the data by categories such as lithology or geological formation. Image by the author.
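A rough pandas counterpart to this category view is a groupby. The sketch below uses made-up DTC values per lithology and is only meant to show the shape of the result, not to reproduce D-Tale's plot.

```python
import pandas as pd

# Made-up values for illustration; column names follow the dataset.
df = pd.DataFrame({
    "LITH": ["Shale", "Shale", "Sandstone", "Sandstone", "Limestone"],
    "DTC": [100.0, 110.0, 80.0, 90.0, 60.0],
})

# Summarise the DTC curve separately for each lithology.
by_lith = df.groupby("LITH")["DTC"].agg(["count", "mean", "min", "max"])

print(by_lith)
```

Swapping "LITH" for a formation column would give the same summary per geological formation.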

Visualising Data Completeness With D-Tale

When working with datasets, it is essential to consider how complete your dataset is before you begin to apply advanced analytics or machine learning to it. There are a few libraries available for this; one of my favourites is missingno, which is a simple-to-use Python library that has been integrated with D-Tale.

If we go to the menu and select Visualise followed by Missing Analysis, we will be able to see the missingno plots.

Analysing and understanding data completeness within a well log dataset with D-Tale. Image by the author.

For more information on each of these plots, check out my article below, where I explore the functionality of this library.

Using the missingno Python library to Identify and Visualise Missing Data Prior to Machine Learning
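If you want the numbers behind those completeness plots, a plain pandas sketch like the one below gives a tabular view. The rows are made up, and this is a simple substitute for the missingno graphics rather than a reproduction of them.

```python
import pandas as pd

nan = float("nan")

# Made-up rows with gaps; column names follow the dataset.
df = pd.DataFrame({
    "DTC": [85.0, nan, 90.0, nan],
    "RHOB": [2.4, 2.5, nan, 2.3],
    "LITH": ["Shale", "Shale", "Sandstone", "Limestone"],
})

# Number of missing values per column.
missing_count = df.isna().sum()

# Percentage of rows that contain a value, per column.
completeness_pct = df.notna().mean() * 100

print(missing_count.to_dict())
print(completeness_pct.to_dict())
```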

Visualising Data Using Interactive Charts with D-Tale

One of the key aspects of Data Science and petrophysics is visualising your data. This allows you to get a feel for the data compared to looking at raw numbers within a table.

To access the charts, we need to navigate to Visualise → Charts within the D-Tale menu. Once selected, a new browser tab will open up.

From here, we have a huge range of options for us to choose from.

Creating Line Charts

One of the key charts used within petrophysics is a log plot. This is essentially a line chart with depth plotted along one axis and a logging measurement plotted along the other. This allows us to visualise how the measurement changes along the wellbore and, subsequently, allows us to interpret the geology that has been drilled through.

Simple log plot generated using a line chart in D-Tale. Image by the author.

We can even plot multiple curves per line plot and control their scales. This is extremely useful for creating a density-neutron line plot. Unfortunately, we are unable to apply lithology shading to this chart, but that is a minor issue.

Density-neutron log plot track created using D-Tale. Image by the author.

However, one very useful feature is being able to colour the line by categories. This is especially great when trying to understand the log responses within different formations.

A log plot track / line plot created using D-Tale shows a bulk density log coloured by different formations. Image by the author.

Creating Scatter Plots (Crossplots) with D-Tale

Scatter plots (also known as crossplots in petrophysics) allow us to take two variables and plot them against each other. This allows us to identify trends, key interpretation parameters for petrophysical equations, and relationships between data.

Within D-Tale, we can easily create a very common petrophysical plot: the density-neutron crossplot.

D-Tale provides numerous options to customise what data is plotted. Once we have picked the x and y axis variables, we can then select a third variable to colour our plot with.

Scatter plot showing density and neutron porosity data within the Charts module of D-Tale. Image by the author.

One of the nice things with this interface is that we can remove groups we do not want to plot to allow us to focus on the group(s) that do matter.

Scatter plot showing density and neutron porosity data within the Charts module of D-Tale after removing some groups. Image by the author.

One slight annoyance with this setup is we can't control the size or shape of the markers. It would be nice to be able to make the marker smaller so that we can see more of the data points.

Pearson Correlation Matrix with D-Tale

When looking to build a machine learning model for petrophysical property prediction, we want to identify if any of the predictors we are going to use are strongly correlated with each other, an issue known as multicollinearity. We also want to be able to identify the most suitable variables to use within our model to predict our target variable.

To display the correlation matrix, we can go to the menu and select Correlations. This will return a heatmap with the Pearson correlation coefficient for each pair of numeric variables.

Correlation between well log measurements using D-Tale. Image by the author.

From the returned table, we can see a few of the logging measurements have high correlations with each other. For example, RHOB and DTC have a strong negative correlation, which is as expected. The more porous the rock, the lower the density (RHOB) value and the higher the acoustic compressional slowness (DTC).
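The same relationship can be checked numerically with pandas. The values below are made up so that density falls as slowness rises, mirroring the behaviour described above; they are not taken from the well.

```python
import pandas as pd

# Made-up readings: bulk density decreases while compressional slowness increases.
df = pd.DataFrame({
    "RHOB": [2.65, 2.55, 2.45, 2.35, 2.25],
    "DTC": [60.0, 72.0, 80.0, 95.0, 104.0],
    "LITH": ["Limestone", "Sandstone", "Sandstone", "Shale", "Shale"],
})

# Pearson correlation only applies to numeric columns, so drop the text column first.
numeric = df.select_dtypes(include="number")
corr = numeric.corr(method="pearson")

rhob_dtc = corr.loc["RHOB", "DTC"]
print(corr.round(3))
```

A coefficient close to -1 here reflects a strong negative linear relationship, which is the pattern the heatmap shows for RHOB and DTC.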

Summary

The D-Tale Python library is very powerful for Exploratory Data Analysis and can be very useful for well log measurement quality control and early analysis. This article has only covered a small fraction of the features available within D-Tale, but even these could significantly speed up the data QC and exploration phase when working with well log datasets. It is a library you should have in your data science toolkit.

