Classification of fetal state using Cardiotocography data and SVM

Akshat Dubey · Published in Nerd For Tech · Apr 11, 2020 · 6 min read

Cardiotocography

There are numerous techniques available to observe the fetus, and ultrasound is one of the most common. Ultrasound, however, is not well suited to recording the fetal heart rate and related details such as uterine contractions. This is where cardiotocography comes into play. Cardiotocography is a technique that helps doctors trace the fetal heart rate, including accelerations, decelerations, and variability, alongside uterine contractions. A cardiotocography trace can then be used to classify a fetus into one of three states:

  • Normal trace
  • Suspicious trace
  • Pathological trace

Problem Statement

Fetal pulse rate and uterine contractions (UC) are among the basic and common diagnostic measurements used to judge maternal and fetal well-being during pregnancy and before delivery. By observing cardiotocography data, doctors can predict and monitor the state of the fetus. We will therefore use CTG data and a Support Vector Machine to predict the state of the fetus.

Defining some of the most important terms in the field of cardiotocography:

  1. A CTG trace generally shows two lines. The upper line is a record of the fetal heart rate in beats per minute, and the lower line is a recording of uterine contractions from the TOCO (tocodynamometer).

2. The four fetal heart rate features are:

  • Baseline heart rate
  • Variability
  • Accelerations
  • Decelerations

3. Uterine contractions are quantified as the number of contractions present in a 10 min period and averaged over 30 min.

  • Normal: ≤ 5 contractions in 10 min
  • High: > 5 contractions in 10 min

4. The baseline heart rate is the average baseline fetal heart rate.

  • Reassuring feature: 110–160 beat per minute (bpm)
  • Non-reassuring feature: 100–109 bpm OR 161–180 bpm
  • Abnormal feature: < 100 bpm or > 180 bpm

5. Variability refers to the fluctuations in the fetal heart rate, which cause the tracing to appear as a jagged, rather than a smooth, line. Variability is indicative of a mature fetal neurologic system and is seen as a measure of fetal reserve.

  • Reassuring feature: ≥ 5 bpm
  • Non-reassuring feature: < 5 bpm for ≥ 40 minutes but < 90 minutes
  • Abnormal feature: <5 bpm for >90 minutes

6. Decelerations are decreases in the fetal heart rate from the baseline by at least 15 beats per minute, lasting for at least 15 seconds. There are three types of decelerations, depending on their relationship to the uterine contraction. An early deceleration begins at the start of the uterine contraction and ends with the conclusion of the contraction.

  • Reassuring feature: No deceleration
  • Non-reassuring feature: early deceleration, variable deceleration, or a single prolonged deceleration of up to 3 minutes
  • Abnormal feature: atypical variable decelerations, late decelerations, or a single prolonged deceleration lasting more than 3 minutes

7. Three categories of CTG traces are as follows:

  • Normal trace: a tracing in which all four features are reassuring: baseline rate 110–160 bpm, normal variability, absence of decelerations, and accelerations (which may or may not be present).
  • Suspicious trace: a tracing with ONE non-reassuring feature while the other three are reassuring.
  • Pathological trace: a tracing with TWO or more non-reassuring features, or ONE or more abnormal features.

Approach

We will use the features baseline value (verified by a medical expert), baseline value (obtained through SisPorto), accelerations, fetal movement (SisPorto), uterine contractions (SisPorto), light decelerations, severe decelerations, prolonged decelerations, and repetitive decelerations to classify the fetus into three categories: Normal, Suspect, and Pathologic. We will use a Support Vector Machine model for this purpose for the following reasons:

  • It scales relatively well to high-dimensional data.
  • Training an SVM is a convex optimization problem, so the solution does not get stuck in local optima.
  • SVM models generalize well in practice, so the risk of over-fitting is relatively low.
  • The kernel trick allows SVMs to learn non-linear decision boundaries.

Looking at the dataset:

The original dataset can be found at: https://github.com/dubeyakshat07/Fetal-Heart-Rate-using-SVM/tree/master/Dataset

The dataset was downloaded from the UCI Machine Learning repository and belongs to the respective owners.

Abbreviations used in the dataset:

  • LBE: baseline value (medical expert)
  • LB: baseline value (SisPorto)
  • AC: accelerations (SisPorto)
  • FM: fetal movement (SisPorto)
  • UC: uterine contractions (SisPorto)
  • ASTV: percentage of time with abnormal short term variability (SisPorto)
  • DL: light decelerations
  • DS: severe decelerations
  • DP: prolonged decelerations
  • DR: repetitive decelerations
  • NSP: Normal=1; Suspect=2; Pathologic=3
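
A minimal sketch of loading the data and selecting these columns is shown below. The CSV file name and path are assumptions (the UCI distribution ships as a spreadsheet), so adjust them to your local copy:

```python
import pandas as pd

# Load the CTG data; "CTG.csv" is an assumed file name -- point this at your
# local export of the repository / UCI spreadsheet.
df = pd.read_csv("CTG.csv")

# Columns from the abbreviation list above: features plus the target NSP.
feature_cols = ["LBE", "LB", "AC", "FM", "UC", "ASTV", "DL", "DS", "DP", "DR"]
df = df[feature_cols + ["NSP"]]

print(df.shape)
print(df.head())
```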

Looking for null values:

Since the number of missing values is very small, we can simply drop the affected rows.
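
A minimal sketch of the check and the drop, continuing from the DataFrame loaded above:

```python
# Count missing values per column.
print(df.isnull().sum())

# Only a handful of rows are affected, so drop them.
df = df.dropna()
print(df.isnull().sum().sum())  # should now print 0
```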

Now, there are no null values.

Feature scaling

Performing the feature scaling:

We are using StandardScaler because it improves the performance of our SVM model. Standardization is a transformation that centers the data by removing the mean value of each feature and then scales it by dividing (non-constant) features by their standard deviation. After standardizing, each feature has a mean of zero and a standard deviation of one.

Standardization can drastically improve the performance of models. For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance of the same order.

Using StandardScaler() to scale the data
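
One way the scaling step can look is sketched below. Following the article's order, the scaler is fitted on the full feature matrix before the split; fitting it on the training split only would avoid leakage, but this mirrors the original walkthrough:

```python
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["NSP"]).values   # feature matrix
y = df["NSP"].values                  # target: 1 = Normal, 2 = Suspect, 3 = Pathologic

# Standardize each feature: z = (x - mean) / std.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```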

Splitting the data into training and testing set:

We split the dataset into training and testing sets: 30% of the data will be used for testing and 70% for training.

Using train_test_split() to split the dataset
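
A minimal sketch of the split; the random_state value is an assumption, added only to make the split reproducible:

```python
from sklearn.model_selection import train_test_split

# 70% of the data for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)
```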

Training and testing the model:

We will be using SVM with the polynomial kernel.

We can observe that the maximum accuracy corresponds to a gamma value of 0.1.

Using the kernel with gamma=0.1 for training and testing
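
A sketch of the gamma search and the final fit; the exact list of gamma values tried is an assumption:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Try a few gamma values for the polynomial kernel and compare test accuracy.
for gamma in [0.001, 0.01, 0.1, 1]:
    clf = SVC(kernel="poly", gamma=gamma)
    clf.fit(X_train, y_train)
    print(gamma, accuracy_score(y_test, clf.predict(X_test)))

# gamma = 0.1 gave the highest accuracy, so train the final model with it.
model = SVC(kernel="poly", gamma=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```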

Evaluating the model:

We have evaluated our model using accuracy_score, precision_score, recall_score, and f1_score; a sketch of the computation follows the list below.

We can observe that:

  • accuracy_score: Accuracy is the most intuitive performance measure; it is simply the ratio of correctly predicted observations to the total number of observations. One may think that high accuracy means the model is best, but accuracy is only a reliable measure on symmetric datasets where the counts of false positives and false negatives are almost the same. The accuracy score of our model is 87.74%.
  • precision_score: Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. High precision means a low false positive rate. The precision score of our model is 87.44%.
  • recall_score: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. It is also known as the sensitivity of the model. The recall score of our model is 87.77%.
  • f1_score: The F1 score is the weighted average of precision and recall, so it takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful, especially with an uneven class distribution; accuracy works best when false positives and false negatives have a similar cost. The f1_score of our model is 87.53%.
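
A sketch of how these scores can be computed with scikit-learn; since NSP has three classes, a weighted average is assumed for aggregating precision, recall, and F1:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="weighted"))
print("recall   :", recall_score(y_test, y_pred, average="weighted"))
print("f1       :", f1_score(y_test, y_pred, average="weighted"))
```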

Conclusion: Our model performed fairly well on the dataset, and with a few more modifications it may produce results that doctors can trust.

Happy Day Guys!
