4.1. Industrial waveform processing with Quasar

In industrial manufacturing, electrical and mechanical waveforms are captured for real-time monitoring and analytical purposes. This allows the operators to predict and prevent issues before they occur.

In this guide we will walk through a typical waveform use case, which will cover:

  • Modeling waveform data with Quasar;

  • Transforming raw waveform payload;

  • Using the Quasar Pandas API for ingestion and querying.

Completing this guide will take 30-60 minutes.

4.1.1. Preparation

If you wish to run this guide locally, there are two ways to prepare your local development environment:

4.1.1.1. Docker

We have prepared a pre-loaded Docker image which contains everything needed to run this guide. To get started, please launch the bureau14/howto-iot-waveform Docker container as follows:

$ docker run -ti --net host bureau14/howto-iot-waveform:3.13.0
Launching QuasarDB in background..
Launching Jupyter lab...
[I 13:20:59.346 NotebookApp] Writing notebook server cookie secret to /home/qdb/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:20:59.501 NotebookApp] Serving notebooks from local directory: /work/notebook
[I 13:20:59.501 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 13:20:59.501 NotebookApp] http://localhost:8888/?token=...
[I 13:20:59.501 NotebookApp]  or http://127.0.0.1:8888/?token=...

You can now navigate with your browser to the URL provided by the Jupyter notebook and continue with this exercise.

4.1.1.2. Standalone installation

4.1.1.2.1. Install and launch QuasarDB

Please install & launch QuasarDB for your environment; the free community edition is sufficient.

For more information and instructions on how to install QuasarDB, please follow our installation guide.

4.1.1.2.2. Download this Jupyter notebook

You can download this notebook prepackaged from this location:

iot-waveform-env.tar.gz

Please download this, and extract it in a folder on your local machine.

4.1.1.2.3. Prepare Python environment

Assuming you have downloaded and extracted the Jupyter notebook, please set up your Python environment as follows:

# Create virtualenv
$ python3 -m venv .env/

$ source .env/bin/activate

# Install requirements
(.env)
$ python3 -m pip install -r requirements.txt

# Launch local notebook
(.env)
$ jupyter notebook ./iot-waveform.ipynb

[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-guides/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp]  or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

A new Jupyter notebook should open automatically. If it does not, please navigate your browser to http://localhost:8888/ manually.

You have now completed the preparations.


4.1.2. Electrical waveform

In this tutorial, we will be working with 3-phase electrical waveform data. This is a common type of data collected for industrial manufacturing.

According to the Wikipedia page about Three-phase electric power:

Three-phase power works by the voltage and currents being 120 degrees out of phase on the three wires. As an AC system it allows the voltages to be easily stepped up using transformers to high voltage for transmission, and back down for distribution, giving high efficiency.

A three-wire three-phase circuit is usually more economical than an equivalent two-wire single-phase circuit at the same line to ground voltage because it uses less conductor material to transmit a given amount of electrical power. Three-phase power is mainly used directly to power large motors and other heavy loads. Small loads often use only a two-wire single-phase circuit, which may be derived from a three-phase system.

As such, industrial manufacturers use 3-phase electricity for their heavy equipment.

In order to prevent problems and potential breakdowns of this equipment, the electricity is monitored for several purposes:

  • Detect and mitigate surges / sags;

  • Build predictive maintenance models to detect early signals of machine failure;

  • Analyze specific historical moments of interest to better understand the behavior of their equipment.

For this, measurement devices are added to the electrical circuits to capture the measurements. Such measurements are represented as waveforms, and look like this:

[1]:
import pandas as pd
from utils import sine
from utils import plot_waveform

df = pd.DataFrame({'vA': sine(0, 200, 5, 500),
                   'vB': sine(120, 200, 5, 500),
                   'vC': sine(240, 200, 5, 500)})
_ = plot_waveform(df)
[Plot: three-phase voltage waveforms vA, vB and vC]

In the example above, you can see all three phases, each identified by its own color (red, blue and black).
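
The 120-degree offset between the phases can be reproduced with a few lines of NumPy. The sketch below is only an illustration: `three_phase_sine` and its parameters are hypothetical names, and it is not the `sine` helper from the guide's `utils` module, whose signature is not shown here.

```python
import numpy as np


def three_phase_sine(amplitude, freq_hz, sample_rate, n_samples):
    """Return three sine waves shifted by 0, 120 and 240 degrees (hypothetical helper)."""
    # Time of each sample in seconds
    t = np.arange(n_samples) / sample_rate
    waves = {}
    for name, shift_deg in (("vA", 0), ("vB", 120), ("vC", 240)):
        phase = np.deg2rad(shift_deg)
        waves[name] = amplitude * np.sin(2 * np.pi * freq_hz * t + phase)
    return waves


# One full 50 Hz cycle sampled at 48,000 samples per second (assumed values)
waves = three_phase_sine(amplitude=200.0, freq_hz=50, sample_rate=48_000, n_samples=960)

# In an ideal, balanced three-phase system the three phases cancel out at every instant
balance = waves["vA"] + waves["vB"] + waves["vC"]
```

Because the three phases are evenly spaced, `balance` stays at zero up to floating-point rounding, which is a quick sanity check when generating synthetic test data.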

The data captured has the following configuration:

  • Both voltage and current are measured;

  • Each phase is measured individually. This means that for each measurement we get 6 data points: voltage and current for each of the three phases;

  • The frequency is 50Hz or 60Hz, depending on the country;

  • Sampling rate is typically between 40,000 and 80,000 samples per second.

Continuously monitoring all electricity would be a rather costly operation: a single machine with two sensors generates over 3 billion data points in 12 hours of operation, and the majority of that data would not be that interesting.

Instead, data is captured during a short period, typically 30s - 60s, at moments of interest such as machine start or machine stop. The data for a single measurement period is captured and aggregated as a single unit, which we call a “payload”.
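
The arithmetic behind the continuous-monitoring figure can be checked quickly. The sketch below assumes the lower sampling rate of 40,000 samples per second and counts one data point per sample per sensor; both are assumptions made for illustration, and the variable names are hypothetical.

```python
SAMPLE_RATE = 40_000   # samples per second (assumed lower bound of the typical range)
SENSORS = 2            # sensors on a single machine
HOURS = 12             # hours of continuous operation
CAPTURE_SECONDS = 60   # length of a single triggered capture (upper bound)

# Continuous monitoring: every sample of every sensor is stored
continuous_points = SAMPLE_RATE * SENSORS * HOURS * 3600

# Triggered capture: only a short window around a moment of interest is stored
capture_points = SAMPLE_RATE * SENSORS * CAPTURE_SECONDS

print(f"continuous: {continuous_points:,} points")
print(f"one capture: {capture_points:,} points")
```

Under these assumptions, 12 hours of continuous monitoring yields 3,456,000,000 points, while a single 60-second capture yields 4,800,000 points.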

In this tutorial, we will focus heavily on the capturing, processing and analysis of these payloads.


4.1.3. Imports and setups

Now that we have explained what electrical waveform data is, we can start working on the actual solution.

First let’s go through some boilerplate to import libraries and establish a connection with the QuasarDB cluster.

[2]:
import datetime
import json
import random
import pprint
import copy
from tqdm import tqdm

import numpy as np
import pandas as pd
import quasardb
import quasardb.pandas as qdbpd

# Utilities for plotting waveforms
from utils import plot_waveform
pp = pprint.PrettyPrinter(indent=2, sort_dicts=False)


def get_conn():
    return quasardb.Cluster("qdb://127.0.0.1:2836")

# This connection will be used for the remainder of the notebook
conn = get_conn()

4.1.4. Load sample data

The following cell loads all sample waveform data used in this guide. The data will be available in the payloads list.

[3]:
from io import BytesIO
from gzip import GzipFile
from urllib.request import urlopen, Request
import tarfile

resp = urlopen(Request("https://doc.quasardb.net/howto/iot-waveform/iot-waveform-data.tar.gz", headers={'User-Agent': 'Mozilla/5.0'}))
f = GzipFile(fileobj=BytesIO(resp.read()))
tf = tarfile.TarFile(fileobj=f)
files = tf.getmembers()

n = 0
payloads = []
for file in files:
    fn = file.name

    with tf.extractfile(fn) as fp:
        payload = json.loads(fp.read())
        for k in payload['axis'].keys():
            n += len(payload['axis'][k])

        payloads.append(payload)

print("Loaded {} waveform payloads with a total of {:,} points ({:,} per payload)".format(len(payloads), n, int(n / len(payloads))))
Loaded 15 waveform payloads with a total of 17,280,000 points (1,152,000 per payload)
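
For readers who want to experiment without downloading the sample archive, the same read pattern can be exercised on a small in-memory archive. This is a sketch under stated assumptions: `read_json_members` is a hypothetical helper, the archive contents are made up, and it uses `tarfile.open` with `mode="r:gz"` instead of the separate `GzipFile` wrapper used above. It also skips non-file members such as directories.

```python
import io
import json
import tarfile


def read_json_members(blob):
    """Parse every regular file in a gzipped tar archive (given as bytes) as JSON."""
    docs = []
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        for member in tf.getmembers():
            # Directories and other special members have no file content
            if not member.isfile():
                continue
            with tf.extractfile(member) as fp:
                docs.append(json.load(fp))
    return docs


# Build a tiny archive in memory that stands in for the downloaded file
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    data = json.dumps({"sensor_id": "demo", "axis": {"volt-a": [1.0, 2.0]}}).encode("utf-8")
    info = tarfile.TarInfo(name="payload-0.json")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

docs = read_json_members(buf.getvalue())
```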

4.1.5. Waveform capture

A typical waveform ingestion process is as follows:

  1. A device measures for a brief period, and captures values at a predefined sampling rate (e.g. 20,000 Hz).

  2. This data is compressed into a single payload with an associated timestamp and sensor id.

  3. It is then uploaded to be processed and ingested into the main database.
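
The device-side half of these steps can be sketched as follows. This is an illustration only: `build_payload`, `pack` and `unpack` are hypothetical helpers, the field names are borrowed from the sample payloads used in this guide, and gzip-compressed JSON is an assumed wire format rather than something the devices are known to use.

```python
import gzip
import json
from datetime import datetime, timezone


def build_payload(sensor_id, sample_rate, freq, axis):
    """Bundle the samples of one measurement period with their metadata (hypothetical)."""
    lengths = {len(values) for values in axis.values()}
    if len(lengths) != 1:
        raise ValueError("all axes must have the same number of samples")
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
        "sensor_id": sensor_id,
        "sample_count": lengths.pop(),
        "sample_rate": sample_rate,
        "freq": freq,
        "axis": axis,
    }


def pack(payload):
    """Serialize a payload to gzip-compressed JSON bytes, ready for upload."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def unpack(blob):
    """Reverse of pack(): what the ingestion side would do on arrival."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


example = build_payload(
    sensor_id="demo-sensor-1",
    sample_rate=48_000,
    freq=50,
    axis={"volt-a": [220.0, 220.1], "volt-b": [410.5, 410.4]},
)
blob = pack(example)
restored = unpack(blob)
```

Keeping the metadata (sensor id, timestamp, sample rate) next to the raw samples means each payload can be processed on its own, without consulting any other source.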

An example of an electrical waveform payload may look like this as it arrives:

[4]:
payload = copy.deepcopy(payloads[0])
# Truncate the payload slightly for readability
for k in payload['axis'].keys():
    payload['axis'][k] = payload['axis'][k][0:5]
payload['count'] = 5

pp.pprint(payload)


p = payloads[0]
per_cycle = int(p['sample_rate'] / p['freq'])
n_cycles = 5

print()
print("Voltage waveform visualization (first {} cycles, with a resolution of {:,} points per cycle)".format(n_cycles, per_cycle))
print()

df = pd.DataFrame(p['axis'], columns=['volt-a', 'volt-b', 'volt-c'])
df = df[0:(per_cycle * n_cycles)]
_ = plot_waveform(df, xaxis_label='Samples (n)')
{ 'timestamp': '2021-09-12T12:03:03',
  'payload_id': 0,
  'sensor_id': 'emea-fac1-elec-1832',
  'sample_count': 192000,
  'sample_rate': 48000,
  'freq': 5,
  'axis': { 'volt-a': [ 220.0,
                        220.07199483035976,
                        220.14398965300944,
                        220.21598446023896,
                        220.28797924433826],
            'volt-b': [ 410.52558883257655,
                        410.48958121550766,
                        410.45355319851706,
                        410.41750478546294,
                        410.3814359802059],
            'volt-c': [ 29.47441116742357,
                        29.438423954132645,
                        29.402457148473616,
                        29.36651075429822,
                        29.33058477545589],
            'cur-a': [ 220.0,
                       220.07199483035976,
                       220.14398965300944,
                       220.21598446023896,
                       220.28797924433826],
            'cur-b': [ 410.52558883257655,
                       410.48958121550766,
                       410.45355319851706,
                       410.41750478546294,
                       410.3814359802059],
            'cur-c': [ 29.47441116742357,
                       29.438423954132645,
                       29.402457148473616,
                       29.36651075429822,
                       29.33058477545589]},
  'count': 5}

Voltage waveform visualization (first 5 cycles, with a resolution of 9,600 points per cycle)
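
The metadata fields of a payload are enough to derive its basic shape. The sketch below reuses the values printed above (`sample_count`, `sample_rate` and `freq`); `describe_payload` is a hypothetical helper introduced for illustration.

```python
def describe_payload(sample_count, sample_rate, freq):
    """Derive points per cycle, capture duration and cycle count from payload metadata."""
    per_cycle = sample_rate // freq            # samples in one full cycle
    duration_s = sample_count / sample_rate    # length of the capture in seconds
    n_cycles = sample_count // per_cycle       # complete cycles in the capture
    return per_cycle, duration_s, n_cycles


# Values taken from the example payload shown above
per_cycle, duration_s, n_cycles = describe_payload(
    sample_count=192_000, sample_rate=48_000, freq=5
)
print(f"{per_cycle:,} points per cycle, {duration_s} s, {n_cycles} cycles")
```

For the example payload this gives 9,600 points per cycle, a capture length of 4 seconds and 20 complete cycles.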