Table Of Contents

7.15. Python

7.15.1. Introduction

The quasardb module contains multiple classes that make working with a QuasarDB cluster simple. It is written in C++ on top of numpy, and provides high-performance access to the QuasarDB cluster using a simple API.

7.15.2. Requirements

The QuasarDB Python API is built and tested against the following versions:

  • Python 3.5
  • Python 3.6
  • Python 3.7

In addition to this, we support the following environments:

  • MacOS 10.9+
  • Microsoft Windows
  • Linux
  • FreeBSD

7.15.3. Installation

The QuasarDB Python module is distributed through PyPI and installed via pip.

Windows and MacOS

On Windows and MacOS, the QuasarDB Python module is distributed in binary format, and can be installed without any additional dependencies as follows:

pip install quasardb

This will download the API and install all its dependencies.


Linux

For Linux users, installation via pip through PyPI will trigger a compilation of this module. This requires additional packages to be installed:

  • A modern C++ compiler (llvm, g++)
  • CMake 3.5 or higher
  • QuasarDB C API

Ubuntu / Debian

On Ubuntu or Debian, the installation can be achieved as follows:

$ apt install apt-transport-https ca-certificates -y
$ echo "deb [trusted=yes] /" > /etc/apt/sources.list.d/quasardb.list
$ apt update
$ apt install qdb-api cmake g++
$ pip install wheel
$ pip install quasardb


RHEL / CentOS

On RHEL or CentOS, the process is a bit more involved because we need a modern GCC compiler and CMake. It can be achieved as follows:

# Enable SCL for recent gcc
$ yum install centos-release-scl -y

# Enable EPEL for recent cmake
$ yum install epel-release -y

# Enable QuasarDB Repository
$ echo $'[quasardb]\nname=QuasarDB repo\nbaseurl=\nenabled=1\ngpgcheck=0' > /etc/yum.repos.d/quasardb.repo

$ yum install devtoolset-7-gcc-c++ cmake3 make qdb-api python3-devel python3-wheel

# Make cmake3 the default
$ alternatives --install /usr/bin/cmake cmake /usr/bin/cmake3 10

# Start using gcc 7
$ scl enable devtoolset-7 bash

# Default RHEL7 setuptools is not recent enough
$ pip install --upgrade setuptools

# Install the Python module
$ pip install quasardb

7.15.4. Verifying installation

You can verify the QuasarDB Python module is installed correctly by trying to print the installed version:

$ python
Python 3.7.2 (default, Feb 13 2019, 15:08:44)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import quasardb
>>> print(quasardb.version())

This prints the currently installed version of the Python module and of the QuasarDB C API it is linked against, for example 3.1.0. Ensure that this version also matches the version of the QuasarDB daemon you’re connecting to.

7.15.5. Connection management


The rest of this document assumes that you have a non-secure QuasarDB cluster up and running under qdb://. For more information on launching and managing a QuasarDB cluster, please refer to Server Administration.

Establishing a connection with the QuasarDB cluster is easy. Launch your Python REPL and execute the following:

import quasardb

c = quasardb.Cluster("qdb://")

Evaluating the constructor in the REPL will output something like:

>>> quasardb.Cluster("qdb://")
<quasardb.quasardb.Cluster object at 0x7f81c7705df8>

Secure cluster

If your cluster is set up using security features, you can provide authentication details as follows:

import quasardb

c = quasardb.Cluster(uri='qdb://',
                     user_name=user_name,
                     user_private_key=user_private_key,
                     cluster_public_key=cluster_public_key)
Note that the user_private_key and cluster_public_key have to be provided as a string, not a path. For more information on setting up security, please refer to the quasardb cluster key generator and quasardb user adder documentation.


The default timeout is one minute. To specify a different timeout, you must pass it as a parameter when constructing your quasardb Cluster object:

import datetime

c = quasardb.Cluster("qdb://", datetime.timedelta(minutes=2))
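Any datetime.timedelta works here; a small sketch showing two equivalent ways of expressing the same timeout:

```python
import datetime

# Two equivalent ways to express a two-minute timeout
timeout = datetime.timedelta(minutes=2)
same_timeout = datetime.timedelta(seconds=120)

assert timeout == same_timeout
assert timeout.total_seconds() == 120.0
```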


Please be advised that establishing a connection is an expensive operation. In a production setting, you are strongly advised to reuse your connections.
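To illustrate the reuse advice, here is a minimal sketch of caching a connection at module level. connect() is a hypothetical stand-in for quasardb.Cluster(uri), so the pattern is visible without a running cluster:

```python
# Connection reuse sketch. connect() is a hypothetical stand-in for
# quasardb.Cluster(uri); in real code you would call that instead.
_cluster = None

def connect(uri):
    connect.calls += 1          # count how often we really connect
    return {"uri": uri}         # placeholder for the Cluster object
connect.calls = 0

def get_cluster(uri="qdb://"):
    """Create the connection once, then hand out the cached object."""
    global _cluster
    if _cluster is None:
        _cluster = connect(uri)
    return _cluster
```

Calling get_cluster() repeatedly reuses the single underlying connection instead of paying the setup cost every time.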

7.15.6. Timeseries

Creating a table

In this example we will create a timeseries with three columns, “close”, “volume”, and “value_date”. The respective types of the columns are double precision floating point values, 64-bit signed integer, and high resolution nanosecond-precise timestamps.

For this, we first acquire a reference to the table and then instruct the cluster to create it.

t = c.ts("my_table")

columns = [quasardb.ColumnInfo(quasardb.ColumnType.Double, "close"),
           quasardb.ColumnInfo(quasardb.ColumnType.Int64, "volume"),
           quasardb.ColumnInfo(quasardb.ColumnType.Timestamp, "value_date")]

t.create(columns)


Tagging tables

To accommodate managing a large number of tables, you can tag your tables to make querying them easier. You can tag a table as follows:

assert t.has_tag('nasdaq') == False
t.attach_tag('nasdaq')
assert t.has_tag('nasdaq') == True

Columnar API

If your timeseries data is columnar, or you have no need to correlate row data, you are advised to use the columnar timeseries API described in this section. By leveraging the high-speed numpy.ndarray, it is significantly faster than the Tabular API.

You can always use the columnar API to query, even if your data is inserted in rows.


For each of its value types, our Python API provides a function to insert a numpy array of data:

  • quasardb.quasardb.TimeSeries.double_insert()
  • quasardb.quasardb.TimeSeries.int64_insert()
  • quasardb.quasardb.TimeSeries.blob_insert()
  • quasardb.quasardb.TimeSeries.timestamp_insert()

Their function signature and usage are all the same, except for the underlying numpy array type.

For each of these, you will need to create two numpy arrays:

  • An array of datetime64 timestamp values with nanosecond precision
  • An array with an equal number of values.

For example, if we would like to insert 100 double points into our table, we could use the following code:

import numpy as np

t = c.ts("my_table")

points = 100

# Create 100 timestamp points, one per second, starting Jan 1, 2017
timestamps = np.array([(np.datetime64('2017-01-01') + np.timedelta64(i, 's'))
                        for i in range(points)]).astype('datetime64[ns]')

# Generate 100 random doubles
values = np.random.uniform(-100.0, 100.0, points)

# Insert the values
t.double_insert("close", timestamps, values)

As you can see, we create a timestamps array for the first 100 seconds of 2017, and a values array filled with random numpy doubles.

The data is written to the QuasarDB cluster immediately.
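Incidentally, the same timestamps array can also be built in a vectorized way, avoiding the Python-level loop; a sketch using plain numpy:

```python
import numpy as np

points = 100

# One timestamp per second starting Jan 1, 2017, built without a loop:
# an integer range multiplied by a one-second timedelta, added to the start
timestamps = (np.datetime64('2017-01-01') +
              np.arange(points) * np.timedelta64(1, 's')).astype('datetime64[ns]')
```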


Column-oriented retrieval is the fastest way to retrieve data from a QuasarDB cluster. If the way you plan to analyze the data allows for it, we recommend using this API.

For each of its value types, our Python API provides a function to retrieve arrays of data:

  • quasardb.TimeSeries.double_get_ranges()
  • quasardb.TimeSeries.int64_get_ranges()
  • quasardb.TimeSeries.blob_get_ranges()
  • quasardb.TimeSeries.timestamp_get_ranges()

Their function signature and usage are all the same, except for the numpy array type returned.

Each of these functions accepts two arguments: the column name and the time ranges you wish to retrieve. Multiple ranges can be queried in a single call.

For example, if we would like to retrieve all the doubles for the close column in the years 2015 and 2017, we could use the following code:

import numpy as np

t = c.ts("my_table")

points = t.double_get_ranges("close", [(np.datetime64('2015-01-01', 'ns'),
                                        np.datetime64('2016-01-01', 'ns')),
                                       (np.datetime64('2017-01-01', 'ns'),
                                        np.datetime64('2018-01-01', 'ns'))])

timestamps = points[0]
values = points[1]
assert len(timestamps) == len(values)

In the code above, timestamps and values are both numpy arrays. Each of the entries inside these arrays map to each other; values[10] has the corresponding timestamp timestamps[10], etcetera.
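Because the two arrays map index-to-index, a boolean mask computed on the values can select the matching timestamps directly; a small sketch with hypothetical sample data standing in for what double_get_ranges returns:

```python
import numpy as np

# Hypothetical sample of what double_get_ranges returns
timestamps = np.array(['2015-06-01', '2017-03-01', '2017-09-01'],
                      dtype='datetime64[ns]')
values = np.array([10.0, -5.0, 7.5])

# Select the timestamps at which the value was positive;
# this works because the arrays are aligned index-to-index
mask = values > 0.0
positive_times = timestamps[mask]
```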

Tabular API

When your data is row-oriented, and you want to interact with your data as such, you should use the API described in this section: you can use the Writer and the Reader as your primary interfaces.

Batch Writer

The batch writer provides you with an interface to send data to the QuasarDB cluster. The data is buffered locally and sent in batches, ensuring efficiency and performance.

The first step is to create a bulk inserter that matches all the timeseries and columns you want to insert to:

batch_columns = [quasardb.BatchColumnInfo("my_table", "close", 50000),
                 quasardb.BatchColumnInfo("my_table", "volume", 50000),
                 quasardb.BatchColumnInfo("my_table", "value_date", 50000)]

batch_inserter = c.ts_batch(batch_columns)

The number you specify for each column is the number of rows you expect to send per batch, enabling the API to pre-allocate the right amount of memory and significantly increasing performance. If you specify 50000 as seen above, it means you expect to push once every 50000 rows.

You can insert data into multiple timeseries at the same time by simply specifying multiple columns from multiple timeseries, such as this:

batch_columns = [quasardb.BatchColumnInfo("my_table", "close", 50000),
                 quasardb.BatchColumnInfo("my_table", "volume", 50000),
                 quasardb.BatchColumnInfo("my_table", "value_date", 50000),
                 quasardb.BatchColumnInfo("other_table", "close", 50000),
                 quasardb.BatchColumnInfo("other_table", "volume", 50000),
                 quasardb.BatchColumnInfo("other_table", "value_date", 50000)]

Once the batch_inserter object is created, you insert the data row by row:

import numpy as np

# All timestamps are numpy datetime64 with nanosecond precision
batch_inserter.start_row(np.datetime64('2018-01-01', 'ns'))
batch_inserter.set_double(0, 1.0) # set close
batch_inserter.set_int64(1, 231) # set volume
batch_inserter.set_timestamp(2, np.datetime64('2018-02-02', 'ns'))

# send to the server
batch_inserter.push()

Note that the data isn’t sent to the server as long as you don’t call push. Typically it is recommended to insert in large batches, such as 50k - 100k rows per batch.
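The push cadence can be sketched with a stub inserter. FakeInserter and insert_rows are illustrative names, not part of the quasardb API, but the loop mirrors how you would call start_row and push on a real batch inserter:

```python
class FakeInserter:
    """Stub mimicking the batch inserter's start_row/push calls."""
    def __init__(self):
        self.pushes = 0

    def start_row(self, timestamp):
        pass  # a real inserter would buffer the row locally

    def push(self):
        self.pushes += 1  # a real inserter would send the batch


def insert_rows(inserter, timestamps, batch_size=50000):
    """Insert rows one by one, pushing once every batch_size rows."""
    for i, ts in enumerate(timestamps, start=1):
        inserter.start_row(ts)
        # ... set_double / set_int64 / set_timestamp calls go here ...
        if i % batch_size == 0:
            inserter.push()
    if len(timestamps) % batch_size != 0:
        inserter.push()  # flush the remainder
```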


Bulk Reader

The reader provides you with an interface to retrieve data from the QuasarDB cluster as rows. The data is retrieved in a stream, and the interface exposes an iterator to process the result set.

By default, the reader retrieves all columns and puts them into a list, each column mapped to the appropriate offset within the row:

t = c.ts("my_table")
total_volume = 0
for row in t.reader():
    print("timestamp = ", row.timestamp(), ", volume = ", row[1])
    total_volume = total_volume + row[1]

This works because the ‘volume’ column is the second column of the table, and as such is mapped to index 1.

In situations where you do not know the column’s relative index within the table, you can look up the index by querying the table:

t = c.ts("my_table")
total_volume = 0
volume_index = t.get_column_index_by_id('volume')
for row in t.reader():
    total_volume = total_volume + row[volume_index]

Since this is a relatively expensive operation, you are recommended to cache the index as shown above.

If we want to further optimize the code, remember that we are still retrieving all values for all columns: under the hood, the ‘close’ and ‘value_date’ columns are retrieved as well, causing unneeded overhead.

We can further optimize this by explicitly providing the set of columns we wish to query. In this situation, each offset within the row is mapped to the column you provide:

for row in t.reader(columns=['volume']):
    total_volume = total_volume + row[0]

Notice how our index changed from 1 to 0.
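Since offsets follow the order of the columns you request, you can build a name-to-offset map once and index rows by name; a small sketch with a hypothetical stand-in row:

```python
# Columns requested from the reader, in order (hypothetical example)
requested = ['volume', 'close']

# Map each column name to its offset within the row
offset = {name: i for i, name in enumerate(requested)}

# A row returned by reader(columns=requested) could then be indexed as:
row = [231, 1.0]                 # stand-in for a returned row
volume = row[offset['volume']]
close = row[offset['close']]
```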

If we care only about a small timespan, we can explicitly provide the range to query. The following example considers only the data in 2017:

import numpy as np

for row in t.reader(columns=['volume'],
                    ranges=[(np.datetime64('2017-01-01', 'ns'),
                             np.datetime64('2018-01-01', 'ns'))]):
    total_volume = total_volume + row[0]

If desired, you can provide multiple ranges.

Last but not least, if performance is not a concern, we can also ask the reader to generate dicts instead of lists:

t = c.ts("my_table")
for row in t.reader(dict=True):
    print(row)

This will print something like:

{'close': 1.0, 'volume': 231, 'value_date': 2017-01-01T00:00:00.000000000}
{'close': 2.0, 'volume': 321, 'value_date': 2017-01-02T00:00:00.000000000}

Running a query

You can run a query directly from the Python API and receive the results as a dictionary of numpy arrays.

q = c.query("select * from my_table in range(2018, +10d)")
res = q.run()

for col in res.tables["my_table"]:
    # col.name is a string with the name of the column
    # col.data is a numpy array of the proper type
    print(col.name, ": ", col.data)

Do note how the result data is grouped per table: this allows you to easily traverse over results that span multiple tables. For more information on our query language, please see Query language.

7.15.7. Pandas

The QuasarDB Python API provides integration with Pandas as a pandas.DataFrame or pandas.Series. Both these interfaces allow high-speed retrieval or storage of data.


Series

If your data is columnar, you should use the pandas.Series integration that QuasarDB provides.

The pandas code can be imported as follows:

import quasardb.pandas as qdbpd

Note that the quasardb module does not depend upon pandas itself: you have to install pandas >= 0.24.0 yourself.


Writing a series

Assuming a table with a double column “my_double”, you can write a Series as follows:

import numpy as np
import pandas as pd

table = c.ts("my_table")

# An example start timestamp for the series index
start = np.datetime64('2017-01-01', 'ns')

idx = pd.date_range(start, periods=1000, freq='S')
sx = pd.Series(np.random.uniform(-100.0, 100.0, 1000),
               index=idx) # <- important

qdbpd.write_series(sx, table, "my_double")

And that’s all there is to it. The Series has now been written into QuasarDB, and is immediately available to all clients.
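The index matters because it supplies the timestamps of the stored points; a sketch checking the shape of a Series before writing it, using plain pandas without a cluster:

```python
import numpy as np
import pandas as pd

# Build a small timestamp-indexed Series, one value per second
idx = pd.date_range('2017-01-01', periods=5, freq='S')
sx = pd.Series(np.arange(5, dtype='float64'), index=idx)

# The index must be datetime-based: it becomes the timestamp of each value
assert isinstance(sx.index, pd.DatetimeIndex)
```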


Reading a series

This same series can be read back as follows:

doubles = qdbpd.read_series(table, "my_double")

If you want to limit your query to just a small set of time ranges, you can optionally provide them when reading a timeseries:

import numpy as np

first_range = (start, start + np.timedelta64(10, 's'))
doubles_subrange = qdbpd.read_series(table, "my_double", ranges=[first_range])

This will return just the first 10 seconds after start.


DataFrame

If your data is tabular, you should use the pandas.DataFrame integration that QuasarDB provides.


Writing a DataFrame

Assuming a table “my_table”, you can write an entire DataFrame as follows:

# Dataframe must be timestamp-indexed
df = my_dataframe()

qdbpd.write_dataframe(df, c, "my_table")

You can tell QuasarDB to automatically create the table as follows:

qdbpd.write_dataframe(df, c, "my_table", create=True)

You can also pass a QuasarDB table reference directly:

table = c.ts("my_table")
qdbpd.write_dataframe(df, c, table)


Reading a DataFrame

This same dataframe can be read back as follows:

my_table_df = qdbpd.read_dataframe(table)

This automatically reads all columns and indexes your dataset by timestamp. If your dataset is sparse and might contain null columns, you can use the (slower) row_index:

my_table_df = qdbpd.read_dataframe(table, row_index=True)

You can select a subset of columns, speeding up the retrieval time, as follows:

my_table_df = qdbpd.read_dataframe(table, columns=["open", "close"], row_index=True)

7.15.8. Key/Values

In addition to being a timeseries database, QuasarDB also provides a feature-rich key/value API.


Tags

To get the list of tags for an entry, use the get_tags method:

tags = b.get_tags()

To find the list of entries matching a tag, you create a tag object. For example, if you want to find all entries having the tag “my_tag”, the get_entries method will then list the entries matching the tag:

c = quasardb.Cluster("qdb://")
tag = c.tag("my_tag")
entries = tag.get_entries()

Entry expiry

Expiry is either set at creation or through the expires_at and expires_from_now methods for the given data type.


The behavior of expires_from_now is undefined if the time zone or the clock of the client computer is improperly configured.

To set the expiry time of an entry to 1 minute, relative to the call time:

b.expires_from_now(datetime.timedelta(minutes=1))

To set the expiry time of an entry to January 1st, 2020:

b.expires_at(datetime.datetime(year=2020, month=1, day=1))

Or alternatively:

b.update("content", datetime.datetime(year=2020, month=1, day=1))

To prevent an entry from ever expiring:


By default, entries never expire. To obtain the expiry time of an existing entry as a datetime.datetime object:


This will print:


7.15.9. Reference