5.7. quasardb insertion tool

5.7.1. Introduction

The quasardb insertion tool (qdb_railgun) enables you to insert CSV files into the database as fast as possible. The intent is to let you configure various ways of inserting data with as few configuration files as possible.

5.7.2. Quick Reference

Option                       Usage                                               Default
-h, --help                   Display help.
--help-config                Display information about the config file format.
--help-parsers               Display information about the parsers.
--with-header                Specify whether the CSV file has a header.          false
-c, --cluster                Cluster URI.                                        qdb://127.0.0.1:2836
-f, --file                   File to insert.
--config                     Configuration file.
--max-threads                Maximum number of threads used.                     1
--csv                        Values are separated with commas.                   Used by default.
--tsv                        Values are separated with tabs.
--user-credentials-file      Path to the user's private key file.
--cluster-public-key-file    Path to the cluster's public key file.
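As a quick illustration, a typical invocation combining these options might look like the following (data.csv and config.json are placeholder file names, not files shipped with the tool):

```
# Insert a headered CSV into a local cluster using 4 threads.
qdb_railgun --with-header \
    -c qdb://127.0.0.1:2836 \
    -f data.csv \
    --config config.json \
    --max-threads 4
```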

Config File

Say your CSV is organized with the following header:

time,product,seid,market_id

The first line would look like:

2017-10-12T08:18:43.525290,ICE,36462,1

The config file describes each column and provides some syntactic sugar to help you do so.

{
    "shard_size": "30d",
    "columns": [
      {
          "parser": "iso_8601"
          "intos": "$timestamp"
      },
      {
          "id": "product",
          "parser": "blob",
          "intos": "my_timeseries"
      },
      {
          "parser": "skip"
      },
      {
          "id": "market_id",
          "parser": "int64",
          "intos": "my_timeseries"
      }
    ]
}

The shard_size parameter specifies the shard size that will be used to create the timeseries. Each element of columns describes one column in the CSV.

The $timestamp value is a special name indicating where the timestamp for each row should be taken from. The corresponding column must use a date, time, or datetime parser. You can map multiple columns to $timestamp in a date/time scenario; their values are added together to form a timestamp since the epoch.
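For example, assuming the tool provides separate date and time parsers (check --help-parsers for the exact parser names available in your version), a CSV with separate date and time columns could be described as:

```json
{
    "shard_size": "30d",
    "columns": [
      {
          "parser": "date",
          "intos": "$timestamp"
      },
      {
          "parser": "time",
          "intos": "$timestamp"
      },
      {
          "id": "market_id",
          "parser": "int64",
          "intos": "my_timeseries"
      }
    ]
}
```

The parsed date and time are added together to form each row's timestamp.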

The id key can be set or omitted based on your preference:

{
    "id": "csv_id" | nothing,
    "parser": "$parser_type",
    "intos": ["$timestamp"
                | "timeseries_id"
                | {"ts": "timeseries_id", "col": "col_id"}
                | {"ts": "timeseries_id", "col": "col_id", "prealloc_count": 1000}
            ]
            | "$timestamp"
            | "timeseries_id"
            | {"ts": "timeseries_id"}
            | nothing
}

This means you can use an id and omit the col parameter in the intos field:

{
    "id": "market_id",
    "parser": "int64",
    "intos": ["ts_1", "ts_2"]
}

This will insert the market_id into both ts_1 and ts_2 timeseries.

Conversely, you can fully specify an intos field without an id by explicitly giving ts, col, and optionally prealloc_count:

{
    "parser": "int64",
    "intos": [{"ts": "ts_1", "col":"market_id"}, {"ts": "ts_2", "col": "market_id", "prealloc_count": 1000}]
}

This example has the same behaviour as the previous one, but with the advantage that you can rename the column in the second timeseries if desired.

The prealloc_count parameter reserves space for the number of elements you expect in each push. This is essentially the number of rows you expect to push for a given column.
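Since the config file is plain JSON, it can also be generated programmatically. The sketch below is a hypothetical helper (not part of qdb_railgun) that builds a config like the one above, assuming an ISO 8601 timestamp in the first CSV column followed by int64 columns:

```python
import json

def build_config(timeseries, int_columns, shard_size="30d", prealloc_count=None):
    """Build a qdb_railgun config dict: first CSV column is an ISO 8601
    timestamp, each remaining column is an int64 inserted into `timeseries`."""
    columns = [{"parser": "iso_8601", "intos": "$timestamp"}]
    for name in int_columns:
        into = {"ts": timeseries, "col": name}
        if prealloc_count is not None:
            into["prealloc_count"] = prealloc_count
        columns.append({"id": name, "parser": "int64", "intos": [into]})
    return {"shard_size": shard_size, "columns": columns}

config = build_config("my_timeseries", ["seid", "market_id"], prealloc_count=1000)
print(json.dumps(config, indent=2))
```

Writing the result to a file with json.dump then gives you a file usable with --config.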

Config Parameters

shard_size
The shard size used for the creation of a timeseries. Possible units: d, h, min, s, ms.
columns
An array of columns, each representing one field in the CSV.
id

The optional CSV id of the column.

Can be used as the timeseries column name.

Can be left unassigned.

parser
The parser used to convert the column value.
intos
The fields in which the parsed value will be added.
ts
A timeseries name.
col
A column name in a given timeseries.
prealloc_count
The number of preallocated elements in a given timeseries column.

Program Options

-h, --help

Displays basic usage information.

--help-config

Displays information about config file format.

--help-parsers

Displays information about the parsers.

--with-header

Specifies if the CSV file has a header.

Default value
false
-c, --cluster=<cluster_uri>

Specifies the address and port of the quasardb cluster to which the insertion tool must connect.

Argument
The URI (list of comma-separated endpoints, i.e. addresses and ports, preceded by qdb://) of the cluster into which the data will be inserted.
Default value
qdb://127.0.0.1:2836, the IPv4 localhost address and the port 2836.
-f, --file=<path_to_file>

Specifies the file used for insertion.

Argument
Path to the file.
--config=<path_to_file>

Specifies the configuration file used to setup the inserter.

Argument
Path to the file.
--max-threads=<max_threads>

Maximum number of threads used. Will be ignored if you use a quoted parser in the config file.

Default value
1
Implicit value
hardware_concurrency, the number of threads available on your machine.
--csv

Sets the delimiter between values to ','. Used by default.

--tsv

Sets the delimiter between values to '\t'.

--user-credentials-file=<path_to_file>

Specifies the user’s private key file with both username and secret_key.

Argument
Path to the file.
--cluster-public-key-file=<path_to_file>

Specifies the path to the cluster’s public key file.

Argument
Path to the file.