Table Of Contents

5.7. quasardb insertion tool

5.7.1. Introduction

The quasardb insertion tool (qdb_railgun) enables you to insert CSV files into the database in the fastest way possible. The intent is to make it possible to configure various ways of inserting data with as few config files as needed.

5.7.2. Quick Reference

Option Usage Default
-h, --help display help  
--help-config display information about config file format  
--help-parsers display information about the parsers  
-c, --cluster set the cluster URI qdb://127.0.0.1:2836
-f, --file sepcify the file to insert  
--config specify the configuration file  
--i-make-config interactive mode: create a config file  
--i-auto interactive mode guess fields  
--i-verbose interactive mode becomes more explicit  
--check-syntax check the file’s syntax and display errors  
--max-threads maximum number of threads used max_hardware_concurrency()
--user-credentials-file Path to the user’s private file  
--cluster-public-key-file Path to the cluster’s public file  

Basic configuration

The configuration file is in this JSON format:

{
    "delimiter": <One of ',', ';' or '\t', character separating the values>,
    "has_header": <'true' or 'false', if it has a first line containing the column names>,
    "default_timeseries": <The timeserie's name in which the values are inserted>,
    "columns": [
        // In case of a timestamp key value
        {
            "parser": <One of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns'>,
            "into": "$timestamp"
        },
        // Otherwise, to insert the value
        {
            "parser": <One of 'time, date, datetime, iso_8601, epoch, epoch_ms, epoch_ns, double, int64, blob, quoted_blob'>,
            "into": <default timeseries's column where to put the value>
        }
    ]
}

Example of configuration: - The first column creates the timestamp key. - The second column contains text inserted in the table ‘names’ of the time series ‘VIP’. - The third column contains integer inserted in the table ‘ages’ of the time series ‘VIP’.

{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "columns": [
        {
            "parser": "datetime",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": "names"
        },
        {
            "parser": "skip_column",
            "into": "ages"
        }
    ]
}

This configuration file would allow to insert values like this:

2018.15.05 11:03:02,Steve,21
2018.15.05 11:03:04,Duke Nukem,54
2018.15.05 11:03:05,Germaine,78
2018.15.05 11:03:07,,24
2018.15.05 11:03:11,Molly,
...

Notice that empty values are allowed if they are not timestamp values.

Advanced configuration use cases

If several column values are marked as timestamps, the key used will the sum of these timestamps. If can be useful for example if one value is the current date, and the other value the current time.

The main JSON structure can give an optional “shard_size”, a mandatory parameter if there are mentionned timeseries that do not exist yet (otherwise, Railgun will ask to enter it in command line). Units available are ‘d, h, min, s, ms’.

If a column’s parser is one of ‘skip_column, skip_quoted_column’, the value is ignored.

“into” can also be equal to a structure to have more flexibility:

"into": {
    "timeseries": <The timeseries's name. Overrides "default_timeseries">
    "column":     <The column's name>
}

Finally, “into” can be an array of several ‘into’ values. In this case, it will be inserted in multiple tables, perhaps in different timeseries.

Here is a more complex configuration example:

The first and the third columns create the timestamp key. The second column contains text inserted in: - The table ‘names’ of the time series ‘VIP’. - The table ‘persons’ of the time series ‘traces’. - The fourth columm is ignored.

{
    "delimiter": ",",
    "has_header": false,
    "default_timeseries": "VIP",
    "shard_size": "20min",
    "columns": [
        {
            "parser": "date",
            "into": "$timestamp"
        },
        {
            "parser": "blob",
            "into": [
                "names",
                {
                    "timeseries": "traces",
                    "column":     "persons"
                }
            ]
        },
        {
            "parser": "time",
            "into": "$timestamp"
        },
        {
            "parser": "skip_column"
        }
    ]
}

This second configuration file would allow to insert values like this:

2018.15.05,Steve,11:03:02,aaa
2018.15.05,Duke Nukem,11:03:04,1234
2018.15.05,Germaine,11:03:05,
2018.15.05,,11:03:07,xxx
2018.15.05,Molly,11:03:11,boop
...

Moreover, it allows to create the timeseseries automatically (without asking the shard_size).

Parsers

  • time: Parse a time of the day, eg. 11:03 or 02:49:31.123456789
  • date: Parse a date, eg. 2018.05.15
  • datetime: Parse a date followed by time, eg. 2018.05.15 11:03:58.007
  • iso_8601: Parse a date and time in flexible ISO format, eg. 2018-05-15T11:03:58.007, or 2018-05-15 11:03:58.007
  • epoch: Parse a number as seconds, eg. 2 or 2.45
  • epoch_ms: Parse a number as milliseconds, eg. 2000
  • epoch_ns: Parse a number as nanoseconds, eg. 2000000000
  • double: Parse a floating value, eg. -1234.567, 123, -.2 or 3.
  • int64: Parse a signed integer, eg. 123 or -54
  • blob: Parse text, eg. 123, bip bop or @#$%^&*{[<
  • quoted_blob: Parse optionally quoted text, eg. ‘new linesnn’, “commas,,” or bop bip. Double quotes to escape them: ‘it’’s raining’
  • skip_column: Skip a column value, to skip values parsed by ‘blob’
  • skip_quoted_column: Skip optionally quoted value, to skip values parsed by ‘quoted_blob’

Program Options

-h, --help

Displays basic usage information.

--help-config

Displays information about config file format.

--help-parsers

Displays information about the parsers.

-c, --cluster=<cluster_uri>

Specifies the address and port of the quasardb cluster to which the insertion tool must connect.

Argument
The URI (list of comma-separated endpoints, i.e. addresses and ports, preceded by qdb://) of a cluster on which the tests will be run.
Default value
qdb://127.0.0.1:2836, the IPv4 localhost address and the port 2836.
-f, --file=<path_to_file>

Specifies the file used for insertion.

Argument
Path to the file.
--config=<path_to_file>

Specifies the configuration file used to setup the inserter.

Argument
Path to the file.
--i-make-config=<path_to_new_file>

The interactive mode outputs a configuration file at the given path then quits.

Argument
Path to the new file.
--i-auto

The interactive mode asks no questions.

--i-verbose

The interactive mode is verbose, and asks more questions if ‘–i-auto’ is not specified.

--check-syntax

The program will locally check the file’s syntax then display it’s errors and quit.

--max-threads=<max_threads>

Maximum number of threads used. Will be ignored if you use a quoted parser in the config file.

Default value
1
Implicit value
hardware_concurrency, the number of threads available on your machine.
--user-credentials-file=<path_to_file>

Specifies the user’s private key file with both username and secret_key.

Argument
Path to the file.
--cluster-public-key-file=<path_to_file>

Specifies the path to the cluster’s public key file.

Argument
Path to the file.
arrow_backPrevious
5.6. quasardb benchmarking tool
Next arrow_forward
5.8. quasardb configuration generator