5.7. quasardb insertion tool

5.7.1. Introduction

The quasardb insertion tool (qdb-railgun) enables you to insert CSV files into the database in the fastest way possible. The intent is to make it possible to configure various ways of inserting data with as few config files as needed.

5.7.2. Quick Reference

Option           Usage                            Default
-h, --help       display help
--with-header    CSV file has a header
-c, --cluster    cluster URI                      qdb://127.0.0.1:2836
-f, --file       CSV file to insert
--timeseries     timeseries config file
--parsers        parsers config file
--columns        columns config file
--max-threads    maximum number of threads used   1

5.7.3. Usage scenarios

Say your CSV file is organized with the following header:

time,product,id

The first data line would then look like:

2017.10.12 08:18:43.525290,ICE,36462
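A file in this format can be produced with a few lines of Python. This is only a sketch to generate test data; the file name data.csv and the sample values are assumptions, not part of the tool.

```python
import csv

# Hypothetical sample rows matching the header described above.
rows = [
    ("2017.10.12 08:18:43.525290", "ICE", 36462),
    ("2017.10.12 08:18:44.103522", "ICE", 36463),
]

with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "product", "id"])  # header row: pass --with-header
    writer.writerows(rows)
```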

You need to provide three separate configuration files. First, define the timeseries into which data will be inserted.

timeseries.json:

[
    {
        "name": "test",
        "shard_size": "32s",
        "columns": {
            "product": "blob",
            "id": "int64"
        }
    }
]

This file is only used to create the timeseries if it does not already exist. Note that the time field is not listed: it is implicit, since it is the timestamp used when inserting data. The timeseries columns may have completely different names from the CSV fields.

parsers.json:

[
    {
        "time": "datetime"
    },
    {
        "product": "blob_no_copy"
    },
    {
        "id": "int64"
    }
]

The key of each object should match the name of the field in the CSV header, although this is not mandatory. The value is the type of the parser that will be used. At least one datetime parser, or a combination of a date parser and a time parser, is required. The entries must appear in the same order as the fields in the actual CSV file, which is why the structure is an array.
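The ordering and parser-type rules above can be checked before launching an insertion. The following helper is a hypothetical sketch (it is not part of qdb-railgun) that verifies a parsers config contains either a datetime parser or a date/time pair:

```python
import json

def validate_parsers(config_text):
    """Hypothetical sanity check for a parsers config: the array must
    contain either a 'datetime' parser, or both a 'date' and a 'time'
    parser, as required by qdb-railgun."""
    parsers = json.loads(config_text)
    # Collect the parser types, preserving the CSV field order of the array.
    types = [t for entry in parsers for t in entry.values()]
    if "datetime" in types:
        return True
    return "date" in types and "time" in types

config = '[{"time": "datetime"}, {"product": "blob_no_copy"}, {"id": "int64"}]'
print(validate_parsers(config))  # True
```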

columns.json:

[
    {
        "datetime": "time"
    },
    {
        "timeseries": "test",
        "columns": [
            {
                "csv_col": "product",
                "ts_col": "product",
                "prealloc_count": 200000
            },
            {
                "csv_col": "id",
                "ts_col": "id",
                "prealloc_count": 200000
            }
        ]
    }
]

This is where you define the mapping between the parsers and the timeseries into which you insert data. The order is not relevant: the csv_col value is used to map each column to its parser. You could also insert into a second timeseries if you wanted to. Exactly one "datetime" object, or exactly one "date" object plus one "time" object, is required.

The prealloc_count field lets you preallocate space for the number of elements you expect to fit in one shard, in other words, the number of elements you expect to insert during the time span specified by shard_size.
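A reasonable value for prealloc_count can be estimated from your expected ingest rate. This is a back-of-the-envelope sketch; the rate of 6250 events per second is an assumption chosen to match the example configs above:

```python
# Sizing sketch: prealloc_count is roughly the expected event rate
# multiplied by the shard size in seconds.
events_per_second = 6250   # assumed average rate of your data feed
shard_size_seconds = 32    # matches "shard_size": "32s" in timeseries.json
prealloc_count = events_per_second * shard_size_seconds
print(prealloc_count)  # 200000, the value used in columns.json
```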

5.7.4. Options

-h, --help

Displays basic usage information.

--with-header

Indicates that the CSV file has a header.

-c, --cluster <URI>

Specifies the URI of the cluster to which the insertion tool should connect.

-f, --file <filepath>

Specifies the path to the CSV file.

--timeseries <filepath>

Specifies the path to the timeseries config file.

--parsers <filepath>

Specifies the path to the parsers config file.

--columns <filepath>

Specifies the path to the columns config file.

--max-threads <thread_count>

Specifies the maximum number of CPU threads used for insertion.
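Putting these options together, a complete invocation for the scenario above might look like the following. The config file names are the ones used in the examples; the data file name and thread count are assumptions:

```shell
qdb-railgun --with-header \
    -c qdb://127.0.0.1:2836 \
    -f data.csv \
    --timeseries timeseries.json \
    --parsers parsers.json \
    --columns columns.json \
    --max-threads 4
```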
