# 3.1. Cluster Organization

## 3.1.1. What is a Cluster?

Each server running a quasardb daemon is called a node. By itself, a node can provide fast key-value storage for a project where a SQL database might be too slow or impose unwanted design limitations.

However, the real power of a quasardb installation comes when multiple nodes are linked together into a cluster. A cluster is a peer-to-peer distributed hash table based on Chord. In a cluster, quasardb nodes self-organize to share data and handle client requests, providing a scalable, concurrent, and fault tolerant database.
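The placement rule of a Chord-style ring can be sketched as follows. This is an illustrative model, not quasardb's actual implementation; the hashing scheme and node names are assumptions made for the example:

```python
# Toy Chord-style key placement: nodes and keys share one 256-bit ID space,
# and a key is owned by its "successor" node on the ring.
import hashlib

ID_BITS = 256  # quasardb node IDs are 256-bit numbers

def ring_id(name: str) -> int:
    """Hash an arbitrary name onto the 256-bit ring (illustrative choice)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def successor(key: int, node_ids: list[int]) -> int:
    """A key is owned by the first node whose ID is >= the key,
    wrapping around to the smallest ID at the end of the ring."""
    for nid in sorted(node_ids):
        if nid >= key:
            return nid
    return min(node_ids)  # wrap around

# Four hypothetical nodes self-organize by ID; any key maps to exactly one.
nodes = [ring_id(f"node-{i}") for i in range(4)]
owner = successor(ring_id("some-entry"), nodes)
assert owner in nodes
```

Because every node applies the same rule, clients can be routed to the owner of any key without a central coordinator, which is what makes the ring scalable and fault tolerant.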

## 3.1.2. Stabilization

Stabilization is the process during which nodes agree on their position in the cluster. Stabilization happens when bootstrapping a cluster, in case of failure, or when adding nodes. It is transparent and does not require any intervention. A cluster is considered stable when all nodes are ordered in the ring by their respective ids.

In most clusters, stabilization is extremely short and does not modify the order of the nodes in the ring. However, if one or several nodes fail, or if new nodes join the cluster, stabilization also redistributes the data between the nodes (see Data Migration). The stabilization duration thus varies with the amount of data to migrate, if any.

Nodes periodically verify their location in the cluster to determine whether the cluster is stable. This interval can vary anywhere from 1 second up to 2 minutes. When a node determines the cluster is stable, it increases the duration between stabilization checks; conversely, when the cluster is determined to be unstable, the duration between checks is reduced.
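The adaptive check interval can be sketched like this. Only the 1-second and 2-minute bounds come from the text above; the doubling factor is an assumption for illustration:

```python
# Sketch of an adaptive stabilization-check interval: grow while the
# cluster looks stable, fall back to aggressive checking when it does not.
MIN_INTERVAL = 1.0    # seconds (lower bound stated in the documentation)
MAX_INTERVAL = 120.0  # seconds (upper bound stated in the documentation)

def next_interval(current: float, cluster_stable: bool) -> float:
    if cluster_stable:
        return min(current * 2, MAX_INTERVAL)  # back off while stable
    return MIN_INTERVAL                        # re-check aggressively

interval = MIN_INTERVAL
for _ in range(10):  # ten consecutive checks that find the cluster stable
    interval = next_interval(interval, cluster_stable=True)
assert interval == MAX_INTERVAL  # the interval has saturated at 2 minutes
```

The effect is that a quiet cluster spends almost no effort on stabilization, while a disturbed one converges quickly.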

## 3.1.3. Adding a Node to a Cluster

Tip

Add nodes when activity is low to limit disruption.

To add a node, configure the new node to reference an operational node within the ring, then start it. During the stabilization period that follows, the cluster remains fully operational and clients are unaware that a node is joining.

## 3.1.4. Removing a Node from a Cluster

When a node is removed through a clean shutdown, it informs the other nodes in the ring as it shuts down, and they immediately re-stabilize the cluster. If data replication is disabled, the entries stored on the node are effectively removed from the database. If data replication is enabled, the nodes holding the duplicate data will serve client requests.

When a node is removed due to a failure, the cluster detects the failure during the next periodic stabilization check, and at that point the other nodes automatically re-stabilize the cluster. As with a clean shutdown, if data replication is disabled, the entries stored on the node are effectively removed from the database; if data replication is enabled, the nodes holding the duplicate data will serve client requests.

Entries are not migrated when a node leaves the cluster, only when a node enters the cluster.
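Why no migration is needed on departure can be seen in a toy ring: the keys a departed node owned simply fall to its successor, which, with replication enabled, already holds copies. A sketch with hypothetical node IDs:

```python
def successor(key: int, node_ids: list[int]) -> int:
    """First node ID >= key, wrapping around the ring (Chord-style)."""
    ids = sorted(node_ids)
    for nid in ids:
        if nid >= key:
            return nid
    return ids[0]  # wrap around

nodes = [0x10, 0x50, 0x90, 0xD0]
key = 0x60                             # owned by node 0x90
assert successor(key, nodes) == 0x90

nodes.remove(0x90)                     # node 0x90 fails or shuts down
assert successor(key, nodes) == 0xD0   # ownership falls to the successor
```

No data moved: the ownership rule alone reassigns the departed node's key range, which is why migration only happens when a node enters the ring.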

## 3.1.5. Recovering from Node Failure

When a node recovers from a failure, it must reference a node within the ring to rejoin the cluster. The configuration for the first node in a ring generally does not reference other nodes; thus, if the first node of the ring fails, you may need to adjust its configuration file to refer to an operational node before restarting it.

If, following a major network failure, a cluster splits into two disjoint rings, the two rings will be able to merge again once the underlying failure is resolved, because each node “remembers” past topologies.

The detection and re-stabilization process surrounding node failures can add a lot of extra work to the affected nodes. Frequent failures will severely impact node performance.

Tip

A cluster operates best when more than 90% of the nodes are fully functional. Anticipate traffic growth and add nodes before the cluster is saturated.

## 3.1.6. Node IDs

Each node is identified by a unique 256-bit number: the ID. If a node attempts to join a cluster where a node with the same ID already exists, the new node will exit the cluster.

In quasardb 2.0, node IDs are either automatic, indexed, or manual. The syntax is as follows:

• automatic: auto
• indexed: current_node/total_node (e.g. 3/8 for the third node of an 8-node cluster)
• manual: a 256-bit hexadecimal number grouped in 64-bit blocks (e.g. 2545ef-35465f-87887e-5354)

Users are strongly encouraged to use the indexed ID generation mode. In indexed mode, quasardb generates the ideal ID for a node given its relative position. For example, in a 4-node cluster, each node should be given the following ID:

• node 1 - 1/4
• node 2 - 2/4
• node 3 - 3/4
• node 4 - 4/4

If you want to reserve ID space to allow the cluster to grow to 32 nodes without changing all IDs, use the following numbering:

• node 1 - 1/32
• node 2 - 9/32
• node 3 - 17/32
• node 4 - 25/32

The ideal IDs are equidistant from each other, for optimal key-space coverage, and that is exactly what indexed mode computes.
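A sketch of the computation indexed mode performs, under the assumption that node i of n is placed at (i-1)/n of the 256-bit key space:

```python
def indexed_id(i: int, n: int) -> int:
    """256-bit ID for node i of n (1-based), evenly spaced on the ring.
    Assumed model: position (i-1)/n of the full 2**256 key space."""
    assert 1 <= i <= n
    return (i - 1) * (1 << 256) // n

# 4-node cluster: IDs at 0, 1/4, 2/4 and 3/4 of the key space
ids = [indexed_id(i, 4) for i in range(1, 5)]
gaps = [b - a for a, b in zip(ids, ids[1:])]
assert len(set(gaps)) == 1  # equidistant

# Reserving room for growth: 1/32, 9/32, 17/32, 25/32 give the same layout,
# but leave 7 free slots between each pair of current nodes.
reserved = [indexed_id(i, 32) for i in (1, 9, 17, 25)]
assert reserved == ids
```

This also shows why the 32-slot numbering above is recommended: the four nodes land on exactly the same ring positions as 1/4 … 4/4, so later growth never forces existing IDs to change.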

If you wish to supply your cluster's node IDs manually, here are suggested IDs for each cluster size from 2 to 16:

Cluster size 2:

1. 0000000000000000-0-0-1
2. 8000000000000000-0-0-1

Cluster size 3:

1. 0000000000000000-0-0-1
2. 5555555500000000-0-0-1
3. aaaaaaaa00000000-0-0-1

Cluster size 4:

1. 0000000000000000-0-0-1
2. 4000000000000000-0-0-1
3. 8000000000000000-0-0-1
4. c000000000000000-0-0-1

Cluster size 5:

1. 0000000000000000-0-0-1
2. 3333333300000000-0-0-1
3. 6666666600000000-0-0-1
4. 9999999900000000-0-0-1
5. cccccccc00000000-0-0-1

Cluster size 6:

1. 0000000000000000-0-0-1
2. 2aaaaaaa00000000-0-0-1
3. 5555555500000000-0-0-1
4. 8000000000000000-0-0-1
5. aaaaaaaa00000000-0-0-1
6. d555555500000000-0-0-1

Cluster size 7:

1. 0000000000000000-0-0-1
2. 2492492400000000-0-0-1
3. 4924924900000000-0-0-1
4. 6db6db6d00000000-0-0-1
5. 9249249200000000-0-0-1
6. b6db6db600000000-0-0-1
7. db6db6db00000000-0-0-1

Cluster size 8:

1. 0000000000000000-0-0-1
2. 2000000000000000-0-0-1
3. 4000000000000000-0-0-1
4. 6000000000000000-0-0-1
5. 8000000000000000-0-0-1
6. a000000000000000-0-0-1
7. c000000000000000-0-0-1
8. e000000000000000-0-0-1

Cluster size 9:

1. 0000000000000000-0-0-1
2. 1c71c71c00000000-0-0-1
3. 38e38e3800000000-0-0-1
4. 5555555500000000-0-0-1
5. 71c71c7100000000-0-0-1
6. 8e38e38e00000000-0-0-1
7. aaaaaaaa00000000-0-0-1
8. c71c71c700000000-0-0-1
9. e38e38e300000000-0-0-1

Cluster size 10:

1. 0000000000000000-0-0-1
2. 1999999900000000-0-0-1
3. 3333333300000000-0-0-1
4. 4ccccccc00000000-0-0-1
5. 6666666600000000-0-0-1
6. 8000000000000000-0-0-1
7. 9999999900000000-0-0-1
8. b333333300000000-0-0-1
9. cccccccc00000000-0-0-1
10. e666666600000000-0-0-1

Cluster size 11:

1. 0000000000000000-0-0-1
2. 1745d17400000000-0-0-1
3. 2e8ba2e800000000-0-0-1
4. 45d1745d00000000-0-0-1
5. 5d1745d100000000-0-0-1
6. 745d174500000000-0-0-1
7. 8ba2e8ba00000000-0-0-1
8. a2e8ba2e00000000-0-0-1
9. ba2e8ba200000000-0-0-1
10. d1745d1700000000-0-0-1
11. e8ba2e8b00000000-0-0-1

Cluster size 12:

1. 0000000000000000-0-0-1
2. 1555555500000000-0-0-1
3. 2aaaaaaa00000000-0-0-1
4. 4000000000000000-0-0-1
5. 5555555500000000-0-0-1
6. 6aaaaaaa00000000-0-0-1
7. 8000000000000000-0-0-1
8. 9555555500000000-0-0-1
9. aaaaaaaa00000000-0-0-1
10. c000000000000000-0-0-1
11. d555555500000000-0-0-1
12. eaaaaaaa00000000-0-0-1

Cluster size 13:

1. 0000000000000000-0-0-1
2. 13b13b1300000000-0-0-1
3. 2762762700000000-0-0-1
4. 3b13b13b00000000-0-0-1
5. 4ec4ec4e00000000-0-0-1
6. 6276276200000000-0-0-1
7. 7627627600000000-0-0-1
8. 89d89d8900000000-0-0-1
9. 9d89d89d00000000-0-0-1
10. b13b13b100000000-0-0-1
11. c4ec4ec400000000-0-0-1
12. d89d89d800000000-0-0-1
13. ec4ec4ec00000000-0-0-1

Cluster size 14:

1. 0000000000000000-0-0-1
2. 1249249200000000-0-0-1
3. 2492492400000000-0-0-1
4. 36db6db600000000-0-0-1
5. 4924924900000000-0-0-1
6. 5b6db6db00000000-0-0-1
7. 6db6db6d00000000-0-0-1
8. 8000000000000000-0-0-1
9. 9249249200000000-0-0-1
10. a492492400000000-0-0-1
11. b6db6db600000000-0-0-1
12. c924924900000000-0-0-1
13. db6db6db00000000-0-0-1
14. edb6db6d00000000-0-0-1

Cluster size 15:

1. 0000000000000000-0-0-1
2. 1111111100000000-0-0-1
3. 2222222200000000-0-0-1
4. 3333333300000000-0-0-1
5. 4444444400000000-0-0-1
6. 5555555500000000-0-0-1
7. 6666666600000000-0-0-1
8. 7777777700000000-0-0-1
9. 8888888800000000-0-0-1
10. 9999999900000000-0-0-1
11. aaaaaaaa00000000-0-0-1
12. bbbbbbbb00000000-0-0-1
13. cccccccc00000000-0-0-1
14. dddddddd00000000-0-0-1
15. eeeeeeee00000000-0-0-1

Cluster size 16:

1. 0000000000000000-0-0-1
2. 1000000000000000-0-0-1
3. 2000000000000000-0-0-1
4. 3000000000000000-0-0-1
5. 4000000000000000-0-0-1
6. 5000000000000000-0-0-1
7. 6000000000000000-0-0-1
8. 7000000000000000-0-0-1
9. 8000000000000000-0-0-1
10. 9000000000000000-0-0-1
11. a000000000000000-0-0-1
12. b000000000000000-0-0-1
13. c000000000000000-0-0-1
14. d000000000000000-0-0-1
15. e000000000000000-0-0-1
16. f000000000000000-0-0-1
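The suggested IDs above follow a simple pattern: the top 32 bits of the first 64-bit block are spread evenly over the key space, the middle blocks stay at 0, and the last block is 1 (presumably so that no ID is all zeroes; that reading is an assumption). A sketch that reproduces the entries:

```python
def suggested_id(i: int, n: int) -> str:
    """Suggested ID for node i (1-based) of an n-node cluster, in the
    64-bit-block notation used above: top 32 bits spread evenly, then
    zeroed blocks, ending with a block of 1."""
    top32 = (i - 1) * (1 << 32) // n
    return f"{top32:08x}00000000-0-0-1"

# Spot-check against the table
assert suggested_id(1, 2) == "0000000000000000-0-0-1"
assert suggested_id(2, 2) == "8000000000000000-0-0-1"
assert suggested_id(4, 7) == "6db6db6d00000000-0-0-1"
assert suggested_id(2, 13) == "13b13b1300000000-0-0-1"
```

Because the IDs are computed, you can derive suitable values for any cluster size not listed, though indexed mode does the same thing with less room for error.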