Byzantine dataset servers

Summary

DAP servers use dataset servers to download and pre-filter requested datasets.

Dataset servers may (in practice, will) fail to transmit data correctly, for example because of memory or disk corruption or a malicious user.

This document describes a way to protect PyCTOH from byzantine dataset servers.

Rationale

digraph datastream {
   rankdir=LR;
   "Store" -> "DatasetServer" -> "DAP Server";
}

Storage servers provide intrinsic error checking. Dataset servers can trust them without compromising data integrity: if an undetected error occurs while retrieving data, it will be detected by the DAP server at the next stage.

The critical part of the path from data store to DAP server is the transmission from dataset server to DAP server: a dataset server may run on an untrusted host and return fake, though well-formed, data.

A byzantine server detection system should validate data received from untrusted hosts and be able to assign each host a “trust level”.

Design

Dataset requests are deterministic: whichever server handles a request, the response should be the same, bit for bit.

When the DAP server requests a dataset, it may [*] ask another dataset server to perform the same request and return only a checksum of the response [†].
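The decision to cross-check a request could be sketched as a simple random draw against the byzantine_check_rate value mentioned in the footnotes. This is only an illustration: the function name `should_check` and the interpretation of the rate as a probability between 0 and 1 are assumptions, not part of PyCTOH.

```python
import random


def should_check(byzantine_check_rate, rng=random):
    """Return True when the current request should be cross-checked.

    Assumption: byzantine_check_rate is a probability in [0, 1].
    `rng` is any object with a random() method returning a float in [0, 1).
    """
    if not 0.0 <= byzantine_check_rate <= 1.0:
        raise ValueError("byzantine_check_rate must be in [0, 1]")
    # random() never returns 1.0, so a rate of 1.0 always checks
    # and a rate of 0.0 never does.
    return rng.random() < byzantine_check_rate
```

With a rate of 0.1, roughly one request in ten would be sent to a second dataset server for verification.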

The DAP server computes the checksum of the dataset received from the first server and compares it with the value returned by the second one.
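A minimal sketch of this comparison, using only the Python standard library, might look as follows. The helper names `checksum` and `responses_match` are hypothetical, and the hash algorithm is left as a parameter: the footnotes mention md5, while the default of sha256 used here is an assumption of this sketch.

```python
import hashlib
import hmac


def checksum(chunks, algorithm="sha256"):
    """Hash a dataset response delivered as an iterable of bytes chunks."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        # Feeding chunks one by one avoids holding the whole dataset in memory.
        digest.update(chunk)
    return digest.hexdigest()


def responses_match(local_digest, remote_digest):
    """Compare the locally computed digest with the one sent by the second server."""
    return hmac.compare_digest(local_digest, remote_digest)
```

Both servers must of course use the same algorithm, and the comparison is only meaningful because responses are expected to be identical bit for bit.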

If the values don’t match, both servers are tagged with a byzantine warning flag, and the same request is re-issued to other hosts. Once a response has been validated, the flag can be removed from whichever of the original hosts returned the valid response.
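The flag, re-issue and unflag cycle could be sketched as below. Several things here are assumptions made for illustration: `fetch_digest` is a hypothetical callable that returns the checksum of a given host's response (computed locally for a full response, or reported by the host itself), and a response is treated as "validated" once `quorum` hosts agree on the same checksum. The text above does not specify how validation is decided.

```python
from collections import Counter


def validate(first, second, others, fetch_digest, quorum=2):
    """Return (valid_digest, flagged_hosts) for one dataset request.

    fetch_digest(host) is a hypothetical callable returning that host's checksum.
    valid_digest is None when no checksum reaches the quorum.
    """
    digests = {first: fetch_digest(first), second: fetch_digest(second)}
    if digests[first] == digests[second]:
        return digests[first], set()

    # Mismatch: both original hosts are suspect until proven otherwise.
    votes = Counter(digests.values())
    for host in others:
        digests[host] = fetch_digest(host)
        votes[digests[host]] += 1
        winner, count = votes.most_common(1)[0]
        if count >= quorum:
            # Hosts that agree with the validated response are unflagged;
            # every host that returned something else stays flagged.
            flagged = {h for h, d in digests.items() if d != winner}
            return winner, flagged

    # No agreement reached: every consulted host remains flagged.
    return None, set(digests)
```

A simple agreement quorum does not protect against several colluding hosts; a real implementation would likely combine it with the per-host trust level described in the rationale.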

Footnotes

[*] Whether to perform a byzantine check can be determined by a byzantine_check_rate config value.
[†] This saves bandwidth. A strong checksum is used so that a malicious server can’t forge a fake response with a correct checksum. md5 was the original suggestion, but practical MD5 collision attacks are known, so a SHA-2 hash such as SHA-256 is a safer choice.

Implementation
