data-dispersal
Version 1.0.0.0 revision 0 uploaded by PeterRobinson.
Package meta
- Synopsis
- Space-efficient and privacy-preserving data dispersal algorithms.
- Description
This library provides space-efficient (m,n)-information dispersal algorithms (IDAs).
Given a ByteString
bstr
of lengthD
, we encodebstr
as a listfs
ofn
Fragments, each containing a ByteString of lengthO(D/m)
. Then, each fragment infs
could be stored on a separate machine for fault-tolerance. Even if up ton-m
of these machines crash, we can still reconstruct the original ByteString out of the remaining m fragments. The total space required for the n fragments isO((n/m)*D)
. Note thatm
andn
are roughly in the same order, so the actual storage overhead for getting good fault-tolerance increases only by a constant factor.The module
Data.IDA
contains the basic information dispersal algorithm. The moduleCrypto.IDA
augments the dispersal scheme by combining it with secret sharing, i.e., the knowledge of up tom-1
fragments does not leak any information about the original data. See Crypto.IDA for details.GHCi Example:
> :m + Data.IDA > let msg = Data.ByteString.Char8.pack "my really important data" > let fragments = encode 5 15 msg -- Now we could distributed the fragments on different sites to add some -- fault-tolerance. > let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed > decode frags' "my really important data"
Fault-Tolerance:
Suppose that we have
N
machines and encode our data as2log(N)
fragments with reconstruction threshold m =log(N)
. Let's assume that we store each fragment on a separate machine and each machine fails (independently) with probability at most 0.5.What is the probability of our data being safe?
Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).
What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size D/m, so the total space is
n * D/ m = 2D.
In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.
This library is based on the following works:
"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.
"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.
"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146
- Author
- Peter Robinson <peter.robinson@monoid.at>
- Bug reports
- n/a
- Category
- Data, Cryptography
- Copyright
- Peter Robinson 2014
- Homepage
- http://monoid.at/code
- Maintainer
- peter.robinson@monoid.at
- Package URL
- n/a
- Stability
- n/a