Skip to content

Root hash mismatch when compared to other libraries. #182

@azteca1998

Description

@azteca1998

Hello,

We're developing a patricia merkle tree and would like to compare our implementation with yours (both for testing and benchmarking). However we've found that the computed hash from your library doesn't match the ones computed from other libraries (including ours).

Hashing mismatch example.
//! # Basic example.
//!
//! Target hash extracted from here:
//!   https://github.com/ethereum/tests/blob/develop/TrieTests/trietest.json#L97-L104
//!
//! ## Dependencies
//!
//! ```toml
//! [dependencies]
//! cita_trie = "4.0.0"
//! hasher = "0.1.4"
//! hex-literal = "0.3.4"
//! memory-db = "0.31.0"
//! reference-trie = "0.26.0"
//! sha3 = "0.10.6"
//! trie-db = "0.24.0"
//!
//! [dependencies.patricia-merkle-tree]
//! git = "https://github.com/lambdaclass/merkle_patricia_tree"
//! ```

use cita_trie::{MemoryDB, PatriciaTrie, Trie};
use hasher::HasherKeccak;
use hex_literal::hex;
use patricia_merkle_tree::PatriciaMerkleTree;
use sha3::Keccak256;
use std::sync::Arc;

fn main() {
    const DATA: &[(&[u8], &[u8])] = &[(b"abc", b"123"), (b"abcd", b"abcd"), (b"abc", b"abc")];
    const HASH: [u8; 32] = hex!("7a320748f780ad9ad5b0837302075ce0eeba6c26e3d8562c67ccc0f1b273298a");

    println!("Expected   : {HASH:02x?}");
    println!("Our    hash: {:02x?}", run_ours(DATA.iter().copied()));
    println!("Cita   hash: {:02x?}", run_cita(DATA.iter().copied()));
    println!("Parity hash: {:02x?}", run_parity(DATA.iter().copied()));
}

fn run_ours<'a>(data: impl Iterator<Item = (&'a [u8], &'a [u8])>) -> [u8; 32] {
    let mut trie = PatriciaMerkleTree::<_, _, Keccak256>::new();

    data.for_each(|(p, v)| {
        trie.insert(p, v);
    });

    trie.compute_hash().as_slice().try_into().unwrap()
}

fn run_cita<'a>(data: impl Iterator<Item = (&'a [u8], &'a [u8])>) -> [u8; 32] {
    let mem_db = Arc::new(MemoryDB::new(true));
    let hasher = Arc::new(HasherKeccak::new());

    let mut trie = PatriciaTrie::new(mem_db, hasher);

    data.for_each(|(p, v)| {
        trie.insert(p.to_vec(), v.to_vec()).unwrap();
    });

    trie.root().unwrap().try_into().unwrap()
}

fn run_parity<'a>(data: impl Iterator<Item = (&'a [u8], &'a [u8])>) -> [u8; 32] {
    use memory_db::{HashKey, MemoryDB};
    use reference_trie::ExtensionLayout;
    use trie_db::{NodeCodec, TrieDBMutBuilder, TrieHash, TrieLayout, TrieMut};

    let mut mem_db =
        MemoryDB::<_, HashKey<_>, _>::new(<ExtensionLayout as TrieLayout>::Codec::empty_node());
    let mut root = <TrieHash<ExtensionLayout>>::default();

    let mut trie = TrieDBMutBuilder::<ExtensionLayout>::new(&mut mem_db, &mut root).build();

    data.for_each(|(p, v)| {
        trie.insert(p, v).unwrap();
    });

    trie.commit();
    *trie.root()
}

After some investigation we've found that although the hashing algorithm is the same (given the same inputs will produce the same outputs), the inputs to the hashing function are different. As per the ethereum wiki on patricia merkle tries we've used RLP encoding on our implementation, and also found references to it in yours.

When trying to decode (using RLP) the data passed to your hash function we've found that it can't be RLP (or at least not the RLP on the ethereum wiki) since it didn't make sense. For example, attempting to decode an RLP list with a few exabytes of data.

Do you use a specific (non-ethereum RLP) encoding that fits your needs? Can it be changed to use ethereum's RLP?

Thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions