Home
- Where can I find BlockSci's documentation?
- Does BlockSci support cryptocurrency XYZ?
- Does BlockSci support Monero?
- Does BlockSci support Ethereum?
- Does BlockSci support Omni Layer / Colored Coins / etc.?
- What software do you use to develop BlockSci?
- Does BlockSci run on CentOS / Windows / etc.?
- Does BlockSci provide state-of-the-art clustering?
- How do I use BlockSci's clustering module?
- Which heuristic is the clusterer using by default?
- How do I disable change address clustering?
- Why do some clusters appear to be empty?
- Why is cluster.size() slow?
- How do I use BlockSci's tagging feature?
- How can I map addresses to exchanges or pools?
- How do I extract the full scriptPubKey and scriptSig of an output/input?
- How can I extract balances of all addresses?
- How can I plot the UTXO Age Distribution over time?
Documentation for the Python interface is available here. Most users will want to use this interface.
BlockSci can support many cryptocurrencies that are similar to Bitcoin (e.g., they forked Bitcoin's codebase and made no modifications to the data model). BlockSci comes with a disk parser that is highly optimized for Bitcoin, and an RPC parser that should work with forks of Bitcoin (but is much slower than the disk parser).
Please be aware that the parser can easily break when a cryptocurrency changes the data format, adds new consensus rules or otherwise changes the rules of how blocks and transactions are created.
No. Monero's data model is different from Bitcoin's and thus doesn't currently work with BlockSci. It would be possible to extend BlockSci to support Monero, but this is currently not on our roadmap.
No. Ethereum's design is fundamentally different from Bitcoin's and thus incompatible with BlockSci.
BlockSci only handles parsing of the core blockchain layer (layer 1), but exposes any special data stored in the blockchain. Thus, for most protocols that build upon layer 1, you can write your own analysis code.
Related issues:
We're developing BlockSci on OSX using Xcode. You can easily generate an Xcode project using cmake:
mkdir xcode && cd xcode
cmake -G Xcode -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl ..
We don't have any recommendations for IDEs on other platforms, though we use gdb to debug BlockSci on Linux.
We only provide support for Ubuntu and OSX (MacOS). It may be possible to run BlockSci on other platforms by manually compiling the various dependencies.
If you encounter an issue with your BlockSci setup, you can try running blocksci_parser YOURCONFIG.json doctor to diagnose issues.
Note that this only checks a handful of potential issues and may not be able to identify your specific problem.
The default open files limit of many Linux distributions (e.g., Ubuntu) is too small for BlockSci.
This can lead to, among other things, transactions apparently missing from addresses (i.e., when using addr.txes()).
After you have increased the open files limit, reparse the chain.
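To check the current limit from Python before parsing, a minimal sketch using the standard library's resource module (Unix only) might look like the following. The threshold of 64000 is a placeholder assumption for illustration, not a value recommended by BlockSci, and the helper names are hypothetical.

```python
import resource

# Hypothetical threshold for illustration only; BlockSci does not prescribe this number.
ASSUMED_MIN_OPEN_FILES = 64000


def open_files_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, minimum=ASSUMED_MIN_OPEN_FILES):
    """Return True if the soft limit is unlimited or at least `minimum`."""
    return soft == resource.RLIM_INFINITY or soft >= minimum


soft, hard = open_files_limit()
print("soft limit:", soft, "hard limit:", hard)
if not limit_is_sufficient(soft):
    print("The open files limit may be too low; raise it (e.g., with ulimit -n) and reparse.")
```

The soft limit is the one that applies to the running process; it can be raised up to the hard limit without elevated privileges.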
As of August 2019, the default disk size of 500 GB of the v0.5 AMI may no longer suffice. We therefore strongly recommend choosing a larger disk size (e.g., 600 GB) when you first create the instance.
Follow this guide to increase the disk space of your existing AMI.
BlockSci provides the fundamental building blocks of address clustering: multi-input clustering with CoinJoin detection, and change address clustering with support for several change address heuristics (disabled by default).
There are, however, many corner cases (e.g., MtGox allowing users to import their private keys, breaking the multi-input heuristic) that require special treatment to prevent the occurrence of "superclusters". Superclusters are extremely large clusters that occur when different clusters collapse into each other due to over-eager address linking. To some degree, address clustering today is more art than science, and building a highly accurate clustering module, while possible, is not in the current roadmap for BlockSci. Anything that goes beyond the basic address clustering described above, you'll need to implement yourself.
Here's some helpful literature on address clustering:
- A Fistful of Bitcoins: Characterizing Payments Among Men with No Names
- The Unreasonable Effectiveness of Address Clustering
- Data-Driven De-Anonymization in Bitcoin
We recommend using the clustering module available through the Python interface.
If you haven't used the clusterer before, you'll need to first create a clustering:
import blocksci
chain = blocksci.chain("/path/to/blocksci/config.json")
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain)
If you already created such a clustering, you can simply load it:
cm = blocksci.cluster.ClusterManager("/directory/where/cluster/files/can/be/stored", chain)
By default, the clusterer uses the multi-input heuristic: inputs that are co-spent in the same transaction are clustered together, unless the transaction looks like a CoinJoin transaction.
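The idea behind the multi-input heuristic can be illustrated with a small union-find sketch over toy data. This is not BlockSci's implementation: the function name, the shape of the input (a list of address lists), and the looks_like_coinjoin flag are all assumptions made for this example, and real CoinJoin detection is considerably more involved.

```python
def cluster_multi_input(transactions):
    """Group addresses that are co-spent as inputs of the same transaction.

    `transactions` is an iterable of (input_addresses, looks_like_coinjoin)
    pairs; both the shape and the flag are assumptions of this sketch.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        # Walk up to the root, halving the path as we go.
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs, looks_like_coinjoin in transactions:
        for addr in inputs:
            find(addr)  # register every address, even if it is never merged
        if looks_like_coinjoin:
            continue  # do not link inputs of suspected CoinJoin transactions
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


toy_txes = [
    (["A", "B"], False),  # A and B are co-spent
    (["B", "C"], False),  # B and C are co-spent, so A, B and C merge
    (["D", "E"], True),   # flagged as CoinJoin: D and E stay separate
]
print(cluster_multi_input(toy_txes))
```

Because merges are transitive, a single wrongly linked transaction can join two large clusters, which is how superclusters arise.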
BlockSci provides a number of different change address heuristics.
You can use a different change address heuristic by passing it to the create_clustering function. For example:
# Cluster using the address reuse change heuristic
reuse_change_heuristic = blocksci.heuristics.change.address_reuse()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, reuse_change_heuristic)
# To disable change address clustering, pass the none heuristic instead
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, blocksci.heuristics.change.none)
Some clusters may appear to be empty (with cluster.size() == 0 and cluster.transactions() == []) while cluster.type_equiv_size is 1. This is not a bug, but an artifact of BlockSci's internal deduplication.
For example, assume there is a multisig address with three pubkeys. BlockSci keeps track of the three pubkeys independently of their combined use in a multisig address. During clustering, each of these four addresses (the multisig as well as the three pubkeys) starts in its own cluster. If the individual pubkeys are never used on their own, they'll remain in their single-address clusters. If a method such as .size() or .transactions() is called for such a cluster, BlockSci will check whether the addresses in the cluster have actually been used. If an address has never been used individually (as in the example above), BlockSci will tell you that the cluster is empty.
Clustering works based on equiv addresses (see above). When you call cluster.size(), BlockSci looks up in a database which address types each equiv address has actually been used with on chain. cluster.type_equiv_size does not perform this check but simply returns the number of equiv addresses in the cluster.
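The difference between the two calls can be modelled with a toy class. This is only an illustration of the behaviour described above, not BlockSci's data structure: ToyCluster, usage_db, and the assumption that size() counts one address per (equiv address, used type) pair are all made up for this sketch.

```python
class ToyCluster:
    """Toy stand-in for a cluster; not BlockSci's actual data structure."""

    def __init__(self, equiv_addresses, usage_db):
        self.equiv_addresses = list(equiv_addresses)
        # usage_db maps an equiv address to the set of address types it was
        # actually used with on chain (hypothetical layout).
        self.usage_db = usage_db

    @property
    def type_equiv_size(self):
        # No lookup needed: just count the equiv addresses.
        return len(self.equiv_addresses)

    def size(self):
        # One lookup per equiv address; this is the expensive part.
        return sum(len(self.usage_db.get(addr, ())) for addr in self.equiv_addresses)


usage_db = {"pubkey_1": {"pubkeyhash", "witness_pubkeyhash"}}
used = ToyCluster(["pubkey_1"], usage_db)
multisig_only = ToyCluster(["pubkey_2"], usage_db)  # never used on its own

print(used.type_equiv_size, used.size())
print(multisig_only.type_equiv_size, multisig_only.size())
```

In this model the second cluster has one equiv address but reports a size of zero, mirroring the "empty" clusters described earlier, and the cost of size() grows with the number of lookups.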
You can pass an {address: tag} dictionary to the blocksci.cluster.ClusterManager.tagged_clusters(<tags>) function to retrieve an iterator over all clusters that contain tagged addresses. See below for an example that uses a graphsense-tagpack for tags from walletexplorer.com.
import yaml

def import_from_tagpack(chain, filename):
    # Load a tagpack (YAML) and map each tagged address to its label.
    with open(filename, "r") as tag_file:
        data = yaml.safe_load(tag_file)
    tags = {chain.address_from_string(x['address']): str(x['label']) for x in data['tags']}
    print(data['description'])
    print("Curated by {}\n".format(data['creator']))
    print("Successfully loaded {} tags.".format(len(tags)))
    return tags

tags = import_from_tagpack(chain, "data/walletexplorer.yaml")
tagged_clusters = cm.tagged_clusters(tags).to_list()
Refer to the documentation for more information about the TaggedCluster and TaggedAddress classes.
BlockSci lets you tag address clusters with names, but we don't provide any such tags ourselves. There are a few public sources such as WalletExplorer or Blockchain.info, but they may not be reliable or complete.
BlockSci can map blocks to pools by looking at the information contained in the coinbase transaction, but the data we use to identify pools does not cover all pools/coinbase transactions. Furthermore, there's no guarantee that miners report their identity correctly in the coinbase transaction.
blocksci.get_miner(chain[300005])
>>> 'SlushPool'
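One common way to attribute blocks from coinbase data is to search the coinbase script for known marker strings. The sketch below shows that general idea only; it is not how BlockSci's lookup is implemented, and the function name, the marker table, and the pool names are all hypothetical.

```python
# Made-up markers for illustration; real pool markers vary and change over time.
ASSUMED_POOL_MARKERS = {
    b"/examplepool/": "ExamplePool",
    b"Mined by DemoPool": "DemoPool",
}


def guess_pool(coinbase_script, markers=ASSUMED_POOL_MARKERS):
    """Return the first pool whose marker occurs in the coinbase script, else None."""
    for marker, pool in markers.items():
        if marker in coinbase_script:
            return pool
    return None  # unknown pool, or the miner did not leave a recognizable marker


print(guess_pool(b"\x03\x01\x02\x03/examplepool/\x00\x00"))
print(guess_pool(b"\x03\x01\x02\x03 no marker here"))
```

A lookup like this inherits both limitations mentioned above: it only recognizes markers it knows about, and it trusts whatever the miner chose to write.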
For most standard scripts, BlockSci does not store the full scriptSig and scriptPubKey but instead extracts the important information and stores it as an Address. The Addresses page of the reference documentation (Docs » Reference » Address Classes » Addresses) describes what is stored.
The actual scriptSig and scriptPubKey are stored only for non-standard scripts. For example:
myout = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc").outputs[0]
myout.address_type == blocksci.address_type.nonstandard
>> True
myout.address.out_script
>> 'OP_DEPTH OP_1SUB OP_IF OP_RETURN 737069746861736820616e6420796d6f64652c2062726f6772616d6d657273346c796665 OP_ENDIF 0 OP_TOALTSTACK OP_DUP OP_HASH256 efb81cd930d56703304f63d7f94575c4cd17f0985ed2fd126aabf1d866471d2f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 9ddd5c986827e8bc5848b4fdc1f8152f597b852ed2429ae7ee2baf7a14096a8f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 fda5bd74925349ba07de25db126b9148a7a508e48475c33d2abe7c81a341a3ab OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_FROMALTSTACK'
See Faster way to get all address balances #264