This project provides the Python implementation of the Pathfinder tool for the Circles UBI project. It includes network flow algorithms and tools to analyze and visualize the flow of value through a network of trust-based connections, facilitating the understanding and optimization of value transfers in the Circles UBI ecosystem.
- Methodology
- Installation
- Usage
- Project Structure
- Class Descriptions
- Running the Script
- Examples
- Dashboard
The graph is constructed using two primary data sources:
- Trusts: This dataset defines which tokens each account trusts and is willing to accept.
- Balances: This dataset shows the token balances held by each account.
For each account (referred to as the "truster"), we follow these steps:
- Identify all tokens trusted by the account.
- Find all other accounts holding any of these trusted tokens.
- Create edges from these token-holding accounts to the truster.
- Each edge represents a potential flow of a specific token from the holder to the truster.
This process is repeated for all accounts, resulting in a complex, multi-directed graph.
In a naive implementation, each edge's capacity would be set to the token balance held by the sender. However, this approach presents a problem: if a sender has multiple edges for the same token (connecting to different trusters), it could lead to "balance non-conservation." This occurs because standard flow algorithms are not aware of the need to conserve total balance across multiple edges.
For example, if account A holds 100 units of token B and can send to both accounts C and D, a naive implementation might allow a total flow of 200 units (100 to C and 100 to D), which violates the actual balance constraint.
To enforce balance conservation, we introduce intermediate nodes:
- For each unique combination of sender and token, we create an intermediate node.
- Instead of direct edges from sender to trusters, we now have:
- An edge from the sender to the intermediate node
- Edges from the intermediate node to each truster accepting that token
For example:
- If A can send token B to both C and D:
- We create an intermediate node A_B
- We add an edge from A to A_B with capacity equal to A's balance of token B
- We add edges from A_B to C and from A_B to D
This structure ensures:
- The total outflow of token B from A is limited to its actual balance
- A_B acts as a "gate," enforcing the balance constraint
- The flow to C and D can be any combination, as long as their sum doesn't exceed A's balance of token B
By using this intermediate node structure, we automatically enforce balance conservation without needing to modify standard flow algorithms.
For an actual implementation take the case take the requested Flow of 4000000000000000000 from node 9 to node 318. The full graph implementation gives
Which can the be simplified to
-
Clone the repository:
git clone https://github.com/hdser/pyfinder.git cd pyfinder
-
Install dependencies:
pip install -r requirements.txt
The main script main.py
provides two modes of operation:
- Run Mode: Analyze flow between specific source and sink nodes.
- Benchmark Mode: Compare performance of different flow algorithms.
To run the script:
python -m src.main
src/
: Contains the main project modulesdata_ingestion.py
: Handles data loading and preprocessinggraph_creation.py
: Creates and manages the network graphflow_analysis.py
: Implements flow analysis algorithmsvisualization.py
: Provides graph visualization functionsgraph_manager.py
: Coordinates the overall flow analysis process
main.py
: The main script to run the analysisdata/
: Directory for input data filesoutput/
: Directory for output files and visualizations
-
GraphManager
- Main class that orchestrates the flow analysis process.
- Methods:
__init__(self, trusts_file, balances_file, graph_type='networkx')
: Initializes the graph manager.analyze_flow(self, source, sink, flow_func, cutoff)
: Performs flow analysis between source and sink.visualize_flow(self, simplified_paths, simplified_edge_flows, original_edge_flows, output_dir)
: Generates visualizations of the flow.
-
DataIngestion
- Handles data loading and preprocessing.
- Methods:
__init__(self, df_trusts, df_balances)
: Initializes with trust and balance data._create_id_mappings(self)
: Creates mappings between addresses and internal IDs._create_edge_data(self)
: Generates edge data for the graph.get_id_for_address(self, address)
: Retrieves internal ID for a given address.get_address_for_id(self, id)
: Retrieves address for a given internal ID.
-
NetworkXGraph / GraphToolGraph
- Wrapper classes for graph libraries (NetworkX and graph-tool).
- Methods:
__init__(self, edges, capacities, tokens)
: Initializes the graph.compute_flow(self, source, sink, flow_func, requested_flow)
: Computes maximum flow.flow_decomposition(self, flow_dict, source, sink, requested_flow)
: Decomposes flow into paths.simplified_flow_decomposition(self, original_paths)
: Simplifies flow paths.
-
NetworkFlowAnalysis
- Performs the core flow analysis.
- Methods:
analyze_flow(self, source, sink, flow_func, requested_flow)
: Main method for flow analysis.simplify_edge_flows(self, edge_flows)
: Simplifies edge flows for visualization.
-
Visualization
- Handles the creation of visualizations.
- Methods:
plot_flow_paths(g, paths, edge_flows, id_to_address, filename)
: Plots simplified flow paths.plot_full_flow_paths(g, edge_flows, id_to_address, filename)
: Plots full flow paths.custom_flow_layout(G, source, sink)
: Creates a custom layout for flow graphs.
-
Ensure your input data files are in the
data/
directory:circles_public_V_CrcV2_TrustRelations.csv
: Trust relationships datacircles_public_V_CrcncesByAccountAndToken.csv
: Account balances data
-
Run the main script:
python -m src.main
-
Choose a mode:
- Enter
1
for Run Mode - Enter
2
for Benchmark Mode
- Enter
-
Follow the prompts to input source and sink nodes, choose algorithms, etc.
-
View the results in the console and check the
output/
directory for visualizations and benchmark results.
- Graph Library Selection: Choose between NetworkX and graph-tool for graph operations.
- Algorithm Selection: Select from various maximum flow algorithms:
- Default (Preflow Push)
- Edmonds-Karp
- Shortest Augmenting Path
- Boykov-Kolmogorov
- Dinitz
- Source and Sink Input: Enter Ethereum addresses for source and sink nodes.
- Requested Flow: Optionally specify a requested flow value.
- Results Display: View detailed results of the flow analysis.
- Visualization: Interactive graphs showing flow paths.
- Run the script and choose Run Mode (option 1)
- When prompted, enter:
- Source node:
9
- Sink node:
318
- Requested flow: (press Enter for maximum flow)
- Source node:
- Choose an algorithm (e.g., 1 for Default Preflow Push)
- The script will output:
- Flow value
- Execution time
- Paths and edge flows
- Check the
output/
directory for visualizations:simplified_flow_paths.png
: Simplified flow pathsfull_flow_paths.png
: Full flow paths including intermediate nodes
- Run the script and choose Benchmark Mode (option 2)
- The script will run all algorithms for predefined source-sink pairs
- View the benchmark results in the console
- Check
output/benchmark_results.csv
for detailed results
- Controls: Input fields for source, sink, requested flow, and algorithm selection.
- Results: Displays the computed flow value, execution time, and other key metrics.
- Transactions: Shows a detailed list of transfers in the computed flow.
- Flow Path Graphs: Two interactive visualizations side by side:
- Simplified Graph: Displays a simplified version of the flow, hiding intermediate nodes.
- Aggregated Graph: Presents an aggregated view of transfers between main nodes.
These visualizations provide different levels of detail, allowing users to analyze the flow from various perspectives.
To launch the dashboard, run:
python dashboard.py
This will start a local server, and you can access the dashboard through your web browser.
This project is licensed under the MIT License - see the LICENSE file for details.