Separate directory for textbooks
AustinTSchaffer committed Jun 3, 2024
1 parent a926c15 commit 4035b66
Showing 87 changed files with 288 additions and 101 deletions.
22 changes: 0 additions & 22 deletions OMSCS/Courses/NS/L01 - Intro Notes.md
@@ -9,28 +9,6 @@ Course textbook is an online resource
- Errata: http://networksciencebook.com/translations/en/resources/NetworkScienceErrata.pdf
- Book: http://networksciencebook.com/

## Notes from Readings
http://networksciencebook.com/chapter/1

- cascading events, failures in highly connected systems can cascade, causing system-wide problems
- a failure in a high capacity link/node can cause traffic/energy to be routed to other links/nodes that can't handle the new throughput
- "uncovering the hidden structure of an organization"
- "Accurate maps of such _organizational networks_ can expose the potential lack of interactions between key units, help identify individuals who play an important role in bringing different departments and products together, and help higher management diagnose diverse organizational issues."
- Companies in this space
- Maven 7
- Activate Networks
- Orgnet
- Data sources
- Self-reported
- "Who's your mentor?"
- "Who do you turn to for advice?"
- "Who do you talk to regularly?"
- Books
- Linked by Albert Laszlo Barabasi
- Six Degrees by Duncan J Watts
- Nexus by Mark Buchanan
- Connected by Nicholas A Christakis and James H Fowler

## Knowledge Quiz
- Network science is the study of complex systems through their network representation.
- The network architecture of a complex system is not sufficient to understand the system's functions and dynamics.
80 changes: 1 addition & 79 deletions OMSCS/Courses/NS/L02 - Relevant Concepts From Graph Theory.md
@@ -1,7 +1,7 @@
---
tags:
- OMSCS
- AI
- NS
---
# L02 - Relevant Concepts From Graph Theory

@@ -10,84 +10,6 @@ tags:
- review basic graph algorithms
- relate concepts to real-world networks

## Notes from Chapter 2
http://networksciencebook.com/chapter/2

- Famous problem, the Bridges of Königsberg (1735)
- Euler solved it by making an abstract representation of the map, and showing that no route crosses every bridge exactly once
- Later a new bridge was added which made the problem solvable

> A walking path that goes through all bridges can have only one starting and one end point. Thus such a path cannot exist on a graph that has more than two nodes with an odd number of links. The Königsberg graph had four nodes with an odd number of links, A, B, C, and D, so no path could satisfy the problem.
- "digraph" = "directed graph"

### Degree

- Degree
- A key property of each node is its "degree", representing the number of links it has to other nodes.
- $k_i$ is the degree of node $i$
- In an undirected graph, the total number of links $L$ can be expressed as the sum of node degrees, dividing by 2 to remove duplicates: $L=\frac{1}{2}\sum_{i=1}^{N}k_i$
- In a directed graph, we distinguish between "incoming degree" $k_i^{in}$ and "outgoing degree" $k_i^{out}$. The node's total degree is $k_i = k_i^{in} + k_i^{out}$
- Average Degree
- The network's average degree is the mean of its nodes' degrees; for an undirected network, $\langle k \rangle=\frac{1}{N}\sum_{i=1}^{N}k_i=\frac{2L}{N}$
- Degree Distribution
- the degree distribution $p_k$ provides the probability that a randomly selected node in the network has degree $k$
- bell curve science
- For a network with $N$ nodes, the degree distribution is the normalized histogram: $p_k=\frac{N_k}{N}$

### Adjacency matrix
- Model directed/undirected networks using a matrix of size $N \times N$.
- The degree of a node in an undirected network can be obtained by summing a row or column
- The degree of a node in a directed network can be obtained by summing the row and column corresponding to the node.

![[Pasted image 20240520160420.png]]

- real networks are sparse
- The max number of links in a network is given by $L_{max}=\frac{N(N-1)}{2}$
- In real networks, $L \ll L_{max}$
- Adjacency matrices become less practical as $N$ increases

### Weighted networks
- in many applications, links have independent weights ($w_{ij}$)
- can't always measure the appropriate weights
- often approximate weighted networks as unweighted networks

> Metcalfe's Law
>
> According to Metcalfe’s law the _cost_ of network based services increases linearly with the number of nodes (users or devices). In contrast the _benefits_ or _income_ are driven by the number of links $L_{max}$ the technology makes possible, which grows like $N^2$ according to (2.12). Hence once the number of users or devices exceeds some _critical mass_, the technology becomes profitable.
### Bipartite Networks
![[figure-2-9.jpg]]

- a bipartite graph (bigraph) is a network whose nodes can be divided into 2 disjoint sets U and V such that each link connects a U-node to a V-node
- "We can generate two projections for each bipartite network. The first projection connects two U-nodes by a link if they are linked to the same V-node in the bipartite representation. The second projection connects the V-nodes by a link if they connect to the same U-node"
- There are no U-U links and no V-V links in the actual network
- A well known bipartite network is the Hollywood actor network, where U nodes are actors and V nodes are movies.
- There are also tripartite networks, e.g. Recipes-Ingredients-Compounds

![[figure-2-11.jpg]]

### Paths and Distances
- shortest path
- network diameter - $d_{max}$ = max shortest path in the network
- $\langle d \rangle$ - average path length
- BFS is commonly used here
- UCS could/should be used in place of BFS if edges have weights/costs

## Connectedness
> If the network has disconnected components, the adjacency matrix can be rearranged into a block diagonal form, such that all nonzero elements of the matrix are contained in square blocks along the diagonal of the matrix and all other elements are zero.
![[Pasted image 20240520162142.png]]

We can find if a network is fully connected using BFS

### Clustering Coefficient
- For a node $i$ with degree $k_i$, the local clustering coefficient is defined as $C_i=\frac{2L_i}{k_i(k_i-1)}$
- $L_i$ is the number of links between the $k_i$ neighbors of node $i$
- $\langle C \rangle$ would be the average C over the whole network

![[Pasted image 20240520162420.png]]

## Module Notes
- Use the notation G=(V,E) to refer to a graph G with a set of vertices V and a set of edges E
- undirected
@@ -0,0 +1,107 @@
---
tags:
- OMSCS
- NS
---
# L03 - Degree Distribution and The Friendship Paradox

## Overview
- Measure and interpret the degree distribution of a network
- Understand the _“friendship paradox”_ to illustrate the importance of the degree distribution 
- An application of the friendship paradox: vaccination targets when the network topology is unknown
- Learn the $G(n,p)$ model as the most basic type of random graph
- Degree correlations and assortative networks

## Reading
- [[Chapter 03 - Random Networks]]
- [[Chapter 07 - Degree Correlations]]
- Recommended: [Simulated Epidemics in an Empirical Spatiotemporal Network of 50,185 Sexual Contacts.](http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001109) Luis E. C. Rocha, Fredrik Liljeros, Petter Holme (2011)

## Notes from Module

- degree distribution is a normalized histogram showing probabilities

## Degree Distribution Moments

- average degree
- second moment
- variance

![[Pasted image 20240525095430.png]]

- Complementary Cumulative Distribution Function (CCDF)
- Shows the probability that a variable (here, a node's degree) takes a value $\ge$ a specific value.
- Useful for large networks, where a histogram will be hard to read.
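
The moments and the CCDF listed above can be computed directly from a list of node degrees. The following is a minimal sketch; the function names and the example degree list are illustrative assumptions, not taken from the course material.

```python
def degree_moments(degrees):
    """Return (average degree, second moment, variance) of a degree list."""
    n = len(degrees)
    mean_k = sum(degrees) / n                 # first moment
    mean_k2 = sum(d * d for d in degrees) / n  # second moment
    variance = mean_k2 - mean_k ** 2           # variance = <k^2> - <k>^2
    return mean_k, mean_k2, variance


def ccdf(degrees):
    """Map each observed degree k to the fraction of nodes with degree >= k."""
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in sorted(set(degrees))}


# Hypothetical degree list for a 4-node network.
example_degrees = [1, 1, 2, 3]
moments = degree_moments(example_degrees)
tail = ccdf(example_degrees)
```

Because the CCDF is cumulative, it is monotonically non-increasing in $k$, which is part of why it is easier to read than a raw histogram for large networks.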

![[Pasted image 20240525095457.png]]

## Two Special Degree Distributions

![[Pasted image 20240525095707.png]]

- CCDF plots are typically shown with at least one log scale
- log-linear: an exponentially decaying distribution appears as a straight line
- log-log: a power-law distribution appears as a straight line

- degree distribution of a sex-contact network (video)

## Friendship Paradox

- $q_k$ is the probability that a randomly chosen edge stub connects to a node of degree $k$. It equals the number of nodes of degree $k$, times the probability that the stub connects to a specific node of degree $k$
- $q_k=(np_k)(\frac{k}{2m})$
- $q_k=\frac{kp_k}{\frac{2m}{n}}$
- $q_k=\frac{kp_k}{\bar{k}}$

- $m$ is the number of edges
- $n$ is the number of nodes
- $p_k$ is the probability that a node has degree $k$
- $k$ is the degree that we're currently considering

> the probability that the randomly chosen stub connects to a node of degree k is proportional to both k and the probability that a node has degree k.

The expected value of the degree of a node's neighbor is:

$$\bar{k_{nn}}=\sum_{k=0}^{k_{max}}kq_k$$

The friendship paradox is that $\bar{k_{nn}}$ tends to be higher than $\bar{k}$
- Networks where every node has the same $k$ have no disparity between $\bar{k_{nn}}$ and $\bar{k}$
- Star networks tend to have the highest disparity between $\bar{k_{nn}}$ and $\bar{k}$
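
The two extreme cases above can be checked numerically. The sketch below computes $q_k=\frac{kp_k}{\bar{k}}$ and $\bar{k_{nn}}=\sum_k kq_k$ from a degree list; the function name and the two example degree lists (a 5-node star and a 4-node ring) are assumptions made for illustration.

```python
from collections import Counter


def friendship_paradox(degrees):
    """Return (average degree, expected degree of a neighbor)."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    # p_k: fraction of nodes with degree k
    p = {k: count / n for k, count in Counter(degrees).items()}
    # q_k = k * p_k / mean_k: probability that a random stub lands on degree k
    q = {k: k * p_k / mean_k for k, p_k in p.items()}
    mean_knn = sum(k * q_k for k, q_k in q.items())
    return mean_k, mean_knn


# Star with one hub and four leaves: large disparity.
star_k, star_knn = friendship_paradox([4, 1, 1, 1, 1])

# Ring of four nodes (every node has degree 2): no disparity.
ring_k, ring_knn = friendship_paradox([2, 2, 2, 2])
```

For the star, the average degree is 1.6 while the expected neighbor degree is 2.5; for the ring, both are 2.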

## The G(n,p) Model (aka ER Graphs, aka Gilbert Model)
- random graphs
- network has n nodes
- the probability that any two distinct nodes are connected with an undirected edge is p
- All formulas assume that there are no self-edges
- the number of edges $m$ in a $G(n,p)$ model is a random variable
- The expected number of edges is: $p\frac{n(n-1)}{2}$
- The average node degree is: $p(n-1)$
- The density of the network is: $p$
- the degree variance is: $p(1-p)(n-1)$
- The degree distribution of the $G(n,p)$ model follows the binomial distribution: $Binomial(n-1,p)$
- There are no correlations between the degrees of neighboring nodes.
- If we reach a node v by following an edge from another node, the expected value of v’s degree is one more than the average node degree.
- $\bar{S}=1-S$ is the probability that a node does not belong to the largest connected component (LCC) of the network
- $\bar{S}=\left((1-p)+(p\bar{S})\right)^{n-1}$
- $p=\frac{\bar{k}}{n-1}$
- $\bar{S}=\left(1-\frac{\bar{k}}{n-1}(1-\bar{S})\right)^{n-1}$
- $S=1-e^{-\bar{k}S}$
- If the average degree is larger than one ($\bar{k}\gt1$), the size of the LCC is $S>0$
- The LCC suddenly explodes when the average node degree is larger than 1. This is referred to as a "phase transition"
- Once the average node degree exceeds $\bar{k}=1$, the network suddenly acquires a giant connected component that includes a large fraction of all network nodes
- The critical point corresponds to a connection probability of $p=\frac{1}{n-1}\approx\frac{1}{n}$ because $\bar{k}=(n-1)\times{p}$
- The probability that a node is not connected to any node in the LCC: $(1-p)^{Sn}\approx(1-p)^n$ (if $S\approx1$)
- When the average degree $(\bar{k}=np)$ is higher than $\ln(n)$, we expect to have a single connected component.
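
The self-consistency equation $S=1-e^{-\bar{k}S}$ is usually solved numerically. A minimal sketch using fixed-point iteration follows; the function name, the starting value, and the iteration count are illustrative choices and not part of the course material.

```python
import math


def giant_component_fraction(mean_degree, iterations=1000):
    """Iterate S <- 1 - exp(-mean_degree * S) and return the final S."""
    s = 1.0  # start at 1 so the iteration does not sit on the trivial root S = 0
    for _ in range(iterations):
        s = 1.0 - math.exp(-mean_degree * s)
    return s


# Below the critical point the only solution is S = 0.
s_subcritical = giant_component_fraction(0.5)

# Above the critical point a non-zero solution appears.
s_supercritical = giant_component_fraction(2.0)
```

With $\bar{k}=0.5$ the iteration collapses towards zero, while with $\bar{k}=2$ it settles on a value a little below 0.8, which illustrates the phase transition at $\bar{k}=1$. Close to $\bar{k}=1$ the iteration converges slowly, so more iterations would be needed there.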

## Assortative, Neutral and Disassortative Networks
Some of the above work assumes that there is no statistical correlation between a node's degree $k$ and the average degree of its neighbors $\bar{k_{nn}}$. In practice, some networks do show a correlation between the two.

![[Pasted image 20240526205925.png]]

- In an Assortative Net, highly connected nodes will tend to have neighbors that are also highly connected
- In a Disassortative Net, highly connected nodes will tend to have neighbors which have fewer connections.
- In a Neutral Net, there is little-to-no correlation between the connectivity of a node and the connectivity of its neighbors.

Looking into the correlation between a node's degree and $\bar{k_{nn}}$ gives some insight into the network, and into what effect highly connected nodes have on their neighbors.
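
One way to look at this correlation is to group nodes by degree and average the mean neighbor degree within each group. The sketch below assumes the network is given as a dictionary from node to neighbor set; the function name and the star example are hypothetical.

```python
from collections import defaultdict


def knn_by_degree(adj):
    """Map each degree k to the average neighbor degree of nodes with degree k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k == 0:
            continue  # isolated nodes have no neighbors to average over
        avg_neighbor_degree = sum(len(adj[nb]) for nb in neighbors) / k
        sums[k] += avg_neighbor_degree
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Star network: hub 0 connected to leaves 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
knn = knn_by_degree(star)
```

In the star, low-degree nodes have a high-degree neighbor and the hub has only low-degree neighbors, so the curve decreases with $k$, which is the disassortative pattern. An increasing curve would indicate an assortative network, and a flat one a neutral network.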

![[Pasted image 20240526205619.png]]

@@ -0,0 +1,44 @@
---
tags:
- OMSCS
- NS
---
# L04 - Random vs Real Graphs and Power-Law Networks

## Overview
- See examples of real networks with highly skewed degree distributions 
- Understand the math of power-law distributions and the concept of “scale-free” networks 
- Learn about models that can generate networks with power-law degree distribution
- Explain the practical significance of power-law degree distributions through case studies

## Required Reading
- [[Chapter 04 - The Scale-Free Property]]
- [[Chapter 05 - The Barabasi-Albert Model]]

## Module Notes
- Real networks generally cannot be modeled as random ER graphs
- the degree distribution of an ER graph is binomial
- real networks actually have highly skewed degree distributions
- For many networks, the power law degree distribution $p_k \propto k^{-\alpha}$ is a more appropriate model

![[Pasted image 20240526212703.png]]

## Power-Law Degree Distribution
- $p_k=ck^{-\alpha}$
- The probability that the degree of a node is equal to a positive integer $k$ is proportional to $k^{-\alpha}$
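
The constant $c$ is fixed by requiring the probabilities to sum to 1. The sketch below computes it for a distribution truncated at a maximum degree; the truncation at `k_max`, the function name, and the parameter values are assumptions made for illustration.

```python
def power_law_pmf(alpha, k_max):
    """Return (c, [p_1, ..., p_k_max]) with p_k = c * k**(-alpha)."""
    weights = [k ** -alpha for k in range(1, k_max + 1)]
    c = 1.0 / sum(weights)  # normalization: sum_k p_k = 1
    return c, [c * w for w in weights]


# Hypothetical exponent and cutoff.
c, pmf = power_law_pmf(alpha=2.5, k_max=1000)
```

Doubling $k$ always divides $p_k$ by the same factor $2^{\alpha}$, regardless of where on the degree axis you start.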

![[Pasted image 20240526213600.png]]

## How to plot a power-law degree distribution
- linear scale: bad
- log-log scale with linear binning: good, but higher end will be noisy
- log-log scale with logarithmic binning.
- bin width of the histogram increases exponentially with $k$
- A potential issue with this approach is that you need to do some analysis to pick how fast the bin width should increase with $k$
- C-CDF
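
Logarithmic binning can be sketched as follows. The bins are $[1,2), [2,4), [4,8), \dots$, and each count is divided by the bin width so that wide bins are not over-weighted. The function name, the choice of base 2, and the assumption that all degrees are at least 1 are illustrative and not from the lecture.

```python
def log_binned_distribution(degrees, base=2):
    """Return a list of (bin_start, bin_end, density) with exponentially growing bins."""
    n = len(degrees)
    k_max = max(degrees)
    bins = []
    start = 1
    while start <= k_max:
        end = start * base
        count = sum(1 for d in degrees if start <= d < end)
        width = end - start  # number of integer degrees covered by the bin
        bins.append((start, end, count / (n * width)))
        start = end
    return bins


# Hypothetical degree list.
binned = log_binned_distribution([1, 1, 1, 1, 2, 2, 3, 4])
```

The densities multiplied by their bin widths sum to 1, so the result is still a normalized distribution.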

![[Pasted image 20240526214235.png]]

## Scale-free nature of power-law networks
scale-free == power-law

26 changes: 26 additions & 0 deletions Textbooks/Network Science Book/Chapter 01 - Introduction.md
@@ -0,0 +1,26 @@
---
tags:
- OMSCS
- NS
---
# Chapter 1 - Introduction
http://networksciencebook.com/chapter/1

- cascading events, failures in highly connected systems can cascade, causing system-wide problems
- a failure in a high capacity link/node can cause traffic/energy to be routed to other links/nodes that can't handle the new throughput
- "uncovering the hidden structure of an organization"
- "Accurate maps of such _organizational networks_ can expose the potential lack of interactions between key units, help identify individuals who play an important role in bringing different departments and products together, and help higher management diagnose diverse organizational issues."
- Companies in this space
- Maven 7
- Activate Networks
- Orgnet
- Data sources
- Self-reported
- "Who's your mentor?"
- "Who do you turn to for advice?"
- "Who do you talk to regularly?"
- Books
- Linked by Albert Laszlo Barabasi
- Six Degrees by Duncan J Watts
- Nexus by Mark Buchanan
- Connected by Nicholas A Christakis and James H Fowler
82 changes: 82 additions & 0 deletions Textbooks/Network Science Book/Chapter 02 - Graph Theory.md
@@ -0,0 +1,82 @@
---
tags:
- OMSCS
- NS
---
# Chapter 02 - Graph Theory
http://networksciencebook.com/chapter/2

- Famous problem, the Bridges of Königsberg (1735)
- Euler solved it by making an abstract representation of the map, and showing that no route crosses every bridge exactly once
- Later a new bridge was added which made the problem solvable

> A walking path that goes through all bridges can have only one starting and one end point. Thus such a path cannot exist on a graph that has more than two nodes with an odd number of links. The Königsberg graph had four nodes with an odd number of links, A, B, C, and D, so no path could satisfy the problem.
- "digraph" = "directed graph"

### Degree

- Degree
- A key property of each node is its "degree", representing the number of links it has to other nodes.
- $k_i$ is the degree of node $i$
- In an undirected graph, the total number of links $L$ can be expressed as the sum of node degrees, dividing by 2 to remove duplicates: $L=\frac{1}{2}\sum_{i=1}^{N}k_i$
- In a directed graph, we distinguish between "incoming degree" $k_i^{in}$ and "outgoing degree" $k_i^{out}$. The node's total degree is $k_i = k_i^{in} + k_i^{out}$
- Average Degree
- The network's average degree is the mean of its nodes' degrees; for an undirected network, $\langle k \rangle=\frac{1}{N}\sum_{i=1}^{N}k_i=\frac{2L}{N}$
- Degree Distribution
- the degree distribution $p_k$ provides the probability that a randomly selected node in the network has degree $k$
- bell curve science
- For a network with $N$ nodes, the degree distribution is the normalized histogram: $p_k=\frac{N_k}{N}$
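
The quantities above can all be derived from an edge list. A minimal sketch, assuming an undirected network with nodes labeled `0..N-1`; the function name and the example edges are hypothetical.

```python
from collections import Counter


def degree_stats(num_nodes, edges):
    """Return (degrees, average degree, degree distribution p_k)."""
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    num_links = sum(degrees) // 2            # L = (1/2) * sum_i k_i
    avg_degree = 2 * num_links / num_nodes   # <k> = 2L / N
    counts = Counter(degrees)                # N_k: number of nodes with degree k
    p_k = {k: n_k / num_nodes for k, n_k in sorted(counts.items())}
    return degrees, avg_degree, p_k


# Hypothetical 4-node network with 4 links.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
degrees, avg_degree, p_k = degree_stats(4, example_edges)
```

Here node 0 has degree 3, nodes 1 and 2 have degree 2, and node 3 has degree 1, so the average degree is 2 and half of the nodes have degree 2.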

### Adjacency matrix
- Model directed/undirected networks using a matrix of size $N \times N$.
- The degree of a node in an undirected network can be obtained by summing a row or column
- The degree of a node in a directed network can be obtained by summing the row and column corresponding to the node.
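
The row and column sums can be sketched with plain nested lists. Note the convention assumed here: `A[u][v] = 1` means a link from `u` to `v`, so row sums give outgoing degrees and column sums give incoming degrees. The textbook may index the matrix the other way around; the helper names are hypothetical.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Build an N x N 0/1 matrix; A[u][v] = 1 means a link from u to v (assumed convention)."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1  # undirected links make the matrix symmetric
    return A


def out_degree(A, i):
    return sum(A[i])                 # row sum


def in_degree(A, i):
    return sum(row[i] for row in A)  # column sum


edges = [(0, 1), (0, 2), (1, 2)]
A_undirected = adjacency_matrix(3, edges)
A_directed = adjacency_matrix(3, edges, directed=True)
```

In the undirected case the row sum and the column sum of a node agree; in the directed case the total degree of a node is the sum of both.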

![[Pasted image 20240520160420.png]]

- real networks are sparse
- The max number of links in a network is given by $L_{max}=\frac{N(N-1)}{2}$
- In real networks, $L \ll L_{max}$
- Adjacency matrices become less practical as $N$ increases

### Weighted networks
- in many applications, links have independent weights ($w_{ij}$)
- can't always measure the appropriate weights
- often approximate weighted networks as unweighted networks

> Metcalfe's Law
>
> According to Metcalfe’s law the _cost_ of network based services increases linearly with the number of nodes (users or devices). In contrast the _benefits_ or _income_ are driven by the number of links $L_{max}$ the technology makes possible, which grows like $N^2$ according to (2.12). Hence once the number of users or devices exceeds some _critical mass_, the technology becomes profitable.
### Bipartite Networks
![[figure-2-9.jpg]]

- a bipartite graph (bigraph) is a network whose nodes can be divided into 2 disjoint sets U and V such that each link connects a U-node to a V-node
- "We can generate two projections for each bipartite network. The first projection connects two U-nodes by a link if they are linked to the same V-node in the bipartite representation. The second projection connects the V-nodes by a link if they connect to the same U-node"
- There are no U-U links and no V-V links in the actual network
- A well known bipartite network is the Hollywood actor network, where U nodes are actors and V nodes are movies.
- There are also tripartite networks, e.g. Recipes-Ingredients-Compounds
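
A projection can be sketched by linking every pair of U-nodes that share a V-node. The example below uses made-up actor and movie labels; the function name and the input format (a dictionary from V-node to its set of U-nodes) are assumptions.

```python
from itertools import combinations


def project(memberships):
    """Return the set of U-U links implied by shared V-nodes.

    memberships maps each V-node to the set of U-nodes linked to it.
    """
    links = set()
    for u_nodes in memberships.values():
        # Every pair of U-nodes attached to the same V-node becomes a link.
        for a, b in combinations(sorted(u_nodes), 2):
            links.add((a, b))
    return links


# Hypothetical actor-movie network: V-nodes are movies, U-nodes are actors.
movies = {"Movie 1": {"A", "B", "C"}, "Movie 2": {"C", "D"}}
actor_projection = project(movies)
```

The other projection (movies linked by shared actors) can be obtained by passing the inverted mapping, from each actor to the set of movies they appear in.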

![[figure-2-11.jpg]]

### Paths and Distances
- shortest path
- network diameter - $d_{max}$ = max shortest path in the network
- $\langle d \rangle$ - average path length
- BFS is commonly used here
- UCS could/should be used in place of BFS if edges have weights/costs
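
For an unweighted network, BFS from every node gives all shortest-path lengths, from which $d_{max}$ and $\langle d \rangle$ follow. A minimal sketch, assuming a connected undirected network stored as a dictionary of neighbor lists; the function names and the example path graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict of shortest-path lengths (in hops) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_and_average_distance(adj):
    """Return (d_max, <d>) over all ordered pairs of distinct reachable nodes."""
    total, pairs, d_max = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                d_max = max(d_max, d)
    return d_max, total / pairs


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d_max, avg_d = diameter_and_average_distance(path)
```

With weighted links, hop counts are no longer the right distance, which is why a cost-aware search such as UCS would replace BFS.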

### Connectedness
> If the network has disconnected components, the adjacency matrix can be rearranged into a block diagonal form, such that all nonzero elements of the matrix are contained in square blocks along the diagonal of the matrix and all other elements are zero.
![[Pasted image 20240520162142.png]]

We can find if a network is fully connected using BFS
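
A sketch of that check: run BFS from each unvisited node and collect the nodes it reaches; the network is connected exactly when there is a single component. The function name and the example network are assumptions made for illustration.

```python
from collections import deque


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical network with two linked pairs and one isolated node.
network = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
components = connected_components(network)
is_connected = len(components) == 1
```

Ordering the nodes component by component is also what produces the block diagonal form of the adjacency matrix described in the quote.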

### Clustering Coefficient
- For a node $i$ with degree $k_i$, the local clustering coefficient is defined as $C_i=\frac{2L_i}{k_i(k_i-1)}$
- $L_i$ is the number of links between the $k_i$ neighbors of node $i$
- $\langle C \rangle$ would be the average C over the whole network
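
The definition translates directly into code. In the sketch below the network is a dictionary from node to neighbor set; assigning $C_i=0$ to nodes with fewer than two neighbors is a convention assumed here, and the names and the example network are hypothetical.

```python
def local_clustering(adj, i):
    """C_i = 2 * L_i / (k_i * (k_i - 1)), with C_i = 0 when k_i < 2 (assumed convention)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # L_i: number of links among the neighbors of i, each pair counted once.
    links = sum(1 for a in neighbors for b in neighbors
                if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Triangle 0-1-2 with an extra node 3 attached to node 0.
network = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = local_clustering(network, 0)
c_avg = average_clustering(network)
```

Node 0 has three neighbors with one link among them, so $C_0=\frac{2\cdot1}{3\cdot2}=\frac{1}{3}$; nodes 1 and 2 have $C=1$.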

![[Pasted image 20240520162420.png]]
@@ -0,0 +1,8 @@
---
tags:
- OMSCS
- NS
---
# Chapter 03 - Random Networks
http://networksciencebook.com/chapter/3
