In Postgres: The Graph Database You Didn’t Know You Had we covered how we can store a DAG (Directed Acyclic Graph) in a relational database. We looked at one representation of a graph data structure called an adjacency list—a table with two columns representing how nodes connect. The columns store connections from the source or “from” node to a destination or “to” node.
Edges | |
---|---|
Source | Destination |
A | B |
A | C |
Using Python, a code representation would look like:
class Node:
pass
A = Node()
B = Node()
C = Node()
adjacency_list = [
{ "source": A, "destination": B },
{ "source": A, "destination": C },
]
An adjacency list is a simple and efficient way to store graph data, but there are many more ways to represent graphs. In this article, we’ll explore one of these alternatives called the adjacency matrix.
Adjacency Matrix
An adjacency matrix is a table with a row and column for each node in the graph (or NxN matrix). To represent the edges we store a 1 in a column for a connection and a 0 for no connection. We will start out exploring how to represent DAG’s as an adjacency matrix since we covered DAG’s in the Postgres article. To store direction in an adjacency matrix the x-axis or rows represent the “from” or “source” node. The y-axis represents the connection. Using the same graph as above, the adjacency matrix would look like this:
A | B | C | |
---|---|---|---|
A | 0 | 1 | 1 |
B | 0 | 0 | 0 |
C | 0 | 0 | 0 |
adjacency_matrix = [
[0, 1, 1],
[0, 0, 0],
[0, 0, 0]
]
What about non-DAG graphs?
For undirected graphs, graphs where all connections are bidirectional, the setup is the same to create the adjacency matrix.
The only difference is our graph has no explicit direction. If A
connects to B
, then B
also connects to A
.
Let’s take our previous DAG graph, and remove the direction.
A | B | C | |
---|---|---|---|
A | 0 | 1 | 1 |
B | 1 | 0 | 0 |
C | 1 | 0 | 0 |
A cool property emerges from undirected graphs. Can you spot a pattern in the table above? Draw a diagonal line from the top left of our matrix to the bottom right (sandwich square style) and fold the matrix along this line.
The connections mirror each other across this line. Our matrix is considered symmetric when this occurs. Symmetric matrices appear in all undirected graphs, but not DAGs!
This is great, but why would we use an Adjacency Matrix?
Matrices are incredibly fast at checking if two nodes are connected
Instead of having to loop through all the edges in an adjacency list to find if two nodes are connected, an adjacency matrix acts
as a hash table. In our example graph above, if we wanted to check if A has an edge to B it would be as simple as checking the [B][A]
position
in the matrix.
Note: Since we’re working with DAG’s and 2D arrays, the order we look up a connection is flipped: [Destination][Source]
.
A = 0
B = 1
has_connection = adjacency_matrix[B][A]
Removing and adding a connection between nodes is similar and run in constant time.
Need to connect two nodes? Modify the position in the matrix to be 1
(adjacency_matrix[destiation][source] = 1
).
What about deleting an edge? Update the value to be 0
(adjacency_matrix[destiation][source] = 0
)!
Why would I not use an Adjacency Matrix?
They’re huge!
If we look at an adjacency matrix we need to store every bit of information about the graph, the connected nodes (1’s), and not connected nodes(0’s). Compared to an adjacency list which stores only the connections; adjacency matrices take up a lot of space!
It’s expensive to add/remove nodes
We talked about how adjacency matrices are fast at looking up, adding, and removing connections between nodes. On the flip side, adjacency matrices are slow when adding or removing nodes from the graph. To add a node we need to copy our matrix to a new matrix with an additional row and column(of size N+1 x N+1). Then, populate the matrix with the connections to or from the new node. To compare, adding a new node to an adjacency list is a matter of pushing a new node to the list of nodes.
End
There are many different ways to represent and visualize a graph. A different representation may be better based on your personal use cases and requirements. Need to quickly look up which nodes are connected or need to quickly add or remove edges? Then the adjacency matrix may be the best choice. Are you constantly adding nodes to the graph and are limited in your memory footprint? The adjacency list is the representation for you!