# Programing Problem: Construct a Tree Given Its Edges

By Xah Lee. Date: . Last updated: .

Problem: given a list of edges of a tree, each edge looks like this `[child, parent]` , construct the tree.

Here's a sample input in Python nested list syntax: `[[0, 2], [3, 0], [1, 4], [2, 4]]`

The nodes of the tree has integer id. You can assume them to be string too.

Here's a sample print of a tree data structure:

```4
1
2
0
3```

this means, the root node's name is 4. It has 2 branches, named 1 and 2. The branche named 2 has 1 children, named 0, and it has one children named 3.

You can choose your own data structure for the output. For example, nested list with 1st element being the node name, or use nested hash of hash, where key is the node name, and value is its children.

Here's sample data structure in python, using hash of hash.

```{
"4": {
"1": {},
"2": {
"0": {
"3": {}
}
}
}
}```

Other data structure are accepted.

Code it using your favorite language. I'll be able to look at and verify in Mathematica, Python, Ruby, Perl, PHP, JavaScript, Emacs Lisp, Java. But other langs, such as Clojure and other lisps, OCaml, Haskell, erlang, Fortress, APL, HOL, Coq, Lua, Factor, Rust, Julia … are very welcome. [see Proliferation of Computing Languages]

You should be able to do this within 8 hours from scratch. Took me 5 hours.

Python solution will be posted in a week, on 2014-01-14 (or sooner if many showed solutions). Post your solution in comment (or link to github or pastebin).

### Edge Generator

Here's a helper function that generates random edges list, for testing.

```# -*- coding: utf-8 -*-
# python 2
# xah lee, 2013-12-03

import random

def randomTreeEdges(max_nodes):
id_pool = range(max_nodes)
random.shuffle(id_pool)
root = id_pool.pop()
first_child = id_pool.pop()
edges = [[first_child, root]]
while len(id_pool) != 0:
if random.random() > .1: # add child
edges.append([id_pool.pop(), random.choice(edges)])
edges.append([id_pool.pop(), random.choice(edges)])
random.shuffle(edges)
return edges

edges = randomTreeEdges(20)
print "edges {}".format(edges)

# edges [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]```

Sample output:

```[[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

{
"9": {
"12": {
"1": {
"7": {
"10": {
"6": {
"0": {},
"16": {}
}
},
"2": {
"8": {
"14": {
"4": {}
}
},
"3": {}
}
},
"11": {},
"13": {},
"5": {
"17": {},
"15": {}
}
},
"19": {
"18": {}
}
}
}
}```

## Solutions

### Python

Here's a Python 2 solution, by Xah Lee.

```# -*- coding: utf-8 -*-
# python 2

# problem: given a list of edges of a tree: [child,parent], construct the tree.
# xah lee, 2013-12-03

import json # pretty print nested hash

def construct_tree(edges):
"""Given a list of edges [child, parent], return a tree."""

parentDB = {} # key is a node. value is its parent
childrenDB = {} # key is a node. value is its children as hash
for edg in edges:
c, p = edg, edg
parentDB[c] = p
if p in childrenDB:
childrenDB[p][c] = None
else:
childrenDB[p] = {c:None}

root = None
for v in childrenDB.viewkeys():
if v not in parentDB:
root = v
break

leaf_set = (set(parentDB.keys()) - set(childrenDB.keys()))

# construct the tree, bottom up
tree = {}
for node in leaf_set:
tree[node] = {}

while len(parentDB) != 0:
if head not in childrenDB: # not parent of somebody
return tree

if __name__ == '__main__':
edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
print(json.dumps(construct_tree(edges), indent=1))```

Here's the best Python 3 solution by Ting-Yu Lin:

```#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
This code resolves the problem described in
http://xahlee.info/python/python_construct_tree_from_edge.html

by Ting-Yu Lin aethanyc 2014-01-08 https://gist.github.com/aethanyc/8313640

"""

import collections
import json

def construct_trees_by_TingYu(edges):
"""Given a list of edges [child, parent], return trees. """
trees = collections.defaultdict(dict)

for child, parent in edges:
trees[parent][child] = trees[child]

# Find roots
children, parents = zip(*edges)
roots = set(parents).difference(children)

return {root: trees[root] for root in roots}

if __name__ == '__main__':
edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
print(json.dumps(construct_trees_by_TingYu(edges), indent=1))```

Here's a python 2 solution by whonose.

```# -*- coding: utf-8 -*-
# python 2
# by whonose 2014-01-08

import json

edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

def construct_tree_by_whonose(edges):
"""Given a list of edges [child, parent], return a tree."""

def get_node(node):
if node not in tree:
tree[node] = {}
return tree[node]

tree = {}

parent = set([e for e in edges])
children = set([e for e in edges])
root = list(parent.difference(children))
stack = [root]
while len(stack):
n = stack.pop()
for e in edges:
if e == n:
get_node(n)[e] = get_node(e)
stack.append(e)

return {root: tree[root]}

print json.dumps(construct_tree_by_whonose(edges), indent=1)```

Here's a python 3 solution by Ian.

```# by Ian 2014-01-08

def construct_tree_by_Ian(edges):
forest = {}
# Populate the forest with depth-1 trees, with placeholders for subtrees.
for (child, parent) in edges:
if parent not in forest:
forest[parent] = {}
forest[parent][child] = {}
if not forest:
# Graph is empty; an edge list cannot represent a single-node tree.
return {}
roots = set(forest.keys())
# Replace the placeholders with corresponding trees from the forest.
for tree in forest.values():
roots -= tree.keys()
for child in tree:
if child in forest:
tree[child] = forest[child]
# Make sure we have a single root.  No check is made for cycles.
if len(roots) == 0 or len(roots) > 1:
raise ValueError("Graph is not a tree")
# The tree located at the actual root contains the entire tree.
root = roots.pop()
tree = {root: forest[root]}
return tree

if __name__ == '__main__':
import json
edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
print(json.dumps(construct_tree_by_Ian(edges), indent=1))```

## Test Script

timing test file python_construct_tree_from_edge_speed_test_1.py

## Timing comparison

On 2k edges, here's result.

• 0.04 secs ← TingYu
• 0.08 secs ← Xah
• 107.01 secs ← whonose's

When using 2M edges, here's the result.

• 9.07 secs ← TingYu
• 12.88 secs ← Xah

Ian's solution is about 76 times slower than TingYu's. Here's them running in python 3. (python_construct_tree_from_edge_speed_test_2.py3)

• 0.05 secs ← TingYu
• 3.83 secs ← Ian

Xah's solution needs to be modified in order to run in python 3. I haven't done so yet.

TingYu's solution is truely beautiful. The code is extremely short, basically just 6 lines. But, more importantly, is the algorithm he used.

### Emacs Lisp Solution

```;; -*- coding: utf-8 -*-
;; by Kalman Reti 2014-01-09

(require'cl)
(defun tree-from-edges (foo)
"reconstruct tree from list of (child parent) lists"
;; recursive function to simplify internal data structure for answer
(flet ((prune (list)
(cond ((null list) nil)
((not (listp list)) list)
(t (cons (prune (car list)) (loop for i in (cddr list) collect (prune i)))))))

;;hash table entries are triples of pid, parent dotted to list of children
(loop with ht = (make-hash-table :test 'equal)
for (child parent) in foo
for child-entry = (gethash child ht)
for parent-entry = (gethash parent ht)
if child-entry
do (when (null (second child-entry))
(setf (second child-entry) parent))
else
do (setq child-entry (setf (gethash child ht) (list child parent-entry)))
if parent-entry
do (push child-entry (cddr parent-entry))
else
do (setq parent-entry (setf (gethash parent ht) (list parent nil child-entry)))
(setf (second child-entry) parent-entry)
finally (return (loop for x being the hash-values of ht
for (pid parent . children) = x
when (null parent)
collect (prune x))))))

;; In scratch buffer
(tree-from-edges '((0 2) (3 0) (1 4) (2 4)))
;; ((4 (2 (0 (3))) (1)))﻿

;; For larger test cases, I used
;; ps -eaf | awk '{print \$2, \$3}'

;; in a shell buffer then
;;   (set symbol (loop for child = (ignore-errors (read (current-buffer)))
;;             while child
;;             for parent = (read (current-buffer))
;;             collect (list child parent))))

;; to get the data into a variable.  I compared my output with pstree.  From start to finish, about an hour.﻿```

### Clojure lisp

```;; -*- coding: utf-8 -*-
;; clojure
;; by Tassilo 2014-01-08 https://gist.github.com/tsdh/8314450

(defn parent-children-map
"Converts a vector of [child parent] edges into a map where every entry has
the form [parent set-of-children].

Examples:

(parent-children-map [[0, 2], [3, 0], [1, 4], [2, 4]])
;=> {4 #{1 2}, 0 #{3}, 2 #{0}}

(parent-children-map [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
[3 0] [7 0] [5 2] [0 2] [11 1]])
;=> {2 #{0 5}, 0 #{3 7}, 5 #{9}, 6 #{8}, 4 #{1 2}, 1 #{6 10 11}}"
[edges]
(reduce (fn [m [child parent]]
(update-in m [parent]
(fn [s] (set (conj s child)))))
{}
edges))

(defn get-root
"Gets the root entry of the parent-children-map pcm."
[pcm]
(loop [xs pcm]
(if (seq xs)
(let [[p _ :as root] (first xs)]
(if (some (fn [[_ cs]] (contains? cs p)) pcm)
(recur (rest xs))
root))
(throw (RuntimeException. "There is no root!")))))

(defn make-tree
"Returns a map representing the given sequence of edges (each edge
represented as [child parent]) as a tree.

Examples:

(make-tree [[0, 2], [3, 0], [1, 4], [2, 4]])
;=> {4 {2 {0 {3 nil}},
1 nil}}

(make-tree [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
[3 0] [7 0] [5 2] [0 2] [11 1]])
;=> {4 {2 {5 {9 nil},
0 {7 nil,
3 nil}},
1 {11 nil,
10 nil,
6 {8 nil}}}}"
[edges]
(let [pcm (parent-children-map edges)]
(letfn [(build-child-trees [cs]
(apply merge (map #(build-tree [% (pcm %)]) cs)))
(build-tree [[root cs]]
(if root
{root (build-child-trees cs)}
{}))]
(build-tree (get-root pcm)))))```

### Shen Lisp

```\* I changed the format to [parent child] which is a bit more natural.  A type secure version in Shen which returns a string in columnar format.  *\
\* Nice to see friendly challenge problems here again.  *\
\* Mark Tarver, 2014-01-15  *\

(define tree
{(list (list A)) --> string}
Edges -> (let Root (root Edges (map (function reverse) Edges))
TreeString (cn "c#10;" (tree-string Root Edges 0))
TreeString))

(define root
{(list (list A)) --> (list (list A)) --> A}
[[Root | _] | Edges] All -> Root   where (empty? (assoc Root All))
[_ | Edges] All -> (root Edges All))

(define tree-string
{A --> (list (list A)) --> number --> string}
Node Edges Indent -> (let Daughters (daughters Node Edges)
String (str-node Node Indent)
DStrings (map (/. D (tree-string D Edges (+ 1 Indent))) Daughters)
DString (list->str DStrings)
(@s String "c#10;" DString)))

(define list->str
{(list string) --> string}
[] -> ""
[X | Y] -> (@s X (list->str Y)))

(define str-node
{A --> number --> string}
Node 0 -> (str Node)
Node Indent -> (cn " " (str-node Node (- Indent 1))))

(define daughters
{A --> (list (list A)) --> (list A)}
Node Edges -> (mapcan (/. Edge (if (= (head Edge) Node) (tail Edge) [])) Edges))

\* (85+) (tree [[0 1] [0 2] [2 6] [2 7] [1 3] [1 4] [4 5]])  *\
\* "  *\
\* 0  *\
\*  1  *\
\*   3  *\
\*   4  *\
\*    5  *\
\*  2  *\
\*   6  *\
\*   7  *\
\* " : string *\```

## more solutions

more solutions i haven't looked yet.