# Programing Problem: Construct a Tree Given Its Edges

Problem: given a list of edges of a tree, each edge looks like this
`[`

, construct the tree.`child`, `parent`]

Here's a sample input in Python nested list syntax: `[[0, 2], [3, 0], [1, 4], [2, 4]]`

The nodes of the tree has integer id. You can assume them to be string too.

Here's a sample print of a tree data structure:

4 1 2 0 3

this means, the root node's name is 4. It has 2 branches, named 1 and 2. The branche named 2 has 1 children, named 0, and it has one children named 3.

You can choose your own data structure for the output. For example, nested list with 1st element being the node name, or use nested hash of hash, where key is the node name, and value is its children.

Here's sample data structure in python, using hash of hash.

{ "4": { "1": {}, "2": { "0": { "3": {} } } } }

Other data structure are accepted.

Code it using your favorite language. I'll be able to look at and verify in Mathematica, Python, Ruby, Perl, PHP, JavaScript, Emacs Lisp, Java. But other langs, such as Clojure and other lisps, OCaml, Haskell, erlang, Fortress, APL, HOL, Coq, Lua, Factor, Rust, Julia … are very welcome. [see Proliferation of Computing Languages]

You should be able to do this within 8 hours from scratch. Took me 5 hours.

Python solution will be posted in a week, on 2014-01-14 (or sooner if many showed solutions). Post your solution in comment (or link to github or pastebin).

to post code in the comment, please add this tag wrapper:

`<pre><code class="python">…</code></pre>`

### Edge Generator

Here's a helper function that generates random edges list, for testing.

# -*- coding: utf-8 -*- # python 2 # xah lee, 2013-12-03 import random def randomTreeEdges(max_nodes): id_pool = range(max_nodes) random.shuffle(id_pool) root = id_pool.pop() first_child = id_pool.pop() edges = [[first_child, root]] while len(id_pool) != 0: if random.random() > .1: # add child edges.append([id_pool.pop(), random.choice(edges)[0]]) else: # add sibling edges.append([id_pool.pop(), random.choice(edges)[1]]) random.shuffle(edges) return edges edges = randomTreeEdges(20) print "edges {}".format(edges) # edges [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

Sample output:

[[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]] { "9": { "12": { "1": { "7": { "10": { "6": { "0": {}, "16": {} } }, "2": { "8": { "14": { "4": {} } }, "3": {} } }, "11": {}, "13": {}, "5": { "17": {}, "15": {} } }, "19": { "18": {} } } } }

## Solutions

### Python

Here's a Python 2 solution, by Xah Lee.

# -*- coding: utf-8 -*- # python 2 # problem: given a list of edges of a tree: [child,parent], construct the tree. # xah lee, 2013-12-03 import json # pretty print nested hash def construct_tree(edges): """Given a list of edges [child, parent], return a tree.""" parentDB = {} # key is a node. value is its parent childrenDB = {} # key is a node. value is its children as hash for edg in edges: c, p = edg[0], edg[1] parentDB[c] = p if p in childrenDB: childrenDB[p][c] = None else: childrenDB[p] = {c:None} root = None for v in childrenDB.viewkeys(): if v not in parentDB: root = v break leaf_set = (set(parentDB.keys()) - set(childrenDB.keys())) # construct the tree, bottom up tree = {} for node in leaf_set: tree[node] = {} while len(parentDB) != 0: for head, tail in tree.items(): if head != root: if head not in childrenDB: # not parent of somebody if parentDB[head] in tree: # add sibling tree[parentDB[head]][head] = tail else: # add parent tree[parentDB[head]] = {head: tail} del tree[head] del childrenDB[parentDB[head]][head] if len(childrenDB[parentDB[head]]) == 0: del childrenDB[parentDB[head]] del parentDB[head] return tree if __name__ == '__main__': edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]] print(json.dumps(construct_tree(edges), indent=1))

Here's the best Python 3 solution by Ting-Yu Lin:

#!/usr/bin/env python3 # -*- coding: utf-8 -*- """ This code resolves the problem described in http://xahlee.info/python/python_construct_tree_from_edge.html by Ting-Yu Lin aethanyc 2014-01-08 https://gist.github.com/aethanyc/8313640 """ import collections import json def construct_trees_by_TingYu(edges): """Given a list of edges [child, parent], return trees. """ trees = collections.defaultdict(dict) for child, parent in edges: trees[parent][child] = trees[child] # Find roots children, parents = zip(*edges) roots = set(parents).difference(children) return {root: trees[root] for root in roots} if __name__ == '__main__': edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]] print(json.dumps(construct_trees_by_TingYu(edges), indent=1))

Here's a python 2 solution by whonose.

# -*- coding: utf-8 -*- # python 2 # by whonose 2014-01-08 import json edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]] def construct_tree_by_whonose(edges): """Given a list of edges [child, parent], return a tree.""" def get_node(node): if node not in tree: tree[node] = {} return tree[node] tree = {} parent = set([e[1] for e in edges]) children = set([e[0] for e in edges]) root = list(parent.difference(children))[0] stack = [root] while len(stack): n = stack.pop() for e in edges: if e[1] == n: get_node(n)[e[0]] = get_node(e[0]) stack.append(e[0]) return {root: tree[root]} print json.dumps(construct_tree_by_whonose(edges), indent=1)

Here's a python 3 solution by Ian.

# by Ian 2014-01-08 def construct_tree_by_Ian(edges): forest = {} # Populate the forest with depth-1 trees, with placeholders for subtrees. for (child, parent) in edges: if parent not in forest: forest[parent] = {} forest[parent][child] = {} if not forest: # Graph is empty; an edge list cannot represent a single-node tree. return {} roots = set(forest.keys()) # Replace the placeholders with corresponding trees from the forest. for tree in forest.values(): roots -= tree.keys() for child in tree: if child in forest: tree[child] = forest[child] # Make sure we have a single root. No check is made for cycles. if len(roots) == 0 or len(roots) > 1: raise ValueError("Graph is not a tree") # The tree located at the actual root contains the entire tree. root = roots.pop() tree = {root: forest[root]} return tree if __name__ == '__main__': import json edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]] print(json.dumps(construct_tree_by_Ian(edges), indent=1))

## Test Script

timing test file python_construct_tree_from_edge_speed_test_1.py

## Timing comparison

On 2k edges, here's result.

- 0.04 secs ← TingYu
- 0.08 secs ← Xah
- 107.01 secs ← whonose's

When using 2M edges, here's the result.

- 9.07 secs ← TingYu
- 12.88 secs ← Xah

Ian's solution is about 76 times slower than TingYu's. Here's them running in python 3. (python_construct_tree_from_edge_speed_test_2.py3)

- 0.05 secs ← TingYu
- 3.83 secs ← Ian

Xah's solution needs to be modified in order to run in python 3. I haven't done so yet.

TingYu's solution is truely beautiful. The code is extremely short, basically just 6 lines. But, more importantly, is the algorithm he used.

### Emacs Lisp Solution

;; -*- coding: utf-8 -*- ;; by Kalman Reti 2014-01-09 ;; https://plus.google.com/113859563190964307534/posts/dmz88usQKQT (require'cl) (defun tree-from-edges (foo) "reconstruct tree from list of (child parent) lists" ;; recursive function to simplify internal data structure for answer (flet ((prune (list) (cond ((null list) nil) ((not (listp list)) list) (t (cons (prune (car list)) (loop for i in (cddr list) collect (prune i))))))) ;;hash table entries are triples of pid, parent dotted to list of children (loop with ht = (make-hash-table :test 'equal) for (child parent) in foo for child-entry = (gethash child ht) for parent-entry = (gethash parent ht) if child-entry do (when (null (second child-entry)) (setf (second child-entry) parent)) else do (setq child-entry (setf (gethash child ht) (list child parent-entry))) if parent-entry do (push child-entry (cddr parent-entry)) else do (setq parent-entry (setf (gethash parent ht) (list parent nil child-entry))) (setf (second child-entry) parent-entry) finally (return (loop for x being the hash-values of ht for (pid parent . children) = x when (null parent) collect (prune x)))))) ;; In scratch buffer (tree-from-edges '((0 2) (3 0) (1 4) (2 4))) ;; ((4 (2 (0 (3))) (1))) ;; For larger test cases, I used ;; ps -eaf | awk '{print $2, $3}' ;; in a shell buffer then ;; (defun read-data (symbol) ;; (set symbol (loop for child = (ignore-errors (read (current-buffer))) ;; while child ;; for parent = (read (current-buffer)) ;; collect (list child parent)))) ;; (read-data 'foo) ;; to get the data into a variable. I compared my output with pstree. From start to finish, about an hour.

### Clojure lisp

;; -*- coding: utf-8 -*- ;; clojure ;; by Tassilo 2014-01-08 https://gist.github.com/tsdh/8314450 (defn parent-children-map "Converts a vector of [child parent] edges into a map where every entry has the form [parent set-of-children]. Examples: (parent-children-map [[0, 2], [3, 0], [1, 4], [2, 4]]) ;=> {4 #{1 2}, 0 #{3}, 2 #{0}} (parent-children-map [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4] [3 0] [7 0] [5 2] [0 2] [11 1]]) ;=> {2 #{0 5}, 0 #{3 7}, 5 #{9}, 6 #{8}, 4 #{1 2}, 1 #{6 10 11}}" [edges] (reduce (fn [m [child parent]] (update-in m [parent] (fn [s] (set (conj s child))))) {} edges)) (defn get-root "Gets the root entry of the parent-children-map pcm." [pcm] (loop [xs pcm] (if (seq xs) (let [[p _ :as root] (first xs)] (if (some (fn [[_ cs]] (contains? cs p)) pcm) (recur (rest xs)) root)) (throw (RuntimeException. "There is no root!"))))) (defn make-tree "Returns a map representing the given sequence of edges (each edge represented as [child parent]) as a tree. Examples: (make-tree [[0, 2], [3, 0], [1, 4], [2, 4]]) ;=> {4 {2 {0 {3 nil}}, 1 nil}} (make-tree [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4] [3 0] [7 0] [5 2] [0 2] [11 1]]) ;=> {4 {2 {5 {9 nil}, 0 {7 nil, 3 nil}}, 1 {11 nil, 10 nil, 6 {8 nil}}}}" [edges] (let [pcm (parent-children-map edges)] (letfn [(build-child-trees [cs] (apply merge (map #(build-tree [% (pcm %)]) cs))) (build-tree [[root cs]] (if root {root (build-child-trees cs)} {}))] (build-tree (get-root pcm)))))

### Shen Lisp

\* I changed the format to [parent child] which is a bit more natural. A type secure version in Shen which returns a string in columnar format. *\ \* Nice to see friendly challenge problems here again. *\ \* Mark Tarver, 2014-01-15 *\ (define tree {(list (list A)) --> string} Edges -> (let Root (root Edges (map (function reverse) Edges)) TreeString (cn "c#10;" (tree-string Root Edges 0)) TreeString)) (define root {(list (list A)) --> (list (list A)) --> A} [[Root | _] | Edges] All -> Root where (empty? (assoc Root All)) [_ | Edges] All -> (root Edges All)) (define tree-string {A --> (list (list A)) --> number --> string} Node Edges Indent -> (let Daughters (daughters Node Edges) String (str-node Node Indent) DStrings (map (/. D (tree-string D Edges (+ 1 Indent))) Daughters) DString (list->str DStrings) (@s String "c#10;" DString))) (define list->str {(list string) --> string} [] -> "" [X | Y] -> (@s X (list->str Y))) (define str-node {A --> number --> string} Node 0 -> (str Node) Node Indent -> (cn " " (str-node Node (- Indent 1)))) (define daughters {A --> (list (list A)) --> (list A)} Node Edges -> (mapcan (/. Edge (if (= (head Edge) Node) (tail Edge) [])) Edges)) \* (85+) (tree [[0 1] [0 2] [2 6] [2 7] [1 3] [1 4] [4 5]]) *\ \* " *\ \* 0 *\ \* 1 *\ \* 3 *\ \* 4 *\ \* 5 *\ \* 2 *\ \* 6 *\ \* 7 *\ \* " : string *\

## more solutions

more solutions i haven't looked yet.

Common Lisp https://groups.google.com/forum/#!topic/comp.lang.lisp/N-9UtzPmtVY

emacs lisp
~~https://plus.google.com/113859563190964307534/posts/dmz88usQKQT~~

## Reddit Discussion

Reddit discussion. http://www.reddit.com/r/learnpython/comments/2y5ncs/

- TreeGenerator
- MinimumTreeGenerator
- MinimumIndexSet
- Perl Module: Tree::Matica (Mathematica Tree Functions in Perl)
- Vector Normalize Function in Mathematica, APL, Haskell, Ruby, Python, Perl, Lisp, JavaScript, Java, …
- In-place Algorithm, Reverse List in JavaScript, Python, Perl, Lisp, Wolfram Lang
- Programing: Decimalize Latitude Longitude
- Programing Exercise, Validate Matching Brackets
- One Language to Rule Them All? Or, What Language to Use for Find Replace?
- Programing Challenge: Replace String Pairs