Programing Problem: Construct a Tree Given Its Edges

By Xah Lee. Date: . Last updated: .

Problem: given a list of edges of a tree, each edge looks like this [child, parent] , construct the tree.

Here's a sample input in Python nested list syntax: [[0, 2], [3, 0], [1, 4], [2, 4]]

The nodes of the tree has integer id. You can assume them to be string too.

Here's a sample print of a tree data structure:

4
 1
 2
  0
   3

this means, the root node's name is 4. It has 2 branches, named 1 and 2. The branche named 2 has 1 children, named 0, and it has one children named 3.

You can choose your own data structure for the output. For example, nested list with 1st element being the node name, or use nested hash of hash, where key is the node name, and value is its children.

Here's sample data structure in python, using hash of hash.

{
 "4": {
  "1": {},
  "2": {
   "0": {
    "3": {}
   }
  }
 }
}

Other data structure are accepted.

Code it using your favorite language. I'll be able to look at and verify in Mathematica, Python, Ruby, Perl, PHP, JavaScript, Emacs Lisp, Java. But other langs, such as Clojure and other lisps, OCaml, Haskell, erlang, Fortress, APL, HOL, Coq, Lua, Factor, Rust, Julia … are very welcome. [see Proliferation of Computing Languages]

You should be able to do this within 8 hours from scratch. Took me 5 hours.

Python solution will be posted in a week, on 2014-01-14 (or sooner if many showed solutions). Post your solution in comment (or link to github or pastebin).

Edge Generator

Here's a helper function that generates random edges list, for testing.

# -*- coding: utf-8 -*-
# python 2
# xah lee, 2013-12-03

import random

def randomTreeEdges(max_nodes):
    id_pool = range(max_nodes)
    random.shuffle(id_pool)
    root = id_pool.pop()
    first_child = id_pool.pop()
    edges = [[first_child, root]]
    while len(id_pool) != 0:
        if random.random() > .1: # add child
            edges.append([id_pool.pop(), random.choice(edges)[0]])
        else:   # add sibling
            edges.append([id_pool.pop(), random.choice(edges)[1]])
    random.shuffle(edges)
    return edges

edges = randomTreeEdges(20)
print "edges {}".format(edges)

# edges [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

Sample output:

[[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

{
  "9": {
    "12": {
      "1": {
        "7": {
          "10": {
            "6": {
              "0": {},
              "16": {}
            }
          },
          "2": {
            "8": {
              "14": {
                "4": {}
              }
            },
            "3": {}
          }
        },
        "11": {},
        "13": {},
        "5": {
          "17": {},
          "15": {}
        }
      },
      "19": {
        "18": {}
      }
    }
  }
}

Solutions

Python

Here's a Python 2 solution, by Xah Lee.

# -*- coding: utf-8 -*-
# python 2

# problem: given a list of edges of a tree: [child,parent], construct the tree.
# xah lee, 2013-12-03

import json # pretty print nested hash

def construct_tree(edges):
    """Given a list of edges [child, parent], return a tree."""

    parentDB = {} # key is a node. value is its parent
    childrenDB = {} # key is a node. value is its children as hash
    for edg in edges:
        c, p = edg[0], edg[1]
        parentDB[c] = p
        if p in childrenDB:
            childrenDB[p][c] = None
        else:
            childrenDB[p] = {c:None}

    root = None
    for v in childrenDB.viewkeys():
        if v not in parentDB:
            root = v
            break

    leaf_set = (set(parentDB.keys()) - set(childrenDB.keys()))

    # construct the tree, bottom up
    tree = {}
    for node in leaf_set:
        tree[node] = {}

    while len(parentDB) != 0:
        for head, tail in tree.items():
            if head != root:
                if head not in childrenDB: # not parent of somebody
                    if parentDB[head] in tree: # add sibling
                        tree[parentDB[head]][head] = tail
                    else: # add parent
                        tree[parentDB[head]] = {head: tail}
                    del tree[head]
                    del childrenDB[parentDB[head]][head]
                    if len(childrenDB[parentDB[head]]) == 0:
                        del childrenDB[parentDB[head]]
                    del parentDB[head]
    return tree

if __name__ == '__main__':
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_tree(edges), indent=1))

Here's the best Python 3 solution by Ting-Yu Lin:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
This code resolves the problem described in
http://xahlee.info/python/python_construct_tree_from_edge.html

by Ting-Yu Lin aethanyc 2014-01-08 https://gist.github.com/aethanyc/8313640

"""

import collections
import json

def construct_trees_by_TingYu(edges):
    """Given a list of edges [child, parent], return trees. """
    trees = collections.defaultdict(dict)

    for child, parent in edges:
        trees[parent][child] = trees[child]

    # Find roots
    children, parents = zip(*edges)
    roots = set(parents).difference(children)

    return {root: trees[root] for root in roots}

if __name__ == '__main__':
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_trees_by_TingYu(edges), indent=1))

Here's a python 2 solution by whonose.

# -*- coding: utf-8 -*-
# python 2
# by whonose 2014-01-08

import json

edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

def construct_tree_by_whonose(edges):
    """Given a list of edges [child, parent], return a tree."""

    def get_node(node):
        if node not in tree:
            tree[node] = {}
        return tree[node]

    tree = {}

    parent = set([e[1] for e in edges])
    children = set([e[0] for e in edges])
    root = list(parent.difference(children))[0]
    stack = [root]
    while len(stack):
        n = stack.pop()
        for e in edges:
            if e[1] == n:
                get_node(n)[e[0]] = get_node(e[0])
                stack.append(e[0])

    return {root: tree[root]}

print json.dumps(construct_tree_by_whonose(edges), indent=1)

Here's a python 3 solution by Ian.

# by Ian 2014-01-08

def construct_tree_by_Ian(edges):
    forest = {}
    # Populate the forest with depth-1 trees, with placeholders for subtrees.
    for (child, parent) in edges:
        if parent not in forest:
            forest[parent] = {}
        forest[parent][child] = {}
    if not forest:
  # Graph is empty; an edge list cannot represent a single-node tree.
        return {}
    roots = set(forest.keys())
    # Replace the placeholders with corresponding trees from the forest.
    for tree in forest.values():
        roots -= tree.keys()
        for child in tree:
            if child in forest:
                tree[child] = forest[child]
    # Make sure we have a single root.  No check is made for cycles.
    if len(roots) == 0 or len(roots) > 1:
        raise ValueError("Graph is not a tree")
    # The tree located at the actual root contains the entire tree.
    root = roots.pop()
    tree = {root: forest[root]}
    return tree

if __name__ == '__main__':
    import json
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_tree_by_Ian(edges), indent=1))

Test Script

timing test file python_construct_tree_from_edge_speed_test_1.py

Timing comparison

On 2k edges, here's result.

When using 2M edges, here's the result.

Ian's solution is about 76 times slower than TingYu's. Here's them running in python 3. (python_construct_tree_from_edge_speed_test_2.py3)

Xah's solution needs to be modified in order to run in python 3. I haven't done so yet.

TingYu's solution is truely beautiful. The code is extremely short, basically just 6 lines. But, more importantly, is the algorithm he used.

Emacs Lisp Solution

;; -*- coding: utf-8 -*-
;; by Kalman Reti 2014-01-09
;; https://plus.google.com/113859563190964307534/posts/dmz88usQKQT

(require'cl)
(defun tree-from-edges (foo)
  "reconstruct tree from list of (child parent) lists"
  ;; recursive function to simplify internal data structure for answer
  (flet ((prune (list)
        (cond ((null list) nil)
              ((not (listp list)) list)
              (t (cons (prune (car list)) (loop for i in (cddr list) collect (prune i)))))))

  ;;hash table entries are triples of pid, parent dotted to list of children
  (loop with ht = (make-hash-table :test 'equal)
    for (child parent) in foo
    for child-entry = (gethash child ht)
    for parent-entry = (gethash parent ht)
    if child-entry
    do (when (null (second child-entry))
         (setf (second child-entry) parent))
    else
    do (setq child-entry (setf (gethash child ht) (list child parent-entry)))
    if parent-entry
    do (push child-entry (cddr parent-entry))
    else
    do (setq parent-entry (setf (gethash parent ht) (list parent nil child-entry)))
       (setf (second child-entry) parent-entry)
    finally (return (loop for x being the hash-values of ht
                  for (pid parent . children) = x
                  when (null parent)
                  collect (prune x))))))

;; In scratch buffer
(tree-from-edges '((0 2) (3 0) (1 4) (2 4)))
;; ((4 (2 (0 (3))) (1)))

;; For larger test cases, I used
;; ps -eaf | awk '{print $2, $3}'

;; in a shell buffer then
;; (defun read-data (symbol)
;;   (set symbol (loop for child = (ignore-errors (read (current-buffer)))
;;             while child
;;             for parent = (read (current-buffer))
;;             collect (list child parent))))
;;  (read-data 'foo)

;; to get the data into a variable.  I compared my output with pstree.  From start to finish, about an hour.

Clojure lisp

;; -*- coding: utf-8 -*-
;; clojure
;; by Tassilo 2014-01-08 https://gist.github.com/tsdh/8314450

(defn parent-children-map
  "Converts a vector of [child parent] edges into a map where every entry has
  the form [parent set-of-children].

  Examples:

  (parent-children-map [[0, 2], [3, 0], [1, 4], [2, 4]])
  ;=> {4 #{1 2}, 0 #{3}, 2 #{0}}

  (parent-children-map [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
                        [3 0] [7 0] [5 2] [0 2] [11 1]])
  ;=> {2 #{0 5}, 0 #{3 7}, 5 #{9}, 6 #{8}, 4 #{1 2}, 1 #{6 10 11}}"
  [edges]
  (reduce (fn [m [child parent]]
            (update-in m [parent]
                       (fn [s] (set (conj s child)))))
          {}
          edges))

(defn get-root
  "Gets the root entry of the parent-children-map pcm."
  [pcm]
  (loop [xs pcm]
    (if (seq xs)
      (let [[p _ :as root] (first xs)]
        (if (some (fn [[_ cs]] (contains? cs p)) pcm)
          (recur (rest xs))
          root))
      (throw (RuntimeException. "There is no root!")))))

(defn make-tree
  "Returns a map representing the given sequence of edges (each edge
  represented as [child parent]) as a tree.

  Examples:

  (make-tree [[0, 2], [3, 0], [1, 4], [2, 4]])
  ;=> {4 {2 {0 {3 nil}},
          1 nil}}

  (make-tree [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
              [3 0] [7 0] [5 2] [0 2] [11 1]])
  ;=> {4 {2 {5 {9 nil},
             0 {7 nil,
                3 nil}},
          1 {11 nil,
             10 nil,
             6 {8 nil}}}}"
  [edges]
  (let [pcm (parent-children-map edges)]
    (letfn [(build-child-trees [cs]
              (apply merge (map #(build-tree [% (pcm %)]) cs)))
            (build-tree [[root cs]]
              (if root
                {root (build-child-trees cs)}
                {}))]
      (build-tree (get-root pcm)))))

Shen Lisp

\* I changed the format to [parent child] which is a bit more natural.  A type secure version in Shen which returns a string in columnar format.  *\
\* Nice to see friendly challenge problems here again.  *\
\* Mark Tarver, 2014-01-15  *\

(define tree
  {(list (list A)) --> string}
   Edges -> (let Root (root Edges (map (function reverse) Edges))
                 TreeString (cn "c#10;" (tree-string Root Edges 0))
                 TreeString))

(define root
  {(list (list A)) --> (list (list A)) --> A}
   [[Root | _] | Edges] All -> Root   where (empty? (assoc Root All))
   [_ | Edges] All -> (root Edges All))

(define tree-string
  {A --> (list (list A)) --> number --> string}
  Node Edges Indent -> (let Daughters (daughters Node Edges)
                            String (str-node Node Indent)
                            DStrings (map (/. D (tree-string D Edges (+ 1 Indent))) Daughters)
                            DString (list->str DStrings)
                            (@s String "c#10;" DString)))

(define list->str
  {(list string) --> string}
   [] -> ""
   [X | Y] -> (@s X (list->str Y)))

(define str-node
  {A --> number --> string}
   Node 0 -> (str Node)
   Node Indent -> (cn " " (str-node Node (- Indent 1))))

(define daughters
  {A --> (list (list A)) --> (list A)}
   Node Edges -> (mapcan (/. Edge (if (= (head Edge) Node) (tail Edge) [])) Edges))

\* (85+) (tree [[0 1] [0 2] [2 6] [2 7] [1 3] [1 4] [4 5]])  *\
\* "  *\
\* 0  *\
\*  1  *\
\*   3  *\
\*   4  *\
\*    5  *\
\*  2  *\
\*   6  *\
\*   7  *\
\* " : string *\

more solutions

more solutions i haven't looked yet.

Common Lisp https://groups.google.com/forum/#!topic/comp.lang.lisp/N-9UtzPmtVY

emacs lisp https://plus.google.com/113859563190964307534/posts/dmz88usQKQT

Reddit Discussion

Reddit discussion. http://www.reddit.com/r/learnpython/comments/2y5ncs/