Programing Challenge: Constructing a Tree Given Its Edges

, , …,

Problem: given a list of edges of a tree: [child, parent], construct the tree. Here's a sample input in Python nested list syntax: [[0, 2], [3, 0], [1, 4], [2, 4]].

Here's a sample print of a tree data structure:

4
 1
 2
  0
   3

this means, the root node's name is 4. It has 2 branches, named 1 and 2. The branche named 2 has 1 children, named 0, and it has one children named 3. The node's names are given as integers, but you can assume them to be strings.

You can choose your own data structure for the output. For example, nested list with 1st element being the node name, or use nested hash of hash, where key is the node name, and value is its children.

Here's sample data structure in python, using hash of hash.

{
 "4": {
  "1": {},
  "2": {
   "0": {
    "3": {}
   }
  }
 }
}

Other data structure are accepted.

Code it using your favorite language. I'll be able to look at and verify in Mathematica, Python, Ruby, Perl, PHP, JavaScript, Emacs Lisp, Java. But other langs, such as Clojure and other lisps, OCaml, Haskell, erlang, Fortress, APL, HOL, Coq, Lua, Factor, Rust, Julia … are very welcome. 〔☛ Proliferation of Computing Languages

You should be able to do this within 8 hours from scratch. Took me 5 hours.

Python solution will be posted in a week, on 2014-01-14 (or sooner if many showed solutions). Post your solution in comment (or link to github or pastebin).

to post code in the comment, please add this tag wrapper:

<pre><code class="python"></code></pre>

Edge Generator

# -*- coding: utf-8 -*-
# python
# xah lee, 2013-12-03

import random

def randomTreeEdges(max_nodes):
    id_pool = range(max_nodes)
    random.shuffle(id_pool)
    root = id_pool.pop()
    first_child = id_pool.pop()
    edges = [[first_child, root]]
    while len(id_pool) != 0:
        if random.random() > .1: # add child
            edges.append([id_pool.pop(), random.choice(edges)[0]])
        else:                   # add sibling
            edges.append([id_pool.pop(), random.choice(edges)[1]])
    random.shuffle(edges)
    return edges

edges = randomTreeEdges(20)
print "edges {}".format(edges)

# edges [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
[[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

{
  "9": {
    "12": {
      "1": {
        "7": {
          "10": {
            "6": {
              "0": {},
              "16": {}
            }
          },
          "2": {
            "8": {
              "14": {
                "4": {}
              }
            },
            "3": {}
          }
        },
        "11": {},
        "13": {},
        "5": {
          "17": {},
          "15": {}
        }
      },
      "19": {
        "18": {}
      }
    }
  }
}

Solutions

Python

# -*- coding: utf-8 -*-
# python

# problem: given a list of edges of a tree: [child,parent], construct the tree.
# xah lee, 2013-12-03

import json                     # pretty print nested hash

def construct_tree(edges):
    """Given a list of edges [child, parent], return a tree."""

    parentDB = {} # key is a node. value is its parent
    childrenDB = {} # key is a node. value is its children as hash
    for edg in edges:
        c, p = edg[0], edg[1]
        parentDB[c] = p
        if p in childrenDB:
            childrenDB[p][c] = None
        else:
            childrenDB[p] = {c:None}

    root = None
    for v in childrenDB.viewkeys():
        if v not in parentDB:
            root = v
            break

    leaf_set = (set(parentDB.keys()) - set(childrenDB.keys()))

    # construct the tree, bottom up
    tree = {}
    for node in leaf_set:
        tree[node] = {}

    while len(parentDB) != 0:
        for head, tail in tree.items():
            if head != root:
                if head not in childrenDB:     # not parent of somebody
                    if parentDB[head] in tree: # add sibling
                        tree[parentDB[head]][head] = tail
                    else:           # add parent
                        tree[parentDB[head]] = {head: tail}
                    del tree[head]
                    del childrenDB[parentDB[head]][head]
                    if len(childrenDB[parentDB[head]]) == 0:
                        del childrenDB[parentDB[head]]
                    del parentDB[head]
    return tree

if __name__ == '__main__':
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_tree(edges), indent=1))
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
This code resolves the problem described in
http://xahlee.info/perl-python/python_construct_tree_from_edge.html

by Ting-Yu Lin aethanyc 2014-01-08 https://gist.github.com/aethanyc/8313640

"""

import collections
import json

def construct_trees_by_TingYu(edges):
    """Given a list of edges [child, parent], return trees. """
    trees = collections.defaultdict(dict)

    for child, parent in edges:
        trees[parent][child] = trees[child]

    # Find roots
    children, parents = zip(*edges)
    roots = set(parents).difference(children)

    return {root: trees[root] for root in roots}

if __name__ == '__main__':
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_trees_by_TingYu(edges), indent=1))
# -*- coding: utf-8 -*-
# python
# by whonose 2014-01-08

import json

edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]

def construct_tree_by_whonose(edges):
    """Given a list of edges [child, parent], return a tree."""

    def get_node(node):
        if node not in tree:
            tree[node] = {}
        return tree[node]

    tree = {}

    parent = set([e[1] for e in edges])
    children = set([e[0] for e in edges])
    root = list(parent.difference(children))[0]
    stack = [root]
    while len(stack):
        n = stack.pop()
        for e in edges:
            if e[1] == n:
                get_node(n)[e[0]] = get_node(e[0])
                stack.append(e[0])

    return {root: tree[root]}

print json.dumps(construct_tree_by_whonose(edges), indent=1)
# -*- coding: utf-8 -*-
# python 3
# by Ian 2014-01-08

def construct_tree_by_Ian(edges):
    forest = {}
    # Populate the forest with depth-1 trees, with placeholders for subtrees.
    for (child, parent) in edges:
        if parent not in forest:
            forest[parent] = {}
        forest[parent][child] = {}
    if not forest:
        # Graph is empty; an edge list cannot represent a single-node tree.
        return {}
    roots = set(forest.keys())
    # Replace the placeholders with corresponding trees from the forest.
    for tree in forest.values():
        roots -= tree.keys()
        for child in tree:
            if child in forest:
                tree[child] = forest[child]
    # Make sure we have a single root.  No check is made for cycles.
    if len(roots) == 0 or len(roots) > 1:
        raise ValueError("Graph is not a tree")
    # The tree located at the actual root contains the entire tree.
    root = roots.pop()
    tree = {root: forest[root]}
    return tree

if __name__ == '__main__':
    import json
    edges = [[0, 6], [17, 5], [2, 7], [4, 14], [12, 9], [15, 5], [11, 1], [14, 8], [16, 6], [5, 1], [10, 7], [6, 10], [8, 2], [13, 1], [1, 12], [7, 1], [3, 2], [19, 12], [18, 19]]
    print(json.dumps(construct_tree_by_Ian(edges), indent=1))

of the above 3 solutions, whonose's is 2675 times slower than TingYu's. (temp test file python_construct_tree_from_edge_speed_test_1.py)

When using 2000000 edges, here's the result:

Ian's solution is about 76 times slower than TingYu's. Here's them running in python 3. (python_construct_tree_from_edge_speed_test_2.py3)

Xah's solution needs to be modified in order to run in python 3. I haven't done so yet.

TingYu's solution is truely beautiful. The code is extremely short, basically just 6 lines. But, more importantly, is the algorithm he used. I haven't dig into detail yet, but it is different from mine. I (Xah) build the tree manually, bottom up from leafs.

emacs lisp

;; -*- coding: utf-8 -*-
;; by Kalman Reti 2014-01-09
;; https://plus.google.com/113859563190964307534/posts/dmz88usQKQT

(require'cl)
(defun tree-from-edges (foo)
  "reconstruct tree from list of (child parent) lists"
  ;; recursive function to simplify internal data structure for answer
  (flet ((prune (list)
        (cond ((null list) nil)
              ((not (listp list)) list)
              (t (cons (prune (car list)) (loop for i in (cddr list) collect (prune i)))))))

  ;;hash table entries are triples of pid, parent dotted to list of children
  (loop with ht = (make-hash-table :test 'equal)
    for (child parent) in foo
    for child-entry = (gethash child ht)
    for parent-entry = (gethash parent ht)
    if child-entry
    do (when (null (second child-entry))
         (setf (second child-entry) parent))
    else
    do (setq child-entry (setf (gethash child ht) (list child parent-entry)))
    if parent-entry
    do (push child-entry (cddr parent-entry))
    else
    do (setq parent-entry (setf (gethash parent ht) (list parent nil child-entry)))
       (setf (second child-entry) parent-entry)
    finally (return (loop for x being the hash-values of ht
                  for (pid parent . children) = x
                  when (null parent)
                  collect (prune x))))))

;; In scratch buffer
(tree-from-edges '((0 2) (3 0) (1 4) (2 4)))
;; ((4 (2 (0 (3))) (1)))

;; For larger test cases, I used
;; ps -eaf | awk '{print $2, $3}'

;; in a shell buffer then
;; (defun read-data (symbol)
;;   (set symbol (loop for child = (ignore-errors (read (current-buffer)))
;;             while child
;;             for parent = (read (current-buffer))
;;             collect (list child parent))))
;;  (read-data 'foo)

;; to get the data into a variable.  I compared my output with pstree.  From start to finish, about an hour.

Clojure lisp

;; -*- coding: utf-8 -*-
;; clojure
;; by Tassilo 2014-01-08 https://gist.github.com/tsdh/8314450

(defn parent-children-map
  "Converts a vector of [child parent] edges into a map where every entry has
  the form [parent set-of-children].

  Examples:

  (parent-children-map [[0, 2], [3, 0], [1, 4], [2, 4]])
  ;=> {4 #{1 2}, 0 #{3}, 2 #{0}}

  (parent-children-map [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
                        [3 0] [7 0] [5 2] [0 2] [11 1]])
  ;=> {2 #{0 5}, 0 #{3 7}, 5 #{9}, 6 #{8}, 4 #{1 2}, 1 #{6 10 11}}"
  [edges]
  (reduce (fn [m [child parent]]
            (update-in m [parent]
                       (fn [s] (set (conj s child)))))
          {}
          edges))

(defn get-root
  "Gets the root entry of the parent-children-map pcm."
  [pcm]
  (loop [xs pcm]
    (if (seq xs)
      (let [[p _ :as root] (first xs)]
        (if (some (fn [[_ cs]] (contains? cs p)) pcm)
          (recur (rest xs))
          root))
      (throw (RuntimeException. "There is no root!")))))

(defn make-tree
  "Returns a map representing the given sequence of edges (each edge
  represented as [child parent]) as a tree.

  Examples:

  (make-tree [[0, 2], [3, 0], [1, 4], [2, 4]])
  ;=> {4 {2 {0 {3 nil}},
          1 nil}}

  (make-tree [[10 1] [1 4] [6 1] [8 6] [9 5] [2 4]
              [3 0] [7 0] [5 2] [0 2] [11 1]])
  ;=> {4 {2 {5 {9 nil},
             0 {7 nil,
                3 nil}},
          1 {11 nil,
             10 nil,
             6 {8 nil}}}}"
  [edges]
  (let [pcm (parent-children-map edges)]
    (letfn [(build-child-trees [cs]
              (apply merge (map #(build-tree [% (pcm %)]) cs)))
            (build-tree [[root cs]]
              (if root
                {root (build-child-trees cs)}
                {}))]
      (build-tree (get-root pcm)))))

Shen Lisp

\* I changed the format to [parent child] which is a bit more natural.  A type secure version in Shen which returns a string in columnar format.  *\
\* Nice to see friendly challenge problems here again.  *\
\* Mark Tarver, 2014-01-15  *\

(define tree
  {(list (list A)) --> string}
   Edges -> (let Root (root Edges (map (function reverse) Edges))
                 TreeString (cn "c#10;" (tree-string Root Edges 0))
                 TreeString))

(define root
  {(list (list A)) --> (list (list A)) --> A}
   [[Root | _] | Edges] All -> Root   where (empty? (assoc Root All))
   [_ | Edges] All -> (root Edges All))

(define tree-string
  {A --> (list (list A)) --> number --> string}
  Node Edges Indent -> (let Daughters (daughters Node Edges)
                            String (str-node Node Indent)
                            DStrings (map (/. D (tree-string D Edges (+ 1 Indent))) Daughters)
                            DString (list->str DStrings)
                            (@s String "c#10;" DString)))

(define list->str
  {(list string) --> string}
   [] -> ""
   [X | Y] -> (@s X (list->str Y)))

(define str-node
  {A --> number --> string}
   Node 0 -> (str Node)
   Node Indent -> (cn " " (str-node Node (- Indent 1))))

(define daughters
  {A --> (list (list A)) --> (list A)}
   Node Edges -> (mapcan (/. Edge (if (= (head Edge) Node) (tail Edge) [])) Edges))

\* (85+) (tree [[0 1] [0 2] [2 6] [2 7] [1 3] [1 4] [4 5]])  *\
\* "  *\
\* 0  *\
\*  1  *\
\*   3  *\
\*   4  *\
\*    5  *\
\*  2  *\
\*   6  *\
\*   7  *\
\* " : string *\

more solutions

more solutions i haven't looked yet.

Common Lisp https://groups.google.com/forum/#!topic/comp.lang.lisp/N-9UtzPmtVY

emacs lisp https://plus.google.com/113859563190964307534/posts/dmz88usQKQT

blog comments powered by Disqus