Closure Table part deux – Nodes and Adjacencies – A tree in MySQL

From this SO question/answer, we’re managing a tree data structure from a MySQL database.

We have:

mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| name        | varchar(255) | YES  |     | NULL    |                |
| leftSibling | int(11)      | NO   |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO   | PRI | NULL    | auto_increment |
| parent     | int(11) | NO   |     | NULL    |                |
| child      | int(11) | NO   |     | NULL    |                |
| pathLen    | int(11) | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

So here’s a picture of a tree:

That tree is stored in the database with an entry in the nodes table for each of the 10 nodes of the tree as follows:


mysql> select * from nodes ;
+----+--------------------------+-------------+
| id | name                     | leftSibling |
+----+--------------------------+-------------+
|  1 | node 1                   |           0 |
|  2 | node 1.1                 |           0 |
|  3 | node 2                   |           1 |
|  4 | node 1.1.1               |           0 |
|  5 | node 2.1                 |           0 |
|  6 | node 1.2                 |           2 |
|  7 | node 1.3                 |           0 |
|  8 | hi there, new child of 7 |           0 |
|  9 | 2nd child of 7           |           8 |
| 10 | 3rd child of 7           |           9 |
+----+--------------------------+-------------+
10 rows in set (0.00 sec)

Node 0 is fictional and serves as the root of the tree. There is no node 0 entry in the nodes table, however.

Notice the leftSibling property of each node is a reference to
its SISTER NODE to its left. If a node is a “leftmost child” (no sisters to the left of it) then it has a 0 for its left sibling.

The leftSibling property is what will allow us to keep the children
in the correct order.

Now, to store all the parent child relationships in the tree, we use a “closure table” called adjacencies.

mysql> select * from adjacencies ;
+------------+--------+-------+---------+
| relationId | parent | child | pathLen |
+------------+--------+-------+---------+
|          1 |      1 |     1 |       0 |
|          2 |      0 |     1 |       1 |
|          3 |      2 |     2 |       0 |
|          4 |      1 |     2 |       1 |
|          5 |      0 |     2 |       2 |
|          6 |      3 |     3 |       0 |
|          7 |      0 |     3 |       1 |
|          8 |      4 |     4 |       0 |
|          9 |      2 |     4 |       1 |
|         10 |      1 |     4 |       2 |
...

32 rows in set (0.00 sec)

So this is the WEIRDEST TABLE and takes some getting used to. In order to understand why this is done this way, think of a traditional adjacency list. The reason you can’t use a plain old adjacency list in MySQL is because plain old adjacency lists would require MULTIPLE queries to pull out all the children of say, node 1. Think about it, and try looking up Bill Karwin’s SQL antipatterns book for further reading.

So, all the adjacencies table does is list ALL OF A CHILD’S PARENTS.
This makes it dead simple to retrieve all the ids of all the children of any given node in one fell swoop:

    -- All children of node 1:
    select child
    from adjacencies
    where parent=1 ;

And.. getting all the parents/grandparents of any given node (say node=7) is just as easy:

     -- ALL MY PARENTS,grandparents
    select a.parent
    from adjacencies a
    where child=7
    and parent!=7 ;

So, there is a bit of redundancy in the table, and lots of repeated entries. This may make your SQL-sense tingle but that is normal.

On with the code. Run the script and examine the tables created. Try the addNode(), move() and deleteNode() procedures.

So, you can see that Node=1 has only 2 entries in this table where it is the child. The first entry above has ( parent=1, child=1, pathLen=0 ). That means we actually consider 1 as a parent of itself.

drop table if exists nodes ;

create table nodes (
  id int not null primary key auto_increment,
  name varchar( 255 ),
  leftSibling int not null default 0
) ;

insert into nodes values

  ( 1, 'node 1', 0 ),
  ( 2, 'node 1.1', 0 ),
  ( 3, 'node 2', 1 ),
  ( 4, 'node 1.1.1', 0 ),
  ( 5, 'node 2.1', 0 ),
  ( 6, 'node 1.2', 2 ),
  ( 7, 'node 1.3', 6 )
  
;






drop table if exists adjacencies ;

create table adjacencies(
  
  relationId int not null primary key auto_increment,
  parent int not null,
  child int not null,
  pathLen int not null

) ;
-- do parenthood for each node.
insert into adjacencies( child, parent, pathLen ) values
  -- node 1 is a top level node, child of the root
  -- I chose to add an entry linking each node to
  -- "node 0" for nicer syntax on "select whole tree"
  -- also avoids null in "whose parent is" subquery
  
  
  ( 1, 1, 0 ),
  ( 1, 0, 1 ),
  
  -- node 2:
  ( 2, 2, 0 ),
  ( 2, 1, 1 ),
  ( 2, 0, 2 ),
  
  
  -- node 3
  ( 3, 3, 0 ),
  ( 3, 0, 1 ),
  
  ( 4, 4, 0 ),
  ( 4, 2, 1 ),
  ( 4, 1, 2 ),
  ( 4, 0, 3 ),
  
  ( 5, 5, 0 ),
  ( 5, 3, 1 ),
  ( 5, 0, 2 ),
  
  ( 6, 6, 0 ),
  ( 6, 1, 1 ),
  ( 6, 0, 2 ),
  
  ( 7, 7, 0 ),
  ( 7, 1, 1 ),
  ( 7, 0, 2 ) 
;
  




-- ------------------------------
-- ------------------------------
--       stored procedures
-- ------------------------------
-- ------------------------------
  
DELIMITER //

-- 1. getRightMostChild
drop function if exists getRightMostChild //
create function getRightMostChild(

  i_ofNode int

)
returns int
READS SQL DATA
BEGIN
  
return(
  select -- RIGHTMOST CHILD
    adjacencies.child as 'rightMostChildId'

  from
    adjacencies
      
  where
    -- children of this @node
    adjacencies.parent  = i_ofNode

    -- direct adjacencies only
    and adjacencies.pathLen=1
      
    -- if this node is named
    -- an leftSibling for any other node,
    -- then it CANNOT be the rightmost node
    and adjacencies.child NOT IN (
      SELECT nodes.leftSibling from nodes
      -- rather than do a join in the subquery
      -- i chose to leave it at just "this node
      -- won't be named as "leftSibling" for ANY node
      -- if indeed it is the rightmost child
      -- of a parentNode.
    ) ) ;
  
END //


-- 2.  AddNode
drop procedure if exists addNode //
create procedure addNode(

  i_name varchar(255),
  i_parent int

)

modifies sql data

BEGIN
  DECLARE newNodeId int ;
  DECLARE nodeToLeft int ;
  
  -- get the node that this one will go
  -- right of
  -- by selecting the RIGHTMOST existing child
  -- of i_parent
  select getRightMostChild( i_parent ) into nodeToLeft ;
  
  if nodeToLeft is null then
    -- it means this parent has no children yet
    SET nodeToLeft = 0 ; -- first child points to 0 to indicate its leftmost
  end if ;
  
  -- insert
  -- ADD ((ID)) AS LAST CHILD:
  insert into nodes( name, leftSibling ) value ( i_name, nodeToLeft ) ;
  
  -- get the id of the new node we just created
  -- to finish the tie up 
  select last_insert_id() into newNodeId ;
  
  -- now tie up child stuff
  -- the insertions into 'adjacencies' is going to be
  -- ( 8, 8, 0 ) -- self reference 0 depth
  -- AS A DESCENDANT OF ALL PARENTS OF (1) BECAUSE YOU ADDED AS CHILD OF (1)

  -- get ALL parents of i_parent, and
  -- make them also parent newNode, at
  -- a +1 depth level
  insert into adjacencies( parent, child, pathLen )
  
  -- (8, 8, 0)
  SELECT newNodeId, newNodeId, 0  -- the 0-length self referential entry each node needs
  
  UNION
  
  select 
    parent, newNodeId, pathLen+1

  from
    adjacencies

  where
    child = i_parent
  
  ;
  
  -- because of the self-ref entry (parentId,parentId,0), the above query
  -- will also automatically do (parentId,child,1)
  -- (the direct parent assignment) as well
  
END //



-- 3.  DeleteNode
drop procedure if exists deleteNode //
create procedure deleteNode(
  i_nodeId int
)
BEGIN

  drop temporary table if exists adjsToRemove ;
  create temporary table adjsToRemove( ids int ) ;

  -- Get all adjacency relations WHERE
  -- any of MY GRANDPARENTS
  -- are pointing to any of MY CHILDREN:
  insert into adjsToRemove( ids )

  select
    relationId
  from
    adjacencies
  where
    parent in 
  (
  -- ALL MY PARENTS,grandparents
    select a.parent
    from adjacencies a
    where child=i_nodeId
    and parent!=i_nodeId
  )

  -- only concerns the relations of my
  -- grandparents WHERE MY CHILDREN ARE CONCERNED
  and child in
  -- get all my children
  (
    select child
    from adjacencies
    where parent=i_nodeId
  )

;

  -- those relations need be deleted
  delete from adjacencies where relationId in ( select ids from adjsToRemove ) limit 10000 ;

END //



-- 4.  MoveNode
-- --------------
-- --------------
-- ---- move ----
-- --------------
-- --------------
drop procedure if exists move //
create procedure move(
  i_nodeId int,
  i_newParent int
)

modifies sql data

BEGIN

  DECLARE myOldLeftSibling int ;
  DECLARE myNewLeftSibling int ;
  
  -- anyone who thinks i am his leftSibling
  -- needs to be updated to now think
  -- that myOldLeftSibling is now their leftSibling
  select id into myOldLeftSibling
  from nodes 
  where id=i_nodeId ;
  
  -- tell them i'm gone, if any
  update nodes set leftSibling=myOldLeftSibling where leftSibling=i_nodeId limit 1 ;
  
  
  
  -- my new left sibling will be the last child of
  -- my new parent
  select getRightmostChild( i_newParent ) into myNewLeftSibling ;
  
  -- avoid null values, use 0 instead
  if myNewLeftSibling is NULL then SET myNewLeftSibling=0 ; end if ;
  
  update nodes set leftSibling=myNewLeftSibling where id=i_nodeId limit 1 ;
  
  
  
  
  
  
  
  -- tell all my children to
  -- stop thinking my old parents
  -- are still their parents
  -- now tell all my children to stop considering
  -- ALL MY CURRENT PARENTS as their parents



  drop temporary table if exists adjsToRemove ;
  create temporary table adjsToRemove( ids int ) ;

  -- Get all adjacency relations WHERE
  -- any of MY GRANDPARENTS
  -- are pointing to any of MY CHILDREN:
  insert into adjsToRemove( ids )

  select
    relationId
  from
    adjacencies
  where
    parent in 
  (
  -- ALL MY PARENTS,grandparents
    select a.parent
    from adjacencies a
    where child=i_nodeId
    and parent!=i_nodeId
  )

  -- only concerns the relations of my
  -- grandparents WHERE MY CHILDREN ARE CONCERNED
  and child in
  -- get all my children
  (
    select child
    from adjacencies
    where parent=i_nodeId
  )

;

  -- those relations need be deleted
  delete from adjacencies where relationId in ( select ids from adjsToRemove ) limit 10000 ;

  
  
  
  



  -- this magic query forms all possible combinations
  -- of (grandparent1, child1) (grandparent2, child1) etc..
  -- so EVERY CHILD "meets" EVERY GRANDPARENT
  insert into adjacencies( parent, child, pathLen )
  select
    adjParent.parent,
    adjChild.child,
    
    adjParent.pathLen+1+adjChild.pathLen
    
  from
    adjacencies adjParent
  CROSS JOIN
    adjacencies adjChild
  on
    adjParent.child=i_newParent
    and adjChild.parent=i_nodeId
  
  ; 


END //
DELIMITER ;
-- no more procs
-- -------------
-- -------------





-- Tests:
call addNode( 'hi there, new child of 7', 7 ) ;
call addNode( '2nd child of 7', 7 ) ;
call addNode( '3rd child of 7', 7 ) ;

call move( 7, 5 ) ;

-- after the call move command above, the tree looks like this:

One Comment

- anietog
- Posted February 14, 2012 at 2:26 pm
- Permalink
hi, Thanks a lot for this great tutorial. It is helping a lot. However i have some conrcens related to Twin nodes.
i was wondering this algorithm will not work with twins slblings, i mean twins with same name as you can have in a XML file,for example two tags of same type. This algorithm is addind at the end of a sibling list. So it for example, i want to insert another ‘1.1’ and put it in order

So if i added using this algorithm i will end up with this sibling relationship: 1.1;1.2;1.3;1.1

But i think a good order should be like:1.1;1.1;1.2;1.3

should i need a getRightTwin() rather that getRightMost() ?

thanks a lot once again. Really helpful your article.

2 Trackbacks/Pingbacks

By Breadcrumps z mysql i php on 17 Jul 2011 at 12:33 pm

[…] […]
By What is the most efficient/elegant way to parse a flat table into a tree? on 06 Nov 2022 at 3:49 am

[…] More detail and SQL code on my blog. […]

Bobobobo’s Weblog