WordPress to MODx Migration Part 2: Schema Relationships and Comments
In part 1 of this Wordpress to MODx migration I introduced you to xPDO's schema and model generators and started work on migrating wordpress post and page data into MODx. In this post I'm going to show you how I built the relationships in the wordpress schema file and how I pulled in all of wordpress's comments and kept the threading in tact. Plus the script has evolved to handle where to store pages vs posts, what to do with versions, keeping child/parent relationships intact, and some basic cleaning up by placing config values in the configuration section.
Schema Relationships
Previously the schema file that xPDO generated was simply a dump of what it found in the database. Relationships have to be built manually and I've since done that (as well as getting rid of any tables we don't need, like plugin tables). I basically followed the wiki on how to make relationships in a schema file and built the relationships based on the table structure and my knowledge of how wordpress works on the database level. Here is the current schema file which is most likely still a work in progress.
<?xml version="1.0" encoding="UTF-8"?>
<model package="wordpress" baseClass="xPDOObject" platform="mysql" defaultEngine="MyISAM">
  <object class="Comments" table="comments" extends="xPDOObject">
		<field key="comment_ID" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="comment_post_ID" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="index" />
		<field key="comment_author" dbtype="tinytext" phptype="string" null="false" />
		<field key="comment_author_email" dbtype="varchar" precision="100" phptype="string" null="false" default="" />
		<field key="comment_author_url" dbtype="varchar" precision="200" phptype="string" null="false" default="" />
		<field key="comment_author_IP" dbtype="varchar" precision="100" phptype="string" null="false" default="" />
		<field key="comment_date" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="comment_date_gmt" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" index="index" />
		<field key="comment_content" dbtype="text" phptype="string" null="false" />
		<field key="comment_karma" dbtype="int" precision="11" phptype="integer" null="false" default="0" />
		<field key="comment_approved" dbtype="varchar" precision="20" phptype="string" null="false" default="1" index="index" />
		<field key="comment_agent" dbtype="varchar" precision="255" phptype="string" null="false" default="" />
		<field key="comment_type" dbtype="varchar" precision="20" phptype="string" null="false" default="" />
		<field key="comment_parent" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" />
		<field key="user_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" />
		<field key="comment_subscribe" dbtype="enum" precision="'Y','N'" phptype="string" null="false" default="N" />
    <aggregate alias="Users" class="Users" local="user_id" foreign="ID" cardinality="one" owner="foreign" />
  </object>
	<object class="Options" table="options" extends="xPDOObject">
		<field key="option_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="blog_id" dbtype="int" precision="11" phptype="integer" null="false" default="0" />
		<field key="option_name" dbtype="varchar" precision="64" phptype="string" null="false" default="" index="unique" />
		<field key="option_value" dbtype="longtext" phptype="string" null="false" />
		<field key="autoload" dbtype="varchar" precision="20" phptype="string" null="false" default="yes" />
	</object>
  <object class="Postmeta" table="postmeta" extends="xPDOObject">
		<field key="meta_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="post_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="index" />
		<field key="meta_key" dbtype="varchar" precision="255" phptype="string" null="true" index="index" />
		<field key="meta_value" dbtype="longtext" phptype="string" null="true" />
	</object>
  <object class="Posts" table="posts" extends="xPDOObject">
		<field key="ID" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="post_author" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" />
		<field key="post_date" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="post_date_gmt" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="post_content" dbtype="longtext" phptype="string" null="false" />
		<field key="post_title" dbtype="text" phptype="string" null="false" />
		<field key="post_category" dbtype="int" precision="4" phptype="integer" null="false" default="0" />
		<field key="post_excerpt" dbtype="text" phptype="string" null="false" />
		<field key="post_status" dbtype="varchar" precision="20" phptype="string" null="false" default="publish" index="index" />
		<field key="comment_status" dbtype="varchar" precision="20" phptype="string" null="false" default="open" />
		<field key="ping_status" dbtype="varchar" precision="20" phptype="string" null="false" default="open" />
		<field key="post_password" dbtype="varchar" precision="20" phptype="string" null="false" default="" />
		<field key="post_name" dbtype="varchar" precision="200" phptype="string" null="false" default="" index="index" />
		<field key="to_ping" dbtype="text" phptype="string" null="false" />
		<field key="pinged" dbtype="text" phptype="string" null="false" />
		<field key="post_modified" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="post_modified_gmt" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="post_content_filtered" dbtype="text" phptype="string" null="false" />
		<field key="post_parent" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="index" />
		<field key="guid" dbtype="varchar" precision="255" phptype="string" null="false" default="" />
		<field key="menu_order" dbtype="int" precision="11" phptype="integer" null="false" default="0" />
		<field key="post_type" dbtype="varchar" precision="20" phptype="string" null="false" default="post" index="index" />
		<field key="post_mime_type" dbtype="varchar" precision="100" phptype="string" null="false" default="" />
		<field key="comment_count" dbtype="bigint" precision="20" phptype="integer" null="false" default="0" />
    <aggregate alias="Users" class="Users" local="post_author" foreign="ID" cardinality="one" owner="foreign" />
    <composite alias="Comments" class="Comments" local="ID" foreign="comment_post_ID" cardinality="many" owner="local" />
    <composite alias="PostMeta" class="Postmeta" local="ID" foreign="post_id" cardinality="many" owner="local" />
    <composite alias="TermRelationship" class="TermRelationships" local="ID" foreign="object_id" cardinality="many" owner="local" />
	</object>
  <object class="TermRelationships" table="term_relationships" extends="xPDOObject">
		<field key="object_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="pk" />
		<field key="term_taxonomy_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="pk" />
		<field key="term_order" dbtype="int" precision="11" phptype="integer" null="false" default="0" />
	  <aggregate alias="TermTaxonomy" class="TermTaxonomy" local="term_taxonomy_id" foreign="term_taxonomy_id" cardinality="one" owner="foreign" />
	  <aggregate alias="Post" class="Posts" local="object_id" foreign="ID" cardinality="one" owner="foreign" />
  </object>
  <object class="TermTaxonomy" table="term_taxonomy" extends="xPDOObject">
		<field key="term_taxonomy_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="term_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="index" />
		<field key="taxonomy" dbtype="varchar" precision="32" phptype="string" null="false" default="" index="index" />
		<field key="description" dbtype="longtext" phptype="string" null="false" />
		<field key="parent" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" />
		<field key="count" dbtype="bigint" precision="20" phptype="integer" null="false" default="0" />
    <composite alias="TermRelationship" class="TermRelationships" local="term_taxonomy_id" foreign="term_taxonomy_id" cardinality="many" owner="local" />
	</object>
  <object class="Terms" table="terms" extends="xPDOObject">
		<field key="term_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="name" dbtype="varchar" precision="200" phptype="string" null="false" default="" index="index" />
		<field key="slug" dbtype="varchar" precision="200" phptype="string" null="false" default="" index="unique" />
		<field key="term_group" dbtype="bigint" precision="10" phptype="integer" null="false" default="0" />
    <aggregate alias="TermTaxonomy" class="TermTaxonomy" local="term_id" foreign="term_id" cardinality="one" owner="foreign" />
  </object>
  <object class="Usermeta" table="usermeta" extends="xPDOObject">
		<field key="umeta_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="user_id" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" default="0" index="index" />
		<field key="meta_key" dbtype="varchar" precision="255" phptype="string" null="true" index="index" />
		<field key="meta_value" dbtype="longtext" phptype="string" null="true" />
	</object>
  <object class="Users" table="users" extends="xPDOObject">
		<field key="ID" dbtype="bigint" precision="20" attributes="unsigned" phptype="integer" null="false" index="pk"  generated="native" />
		<field key="user_login" dbtype="varchar" precision="60" phptype="string" null="false" default="" index="unique" />
		<field key="user_pass" dbtype="varchar" precision="64" phptype="string" null="false" default="" />
		<field key="user_nicename" dbtype="varchar" precision="50" phptype="string" null="false" default="" index="index" />
		<field key="user_email" dbtype="varchar" precision="100" phptype="string" null="false" default="" />
		<field key="user_url" dbtype="varchar" precision="100" phptype="string" null="false" default="" />
		<field key="user_registered" dbtype="datetime" phptype="datetime" null="false" default="0000-00-00 00:00:00" />
		<field key="user_activation_key" dbtype="varchar" precision="60" phptype="string" null="false" default="" />
		<field key="user_status" dbtype="int" precision="11" phptype="integer" null="false" default="0" />
		<field key="display_name" dbtype="varchar" precision="250" phptype="string" null="false" default="" />
    <composite alias="Usermeta" class="Usermeta" local="ID" foreign="user_id" cardinality="many" owner="local" />
	</object>
Once that was done, I deleted all of my generated model files, and regenerated them using the same script from the first post. Now we have model files with the relationships that we need for posts and comments, and maybe categories (those are next on the list to do). The nice part about doing the schema relationships and model generation is that once you get it right you're done and can simply copy these files in the future. That is, if you need to do another Wordpress to MODx migration, you don't have to repeat these steps.
New Migration Code
Below is the new migration code fully commented for your reading. This is still very much a work in progress and is here mostly to document my progress. However, I can see the benefit of adding this code to github for the masses to download once it gets a little more polished. It also has the potential to become a MODx package (if I figure out how to make one :) ). The code now includes a GPL license.
Key areas that have changed since the first iteration are:
- New configuration section. See lines 47-100
- Some code that truncates the content and comment tables. The flag used here will be set to false by default in the future, but for now it set to true for my convenience. See lines 137-145
- Refactored the code that inserts posts and pages. See lines 170-214
- Added code for inserting comments using quip. See lines 216-245
<?php
ini_set('max_execution_time',120);
/**
 * xPDO Wordpress to MODx Migration Code
 * Author James Ehly (devtrench.com)
 *
 * This script uses xpdo to connect to modx and wordpress packages to migrate
 * data from a wordpress database to a modx database
 *
 * The code below may or may not work for your particular wordpress and MODx install.
 * It should be used as a reference point, and you should customize it to your
 * specific needs, but I hope that most general cases are covered.
 *
 * This script is intended to be run on a new install of MODx Revolution.  If
 * you are using an existing MODx Revo install then the script will probably
 * work but might have some unexpected results. So, backup your data first, and
 * use at your own risk.
 *
 * This program is free software; you can redistribute it and/or modify it under
 * the terms of the GNU General Public License as published by the Free Software
 * Foundation; either version 2 of the License, or (at your option) any later
 * version.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 * FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
 * details.
 *
 * You should have received a copy of the GNU General Public License along with
 * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
 * Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * @author James Ehly <james@devtrench.com>
 * @copyright Copyright 2010, James Ehly
 * @license http://opensource.org/licenses/gpl-2.0.php GNU Public License v2
 * @todo - figure out categories and other template variables
 * @todo - figure out post authors
 *
 */
################################################################################
# Configuration
################################################################################
/**
 * a flag to turn this script on and off
 */
$can_migrate = true;
/**
 * Wordpress Database settings
 *
 * $wp_database_host     string database hostname
 * $wp_database_charset  string database character set
 * $wp_database_name     string database name
 * $wp_database_prefix   string database table prefix
 * $wp_database_username string database username
 * $wp_database_password string database password
 */
$wp_database_host     = 'localhost';
$wp_database_charset  = 'utf-8';
$wp_database_name     = 'wordpress_devtrench';
$wp_database_prefix   = 'wp_';
$wp_database_username = 'root';
$wp_database_password = '';
/**
 * $post_document_parent
 * the MODx document parent id, or array of values (if the page needs to be
 * created) for WP posts. If set to 0 the parent will be the document root.
 * If the id does not exist, a document will be created.
 *
 * integer or array (see modResource for possible values)
 * example array: array('pagetitle'=>'Blog','context_key'=>'web')
 * !! don't forget the context_key or your pages won't show up in your context
 */
$post_document_parent = array(
        'pagetitle'=>'Blog',
        'alias'=>'blog',
        'published'=>1,
        'context_key'=>'web');
/**
 * $page_document_parent
 * the modx document parent id, or array of values (if the page needs to be
 * created) for WP pages. If set to 0, the parent will be the document root.
 * If the id does not exist, a document will be created.
 *
 * integer or array (see modResource for possible values)
 * example array: array('pagetitle'=>'Blog')
 */
$page_document_parent = 0;
/**
 * $empty_tables
 * if set to true, the tables that will be updated are emptied first.
 */
$empty_tables = true;
################################################################################
if (!$can_migrate) die('Migration cannot be processed, this script is off.');
// Include the xpdo and modx classes
include ('../../xpdo/xpdo.class.php');
include ('../../model/modx/modx.class.php');
// Instantiate a new modx object.  MODx inherits from xpdo so we can use it
// like an xpdo object, but it has the extra functions needed for saving content.
// Thanks to Shaun McCormick for writing docs on this.
$modx = new modX();
$modx->initialize('web');
// Add the Quip package
$can_migrate_comments = $modx->addPackage( 'quip', '../../components/quip/model/',
        $modx->db->config['table_prefix'] );
// Now instantiate a new xpdo object and add our wordpress package.  This gives
// us the ability to make queries on the wordpress database as an xpdo object.
$wp = new xPDO('mysql:host=' . $wp_database_host .
                ';dbname=' . $wp_database_name .
                ';charset=' . $wp_database_charset,
        $wp_database_username,
        $wp_database_password );
$can_migrate_wp = $wp->addPackage('wordpress','../',$wp_database_prefix);
if (!$can_migrate_wp) die('Wordpress Package could not be loaded.');
/**
 * during migrations where you need to try, try and try again to get it right,
 * it's easier to truncate the tables here instead of doing it manually.
 * setting $empty_tables to true will delete data from the following tables.
 */
if($empty_tables)
{
  if ($can_migrate_comments)
  {
    $modx->db->query('TRUNCATE ' . $modx->getFullTableName('quip_comments'));
    $modx->db->query('TRUNCATE ' . $modx->getFullTableName('quip_comments_closure'));
  }
  $modx->db->query('TRUNCATE ' . $modx->getFullTableName('site_content'));
}
// first we set up our document parents
$post_parent = setupParent($post_document_parent, 'Post');
if ($post_parent === false)
  die('There was an error setting up the post parent.
       Check your configuration value for $post_document_parent');
$page_parent = setupParent($page_document_parent, 'Page');
if (!$page_parent === false)
  die('There was an error setting up the page parent.
       Check your configuration value for $page_document_parent');
// get all wordpress posts.  Isn't this so easy?
$posts = $wp->getCollection('Posts');
$post_count = 0;
$comment_count = 0;
// iterate over each post and create a new modResource object, mapping our post
// fields to our wordpress fields
foreach($posts as $post)
{
  $resource = '';
  // logic for adding posts
  $type = $post->get('post_type');
  switch($type)
  {
    case 'post':
      $parent = $post_parent;
      $document_type = 'document';
      break;
    case 'page':
      $parent = $post->get('post_parent');
      $parent = (!empty($parent)) ? $parent : $page_parent;
      $document_type = 'document';
      break;
    case 'attachment':
      $document_type = 'attachment';
      break;
    case 'revision':
      $document_type = 'revision';
      break;
    default:
  }
  if ($document_type == 'document')
  {
    $resource = $modx->newObject('modResource');
    $data = array(
            'id'=>$post->get('ID'),
            'content'=>$post->get('post_content'),
            'pagetitle'=>$post->get('post_title'),
            'context_key'=>'web',
            'alias'=>$post->get('post_name'),
            'published'=> ($post->get('post_status') == 'publish') ? 1 : 0,
            'pub_date'=> ($post->get('post_status') == 'publish') ? $post->get('post_date') : 0,
            'parent' => $parent,
            'template'=> 1,
    );
    $resource->fromArray($data,'',true);
    // call the save function which inserts our object record into the database.
    $resource->save();
    $post_count++;
    $res_id = $resource->get('id');
  }
  if ($can_migrate_comments)
  {
    // get wp comments that are related to this post
    // Thanks BobRay for showing me how to do this
    $criteria = $wp->newQuery('Comments');
    $criteria->where(array(
            'comment_post_ID' => $post->get('ID'),
    ));
    $criteria->sortby('comment_ID','ASC');
    $comments = $post->getMany('Comments',$criteria);
    if (!is_null($comments))
    {
      foreach($comments as $comment)
      {
        $quip = $modx->newObject('quipComment');
        $comment_data = array(
                'id'=>$comment->get('comment_ID'),
                'thread'=>$post->get('post_title') . '_' . $post->get('ID'),
                'parent'=>$comment->get('comment_parent'),
                'author'=>$comment->get('user_id'),
                'body'=>$comment->get('comment_content'),
                'name'=>$comment->get('comment_author'),
                'email'=>$comment->get('comment_author_email'),
                'website'=>$comment->get('comment_author_url'),
                'ip'=>$comment->get('comment_author_ip'),
                'createdon'=>$comment->get('comment_date'),
                'resource'=>$res_id, // here's where the comment is joined to the post
        );
        $quip->fromArray($comment_data,'',true);
        $quip->save();
        $comment_count++;
      }
    }
  }
}
// Processing is now finished. Echo the number of posts inserted.
echo $post_count . ' post' . (($post_count == 1) ? '' : 's') . ' added. ' . $comment_count . ' comment' . (($comment_count == 1) ? '' : 's') . ' added.';
echo 'If you are done with migrating posts you should remove this file or set $can_migrate to false so this page can\'t be executed in the future.';
/**
 * creates a parent resource for posts and pages
 *
 * input integer | array (parent values), string (container name)
 * returns integer or false
 */
function setupParent($parent, $type='Undefined')
{
  global $wp,$modx;
  // if the parent is 0 return true
  if ($parent === 0)
  {
    return $parent;
  }
  // if the parent is an integer, get the resource
  if (is_integer($parent))
  {
    $parent_resource = $modx->getObject('modResource',$parent);
  }
  // if the parent already exists return true
  if(is_object($parent_resource))
  {
    return $parent;
  }
  // if the parent is null (resource not found), or the parent is an array,
  // create the resource
  if (is_null($parent_resource) || is_array($parent))
  {
    if (is_integer($parent))
      $parent = array('id'=>$parent,'pagetitle'=>$type . ' Container','published'=>1);
    $resource = $modx->newObject('modResource');
    $resource->fromArray($parent,'',true);
    $resource->save();
    return $resource->get('id');
  }
  else
  {
    // don't know what happened so fail
    return false;
  }
}
?>
That's all I have time to write on this for now. The past few weeks have been incredibly busy for me, so I'm glad that I was able to get this far and push out the code where it is. Part 3 will include mapping wordpress categories and authors, and hopefully the first commit into GitHub.
Once the script is finished I'm gonna migrate this blog, kiss Wordpress goodbye and say hello to MODx (and probably a ton of comment spam). Comment spam aside, I'm really looking forward to being able to work on this site in MODx. Maybe I can work in Akismet integration next if someone hasn't already done that.