Importing an existing site into WordPress can be a relatively simple or an extremely complex operation, depending on how carefully the starting site has been analyzed beforehand and on how well you know the CMS's functions. Let's see how to proceed together.
Study of the starting website
The first question to ask is: how is the starting site structured?
- it’s a static site (no database)
- it’s a dynamic site (with a database)
In the first case, the approach depends on the number of existing pages: if there are only a few, we can simply copy the content into the WordPress editor by hand; if there are many, we need a script that automates the extraction of the data we are interested in, namely:
- title of the post or page
- the content of the post or page
- category of the post or page
How do we do it?
- Scan the site's directories for the HTML files they contain, following the tutorial "List files and directories with PHP".
- Once you have the HTML files (study their structure first), extract the title, the content and the category from each one. You can use PHP's DOM extension, taking care to suppress the errors raised by malformed markup with libxml_use_internal_errors().
- For the title of the post you can use the title element; for the content you need to know which element encloses it (for example <div id="content"></div>). The category is trickier: if it is not specified in the HTML code, you can derive it from the file path. For example, if the path is sito.it/articoli/news/, the category will be "news", that is, the name of the directory.
Example:
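<pre class="brush: php; html-script: true">
// Returns the markup contained inside $node (its children only).
function get_inner_html( $node ) {
    $innerHTML = '';
    foreach ( $node->childNodes as $child ) {
        $innerHTML .= $node->ownerDocument->saveHTML( $child );
    }
    return $innerHTML;
}

// Suppress the warnings caused by malformed markup.
libxml_use_internal_errors( true );

// $files holds the paths of the HTML files found while scanning the site.
foreach ( $files as $file ) {
    $html = file_get_contents( $file );
    $dom  = new DOMDocument();
    $dom->loadHTML( $html ); // load the markup, not the file name
    libxml_clear_errors();

    // Title: text of the <title> element, if present.
    $title_node = $dom->getElementsByTagName( 'title' )->item( 0 );
    $title = $title_node ? trim( $title_node->textContent ) : '';

    // Content: everything inside <div id="content">.
    $content_element = $dom->getElementById( 'content' );
    $content = $content_element ? get_inner_html( $content_element ) : '';

    // Category: name of the directory that contains the file.
    $category = basename( dirname( $file ) );
}
</pre>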
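The loop above assumes that $files already holds the paths of the HTML files. As a minimal sketch, assuming PHP 7.4 or later, the directory scan and the directory-based category can both be obtained with PHP's SPL iterators; collect_html_files() is a hypothetical name, and only files ending in .html or .htm are collected.

```php
<?php
// Hypothetical helper (not part of WordPress): recursively collects the
// .html/.htm files under $root and takes each file's category from the
// name of the directory that contains it.
function collect_html_files(string $root): array
{
    $items = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // $file is an SplFileInfo object
        $ext = strtolower($file->getExtension());
        if ($file->isFile() && ($ext === 'html' || $ext === 'htm')) {
            $path = $file->getPathname();
            $items[] = [
                'path'     => $path,
                // e.g. .../articoli/news/page.html -> "news"
                'category' => basename(dirname($path)),
            ];
        }
    }
    // Directory order is not guaranteed, so sort by path for stable results.
    usort($items, fn(array $a, array $b): int => strcmp($a['path'], $b['path']));
    return $items;
}
```

The list for the loop can then be built with array_column( collect_html_files( '/path/to/site' ), 'path' ), where the root path is a placeholder for your own.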
If, instead, the site is dynamic and backed by a database, you need to ask the following questions:
- Which table corresponds to WordPress posts or pages?
- Which table corresponds to WordPress taxonomies?
- Which table corresponds to WordPress users?
Suppose we have this “news” table:
id | title | subtitle | text | date | category |
---|---|---|---|---|---|
2 | Test | Lorem ipsum dolor … | <p> Lorem ipsum … | 05/25/2014 | 11 |
Here is the correlation:
Starting database | WordPress |
---|---|
title | post_title |
subtitle | post_excerpt |
text | post_content |
date | post_date |
category | post_category |
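To make the correlation concrete, here is a minimal sketch that turns one row of the news table into the argument array later passed to wp_insert_post(). It assumes the date column is stored as mm/dd/yyyy, as in the sample row; news_row_to_post_args() is a hypothetical name, and the WordPress category ID is passed in already resolved.

```php
<?php
// Hypothetical helper: maps a "news" row to wp_insert_post() arguments,
// following the column correlation in the table above.
function news_row_to_post_args(array $row, int $wp_cat_id): array
{
    // "!" resets the unparsed fields, so the time becomes 00:00:00.
    $date = DateTime::createFromFormat('!m/d/Y', $row['date']);
    if ($date === false) {
        throw new InvalidArgumentException("Unexpected date format: {$row['date']}");
    }
    return [
        'post_title'    => $row['title'],
        'post_excerpt'  => $row['subtitle'],
        'post_content'  => $row['text'],
        'post_date'     => $date->format('Y-m-d H:i:s'), // format WordPress expects
        'post_category' => [$wp_cat_id],
    ];
}
```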
The category column holds the ID of a record in a separate categories table, so to get the category name:
SELECT name FROM categories WHERE id_category = 11
The easiest way to solve the category problem is the following:
- In the WordPress back end, create categories with the same names as those in the original database.
- Write down the ID that WordPress assigns to each category.
- Create an array like the following:
$cat_to_ids = array( array( 'titolo', 2 ), array( 'titolo', 3 ), array( 'titolo', 4 ) );
- Each element of this array is itself an array containing the name of a category in the original database and the ID that WordPress assigned to the corresponding new category.
- We will use the array with this function:
function get_wp_cat_id( $name ) { global $cat_to_ids; $id = 1; foreach( $cat_to_ids as $cid ) { $n = $cid[0]; if( $name == $n ) { $id = $cid[1]; } } return $id; }
- Call this function with the name of a category from the starting database. If the name is not in the array, it returns 1, the ID of the default "Uncategorized" category in a fresh WordPress installation.
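The same lookup can also be written with an associative array keyed by category name, which replaces the loop with a single key access. This is an alternative to the array of pairs above, not a WordPress API; the function name and the category names and IDs below are hypothetical placeholders.

```php
<?php
// Fallback used when a name is not mapped: ID 1 is the "Uncategorized"
// category in a default WordPress installation.
const DEFAULT_WP_CATEGORY_ID = 1;

// Hypothetical alternative to get_wp_cat_id(): $name_to_id maps each
// original category name to the ID WordPress assigned to it.
function wp_cat_id_from_map(string $name, array $name_to_id): int
{
    return $name_to_id[$name] ?? DEFAULT_WP_CATEGORY_ID;
}

// Placeholder data: replace with the IDs shown in your WordPress back end.
$name_to_id = [
    'news'   => 2,
    'events' => 3,
];
```

Passing the map as a parameter instead of reading a global also makes the function easier to test in isolation.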
At this point you may wonder: will you have to populate the WordPress database by hand? The answer is no: WordPress provides some very useful functions that automate the whole process.
wp_insert_post() for importing articles
Let's look at this function in action, starting from the previous example:
<pre class="brush: php; html-script: true">
<?php
/* Template Name: Import */
// Let's create a page template that we will then remove
require_once( $_SERVER['DOCUMENT_ROOT'] . '/wp-admin/includes/taxonomy.php' );
// Database.php: a custom class (not part of WordPress) connected to the starting database
include( 'Database.php' );
$db = new Database();
// We need to use PHP's MySQL functionality directly because the global $wpdb
// object is connected only to the WordPress database.
$results = $db->fetch( 'SELECT * FROM news' );
foreach ( $results as $result ) {
    $title   = $result['title'];
    $excerpt = $result['subtitle'];
    $content = $result['text'];
    // WordPress expects the format YYYY-mm-dd HH:MM:SS
    $date    = date( 'Y-m-d H:i:s', strtotime( $result['date'] ) );
    $cat     = (int) $result['category'];
    $res     = $db->fetch( "SELECT name FROM categories WHERE id_category = $cat" );
    $name    = $res[0]['name'];
    $wp_cat  = get_wp_cat_id( $name ); // defined earlier, together with $cat_to_ids
    $args = array(
        'post_content'  => $content,
        'post_title'    => $title,
        'post_excerpt'  => $excerpt,
        'post_date'     => $date,
        'post_category' => array( $wp_cat ),
        'post_status'   => 'publish',
        'post_author'   => 4 // ID of a pre-existing user in WordPress
    );
    wp_insert_post( $args );
}
</pre>
We created a page template in the current theme because, in this case, both the starting database and the destination database (our WordPress installation) are on the same host. Connecting to a remote MySQL server is strongly discouraged, because it is a slow, costly and error-prone operation.
Before opening the import page, check that PHP has enough resources available to perform the operation, namely:
- at least 512 MB of RAM;
- an execution timeout of at least 3 minutes.
Obviously, these resources are excessive if you are importing a hundred articles, but things change when the articles number in the thousands.
As you can see, wp_insert_post() does its job very simply: if the operation succeeds, it returns the ID of the newly created post; on failure it returns 0 (or a WP_Error object if its second argument, $wp_error, is true). You can use this ID for further operations:
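When the host allows it, the limits above can be raised at the top of the import template instead of in php.ini. A minimal sketch: both calls are standard PHP, but many shared hosts block them (for example through disable_functions), so the return values are checked.

```php
<?php
// Raise the limits for this request only.
$previous_memory = ini_set('memory_limit', '512M'); // old value, or false on failure
$timeout_ok      = set_time_limit(180);             // 3 minutes; false on failure

if ($previous_memory === false || !$timeout_ok) {
    // The host refused the override: change php.ini (or ask the provider).
    error_log('Import: could not raise memory_limit or max_execution_time.');
}
```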
$post_id = wp_insert_post( $args ); $created_at = time(); update_post_meta( $post_id, 'created', $created_at );
Here we added a custom field to the newly inserted post, holding the timestamp of its insertion.
wp_insert_user() for importing users
Similar to the previous function, wp_insert_user() creates or updates a WordPress user. Its argument array has three mandatory fields:
- user_login: the username for the login;
- user_pass: the user's password in plain text;
- user_email: the user’s email.
Since the password must be in plain text, the hashed passwords stored in the original database cannot be reused. For this reason, you need to generate a new password for each user and send it by email:
<pre class="brush: php; html-script: true">
// Generates a random password of $length characters.
function create_password( $length = 16 ) {
    $valid_characters  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-*#&@!?';
    $valid_char_number = strlen( $valid_characters );
    $result = '';
    for ( $i = 0; $i < $length; $i++ ) {
        // random_int() (PHP 7+) is cryptographically secure, unlike mt_rand()
        $index   = random_int( 0, $valid_char_number - 1 );
        $result .= $valid_characters[ $index ];
    }
    return $result;
}
// $db is the connection to the starting database created earlier
$results = $db->fetch( 'SELECT * FROM users' );
foreach ( $results as $result ) {
    $username = $result['username'];
    $email    = $result['email'];
    $pwd      = create_password();
    $args = array(
        'user_login' => $username,
        'user_pass'  => $pwd,
        'user_email' => $email,
        'role'       => 'subscriber' // Roles from most to least privileged: administrator, editor, author, contributor, subscriber
    );
    $user_id = wp_insert_user( $args );
    if ( ! is_wp_error( $user_id ) ) {
        wp_mail( $email, 'Password', "Password:\n$pwd" );
    }
}
</pre>
This function returns the ID of the newly created user, or a WP_Error object on failure (which is why the code above checks is_wp_error()). You can use this ID for further operations:
$created_at = time(); update_user_meta( $user_id, 'member-since', $created_at );
Here we added a metadata field to the newly inserted user, holding the timestamp of its insertion.
wp_insert_attachment() and images
The wp_insert_attachment() function is used to insert images and other attachments into the Media Library. It does not create the directory hierarchy under /wp-content/uploads, nor does it copy the files there, so you cannot use it on its own to transfer, for example, the images of the starting site into WordPress.
The simplest solution is to check how the images were associated with the content of the starting site: you will notice that in the vast majority of cases the images appear as <img> elements within the content.
So the job is to check the image paths, especially if the directories of the starting site have been moved. In practice, you repair the paths that return a 404 error.
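If the old image paths share a common prefix, they can be rewritten while importing the content. A minimal sketch using PHP's DOM extension: fix_image_paths() and both prefixes are hypothetical, and the function only rewrites paths; it does not check whether the new URLs actually respond without a 404.

```php
<?php
// Hypothetical helper: rewrites the src of every <img> whose path starts
// with $old_prefix so that it starts with $new_prefix instead.
function fix_image_paths(string $html, string $old_prefix, string $new_prefix): string
{
    libxml_use_internal_errors(true); // tolerate malformed markup
    $dom = new DOMDocument();
    // Wrap the fragment so it can be extracted again afterwards; the meta
    // tag tells libxml that the input is UTF-8.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="fix-wrapper">' . $html . '</div>');
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (strpos($src, $old_prefix) === 0) {
            $img->setAttribute('src', $new_prefix . substr($src, strlen($old_prefix)));
        }
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $wrapper = $dom->getElementById('fix-wrapper');
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Images whose src does not start with the old prefix (external URLs, for instance) are left untouched.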
Conclusion
Planning and study are what make an import work. Never rush into this task without a clear idea of how you want to proceed. Most problems stem from carelessness and lack of documentation rather than from the technical difficulties inherent in a migration.