WordPress and Excel: connected thanks to Python and XML-RPC APIs

0
724

Thanks to the XML-RPC API, we can make a Python script dialog with WordPress and do really interesting things like modifying the CMS from Excel files. This is the case of a customer who had to update his online catalog periodically starting from some Excel spreadsheets generated by his internal management system. These updates, in bi-monthly or quarterly average, include:

  • inclusion of new products;
  • removals of products no longer in the catalog;
  • modification of data relating to the product and/or its accessories;
  • product categorization;
  • moving products from one category to another;
  • inclusion of new categories or removal of categories that are no longer used;
  • definition of product variants;
  • assignment of products to multiple targets of use (a hammer affects both a mason and a carpenter);
  • assignment of technical manuals and/or commercial brochures to one or more products -and/or variants;
  • management of the ordering of products (taken from the order in the Excel sheet).

Given the amount of data to be updated and the impossibility of detecting what changes have been made to the excel sheets periodically provided, it has been proposed to the customer that each update consisted of a removal of the whole category/product hierarchy and the subsequent reconstruction of the starting from the data contained in the Excel sheets, provided that this ‘reconstruction’ of the catalog did not require more than thirty minutes of “disservice” with each update.

The limited time allowed for the update obviously excluded the possibility of manual insertion, modification or removal of data and files attached and led to finding a programmatic solution to the need. The choice of the language used for the implementation of the data import script fell on Python for the code conciseness, the wide range of available libraries and the high capacity to handle complex data structures with syntactic efficiency.

The following are examples of code taken directly from the import script and edited to make it as easy as possible to study or reuse.

Reading Excel sheets

To address the reading of Excel sheets, the xlrd library was used. The Excel sheet contains many columns, however we are only interested in some, it is also necessary to manage the possibility that some of the columns have been moved and/or exchanged places with others; finally, the organization of the Excel sheet provides that the value of a column (“PRODUCT”) has the same value for all the items that make up the product (a sort of bill of materials).

#!/usr/bin/env python2
# coding: utf-8

from decimal import Decimal
from pprint import pprint
from xlrd import open_workbook
import sys

#
# dizionario contenente la posizione di ogni colonna all'interno del file excel
# i nomi delle colonne sono definiti nella prima riga del foglio excel e devono
# essere identici ai nomi qui riportati. Eventuali colonne aggiuntive saranno
# semplicemente ignorate
#
XL_COLUMNS = {
    'PRODOTTO': -1,
    'DES PRODOTTO': -1,
    'TIPO': -1,
    'ARTICOLO': -1,
    'DES ITA': -1,
    'DES ENG': -1,
    'QTY': -1,
    'EUR ARTICOLO': -1
}

# --

def stringa( cell ):
"""
Ritorna il contenuto di una cella come stringa
"""
if cell.ctype == 2:
    return "%d" % cell.value
else:
    return cell.value.strip()

# --

def intero( cell ):
"""
Ritorna il contenuto di una cella come intero
Ritorna None se la cella non è numerica
"""
if cell.ctype == 2:
    return int( cell.value )
else:
    return None

# --

def decimale( cell ):
"""
Ritorna il contenuto di una cella come valore con 2 cifre decimali
Ritorna None se la cella non è numerica
"""
if cell.ctype == 2:
    return Decimal( str( cell.value ) ).quantize(Decimal('1.00'))
else:
    return None

# --

if __name__ == '__main__':
#
# recupero posizioni colonne
#
xl = open_workbook('listino.xls')

sheet = xl.sheet_by_index(0)

for col in range( sheet.ncols):
XL_COLUMNS[ stringa( sheet.cell(0,col) ) ] = col

if -1 in XL_COLUMNS.values():
    print "Intestazioni colonne mancanti alla riga 1:"
for k,v in XL_COLUMNS.items():
if v == -1:
    print k
sys.exit( 1 )

#
# recupero gerarchia prodotti/dotazioni/accessori/opzioni/servizi
#
products = {}
valid_article_codes = []

for row in range(1, sheet.nrows):
product_code = stringa( sheet.cell(row, XL_COLUMNS['PRODOTTO']) ).upper()
article_code = stringa( sheet.cell(row, XL_COLUMNS['ARTICOLO']) ).upper()
article_type = stringa( sheet.cell(row, XL_COLUMNS['TIPO']) )
descr_italian = stringa( sheet.cell(row, XL_COLUMNS['DES ITA']) )
descr_english = stringa( sheet.cell(row, XL_COLUMNS['DES ENG']) )
qty = intero( sheet.cell(row, XL_COLUMNS['QTY']) )
price = decimale( sheet.cell(row, XL_COLUMNS['EUR ARTICOLO']) )

valid_article_codes.append( article_code )

#
#
#
if not product_code and not article_type:
continue

if product_code not in products:
products[ product_code ] = {
    'accessories': [],
    'options': [],
    'services': [],
    'versions': [],
    'dotations': [],
    'targets': [],
    'ordinamento': row,
}

if product_code == article_code: # Prodotto Principale
products[ product_code ]['descr_italian'] = descr_italian
products[ product_code ]['descr_english'] = descr_english
products[ product_code ]['price'] = price

elif article_type == 'A': # Accessorio
products[ product_code ]['accessories'].append({
    'article_code': article_code,
    'descr_italian': descr_italian,
    'descr_english': descr_english,
    'price': price,
})

elif article_type == 'D': # Dotazione
if 'codice sconto' not in descr_italian.lower():
products[ product_code ]['dotations'].append({
    'article_code': article_code,
    'descr_italian': descr_italian,
    'descr_english': descr_english,
    'qty': qty,
})

elif article_type == 'O': # Opzione
products[ product_code ]['options'].append({
    'article_code': article_code,
    'descr_italian': descr_italian,
    'descr_english': descr_english,
    'price': price,
})

elif article_type == 'S': # Servizio
products[ product_code ]['services'].append({
    'article_code': article_code,
    'descr_italian': descr_italian,
    'descr_english': descr_english,
    'price': price,
})

elif article_type == 'V': # Versione
products[ product_code ]['versions'].append({
    'article_code': article_code,
    'descr_italian': descr_italian,
    'descr_english': descr_english,
    'price': price,
})

#
# stampa la struttura dati ottenuta dal file excel
#

pprint( products )

Once this is done, the next step will be to retrieve information about the media to be associated, such as PDF documents and images.

Recover information about the media to associate (PDF and images)

The images and PDF files related to the articles are uploaded and/or replaced by the WordPress media manager. All files to be associated with Articles must have a name composed according to the scheme: <codice articolo>[-<testo libero>].[jpg|gif|png]. Exceptions are PDF files, which are indexed by name and associated by specific columns in the XLS file (association not shown here), because multiple products can refer to the same PDF file. So if for example, the code of our article is ‘ABCD’, the following files will be successfully associated:

ABCD.png
ABCD.gif
ABCD-fronte.png
ABCD-retro.png
ABCD-view-front-ingrandita.jpg

While the following files will not be associated:

logo.png &amp;lt;--- unknown article code
ABCDE.gif &amp;lt;--- unknown article code
ABCD.txt &amp;lt;--- extension not acceptable
ABCD ref.jpg &amp;lt;--- unknown article code

The purpose of the following code is to create a simple list of files associated with each known article:

#
# connection parameters to the server
#
WP_URL = 'http://www.mywpsite.com/xmlrpc.php' # site URL + /xmlrpc.php
WP_USERNAME = 'a-user-wp' # use a real username
WP_PASSWORD = 'fp *********] h' # use a real password
WP_BLOGID = ''                                
WP_STATUS_DRAFT = 0
WP_STATUS_PUBLISHED = 1

#
# XML-RPC connection to the WordPress server
#
server = xmlrpclib.ServerProxy (WP_URL)

#
# recovery of media information
#
images = {}
pdf = {}

media_library = []
media_offset = 0
media_sublibrary = []

#
# we request data to WordPress, 100 media at a time up to
# completion of the media list
#
while not media_offset or media_sublibrary:
    media_library.extend( media_sublibrary )
    media_sublibrary = server.wp.getMediaLibrary( WP_BLOGID, WP_USERNAME, WP_PASSWORD, {'number': 100, 'offset': media_offset} )
    media_offset += 100

#
# analizziamo il nome del media ed eventualmente memorizziamo l'id
#
for media in media_library:
#
# trova l'id articolo corrispondente
#
article_code = media['link'].split('/')[-1]

if article_code.endswith( '.pdf' ):
    pdf[ article_code ] = media['attachment_id']
    continue

while article_code != '' and article_code not in valid_article_codes:
    article_code = '-'.join( article_code.split('-')[:-1] )

if not article_code: # codice articolo sconosciuto
    continue

#
# match trovato, memorizza il link in pdf[] o in images[]
#
if media['link'].lower().endswith( '.pdf' ):
    if article_code not in pdf:
        pdf[ article_code ] = []

        pdf[ article_code ].append( media['attachment_id'] )

elif media['link'].lower()[-4:] in [ '.jpg', '.gif', '.png' ]:
    if article_code not in images:
        images[ article_code ] = []

        images[ article_code ].append( media['attachment_id'] )

Indexing of product categories

We index the categories defined as terms within the taxonomy, so as to be able to specify the right category when each product is created.

WP_TAXONOMY = 'categorie-prodotti'
#
# indicizzazione categorie
#
wp_cat_ids = {}

wp_cat = server.wp.getTerms( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY )

if not wp_cat:
    print "ERROR: no categories in", WP_TAXONOMY
    sys.exit( 1 )

for cat in wp_cat:
    if cat['parent'] == '0':
        wp_cat_ids[ cat['slug'] ] = {
            'id': cat['term_id'],
            'children': {}
    }

for subcat in wp_cat:
    if subcat['parent'] == cat['term_id']:
    wp_cat_ids[ cat['slug'] ]['children'][ subcat['slug'] ] = subcat['term_id']

Elimination of obsolete data

Within the original script, further checks and associations of data are performed (presence of at least one image for each product, association of the product with the categories and the destinations of use and others ..). Once all the associations and the non-destructive data checks have been completed, we can proceed with the actual importation, preceded by the total cancellation of the obsolete data. To avoid deleting a parent product before its children, the products in the reverse order of id:

old_products = []
old_offset = 0
old_subset = []
#
# otteniamo l'elenco di prodotti da eliminare
#
while not old_offset or old_subset:
    old_products.extend( old_subset )
    old_subset = server.wp.getPosts( WP_BLOGID, WP_USERNAME, WP_PASSWORD, {
        'post_type': 'products-catalog',
        'number': 100,
        'offset': old_offset
    })
old_offset + = 100

old_products = sorted (old_products, key = lambda product: product ['post_id'], reverse = True)

for product in old_products:
    server.wp.deletePost (WP_BLOGID, WP_USERNAME, WP_PASSWORD, product ['post_id'])

The next step will therefore be the one concerning the insertion of new products followed by the connection of the configurations to the parent products.

Insertion of new products

At this point we have eliminated all the products, the categories, the targets (intended use) and the media (images and PDF) have not been touched so, for example, the links in the menus that will lead to the list of products of a specific category or with a specific intended use are still fully functional. We start to import the new data, following the order of the Excel sheet:

LANG = 'IT'

sorted_products = sorted ([item for item in products.items ()], key = lambda p: p [1] ['sort'], reverse = True)

for product_code, product in sorted_products:

    slug = product ['category']. lower (). replace ('+', ''). replace ('', '-'). replace ('', '-') + '-' + LANG.lower ()
    subslug = product ['subcat']. lower (). replace ('+', ''). replace ('', '-'). replace ('', '-') + '-' + LANG.lower ()

if slug == subslug:
    ids = [wp_cat_ids [slug] ['id']]

else:
    ids = [wp_cat_ids [slug] ['id'], wp_cat_ids [slug] ['children'] [subslug]]

data = {
    'post_title': product ['descr_' + {'IT': 'italian', 'EN': 'english'} [LANG]],
    'post_type': 'card-products',
    'post_content': make_content (LANG, product),
    'terms': {WP_TAXONOMY: ids},
    'post_status': 'publish',
    'custom_fields': [],
}

if product_code in images and images [product_code]:
    data ['post_thumbnail'] = images [product_code] [0]

else:
    data ['post_thumbnail'] = 9 # absent image, we replace it with company logo (id = 9)

    id_doc = 1

    data ['custom_fields'] = []

for document in product ['documents']: # product ['documents'] = list of PDF files to be associated
    if id_doc&amp;gt; 6: # max 6 documents
        continuous

if document.startswith ('qrg_'):
    title = {'IT': 'Quick Guide', 'EN': 'Quick Reference Guide'} [LANG]

else:
    title = {'IT': 'Product Sheet', 'EN': 'Data Sheet'} [LANG]

media_id = pdf.get (document.replace ('', '-'). replace ('+', '-'). replace ('-', '-'), 0)

if media_id:
    data ['custom_fields']. append ({'key': 'title_link_% d'% id_doc, 'value': title})
    data ['custom_fields']. append ({'key': 'attachment_% d'% id_doc, 'value': media_id})
    id_doc + = 1

data ['custom_fields']. append ({
    'key': 'target',
    'value': product ['targets'] # product ['targets'] = list of usage destinations
})

data ['custom_fields']. append ({
    'key': 'sort',
    'value': product ['sort']
})

product ['wp_id'] = server.wp.newPost (WP_BLOGID, WP_USERNAME, WP_PASSWORD, date, WP_STATUS_PUBLISHED)

Connection of configurations (variants) to parent products

The products have a custom field called “configurations” that contains the list of product ids that are variants of the parent product. After having inserted all the products and obtained the relative WordPress id, we can proceed with the update of the parent products and enter the list of child products. The list of configurations was obtained from another Excel file and inserted in the key configurations of each product as a list of product codes:

for product_code, product in products.items ():
        if product.get ('configurations', None) == None:
            continuous

configurations = []

for configuration in product['configurations']:
    configurations.append( products[ configuration ]['wp_id'])

server.wp.editPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, product['wp_id'], {
    'custom_fields': [ {
        'key': 'configurazioni',
        'value': configurations
    } ]
}

At this point, our task is finished and we have updated the catalog. The original script, which performs many more checks than those listed here, imports 100 products with related category associations, subcategories, targets, PDF documents, images in a time never exceeding 3 minutes. As you can see, the XMLRPC commands used are very few and since XMLRPC is a standard protocol, it is possible to implement the code in any other language chosen by the developer.

The commands used are:

Topic Method
server xmlrpclib.ServerProxy( WP_URL )
average server.wp.getMediaLibrary( WP_BLOGID, WP_USERNAME, WP_PASSWORD, { 'number': 100, 'offset': offset } )
categories server.wp.getTerms( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY )
term_id server.wp.newTerm( WP_BLOGID, WP_USERNAME, WP_PASSWORD, data)
status server.wp.deleteTerm( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY, term_id )
subset server.wp.getPosts( WP_BLOGID, WP_USERNAME, WP_PASSWORD, { 'post_type': 'prodotti-catalogo', 'number': 100, 'offset': offset } )
post_id server.wp.newPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, data, WP_STATUS_PUBLISHED )
status server.wp.editPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, post_id, modified_data )
status server.wp.deletePost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, post_id )

 

References available via the WordPress Codex:

A thank you of the author goes to Andrea Papotti for the implementation of the Python code.

LEAVE A REPLY

Please enter your comment!
Please enter your name here