Thanks to the XML-RPC API, we can make a Python script dialog with WordPress and do really interesting things like modifying the CMS from Excel files. This is the case of a customer who had to update his online catalog periodically starting from some Excel spreadsheets generated by his internal management system. These updates, in bi-monthly or quarterly average, include:
- inclusion of new products;
- removals of products no longer in the catalog;
- modification of data relating to the product and/or its accessories;
- product categorization;
- moving products from one category to another;
- inclusion of new categories or removal of categories that are no longer used;
- definition of product variants;
- assignment of products to multiple targets of use (a hammer affects both a mason and a carpenter);
- assignment of technical manuals and/or commercial brochures to one or more products -and/or variants;
- management of the ordering of products (taken from the order in the Excel sheet).
Given the amount of data to be updated and the impossibility of detecting what changes have been made to the excel sheets periodically provided, it has been proposed to the customer that each update consisted of a removal of the whole category/product hierarchy and the subsequent reconstruction of the starting from the data contained in the Excel sheets, provided that this ‘reconstruction’ of the catalog did not require more than thirty minutes of “disservice” with each update.
The limited time allowed for the update obviously excluded the possibility of manual insertion, modification or removal of data and files attached and led to finding a programmatic solution to the need. The choice of the language used for the implementation of the data import script fell on Python for the code conciseness, the wide range of available libraries and the high capacity to handle complex data structures with syntactic efficiency.
The following are examples of code taken directly from the import script and edited to make it as easy as possible to study or reuse.
Reading Excel sheets
To address the reading of Excel sheets, the xlrd library was used. The Excel sheet contains many columns, however we are only interested in some, it is also necessary to manage the possibility that some of the columns have been moved and/or exchanged places with others; finally, the organization of the Excel sheet provides that the value of a column (“PRODUCT”) has the same value for all the items that make up the product (a sort of bill of materials).
#!/usr/bin/env python2 # coding: utf-8 from decimal import Decimal from pprint import pprint from xlrd import open_workbook import sys # # dizionario contenente la posizione di ogni colonna all'interno del file excel # i nomi delle colonne sono definiti nella prima riga del foglio excel e devono # essere identici ai nomi qui riportati. Eventuali colonne aggiuntive saranno # semplicemente ignorate # XL_COLUMNS = { 'PRODOTTO': -1, 'DES PRODOTTO': -1, 'TIPO': -1, 'ARTICOLO': -1, 'DES ITA': -1, 'DES ENG': -1, 'QTY': -1, 'EUR ARTICOLO': -1 } # -- def stringa( cell ): """ Ritorna il contenuto di una cella come stringa """ if cell.ctype == 2: return "%d" % cell.value else: return cell.value.strip() # -- def intero( cell ): """ Ritorna il contenuto di una cella come intero Ritorna None se la cella non è numerica """ if cell.ctype == 2: return int( cell.value ) else: return None # -- def decimale( cell ): """ Ritorna il contenuto di una cella come valore con 2 cifre decimali Ritorna None se la cella non è numerica """ if cell.ctype == 2: return Decimal( str( cell.value ) ).quantize(Decimal('1.00')) else: return None # -- if __name__ == '__main__': # # recupero posizioni colonne # xl = open_workbook('listino.xls') sheet = xl.sheet_by_index(0) for col in range( sheet.ncols): XL_COLUMNS[ stringa( sheet.cell(0,col) ) ] = col if -1 in XL_COLUMNS.values(): print "Intestazioni colonne mancanti alla riga 1:" for k,v in XL_COLUMNS.items(): if v == -1: print k sys.exit( 1 ) # # recupero gerarchia prodotti/dotazioni/accessori/opzioni/servizi # products = {} valid_article_codes = [] for row in range(1, sheet.nrows): product_code = stringa( sheet.cell(row, XL_COLUMNS['PRODOTTO']) ).upper() article_code = stringa( sheet.cell(row, XL_COLUMNS['ARTICOLO']) ).upper() article_type = stringa( sheet.cell(row, XL_COLUMNS['TIPO']) ) descr_italian = stringa( sheet.cell(row, XL_COLUMNS['DES ITA']) ) descr_english = stringa( sheet.cell(row, XL_COLUMNS['DES ENG']) ) qty = intero( sheet.cell(row, XL_COLUMNS['QTY']) ) price = decimale( sheet.cell(row, XL_COLUMNS['EUR ARTICOLO']) ) valid_article_codes.append( article_code ) # # # if not product_code and not article_type: continue if product_code not in products: products[ product_code ] = { 'accessories': [], 'options': [], 'services': [], 'versions': [], 'dotations': [], 'targets': [], 'ordinamento': row, } if product_code == article_code: # Prodotto Principale products[ product_code ]['descr_italian'] = descr_italian products[ product_code ]['descr_english'] = descr_english products[ product_code ]['price'] = price elif article_type == 'A': # Accessorio products[ product_code ]['accessories'].append({ 'article_code': article_code, 'descr_italian': descr_italian, 'descr_english': descr_english, 'price': price, }) elif article_type == 'D': # Dotazione if 'codice sconto' not in descr_italian.lower(): products[ product_code ]['dotations'].append({ 'article_code': article_code, 'descr_italian': descr_italian, 'descr_english': descr_english, 'qty': qty, }) elif article_type == 'O': # Opzione products[ product_code ]['options'].append({ 'article_code': article_code, 'descr_italian': descr_italian, 'descr_english': descr_english, 'price': price, }) elif article_type == 'S': # Servizio products[ product_code ]['services'].append({ 'article_code': article_code, 'descr_italian': descr_italian, 'descr_english': descr_english, 'price': price, }) elif article_type == 'V': # Versione products[ product_code ]['versions'].append({ 'article_code': article_code, 'descr_italian': descr_italian, 'descr_english': descr_english, 'price': price, }) # # stampa la struttura dati ottenuta dal file excel # pprint( products )
Once this is done, the next step will be to retrieve information about the media to be associated, such as PDF documents and images.
Recover information about the media to associate (PDF and images)
The images and PDF files related to the articles are uploaded and/or replaced by the WordPress media manager. All files to be associated with Articles must have a name composed according to the scheme: <codice articolo>[-<testo libero>].[jpg|gif|png]. Exceptions are PDF files, which are indexed by name and associated by specific columns in the XLS file (association not shown here), because multiple products can refer to the same PDF file. So if for example, the code of our article is ‘ABCD’, the following files will be successfully associated:
ABCD.png ABCD.gif ABCD-fronte.png ABCD-retro.png ABCD-view-front-ingrandita.jpg
While the following files will not be associated:
logo.png &lt;--- unknown article code ABCDE.gif &lt;--- unknown article code ABCD.txt &lt;--- extension not acceptable ABCD ref.jpg &lt;--- unknown article code
The purpose of the following code is to create a simple list of files associated with each known article:
# # connection parameters to the server # WP_URL = 'http://www.mywpsite.com/xmlrpc.php' # site URL + /xmlrpc.php WP_USERNAME = 'a-user-wp' # use a real username WP_PASSWORD = 'fp *********] h' # use a real password WP_BLOGID = '' WP_STATUS_DRAFT = 0 WP_STATUS_PUBLISHED = 1 # # XML-RPC connection to the WordPress server # server = xmlrpclib.ServerProxy (WP_URL) # # recovery of media information # images = {} pdf = {} media_library = [] media_offset = 0 media_sublibrary = [] # # we request data to WordPress, 100 media at a time up to # completion of the media list # while not media_offset or media_sublibrary: media_library.extend( media_sublibrary ) media_sublibrary = server.wp.getMediaLibrary( WP_BLOGID, WP_USERNAME, WP_PASSWORD, {'number': 100, 'offset': media_offset} ) media_offset += 100 # # analizziamo il nome del media ed eventualmente memorizziamo l'id # for media in media_library: # # trova l'id articolo corrispondente # article_code = media['link'].split('/')[-1] if article_code.endswith( '.pdf' ): pdf[ article_code ] = media['attachment_id'] continue while article_code != '' and article_code not in valid_article_codes: article_code = '-'.join( article_code.split('-')[:-1] ) if not article_code: # codice articolo sconosciuto continue # # match trovato, memorizza il link in pdf[] o in images[] # if media['link'].lower().endswith( '.pdf' ): if article_code not in pdf: pdf[ article_code ] = [] pdf[ article_code ].append( media['attachment_id'] ) elif media['link'].lower()[-4:] in [ '.jpg', '.gif', '.png' ]: if article_code not in images: images[ article_code ] = [] images[ article_code ].append( media['attachment_id'] )
Indexing of product categories
We index the categories defined as terms within the taxonomy, so as to be able to specify the right category when each product is created.
WP_TAXONOMY = 'categorie-prodotti' # # indicizzazione categorie # wp_cat_ids = {} wp_cat = server.wp.getTerms( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY ) if not wp_cat: print "ERROR: no categories in", WP_TAXONOMY sys.exit( 1 ) for cat in wp_cat: if cat['parent'] == '0': wp_cat_ids[ cat['slug'] ] = { 'id': cat['term_id'], 'children': {} } for subcat in wp_cat: if subcat['parent'] == cat['term_id']: wp_cat_ids[ cat['slug'] ]['children'][ subcat['slug'] ] = subcat['term_id']
Elimination of obsolete data
Within the original script, further checks and associations of data are performed (presence of at least one image for each product, association of the product with the categories and the destinations of use and others ..). Once all the associations and the non-destructive data checks have been completed, we can proceed with the actual importation, preceded by the total cancellation of the obsolete data. To avoid deleting a parent product before its children, the products in the reverse order of id:
old_products = [] old_offset = 0 old_subset = [] # # otteniamo l'elenco di prodotti da eliminare # while not old_offset or old_subset: old_products.extend( old_subset ) old_subset = server.wp.getPosts( WP_BLOGID, WP_USERNAME, WP_PASSWORD, { 'post_type': 'products-catalog', 'number': 100, 'offset': old_offset }) old_offset + = 100 old_products = sorted (old_products, key = lambda product: product ['post_id'], reverse = True) for product in old_products: server.wp.deletePost (WP_BLOGID, WP_USERNAME, WP_PASSWORD, product ['post_id'])
The next step will therefore be the one concerning the insertion of new products followed by the connection of the configurations to the parent products.
Insertion of new products
At this point we have eliminated all the products, the categories, the targets (intended use) and the media (images and PDF) have not been touched so, for example, the links in the menus that will lead to the list of products of a specific category or with a specific intended use are still fully functional. We start to import the new data, following the order of the Excel sheet:
LANG = 'IT' sorted_products = sorted ([item for item in products.items ()], key = lambda p: p [1] ['sort'], reverse = True) for product_code, product in sorted_products: slug = product ['category']. lower (). replace ('+', ''). replace ('', '-'). replace ('', '-') + '-' + LANG.lower () subslug = product ['subcat']. lower (). replace ('+', ''). replace ('', '-'). replace ('', '-') + '-' + LANG.lower () if slug == subslug: ids = [wp_cat_ids [slug] ['id']] else: ids = [wp_cat_ids [slug] ['id'], wp_cat_ids [slug] ['children'] [subslug]] data = { 'post_title': product ['descr_' + {'IT': 'italian', 'EN': 'english'} [LANG]], 'post_type': 'card-products', 'post_content': make_content (LANG, product), 'terms': {WP_TAXONOMY: ids}, 'post_status': 'publish', 'custom_fields': [], } if product_code in images and images [product_code]: data ['post_thumbnail'] = images [product_code] [0] else: data ['post_thumbnail'] = 9 # absent image, we replace it with company logo (id = 9) id_doc = 1 data ['custom_fields'] = [] for document in product ['documents']: # product ['documents'] = list of PDF files to be associated if id_doc&gt; 6: # max 6 documents continuous if document.startswith ('qrg_'): title = {'IT': 'Quick Guide', 'EN': 'Quick Reference Guide'} [LANG] else: title = {'IT': 'Product Sheet', 'EN': 'Data Sheet'} [LANG] media_id = pdf.get (document.replace ('', '-'). replace ('+', '-'). replace ('-', '-'), 0) if media_id: data ['custom_fields']. append ({'key': 'title_link_% d'% id_doc, 'value': title}) data ['custom_fields']. append ({'key': 'attachment_% d'% id_doc, 'value': media_id}) id_doc + = 1 data ['custom_fields']. append ({ 'key': 'target', 'value': product ['targets'] # product ['targets'] = list of usage destinations }) data ['custom_fields']. append ({ 'key': 'sort', 'value': product ['sort'] }) product ['wp_id'] = server.wp.newPost (WP_BLOGID, WP_USERNAME, WP_PASSWORD, date, WP_STATUS_PUBLISHED)
Connection of configurations (variants) to parent products
The products have a custom field called “configurations” that contains the list of product ids that are variants of the parent product. After having inserted all the products and obtained the relative WordPress id, we can proceed with the update of the parent products and enter the list of child products. The list of configurations was obtained from another Excel file and inserted in the key configurations of each product as a list of product codes:
for product_code, product in products.items (): if product.get ('configurations', None) == None: continuous configurations = [] for configuration in product['configurations']: configurations.append( products[ configuration ]['wp_id']) server.wp.editPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, product['wp_id'], { 'custom_fields': [ { 'key': 'configurazioni', 'value': configurations } ] }
At this point, our task is finished and we have updated the catalog. The original script, which performs many more checks than those listed here, imports 100 products with related category associations, subcategories, targets, PDF documents, images in a time never exceeding 3 minutes. As you can see, the XMLRPC commands used are very few and since XMLRPC is a standard protocol, it is possible to implement the code in any other language chosen by the developer.
The commands used are:
Topic | Method |
---|---|
server | xmlrpclib.ServerProxy( WP_URL ) |
average | server.wp.getMediaLibrary( WP_BLOGID, WP_USERNAME, WP_PASSWORD, { 'number': 100, 'offset': offset } ) |
categories | server.wp.getTerms( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY ) |
term_id | server.wp.newTerm( WP_BLOGID, WP_USERNAME, WP_PASSWORD, data) |
status | server.wp.deleteTerm( WP_BLOGID, WP_USERNAME, WP_PASSWORD, WP_TAXONOMY, term_id ) |
subset | server.wp.getPosts( WP_BLOGID, WP_USERNAME, WP_PASSWORD, { 'post_type': 'prodotti-catalogo', 'number': 100, 'offset': offset } ) |
post_id | server.wp.newPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, data, WP_STATUS_PUBLISHED ) |
status | server.wp.editPost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, post_id, modified_data ) |
status | server.wp.deletePost( WP_BLOGID, WP_USERNAME, WP_PASSWORD, post_id ) |
References available via the WordPress Codex:
A thank you of the author goes to Andrea Papotti for the implementation of the Python code.