In recent years there has been growing interest in technologies that, until recently, seemed to be the preserve of research centers and universities, but which are finding increasingly practical applications. Particularly important among these is machine learning (often abbreviated as ML), a family of statistical and artificial intelligence algorithms that analyze data in order to “learn” its trends or some of its characteristics. The learning phase (training) can be aimed at making reasonably reliable predictions, or at classifying and grouping data automatically and as efficiently as possible (clustering).

On HTML.it we have already seen how to implement some machine learning algorithms (for example in the R language guide, which specifically covers machine learning support). In this lesson, we give an overview of how and why to use machine learning in JavaScript, today one of the most widespread programming languages and one often used for the web.

## Why JavaScript?

Combining machine learning and JavaScript is not a common choice. Machine learning is usually easier to implement in other languages, such as Python or the aforementioned R. Yet there are many ready-to-use libraries that make it possible to implement all the main machine learning algorithms in JavaScript. Some of them implement neural networks (brain.js, Synaptic, Neataptic), others provide tools for NLP (Natural) or deep learning (WebDNN), and others cover many more features (ml.js). Adding to this wide availability of libraries the possibility of using a runtime like Node.js (and therefore of running JavaScript on the server side as well), JavaScript becomes a concrete option even for complete, production-ready applications.

Other criticisms commonly leveled at JavaScript concern its efficiency and the difficulty of manipulating matrices. However, the growing support for the language (thanks also to the spread of web-based mobile apps) has narrowed the gap between JavaScript and other interpreted languages (Python, R), especially in terms of execution time. In addition, many libraries make it easy to manipulate matrices and other data structures in JavaScript: math.js, for example, is a great help in this regard.

## A practical example

Having clarified why it can make sense to implement machine learning applications in JavaScript, let’s now look at a practical example that uses the aforementioned ml.js library. We will carry out a linear regression: a process that approximates the behavior of a function with a straight line, “learning” it from a set of known data called the training set.
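To make the idea concrete before bringing in the library, here is a minimal sketch of a simple linear regression computed with ordinary least squares. The fitLine helper and the tiny data set are made up for illustration; this is not how ml.js is implemented internally, only the standard textbook formula.

```javascript
// Minimal sketch of simple linear regression via ordinary least squares.
// fitLine is a hypothetical helper, not part of ml.js.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, v) => sum + v, 0) / n;
  const meanY = ys.reduce((sum, v) => sum + v, 0) / n;

  let covariance = 0;
  let varianceX = 0;
  for (let i = 0; i < n; i++) {
    covariance += (xs[i] - meanX) * (ys[i] - meanY);
    varianceX += (xs[i] - meanX) ** 2;
  }

  // Slope and intercept of the line that minimizes the squared error.
  const slope = covariance / varianceX;
  const intercept = meanY - slope * meanX;
  return { slope, intercept, predict: (x) => slope * x + intercept };
}

// Made-up training set whose points lie exactly on y = 2x + 1.
const line = fitLine([1, 2, 3, 4], [3, 5, 7, 9]);
console.log(line.slope, line.intercept, line.predict(10));
```

Since the sample points lie exactly on a line, the fitted slope and intercept recover it; with real, noisy data the result is the line that minimizes the sum of squared errors.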

## Installation of libraries

First of all, we install the necessary libraries, which we can do with the yarn package manager:

$ yarn add ml-regression csvtojson

Alternatively, we can use *npm*:

$ npm install ml-regression csvtojson

Installing ml-regression gives us all the functionality required to implement linear regression, while csvtojson is a library that we will use to import the training data (available at this link and, as is often the case, formatted as CSV).

## Initialization of libraries and loading of data

At this point, we create a new Node.js project (referring to our guide to Node.js if needed) and insert the following code into the index.js file:

const ml = require("ml-regression"); const csv = require("csvtojson"); const SLR = ml.SLR; // Simple Linear Regression

const csvFilePath = "advertising.csv"; // path to the .csv file

let csvData = [], x = [], y = [];

let regressionModel;

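We now use the fromFile() method of the csvtojson library to load the contents of the .csv file into the csvData variable. Note that the json event used below belongs to the csvtojson 1.x API; with version 2.x, rows are delivered through subscribe() or through the promise returned by fromFile():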

csv() .fromFile(csvFilePath) .on("json", (jsonObj) => { csvData.push(jsonObj); }) .on("done", () => { prepareData(); performRegression(); });

## Prepare the data for the regression

At this point, we need to convert the data stored in the csvData variable (an array of objects whose values are strings) into a format suitable for the regression. To do this, we populate the two arrays x and y, which hold the inputs and outputs of our training set respectively. We use the prepareData() function, defined as follows:

function prepareData() { /** * Each row of csvData is formatted as follows: * { * TV: "10", * Radio: "100", * Newspaper: "20", * Sales: "1000" * } */ csvData.forEach((row) => { x.push(parseFloat(row.Radio)); y.push(parseFloat(row.Sales)); }); }
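The conversion step can be tried in isolation, without a CSV file. The sketch below uses made-up rows shaped like the objects described above; toTrainingArrays is a hypothetical helper, and skipping rows with missing values is an assumption added here, not something the original prepareData() does.

```javascript
// Made-up rows, shaped like the objects produced by the CSV loader
// (every value is a string).
const rows = [
  { TV: "230.1", Radio: "37.8", Newspaper: "69.2", Sales: "22.1" },
  { TV: "44.5", Radio: "", Newspaper: "45.1", Sales: "10.4" }, // missing Radio value
  { TV: "17.2", Radio: "45.9", Newspaper: "69.3", Sales: "9.3" },
];

// Hypothetical helper: extracts two numeric columns as parallel arrays.
function toTrainingArrays(records, inputKey, outputKey) {
  const inputs = [];
  const outputs = [];
  for (const record of records) {
    const input = parseFloat(record[inputKey]);
    const output = parseFloat(record[outputKey]);
    // parseFloat returns NaN for empty or non-numeric strings; skip those rows.
    if (Number.isNaN(input) || Number.isNaN(output)) continue;
    inputs.push(input);
    outputs.push(output);
  }
  return { inputs, outputs };
}

const { inputs, outputs } = toTrainingArrays(rows, "Radio", "Sales");
console.log(inputs, outputs);
```

Filtering out NaN values matters because a single non-numeric entry in either array is enough to turn the fitted coefficients into NaN.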

## Training and prediction

All that remains is to understand how to perform the actual regression, implemented with the following function:

function performRegression() { regressionModel = new SLR(x, y); console.log(regressionModel.toString(3)); }

The first statement of performRegression() performs the training, that is, the actual regression. With the toString() method (whose parameter is the precision, in digits, used to display the coefficients), we can also convert the model into an easily readable form, of the type f(x) = a * x + b, where a and b are the parameters obtained by the regression.

At this point, all that remains is to use the newly trained model to perform any kind of prediction. To do this, we can use the method predict(), to which we will pass our input:

regressionModel.predict(my_input)

What we have just seen is obviously just a simple example of applying machine learning in JavaScript. Much more information on using the ml.js library can be found in the ready-made examples available on the official GitHub repository.

In the second part of this article we will see another machine learning technique, again using ml.js: the kNN algorithm.

Above we saw how to use the ml.js library to implement a simple linear regression, which is perhaps the simplest machine learning algorithm. In this lesson we will again use ml.js, but to implement a somewhat more complicated algorithm that is very useful in some cases: kNN (an acronym that stands for k-nearest neighbors). Before we see how to implement it in JavaScript, let’s briefly introduce how it works.

## How kNN works

kNN is an algorithm mainly used for data classification: given a training set (i.e. a series of data points, each labeled as belonging to a class), we can train kNN so that it determines which of the classes present in the training set a new data point belongs to.

To better understand how kNN works, let’s consider a practical case. The following image shows an example training set, in which each data point consists of two values (the length and width of the leaves of a variety of plants):

In this case, each point is associated with a class. Given this training set (suitably encoded), a kNN algorithm is able to classify new data points, assigning each of them to one of the three available classes. In practice, kNN finds the training points closest to the point to be classified and assigns it the class of those neighbors. The number of neighbors taken into account is a parameter, typically referred to as k (hence kNN).

To understand better, let’s consider the following example:

In this case, with k = 3, the k points closest to the one to be classified (marked by the fuchsia X in the previous figure) all belong to class A. Consequently, the point to be classified is assigned to the same class. The following example shows a case in which the k neighbors belong to different classes:

In this example, the point is assigned to the class most represented among its neighbors (in this case, class B). Obviously, there may be cases in which no class dominates (for example with k greater than 3), but the techniques for resolving such ties fall outside the scope of this lesson.
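The procedure described in this section can be sketched in a few lines of plain JavaScript. This is an illustration of the idea only, not the code used by ml.js: classify and euclidean are hypothetical helpers, the 2D points are made up, and ties are simply resolved in favor of the class encountered first among the nearest neighbors.

```javascript
// Euclidean distance between two points of equal dimension.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Hypothetical helper illustrating kNN classification by majority vote.
function classify(points, labels, query, k) {
  // Sort the training points by distance from the query and keep the k nearest.
  const nearest = points
    .map((point, i) => ({ label: labels[i], distance: euclidean(point, query) }))
    .sort((p, q) => p.distance - q.distance)
    .slice(0, k);

  // Count the votes of each class among the k nearest neighbors.
  const votes = new Map();
  for (const { label } of nearest) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }

  // Pick the class with the most votes (ties go to the class seen first).
  let best = null;
  let bestCount = 0;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Made-up 2D training set with two well-separated classes.
const points = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]];
const labels = ["A", "A", "A", "B", "B", "B"];

console.log(classify(points, labels, [1.5, 1.5], 3));
console.log(classify(points, labels, [5, 5], 3));
```

Note that there is no real training phase here: the "model" is the training set itself, and all the work happens when a new point is classified.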

## Use kNN

Having clarified, in general terms, how kNN works, let’s see how to use the ml.js library to implement it.

## Installation

Similarly to what we saw in the first part of this article, we install the necessary libraries. We can use yarn:

$ yarn add ml-knn csvtojson prompt

Alternatively, we can opt for npm :

$ npm install ml-knn csvtojson prompt

ml-knn implements kNN, while csvtojson (which we have already used) loads the training set from a CSV file. Finally, we use prompt to build an interactive example that lets us verify how our algorithm behaves.

The training data we will use (made publicly available by the University of California) can be downloaded at this link.

## Initialization of libraries and loading of data

Now let’s create a new project on Node.js, and add the following code in the index.js file:

const KNN = require('ml-knn'); const csv = require('csvtojson'); const prompt = require('prompt'); let knn;

const csvFilePath = 'iris.csv'; // path to the data file

const names = ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth', 'type']; // used for display

let seperationSize; // used to separate the training data from the test data

let data = [], X = [], y = []; let trainingSetX = [], trainingSetY = [], testSetX = [], testSetY = [];

The seperationSize variable determines how the data is split: the first part is used for training, while the remaining part is used to verify the accuracy of the results (the prediction itself is handled by ml.js).

We then use the fromFile method to load the data from the CSV file:

csv({noheader: true, headers: names}) .fromFile(csvFilePath) .on('json', (jsonObj) => { data.push(jsonObj); }) .on('done', (error) => { seperationSize = 0.7 * data.length; data = shuffleArray(data); dressData(); });

Each row of the CSV file (represented by the jsonObj variable) is appended to the data array. At the end of the process (that is, once the whole CSV file has been parsed), seperationSize is set to 70% of the number of rows of the CSV file, given by data.length.

The shuffleArray function simply shuffles the data at random, so that the rows of each class are spread across both the training set and the test set. One possible implementation (taken from Stack Overflow) is as follows:

function shuffleArray(array) { for (var i = array.length - 1; i > 0; i--) { var j = Math.floor(Math.random() * (i + 1)); var temp = array[i]; array[i] = array[j]; array[j] = temp; } return array; }
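The shuffle and the 70/30 split can be tried on their own with a small array. In this sketch, shuffleInPlace and splitData are hypothetical helpers written for illustration; rounding the cut point down with Math.floor is an assumption added here so that the split index is always an integer, whatever the number of rows.

```javascript
// Fisher-Yates shuffle: walks the array backwards, swapping each element
// with a randomly chosen element at or before its position.
function shuffleInPlace(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}

// Hypothetical helper: splits items into a training part and a test part.
function splitData(items, trainingFraction) {
  const cut = Math.floor(items.length * trainingFraction);
  return { training: items.slice(0, cut), test: items.slice(cut) };
}

// Made-up data: the integers from 0 to 9.
const items = Array.from({ length: 10 }, (_, i) => i);
shuffleInPlace(items);

const { training, test } = splitData(items, 0.7);
console.log(training.length, test.length); // prints 7 3
```

Shuffling before splitting is what keeps the two parts representative: if the rows are ordered by class, an unshuffled split would leave some classes out of the training part entirely.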

## Data preparation

Before continuing, let’s take a look at the structure of the newly loaded data. Consider the following example:

{ sepalLength: "5.1", sepalWidth: "3.5", petalLength: "1.4", petalWidth: "0.2", type: "Iris-setosa" }

type is the name of the class, which we need to convert into a number. The other values are the components of the data point to be classified (so each point has 4 dimensions), and they too must be converted into numbers (with the parseFloat function). All of this is implemented in the dressData() function, defined as follows:

function dressData() { /** * The classes are given by the three possible * values of the type field: * * 1. Iris-setosa * 2. Iris-versicolor * 3. Iris-virginica * * We convert these values into numbers. The class * names are sorted so that the mapping does not * depend on the shuffled row order: * * Iris-setosa -> 0 * Iris-versicolor -> 1 * Iris-virginica -> 2 */ const types = new Set(); data.forEach((row) => { types.add(row.type); }); const typesArray = [...types].sort(); data.forEach((row) => { let rowArray, typeNumber; rowArray = Object.keys(row).map(key => parseFloat(row[key])).slice(0, 4); typeNumber = typesArray.indexOf(row.type); X.push(rowArray); y.push(typeNumber); }); trainingSetX = X.slice(0, seperationSize); trainingSetY = y.slice(0, seperationSize); testSetX = X.slice(seperationSize); testSetY = y.slice(seperationSize); train(); }
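The encoding step can be checked on a handful of rows. The sketch below uses three made-up samples in the same shape as the loaded data; classNames, features, encoded and decode are hypothetical names. Sorting the class names is a design choice assumed here so that the numbering does not depend on the order in which the rows happen to appear.

```javascript
// Made-up samples in the same shape as the rows loaded from the CSV file.
const samples = [
  { sepalLength: "6.3", sepalWidth: "3.3", petalLength: "6.0", petalWidth: "2.5", type: "Iris-virginica" },
  { sepalLength: "5.1", sepalWidth: "3.5", petalLength: "1.4", petalWidth: "0.2", type: "Iris-setosa" },
  { sepalLength: "7.0", sepalWidth: "3.2", petalLength: "4.7", petalWidth: "1.4", type: "Iris-versicolor" },
];

// Distinct class names, sorted alphabetically to get a stable numbering.
const classNames = [...new Set(samples.map((s) => s.type))].sort();

// Numeric feature vectors: one array of four numbers per sample.
const features = samples.map((s) =>
  [s.sepalLength, s.sepalWidth, s.petalLength, s.petalWidth].map((v) => parseFloat(v))
);

// Each class name is replaced by its index in classNames.
const encoded = samples.map((s) => classNames.indexOf(s.type));

// The same array turns a predicted number back into a readable name.
const decode = (classNumber) => classNames[classNumber];

console.log(features, encoded, decode(encoded[0]));
```

Keeping the array of class names around is useful later on: a numeric prediction can be translated back into the corresponding species name before showing it to the user.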

## Training and prediction

Note that dressData() ends by calling train(), which performs the actual training and is defined as follows:

function train() { knn = new KNN(trainingSetX, trainingSetY, {k: 7}); test(); }

The KNN constructor requires two mandatory arguments: the input data (here trainingSetX, containing the values of sepalLength, sepalWidth, petalLength and petalWidth) and the output data (here trainingSetY, containing the class of each entry). Optionally, the value of k can be specified in a third argument; if it is omitted, the library uses a default (the number of classes plus one, according to the current ml-knn documentation). In this case, we train kNN to work with k = 7.

Once the model has been trained, we evaluate its performance with the test() function, which counts how many prediction errors are made on the test set:

function test() { const result = knn.predict(testSetX); const testSetLength = testSetX.length; const predictionError = error(result, testSetY); console.log(`Test set size: ${testSetLength}\nNumber of misclassifications: ${predictionError}`); predict(); }

The error() function counts the number of classification errors, as follows:

function error(predicted, expected) { let misclassifications = 0; for (var index = 0; index < predicted.length; index++) { if (predicted[index] !== expected[index]) { misclassifications++; } } return misclassifications; }
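A raw error count is easier to read when related to the size of the test set. As a minimal sketch, the count can be turned into an accuracy value; countMisclassifications and accuracy are hypothetical helpers, and the predicted and expected arrays are made up.

```javascript
// Counts the positions where the prediction differs from the expected class.
function countMisclassifications(predicted, expected) {
  let misclassifications = 0;
  for (let index = 0; index < predicted.length; index++) {
    if (predicted[index] !== expected[index]) {
      misclassifications++;
    }
  }
  return misclassifications;
}

// Accuracy: the fraction of test samples classified correctly.
function accuracy(predicted, expected) {
  const errors = countMisclassifications(predicted, expected);
  return (expected.length - errors) / expected.length;
}

// Made-up predictions and true classes for five test samples.
const predicted = [0, 1, 2, 2, 1];
const expected = [0, 1, 1, 2, 1];

console.log(countMisclassifications(predicted, expected), accuracy(predicted, expected));
```

Because the data is shuffled at random before the split, the measured accuracy of the real model changes slightly from one run to the next.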

At this point, all that remains is to make predictions. To do this, we can use the prompt library as follows:

function predict() { let temp = []; prompt.start(); prompt.get(['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'], function (err, result) { if (!err) { for (var key in result) { temp.push(parseFloat(result[key])); } console.log(`With ${temp} -- type = ${knn.predict(temp)}`); } }); }