Face detection with Intel Perceptual Computing SDK


Intel Perceptual Computing SDK is a library, developed by Intel, which provides developers with a set of tools for creating natural user interfaces (NUI).

The SDK is composed of a software part (libraries, samples, and documentation) and a hardware part, the Creative Gesture Camera, created by Creative, which provides 3D depth sensing and directional audio capture. In reality, many of the library’s features are also available with regular webcams.

One feature of the SDK is to provide basic implementations of the main algorithms for face analysis, gesture and speech recognition, while also allowing developers to plug in their own recognition algorithms.

Architecture

The Intel Perceptual Computing SDK architecture is shown in the following figure:

The part below the “SDK Interfaces” represents the implementation of the different recognition algorithms, the I/O modules (for example, disk writing) and the services that support them (for example, the module-loading infrastructure or the interoperability layer with the drivers).

This whole part is written in C/C++ and is exposed to the developer through a layer of interfaces. The use of interfaces, appropriately instantiated by the Intel framework, allows us to replace the recognition modules provided by default by Intel with our own implementations.

Above the layer of interfaces we find utility classes, again written in C/C++, and a layer of porting (although it would be more correct to say projection) to some application frameworks commonly used in development, such as .NET, Unity or openFrameworks.

At the top of the architectural stack we finally find our applications (written in C/C++ or in one of the other frameworks) and the samples supplied with the SDK.

Going into the details of actual development with the SDK, the architectural scheme of the applications is as follows:

The lower layer consists of libraries written in C++ containing:

the basic objects (identified as the Core Framework), in which we find the recognition session, the image class, the audio class and so on;
the implementations provided by the framework for the recognition algorithms (Gesture, the Face Analysis we will deal with in this article, Voice Recognition, etc.);
the I/O modules for reading/writing the framework streams (for example, the stream of images coming from the webcam).
Applications written in C++ can use this layer directly, or they can rely on other layers with more aggregated, less low-level features, referred to as UtilCapture and UtilPipeline.

The managed world (the VB.NET and C# languages) uses, instead, the port named “C#/VB.NET Port” (the library libpxcclr.dll, which we will see later). Within it we find a series of wrapper classes that reflect, more or less faithfully, the libraries available to C++ developers (they are recognizable by the presence of an “M” after the PXC prefix in the name).

In this article, we will use this port.

Install the SDK

To start, we can download the Perceptual Computing SDK. At the time of writing, version 1.0.8779.0 is available.

The hardware requirements to use the framework are:

  • 2nd-, 3rd- or 4th-generation Intel Core processors;
  • Windows 7 SP1 or Windows 8 desktop operating system (32 and 64 bit);
  • 1 GB of free hard disk space.

Once you have downloaded the installer (or used the web installer), simply launch it and follow the procedure. At the end, restart the PC.

The Creative Gesture Camera

Before continuing with practical programming examples, let’s see what the main features of the Creative Gesture Camera are:

  • a 720p camera (which can also be used as a normal webcam with a resolution of 1280×720);
  • a 3D depth sensor based on infrared technology, capable of detecting hand gestures at close range (with a resolution of 320×240);
  • an array of two microphones that makes it possible to localize the origin of a sound, guaranteeing better speech recognition performance.

We can connect the device to the PC through a USB 2.0 port, and its size allows us to hook it to the top of a notebook display or to mount it with standard screws on a photographic tripod. It is rather important to have it close to the subject being filmed: the usable range is between about 15 centimeters and one meter.

Now let’s see how to start the first project with Visual Studio.

The first Visual Studio project

Let’s create our first project using the Intel Perceptual Computing SDK; it will help us understand the Face Detection functionality made available to the developer.

Let’s open Visual Studio 2012 and choose a new Windows Forms project with C#:

Once the project is created, we reference the library libpxcclr.dll, which can be found in the following folders:

C:\Program Files (x86)\Intel\PCSDK\bin\win32
C:\Program Files (x86)\Intel\PCSDK\bin\x64

As anticipated, in this library we find the wrapper classes that expose the functionality of the C++ framework to the .NET world (C# and VB.NET).

To reference the library we use Visual Studio’s Add Reference… context menu, as shown in the figure:

A dialog box appears for selecting the DLL to be linked to the project. We look for the file libpxcclr.dll with the classic Browse option:

Once this is done, we are ready to write our Face Detection application.

The UtilMPipeline class

The demo project attached to the article provides the main form, in which the image coming from the webcam is displayed, overlaid with the information obtained through the Perceptual Computing SDK APIs.

Basically, we will use the following features:

  • Retrieval of the RGB image taken from the webcam;
  • Retrieval of the faces identified by the SDK, in terms of position and identifier;
  • Retrieval, for each detected face, of its key points (called landmarks) and of its posture (pose).

The main form of our application will use a class, which we will call FaceDetector, whose purpose is to interact with the SDK and update the UI.

public class FaceDetector
{
    private IFaceDetectorForm form;
    private bool disconnected = false;
    
    public FaceDetector(IFaceDetectorForm form)
    {
        if (form == null) throw new ArgumentNullException("form");
        this.form = form;
    }
    ... 
}

The IFaceDetectorForm interface defines the methods that the form must implement so that the FaceDetector class can update it with the information retrieved through the SDK APIs.

public interface IFaceDetectorForm
{
    void DisplayWebCamImage(System.Drawing.Bitmap bitmap);
    void DrawLocation(PXCMFaceAnalysis.Detection.Data detectionData);
    void DrawLandmark(PXCMFaceAnalysis.Landmark.LandmarkData[] landmarkData);
    void DrawPose(PXCMFaceAnalysis.Landmark.PoseData poseData);
    void UpdateStatus(string p);
    bool IsDetectionStopped();
    void UpdateGUI();
}

Through the IFaceDetectorForm interface we try to decouple the Windows Forms window from the FaceDetector class (with the possibility of using the latter with other technologies as well, such as WPF, for example).
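As an illustration, here is a minimal sketch of how the main form might implement the interface; the control names (pictureBox1, statusLabel), the Invoke-based marshalling and the stopRequested flag are assumptions for this sketch, not code taken from the attached project:

using System;
using System.Windows.Forms;

// Minimal sketch of a form implementing IFaceDetectorForm. The partial
// designer file is assumed to declare pictureBox1 and statusLabel.
public partial class MainForm : Form, IFaceDetectorForm
{
    private volatile bool stopRequested = false;

    public void DisplayWebCamImage(System.Drawing.Bitmap bitmap)
    {
        // The detection loop runs on a background thread,
        // so the update must be marshalled onto the UI thread.
        Invoke(new Action(() => pictureBox1.Image = bitmap));
    }

    public void UpdateStatus(string p)
    {
        Invoke(new Action(() => statusLabel.Text = p));
    }

    public bool IsDetectionStopped()
    {
        return stopRequested; // set to true, for example, by a Stop button
    }

    public void UpdateGUI()
    {
        // Nothing to do in this sketch: each Draw* call already refreshes the UI.
    }

    // The drawing methods are sketched in the sections that follow.
    public void DrawLocation(PXCMFaceAnalysis.Detection.Data detectionData) { }
    public void DrawLandmark(PXCMFaceAnalysis.Landmark.LandmarkData[] landmarkData) { }
    public void DrawPose(PXCMFaceAnalysis.Landmark.PoseData poseData) { }
}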

As already mentioned, the FaceDetector class accesses the SDK APIs, and it does so through the UtilMPipeline class. Pipelines are classes that expose a whole series of predefined methods that make it easy to access the connected device and retrieve information, in this case relating to face detection.

In particular, the FaceDetector class exposes a method that starts the data retrieval process:

public void StartLoop()
{
    bool isPipelineStopped = true;
    UtilMPipeline pipeline;
    disconnected = false;

    // Create the pipeline and enable the face location and face landmark modules.
    pipeline = new UtilMPipeline();
    pipeline.EnableFaceLocation();
    pipeline.EnableFaceLandmark();
    form.UpdateStatus("Init Started");

    if (pipeline.Init())
    {
        form.UpdateStatus("Streaming");

        // Retrieval loop: runs until the form asks to stop the detection.
        while (!form.IsDetectionStopped())
        {
            if (!pipeline.AcquireFrame(true)) break;
            if (!DisplayDeviceConnection(pipeline.IsDisconnected()))
            {
                PXCMFaceAnalysis faceAnalysis = pipeline.QueryFace();
                DisplayImage(pipeline.QueryImage(PXCMImage.ImageType.IMAGE_TYPE_COLOR));
                DisplayLocation(faceAnalysis);
                form.UpdateGUI();
            }
            pipeline.ReleaseFrame();
        }
    }
    else
    {
        form.UpdateStatus("Init Failed");
        isPipelineStopped = false;
    }

    // Release the pipeline.
    pipeline.Close();
    pipeline.Dispose();

    if (isPipelineStopped) form.UpdateStatus("Stopped");
}

We can identify three large blocks:

Creation of the pipeline and enabling of the face location and face landmark functionality;
The loop that retrieves the RGB image, the face location and the landmarks;
The release of the pipeline.

Pipeline creation

The UtilMPipeline is created through a canonical new, followed by a call to the two methods EnableFaceLocation() and EnableFaceLandmark().

Calling these methods is essential: otherwise, when we later request the face location or face landmark information, it will not be available.

The loop that actually retrieves the information we are interested in runs until the form signals to the FaceDetector class the intention to stop the process (via the return value of the IsDetectionStopped method of the IFaceDetectorForm interface). Inside the loop, the following code is executed:

if (!pipeline.AcquireFrame(true)) break;
if (!DisplayDeviceConnection(pipeline.IsDisconnected()))
{
    DisplayImage(pipeline.QueryImage(PXCMImage.ImageType.IMAGE_TYPE_COLOR));
    PXCMFaceAnalysis faceAnalysis = pipeline.QueryFace();
    DisplayLocation(faceAnalysis);
    form.UpdateGUI();
}
pipeline.ReleaseFrame();
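Since StartLoop blocks until detection is stopped, the form would typically run it on a background thread so that the UI stays responsive. A minimal sketch of such a start-up, assuming a hypothetical Start button named startButton (not part of the attached project):

// Hypothetical start-up code inside the form: runs the blocking
// detection loop on a background thread so the UI stays responsive.
private void startButton_Click(object sender, EventArgs e)
{
    var detector = new FaceDetector(this);
    var thread = new System.Threading.Thread(detector.StartLoop);
    thread.IsBackground = true; // do not keep the process alive on exit
    thread.Start();
}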

 

The webcam image

The simplest thing we can do with a webcam (and sometimes the only one) is to retrieve the image it captures.

In our case we will use the API made available by the Perceptual Computing SDK as shown in the following code:

private void DisplayImage(PXCMImage image)
{
    PXCMImage.ImageData data;
    var status = image.AcquireAccess(PXCMImage.Access.ACCESS_READ,
                                     PXCMImage.ColorFormat.COLOR_FORMAT_RGB32,
                                     out data);
    if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
    {
        PXCMImage.ImageInfo imageInfo = image.imageInfo;

        var bitmap = data.ToBitmap(imageInfo.width, imageInfo.height);

        form.DisplayWebCamImage(bitmap);

        image.ReleaseAccess(ref data);
    }
}

The PXCMImage.ImageData structure contains the image retrieved from the camera and exposes some methods to convert it to a Bitmap or a WritableBitmap.

Working with this API also highlights a further fact that must be taken into account by VB.NET developers when they approach it.

The framework is organized around nested classes rather than exploiting namespaces, as normally happens in .NET libraries. The ImageData and ImageInfo structures are, in fact, nested in the PXCMImage class instead of belonging to a dedicated namespace for images (for example, a hypothetical Intel.PerceptualComputing.Imaging). The problem for those working with case-insensitive languages like VB.NET is that if a member of the container class and a nested class have the same name, they both become inaccessible. This is the case, for example, of the imageInfo property of the PXCMImage class and of the PXCMImage.ImageInfo structure nested in it: the unfortunate VB.NET programmer will not see either of them and will be forced to use reflection to recover the data (or to create a dedicated C# project implementing extension methods that recover the data). Reflection is used in the attached project as an example.
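For illustration, a minimal sketch of the extension-method workaround mentioned above; the helper name GetImageInfo is our assumption and is not part of the SDK:

// Hypothetical C# helper library referenced by the VB.NET project.
// C# is case-sensitive, so it can distinguish the imageInfo property
// from the nested PXCMImage.ImageInfo structure, while VB.NET cannot.
public static class PXCMImageExtensions
{
    public static PXCMImage.ImageInfo GetImageInfo(this PXCMImage image)
    {
        if (image == null) throw new System.ArgumentNullException("image");
        return image.imageInfo; // resolves unambiguously in C#
    }
}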

Recover the detected faces

Once the PXCMFaceAnalysis instance is obtained, we can look for the faces detected by the platform.

To retrieve the identifier of any detected faces (if there are any), we can use the QueryFace method of the PXCMFaceAnalysis class:

for (UInt32 faceIndex = 0; ; faceIndex++)
{
    Int32 faceId;
    ulong faceTimestamp;

    if (faceAnalysis.QueryFace(faceIndex, out faceId, out faceTimestamp) == pxcmStatus.PXCM_STATUS_NO_ERROR)
    {
        /* Retrieve face data */
    }
    else break;
}

As you can see in the code, we do not know a priori how many faces have been identified by the framework; we have to proceed by iterating on an index (faceIndex in our case) until we get a result other than pxcmStatus.PXCM_STATUS_NO_ERROR.

If, for the index in question, the platform has detected a face, the QueryFace method of PXCMFaceAnalysis provides us with its unique identifier (faceId) and the timestamp at which it was detected.

The unique identifier allows us to retrieve all other information related to the face.
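For context, the DisplayLocation helper invoked in StartLoop might wrap this iteration as follows; the helper’s body is our sketch, with the per-face calls delegated to the snippets shown in the next sections:

// Hypothetical body of the DisplayLocation helper called from StartLoop:
// iterates over all the detected faces and processes each one.
private void DisplayLocation(PXCMFaceAnalysis faceAnalysis)
{
    for (UInt32 faceIndex = 0; ; faceIndex++)
    {
        Int32 faceId;
        ulong faceTimestamp;

        if (faceAnalysis.QueryFace(faceIndex, out faceId, out faceTimestamp) != pxcmStatus.PXCM_STATUS_NO_ERROR)
            break;

        // For each face: retrieve the location, the landmarks and the pose
        // using the calls shown in the sections that follow.
    }
}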

Face detection

Using the face detection features, we are able to know the coordinates, relative to the image retrieved from the camera, at which the detected face is located.

The position of a face can be retrieved thanks to the unique identifier seen above.

PXCMFaceAnalysis.Detection faceDetection =
    faceAnalysis.DynamicCast<PXCMFaceAnalysis.Detection>(PXCMFaceAnalysis.Detection.CUID);

PXCMFaceAnalysis.Detection.Data detectionData;

if (faceDetection.QueryData(faceId, out detectionData) == pxcmStatus.PXCM_STATUS_NO_ERROR)
    form.DrawLocation(detectionData);

The PXCMFaceAnalysis.Detection class exposes the face detection features and is provided by the framework by calling the DynamicCast method (common to many Perceptual Computing classes). DynamicCast is a generic method that returns an instance of the desired type starting from its CUID (Class Unique IDentifier).

Once we have retrieved the instance of PXCMFaceAnalysis.Detection, we use the QueryData method to obtain the PXCMFaceAnalysis.Detection.Data structure shown in the following figure:

The structure contains the rectangle (the rectangle field, of type PXCMRectU32) in which the framework has detected the face, the confidence value (that is, how reliable the framework believes the detection to be) and the angle that the face forms with the plane of the camera (the viewAngle field, of type PXCMFaceAnalysis.Detection.ViewAngleType). For example, the value VIEW_ANGLE_0 indicates that the face forms a 90-degree angle to the left of the perpendicular to the frontal plane of the camera.

The face detection features are also available with a normal webcam instead of the Creative Gesture Camera, even if limited to the position and without the angle information.
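Expanding the empty DrawLocation stub from the earlier form sketch, here is a hedged example of how the rectangle might be drawn on the webcam image; the assumption that PXCMRectU32 exposes x, y, w and h fields, and the overlay style, are ours:

// Hypothetical implementation of IFaceDetectorForm.DrawLocation:
// draws the detection rectangle onto the last webcam frame.
public void DrawLocation(PXCMFaceAnalysis.Detection.Data detectionData)
{
    Invoke(new Action(() =>
    {
        var bitmap = pictureBox1.Image as System.Drawing.Bitmap;
        if (bitmap == null) return;

        using (var graphics = System.Drawing.Graphics.FromImage(bitmap))
        using (var pen = new System.Drawing.Pen(System.Drawing.Color.Red, 2))
        {
            var r = detectionData.rectangle; // assumed fields: x, y, w, h
            graphics.DrawRectangle(pen, (int)r.x, (int)r.y, (int)r.w, (int)r.h);
        }
        pictureBox1.Invalidate();
    }));
}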

Face landmark and poses

Landmarks are the salient points of a face. The Intel Perceptual Computing SDK allows you to retrieve 6 or 7 points (each of them in the three spatial coordinates), as shown in the following figure:

Together with these points, information on the orientation of the face in space is also provided:

  • Pitch: the rotation of the head about its transverse axis (the one that, so to speak, passes through the ears);
  • Roll: the rotation of the head about its longitudinal axis (the one that comes out of the nose);
  • Yaw: the rotation of the head about its vertical axis (the one that comes out of the top of the head).

 

All this information (landmarks and pose) allows us to know exactly how the head is positioned in space. To obtain the set of landmarks and the pose, starting from the identifier of the face (already seen above), we can proceed in the following way:

1. Retrieve an instance of the PXCMFaceAnalysis.Landmark class (the landmark equivalent of the PXCMFaceAnalysis.Detection class seen previously);

PXCMFaceAnalysis.Landmark faceLandmark =
    faceAnalysis.DynamicCast<PXCMFaceAnalysis.Landmark>(PXCMFaceAnalysis.Landmark.CUID);

2. Set its profile information to select the number of landmark points that interest us (for example, all 7);

PXCMFaceAnalysis.Landmark.ProfileInfo landmarkProfile;

faceLandmark.QueryProfile(out landmarkProfile);

landmarkProfile.labels = PXCMFaceAnalysis.Landmark.Label.LABEL_7POINTS;

faceLandmark.SetProfile(ref landmarkProfile);

3. Retrieve the landmark information using the QueryLandmarkData method of the PXCMFaceAnalysis.Landmark class:

int landmarkSize = (int) (landmarkProfile.labels & PXCMFaceAnalysis.Landmark.Label.LABEL_SIZE_MASK);

PXCMFaceAnalysis.Landmark.LandmarkData[] landmarkData = new PXCMFaceAnalysis.Landmark.LandmarkData[landmarkSize];

if (faceLandmark.QueryLandmarkData(faceId, landmarkProfile.labels, landmarkData) == pxcmStatus.PXCM_STATUS_NO_ERROR)
    form.DrawLandmark(landmarkData);

4. Retrieve the pose information using the QueryPoseData method of the PXCMFaceAnalysis.Landmark class:

PXCMFaceAnalysis.Landmark.PoseData poseData;

if (faceLandmark.QueryPoseData(faceId, out poseData) == pxcmStatus.PXCM_STATUS_NO_ERROR)
    form.DrawPose(poseData);

We observe that the instance of the PXCMFaceAnalysis.Landmark class is obtained, once again, using the DynamicCast method (as we have already seen for the PXCMFaceAnalysis.Detection class).

The landmark information is contained in an array of PXCMFaceAnalysis.Landmark.LandmarkData objects. The PXCMFaceAnalysis.Landmark.LandmarkData structure exposes, among others, the label and position fields, which contain, respectively, the identifier of the point (for example, PXCMFaceAnalysis.Landmark.Label.LABEL_NOSE_TIP for the tip of the nose) and the corresponding point in space (with coordinates x, y and z).

The x and y coordinates are also available when using a standard webcam, while z is only available if you are using a camera equipped with a depth sensor, such as the Creative Gesture Camera.

The PXCMFaceAnalysis.Landmark.PoseData structure contains the pose information and exposes the three angles already described above.
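In the same spirit, a sketch of how the form’s DrawLandmark stub might plot the retrieved points, using only the x and y coordinates described above (the drawing style is our assumption):

// Hypothetical implementation of IFaceDetectorForm.DrawLandmark:
// marks each landmark with a small circle on the last webcam frame.
public void DrawLandmark(PXCMFaceAnalysis.Landmark.LandmarkData[] landmarkData)
{
    Invoke(new Action(() =>
    {
        var bitmap = pictureBox1.Image as System.Drawing.Bitmap;
        if (bitmap == null) return;

        using (var graphics = System.Drawing.Graphics.FromImage(bitmap))
        using (var pen = new System.Drawing.Pen(System.Drawing.Color.Yellow, 2))
        {
            foreach (var landmark in landmarkData)
            {
                // Only x and y are used here; z would require a depth camera.
                graphics.DrawEllipse(pen,
                                     landmark.position.x - 3,
                                     landmark.position.y - 3,
                                     6, 6);
            }
        }
        pictureBox1.Invalidate();
    }));
}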

Conclusions

Intel Perceptual Computing is a great tool to implement NUI applications without having to focus on those perceptual algorithms whose “standard” implementation provided by Intel is already excellent.

On the other hand, its modularity makes it a tool that can adapt to the needs of those who already have these algorithms and simply want them to run in a robust and performant framework.

In this article we only had a look at the Face Detection functionality, but the SDK also provides APIs for Face Recognition, Text to Speech, Gesture Recognition and Speech Recognition.

 
