Principal Data Scientist, Director for Data Science, AI, Big Data Technologies. O’Reilly author on distributed computing and machine learning.

Natalino leads the definition, design and implementation of data-driven financial and telecom applications. He has previously served as Enterprise Data Architect at ING in the Netherlands, focusing on fraud prevention/detection, SoC, cybersecurity, customer experience, and core banking processes.

Prior to that, he worked as a senior researcher at Philips Research Laboratories in the Netherlands, on the topics of system-on-a-chip architectures, distributed computing, and compilers. He is an all-round technology manager, product developer, and innovator with a 15+ year track record in research, development, and management of distributed architectures, scalable services, and data-driven applications.

Friday, July 8, 2016

The current state of Predictive Analytics: Strata Interview with Jenn Webb

During the last Strata conference in London, I had the pleasure of sharing some thoughts on the current state and the challenges of predictive analytics with Jenn Webb, O’Reilly Radar’s managing editor for the Design, Hardware, Data, Business, and Emerging Tech spaces.
We touched on a number of subjects related to Data Science, Machine Learning, and their applications: the advent of predictive APIs fueled by big data and machine-learned models, the advantages and limits of deep learning, and the current and future applications of predictive analytics to financial services and marketing.


How would you describe the current state of predictive analytics? And what are the biggest challenges you’re facing in that space today?

The problem of data representation

In general, predictive analytics is based on the process of extracting and crafting features and discovering relevant patterns in data. By doing so, predictive analytics can be used to learn models which can then be applied to forecast and score new data points. Models work by translating the original data fields into variables - also known as features - which are better at describing the problem than the raw data fields.
For instance, if you have a classification problem and you would like to identify people who are inclined to buy a certain product, you would convert the available information into a set of features that best describe this problem. Those features could capture a higher-level meaning which is not directly recorded in the raw data, such as the customer’s mood and intent or the customer’s propensity to buy. In short, I would say that predictive analytics is centered around the idea of extracting and defining good features, which in turn capture the patterns in the data.
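As a minimal, hypothetical sketch of this idea (the field and feature names below are invented purely for illustration), a few raw customer fields can be turned into handcrafted features that try to capture recency, engagement, and purchase history:

```python
import pandas as pd

# Hypothetical raw customer records; the field names are illustrative only.
raw = pd.DataFrame({
    "customer_id":      [1, 2, 3],
    "days_since_visit": [2, 45, 7],
    "visits_last_30d":  [12, 1, 5],
    "purchases_90d":    [3, 0, 1],
    "pages_per_visit":  [8.5, 1.2, 4.0],
})

# Handcrafted features that try to capture a higher-level meaning
# (recency, engagement, purchase history) not stored as raw fields.
features = pd.DataFrame({
    "recency":       1.0 / (1.0 + raw["days_since_visit"]),   # recent visitors score higher
    "engagement":    raw["visits_last_30d"] * raw["pages_per_visit"],
    "buyer_history": (raw["purchases_90d"] > 0).astype(int),   # has bought before
}, index=raw["customer_id"])

print(features)
```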

Engineering features

Today, features are most of the time handcrafted, so one of the tasks of a data scientist or analyst would actually consist in looking for structures in the dataset and understanding which patterns are hidden in it. Consequently, he or she would define a number of transformations that describe the problem at hand well with this new set of features.
The task of extracting and defining features which correctly represent the mechanisms and structure embedded in the data is called feature engineering. Coming up with a good set of features is still a very hard problem and relies on the creativity, the skills, and the knowledge of data science teams. They need to find ways to transform the original set of data fields into a meaningful and representative set of features.
Feature engineering can be a very hard task, especially for cybersecurity or certain marketing datasets where patterns are sparse and variables are not very descriptive. In these cases, it’s quite hard to craft and engineer features because the understanding of those initial variables might not be sufficient to extract the right features or structures from the data.

Feature learning

Lately, next to feature engineering, we see another approach where those features and patterns are themselves “machine learned”. Deep neural networks and hierarchical machine learning approaches are able to capture and identify semantically relevant features for a given problem in an automated, algorithmic way.

Automatic feature extraction lifts the task of feature engineering from data scientists, at the cost of explainability. For instance, Google’s AlphaGo recently learned to play the game of Go at a professional level; however, we don’t know exactly which patterns were learned by the machine in order to perform the task so well.
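As one minimal illustration of feature learning (a sketch on synthetic data, not a method discussed in the interview), a small autoencoder can be trained so that its bottleneck layer becomes a learned representation of the raw inputs; here using Python with Keras:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for raw data: 1000 samples with 20 low-level variables.
x = np.random.rand(1000, 20).astype("float32")

# A small autoencoder: the 4-unit bottleneck is a learned feature
# representation, discovered by the network rather than handcrafted.
inputs = keras.Input(shape=(20,))
encoded = layers.Dense(8, activation="relu")(inputs)
code = layers.Dense(4, activation="relu", name="learned_features")(encoded)
decoded = layers.Dense(8, activation="relu")(code)
outputs = layers.Dense(20, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)

# Extract the learned features for use in downstream predictive models.
encoder = keras.Model(inputs, code)
learned = encoder.predict(x, verbose=0)
print(learned.shape)  # (1000, 4)
```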

Predictive analytics

The biggest change happening in predictive analytics in recent years is indeed the move from narrow AI to strong AI, by using feature extraction and layered machine learning models such as deep learning rather than feature engineering. These techniques have already been available for the last 30 years, but only now do we have sufficient data and sufficient computing power to perform this task well.

Diving a little bit deeper into this, in a recent talk you outlined machine learning techniques that businesses can implement today, and you talked about how predictive models can be embedded as microservices. So what are some of the more accessible techniques that businesses can use, and what are some of the more interesting microservices applications you’re seeing?
Microservices are data services for the hipster generation. Conceptually, microservices still expose a programming interface, but, when compared to traditional Service Oriented Architectures (SOA), they tend to be more intuitive and easier to learn, adopt, and use. When done properly, microservices provide a better separation of concerns and a better “concept of one”, as each microservice serves one purpose only rather than providing a large set of functions and uses.
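As a hedged sketch of what such a single-purpose predictive microservice could look like (the endpoint name, toy model, and training data below are all invented for illustration), a trained model can be wrapped behind one small HTTP route, here using Python with Flask and scikit-learn:

```python
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression
import numpy as np

app = Flask(__name__)

# Toy model trained on synthetic data; in practice the model would be
# trained and validated offline and loaded from a model store.
X = np.random.rand(200, 3)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
model = LogisticRegression().fit(X, y)

@app.route("/score", methods=["POST"])
def score():
    """Single purpose: take one feature vector, return one propensity score."""
    features = request.get_json()["features"]   # e.g. [0.4, 0.7, 0.1]
    proba = model.predict_proba([features])[0, 1]
    return jsonify({"propensity": float(proba)})

if __name__ == "__main__":
    app.run(port=5000)
```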

A cognitive pyramid of APIs

Most of the APIs and microservices developed so far were catalogs. So a typical API would offer you (man and machine alike) an interface to insert, modify, and delete records. Today, we see that APIs are starting to be stacked in layers: at the bottom we find catalog-like APIs, dealing with data, which are in turn used by predictive APIs dealing with classification, prediction, and recommendations, which eventually will lead to APIs offering a full AI interface and interaction. The ultimate API layer would be a sentient or cognitive API layer which would answer complex reasoning tasks to facilitate some aspects of our lives.
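To make the layering concrete, here is a small hypothetical client (the URLs, field names, and endpoints are invented for illustration and build on the microservice sketch above) that reads a record from a catalog-style API at the bottom layer and feeds it to a predictive API one layer up:

```python
import requests

# Hypothetical endpoints, named only for illustration: a catalog-style API
# that serves raw customer records, and a predictive API stacked on top.
CATALOG_URL = "http://localhost:8000/customers"
PREDICT_URL = "http://localhost:5000/score"

def propensity_for(customer_id: int) -> float:
    # Bottom layer: fetch the raw record from the catalog API.
    record = requests.get(f"{CATALOG_URL}/{customer_id}").json()
    # Next layer up: turn it into features and ask the predictive API.
    features = [record["recency"], record["engagement"], record["buyer_history"]]
    response = requests.post(PREDICT_URL, json={"features": features})
    return response.json()["propensity"]
```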
In my opinion, quality will create a sort of “natural selection” mechanism for predictive APIs. Those APIs which provide high-quality predictions, classifications, and recommendations will thrive, while the others will not be used and will eventually fade away. In a sense, we can talk of a predictive API ecosystem where both machines and people provide feedback on the quality of the predictive services offered. This interaction produces more data, which in turn allows those APIs to learn better. I would say that we are experiencing a renaissance of data and automated data analysis which might have quite significant implications on our future lives.

You mentioned marketing, and one of your areas of expertise looks at solutions around personalized marketing applications. What sorts of applications are you seeing today already, and what do you expect to see in the future?
I believe that marketing is moving toward an increasingly deep understanding of the customer and the customer’s context. In the past, we used to create models, segments, and product-customer offers based on simple analytics, marketeers’ hunches and intuitions, and a limited understanding of customer intent, also because not all touchpoints and customer interactions were fully captured.
Today, more and more information and customer data is captured through digital touchpoints, via devices, apps, and sensors. Therefore, it is possible today to create more complex and richer models for each individual. If I interact with a retail web app, it’s very relevant to know whether I am browsing around looking for inspiration, in a sort of “discovery mode”, or whether I would like to quickly close a purchase and am in “analysis mode”, interested in understanding the details of a given product.
Capturing the intent of a customer is also marketing. Slowly but surely, new marketing tools and services are coming up which provide a more “cognitive” and proactive approach to marketing.
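As a minimal sketch of how such an intent signal could be modeled (the session features, labels, and toy data below are invented for illustration), a session can be classified as “discovery” or “analysis” from a handful of behavioral features:

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Hypothetical session features derived from digital touchpoints:
# [pages_viewed, avg_time_per_page_s, searches, product_detail_views]
sessions = np.array([
    [25,  8, 1, 2],   # many pages, little time on each -> browsing for inspiration
    [ 4, 95, 3, 4],   # few pages, studied in depth      -> close to a purchase
    [18, 12, 0, 1],
    [ 3, 80, 2, 3],
])
labels = ["discovery", "analysis", "discovery", "analysis"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(sessions, labels)

# Score a new session to decide whether to inspire or to assist the purchase.
print(clf.predict([[6, 70, 2, 5]]))  # likely "analysis"
```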
Shifting gears just a little bit, another area that you’ve been exploring is machine learning and financial services. You recently participated on a panel. What interesting applications are you seeing in that space?
There are a number of financial APIs and services which are based on machine learning.
Some are meant to speed up the customer journey and the financial service scrutiny process. Processes which would require days are brought down to minutes. This is possible because the models and the risk calculations involved are based on big data and machine learning algorithms rather than only on advisors or expert resources.
Others are meant to simplify and streamline our lives, for instance by providing a better overview of how and when we spend. These predictive techniques can potentially relieve us from the task of remembering when a payment is due and can provide an indication of the “free to spend” money each month. ING, a company where I worked in the past, has recently released a new feature in their mobile app for predicting recurring payments.
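As an illustration of the general idea (not ING’s actual method; the data and the 30-day heuristic below are made up), recurring payments can be spotted by looking for counterparties that are paid at roughly regular intervals:

```python
import pandas as pd

# Hypothetical transaction history: counterparty, booking date, amount.
tx = pd.DataFrame({
    "counterparty": ["Gym BV", "Gym BV", "Gym BV", "Bookshop", "Energy Co", "Energy Co", "Energy Co"],
    "date": pd.to_datetime([
        "2016-01-03", "2016-02-03", "2016-03-03",
        "2016-02-14",
        "2016-01-28", "2016-02-27", "2016-03-29",
    ]),
    "amount": [-30.0, -30.0, -30.0, -12.5, -85.0, -83.0, -88.0],
})

def recurring_counterparties(tx, min_payments=3, tolerance_days=4):
    """Flag counterparties paid at (roughly) regular monthly intervals."""
    found = []
    for name, group in tx.sort_values("date").groupby("counterparty"):
        if len(group) < min_payments:
            continue
        gaps = group["date"].diff().dropna().dt.days
        if ((gaps - 30).abs() <= tolerance_days).all():
            found.append(name)
    return found

print(recurring_counterparties(tx))  # ['Energy Co', 'Gym BV']
```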
These are just a few examples of machine learning applied to financial services. I am sure that we will see more of these data-driven tools in finance in the coming months.