Machine learning in 15 questions

No, it’s not about Terminator…

A lot is said about machine learning. Some see it as a monster leading humanity to its doom. Others see it as a magician that will cure all their ills. In reality, it’s much simpler than that (and a little less scary).

Machine learning is a technique that lets automated systems improve by using data.

The explosion in the amount of data, together with progress in processing and storage techniques, has helped machine learning establish itself in many areas.

Behind this mysterious name hides a very simple concept. To learn, the system draws on existing examples, grouped in datasets. Those examples help it understand the task it is asked to perform.

Let’s be clear.

Just because we are talking about machine learning doesn’t mean we have to imagine robots taking courses in a classroom :-).

As in many other controversial fields, artificial intelligence and machine learning are the subject of much debate. One consequence is an eruption of buzzwords, and we end up a bit lost.

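In reality, deep learning is only a subset of machine learning, which is itself a subset of artificial intelligence. As a diagram, it looks like three nested circles: artificial intelligence on the outside, machine learning inside it, and deep learning at the center.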

Machine learning is established in a very large number of fields. Its range of applications is very wide: as soon as there is enough relevant data, machine learning can be used.

In finance, for example, stock market volatility seems to leave no room for prediction. Nevertheless, machine learning offers a way in. It is used to build forecasts of how a share price may evolve, even though such forecasts remain uncertain.

Machine learning is also becoming a major tool in healthcare, where doctors already use it often. The medicine of the future is often associated with the 3 P’s: Prevention, Prediction, Personalization. These are all tasks that ML systems can help with.

It is in medical imaging especially that today’s ML systems stand out. Progress in computer vision has been impressive in recent years, and today’s AI systems can match doctors on some tasks, such as detecting cancerous tumors or estimating bone age.

More generally, AI represents a huge development opportunity for companies. It lets them set up long-term strategies through decision support tools.

Machine learning makes it possible to answer questions such as: Should I go ahead with this building project? Does this candidate’s profile match this position? How will the electric car market evolve between now and 2025? And millions of other questions specific to each activity.

ML is essentially divided into two learning modes: supervised learning and unsupervised learning.

In supervised learning, the available data is labeled: for each training example, the expected output of the model is already known. This is typically the case when we train neural networks, for example.
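To make the idea of labeled data concrete, here is a minimal sketch of supervised learning in plain Python. It uses a 1-nearest-neighbor rule, chosen only because it is short; the dataset, the feature meanings and the `predict` function are all hypothetical and not taken from a real project.

```python
import math

# Hypothetical labeled dataset: each example pairs features with a known label.
# The features are assumed to be (height_cm, weight_kg).
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 80.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(point, labeled_examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, nearest_label = min(
        labeled_examples,
        key=lambda example: math.dist(point, example[0]),
    )
    return nearest_label


# A new, unlabeled point gets the label of its nearest known example.
prediction = predict((178.0, 77.0), training_data)
```

The labels are what make this supervised: the model never has to invent the categories, it only learns to reproduce them on new points.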

Conversely, in unsupervised learning, we let the model train by itself, without labeling the data. One of the most widely used unsupervised clustering algorithms is k-means. We give the algorithm all the points of our dataset, and its goal is to return those same points grouped into several clusters.
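As an illustration, here is a small k-means sketch written with the standard library only. The points and the starting centroids are made-up values, and the function name `k_means` is just a label for this example; in a real project you would more likely rely on a library such as Scikit-learn.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Basic k-means on a list of points given as tuples of floats."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centroids, clusters


# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
```

No label is ever provided here: the two groups emerge only from the distances between points. The result does depend on the starting centroids, which is why they are passed in explicitly in this sketch.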

Many of the models we use today are what we call black boxes. This means that they are opaque, and their internal functioning is not completely understood.

The performance-explainability dilemma is well known in data science. Very often, the best-performing models are black boxes whose functioning is the hardest to explain.

When I started working on data science projects, I was surprised to see that building the model was only a small step in a long process.

Projects are often divided in this way:

Contrary to popular belief, data cleaning is the main task in a machine learning project. And that’s a pity (at least in my opinion 🙂 ), because it’s much less fun than designing the model! All this work we do upstream is what we call preprocessing.

The preprocessing techniques depend on the project and the type of data you’re working with. Very often the steps are as follows:

Before a model goes into production, a whole validation phase begins. And you have to be very meticulous. Otherwise you will end up with this model:

First of all, before training a model, we make sure to split the available data into separate pieces: training datasets and testing datasets. Repeating this split several times, so that each piece serves in turn as the test set, is called cross-validation.

This makes it possible to test the model once it has been trained, on data it has never seen. This step is essential. It ensures the reliability of the model, and it also lets us compare several approaches to determine which one works best.
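Here is a minimal sketch of how k-fold cross-validation can divide a dataset, using only index lists. The function name `k_fold_indices` and the sizes used are illustrative choices, and the sketch assumes the samples have already been shuffled.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield one (train_indices, test_indices) pair per fold."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds receive one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_indices = indices[start:stop]
        train_indices = indices[:start] + indices[stop:]
        yield train_indices, test_indices
        start = stop


# Hypothetical dataset of 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, n_folds=3))
```

Each sample lands in the test set exactly once, so the model is always scored on data it did not train on. Averaging the scores across folds gives a steadier estimate than a single split.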

Overfitting is the data scientist’s greatest enemy. It occurs when the model sticks too closely to the training data. As a result, it no longer generalizes to new data.

Before we know how to avoid overfitting, we must learn how to detect it.

On this curve (which you should always draw to check the model’s performance), we can see that from a certain point on, our accuracy on the test data drops. This means that the model is becoming less and less accurate on data it has not seen. We’re overfitting.
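The same check can be sketched in code. The example below looks for the epoch where test accuracy peaks and stops once it has failed to improve for a few epochs, an approach usually called early stopping (not necessarily the one the curve above was produced with). The accuracy values, the `patience` parameter and the function name `best_epoch` are all assumptions made for illustration.

```python
def best_epoch(test_accuracy, patience=2):
    """Return the index of the epoch where test accuracy peaked.

    The scan stops once accuracy has gone `patience` epochs in a row
    without beating the best value seen so far.
    """
    best_index = 0
    for epoch, accuracy in enumerate(test_accuracy):
        if accuracy > test_accuracy[best_index]:
            best_index = epoch
        elif epoch - best_index >= patience:
            break  # test accuracy stopped improving: likely overfitting
    return best_index


# Hypothetical accuracy values recorded after each training epoch.
train_accuracy = [0.60, 0.72, 0.81, 0.88, 0.93, 0.97, 0.99]
test_accuracy = [0.58, 0.69, 0.76, 0.79, 0.78, 0.75, 0.71]

stop_at = best_epoch(test_accuracy)

# Gap between training and test accuracy at the last epoch.
final_gap = train_accuracy[-1] - test_accuracy[-1]
```

With these made-up numbers, training accuracy keeps climbing while test accuracy peaks and then falls. That widening gap is the signature of overfitting.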

Several methods exist to avoid overfitting:

Many tools are very useful when it comes to building machine learning projects. For model development, we use powerful programming languages such as Python or C.

Python remains the reference language. Its very active open source community has produced very powerful modules such as Pandas, TensorFlow and Scikit-learn, which make machine learning models easier to implement. It would take too long to list all the tools used in machine learning. If I had to keep only 3 (for the model building part), I would take those.

There are many machine learning models. The most popular today are deep learning algorithms, which give quite good results most of the time, provided enough training data is available.

There are a large number of methods depending on what you want to do.

For clustering :

Neural networks :

Decision trees :

With all these algorithms, one may wonder which one to choose. That makes a perfect transition to the next question 🙂

Machine learning is a matter of choice. From the data to the algorithm to use, the data scientist has many decisions to make. The choice of the algorithm to be used is probably the most crucial.

Several criteria are to be taken into account when choosing a model:

Artificial intelligence is often seen as a magic tool capable of anything. In reality it is not as simple as this. Models often need a lot of data to be able to give good results.

I often like to remind myself how marketing has made artificial intelligence look more impressive than it really is. Today’s models are very limited: they require a lot of training and do not generalize very well.

Today, data scientists are highly valued. A machine learning engineer must have both theoretical and practical skills. He or she must be a very good statistician, which is essential to correctly understand the different algorithms and their subtleties. From a more practical point of view, he or she must be comfortable with programming tools such as Python.

If you have any other questions, suggest them in the comments section :)

Don’t forget to leave a clap (or two, or three… four?) and subscribe!
