Thursday, September 14, 2017

New Age of Data Mining with Machine Learning, Deep Learning, Natural Language Processing, Artificial Intelligence and Robotics technologies

We are in the middle of a perfect storm for transformative technology, and it is an evolutionary step for humanity. In the last two decades, computing power has become much cheaper and large amounts of data have become available for it to process. What will make the difference is the ability of machines to think in a cognitive way, similar to how the human mind thinks. Machines are able to identify, gather, store, correlate, think and predict based on historical data and the patterns within it. They are also far more efficient than humans at storing data and at computing all the possibilities, including every permutation and combination.

We have started getting recommendations for products and services related to what we have bought or searched for earlier on the internet (e.g. Amazon). We are able to talk to our personal digital assistants and get valuable work done by them (e.g. Apple Siri, Amazon Alexa and Google Assistant). Our personal email is able to correlate events and derive actions from them (e.g. an email in your Gmail with a flight schedule, and Google Maps automatically mapping to that location).

This is the new age of Data Mining. Data Mining as a branch of computer science has been in existence for several years. It was the next logical step after processing and storing data in data warehouses. Data mining is defined as the process of discovering hidden, valuable knowledge by analyzing large amounts of data stored in databases or data warehouses, using techniques such as machine learning, artificial intelligence (AI) and statistics. Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods.
 
Fig: KDD Process




The various components of a Data Mining algorithm are as follows:
1) Model representation: determining the nature and structure of the representation to be used
2) Score function: measuring how well different representations fit the data
3) Search/Optimization method: an algorithm to optimize the score function
4) Data management: deciding what principles of data management are required to implement the algorithms efficiently
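To make the first three components concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a standard implementation: the straight-line model, the toy data and the names predict, score and fit are all hypothetical, and data management is ignored because everything fits in a small in-memory list.

```python
def predict(params, x):
    """Model representation: a straight line y = w*x + b (assumed for illustration)."""
    w, b = params
    return w * x + b


def score(params, xs, ys):
    """Score function: mean squared error of the model on the data."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.05, steps=2000):
    """Search/optimization method: plain gradient descent on the score function."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict((w, b), x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict((w, b), x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data that lies exactly on y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

params = fit(xs, ys)
print("fitted (w, b):", params)
print("score:", score(params, xs, ys))
```

Swapping any one of the three functions (a different model, a different score, a different search method) gives a different algorithm, which is the point of separating the components.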

The Data Mining process included classification, clustering and regression as the fundamental steps.
A typical data mining algorithm used to look like this:

Data Mining Algorithm






Here are some of the regression models that used to be represented with various algorithmic approaches.







Different sets of data were used when defining a data mining model:
a) a set of data called "training data", used to train the model,
b) a set of data called "validation data", used to calculate an estimate of the Squared Error Score, and
c) a set of data called "test data", used to calculate an unbiased estimate of the Squared Error Score of the selected model.
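The three data sets can be sketched in Python as follows. This is a rough illustration under my own assumptions: the synthetic data, the 60/20/20 split and the two candidate models (a constant and a least-squares line) are hypothetical choices, and the helper names are made up for this example.

```python
import random


def squared_error(model, data):
    """Mean squared error of a model (a function x -> y) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_constant(train):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate 2: least-squares straight line fitted on the training data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return lambda x: w * x + b


# Hypothetical synthetic data: y = 3*x + 2 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 3.0 * x + 2.0 + rng.gauss(0, 1)))
rng.shuffle(data)

# Assumed 60/20/20 split into training, validation and test data.
train, validation, test = data[:120], data[120:160], data[160:]

# a) train every candidate model on the training data
candidates = {"constant": fit_constant(train), "line": fit_line(train)}

# b) use the validation data to estimate each candidate's squared error
val_scores = {name: squared_error(m, validation) for name, m in candidates.items()}
best_name = min(val_scores, key=val_scores.get)

# c) use the test data once, for an unbiased estimate of the selected model
test_score = squared_error(candidates[best_name], test)
print("selected:", best_name, "test squared error:", test_score)
```

The test data is touched only after the model has been selected, which is what keeps its squared error estimate unbiased.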

I kept hearing a lot of new jargon such as Data Science, Machine Learning and Deep Learning, used by amateurs and even experts in the IT field, often without understanding what the words mean. So I decided to take a course on it from the experts, including some hands-on labs. Once I took the course, I realized that this jargon is nothing but the same old data mining, statistics and neural networks. It is a typical case of old wine in a new bottle for the industry, like "Big Data" and "Infocomm" in the past.

There are a lot of other entrepreneurs building technology businesses around Artificial Intelligence and Robotics in various business domains, including Biology. For example www.claralabs.com, www.vicarious.com, www.zymergen