Technology: Enterprise Software and Integration

Data Science, Machine Learning, Artificial Intelligence, Natural Language Processing, Deep Learning, Real-time streaming, Stream Processing & STL, Data Integration & ETL, Data Services, Enterprise Application Integration (EAI), Databases, Enterprise Application, Middleware, Data warehousing, Web Services, ERP/CRM/SRM. Note: All the expressions here are my individual personal opinions, thoughts or frustrations. These views do not represent the views of my employers or business partners.

Thursday, September 24, 2020

What will be the main AI projects in enterprises for the next few years?

Before I answer what the enterprise projects around AI will be, let's first look at the consumer-facing projects that use AI.

- Shopping recommendation on Amazon.com

- Song recommendation on Spotify

- Movies/Video recommendation on Netflix

- Face recognition on Facebook for tagging

- Image editing using FaceApp or similar

- Talking to Chat applications on company websites

- Google Assistant/Alexa/Siri conversational system

- Sentiment Analysis for social media customer service

- Self-driving cars

- Route planning on maps

- Facial recognition

- Object recognition

- Video surveillance (intrusion detection and object tracking)

- Audience voice mix in IPL matches in Sept 2020
(Covid season with no physical audience: https://www.khaleejtimes.com/sports/ipl-2020/ipl-2020-can-do-without-the-plastic-noise)

In a similar manner, I expect that enterprise systems and applications will start to implement similar projects in specific industry verticals, as follows:

- Fraud detection in financial transactions (e.g., money laundering, credit card fraud)

- Identifying threatening and ransom calls from Telecom call data

- Real-time Retail offers for customers

- Online Exam proctoring

- Utility (Gas/Power/Water) demand-based generation

- Improved Customer support systems (much better than today's dumb chatbots)

- Medical proactive detection from health data trackers

- Context-sensitive and time-sensitive advertisements

- Autonomous server elasticity optimizing billing and performance

- Genetic Algorithms in Drug discovery

- Genome analysis: early detection of diseases (such as cancer), with warning and treatment while they are still in their developing stages

Now let's look into generic platforms that might come up in the next few years:

- AutoML Platform: An AutoML platform can take any given data, prepare it, and run multiple ML algorithms with additional tuning to yield the best result. This would be an AI beginner's platform and will see a lot of adoption between 2020 and 2023.

- RPA: RPA is more of a solution than a problem. It’s the automation of tasks with AI. Most of the existing manual systems will be replaced by this.

- NLP platform: Voice to words (such as writing meeting minutes, emails, etc.) and words to voice. This would be for specialized enterprise applications, including call center platform automation or meeting schedulers (for example, saying YES instead of pressing 1).

- Semantic modeling: The majority of data is textual and needs an accurate correlation to make a lot of useful decisions, and this is where semantic modeling comes to help.

- Prognostics Platform: Predictive analytics, root cause analysis, and solution recommendation, useful for enterprises to lower costs and improve productivity. This could also apply to System and Application Monitoring/Management tools.

- Optimization Platform: Optimization is a huge problem everywhere from supply chain to manufacturing, sales to marketing. Most of these would still be an industry vertical solution platform for the next 2-3 years.

- Data quality Platform: Data quality is a huge problem in enterprises, which have a lot of missing and inaccurate data, especially in manufacturing. There are also too many data sources to search for information.

- Real-time Online ML Platform: An online machine learning platform for real-time data learning for ever-changing data. It’s not a single learning algorithm: in fact, lots of algorithms can learn online. An example would be stock market price/volume data which might never follow a pattern of the past.

- Digital Twin Platform: A digital twin is a digital representation of a physical object or system. The technology behind digital twins has expanded to include large items such as buildings, factories, cities, people, and processes.
An example would be to simulate a cricket bowler's actions and bowling style.
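
To make the online-learning idea from the Real-time Online ML Platform item a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a description of any particular product: the class `OnlineLinearRegressor`, the function `simulated_stream`, and the data itself are hypothetical. The model is updated one observation at a time with stochastic gradient descent, and the simulated stream changes its underlying pattern halfway through, which is the "data that never follows the pattern of the past" situation.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical linear model updated one observation at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def learn_one(self, features, target):
        """Update the model from a single event; return the error before the update."""
        error = self.predict(features) - target
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return error


def simulated_stream(n_events, seed=0):
    """Made-up, noise-free stream whose slope changes halfway through."""
    rng = random.Random(seed)
    for t in range(n_events):
        slope = 2.0 if t < n_events // 2 else -1.0
        x = rng.uniform(-1.0, 1.0)
        yield [x], slope * x + 0.5


def train_on_stream(n_events=4000):
    """Feed every event to the model exactly once, in arrival order."""
    model = OnlineLinearRegressor(n_features=1)
    for features, target in simulated_stream(n_events):
        model.learn_one(features, target)
    return model


if __name__ == "__main__":
    model = train_on_stream()
    print("learned slope:", round(model.weights[0], 3))
    print("learned bias:", round(model.bias, 3))
```

Because the model never stores the history and keeps adjusting on every event, it drifts toward the most recent pattern after the change instead of averaging the old and new regimes together. A batch model trained once on the first half would not do that without retraining.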

However, I think it will take a longer time for such platforms to evolve. On the ground, it is not simple to prepare different solutions from a common platform, unlike the standard platforms that exist today (say, a Data Quality platform tool that could be used across different verticals). Tons of variations exist, and manually connecting the dots is sometimes not possible because the people who have the knowledge move on. As many new systems are introduced, continuity is lost and the correlation is not clear at all. This is why a software company cannot build generic tools/platforms for some of the real-world cases.


Sunday, March 29, 2020

Should you invest in Natural Language Processing (NLP) technology?

What is the need for Natural Language Processing (NLP) technology in software products?

I think NLP as a basic technology will be needed in all software products, especially in end-user interfaces (e.g., web sites, mobile apps, Amazon Echo apps, etc.). An NLP implementation would become a standard requirement, much like monitoring functionality or a multi-language (I18N & L10N) feature in a standard product.

Why do you say that NLP would be an integral part of every IT product? Can you explain a few NLP use-cases in IT products?

To answer why NLP would be integral, let's first look at the various use-cases of NLP that would be useful in your product and how they can add value to it.

Chatbots - Every website would need this real-time customer interaction tool to filter a lot of customer queries before routing them to the sales and support teams.
Search Engine - Customers and internal teams should be able to issue human-like queries against the knowledge base.
Spam Filtering - Standard filtering, organizing and prioritizing of incoming job tasks or emails.
Transcription of Audio/Video - The option of automatically transcribing all human speech (Speech to Text). In the past, this used to be a major task in health care transcription for legal purposes, but today many tools also want to generate the words automatically.
For example, a YouTube-like application should add subtitles when a video is uploaded, or let a user search someone's speech for certain words and point out where such a term was mentioned.
Content or News Curation - This is a very interesting use-case for industry-specific business needs, such as identifying some content and then processing it in the domains of fake news, advertising, market intelligence, recruitment, social media monitoring, etc.
Sentiment Analysis - Organizations can find out the emotion of a customer or end-user from the terms that the customer is using, and can take appropriate action based on that emotion.
Intelligent Conversational Systems for Voice-driven applications (Voice Bots) - Human-to-machine interface over voice. Examples are Amazon Alexa, Google Assistant and Apple Siri.
Automatic Machine Translation - This can automatically provide language translation, like Google Translate, and can also provide computer-assisted coding of standard business rules or even translation of code from one programming language to another.
Cognitive Assistant - A personal assistant that stores all your information, including your schedules, reminds you of your activities, and recommends improvement activities, such as when to stand up, when to sip some water, or when to cool down so that your blood pressure or heart rate does not rise, after integrating with your heart rate/BP monitoring application.
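
To make the Sentiment Analysis use-case above a little more concrete, here is a minimal sketch in Python of the "emotion from the terms a customer uses" idea. It is a deliberately naive, lexicon-based illustration: the word lists and the functions `sentiment_score` and `classify` are hypothetical, and a production system would normally rely on a trained model rather than a hand-written word list.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy", "helpful"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry", "useless"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Count positive minus negative terms; a negation flips the next word only."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score


def classify(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in (
        "The support team was great and very helpful",
        "This product is not good",
        "The parcel arrived on Tuesday",
    ):
        print(classify(message), "<-", message)
```

Even this toy version shows why tuning for the business use-case matters: the vocabulary, the handling of negation, and the threshold for "neutral" all depend on the domain the product serves.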

Why should you invest in NLP?

As a Product Owner: If you do not invest properly in NLP technology, your products will lag behind your competitors' solutions. If a novice developer has simply copied some public NLP code or library, the customer experience might be really poor, and customers might think your NLP interface is idiotic. (I have personally felt this very often with many NLP applications.)

As a Developer: This is a great opportunity to get a new job, and it gives you a competitive advantage over other developers who do not know NLP technology. If you know the fundamentals of this technology, you can tune your NLP application to the business use-case and make a valuable contribution to your product, thereby giving it a good customer experience and a competitive advantage.

Sunday, March 08, 2020

Comparing GoldenGate Kafka Handlers

GoldenGate is the only differentiated product in the market to have 3 different types of adapters to Kafka. The three different connectors to Kafka are:

1) Kafka Generic Handler (Pub/Sub)

2) Kafka Connect Handler

3) Kafka REST Proxy Handler

Each of these interfaces has its own advantages and disadvantages. The table below compares and contrasts the differences between the above three handlers:

FUNCTIONAL AREA	KAFKA HANDLER (PUB/SUB)	KAFKA CONNECT HANDLER	KAFKA REST PROXY HANDLER

Available in open-source Apache version	Yes	Yes	No*
Schema Registry Integration	No	Yes*	Yes
Formatting to JSON	Yes, with GoldenGate Formatter	Yes, with Kafka Connect Framework	Yes
Formatting to Avro	Yes, with GoldenGate Formatter	Yes*	Yes*
Designed for high volume throughput	Yes	Yes	No
Transactions and Operations	Yes, both (note: transactions have specific challenges, hence not recommended)	Operations only	Operations only
Run-time mapping of Key and Topic	Yes	Yes	Yes
Connect Protocol	Kafka Producer API	Kafka Producer API	HTTPS, REST
Web Proxy Support	No	No	Yes
Synchronous (also known as Blocking Mode) and Asynchronous Mode of operation	Yes, both (note: Synchronous has very low throughput)	No, Asynchronous only	No, Asynchronous only
Kafka Interceptor support	Yes	Yes	No

* Available with Confluent Community License and Enterprise License

For detailed information refer to GoldenGate for Big Data Documentation:

Should developers write additional code which is not given in specifications?

The question is whether developers can write additional code that is not given in the specifications.

The answer to the question is largely self-explanatory from the image below:

A developer looking at the code would give two solutions to fix the above problem:
i) Instead of using the assignment operator, the developer should have used a comparison operator, as in "isCrazyMurderingRobot == true"
ii) Use the final keyword so that the variable is unalterable, as in "static final bool isCrazyMurderingRobot = false;"

But I think the above two solutions are not the right ones. The problem is that the whole routine was an unnecessary piece of code that provides an option to kill(humans), which was never expected to be programmed according to the functional specification. A programmer who tried to act smart but made a syntactical error created the whole mess.

When I was a software developer, I remember asking a product manager whether it was acceptable to write some additional functions (or methods) in the code for extra validation that was not in the specification. The answer I got was an absolute "NO", and he said it would be considered a "SIN" in the developer's job. I asked him why, and he explained it this way: it might be easy for a developer to add a new feature to the product, but it is hugely difficult to drop a feature that is already in the code. So as a developer, it might take you a day to write a few hundred lines of code, but it takes years to maintain or remove that code.

Let's take the example of writing connector code: a simple program that connects to MongoDB and, as per the proposed certification matrix, should connect to versions 3.5 and 3.6. As a developer, you might have been proactive and added an extra check of the MongoDB version in the code. The consequence of this additional check is that if the customer chooses to upgrade MongoDB to 4.0, your code will stop working and will require a patch to run. If the check had not been there, certifying the same old code with MongoDB 4.0 would have been a simple sanity test in your automation suite.
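
The MongoDB scenario can be sketched in a few lines of Python. This is only an illustration of the argument, not real connector code: no MongoDB driver is used, the server version is passed in as a plain string, and every name here (`CERTIFIED_VERSIONS`, `UnsupportedVersionError`, `connect_with_version_gate`, `connect_per_specification`) is hypothetical.

```python
# Versions listed in the (hypothetical) certification matrix.
CERTIFIED_VERSIONS = ("3.5", "3.6")


class UnsupportedVersionError(Exception):
    """Raised by the extra, unspecified version check."""


def major_minor(server_version):
    """Reduce a version string such as '3.6.23' to '3.6'."""
    return ".".join(server_version.split(".")[:2])


def connect_with_version_gate(server_version):
    """Connector with the 'proactive' check that was never in the specification."""
    if major_minor(server_version) not in CERTIFIED_VERSIONS:
        raise UnsupportedVersionError(
            f"MongoDB {server_version} is not in {CERTIFIED_VERSIONS}"
        )
    return f"connected to MongoDB {server_version}"


def connect_per_specification(server_version):
    """Connector that does only what the specification asked for."""
    return f"connected to MongoDB {server_version}"


if __name__ == "__main__":
    print(connect_per_specification("4.0.0"))
    try:
        connect_with_version_gate("4.0.0")
    except UnsupportedVersionError as exc:
        print("needs a patch:", exc)
```

With the gate in place, a customer upgrade to 4.0 turns into a code change and a patch release. Without it, supporting 4.0 is a question for the test suite and the certification matrix, not for the code.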

If you have a strong urge to write that code, write it in a private branch or in a commented-out section, as proactive code that may be required in the future.

In summary, it is a professional cardinal "SIN" to add additional code into a product mainline without a Product Manager's approval.