Saturday, February 22, 2014

Past, Current and Future of Hadoop

Hadoop arrived as a big data revolution. People came to treat Hadoop as a synonym for Big Data: if someone had to work on Big Data, the only answer was Hadoop.

What is Hadoop?
Hadoop aims to be open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so it delivers a highly available service on top of a cluster of computers, each of which may be prone to failure. In summary, Hadoop is a low-cost server farm built from cheap servers and free software, ready for high scalability.
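To make the "simple programming models" idea concrete, here is a minimal single-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to word counting. It is only an illustration of the concept, not Hadoop's Java API: the function names and the retry helper are hypothetical, and the retry loop only loosely imitates the way the framework re-runs a failed task instead of relying on reliable hardware.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Hypothetical helper: re-run a task that fails, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:  # e.g. a simulated lost node
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def word_count(splits: list[list[str]]) -> dict[str, int]:
    """Run the map phase per input split, then shuffle and reduce."""
    mapped: list[tuple[str, int]] = []
    for split in splits:
        # Each split is an independent map task (on a cluster, possibly on another node).
        mapped.extend(
            run_with_retries(lambda s=split: [kv for line in s for kv in map_phase(line)])
        )
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count([["big data big"], ["hadoop data"]]))
    # {'big': 2, 'data': 2, 'hadoop': 1}
```

The point of the pattern is that the mapper and reducer contain only business logic, while distribution, grouping and failure recovery belong to the framework; on a real cluster the same two functions would run in parallel across many machines.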

Hadoop is distributed by many companies, such as Amazon Web Services, Cloudera, CloudSpace, Datameer, Datamine, Datasalt, Datastax, Debian, Greenplum, Hortonworks, Hstream, IBM, Impetus, Intel, MapR, etc.

Is Hadoop technology completely new?
I do not think so. For years, proprietary enterprise software has solved the same problems of reliable, scalable and distributed computing. In web processing, load-balancing servers have used scale-out techniques ever since the internet began to grow. Similarly, enterprise applications such as SAP used an Application Server and a Master Gateway server to spread out their processing capacity. Many software companies also built solutions to run on a GRID.

History of Linux
I have been in the IT industry since the mid-1990s. Hadoop reminds me of an earlier wave that started with Linus Torvalds in the 1990s, described as the open-source paradigm shift needed to compete with the Microsoft monopoly. Dozens of companies and projects jumped on the Linux bandwagon, including Debian, Fedora, SUSE, Gentoo and Slackware, and some of them shipped commercial distributions under names such as Red Hat, SUSE, Mandrake, Ubuntu and Oracle. So who benefited from this? The hardware companies such as IBM, HP, Dell, Intel and AMD, who could sell more chips and hardware running a new operating system to customers who did not like the Microsoft monopoly. Companies such as IBM, HP and Oracle (earlier Sun) had their own operating systems, AIX, HP-UX and Solaris respectively, yet they started supporting Linux to expand their business.

So let's see who made money with Linux and who lost money. The hardware companies and the Linux distribution companies made money, whereas enterprises and software companies lost money, because their existing software solutions, already built on AIX, HP-UX, Solaris and Windows, had to be ported to yet another OS in both 32-bit and 64-bit versions. The cost of porting and certification was very high, since each distribution release, such as RHEL 5.1, 5.2 and 5.3, SUSE 9.1 and 9.2, or Mandrake xx..yy, had to be tested and certified. As a result, many customers who picked one of the less common Linux distributions could not get certified enterprise software.
 
Relationship between Linux and Hadoop
I see a close relationship between Linux and Hadoop. I believe Hadoop is the new Linux of the 2010s: it is ready to break up the market for proprietary GRID software by offering the same capabilities as free software. After 15 years of Linux history, only a few companies are now considered successful, namely Red Hat and SUSE in the enterprise OS space.


Future of Hadoop
So what is the future of Hadoop? I believe it will be the same future that Linux had: a few of the distribution companies will make money and the majority will die. As I see it, one Hadoop distribution (distro) company, Hortonworks, is following the Linux path. Other strong contenders are Cloudera and MapR, in addition to direct software vendors such as Greenplum and IBM.


[Figure: Red Hat revenue]



References:
1) History of Linux http://en.wikipedia.org/wiki/History_of_Linux
2) Linux Distributions http://en.wikipedia.org/wiki/Linux_distribution
3) Hadoop Distributions http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
4) Hortonworks partnership with SAP
5) Hortonworks partnership with Red Hat and SAP
6) Enterprise Hadoop Market in 2013: Reflections and Directions http://hortonworks.com/blog/enterprise-hadoop-market-in-2013-reflections-and-directions/
7) How to get over your inaction on big data http://blogs.hbr.org/2014/02/how-to-get-over-your-inaction-on-big-data-2/
8) The decline and fall of big data http://www.infoworld.com/article/2845926/big-data/the-decline-and-fall-of-big-data.html 

Monday, February 03, 2014

Why is Social Media Data integration important for enterprises?

Last week I happened to read a couple of articles about social media data analytics in the banking industry. According to the research:
  • Only 46% of banks can analyze external data about customers.
  • Only 32% can analyze social media activity.
  • Data volume and analytics complexity are the most common challenges cited by respondents.
  • Cloud and predictive analytics technologies will be "extremely valuable" to around 60% of respondents' strategies in the next 24 months (Jan 2014 to Dec 2015).
It is very interesting to find three private banks from India (ICICI Bank, HDFC and Axis) in the top 10 of the ranking below.

Ranking for Top 20 banks

# | Bank | Area | Total Facebook 'Likes' | Total Twitter Followers | All-Time YouTube Views | Power 100 Score (Q4 2013)
1 | BofA | USA | 1,481,401 | 245,207 | 14,439,707 | 2,568
2 | Chase | USA | 3,754,905 | 22,276 | 123,247 | 2,564
3 | Capital One | USA | 2,917,273 | 84,721 | 2,647,251 | 2,372
4 | ICICI Bank | IN | 2,608,044 | 11,257 | 1,163,370 | 1,905
5 | Wells Fargo | USA | 604,795 | 69,268 | 17,828,813 | 1,836
6 | Citi | USA | 927,904 | 219,615 | 6,002,953 | 1,577
7 | HDFC | IN | 1,920,504 | 11,622 | 409,799 | 1,432
8 | Axis | IN | 1,926,401 | 9,724 | 1,196,610 | 1,421
9 | GT Bank | NIG | 1,527,777 | 86,229 | 293,565 | 1,252
10 | E*TRADE Bank | USA | 73,421 | 11,942 | 8,882,374 | 1,206
11 | USAA | USA | 611,867 | 58,953 | 7,331,717 | 1,175
12 | Credit Suisse | CH | 63,026 | 37,976 | 10,622,427 | 897
13 | Scotiabank | CAN | 318,223 | 29,431 | 7,502,312 | 795
14 | Barclays | UK | 479,668 | 21,541 | 4,032,704 | 706
15 | Commonwealth | AUS | 523,860 | 28,985 | 2,542,623 | 688
16 | IDBI | IN | 795,247 | 9,912 | 167,598 | 688
17 | FNB | SA | 446,063 | 34,154 | 1,358,596 | 652
18 | Natwest | UK | 143,594 | 29,401 | 3,583,634 | 506
19 | HSBC | UK | 153,704 | 7,288 | 3,143,298 | 477
20 | Kotak Mahindra | IN | 150,206 | 87,549 | 2,398,734 | 467

According to a recent McKinsey publication (Jan 2014), COOs should lead social-media-based customer service. According to one survey, nearly 71% of consumers who have a good social-media service experience with a brand are likely to recommend it to others. The IT investment is much lower than for other, more complex support systems, because a social-media-based support infrastructure can reuse the existing social media channels.


This means that social media data integration is increasingly becoming a mainstream use case for any enterprise that has direct end customers. Bringing it into the mainstream means automating these integrations, replacing manual BPO-based operations that require a huge workforce. One could use tools like Informatica's PowerExchange for SocialMedia to automate the ingestion of social media data into the enterprise's standard data integration point.
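As a rough illustration of what "ingesting social media data into a standard integration point" involves, the sketch below normalizes records from different channels into one common schema before loading. Everything here is an assumption for illustration: the `SocialPost` schema, the raw field names (`user`, `epoch`, `from_name`, `message`, `created`) and the normalizer functions are hypothetical and are not the Twitter, Facebook or Informatica APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SocialPost:
    """Hypothetical common schema used at the enterprise integration point."""
    source: str
    author: str
    text: str
    posted_at: datetime


def from_twitter_like(raw: dict) -> SocialPost:
    # Hypothetical tweet-like record with a Unix timestamp in seconds.
    return SocialPost(
        "twitter", raw["user"], raw["text"],
        datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
    )


def from_facebook_like(raw: dict) -> SocialPost:
    # Hypothetical Facebook-like record with an ISO-8601 timestamp including offset.
    return SocialPost(
        "facebook", raw["from_name"], raw["message"],
        datetime.fromisoformat(raw["created"]),
    )


# One normalizer per channel; adding a channel means adding one function here.
NORMALIZERS = {"twitter": from_twitter_like, "facebook": from_facebook_like}


def ingest(batch: list[tuple[str, dict]]) -> list[SocialPost]:
    """Normalize a mixed batch, skipping unknown channels, ordered by post time."""
    posts = []
    for source, raw in batch:
        normalizer = NORMALIZERS.get(source)
        if normalizer is None:
            continue  # an unsupported channel should not stop the whole load
        posts.append(normalizer(raw))
    return sorted(posts, key=lambda post: post.posted_at)
```

Keeping one small normalizer per channel is what lets the manual, per-channel work be replaced by an automated pipeline: downstream systems only ever see the common schema.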



References:
1) http://thefinancialbrand.com/36081/power-100-2013-q4-bank-rankings/
2) http://www.informationweek.com/big-data/big-data-analytics/data-management-key-to-banking-analytics-/d/d-id/1113646?
3) https://community.informatica.com/solutions/powerexchange_for_twitter
4) https://community.informatica.com/solutions/extract_data_from_linkedin_facebook_and_twitter
5) http://www.youtube.com/watch?v=aGU6K0wgGSk&list=PLmi6HWWEAjKqq068jimUILXqr_C8fyQiW&feature=share
6) https://www.youtube.com/watch?v=Wng1M8sEYpw&
7) http://scn.sap.com/community/business-trends/blog/2014/01/06/social-media-is-now-everyone-s-business?source=email-apj-sapflash-newsletter-20140127&lf1=821703689d274213763641e16240131 
8) http://www.mckinsey.com/insights/marketing_sales/Why_the_COO_should_lead_social_media_customer_service?cid=other-eml-alt-mkq-mck-oth-1401
9) http://marketingland.com/5-social-media-trends-to-kill-in-2014-69190  
10) http://thefinancialbrand.com/35160/banking-consumer-social-media-sentiment-report-2013-q4/