Showing posts with label etl. Show all posts

Tuesday, November 05, 2019

ELT is passé; STL (Stream-Transform-Load) is the new name for ETL

ETL: Extract Transform Load
ELT: Extract Load Transform
STL: Stream Transform Load

Everybody likes to talk about how technology can make a difference in real-time, context-aware application experiences, but is anyone actually doing anything about it? Those who are tend to be the market leaders in their domain.

In the past, most banks and retailers ran batch jobs at midnight to move the day's transaction data to a data warehouse (also called end-of-day processing). A standard condition of batch processing was that no transactions could take place in the source data system while the batch job was running.

These days most banks and retail companies operate 24x7 and cannot afford even a minute of downtime, i.e., a customer could log on to the mobile application or website and make a transaction at midnight. They are even more concerned about setting up a disaster recovery (DR) site far away from the main production site, so that business continues if a catastrophe hits their data center. After the attacks of September 11, 2001, one of the first things most enterprises did was to create a DR site with real-time data replication technology. Because they cannot tolerate downtime in their data systems and must let customers transact 24x7, they want to repurpose their ETL systems as STL systems, gaining real-time ETL functionality on top of newer big data technologies that can scale to real-time workloads.

Here is a high-level comparison of traditional ETL, CDC, and STL:

Traditional ETL
  • Batch mode of extraction, high latency, low throughput on large window
  • High load on sources, usage only during certain times of day
  • On-disk transformations
CDC (Change Data Capture)
  • Real-time Replication, high throughput in a large window
  • Low load on sources, fully utilized systems
  • High load on target to load high volumes
Stream-Transform-Load (a.k.a. Streaming ETL)
  • Best parts of ETL and CDC (low latency, low load on sources, overall higher throughput)
  • In-memory transformations
  • Reduced load to target systems
  • Reduce Garbage-in
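To make the Stream-Transform-Load idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any particular product works: the event fields (`account`, `amount`), the `Transaction` record, and the list standing in for the target warehouse are all hypothetical. Events are handled one at a time, transformed in memory, malformed events are dropped before they reach the target (reduced garbage-in), and the load step writes micro-batches so the target sees fewer writes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Transaction:
    # Hypothetical cleaned record; the fields are assumptions for this sketch.
    account: str
    amount_cents: int


def stream(raw_events: Iterable[dict]) -> Iterator[dict]:
    """Stream: pass events on one at a time instead of extracting a nightly batch."""
    for event in raw_events:
        yield event


def transform(events: Iterable[dict]) -> Iterator[Transaction]:
    """Transform in memory and drop malformed events before they reach the target."""
    for event in events:
        account = event.get("account")
        amount = event.get("amount")
        if not isinstance(account, str) or not isinstance(amount, (int, float)):
            continue  # garbage never reaches the load step
        yield Transaction(account.strip().upper(), round(amount * 100))


def load(records: Iterable[Transaction], sink: List[List[Transaction]],
         batch_size: int = 2) -> int:
    """Load: append micro-batches to the sink (a list standing in for the target)."""
    batch: List[Transaction] = []
    loaded = 0
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            sink.append(batch)
            loaded += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        sink.append(batch)
        loaded += len(batch)
    return loaded


# Made-up sample events; two of the five are malformed on purpose.
events = [
    {"account": " acc-1 ", "amount": 12.5},
    {"account": None, "amount": 3.0},
    {"account": "acc-2", "amount": "n/a"},
    {"account": "acc-3", "amount": 7},
    {"account": "acc-4", "amount": 1.25},
]
warehouse: List[List[Transaction]] = []
count = load(transform(stream(events)), warehouse, batch_size=2)
print(count, "records loaded in", len(warehouse), "batches")
```

Because the three stages are chained generators, no stage waits for the full data set, which is the main difference from the batch window of traditional ETL. A real deployment would replace the in-memory list of events with a message queue or a CDC feed.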


Some of the most common use cases of data stream processing include

  • Industrial Automation
  • Log Analytics
  • Building Real-time data lakes
  • IoT (Wearables & devices) Analytics
  • Smart homes and cities

Industry-specific use-cases for Stream Processing:

  • Financial Services
    • Fraud Detection
    • Real-time analysis of commodity prices
    • Real-time analysis of currency exchange data
    • Risk Management
  • Retail
    • Markdown optimization 
    • Dynamic pricing and forecasting and Trends
    • Real-time Personalized Offers
    • Shopping cart defections
    • Better store and shelf management
  • Transportation
    • Tracking Containers, Delivery Vehicles, and Other Assets
    • Vehicle Management 
    • Passenger Alerts
    • Logistics and Route Optimization
  • Telecom
    • Wifi Off-Loading
    • Video Analytics
    • Network Management
    • Security Operations
    • Geolocation Marketing
    • Mobile Data Processing
  • Health Care
    • Medical Device Monitoring
    • In-home Patient Monitoring
    • Medical Fraud Detection
    • Safer Cities
  • Utilities, Oil and Gas
    • Outage Intelligence
    • Workforce Management
    • Real-time Drilling Analysis
    • Telemetry on critical assets
  • Manufacturing
    • Smart Inventory
    • Quality Control
    • Building Management
    • Logistics and Route Optimization

Monday, May 13, 2019

Data Science to Wisdom Science

The hottest technology course for undergraduate studies in the 80s was Computer Engineering,
then Computer Science in the 90s,
then Information Science in the 2000s. MIS in particular was the hottest thing post-Y2K (1 Jan 2000), and everyone was designing systems to create dashboards and reports for executives.


Around 2010, the biggest trends we all heard about were Big Data analytics, cloud computing, and social data, including natural language processing.

In 2020 the hottest degree will be Data Science. Everyone is looking for insights from data, hence the name. Some of the newest areas in Data Science are how to reduce data (discard what is unnecessary) before computing and how to deliver fast (real-time) data to everyone.


So Computer Science and Engineering had a 20-year lifetime, Information Science and Technology had a 20-year lifetime, and for the next 10 years it will be Data Science and Engineering.

Following this trend, what will be the hottest degree course in 2040? Will it sit below Data or above Knowledge?

My take: I don't think it will go below the Data layer. I think experts will soon call it WISDOM SCIENCE 😄




Monday, February 03, 2014

Why is Social Media Data integration important for enterprises?

I happened to read a couple of articles last week that discuss social media data analytics for the banking industry. According to the research:
  • Only 46% of banks can analyze external data about customers.
  • Only 32% can analyze social media activity.
  • Data volume and analytics complexity are the most common challenges cited by respondents.
  • Cloud and predictive analytics technologies will be "extremely valuable" to around 60% of respondents' strategies in the next 24 months (Jan 2014 to Dec 2015).
It is very interesting that 3 private banks from India are in the top 10 of the ranking.

Ranking for Top 20 banks

| #  | Bank           | Area | Total # of Facebook 'Likes' | Total # of Twitter Followers | All-Time YouTube Views | Power 100 Score (Q4 2013) |
|----|----------------|------|-----------------------------|------------------------------|------------------------|---------------------------|
| 1  | BofA           | USA  | 1,481,401                   | 245,207                      | 14,439,707             | 2,568                     |
| 2  | Chase          | USA  | 3,754,905                   | 22,276                       | 123,247                | 2,564                     |
| 3  | Capital One    | USA  | 2,917,273                   | 84,721                       | 2,647,251              | 2,372                     |
| 4  | ICICI Bank     | IN   | 2,608,044                   | 11,257                       | 1,163,370              | 1,905                     |
| 5  | Wells Fargo    | USA  | 604,795                     | 69,268                       | 17,828,813             | 1,836                     |
| 6  | Citi           | USA  | 927,904                     | 219,615                      | 6,002,953              | 1,577                     |
| 7  | HDFC           | IN   | 1,920,504                   | 11,622                       | 409,799                | 1,432                     |
| 8  | Axis           | IN   | 1,926,401                   | 9,724                        | 1,196,610              | 1,421                     |
| 9  | GT Bank        | NIG  | 1,527,777                   | 86,229                       | 293,565                | 1,252                     |
| 10 | E*TRADE Bank   | USA  | 73,421                      | 11,942                       | 8,882,374              | 1,206                     |
| 11 | USAA           | USA  | 611,867                     | 58,953                       | 7,331,717              | 1,175                     |
| 12 | Credit Suisse  | CH   | 63,026                      | 37,976                       | 10,622,427             | 897                       |
| 13 | Scotiabank     | CAN  | 318,223                     | 29,431                       | 7,502,312              | 795                       |
| 14 | Barclays       | UK   | 479,668                     | 21,541                       | 4,032,704              | 706                       |
| 15 | Commonwealth   | AUS  | 523,860                     | 28,985                       | 2,542,623              | 688                       |
| 16 | IDBI           | IN   | 795,247                     | 9,912                        | 167,598                | 688                       |
| 17 | FNB            | SA   | 446,063                     | 34,154                       | 1,358,596              | 652                       |
| 18 | Natwest        | UK   | 143,594                     | 29,401                       | 3,583,634              | 506                       |
| 19 | HSBC           | UK   | 153,704                     | 7,288                        | 3,143,298              | 477                       |
| 20 | Kotak Mahindra | IN   | 150,206                     | 87,549                       | 2,398,734              | 467                       |
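The remark about Indian banks in the top 10 can be checked with a few lines of Python. This is only a quick sketch: the rank, bank, and area values are copied from the first ten rows of the ranking, and the variable names are my own.

```python
from collections import Counter

# (rank, bank, area) copied from the first ten rows of the ranking.
top_10 = [
    (1, "BofA", "USA"),
    (2, "Chase", "USA"),
    (3, "Capital One", "USA"),
    (4, "ICICI Bank", "IN"),
    (5, "Wells Fargo", "USA"),
    (6, "Citi", "USA"),
    (7, "HDFC", "IN"),
    (8, "Axis", "IN"),
    (9, "GT Bank", "NIG"),
    (10, "E*TRADE Bank", "USA"),
]

# Count how many top-10 banks come from each area.
by_area = Counter(area for _, _, area in top_10)

# List the Indian banks in rank order.
indian_banks = [bank for _, bank, area in top_10 if area == "IN"]

print(by_area)
print(indian_banks)
```

The count confirms three Indian banks (ICICI Bank, HDFC, and Axis) in the top 10, second only to the six US banks.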

According to a recent McKinsey publication (Jan 2014), COOs should lead social-media-based customer service. According to one survey, nearly 71% of consumers who have a good social-media service experience with a brand are likely to recommend it to others. The IT investment is also much lower than for other, more complex support systems, because the support infrastructure can reuse the existing social media channels.


This means that social media data integration is moving into the mainstream of use cases for any enterprise with direct end customers. To become mainstream, such integration needs to be automated, moving away from manual BPO-based operations that require a huge workforce. One could use tools like Informatica's PowerExchange for SocialMedia to automate the ingestion of social media data into the enterprise's standard data integration point.



References:
1) http://thefinancialbrand.com/36081/power-100-2013-q4-bank-rankings/
2) http://www.informationweek.com/big-data/big-data-analytics/data-management-key-to-banking-analytics-/d/d-id/1113646?
3) https://community.informatica.com/solutions/powerexchange_for_twitter
4) https://community.informatica.com/solutions/extract_data_from_linkedin_facebook_and_twitter
5) http://www.youtube.com/watch?v=aGU6K0wgGSk&list=PLmi6HWWEAjKqq068jimUILXqr_C8fyQiW&feature=share
6) https://www.youtube.com/watch?v=Wng1M8sEYpw&
7) http://scn.sap.com/community/business-trends/blog/2014/01/06/social-media-is-now-everyone-s-business?source=email-apj-sapflash-newsletter-20140127&lf1=821703689d274213763641e16240131 
8) http://www.mckinsey.com/insights/marketing_sales/Why_the_COO_should_lead_social_media_customer_service?cid=other-eml-alt-mkq-mck-oth-1401
9) http://marketingland.com/5-social-media-trends-to-kill-in-2014-69190  
10) http://thefinancialbrand.com/35160/banking-consumer-social-media-sentiment-report-2013-q4/