Tuesday, October 15, 2013

How can you secure and back up your data hosted on cloud SaaS services?

Some of my memories came flooding back when I happened to read an article on the internet, "What To Do When Your Cloud Service Fades Away", by @sergeykandaurov.


I have been using cloud service offerings for more than a decade, starting around 1998-2000. I remember using some of the so-called dotcom SaaS companies that offered on-premises tools as online services back then (now called cloud services). Some of them were box.com, usa.net and freeservers.com. I had accounts on all of these free SaaS offerings when storage capacities were in the 1-10 MB range, whereas the standard data transfer mechanism was still the 1.44 MB floppy disk. These websites offered free storage, and they served as a mechanism for backing up or storing data, even over a 64 kbps dial-up line in those days.

Most of these Y2K-era companies died during the period 2002-2005, and probably the only surviving company is freeservers.com. These companies were good enough to notify their users and give them adequate time to take a backup of their data from the websites before they were shut down. So I too made sure to take a backup copy of my old data that was on the cloud.

Currently, there are many companies that offer such services to the typical internet user, for example Microsoft's SkyDrive, Google Drive, Dropbox.com, SoundCloud.com etc. As an individual, I have an account on all of these cloud services. Similarly, many corporates and enterprises use cloud SaaS services such as Outlook.com (the Microsoft Exchange server on the cloud), Salesforce.com, Workday.com, NetSuite.com... and the list goes on.


Have you thought about how you would take a backup or migrate when one of the cloud SaaS companies decides to move on by closing its business, or when you feel insecure holding your data with them? This is one of the places where companies like Informatica can provide tools to integrate between the cloud systems and on-premises systems. To read more, visit www.informatica.com and www.informaticacloud.com.




Friday, September 27, 2013

Databases in the 21st Century: Can the CIO dictate a single database within an enterprise?

Can the CIO dictate and standardize on one single database vendor within an organization?

Historically, a database was the standard method for storing data in row format in a predefined manner. Most enterprises would have been associated with IBM DB2/UDB, Sybase, Informix, Oracle or Microsoft SQL Server for their database requirements, along with the other software they needed. In those days, the requirements for databases were very simple, and the basic expectation was to store the data and act as the backing store for a 2-tier or 3-tier application, for either recording transactions or reporting.

During our college days, most of us would have learned the lineage of DBMS, RDBMS and OODBMS, and the databases for really large volumes of data called data warehouses. Since then we have had other DBMS types such as document and NoSQL databases. Very recently, some of us would have heard about cloud databases, columnar databases, device databases and in-memory databases. So where does that leave us? Can we still depend on only one database vendor for the enterprise, say IBM, Microsoft, SAP (after its merger with Sybase) or Oracle?

The high-level classification of NoSQL data models is as follows:

  Data Model            Performance   Scalability       Flexibility   Complexity   Functionality
  Relational database   variable      variable          low           moderate     relational algebra
  Key-value store       high          high              high          none         variable (none)
  Graph database        variable      variable          high          high         graph theory
  Document store        high          variable (high)   high          low          variable (low)
  Column store          high          high              moderate      low          minimal

For a list of all the various databases that are currently available on various hardware and systems from various vendors, you can visit http://en.wikipedia.org/wiki/NoSQL


The answer is a simple 'NO'. The reason is that IT systems have expanded in the 21st century and have created many more use-case scenarios for databases, data warehouses and data storage optimization.

Reference
http://www.computerworld.com/s/article/9246543/IBM_buys_NoSQL_cloud_provider_Cloudant

Data connectivity to third party applications

I often get this query from my sales team: "How can I connect to this xyz application?", which falls into the long tail of the connectivity problem. There are multiple ways of connecting to third-party applications, which may fall into any of the following categories:


  1. Standard 2-tier or 3-tier application: The majority of custom business applications deployed at an enterprise customer site would have a database behind the application. Typically, connecting directly to the database using either our native or ODBC drivers is the easiest data integration point for such a custom application that does not expose any other standard application interfaces.
  2. Cloud-hosted application: Many of the new cloud-based application vendors provide a standard Web Services/REST-based interface for connecting to the application.
  3. On-premise or cloud application exposing CLIs to integrate: Use the standard CLI functions exposed by the application and write a custom program to integrate with the flat files generated as the output of the CLI.
  4. No DB connection, no web services or CLI interface, but exposed through a programming API: If none of the above is possible, the third-party application needs an exclusive connection built using a programming API (C, C++, Java) to the connecting applications.
In addition to the base connectivity decision areas mentioned above, we would need to look at the customer's data integration use cases in each of the following areas before defining the appropriate connectivity solution:
      a) Volume of data & scalability (partitioning, bulk interfaces, CDC etc.)
      b) Velocity (performance: bulk or real time)
      c) Security (authentication, authorizations, data staging concerns etc.)
      d) Variety (mapping application data types to your data types, or transforming specialized encoded data such as JSON or EDIFACT)
      e) Validity (history snapshots or real time)
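To make category 3 concrete, here is a minimal Python sketch of the flat-file integration step: parsing the delimited output a CLI export might produce into records. The delimiter, field names and sample data below are hypothetical placeholders, not any particular vendor's format.

```python
import csv
import io

def parse_cli_export(flat_file_text, delimiter="|"):
    """Parse delimited flat-file output from a CLI export into dicts.

    Assumes the first line is a header row; the delimiter and field
    names depend on the actual CLI and are placeholders here.
    """
    reader = csv.DictReader(io.StringIO(flat_file_text), delimiter=delimiter)
    return [dict(row) for row in reader]

# Hypothetical flat file, as a CLI export might emit it:
sample = "id|name|amount\n101|Acme Corp|2500\n102|Globex|1800\n"
records = parse_cli_export(sample)
```

In practice the delimiter and header layout would come from the actual CLI's documentation, and the parsed records would then feed the downstream data integration mappings.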

Thursday, June 27, 2013

Marriage of On-premises and Cloud Software - Problems and Concerns


Social Media or Cloud Adapters break too often
Immature cloud vendors change an API, interface or authentication mechanism very quickly, with or without informing the data integrators, or the DI vendors are not ready to make the changes in that time frame.
e.g., Microsoft Dynamics CRM Live authentication, Facebook API changes, Twitter API v1 obsolescence.
Delay in delivering fixes and pain in implementing fixes
A change in the cloud application's back-end system, API or metadata would require a patch to be shipped and installed very quickly to continue data integration. Usually, the overhead of shipping such on-premise installers is measured in days instead of hours, and the customer base could be thousands.
Older versions of adapters do not support newer cloud features
Older versions of cloud adapters were built and certified some time back. Typically R&D does not have the bandwidth to re-certify the old adapter versions every time the cloud vendor makes a change.
Reduced synergy between the cloud and on-premise product teams
Typically the in-house cloud product teams do not use the latest on-premise software versions, or vice versa (in case the cloud software is a flavor of the on-premise product in deployment).

Tuesday, March 05, 2013

Big Data Integration: Where are we heading to?

I happened to hear a couple of weeks back that Splunk.com was ranked the 4th most innovative company on planet earth in 2013 by an agency. So what do they do? They provide a software platform for real-time operational intelligence. In simple terms, they are the enterprise's Google-equivalent search tool for finding information in machine-generated data such as log files and alerts from operational monitoring tools. So, at first glance, what is so big about it? The key is in the answer: "big data", where the volumes are really big and someone is trying to look for a needle in the haystack.


What does Big Data mean to you? Here is an interpretation. Big Data is something so big that you cannot handle it in your standard databases or data warehouses, where the majority of the data does not make sense directly for you, but there is a possibility that you could find something interesting in it if you run the expected analytic rules on it. That being said, we have been assuming that the data comes in a structured format, just like the data Splunk handles. What about unstructured data, or format/schema-less data? How would your machine read and interpret such data?

Immediately you think about using some fuzzy neural logic that could perform natural language processing (NLP) across various languages. Is that it? There is another dimension between NLP and structured data: data that comes with its own metadata, such as an XML without a schema, a JSON without a schema, or a NoSQL database entry where every entry has its own structure. This is the next immediate topic to be resolved, where the structure comes in a free-flowing format that carries its own metadata or descriptors along with it. We need to solve this format before moving to NLP.
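As a rough illustration of that middle dimension, here is a hedged Python sketch that treats each record's own keys as its metadata and merges them into a best-effort schema the machine can interpret. The field names in the sample feed are invented for illustration.

```python
import json

def infer_structure(json_records):
    """Build a merged field -> type map from records that each carry
    their own structure (as in a schema-less JSON or NoSQL feed).

    A record's 'metadata' here is simply its own keys; merging across
    records yields a best-effort schema for downstream processing.
    """
    schema = {}
    for raw in json_records:
        record = json.loads(raw)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    # Return sorted type lists so the result is deterministic.
    return {field: sorted(types) for field, types in schema.items()}

# Hypothetical feed where every entry has its own structure:
feed = [
    '{"user": "anand", "clicks": 12}',
    '{"user": "kumar", "location": "chennai"}',
]
schema = infer_structure(feed)
```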

-still in the works...keep watching

Tuesday, February 05, 2013

Social Media Trends in 2013

These are the social media trends in 2013:
  • Big data will get social: big data will augment data from transactional records with customer behavior and their social-graph behavior on the web, along with location and device/mobile-generated information.
  • Social CRM: social data will be added to CRM and marketing tools to find trends in sentiment, behavior and individual preferences of customers.
  • Social media integration with other marketing: people spend increasing amounts of time on social networking sites, and marketers want to tap into the social media distribution channel.
  • Social media monitoring tools: marketing ROI will be counted by measuring results; monitoring tools combined with business metrics will lead to a better understanding of the value of social interaction.
  • Social media budgets will be much bigger: by finding new ways to interact with and engage consumers, spending is expected to double by the end of 2016.

Sybase acquisition - A winner for SAP

I believe that SAP made a very strategic acquisition by buying Sybase. When I look at the Sybase products, they were much more technologically advanced than many of the equivalent software products available in the market, such as Sybase ASE, Sybase Unwired Platform, Sybase CEP etc.

As I hear (since I was not tracking the company personally), Sybase historically had a very poor sales team and hence could not really move forward in the competitive market. With the SAP acquisition and SAP's war with Oracle, Sybase can add a lot of mileage to taking on the Oracle competition in the database market. Even though SAP sells HANA as the next-generation appliance to compete against Oracle in the big data market, I think in practice we would see the Sybase products really taking on the Oracle products in head-to-head competition. In addition, all the Oracle database business within SAP installations would be replaced by Sybase. This means that nearly 80% of the 65,000 SAP ERP installations that run on an Oracle database would go away from Oracle in the next few years.

Wednesday, January 23, 2013

OnPremises Software to come back

According to Gartner, 30% of the organizations that have been using cloud applications will switch back to on-premises software by 2014.

The other major trend is hybrid cloud services, which include private and public cloud along with on-premises applications.


http://www.forbes.com/sites/ericsavitz/2012/10/22/gartner-10-critical-tech-trends-for-the-next-five-years/

To augment this information with data processing considerations: when we look at many of the cloud application service providers, they restrict access to their cloud applications by limiting:
a) the number of API calls per day (e.g. Salesforce.com: 5,000, 25,000 or 100,000,000 calls)
b) the total data volume that you can process per day (e.g. 500 MB per day)
c) the total data storage that you can work with (e.g. Salesforce.com: 1 GB, 10 GB or 100 GB)
d) data processing units per month (e.g. max 250,000 DPU from DataSift)
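One practical consequence of limits (a) and (b) is that an integration job has to meter itself. Below is a minimal Python sketch of a client-side quota tracker; the limit values and class name are placeholders for illustration, not actual vendor quotas.

```python
class ApiBudget:
    """Track daily API-call and data-volume quotas on the client side,
    so an integration job can stop (or defer work) before the cloud
    vendor cuts it off. Limit values here are placeholders.
    """
    def __init__(self, max_calls, max_bytes):
        self.max_calls = max_calls
        self.max_bytes = max_bytes
        self.calls = 0
        self.bytes_used = 0

    def try_consume(self, payload_bytes):
        """Reserve one call and its payload; False means over quota."""
        if (self.calls + 1 > self.max_calls
                or self.bytes_used + payload_bytes > self.max_bytes):
            return False  # over quota: defer the call to the next window
        self.calls += 1
        self.bytes_used += payload_bytes
        return True

# Hypothetical daily budget: 5,000 calls and 500 MB of payload.
budget = ApiBudget(max_calls=5000, max_bytes=500 * 1024 * 1024)
ok = budget.try_consume(1024)
```

A real integration would persist these counters per day and per connection, and would read the actual limits from the vendor's account settings or API responses.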

BigData & NoSQL databases- The market trend


During the past couple of years there has been a lot of technology news around NoSQL databases, especially with the big data story revving up. Some of them are MongoDB, Cassandra, HBase, CouchDB, Redis, Membase, Neo4j, Accumulo, TripleStore, DynamoDB etc. Most of these NoSQL databases can be categorized into a few groups, as follows:
  1. Key-Value Stores:
    This technology uses a hash table where there is a unique key and a pointer to a particular item of data. The Key/value model is the simplest and easiest to implement. But it is inefficient when you are only interested in querying or updating part of a value
    Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
    Typical Applications: Content caching (Focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
    Strengths:
    Fast lookups
    Weakness:
    Stored data has no schema
  2. Column family store:
    This type was created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family.
    Examples: Cassandra, HBase, Riak
    Typical Applications: Distributed file systems
    Strengths:
    Fast lookups, good distributed storage of data
    Weakness:
    Very low-level API
  3. Document store:
    These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of Key/value, allowing nested values associated with each key.  Document databases support querying more efficiently.
    Examples: CouchDB, MongoDb, Elastic Search
    Typical Applications: Web applications, Content Management systems
    Strengths:
    Tolerant of incomplete data
    Weakness:
    Query performance, no standard query syntax
  4. Graph Databases: This model follows the flexible graph model, which can scale across multiple machines. It does not have the tables of rows and columns or the rigid structure of SQL. NoSQL databases do not provide a high-level declarative query language like SQL, in order to avoid processing overhead; rather, querying these databases is data-model specific. Many of the NoSQL platforms allow for RESTful interfaces to the data, while others offer query APIs.
    Examples: Neo4J, InfoGrid, Infinite Graph
    Typical Applications: Social networking, Recommendations etc
    Strengths: Graph algorithms e.g. shortest path, connectedness, n degree relationships, etc.
    Weakness: Has to traverse the entire graph to achieve a definitive answer. Not easy to cluster.
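To see the difference between categories 1 and 3 in miniature, here is a toy Python sketch of a document store with versioned documents and field-level querying, something a plain key/value store cannot do without fetching and decoding whole values. All names and data are invented for illustration; real systems like CouchDB or MongoDB work very differently at scale.

```python
class TinyDocStore:
    """Toy document store: versioned documents keyed by id, queryable
    on individual fields. Contrast with a plain key/value store, where
    filtering requires fetching and decoding every whole value.
    """
    def __init__(self):
        self._docs = {}  # doc_id -> list of versions (newest last)

    def put(self, doc_id, document):
        """Store a new version of the document (old versions kept)."""
        self._docs.setdefault(doc_id, []).append(document)

    def get(self, doc_id):
        """Return the newest version of the document."""
        return self._docs[doc_id][-1]

    def find(self, field, value):
        """Return ids of documents whose newest version matches."""
        return [doc_id for doc_id, versions in self._docs.items()
                if versions[-1].get(field) == value]

store = TinyDocStore()
store.put("u1", {"name": "asha", "city": "pune"})
store.put("u2", {"name": "ravi", "city": "chennai"})
store.put("u1", {"name": "asha", "city": "chennai"})  # new version of u1
matches = store.find("city", "chennai")
```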
Even though there are several of these NoSQL databases in the market, I am not sure whether they can be compared apples to apples, and I consider most of them specialized for very specific horizontal use cases or vertical needs. Hence there is no single NoSQL database that can solve your whole enterprise-wide problem, and so the implementation of these NoSQL databases would be decided by different departments within the organization.

NoSQL databases are not a replacement for the conventional RDBMS. They are expected to supplement or augment the data for other business needs. It is also expected that in the coming years some of the established database vendors such as Oracle, Sybase, Microsoft and IBM will take over some of the active NoSQL databases and merge them into their portfolios.


The worldwide NoSQL software market is expected to reach $3.4 billion by 2018, growing at a CAGR of 21% between 2013 and 2018. The NoSQL market is expected to generate $14 billion in revenues over the period 2013-2018.

The NoSQL market has been very active in the past 1 year especially with several venture capital funding, mergers & acquisitions and new product offerings:
  • September 2012 – In-Q-Tel, the venture investment arm of the U.S. intelligence community, invests in 10gen, developer of the MongoDB open source database;
  • August 2012 – Sqrrl, a National Security Agency spin-off startup, raises $2 million to develop the NoSQL database Accumulo;
  • July 2012 – NuoDB raises $10 million to develop a cloud NoSQL database that behaves like traditional SQL;
  • June 2012 – Cloudant launches a NoSQL data layer service for Windows Azure;
  • May 2012 – 10gen secures $42 million in venture funding;
  • January 2012 – Amazon launches DynamoDB, a new NoSQL data service;
  • January 2012 – Oracle announces the availability of the Oracle Big Data Appliance and partners with Cloudera to provide an Apache Hadoop distribution and tools for the Big Data Appliance;
  • November 2011 – Cloudera Inc., the provider of Apache Hadoop-based data management software and services, raises $40 million;
  • November 2011 – Basho, the company behind Riak, raises $5 million.

Some Usecases of When to Use NoSQL: 

  • Logging/Archiving. Log-mining tools are handy because they can access logs across servers, relate them and analyze them.
  • Social Computing Insight. Many enterprises today have provided their users with the ability to do social computing through message forums, blogs etc.
  • External Data Feed Integration. Many companies need to integrate data coming from business partners. Even if the two parties conduct numerous discussions and negotiations, enterprises have little control over the format of the data coming to them. Also, there are many situations where those formats change very frequently – based on the changes in the business needs of partners.
  • Front-end order processing systems. Today, the volume of orders, applications and service requests flowing through different channels to retailers, bankers, insurance providers, entertainment service providers, logistics providers, etc. is enormous. These requests need to be captured without any interruption whenever an end user makes a transaction from anywhere in the world. Afterwards, a reconciliation system typically updates the back-end systems as well as updating the end user on his/her order status.
  • Enterprise Content Management Service. Content Management is now used across companies’ different functional groups, for instance, HR or Sales. The challenge is bringing together different groups using different meta data structures in a common content management service.
  • Real-time stats/analytics. Sometimes it is necessary to use the database as a way to track real-time performance metrics for websites (page views, unique visits, etc.)  Tools like Google Analytics are great but not real-time — sometimes it is useful to build a secondary system that provides basic real-time stats. Other alternatives, such as 24/7 monitoring of web traffic, are a good way to go, too.
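For the last use case, a secondary real-time stats system can start as something very small. The following Python sketch (all names hypothetical) counts page views and unique visitors as hits arrive, the kind of basic real-time stats described above.

```python
from collections import Counter

class RealtimeStats:
    """Minimal real-time stats tracker: counts page views and unique
    visitors as hits arrive, instead of waiting on a batch analytics
    report. An in-memory sketch; a real system would persist this.
    """
    def __init__(self):
        self.page_views = Counter()   # page -> total view count
        self.visitors = {}            # page -> set of visitor ids

    def record_hit(self, page, visitor_id):
        self.page_views[page] += 1
        self.visitors.setdefault(page, set()).add(visitor_id)

    def unique_visits(self, page):
        return len(self.visitors.get(page, set()))

stats = RealtimeStats()
stats.record_hit("/home", "v1")
stats.record_hit("/home", "v2")
stats.record_hit("/home", "v1")   # repeat visitor
views = stats.page_views["/home"]      # 3 total views
uniques = stats.unique_visits("/home")  # 2 unique visitors
```

This is where the NoSQL stores above fit naturally: swap the in-memory Counter and sets for a key-value store's increments and set operations, and the same counters survive across many web servers.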

What Type of Storage Should you use?

Here’s a short summary that might help you make your decision:
NoSQL
  • Storage should be able to deal with very high load
  • You do many write operations on the storage
  • You want storage that is horizontally scalable
  • Simplicity is good, as in a very simple query language (without joins)
RDBMS
  • Storage is expected to be high-load, too, but it mainly consists of read operations
  • You want performance over a more sophisticated data structure
  • You need powerful SQL query language


The other big news is around the Cloud databases such as Google BigQuery, Amazon Redshift and Cloudera Impala.
This big data space will only get hotter by the end of 2013, as these offerings would solve many of the big-data and analytical problems for businesses. We would also see several investments from the established vendors in this space, and some consolidation through M&A by the end of 2014/early 2015. A nice space to watch!

News from Feb 2014: IBM buys NoSQL cloud provider Cloudant
http://www.computerworld.com/s/article/9246543/IBM_buys_NoSQL_cloud_provider_Cloudant