Tuesday, March 05, 2013

Big Data Integration: Where are we heading to?

I happened to hear that, a couple of weeks back, an agency ranked Splunk.com as the 4th most innovative company on planet earth for 2013. So what do they do? They provide a software platform for real-time operational intelligence. In simple terms, they are the enterprise's equivalent of Google: a search tool for finding information in machine-generated data such as log files and alerts from operational monitoring tools. So the first question is, what is so big about it? The answer is "big data": the volumes are really big, and someone is trying to look for a needle in the haystack.


What does Big Data mean to you? Here is one interpretation. Big Data is data too big to handle in your standard databases or data warehouses, where the majority of the data does not make sense to you directly, but where there is a possibility that you could find something interesting in it if you run the right analytic rules on it. That said, we have been assuming that the data comes in a structured format, like the data Splunk handles. What about unstructured data, or data with no format or schema? How would your machine read and interpret such data?

Immediately you think about using some fuzzy neural logic that could work in natural language processing (NLP) across various languages. Is that it? There is another dimension to think about between NLP and structured data: data that comes with its own metadata, such as XML without a schema, JSON without a schema, or a NoSQL database entry where every entry has its own structure. This is the next immediate topic to be resolved: the structure arrives in a free-flowing format that carries its own metadata or descriptors along with it. We need to solve this format before moving to NLP.
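
To make the idea of data that carries its own descriptors a bit more concrete, here is a minimal sketch in Python. It assumes a handful of made-up JSON records with no schema, each shaped differently, and simply walks every record to collect the field paths and value types it finds. The record contents and the function names (describe, infer_structure) are hypothetical and only for illustration; this is one possible way to look at the problem, not a full solution.

```python
import json
from collections import defaultdict


def describe(value, path=""):
    """Yield (path, type_name) pairs for every leaf in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from describe(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        # All list items share one path, marked with "[]".
        for child in value:
            yield from describe(child, f"{path}[]")
    else:
        yield path, type(value).__name__


def infer_structure(raw_records):
    """Merge the descriptors of many records into a path -> set-of-types map."""
    structure = defaultdict(set)
    for raw in raw_records:
        for path, type_name in describe(json.loads(raw)):
            structure[path].add(type_name)
    return dict(structure)


# Hypothetical records: every entry has its own shape, as in a NoSQL store.
records = [
    '{"host": "web01", "status": 200, "tags": ["prod", "eu"]}',
    '{"host": "db02", "status": "timeout", "disk": {"used_pct": 91.5}}',
]

structure = infer_structure(records)
for path in sorted(structure):
    print(path, sorted(structure[path]))
```

The point of the sketch is that the machine never needs a schema up front: the keys in each record are the metadata, and a field like "status" showing up as both a number and a string is exactly the kind of thing that has to be dealt with before any analytic rules can run on it.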

- Still in the works... keep watching.