The notion that Big Data is just a buzzword has started to change. Organizations have recently begun to recognize its substantial use cases and have adopted technologies to turn a variety of theoretical concepts into reality. The question is: are they doing it right?

Designing the architecture of a big data platform calls for serious consideration. Without it, there is a high chance that the designed or deployed system wasn’t necessary in the first place and that the new solution introduces even bigger challenges. Make sure you don’t build a tank just to kill a fly; that can be a huge waste of time and resources.

I usually follow a simple approach to find out what kind of architecture is actually required, so I came up with my own model, which I call the WWH (What, Why and How) Model. Because it is generic in nature, the model can also be used to discover requirements and help design solutions in various industries. Here is how I apply it in practice:

Start with “What”:

Asking your customer these simple questions gives you a good understanding to get started:

  1. What is their problem statement? e.g., dealing with data processing bottlenecks, data leakage or data latency.
  2. What does the customer want to achieve? What is their goal or vision? e.g., building a forecasting system using predictive analytics, predicting customer retention/churn rate, or plain business insights and KPIs.
  3. What short- and long-term goals might the customer have? This will help in prioritising different solutions and in giving your customer a direction for adapting to long-term future benefits.
  4. What are the limitations or roadblocks? e.g., resources (human, financial or hardware) or data accessibility.

Proceed with “Why”:

Most of the time, organizations are simply following a trend: they jump on the bandwagon and want to beat their competitors by adopting the latest technology without properly considering whether it is required at all! Asking “Why do they want to do what they want?” yields a lot of to-the-point information, and at this stage the design requirements become much more transparent. It’s important for both the Big Data/Solutions Architect and the customer to know why they need to implement this new solution. The following questions may help uncover those hidden requirements:

  1. Why do they want to roll out a Big Data solution? Does their data really amount to hundreds of GBs, TBs or PBs?
  2. Why is there a need for a Hadoop cluster? Spinning up a new cluster is probably not required. If the end result is just to build business insights or KPIs, implementing a Hadoop/Spark cluster may introduce new challenges that make those insights available much more slowly than other data warehouse solutions on the market would. Hadoop-based solutions mostly run batch-oriented data processing jobs. Try alternatives such as AWS Redshift.
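One way to answer the data-volume question with numbers rather than assumptions is to measure what the customer actually has. Below is a minimal Python sketch, assuming the candidate data is reachable as files on a local or mounted file system (it does not cover databases or object stores); `directory_size_bytes` and `human_readable` are hypothetical helper names, not part of any library.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked data is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 GiB = 1024**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} PiB"
```

For example, `print(human_readable(directory_size_bytes("/data/raw")))` (the path is a placeholder) reports the total on-disk size. If the answer comes back in the tens of GiB, that is a strong hint that a cluster is not the first thing to reach for.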

End with “How”:

Now it’s time to execute. As a Big Data & Solutions Architect, I find this one of the most critical and challenging phases, and it is also my favorite. Once both What and Why are defined, it’s time to present what the different solutions would look like. It is essential that your solution helps them define:

  1. How can they achieve their goals? Provide a few relevant choices or alternative solutions.
  2. How can they implement a cost-efficient, fault-tolerant, performance-optimized and highly scalable solution? Although it is quite difficult to find the ideal balance between these qualities, a comparative analysis helps!
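The comparative analysis can be made explicit with a weighted scoring matrix, a common decision-making technique rather than something specific to the WWH Model. The sketch below assumes each candidate solution is scored from 1 (poor) to 5 (excellent) on the four qualities named above; the weights, scores and option names are hypothetical placeholders that you would fill in together with your customer.

```python
from dataclasses import dataclass

# The four qualities from the question above. These weights are hypothetical and
# should reflect the priorities gathered during the "What" and "Why" phases.
WEIGHTS = {
    "cost_efficiency": 0.3,
    "fault_tolerance": 0.2,
    "performance": 0.3,
    "scalability": 0.2,
}


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # each criterion scored 1 (poor) to 5 (excellent)


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Return the weighted sum of a candidate's criterion scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest weighted score."""
    return sorted(
        ((c.name, weighted_score(c, weights)) for c in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Keeping the weights separate from the scores makes it easy to rerun the comparison when the customer’s priorities shift, for instance when budget starts to matter more than raw performance.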

I hope these prerequisites add great value in constructing the right architecture.

Good Luck!