Over the last few years, we have seen the rise of Big Data in the SMAC (social, mobile, analytics, and cloud) domain. The following excerpt describes a parallel concept called “Big Service.”
“Big Service” is a paradigm of handling a stream of services through a pipeline processing engine. The following are the core parts of this architecture.
Data Processing pipeline
“Big Service” is meaningless without data. The following concept describes how the paradigm works and how its nuts and bolts fit together, focusing on a registry and discovery platform called “Service Cloud” and a three-layer data processing engine that uses this cloud platform.
Service streams represent a “service current” paradigm in which a continuous flow of service information passes through the pipeline. The service definition and metadata sit inside a data container, which hides the complexity and low-level details of the streaming information. Containers can be passed as a simple XML- or JSON-based structure over HTTP. The service stream knows which services need to be executed, in which order, and what action to take when a service call fails. It contacts the service registry, where all the services keep their metadata. The registry is an integral part of the data processing platform and resides in the Service Cloud.
The Service Cloud is an on-premise deployment of all the microservices, and it promotes containerization of low-level, granular services. The purpose of a service stream is to aggregate the invocation of several related services to perform transactional or non-transactional operations. The service stream can also understand and correlate related and dependent microservices. The ingestion layer acts as the orchestration platform: it brings all the required information together and passes it on to the data processing pipeline, which acts as the filtering layer that sends information to the interested parties.
A typical example is an eCommerce system passing product information to the Data Dynamics layer, which reads and interprets the data and then passes it on to the ingestion layer. The ingestion layer may call a product review service on the Service Cloud to get the reviews and then aggregate the data. Finally, it posts the information to the data processing pipeline, which identifies the interested parties for that information. The sink system can be a message broker such as Kafka or an enterprise system such as a CRM.
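To make the idea of a data container carrying an ordered service stream more concrete, here is a minimal sketch in Python. It is not taken from any Big Service implementation: the names DataContainer, ServiceStep, run_stream, and the "abort"/"skip" failure policies are all assumptions made for illustration, and a plain dictionary stands in for the service registry on the Service Cloud.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ServiceStep:
    service: str               # name looked up in the registry
    on_failure: str = "abort"  # hypothetical policy: "abort" or "skip"


@dataclass
class DataContainer:
    payload: Dict[str, Any]
    steps: List[ServiceStep] = field(default_factory=list)

    def to_json(self) -> str:
        # The container can travel as a simple JSON structure over HTTP.
        return json.dumps(asdict(self))


# Stand-in for the service registry: service name -> callable.
Registry = Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def run_stream(container: DataContainer, registry: Registry) -> Dict[str, Any]:
    """Invoke each service in order, applying the step's failure policy."""
    data = dict(container.payload)
    for step in container.steps:
        try:
            data.update(registry[step.service](data))
        except Exception:
            if step.on_failure == "skip":
                continue  # tolerate the failure and move to the next service
            raise         # "abort": stop the whole stream
    return data


def product_review(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"review": "4.5 stars"}


def broken_service(data: Dict[str, Any]) -> Dict[str, Any]:
    raise RuntimeError("service unavailable")


registry: Registry = {"product-review": product_review, "inventory": broken_service}

container = DataContainer(
    payload={"product_id": "p-1"},
    steps=[ServiceStep("inventory", on_failure="skip"), ServiceStep("product-review")],
)
result = run_stream(container, registry)
print(result)
```

In this sketch the failing inventory call is skipped because its step says so, while the review service still enriches the payload. A real deployment would resolve services through the registry over the network rather than through an in-process dictionary.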
The service stream paradigm creates a loosely coupled layer between services and service callers. The caller can create a composite service map and pass it as a complete service stream instead of acting on any ESB layer. This paradigm is best suited to asynchronous processing of information, but the caller may also use stream callbacks to consume the response to a request it has handed over to the streaming layer. The model also defines a data current, which is a stream of data containers. Most of the data is processed in memory for speed, though store-and-forward operations are also an option. In a typical high-volume data processing platform, the service stream takes care of offloading services from data-intensive operations. The Service Cloud helps to deploy microservices in a container model.
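The asynchronous hand-off with a stream callback might look roughly like the sketch below. It assumes an in-memory queue and a single worker thread; the StreamingLayer class and its submit and drain methods are hypothetical names, and the "processing" step is a placeholder that only tags the container.

```python
import queue
import threading
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any]], None]


class StreamingLayer:
    """Hypothetical in-memory streaming layer with one worker thread."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, container: Dict[str, Any], callback: Callback) -> None:
        # Returns immediately; the caller is not blocked on processing.
        self._queue.put((container, callback))

    def _run(self) -> None:
        while True:
            container, callback = self._queue.get()
            try:
                # Placeholder for real in-memory processing of the container.
                callback({**container, "status": "processed"})
            finally:
                self._queue.task_done()

    def drain(self) -> None:
        # Block until every submitted container has been handled.
        self._queue.join()


responses: List[Dict[str, Any]] = []
layer = StreamingLayer()
layer.submit({"product_id": "p-1"}, responses.append)
layer.submit({"product_id": "p-2"}, responses.append)
layer.drain()
print(responses)
```

The caller passes its request and a callback, then carries on. The callback fires once the layer has processed the container, which is one way to consume a response without making the whole exchange synchronous.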
The ingestion layer is the heart of the platform, combining the flavor of an ESB with service pipelines. It can be made fault-tolerant to support failover. The Data Dynamics layer can validate the information in the data container and take the necessary action toward the caller or publisher of the message. Since data containers may also encapsulate the address of the source destination, the Data Dynamics layer can take appropriate action based on the outcome of the validation. The variability and type of the data container determine which ingestion layer is used. There can be a variety of ingestion components, which are configured at deployment time and register with specific microservices on the Service Cloud. The ingestion layer can cache service information in its own domain. It can also be deployed as a standalone cluster with its own dataset and tenant, which means every cluster of ingestion components can own its information in a separate schema or non-relational data store. It is normally advisable to deploy related service streams in the same ingestion cluster so that cluster-to-cluster communication can be avoided. If a service stream is not handled by any specific ingestion cluster, a default ingestion cluster acts as the fallback. However, processing streams through the default ingestion cluster may create performance issues.
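The routing rule described here, where the container type selects an ingestion cluster and a default cluster acts as the fallback, can be sketched as follows. The IngestionRouter class, the "type" key on the container, and the cluster names are assumptions made for this example only.

```python
from typing import Any, Dict


class IngestionRouter:
    """Hypothetical router: container type -> ingestion cluster name."""

    def __init__(self, default_cluster: str) -> None:
        self.default_cluster = default_cluster
        self._routes: Dict[str, str] = {}

    def register(self, container_type: str, cluster: str) -> None:
        # Configured at deployment time in this sketch.
        self._routes[container_type] = cluster

    def route(self, container: Dict[str, Any]) -> str:
        # Unregistered types fall back to the default cluster.
        return self._routes.get(container.get("type", ""), self.default_cluster)


router = IngestionRouter(default_cluster="default-ingestion")
# Related streams share one cluster to avoid cluster-to-cluster traffic.
router.register("product", "catalog-cluster")
router.register("review", "catalog-cluster")

print(router.route({"type": "product"}))
print(router.route({"type": "invoice"}))
```

Registering the related product and review streams on the same cluster mirrors the advice above. Anything that lands on the default cluster is worth monitoring, since the text warns that the fallback path may perform worse.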
The data processing pipeline can aggregate information from multiple service streams. If not all of the related information is available yet, it can store the state of the individual outputs it has received from each service stream. It can also store a service map and process data as the required outputs from the service streams become available. If an output is still pending, the pipeline keeps the state of the available information and triggers further processing once the missing information arrives. It can also control how streams are processed. Some requests are of lower priority and can be processed later rather than in real time. The pipeline can schedule tasks to deal with such requests through a special pipeline called the “tasked pipeline.” The Data Dynamics layer can also decide the priority of a data container and pass it to the pipeline.
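The stateful aggregation and the deferred "tasked pipeline" could be approximated like this. The sketch is a simplification: the ProcessingPipeline class, the correlation key, and the low_priority flag are hypothetical, the service map is reduced to a set of required stream names, and the sink is a plain list.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List, Set, Tuple

Aggregate = Tuple[str, Dict[str, Any]]


class ProcessingPipeline:
    """Hypothetical pipeline that waits for all required stream outputs."""

    def __init__(self, required_streams: Iterable[str]) -> None:
        self.required: Set[str] = set(required_streams)  # simplified service map
        self.pending: Dict[str, Dict[str, Any]] = {}     # state per correlation key
        self.tasked: Deque[Aggregate] = deque()          # low-priority work
        self.delivered: List[Aggregate] = []             # stand-in for the sink

    def receive(self, key: str, stream: str, output: Any,
                low_priority: bool = False) -> bool:
        """Store one output; return True once the aggregate is complete."""
        state = self.pending.setdefault(key, {})
        state[stream] = output
        if not self.required.issubset(state):
            return False  # keep state and wait for the remaining streams
        aggregate = (key, self.pending.pop(key))
        if low_priority:
            self.tasked.append(aggregate)     # defer to the tasked pipeline
        else:
            self.delivered.append(aggregate)  # process in real time
        return True

    def run_tasked(self) -> None:
        # Invoked later, for example by a scheduler, to flush deferred work.
        while self.tasked:
            self.delivered.append(self.tasked.popleft())


pipeline = ProcessingPipeline({"product", "review"})
first = pipeline.receive("order-1", "product", {"name": "lamp"})
second = pipeline.receive("order-1", "review", {"stars": 5})

pipeline.receive("order-2", "product", {"name": "desk"}, low_priority=True)
pipeline.receive("order-2", "review", {"stars": 4}, low_priority=True)
deferred_before = len(pipeline.tasked)
pipeline.run_tasked()
print(pipeline.delivered)
```

The first call only stores state, and the second completes the aggregate and delivers it. The low-priority request waits in the tasked queue until run_tasked is called, which is where a real scheduler would hook in.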