Hot Shot 009 – Dealing With M/SOA Data (Part 1) [Podcast]
Learn about how microservices deal with processing data, both at rest and in use, in this podcast on microservices architecture.
These are my verbatim notes to the PEAT UK podcast:
Hello there, once again, to another hot shot. My name is Peter Pilgrim, DevOps specialist, Java enterprise and platform engineer, and Java Champion.
How Do Microservices M/SOA Deal With Data?
Software developers certainly write code that processes data. We can become blind and less caring about the data that we are asked to process. I certainly understand this way of working when building a service. All we care about is, where does that data arrive? Where do we need to push that data? Do we need to save this very important and critical data into a persistent store? Do we need to transform the data from one form to another form in the middle of the operation?
Coding data processes is essentially the life of an engineer. Information arrives on a message gateway, queue, or channel, and generally, we do something useful with it. Actually, we apply business logic to it.
The same happens to the data that arrives from the UI. Nowadays it might be a JSON payload from some sexy JavaScript framework. It is still data, and ordinarily, we might have to filter, reduce, and map that data to another form, namely an object representation in our favorite programming language. Of course, Java has all of this boilerplate with getters and setters, and we might be lucky enough to use a framework to transform JSON into objects and back again. If we are extremely lucky, we write our microservice code in Groovy, Scala, or even Kotlin, since those modern JVM languages support case classes (or data classes).
Even then, we are programming with data. Do we actually understand the data that we are working with?
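The JSON-to-object mapping described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not code from the podcast: the `Customer` class, its field names, and the sample payload are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class Customer:
    """Hypothetical domain object; the field names are illustrative."""
    customer_id: int
    name: str
    country: str


def customers_from_json(payload: str) -> List[Customer]:
    """Map a JSON array of records to Customer objects.

    Records missing a required field are filtered out, and unknown
    extra fields (such as UI-only settings) are ignored.
    """
    records = json.loads(payload)
    required = ("customer_id", "name", "country")
    return [
        Customer(**{key: record[key] for key in required})
        for record in records
        if all(key in record for key in required)
    ]


# Made-up payload, as a UI might send it.
payload = """[
  {"customer_id": 1, "name": "Ada", "country": "GB", "theme": "dark"},
  {"customer_id": 2, "name": "Linus"}
]"""

customers = customers_from_json(payload)

# And back again: objects to JSON.
round_trip = json.dumps([asdict(c) for c in customers])
```

The data class plays the role that case classes or data classes play on the JVM: the fields are declared once, and equality and construction come for free instead of being written by hand as getters and setters.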
Data at Rest
In information technology, data at rest means inactive data that is stored physically in any digital form (e.g. databases, data warehouses, spreadsheets, archives, tapes, off-site backups, mobile devices, etc).
This is information that developers of microservices have to know. What happens to your customer's data in that microservice? After the microservice has processed the information, is this data further published onwards to another microservice? Where does it flow to? What is the conduit? Is it RabbitMQ, Kafka, or something else? What happens if the data is saved to a database? Is it persisted to AWS RDS like Aurora in the cloud inside your production environment?
Data at rest, for engineers, is the storage state of the data. You can think of this as long-term storage. Your business might have some concerns about where the data is stored. For instance, in a cloud environment, can you be sure that the data is fixed to a geographical region of the world?
AWS has the concept of regions in most of its services. If I develop a microservice application in the UK and ensure that my VPC and availability zones are fixed to eu-west-2 (EU London), I am pretty certain that my data at rest is fixed to the UK. I know this because I can launch EC2 instances in this region only. As long as I configure the security groups and ensure that the applications and services running on EC2 cannot see outside of this region, I can be reasonably confident that my data at rest is inside the UK, and if it is not, I would take this up with Amazon. Well, not me, but this billion-dollar company wouldn't have a leg to stand on if it was proven that data can leak across regions with its infrastructure.
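One way to make that confidence checkable is to audit an inventory of resource identifiers for anything outside the expected region. The sketch below is an assumption-laden illustration, not an AWS tool: it relies only on the documented ARN layout (`arn:partition:service:region:account-id:resource`), and the inventory, account ID, and resource names are made up.

```python
ALLOWED_REGION = "eu-west-2"  # EU (London), the region named in the text


def region_of(arn: str) -> str:
    """Return the region field of an ARN.

    ARNs have the form arn:partition:service:region:account-id:resource.
    Some ARNs, such as S3 bucket ARNs, leave the region field empty.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def outside_region(arns, allowed=ALLOWED_REGION):
    """List the ARNs whose region field is set and differs from `allowed`."""
    return [arn for arn in arns if region_of(arn) not in ("", allowed)]


# Hypothetical inventory; the account ID and resource names are made up.
inventory = [
    "arn:aws:rds:eu-west-2:123456789012:db:customer-db",
    "arn:aws:ec2:eu-west-2:123456789012:instance/i-0abc1234def567890",
    "arn:aws:rds:us-east-1:123456789012:db:analytics-replica",
]

strays = outside_region(inventory)
```

ARNs with an empty region field pass this check untouched, so resources such as S3 buckets would need their location verified by other means.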
Data at rest is clearly important for customer data, especially in the European Union, which includes the UK until March 2019; we need to think about GDPR.
Data in Use
Data in use has also been taken to mean "active data" in the context of being in a database or being manipulated by an application. For example, some enterprise encryption gateway solutions for the cloud claim to encrypt data at rest, data in transit, and data in use.
This is where we, as software engineers, have a great deal of responsibility. We write the service (M/SOA) that transforms customer data into real outcomes, costs, and benefits.
For platform engineering and DevOps, understanding where data is in use and at rest is also very important. For example, AWS has an Elastic Kubernetes Service (EKS), which, at the moment, is available only in regions in the USA/North America. Therefore, it precludes any work where the application and data must be kept only in Europe or the UK.
Another example is AWS Simple Storage Service (S3), where you can configure a bucket to serve static web pages from an EC2 instance. If your application and friends use S3 as a very simple file transfer mechanism, then you have to protect and continuously review S3 mechanisms and permissions in order to ensure only authenticated and authorized users and applications can see the data at rest that they are entitled to.
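Part of that continuous review can be automated. As a rough sketch, the check below scans a bucket policy document for `Allow` statements that apply to any principal. It is deliberately simplistic and rests on assumptions: it looks only at the `Effect` and `Principal` elements of the policy JSON, ignores `Condition` blocks, ACLs, and account-level settings, and the sample policy, bucket name, and role are invented.

```python
import json


def _as_list(value):
    """Policy fields may hold a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_public_principal(principal) -> bool:
    """True when the principal is the wildcard, written as "*" or {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS", []))
    return False


def public_statements(policy_json: str):
    """Return the Allow statements that grant access to any principal."""
    policy = json.loads(policy_json)
    return [
        stmt
        for stmt in _as_list(policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        and is_public_principal(stmt.get("Principal"))
    ]


# Hypothetical policy; the bucket name, account ID, and role are made up.
policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Sid": "AppRead", "Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-role"},
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}"""

flagged = [stmt["Sid"] for stmt in public_statements(policy)]
```

A flagged statement is a prompt for a human to look, not proof of a leak; a wildcard principal can be legitimate for a public website bucket or narrowed by a condition this sketch does not inspect.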
Salient Notes
GDPR - General Data Protection Regulation, which came into force on 25th May 2018 for citizens of the European Union
Safe harbor - customer data privacy issues. Essentially sharing data across The Pond (The Atlantic Ocean) concerning global conglomerates and international institutions.
Data at Rest - definition from Wikipedia and also includes Data in Use.
Amazon Relational Database Service (RDS) - official documentation on a bespoke MySQL implementation and extension in the cloud. See also Amazon Aurora DB - Wikipedia
Amazon Elastic Container Service for Kubernetes (EKS)
Security breach #1 - Tech Republic of an S3 leaked data breach with FedEx customers
Security breach #2 - 7% of All Amazon S3 Servers Are Exposed, Explaining Recent Surge of Data Leaks
Security breach #3 - Data on 123 Million US Households Exposed Due to Misconfigured AWS S3 Bucket and also Info Security Every Single American Household
That's it for this hotshot. I hope you liked it.
Published at DZone with permission of Peter Pilgrim, DZone MVB. See the original article here.