Data

Data is at the core of software development. Think of it as information stored in anything from text documents and images to entire software programs, and these bits of information need to be processed, read, analyzed, stored, and transported throughout systems. In this Zone, you'll find resources covering the tools and strategies you need to handle data properly.

Latest Refcards and Trend Reports

Trend Report: Data Pipelines
Refcard #368: Getting Started With OpenTelemetry
Refcard #378: Apache Kafka Patterns and Anti-Patterns

DZone's Featured Data Resources

Refcard #371: Data Pipeline Essentials

Trend Report: Data Persistence

At the core of every modern application is an endless, diverse stream of data and, with it, an inherent demand for scalability, increased speed, higher performance, and strengthened security. Although data management tools and strategies have matured rapidly in recent years, the complexity of architectural and implementation choices has intensified as well, creating unique challenges — and opportunities — for those who are designing data-intensive applications. DZone’s 2021 Data Persistence Trend Report examines the current state of the industry, with a specific focus on effective tools and strategies for data storage and persistence. Featured in this report are observations and analyses of survey results from our research, as well as an interview with industry leader Jenny Tsai-Smith. Readers will also find contributor insights written by DZone community members, who cover topics ranging from microservice polyglot persistence scenarios to data storage solutions and the Materialized Path pattern. Read on to learn more!
The Everything Guide to Data Collection in DevSecOps
By John Vester CORE
An Entity to DTO
By Andrey Belyaev CORE
Math Behind Software and Queueing Theory
By Bartłomiej Żyliński CORE
A Real-Time Supply Chain Control Tower Powered by Kafka

A modern supply chain requires just-in-time production, global logistics, and complex manufacturing processes. Intelligent control of the supply network is becoming increasingly demanding, and not just because of Covid. At the same time, digitalization is generating exponentially growing data streams along the value chain. This article explores a solution that links that control with the data streams, leveraging Apache Kafka to provide end-to-end visibility of your supply chain. All the digital information flows into a unified central nervous system, enabling comprehensive control and timely response. The idea of the supply chain control tower becomes a reality: an integrated data cockpit with real-time access to all levels and systems of the supply chain.

What Is a Supply Chain Control Tower?

A supply chain is a network of companies and people involved in the production and delivery of a product or service. Today, many supply chains are global and involve intra-logistics, widespread enterprise logistics, and B2B data sharing for end-to-end supply chains.

Supply Chain Management (SCM)

Supply Chain Management (SCM) involves planning and coordinating all the people, processes, and technology involved in creating value for a company. This spans cross-cutting processes, including purchasing/procurement, logistics, operations/manufacturing, and others. Automation, robustness, flexibility, real-time operation, and hybrid deployment (edge + cloud) are essential for future success and a prerequisite for end-to-end visibility across the supply chain, regardless of industry. (Source: Aadini, YouTube channel)

The challenge with logistics and supply chain is that you have a lot of commercial off-the-shelf applications (ERP, WMS, TMS, DPS, CRM, SRM, etc.) in place that are highly specialized and highly advanced in their function.

Challenges of Batch Workloads Across the Supply Chain

Batch workloads create many issues in a supply chain. The Covid pandemic showed this:

- Missing information: Intra-logistics within a distribution or fulfillment center, across buildings and regions, and between business partners.
- Rising cost: Lower overall equipment efficiency (OEE), lower availability, and increased cost for production and buyers (B2B and B2C).
- Customer churn: Bad customer experience and contract discussions as a consequence of failing to meet delivery guarantees and other SLAs.
- Decreasing revenue: Less production and/or fewer sales mean less revenue.

Specialized systems like an ERP, WMS, TMS, DPS, CRM, or SRM are often modernized and real-time or nearly real-time in their operation. However, the integrations between them are often still not real-time. For instance, batch waves in a WMS are being replaced with real-time order allocation processes, but the link between the WMS and the ERP is still batch.

Real-Time End-to-End Monitoring

A supply chain control tower provides end-to-end visibility and real-time monitoring across the supply chain (source: Aadini, YouTube channel). The control tower helps answer questions such as: What is happening now? Why is this happening? What might happen next? How can we perform better? What if...?

A supply chain control tower combines technology, processes, and people. To learn more, check out Aadini's "Supply Chain Control Tower" on YouTube for a simple explanation. Most importantly, the evolution of software enables real-time automation instead of just human visual monitoring.

A Kafka-Native Supply Chain Control Tower

Apache Kafka is the de facto standard for data streaming.
It enables real-time data integration and processing at any scale, which is a crucial requirement for building a supply chain control tower. Let's recap the added value of Kafka for improving the supply chain in one picture of business value across use cases: if you look at the definitions of a supply chain control tower, that is more or less the same thing. A supply chain control tower is only possible with real-time data correlation, which is why Kafka is the perfect fit.

Global Data Mesh for Real-Time Data Sharing

Supply chains are global and rely on collaboration between independent business units within enterprises, and on B2B communication. (Source: Aadini, YouTube channel)

The software industry pitches a new paradigm and architecture pattern these days: the data mesh. It allows independent, decoupled domains, each using its own technology, APIs, and data structures, while the data sharing between them is standardized, compatible, real-time, and reliable. The heart of such a data mesh beats in real time with the power of Apache Kafka: Kafka-native tools like MirrorMaker or Confluent Cluster Linking enable reliable real-time replication across data centers, clouds, regions, and even between independent companies.

Kafka as the Data Hub for a Real-Time Supply Chain Control Tower

A supply chain control tower coordinates demand, supply, trading, and logistics across various domains, technologies, APIs, and business processes. The control tower is not just a Kafka cluster — Kafka underpins the business logic, integrations, and visualizations that aggregate all the data from the various sources and make it more valuable from end to end in real time. The data communication happens in real time, near real time, batch, or request-response, depending on what the source and sink applications support. However, the heart of the enterprise architecture is real-time and scalable. This enables future modernization and the replacement of legacy batch systems with modern real-time services. (A minimal producer sketch for feeding such a control tower topic appears at the end of this article.)

Visualization and Automatic Resolutions as Core Components of a Modern Control Tower

The first goal of a supply chain control tower is end-to-end visibility in real time. Creating real-time dashboards, alerts, and integrations with third-party monitoring tools delivers massive value. However, the pipeline built for it enables several additional new use cases. Data integration and correlation across legacy and modern applications is the significant change for innovation regarding supply chain optimization. Instead of merely visualizing in real time what happens, new applications can take automated actions and decisions to solve problems automatically based on real-time information.

Real-World Kafka Examples Across the Supply Chain for Visibility and Automation

Modern supply chains rely on real-time data across suppliers, fulfillment centers, warehouses, stores, and consumers. Here are a few real-world examples across verticals using Apache Kafka as a real-time supply chain control tower:

- Retail: Walmart monitors information across the supply chain, including distribution centers, fulfillment centers, stores, mobile apps, and B2B interfaces. Example use cases are local and global inventory management and replenishment systems.
- Manufacturing: BMW ingests data from its smart factories into the cloud to provide access to the data for visibility and new automation applications. Baader built a real-time locating system (RTLS) to optimize end-to-end monitoring and advanced calculations like routing and estimated arrival in real time.
- Logistics: DHL, Swiss Post, Hermes, SBB, and many other companies in the logistics and transportation space built real-time track-and-trace platforms to optimize the visibility and efficiency of their business processes.
- Mobility services: Almost any ride-hailing or food delivery app across the planet, like Uber, Lyft, Grab, or Free Now (formerly MyTaxi), leverages Apache Kafka for real-time notifications to the front end and data correlation in the backend to implement real-time driver-rider matching, routing, payment, fraud detection, and many other use cases.
- Food: Albertsons, Instacart, Domino's Pizza, Migros, and many other food-related enterprises improve business processes across the food value chain with real-time services.

No matter what industry you work in, learn from other companies across verticals. The challenges of improving the supply chain are very similar everywhere.

Postmodern ERP/MES/TMS Powered by Apache Kafka

While the supply chain control tower provides end-to-end visibility, each individual system has similar requirements. Enterprise resource planning (ERP) has existed for many years. It is often monolithic, complex, proprietary, batch-based, and not scalable. The same is true for MES, TMS, and many other platforms for logistics and supply chain management. Postmodern ERP represents the next generation of ERP architectures: it is real-time, scalable, and open. A postmodern ERP combines open source technologies and proprietary standard software, and many solutions are cloud-native or even offered as fully managed SaaS. Like end users, software vendors leverage data streaming with Apache Kafka to implement a postmodern ERP, MES, CRM, SRM, or TMS.

Logistics and Supply Chain Management Require Real-Time Data

Data streams grow exponentially along the value chain. End-to-end visibility of the supply chain is crucial for optimized business processes; real-time order management and inventory management are just two examples. Visualization and taking action in real time via automated processes either reduces cost and risk or increases revenue and improves the customer experience. A real-time supply chain control tower enables this kind of innovation. The foundation of such a strategic component needs to be real-time, scalable, and reliable. That's why the data streaming platform Apache Kafka is the perfect fit for building a control tower. Various success stories across industries prove the value of data streaming across the supply chain. Even vendors of supply chain software like ERP, WMS, TMS, DPS, CRM, or SRM build their next-generation products on top of Apache Kafka.

How do you bring visibility into your supply chain? Did you already build a real-time control tower? What role does data streaming play in these scenarios? Let's connect on LinkedIn and discuss it!
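To make the integration layer described above more concrete, here is a minimal sketch of a Java producer that publishes shipment-status events to a Kafka topic a control tower could consume. It uses the standard Apache Kafka client API; the broker address, topic name, key, and JSON payload are illustrative assumptions, not part of any of the systems referenced in the article.

Java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class ShipmentEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address is an assumption for local testing.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic and payload: one status event per shipment, keyed by shipment ID
            // so all events for a shipment land in the same partition and stay ordered.
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "shipment-status",
                    "shipment-4711",
                    "{\"status\":\"DEPARTED\",\"hub\":\"LEJ\",\"ts\":1680000000000}");
            producer.send(event);
            producer.flush();
        }
    }
}

A control tower service, for example a Kafka Streams application or a consumer feeding a dashboard, would subscribe to such topics and correlate them with ERP, WMS, and TMS data in real time.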

By Kai Wähner CORE
Explainer: Building High Performing Data Product Platform

Data is everything, and enterprises are pushing the limits to capture, manage, and utilize it optimally. However, given the monumental rise of Web3, companies may not be able to sustain themselves with conventional data management techniques. Instead, they are inclined toward futuristic analytics and need a stronger architecture to manage their data products. As per Forbes, in 2019, 95% of organizations were not managing their unstructured data and ultimately lost out on valuable opportunities. As we know, a "data product" is an engineered, reusable data asset with a targeted purpose. A data product platform integrates with multiple source systems, processes data, and makes it instantly available to all stakeholders. Building a high-performing data product needs a strategy and clarity about the essential functionalities. Here's a quick overview.

Expectation Scoping: The Bare Minimum That the Platform Should Deliver

An ideal data product platform should enable end-to-end test data management of products across various stages, such as engineering, testing, deploying, and monitoring, and should accommodate a broad variety of workloads. It should enable data teams to seamlessly define and maintain the metadata for data products, including the schema, policies, connectors, governance, etc. Furthermore, given the rapid increase in the rate of real-time data generation, the platform should manage every data set efficiently while providing it on demand. Among others, the data product platform should deliver the following functions:

- End-to-end product monitoring: Total visibility into the performance of the data product and its utility, plus the ability to start/pause/stop data flows.
- Product cataloging: Enable users to build relationships between data products, their sources, and the end users, all represented in a knowledge graph.
- Built-in business analytics: Create and analyze smart reports through interactive dashboards.
- Support for the iPaaS model: Public cloud deployment with SSO authentication, multi-tenancy support, etc.

(A small illustrative sketch of such data product metadata appears at the end of this article.)

Faster Streaming: In-the-Moment Data Processing Like Never Before

At the core of every analytics application is a high-performing data product platform that streams and manages data in real time. In pursuit of this, data enterprises are building competitive fabric and mesh products. K2view, for example, has successfully implemented micro databases to enable business entity-level data storage. This means their data product platform manages millions of micro databases, each of which only stores data for a particular business entity. It empowers the platform to achieve enterprise-grade resilience, scale, and agility. Their platform performs end-to-end management in iterative delivery cycles that cover design and engineering, deployment and testing, and monitoring and maintenance. In addition, since billions of micro databases can be managed in a single deployment, it ensures faster filtering and streaming.

Data Manager-Centric: Interactive, Easy to Adapt, and All-Inclusive

Data managers are an integral asset, as they develop strategies and define performance metrics for the platform. While they should have a diverse range of analytical and DevOps skills, an intelligent platform can utilize them optimally. To maximize business value and return on data investment, build a low-code/no-code integrated development environment (IDE). This will enable managers to seamlessly execute the building, testing, and deploying of products.
Furthermore, it would simplify schema creation, transformation logic, orchestration, integrations, and more. Data managers must bridge the gaps between data consumers and engineers across domains by continuously communicating their needs. They capture the requirements of the consumers and collaborate with engineers and scientists. A high-performing and interactive data product platform would automate this process as far as feasible.

Flexible Architecture: Adaptive to Different System Landscapes and Operational Models

Like all contemporary products, data platforms should be deployable across the system landscape: on-premise, in the cloud, or both. Not only does this maximize flexibility, but it also makes the solution easily scalable. You can choose either data mesh or data fabric as your fundamental architectural construct. While a fabric follows a modular and centralized framework, a mesh implements a federated data strategy. In a data fabric, the centralized structure integrates the data with the analytical tools while enabling a central entity to define the data products; furthermore, it adapts to changes over time based on metadata analysis. The mesh, on the contrary, decentralizes the architecture. It enables the business domains to define and create data products as needed. With autonomy, business domains can create and scale product-centric solutions in real time and with more finesse. While there is no bottom line to the fabric vs. mesh debate, a data product platform should provide enough flexibility to adapt to both.

Get Ready for Web3

Data-driven organizations have an early-entrant opportunity to excel in Web3. However, to derive value from trusted, high-quality data sets, they should adopt data product platforms that embrace all the above features. Since data products drive operational and analytical workloads, enterprises should focus on building a high-performing data management landscape. In this blog, I discussed the key components that make up a Web3-ready data product platform. I hope this covers it all. I'd like to know more about your data product platform expectations and strategy.
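As promised above, here is a small, purely illustrative sketch of how the data product metadata listed in the "Expectation Scoping" section (schema, connectors, governance policies, ownership) could be modeled. The field names and structure are assumptions for illustration, not the API of any platform mentioned in this article; it requires Java 16+ for records.

Java
import java.util.List;
import java.util.Map;

// Illustrative only: a minimal descriptor for a data product's metadata.
public record DataProductDescriptor(
        String name,                       // e.g., "customer-360"
        String owner,                      // owning domain or team
        Map<String, String> schema,        // field name -> type
        List<String> connectors,           // source/sink systems, e.g., "postgres", "kafka"
        Map<String, String> policies) {    // governance policies, e.g., "pii" -> "masked"

    public static DataProductDescriptor example() {
        return new DataProductDescriptor(
                "customer-360",
                "customer-domain-team",
                Map.of("customerId", "string", "lifetimeValue", "decimal"),
                List.of("postgres", "kafka"),
                Map.of("pii", "masked", "retentionDays", "365"));
    }
}

A catalog service could store such descriptors and expose the relationships between products, sources, and consumers in the knowledge graph described above.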

By Yash Mehta
What Should You Know About Graph Database’s Scalability?

Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. On the one hand, this is heavily influenced by the sustained rise and popularity of big-data processing frameworks, including but not limited to Hadoop, Spark, and NoSQL databases. On the other hand, as more and more data needs to be analyzed in a correlated and multi-dimensional fashion, it is getting difficult to pack all data into one graph on one instance, so a truly distributed and horizontally scalable graph database becomes a must-have.

Do Not Be Misled

Designing and implementing a scalable graph database system has never been a trivial task. Countless enterprises, particularly Internet giants, have explored ways to make graph data processing scalable. Nevertheless, most solutions are either limited to their private and narrow use cases or offer scalability in a vertical fashion with hardware acceleration. This only proves, again, what the replacement of mainframe computers by PC-architecture computers in the 90s already showed: vertical scalability is generally considered inferior and less capable than horizontal scalability.

It has become the norm to assume that distributed databases achieve scalability (of storage and computing) by adding cheap PCs and attempt to store data once and serve it on demand. However, doing the same on graph systems cannot achieve equivalent scalability without massively sacrificing query performance. Why is scalability in a graph database system so difficult to get? The primary reason is that a graph system is high-dimensional. This is in deep contrast to traditional SQL or NoSQL systems, which are predominantly table-centric (essentially columnar and row stores, or KV stores in a more simplistic way) and have proved to be relatively easy to implement with a horizontally scalable design. A seemingly simple and intuitive graph query may lead to deep traversal and penetration of a large amount of graph data, which tends to cause a typical BSP (Bulk Synchronous Parallel) system to exchange data heavily amongst its many distributed instances, therefore causing significant (and unbearable) latencies.

On the other hand, most existing graph systems prefer to sacrifice performance (computation) while offering scalability (storage). This renders such systems impractical and useless in handling many real-world business scenarios. A more accurate way to describe such systems is that they can probably store a large amount of data (across many instances) but cannot offer adequate graph-computing power — to put it another way, these systems fail to return results when queried beyond metadata (nodes and edges).

This article aims to demystify the scalability challenges of graph databases while putting a lot of focus on performance issues. Simply put, you will gain a better and unobstructed understanding of scalability and performance in any graph database system and more confidence in choosing your future graph system. There is quite a bit of noise in the market about graph database scalability; some vendors claim they have unlimited scalability, while others claim to be the first enterprise-grade scalable graph databases. Who should you believe or follow?
The only way out is to equip yourself with adequate knowledge about scalability in graph database systems so that you can validate claims yourself and don't have to be misled by all the marketing hype. Admittedly, there are many terms around graph database scalability, and some can be dearly confusing, to name a few: HA, RAFT or distributed consensus, HTAP, federation, fabric, sharding, partitioning, etc. Can you really tell the difference, sometimes minute and often with overlapping features, between all these terms? We'll unravel them all.

3 Schools of Distributed Graph System Architecture Designs

First, make sure you understand the evolution pathway from a standalone (graph database) instance to a fully distributed and horizontally scalable cluster of graph database instances. (Graph 1: Evolution of Distributed (Graph) Systems.)

A distributed system may take many forms, and this rich diversification may lead to confusion. Some vendors misleadingly (and ironically) claim their database systems to be distributed evenly on a single piece of underpinning hardware, while other vendors claim their sharded graph database cluster can handle zillion-scale graph datasets while, in reality, the cluster can't even handle a typical multi-hop graph query or a graph algorithm that iteratively traverses the entire dataset. Simply put, there are ONLY three schools of scalable graph database architecture designs, as captured in Table 1 (Comparison of three schools of Distributed Graph Systems).

HTAP Architecture

The first school is considered a natural extension of the master-slave model, and we call it the distributed consensus cluster, where typically three instances form a graph database cluster. The only reason to have three (or another odd number of) instances in the same cluster is that it's easier to vote for a leader of the cluster. This model of cluster design may have many variations; for instance, Neo4j's Enterprise Edition v4.x supports the original RAFT protocol, where only one instance handles the workload while the other two instances passively synchronize data from the primary instance — this, of course, is a naïve way of putting the RAFT protocol to work. A more practical way to handle the workload is to augment the RAFT protocol to allow all instances to work in a load-balanced way, for instance, having the leader instance handle read-and-write operations while the other instances handle at least read-type queries, ensuring data consistency across the entire cluster. A more sophisticated variant of this type of distributed graph system design allows for HTAP (Hybrid Transactional and Analytical Processing), meaning varied roles are assigned amongst the cluster instances: the leader handles TP operations, while the followers handle AP operations, which can be further broken down into roles for graph algorithms, etc.

The pros and cons of a graph system leveraging distributed consensus include:

- Small hardware footprint (cheaper).
- Great data consistency (easier to implement).
- Best performance on sophisticated and deep queries.
- Limited scalability (relying on vertical scalability).
- Difficult to handle a single graph with over ten billion nodes and edges.

What's illustrated below is a novel HTAP architecture from Ultipa with key features like:

- High-density parallel graph computing.
- Multi-layer storage acceleration (storage is in close proximity to compute).
- Dynamic pruning (expedited graph traversal via a dynamic trimming mechanism).
- Super-linear performance (i.e., when computing resources such as the number of CPU cores are doubled, the performance gain can be more than doubled).

(Graph 2: HTAP Architecture Diagram by Ultipa Graph.)

Note that such an HTAP architecture works wonderfully on graph data sizes below 10B nodes + edges, because a lot of the computing acceleration is done via in-memory computing; if every billion nodes and edges consumes about 100 GB of DRAM, it may take 1 TB of DRAM on a single instance to handle a graph of ten billion nodes and edges. The upside of such a design is that the architecture is satisfactory for most real-world scenarios. Even for G-SIBs (Globally Systemically Important Banks), a typical fraud detection, asset-liability management, or liquidity risk management use case would involve around one billion data points; a reasonably sized virtual machine or PC server can decently accommodate such a data scale and be very productive with an HTAP setup. The downside of such a design is the lack of horizontal (and unlimited) scalability. This challenge is addressed in the second and third schools of distributed graph system designs (see Table 1).

The two graphs below show the performance advantages of the HTAP architecture. There are two points to watch out for:

- Linear performance gain: A 3-instance Ultipa HTAP cluster's throughput can reach ~300% of a standalone instance. The gain is reflected primarily in AP-type operations such as metadata queries, path/k-hop queries, and graph algorithms, but not in TP operations such as insertions or deletions of metadata, because these operations are done primarily on the main instance before being synchronized with the secondary instances.
- Better performance = lower latency and higher throughput (TPS or QPS).

(Graph 3: Performance Advantages of HTAP Architecture. Graph 4: TPS comparison of Ultipa and Neo4j.)

Grid Architecture

In the second school, there are also quite a few naming variations for such types of distributed and scalable graph system designs (some are misleading), to name a few: proxy, name server, MapReduce, grid, or federation. Ignore the naming differences; the key difference between the second school and the first school lies in the name server(s) functioning as a proxy between the client side and the server side. When functioning as a proxy server, the name server is only for routing queries and forwarding data. On top of this, except for running graph algorithms, the name server has the capacity to aggregate data from the underpinning instances. Furthermore, in federation mode, queries can be run against multiple underpinning instances (query federation); for graph algorithms, however, federation's performance is poor (due to data migration, just like how map-reduce works).

Note that the second school differs from the third school in one area: data is functionally partitioned but not sharded in this school of design. For graph datasets, functional partitioning is the logical division of graph data, such as per time series (horizontal partitioning) or per business logic (vertical partitioning). Sharding, on the other hand, aims to be automated and ignorant of business logic or time series. Sharding normally considers the location of network storage-based partitioning of data; it uses various redundant data and special data distributions to improve performance, such as making cuts against nodes and edges on the one hand and replicating some of the cut data for better access performance on the other.
In fact, sharding is very complex and difficult to understand. Automated sharding, by definition, is designed to handle unpredictable data distribution with minimal-to-zero human intervention and in a business-logic-ignorant way, but this ignorance can be very problematic when facing business challenges entangled with specific data distributions. Let's use a concrete example to illustrate this. Assume you have 12 months' worth of credit card transaction data. In artificial partition mode, you naturally divide the network of data into 12 graph sets, one graph set with one month of transactions on each cluster of three instances, and this logic is predefined by the database admin. It emphasizes dividing the data via the metadata of the database and ignores the connectivity between the different graph sets. It's business-friendly, it won't slow down data migration, and it has good query performance. In auto-sharding mode, on the other hand, it's up to the graph system to determine how to divide (cut) the dataset, and the sharding logic is transparent to the database admin. It's hard for developers to immediately figure out where the data is stored, which can lead to slow data migration problems. It would be imprudent to claim that auto-sharding is more intelligent than functional partitioning simply because auto-sharding involves less human intervention. Do you feel something is wrong here? It's exactly what we are experiencing with the ongoing rise of artificial intelligence: we are allowing machines to make decisions on our behalf, and it's not always intelligent! (In a separate essay, we will cover the global transition from artificial intelligence to augmented intelligence and why graph technology is strategically positioned to empower this transition.)

In Graph 5, a grid architecture pertaining to the second school of design is illustrated; the two extra components added on top of Graph 2's HTAP architecture are the name server(s) and meta server(s). Essentially, all queries are proxied through the name server, and the name server works jointly with the meta server to ensure the elasticity of the grid; the server cluster instances are largely the same as the original HTAP instances (as illustrated in Graph 2). (Graph 5: Grid Architecture w/ Name Server and Meta Server.)

Referring to Table 1, the pros and cons of the grid architecture design can be summarized as follows:

- All the pros/benefits of a typical HTAP architecture are retained.
- Scalability is achieved with performance intact (compared to the HTAP architecture).
- Restricted scalability — server clusters are partitioned with DBA/admin intervention.
- Introduction of the name server/meta server makes cluster management more sophisticated.
- The name server is critical and complex in ensuring business logic is performed distributively on the server clusters, with simple merge and aggregation functionality applied on it before returning results to the clients.
- Business logic may be required to cooperate with partitioning and querying.

Shard Architecture

Now, we can usher in the third school of distributed graph system design with unlimited scalability — the shard (see Table 1). On the surface, the horizontal scalability of a sharding system also leverages a name server and meta server, as in the second school of design, but the main differences lie in the following:

- Shard servers are genuinely sharded.
- Name servers do NOT have direct knowledge of business logic (as they do in the second school).
Indirectly, the name servers can roughly judge the category of business logic via automatic statistics collection. This decoupling is important, and it couldn't be achieved elegantly in the second school.

The sharded architecture has some variations; some vendors call it fabric (which is actually more like the grid architecture of the second school), and others call it map-reduce, but we should dive deep into the core data processing logic to unravel the mystery. There are only two types of data processing logic in a shard architecture:

- Type 1: Data is processed mainly on the name servers (or proxy servers).
- Type 2: Data is processed on the sharded or partitioned servers as well as on the name servers.

Type 1 is typical, as you see in most map-reduce systems such as Hadoop; data is scattered across the highly distributed instances. However, it needs to be lifted and shifted over to the name servers before it is processed there. Type 2 is different in that the shard servers have the capacity to process the data locally (this is called compute near or collocated with storage, or data-centric computing) before it is aggregated and secondarily processed on the name servers. As you would imagine, type 1 is easier to implement, as it's a mature design scheme used by many big-data frameworks; however, type 2 offers better performance at the cost of a more sophisticated cluster design and query optimization. Shard servers in type 2 offer computing power, while those in type 1 have no such capability. The graph below shows a type-2 shard design. (Graph 6: Shard Architecture w/ Name Server and Meta Server.)

Sharding is nothing new from a traditional SQL or NoSQL big-data framework design perspective. However, sharding on graph data can be a Pandora's box, and here is why:

- Multiple shards will increase I/O performance, particularly data ingestion speed.
- But multiple shards will significantly increase the turnaround time of any graph query that spans multiple shards, such as path queries, k-hop queries, and most graph algorithms (the latency increase can be exponential!). A toy sketch at the end of this article illustrates why such queries fan out across shards.
- Graph query planning and optimization can be extremely sophisticated; most vendors today have gone only very shallowly on this front, and there are tons of opportunities for deepening on-the-fly query optimization: cascades (heuristic vs. cost), partition pruning (shard pruning, actually), index choosing, statistics (smart estimation), pushdown (making computing as close to the storage as possible), and more.

In Graph 7, we captured some preliminary findings on the Ultipa HTAP cluster versus the Ultipa shard cluster; as you can see, data ingestion speed improves by four times (super-linear), but everything else tends to be slower by five times or more (PageRank slower by 10x, LPA by 16x, etc.). (Graph 7: Preliminary findings on the performance difference between HTAP and Shard Architecture.)

Stay Tuned

There are tons of opportunities to continuously improve the performance of the sharding architecture. The team at Ultipa has realized that a truly advanced cluster management mechanism and deeper query optimization on a horizontally scalable system are the keys to achieving endless scalability and satisfactory performance. Lastly, the three schools of distributed graph system architectures illustrate the diversity and complexity involved in designing a sophisticated and competent graph system.
Of course, it's hard to say one architecture is absolutely superior to another, given cost, subjective preference, design philosophy, business logic, complexity tolerance, serviceability, and many other factors — but it would be prudent to conclude that the long-term direction of architecture evolution is clearly to go from the first school to the second school and eventually to the third school. However, most customer scenarios can be satisfied with the first two schools, and human intelligence (DBA intervention) still plays a pivotal role in achieving an equilibrium of performance and scalability, particularly in the second and third schools of design.

Long live the formula: Graph Augmented Intelligence = Human intelligence + Machine's graph-computing power.
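To illustrate the cross-shard fan-out problem mentioned above, here is a toy, self-contained Java sketch (not any vendor's API): nodes of a small random graph are assigned to shards by hashing their IDs, and a 3-hop BFS counts how many edge traversals would have to cross a shard boundary. With business-logic-ignorant hash sharding across four shards, roughly three quarters of the traversed edges in this toy setup land on a different shard than their source node, which is exactly where the network round-trips and latency come from.

Java
import java.util.*;

public class ShardedTraversalDemo {
    // Assign a node to a shard purely by hashing its ID (business-logic ignorant).
    static int shardOf(long nodeId, int shardCount) {
        return Math.floorMod(Long.hashCode(nodeId), shardCount);
    }

    public static void main(String[] args) {
        int shards = 4;
        int nodes = 1000;
        Random rnd = new Random(42);

        // Build a small random directed graph: every node gets five outgoing edges.
        Map<Long, List<Long>> adjacency = new HashMap<>();
        for (long v = 0; v < nodes; v++) {
            List<Long> neighbors = new ArrayList<>();
            for (int e = 0; e < 5; e++) {
                neighbors.add((long) rnd.nextInt(nodes));
            }
            adjacency.put(v, neighbors);
        }

        // 3-hop BFS from node 0, counting how many traversed edges cross shard boundaries.
        Set<Long> visited = new HashSet<>(List.of(0L));
        List<Long> frontier = List.of(0L);
        long traversed = 0, crossShard = 0;
        for (int hop = 0; hop < 3; hop++) {
            List<Long> next = new ArrayList<>();
            for (long u : frontier) {
                for (long v : adjacency.getOrDefault(u, List.of())) {
                    traversed++;
                    if (shardOf(u, shards) != shardOf(v, shards)) {
                        crossShard++; // would require a network call in a sharded cluster
                    }
                    if (visited.add(v)) {
                        next.add(v);
                    }
                }
            }
            frontier = next;
        }
        System.out.printf("edges traversed: %d, cross-shard: %d (%.0f%%)%n",
                traversed, crossShard, 100.0 * crossShard / traversed);
    }
}

Functional partitioning (for example, one partition per month of transactions, as in the credit card example above) avoids much of this fan-out for queries that stay within one partition, which is the trade-off the article describes.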

By Ricky Sun
Redis: What It Is, What It Does, and Why You Should Care

Redis is an open-source in-memory data store that can be used as a database, cache, or message broker. It's often used for caching web pages and reducing the load on servers. Redis also has some features that make it attractive for use as a database, such as support for transactions and publish/subscribe messaging. However, it doesn't have all the features of a traditional database like MySQL or MongoDB. In this blog post, we'll take a look at what Redis is, what it does, and why you might want to consider using it in your next project. We'll also take a look at some of the benefits and drawbacks of using Redis for caching. Finally, we will introduce you briefly to Hive, a Codeless implementation of Redis in the Backendless platform.

What Is Redis?

Redis is an open-source in-memory data store that works really well as a cache or message broker, but it can also be used as a database when you don't need all the features of a traditional database. It offers excellent performance, with the ability to quickly read and write data to memory. Additionally, Redis supports atomic operations, making it ideal for caching scenarios where you need fast access times.

In-Memory Database

An in-memory database is a type of database that stores data entirely in the main memory (RAM) rather than on disk. In-memory databases are designed to provide fast access to data by leveraging the high speed of main memory, which is several orders of magnitude faster than disk storage. In-memory databases are commonly used in applications that require fast access to large amounts of data, such as real-time analytics, online gaming, e-commerce, and social media. They are also used in applications that require high performance and scalability, as in-memory databases can handle high volumes of data and transactions without sacrificing performance. One of the main drawbacks of in-memory databases is that they are more sensitive to data loss in the event of a crash or shutdown, as the data is stored entirely in memory and is not persisted to disk. To address this issue, many in-memory databases, including Redis, provide features such as persistence and replication, which allow data to be saved to disk and replicated across multiple servers to ensure data durability and availability.

Redis Persistence

Redis persistence is a feature of the Redis database that allows data to be saved to disk and restored in the event of a crash or shutdown. By default, Redis stores data in memory, which means that it is lost when the Redis server is shut down or restarted. Redis persistence enables data to be saved to disk and restored when the Redis server starts up again, ensuring that data is not lost in the event of a crash or shutdown. Redis persistence can be configured in several ways, depending on the needs of the application. The simplest form of persistence is snapshotting, which involves periodically saving the entire Redis dataset to disk. This approach is fast and efficient but can result in data loss if the Redis server crashes between snapshots. Another form of persistence is append-only file (AOF) persistence, which involves saving each write operation to a log file on disk. This approach provides better durability than snapshotting, allowing the Redis server to recreate the dataset by replaying the log file in the event of a crash. However, it can be slower and more resource-intensive than snapshotting.
Overall, Redis persistence is a valuable feature that allows data to be saved to disk and restored in the event of a crash or shutdown, ensuring data durability and availability.

What Is Redis Used for?

Redis is often used for caching web pages, reducing the load on servers, and improving page loading times. It can also be used as a message broker to facilitate communication between different parts of an application. Additionally, Redis supports transactions, making it possible to execute multiple operations atomically. Let's look at some specific Redis use cases:

- Real-time analytics: Applications can use Redis to store and process large amounts of data in real time, allowing organizations to quickly analyze and visualize data to make business decisions.
- Online gaming: Gaming software can use Redis to store and manage game state, such as player profiles, game scores, and leaderboards, allowing fast and seamless gameplay.
- E-commerce: E-commerce apps can use Redis to store and manage data related to online shopping, such as product catalogs, user profiles, and shopping cart contents, which enables fast and efficient shopping experiences for users.
- Social media: Social apps can use Redis to store and manage data related to social media interactions, such as user profiles, friend lists, and news feeds, which allows for fast and smooth user experiences.

What Are Key-Value Pairs?

In Redis, a key-value pair is a data structure that consists of a unique key, which is used to identify the data, and a value, which is the data itself. Key-value pairs are the most basic data structure in Redis, and they are used to store and manage data in the database. Redis supports a wide range of data types for keys and values, including strings, hashes, lists, sets, and sorted sets. This allows developers to store and manipulate various data types in Redis, such as text, numbers, arrays, and complex data structures. Redis provides a rich set of commands for working with key-value pairs, such as SET, GET, and DEL for strings, HSET, HGET, and HDEL for hashes, and LPUSH, LRANGE, and LREM for lists. These commands enable developers to store, retrieve, and manipulate data in Redis efficiently and easily.

Rich Data Structures

Data structures in Redis are collections of data that are organized and managed in a specific way to support efficient operations. For example, the string data type in Redis is a sequence of bytes that can be used to store and manipulate text or binary data. The hash data type, on the other hand, is a mapping of field-value pairs that can be used to store and manipulate complex data structures. Each data structure in Redis has its own set of operations that can be performed on it, such as GET, SET, and DEL for strings, HGET, HSET, and HDEL for hashes, and LPUSH, LPOP, and LRANGE for lists. These operations enable developers to efficiently store, retrieve, and manipulate data in Redis. Overall, data structures in Redis are an important aspect of the framework, as they provide the underlying foundation for efficient data management and manipulation.

Chat and Messaging Applications

To support chat and messaging applications, Redis can be used to store and manage data related to conversations, users, and messages. For example, Redis can be used to store information about individual conversations, such as the participants and the latest messages. It can also be used to store information about individual users, such as their profile details and their list of contacts.
Finally, Redis can be used to store the actual messages themselves, along with metadata such as the sender, recipient, and timestamp. In addition to storing data, Redis can also be used to manage messaging operations, such as delivering messages to recipients, broadcasting messages to multiple recipients, and storing messages for offline users. These capabilities make Redis a powerful tool for building chat and messaging applications that are fast, scalable, and reliable.

Session Store

A session store is a mechanism for storing user session data in a web application. In a Redis session store, session data is stored in a Redis database, which is a fast, in-memory data structure store that can be used as a cache, database, and message broker. The data is stored as key-value pairs, where the key is a unique identifier for the session, and the value is the session data itself, which may include information such as the user's login status, preferences, and shopping cart contents. (A small code sketch at the end of this article shows this pattern.) The benefits of using a Redis session store include improved performance and scalability, as Redis can store and retrieve session data quickly and efficiently, even when dealing with large amounts of data. Additionally, Redis allows session data to be shared across multiple servers, which can be useful in a load-balanced environment.

What Are the Benefits of Redis?

One of the main advantages of using Redis for caching is its fast read and write speeds. Redis can handle millions of operations per second, which allows it to serve webpages faster than traditional databases. It also offers excellent support for transactions, allowing applications to perform multiple operations atomically. Additionally, Redis supports the use of pub/sub channels for fast data sharing between applications. Redis is also highly scalable and can be deployed across multiple machines for high availability. This makes it ideal for distributed systems that need to quickly process large amounts of data. For example, Redis can be used to store session information in a distributed system and provide quick access to that data across multiple servers. This also makes Redis incredibly powerful for gaming applications, because it can quickly and efficiently share data across multiple nodes in near real time. In addition to excellent performance, another advantage of Redis is that it offers a number of features that are not available in traditional databases. These include pub/sub, which allows you to publish messages and subscribe to them, as well as transactions and Lua scripting. These features can be used to build powerful applications that are not possible with traditional databases.

What Is Lua Scripting?

Lua scripting is a technique for writing and executing scripts in the Lua programming language within a host application. Lua is a lightweight, versatile, and embeddable scripting language that is widely used for writing scripts that can be run within other applications. In the context of Redis, Lua scripting allows developers to write and execute scripts that manipulate data stored in a Redis database. Redis provides a built-in scripting engine that supports Lua, which allows developers to write scripts that can be executed within the Redis server. One of the main advantages of Lua scripting in Redis is that it allows developers to write complex operations that can be executed atomically and in a single step.
This means that the scripts can manipulate data in Redis without interference from other operations, ensuring data consistency and integrity. Overall, Lua scripting is a powerful and flexible tool that can be used within Redis to write and execute complex operations on data stored in the database.

What Are the Drawbacks of Using Redis?

Like any technology, Redis has some drawbacks that should be considered when deciding whether to use it in a particular application. One of the main drawbacks of Redis is that it stores data entirely in memory, which means that it can be sensitive to data loss in the event of a crash or shutdown. To address this issue, Redis provides features such as persistence and replication, which allow data to be saved to disk and replicated across multiple servers. However, these features can add complexity and overhead, which may not be suitable for all applications. Another drawback of Redis is that it is a single-threaded system, meaning it can only process one command at a time. This can limit Redis's performance and scalability in applications requiring high concurrency and parallelism. To address this issue, Redis provides clustering and sharding features that allow data to be distributed across multiple servers, but these features can be complex to set up and manage.

Closing

Overall, Redis is an excellent tool for caching web pages and reducing server load, but it also has some features that can be used to create powerful distributed applications. It's fast, scalable, and supports advanced features like pub/sub and Lua scripting. However, it does have some drawbacks, such as the need for additional memory and the lack of ACID compliance or support for joins. Take all this into consideration before using Redis in your project.

Frequently Asked Questions

How does Redis compare to other NoSQL database systems? Redis is a type of NoSQL database, which stands for "not only SQL" and refers to a class of databases that do not use the traditional SQL relational model. Compared to other NoSQL databases, Redis has several unique characteristics that make it well-suited for certain applications. For example, one of the main advantages of Redis is its in-memory storage, which allows it to provide fast access to data and high performance. This makes Redis well-suited for applications that require fast access to large amounts of data, such as real-time analytics, online gaming, and e-commerce. Another advantage of Redis is its support for a wide range of data structures, including strings, hashes, lists, sets, and sorted sets. This allows Redis to store and manipulate various data types, making it a versatile and flexible tool for data management.
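To tie the key-value and session-store ideas above to actual code, here is a minimal sketch assuming the Jedis Java client is on the classpath. The host, port, key names, and session fields are illustrative assumptions; the underlying commands (SET, GET, HSET, HGETALL, EXPIRE) are the standard Redis commands discussed in the article.

Java
import redis.clients.jedis.Jedis;
import java.util.Map;

public class RedisSessionExample {
    public static void main(String[] args) {
        // Assumes a local Redis server; host and port are illustrative.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Simple key-value caching: cache a rendered page fragment.
            jedis.set("page:home", "<html>...cached content...</html>");
            String cached = jedis.get("page:home");

            // Session store: one hash per session, keyed by a session ID.
            String sessionKey = "session:abc123";
            jedis.hset(sessionKey, "userId", "42");
            jedis.hset(sessionKey, "loggedIn", "true");
            jedis.hset(sessionKey, "cart", "[\"sku-1\",\"sku-2\"]");
            jedis.expire(sessionKey, 1800); // expire the session after 30 minutes

            Map<String, String> session = jedis.hgetAll(sessionKey);
            System.out.println("cached page length: " + cached.length());
            System.out.println("session: " + session);
        }
    }
}

In a load-balanced deployment, every application server would talk to the same Redis instance (or cluster), which is what allows session data to be shared across servers as described above.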

By Chris Fanchi
Arrays and Hashing

In this article, we will discuss some of the most popular algorithm problems using arrays and hashing approaches. Some of these problems I received during interviews. Let's start with the first problem:

Contains Duplicate

Description: Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Solution: What if we add an additional data structure like a HashSet and put elements inside? If the Set already contains the element before we insert it, we return true, and that is it. So simple, isn't it?

Java
public boolean containsDuplicate(int[] nums) {
    Set<Integer> set = new HashSet<>();
    for (int n : nums) {
        if (set.contains(n)) {
            return true;
        } else {
            set.add(n);
        }
    }
    return false;
}

Moving on to our next task:

Valid Anagram

Description: Given two strings s and t, return true if t is an anagram of s, and false otherwise. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Example 1: Input: s = "anagram", t = "nagaram" Output: true
Example 2: Input: s = "rat", t = "car" Output: false

Solution: First of all, we should understand what an anagram is. Two words are anagrams only if they consist of the same characters, so we should compare characters; the characters can be in a different order. We can use a few approaches to handle this. In the first variant, we can sort the characters in each word and then compare them. Or we can create a HashMap and, for one word, add characters, and for the other, subtract them (a sketch of this counting variant appears at the end of this article). Below is the variant with the sorting algorithm.

Java
public boolean isAnagram(String s, String t) {
    if (s == null && t == null) {
        return true;
    } else if (s == null || t == null) {
        return false;
    }
    if (s.length() != t.length()) {
        return false;
    }
    char[] sCh = s.toCharArray();
    char[] tCh = t.toCharArray();
    Arrays.sort(sCh);
    Arrays.sort(tCh);
    for (int i = 0; i < s.length(); i++) {
        if (sCh[i] != tCh[i]) {
            return false;
        }
    }
    return true;
}

Is it clear? Please, let me know in the comments. Our next problem:

Two Sum

Description: Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.

Example 1: Input: nums = [2,7,11,15], target = 9 Output: [0,1] Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2: Input: nums = [3,2,4], target = 6 Output: [1,2]
Example 3: Input: nums = [3,3], target = 6 Output: [0,1]

Solution: This is one of the basic hashing problems. Let's start with a brute-force solution: two nested loops that iterate over the elements and compare their sums. It works, but the time complexity is O(N^2), and it could be very, very slow. But what if, instead of the second loop, we save all previously seen elements in a HashMap and check it against the current element? For example, say we have the array [3,3] and target = 6. In the first iteration, we put 3 into the map as the key and 0 (its index) as the value. Then, on the next iteration, we check the map for target - cur; in our case, that is 6 - 3 = 3. The map already contains 3, so we take its stored index together with the current index to build the response.
Let's take a look at the code:

Java
public int[] twoSum(int[] nums, int target) {
    int[] rez = new int[2];
    Map<Integer, Integer> map = new HashMap<>();
    for (int i = 0; i < nums.length; i++) {
        int rest = target - nums[i];
        if (map.containsKey(rest)) {
            rez[0] = map.get(rest);
            rez[1] = i;
            return rez;
        } else {
            map.put(nums[i], i);
        }
    }
    return rez;
}

For some of you, these problems may look easy, but not for me. I spent a lot of time trying to find a correct solution. Now we will look at the hardest problem in this article:

Group Anagrams

Description: Given an array of strings strs, group the anagrams together. You can return the answer in any order. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Example 1: Input: strs = ["eat","tea","tan","ate","nat","bat"] Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
Example 2: Input: strs = [""] Output: [[""]]
Example 3: Input: strs = ["a"] Output: [["a"]]

Solution: Do you remember the previous problem with anagrams? I want to use the same approach. We remember that anagrams are words with the same characters and the same character counts. What if we sort the characters in each word and create a string from the result? For example, say we have ["nat", "tan"]. We sort "nat" and receive "ant"; we sort "tan" and again receive "ant". So we can sort each word and put it into a map, where the key is the sorted string and the value is the list of original words. Smart, isn't it? Time to look at the code:

Java
public List<List<String>> groupAnagrams(String[] strs) {
    Map<String, List<String>> map = new HashMap<>();
    for (String s : strs) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        String sorted = String.valueOf(chars);
        if (map.containsKey(sorted)) {
            map.get(sorted).add(s);
        } else {
            List<String> list = new ArrayList<>();
            list.add(s);
            map.put(sorted, list);
        }
    }
    return new ArrayList<>(map.values());
}

I hope you are enjoying this topic. Next time, I'm going to solve more complicated problems. Feel free to add your thoughts in the comments. I really appreciate your time and want to hear your feedback.
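As promised in the Valid Anagram section, here is the counting-based alternative to sorting: a minimal sketch that assumes, as in the examples above, that the strings contain only lowercase English letters. It runs in O(n) time instead of O(n log n).

Java
public boolean isAnagramCounting(String s, String t) {
    if (s.length() != t.length()) {
        return false;
    }
    int[] counts = new int[26]; // one counter per lowercase letter
    for (int i = 0; i < s.length(); i++) {
        counts[s.charAt(i) - 'a']++; // add characters of s
        counts[t.charAt(i) - 'a']--; // subtract characters of t
    }
    for (int c : counts) {
        if (c != 0) {
            return false; // some character appears a different number of times
        }
    }
    return true;
}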

By Sergei Golitsyn
What Is Virtualization?

Today, "virtualization" is a very common term in the software deployment and IT worlds. Most companies leverage this technology not only to deploy their applications; virtualized images are also used by IT departments to provision a new system for a new employee in the organization. Virtualization has made IT infrastructure provisioning very fast, quickly reproducible, and reliable. It has also made debugging, troubleshooting, and the availability of operational infrastructure much better. The use of virtualization has brought many other practices into the IT industry, like containerization. Many evolutions have happened around virtualization, which has made IT operations simpler and more agile today. Companies are able to save a lot on the procurement of hardware infrastructure. Virtualization is also helping to lower carbon emissions, thus helping the cause of sustainability.

Virtualization

Virtualization, in simple words, means creating software simulations of computing resources, network systems, and storage systems. It was used on mainframe computers in the 1960s, but it has broadened over the years. Now virtualization involves creating an abstract software layer for any physical hardware system and sharing it with all users involved. The most popular use of virtualization is to create a virtual machine, or VM for short, which is a fully functional system within a host operating system. We can create multiple VMs on a host and allocate each VM a portion of the host's computing resources, like CPUs, cores, RAM, and storage. The computing capacity of all the VMs put together can never be more than that of the host. Every VM has its own operating system. The operating system of the VM, called the guest operating system, may be different from the host operating system. All the VMs on one host depend on the host for computing resources, which they borrow from the host machine. Once a VM is created, it is stored as a file. Multiple copies of the same VM can be created quickly by simply cloning the first VM. We can copy a VM from one host to another like any other file. It is as simple as that. We can also save the state of a VM and restart it from that previous state.

Virtualization doesn't stop with VMs. With increasing demands and requirements, virtualization technology has grown over the years, and now we can virtualize many things: data centers, networks, and storage systems. Below are the different types of virtualization possible today.

Application Virtualization

If you want to let your users use your application without even installing it, you can use application virtualization. It also helps run your applications in environments that are not suitable for running them natively. BlueStacks, for example, is an application that lets you run Android games on Windows. Application virtualization helps reduce system integration and maintenance costs.

Datacenter Virtualization

Datacenter virtualization is the most complex and in-demand virtualization concept. Leveraging data center virtualization and cloud computing technology, organizations design, deploy, and develop data centers quickly. Datacenter virtualization involves virtualizing everything a physical data center offers: servers, hosts, networking, storage, and other infrastructure and equipment. It uses a broad range of tools and technologies to host multiple virtualized data centers in a standard data center.
The vSphere suite of products is a very good example of technologies that provide data center virtualization, administration, and operation.

Data Virtualization

Data virtualization is used to consolidate all the data stores available in the data center, create a layer of abstraction on top of them, and expose them as a single source. The data stores may be spread across geographies and many clusters. The virtualization layer is agnostic about the underlying type of data store, and data is accessed from its original location. It provides fast, real-time data access and reduces system workloads and data errors.

Desktop Virtualization

Desktop virtualization is different from OS virtualization, where we create multiple VMs on a single host. Desktop virtualization allows an administrator to deploy multiple desktop simulations to many physical machines. It helps with mass deployments across many physical machines and ensures the same configurations and security settings are applied on all the systems.

Hardware Virtualization

Hardware virtualization is used to abstract computing resources from the software. It relies heavily on a virtual machine monitor, called a hypervisor, to accomplish its tasks (we will learn about hypervisors later in this article). The hypervisor is embedded directly in the hardware system and then shares the hardware with the software systems running on top of it. Hardware virtualization is achieved in three ways: paravirtualization, full virtualization, and emulation virtualization. It helps reduce hardware costs, optimize resource usage, and increase IT flexibility.

Network Virtualization

Computer networks involve both software and hardware components. Network virtualization creates an abstract layer on top of both, which makes it easy for data center administrators to manage virtualized networking infrastructure. This layer combines many networking resources into one virtual entity. The network entities that are virtualized include network adapters (Network Interface Cards), switches, firewalls, load balancers, virtual LANs, and Fibre Channels. Network virtualization is done in two different ways: first, software-defined networking, which virtualizes network traffic routing controls; and second, network function virtualization, which takes care of virtualized network configuration and management. Network virtualization makes networking a much easier experience. The main objective is to make network functions automated and well-scaled.

Storage Virtualization

Storage virtualization is a technique where all the physical storage resources available in a data center are merged into a single virtual storage resource pool. The idea is to have a single logical storage pool for one network. This abstracted, virtualized storage is agnostic about the underlying hardware and software systems used for storage. Storage virtualization can virtualize block-access storage systems delivered over Fibre Channel, iSCSI, and SAN. It can also virtualize file storage systems delivered over the NFS and SMB protocols. The benefits of storage virtualization are immense. It makes it easier to migrate data between hosts and servers without interrupting I/O, provisioning and utilization of storage become better, and data management becomes a single-point function.

Hypervisor

A hypervisor is a kind of software that is centered around virtualization. It is used to create, manage, and run virtual machines.
It is a layer of abstraction between virtual machines and the underlying hardware. It has the capability to allocate the necessary compute resources to the VMs from the parent host’s compute resource pool. Apart from that, it keeps all the VMs running on a system in isolation from each other and stops the VMs from interfering with each other's space. Hypervisors are of two types, depending on how they are installed.

1. Bare metal hypervisors: These hypervisors are also called Type 1 hypervisors. They are embedded directly into the host’s hardware and are used by most data centers. They run VMs directly on top of the host’s hardware and act as the operating system on the host, completely replacing it. This makes them more efficient than Type 2 hypervisors. VMware ESXi is an example of a bare metal hypervisor.

2. Hosted hypervisors: These are also called Type 2 hypervisors. They act as a normal application installed on the desktop and can be started and stopped like any other program. Oracle VirtualBox is a very popular example. They have slightly higher latency than Type 1 hypervisors. They are used mostly for testing, as they are less complicated to install and start working with.

Virtual Machine

A virtual machine, or VM for short, is a software emulation of a computer system that runs on top of a host machine. A VM has its own operating system, CPU, RAM, and storage. These computing resources are allocated to the VM by the hypervisor, which borrows them from the host machine. There can be more than one VM running on the host machine at the same time, and they run in an isolated fashion without interfering with each other. The virtual machine is saved as a virtual machine image in file format. It can be copied or moved to another machine easily, and new copies of the VM can be created quickly by cloning the existing VM with the help of the hypervisor. Virtual machines are examples of operating system virtualization.

Containers

Containers are isolated, immutable, and self-contained sandboxes for running applications. They are very lightweight in comparison to virtual machines because they don’t carry the overhead of a full operating system; containers share resources directly with their host machine. They run on top of a container engine like Docker, unlike VMs, which run on top of a hypervisor. Containers are running instances of an image. An image is a packaged unit of an application, its runtime, dependencies, and required libraries. When we start a container, we pull one such image and start running it. The container engine is responsible for allocating the required storage and networking functionality to the container. Containers are easier to start, run, and stop than VMs.

Conclusion

This article is an attempt to give a brief, high-level idea of virtualization, the types of virtualization, hypervisors, VMs, and containers. These technologies are of particular interest to Cloud and DevOps engineers, who can choose to read more about these topics and get their hands dirty. Thanks for reading.
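As a small practical aside (not from the original article): on a Linux host you can quickly check whether the CPU exposes the hardware virtualization extensions that Type 1 and Type 2 hypervisors rely on. A minimal Go sketch, assuming a Linux machine where /proc/cpuinfo is readable:

// Sketch: look for the "vmx" (Intel VT-x) or "svm" (AMD-V) CPU flags
// in /proc/cpuinfo; hypervisors depend on these extensions.
package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    data, err := os.ReadFile("/proc/cpuinfo")
    if err != nil {
        // non-Linux systems (or restricted environments) will land here
        fmt.Println("could not read /proc/cpuinfo:", err)
        return
    }
    info := string(data)
    switch {
    case strings.Contains(info, " vmx"):
        fmt.Println("Intel VT-x detected: hardware virtualization is available")
    case strings.Contains(info, " svm"):
        fmt.Println("AMD-V detected: hardware virtualization is available")
    default:
        fmt.Println("no hardware virtualization flags found")
    }
}

Running it inside a VM that does not expose nested virtualization, or on a CPU without VT-x/AMD-V, prints the last message.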

By Aditya Bhuyan
DynamoDB Go SDK: How To Use the Scan and Batch Operations Efficiently
DynamoDB Go SDK: How To Use the Scan and Batch Operations Efficiently

The DynamoDB Scan API accesses every item in a table (or secondary index). It is the equivalent of a select * from query. One of the things I will cover in this blog is how to use the Scan API with the DynamoDB Go SDK. To scan a table, we need some data to begin with! So in the process, I will also go into how to use the Batch API to write bulk data into DynamoDB. You can use the BatchWriteItem API to create or delete items in batches (of twenty-five), and it's possible to combine these operations across multiple tables. We will start simple and gradually improve our approach to using the APIs efficiently. I will also go over some of the basic tests that I ran to demonstrate incremental improvements. Finally, I will wrap up by highlighting some of the considerations while using these operations. You can refer to the code on GitHub.

Before You Proceed

Make sure to create a DynamoDB table called users with:

Partition key email (data type String)
On-Demand capacity mode

Also, there are a few things I want to call out to set the context:

The table was created in us-east-1, and the tests were executed from an EC2 instance in us-east-1 as well.
Since these are general tests instead of specialized benchmarks, I did not do any special tuning (at any level). These are just Go functions that were executed with different inputs, keeping things as simple as possible.
The tests include marshaling (converting a Go struct to DynamoDB data types) for BatchWriteItem operations and un-marshaling (converting from DynamoDB data types back to a Go struct) for the Scan operation.

Let's start off by exploring the BatchWriteItem API. This way we will have data to work with for the Scan operations as well. Win-win!

Importing Data in Batches

Since you can combine 25 items in a single invocation, using a batch approach for bulk data imports is much better compared to invoking PutItem in a loop (or even in parallel). Here is a basic example of how you would use BatchWriteItem:

func basicBatchImport() {
    startTime := time.Now()

    cities := []string{"NJ", "NY", "ohio"}
    batch := make(map[string][]types.WriteRequest)
    var requests []types.WriteRequest

    for i := 1; i <= 25; i++ {
        user := User{Email: uuid.NewString() + "@foo.com", Age: rand.Intn(49) + 1, City: cities[rand.Intn(len(cities))]}
        item, _ := attributevalue.MarshalMap(user)
        requests = append(requests, types.WriteRequest{PutRequest: &types.PutRequest{Item: item}})
    }

    batch[table] = requests

    op, err := client.BatchWriteItem(context.Background(), &dynamodb.BatchWriteItemInput{
        RequestItems: batch,
    })
    if err != nil {
        log.Fatal("batch write error", err)
    } else {
        log.Println("batch insert done")
    }

    if len(op.UnprocessedItems) != 0 {
        log.Println("there were", len(op.UnprocessedItems), "unprocessed records")
    }

    log.Println("inserted", (25 - len(op.UnprocessedItems)), "records in", time.Since(startTime).Seconds(), "seconds")
}

With BatchWriteItemInput, we can define the operations we want to perform in the batch - here we are just going to perform PutRequests (which are encapsulated within another type called WriteRequest). We assemble the WriteRequests in a slice and finally put them in a map with the key being the table name: this is exactly what the RequestItems attribute in BatchWriteItemInput needs. In this case, we are dealing with a single table, but you could execute operations on multiple tables. In this example, we just dealt with one batch of 25 records (the maximum permitted batch size).
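The example above only logs UnprocessedItems. In practice, you would feed them back into BatchWriteItem until nothing is left. Here is a minimal, illustrative sketch of that retry loop with a simple exponential backoff; it assumes the same package-level client and imports (context, fmt, time, and the DynamoDB service and types packages) as the other snippets in this post, and the retry count and base delay are arbitrary values:

// Sketch: retry UnprocessedItems returned by BatchWriteItem with a simple
// exponential backoff. maxAttempts and the base delay are illustrative.
func batchWriteWithRetry(ctx context.Context, requests map[string][]types.WriteRequest) error {
    const maxAttempts = 5
    backoff := 100 * time.Millisecond

    for attempt := 1; attempt <= maxAttempts; attempt++ {
        op, err := client.BatchWriteItem(ctx, &dynamodb.BatchWriteItemInput{
            RequestItems: requests,
        })
        if err != nil {
            return err
        }
        if len(op.UnprocessedItems) == 0 {
            // everything was written
            return nil
        }
        // retry only what DynamoDB could not process
        requests = op.UnprocessedItems
        time.Sleep(backoff)
        backoff = backoff * 2
    }
    return fmt.Errorf("unprocessed items remained after %d attempts", maxAttempts)
}

The same idea applies to the parallel version further below: each goroutine could call a helper like this instead of only counting the unprocessed records.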
If we want to import more records, all we need to do is split them into batches of 25 and execute them one (sub)batch at a time. Simple enough - here is an example:

func basicBatchImport2(total int) {
    startTime := time.Now()

    cities := []string{"NJ", "NY", "ohio"}
    batchSize := 25
    processed := total

    for num := 1; num <= total; num = num + batchSize {
        batch := make(map[string][]types.WriteRequest)
        var requests []types.WriteRequest

        start := num
        end := num + 24
        for i := start; i <= end; i++ {
            user := User{Email: uuid.NewString() + "@foo.com", Age: rand.Intn(49) + 1, City: cities[rand.Intn(len(cities))]}
            item, _ := attributevalue.MarshalMap(user)
            requests = append(requests, types.WriteRequest{PutRequest: &types.PutRequest{Item: item}})
        }

        batch[table] = requests

        op, err := client.BatchWriteItem(context.Background(), &dynamodb.BatchWriteItemInput{
            RequestItems: batch,
        })
        if err != nil {
            log.Fatal("batch write error", err)
        }

        if len(op.UnprocessedItems) != 0 {
            processed = processed - len(op.UnprocessedItems)
        }
    }

    log.Println("all batches finished. inserted", processed, "records in", time.Since(startTime).Seconds(), "seconds")

    if processed != total {
        log.Println("there were", (total - processed), "unprocessed records")
    }
}

I tried this with 50000 records (which means 2000 batches), and it took approximately 15 seconds. But we can do much better!

Parallel Batch Import

Instead of processing each batch sequentially, we can spin up a goroutine for each batch:

func parallelBatchImport(numRecords int) {
    startTime := time.Now()

    cities := []string{"NJ", "NY", "ohio"}
    batchSize := 25

    var wg sync.WaitGroup
    var mu sync.Mutex // guards the shared counter updated by the goroutines
    processed := numRecords

    for num := 1; num <= numRecords; num = num + batchSize {
        start := num
        end := num + 24

        wg.Add(1)
        go func(s, e int) {
            defer wg.Done()

            batch := make(map[string][]types.WriteRequest)
            var requests []types.WriteRequest

            for i := s; i <= e; i++ {
                user := User{Email: uuid.NewString() + "@foo.com", Age: rand.Intn(49) + 1, City: cities[rand.Intn(len(cities))]}
                item, err := attributevalue.MarshalMap(user)
                if err != nil {
                    log.Fatal("marshal map failed", err)
                }
                requests = append(requests, types.WriteRequest{PutRequest: &types.PutRequest{Item: item}})
            }

            batch[table] = requests

            op, err := client.BatchWriteItem(context.Background(), &dynamodb.BatchWriteItemInput{
                RequestItems: batch,
            })
            if err != nil {
                log.Fatal("batch write error", err)
            }

            if len(op.UnprocessedItems) != 0 {
                mu.Lock()
                processed = processed - len(op.UnprocessedItems)
                mu.Unlock()
            }
        }(start, end)
    }

    log.Println("waiting for all batches to finish....")
    wg.Wait()

    log.Println("all batches finished. inserted", processed, "records in", time.Since(startTime).Seconds(), "seconds")

    if processed != numRecords {
        log.Println("there were", (numRecords - processed), "unprocessed records")
    }
}

The results improved by a good margin. Here is what I got, on average:

Inserting 50000 records took ~ 2.5 seconds
Inserting 100000 records took ~ 4.5 to 5 seconds
Inserting 150000 records took less than 9.5 seconds
Inserting 200000 records took less than 11.5 seconds

There may be unprocessed records in a batch. This example detects these records, but the retry logic has been skipped to keep things simple. Ideally, you should have an (exponential back-off-based) retry mechanism for handling unprocessed records as well, along the lines of the sketch shown after the first example.

To insert more data, I ran the parallelBatchImport function (above) in loops. For example:

for i := 1; i <= 100; i++ {
    parallelBatchImport(50000)
}

Alright, let's move ahead. Now that we have some data, let's try...
The Scan API

This is what basic usage looks like:

func scan() {
    startTime := time.Now()

    op, err := client.Scan(context.Background(), &dynamodb.ScanInput{
        TableName:              aws.String(table),
        ReturnConsumedCapacity: types.ReturnConsumedCapacityTotal,
    })
    if err != nil {
        log.Fatal("scan failed", err)
    }

    for _, i := range op.Items {
        var u User
        err := attributevalue.UnmarshalMap(i, &u)
        if err != nil {
            log.Fatal("unmarshal failed", err)
        }
    }

    if op.LastEvaluatedKey != nil {
        log.Println("all items have not been scanned")
    }

    log.Println("scanned", op.ScannedCount, "items in", time.Since(startTime).Seconds(), "seconds")
    log.Println("consumed capacity", *op.ConsumedCapacity.CapacityUnits)
}

Just provide the table (or secondary index) name, and you are good to go! However, there is a chance that you might not be able to get all the items because of API limits (1 MB worth of data per invocation). In my case, it took about 0.5 seconds for approximately 15000 records. The rest of the items were skipped because the 1 MB limit was breached.

Using Pagination

To handle this limitation, the Scan API returns LastEvaluatedKey in its output to point to the last processed record. All you need to do is invoke Scan again, with the value of the ExclusiveStartKey attribute set to the previous LastEvaluatedKey (a minimal sketch of this loop is included at the end of this article). Using a paginated scan approach took me approximately 100 seconds to scan ~ 7.5 million records.

Parallel Scan

Pagination helps, but it's still a sequential process. There is a lot of scope for improvement. Thankfully, Scan allows you to adopt a parallelized approach; i.e., you can use multiple workers (goroutines in this case) to process data in parallel!

func parallelScan(pageSize, totalWorkers int) {
    log.Println("parallel scan with page size", pageSize, "and", totalWorkers, "goroutines")
    startTime := time.Now()

    var total int
    var mu sync.Mutex // guards the shared total updated by the goroutines

    var wg sync.WaitGroup
    wg.Add(totalWorkers)

    for i := 0; i < totalWorkers; i++ {
        // start a goroutine for each segment
        go func(segId int) {
            var segTotal int
            defer wg.Done()

            lastEvaluatedKey := make(map[string]types.AttributeValue)

            scip := &dynamodb.ScanInput{
                TableName:     aws.String(table),
                Limit:         aws.Int32(int32(pageSize)),
                Segment:       aws.Int32(int32(segId)),
                TotalSegments: aws.Int32(int32(totalWorkers)),
            }

            for {
                if len(lastEvaluatedKey) != 0 {
                    scip.ExclusiveStartKey = lastEvaluatedKey
                }

                op, err := client.Scan(context.Background(), scip)
                if err != nil {
                    log.Fatal("scan failed", err)
                }

                segTotal = segTotal + int(op.Count)

                for _, i := range op.Items {
                    var u User
                    err := attributevalue.UnmarshalMap(i, &u)
                    if err != nil {
                        log.Fatal("unmarshal failed", err)
                    }
                }

                if len(op.LastEvaluatedKey) == 0 {
                    log.Println("[ segment", segId, "] finished")
                    mu.Lock()
                    total = total + segTotal
                    mu.Unlock()
                    log.Println("total records processed by segment", segId, "=", segTotal)
                    return
                }

                lastEvaluatedKey = op.LastEvaluatedKey
            }
        }(i)
    }

    log.Println("waiting...")
    wg.Wait()

    log.Println("done...")
    log.Println("scanned", total, "items in", time.Since(startTime).Seconds(), "seconds")
}

The Segment and TotalSegments attributes are the key to how the Scan API enables parallelism. TotalSegments is nothing but the number of threads/goroutines/worker-processes that need to be spawned, and Segment is a unique identifier for each of them. In my tests, the Scan performance remained (almost) constant at 37-40 seconds (on average) for about ~ 7.5 million records (I tried a variety of page sizes and goroutine combinations).

How Many TotalSegments Do I Need to Configure?

To tune the appropriate number of parallel threads/workers, you might need to experiment a bit.
A lot might depend on your client environment. Do you have enough compute resources? Some environments/runtimes might have managed thread pools, so you will have to comply with those. So, you will need to try things out to find the optimum parallelism for your situation. One way to think about it could be to choose one segment (a single worker/thread/goroutine) per unit of data (say, a segment for every GB of data you want to scan).

Wrap-Up: API Considerations

Both the Batch and Scan APIs are quite powerful, but there are nuances you should be aware of. My advice is to read the API documentation thoroughly.

With Batch APIs, there are certain limits:

No more than 25 requests in a batch
Individual items in a batch should not exceed 400 KB
The total size of items in a single BatchWriteItem cannot be more than 16 MB

Also, BatchWriteItem cannot update items. You cannot specify conditions on individual put and delete requests, and it does not return deleted items in the response. If there are failed operations, you can access them via the UnprocessedItems response parameter.

Use Scan Wisely

Since a Scan operation goes over the entire table (or secondary index), it's highly likely to consume a large chunk of the provisioned throughput, especially for a large table. For that reason, Scan should be your last resort: check whether the Query API (or BatchGetItem) works for your use case. The same applies to parallel Scan. There are also a few ways in which you can further narrow down the results: by using a Filter Expression, a Limit parameter (as demonstrated earlier), or a ProjectionExpression to return only a subset of attributes.

That's all for this blog. I hope you found it useful. Until next time, Happy coding!
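As promised in the pagination section, here is a minimal sketch of the paginated Scan loop: it keeps calling Scan with ExclusiveStartKey set to the previous LastEvaluatedKey until the key comes back empty. It assumes the same package-level client, table, and User type as the snippets above and is meant as an illustration rather than production code:

func paginatedScan() {
    var startKey map[string]types.AttributeValue
    total := 0

    for {
        op, err := client.Scan(context.Background(), &dynamodb.ScanInput{
            TableName:         aws.String(table),
            ExclusiveStartKey: startKey, // nil on the first page
        })
        if err != nil {
            log.Fatal("scan failed", err)
        }

        for _, i := range op.Items {
            var u User
            if err := attributevalue.UnmarshalMap(i, &u); err != nil {
                log.Fatal("unmarshal failed", err)
            }
        }
        total = total + int(op.Count)

        // an empty LastEvaluatedKey means the last page has been reached
        if len(op.LastEvaluatedKey) == 0 {
            break
        }
        startKey = op.LastEvaluatedKey
    }

    log.Println("scanned", total, "items")
}

The parallel version shown earlier applies the same loop, just once per segment.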

By Abhishek Gupta CORE
A Table Tennis Success Story Built With Apache Kafka
A Table Tennis Success Story Built With Apache Kafka

The sports world is changing. Digitalization is everywhere. Cameras and sensors analyze matches. Stadiums get connected and incorporate mobile apps and location-based services. Players use social networks to influence and market themselves and consumer products. Real-time data processing is crucial for most innovative sports use cases. This blog post explores how data streaming with Apache Kafka helps reimagine the sports industry, showing a concrete example from the worldwide table tennis organization. Innovation in Sports and Gaming With Real-time Analytics Reimagining a data architecture to provide real-time data flow for sporting leagues and events is an enormous challenge. However, digitalization enables a ton of innovative use cases to improve user experiences and engage better with players, fans, and business partners. Think about wonderful customer experiences with gamification when watching a match, live betting, location-based services in the stadium, automated payments, coupons, integration with connected fan shops and shopping malls, and so on. Source: Wipro Digital Improving the sport and the related matches itself is another excellent enhancement, including analyzing and monitoring gameplay, the health of players, security, and other use cases. Source: Wipro Digital Using data is a fundamental change in sports. A very early example is the famous story of Moneyball: The Art of Winning an Unfair Game: A book by Michael Lewis, published in 2003, about the Oakland Athletics baseball team and its general manager, Billy Beane. Its focus is the team's analytical, evidence-based, sabermetric approach to assembling a competitive baseball team despite Oakland's small budget. A film based on Lewis' book, starring Brad Pitt and Jonah Hill, was released in 2011. Whether you are a coach or player, a fan, or a business related to sports, data is critical to success. Wipro Digital's whitepaper "Connected Stadium Solutions" explores the motivation and various use cases for re-imaging sports. And most use cases are only possible with real-time data. That's where data streaming comes into play... Let's look at a concrete success story. The Current State of Table Tennis World Table Tennis (WTT) is a business created by the International Table Tennis Federation (ITTF) to manage the official professional Table Tennis series of events and its commercial rights. Table tennis is more significant than you might think: There are over 200 member associations across the globe within the ITTF. World Table Tennis also leads the transformation of the digital sport and commercializes its software application for real-time event scoring worldwide with Apache Kafka. Previously, ITTF scoring was processed manually with a desktop-based, on-venue results system (OVR) - an on-premises solution to process match data that calculated rankings and records, then sent event information to other systems, such as scoreboards. Real-time data is essential in the sporting world. The ITTF team re-engineered their data system in 18 months, moving from solely on-premises infrastructure to a cloud-native data system that uses fully-managed Confluent Cloud with Apache Kafka as its central nervous system. Real-time Analytics With Kafka To Provide Stats, Engage With Fans, and Integrate With Gaming and Betting Vatsan Rama (Director of IT, ITTF Group) talked in the Confluent podcast about Streaming Real-Time Sports Analytics with Apache Kafka for World Table Tennis. 
Here are several exciting use cases for real-time analytics around table tennis:

Real-time stats of scores and any other interesting facts in a match are sent to scoreboards, media broadcasters, betting providers, and other 3rd party consumers
The umpire kicks off the recording of a live feed (stream of events)
Analysis of player actions in real-time, compared to historical data (including advanced use cases like ball spin)
Smart referees using video analytics in real-time (like fault, net, offsite, foul, etc.)
Stateful statistics during the broadcast, e.g., the longest ball play (rally) in the last 24 months
Batch analytics of historical data for coaching and player preparation against the next opponent
Worldwide consolidation of data from events and leagues across the globe and across different (sub) organizations
Customer 360 with mobile apps and real-time clickstream analytics to know the fans better and increase revenue (aka fan engagement)
Data exchange with business partners, e.g., low latency with SLAs for critical use cases like a live betting API integration
Innovative new business models of integration with cutting-edge technologies like blockchain, NFTs, and crypto

That's a lot of exciting use cases across different business units, isn't it? Most can be adapted to any other sport. So, if you work in any company related to sports, why wait any longer? Kick off your first data streaming project!

Why Data Streaming With Kafka Makes the Difference in Processing and Sharing Sports Data

Data connectivity across various interfaces, APIs, and systems, plus correlation of the data in real-time, is key for modernizing the data infrastructure for any sports use case: Source: Wipro Digital

Apache Kafka is the de facto standard for data streaming. Let's look at why data streaming is a perfect solution for modernizing sports use cases:

Real-time data integration and processing: Most innovative sports use cases only work well if the information across systems is correlated in real-time. Kafka Connect, Kafka Streams, KSQL, and other components allow using a single infrastructure for data processing.
Storage: True decoupling and backpressure handling are crucial, as slow consumers have different data processing capabilities than real-time consumers. The replayability of historical information does not require yet another database or data lake. Tiered Storage enables cost-efficient long-term storage in Kafka.
Hybrid edge infrastructure: Some use cases require low-latency or offline computing power. Kafka is perfect, being a single platform for real-time integration, data processing, and storage. Real-time replication between separate Kafka environments provides out-of-the-box support for edge analytics and sometimes-disconnected environments.
Data governance across stakeholders and environments: Data privacy, access control, compliance, and zero trust are critical characteristics of any modern IT infrastructure. The Kafka ecosystem monitors and enforces the end-to-end data flow using a Schema Registry for defining contracts between independent data producers and downstream consumers, plus additional tools on top for data lineage and distributed tracing.
Fully managed cloud-first approach: The cloud enables focusing on business problems and innovation. Only manage your own Kafka clusters if serverless SaaS is impossible for security, cost, or latency reasons!
Don't trust marketing and make sure your Kafka service is indeed fully managed, not just partially managed, where you take over the risk and operation burden. Omnichannel customer 360: Most businesses and fans require access to information across different interfaces (including web browsers, mobile apps, devices, smart point of sale, location-based services, and so on). Kafka's unique combination of real-time messaging and storage provides out-of-the-box support for building decoupling customer 360 applications. Data sharing and open API for B2B exchange: Most sports use cases hold various data sets that enable new internal and external use cases. Data sharing across business units and 3rd party business partners or public Open APIs in real-time allows innovation to improve the customer experience or establish brand-new business models. Kafka and related cloud services enable real-time data exchange. Proactive cybersecurity: Digitalization comes with its risks. Stars use social networks. Stadiums and shops get connected. Cameras monitor players and environments. And so on. Real-time situational awareness and threat intelligence are crucial to protect the data, and people are essential in a world where everything is digital and integrated. Integration with blockchain, crypto, and NFT: Beyond the current crypto winter, many innovative use cases will come for the metaverse and decentralized identity management. one example is selling instant moments in sports via NFTs. Kafka is the middleware between regular applications and the NFT and crypto trading platforms. Reimagine Sports and Turn Customers Into Fans With Data Streaming Using Apache Kafka Real-time data processing is crucial for most innovative sports use cases. Most events and actions need to be processed while the information is still hot. If data is stored at rest in a database or data lake, it is too late to act on the data for innovative use cases like notifications, recommendations, alerts, gaming, and many other use cases. Here is a concrete Kafka-powered example combining live video streaming, gamification, CRM integration, crypto and NFT services, and more: Data streaming with the de facto standard Apache Kafka is the foundation of innovation in sports. No matter if you work in a sports organization, retail, security, betting, marketing, or any other related company. The cloud is a fundamental change for sports. Organizations do not need to host and operate the infrastructure anymore. They can quickly build new use cases focusing on the business logic with small teams to innovate quickly. The example of a worldwide table tennis organization is a great real-world example. How do you use real-time data in a sports environment? Or is batch processing still sufficient for your use cases? What role plays data streaming in these scenarios? Let’s connect on LinkedIn and discuss it!

By Kai Wähner CORE
Python Exception Handling: Try, Except, and Finally in Python
Python Exception Handling: Try, Except, and Finally in Python

There are two types of errors in Python: syntax errors and exceptions. Errors are problems that occur in a program and stop its execution. Exceptions, on the other hand, are raised when the normal flow of the program is disturbed by some internal event at runtime.

Difference Between Syntax Errors and Exceptions

Syntax Error

As the name suggests, an error caused by writing incorrect syntax in the code is called a syntax error, and the program is terminated because of it.

Example: In the following lines of code, ":" is missing after the if statement, which results in a syntax error.

Python
# taking one variable and initializing its value
var = 500
# check whether the value of the variable is greater than 100
if(var>100)
    print("Value of the variable is greater than 100")
else:
    print("Value of the variable is not greater than 100")

Output:

Python
File "main.py", line 4
    if(var>100)
              ^
SyntaxError: invalid syntax

Exceptions

When the code of the program is syntactically correct but its execution results in an error, it is known as an exception. Exceptions do not indicate a problem with the program's syntax; instead, they disturb the normal flow of the program at runtime (and, if not handled, still terminate it).

Example:

Python
# taking one variable and initializing its value
var = 500
# trying to divide the variable by zero
a = var / 0
# printing the value of a
print(a)

Output:

Python
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    a = var / 0
ZeroDivisionError: division by zero

In the code above, an exception named ZeroDivisionError is raised because we are trying to divide a number by zero.

Exception Handling

Exception handling is the procedure of responding to unexpected or unwanted events that occur at the time of execution of a program. These events are dealt with by exception handling so that the program does not crash; without exception handling, the normal flow of the program is disrupted. An exception can occur in a program for many reasons, such as a device failure, the user entering invalid input, loss of the network connection, errors in the code, insufficient memory for program execution, the user trying to open a file that is not available, or an operation in the program that tries to divide a number by zero. In Python, Exception is the base class for all exceptions.

Try and Except Statement: Catching Exceptions

In Python, the try and except keywords are used to catch and handle exceptions. The try block contains the statement or set of statements that may raise an exception. The except block is skipped if no exception occurs while executing the try block statements. All the statements in the try block must be written indented. If an exception occurs while executing the statements of the try block, program control is transferred to the except block. The except block contains the set of statements that handle the exception that occurred in the try block; for example, printing an error message at the time of the exception. The statements of the except block must also be written indented. After the except keyword, the exception type can also be specified. But that block will be executed only when the specified exception occurs.
Multiple except blocks with different specified exceptions can also be used with a single try block. If the exception occurring in the try block statements does not match any of the specified except block exceptions, the program is terminated, and that exception remains unhandled.

Example 1: Let us write a program in which we declare an array and try to access an element of that array that is out of bounds. This exception is handled in the code below using the except block.

Python
# Simple Python program example for handling a runtime error
# declaring an array with three elements
ar = [1, 2, 3]
# try block
try:
    # using index 0 to access and print the first element of the array
    print ("First element stored in array = %d" %(ar[0]))
    # using index 4 to access and print the fifth element of the array
    print ("Fifth element stored in array = %d" %(ar[4]))
    # the above statement will raise an error as there are only three elements in the array
# except block
except:
    # set of statements that will handle the exception
    print ("An error occurred in the program")

Output:

Python
First element stored in array = 1
An error occurred in the program

In the above lines of code, the try block contains the statements in which there is a chance of an exception occurring (in the above code, the second print statement). The second print statement tries to access and print the fifth element of the array ar, which does not exist, as the size of the array is three, so only up to the third element can be accessed. The statements written in the except block are executed after the exception.

Example 2: Here, we initialize two variables, one with an integer value and the other with a string value, and an exception occurs when we try to add both values.

Python
# declaring a variable with an integer value
a=5
# declaring a variable with a string value
b="0"
# try block
try:
    # an exception occurs as we are trying to add an integer and a string
    print (a+b)
# except block
except:
    print('An error occurred in the program')

Output:

Python
An error occurred in the program

Catching Specific Exceptions

To specify different exception handlers, more than one except block is allowed with a single try block. For example, handlers for the IndexError and ZeroDivisionError exceptions can be written with the except keyword. But at most one except block will be executed among all the except statements. The general syntax for adding specific exceptions is given below:

Python
try:
    # set of statement(s)
except IndexError:
    # set of statement(s)
except ValueError:
    # set of statement(s)

Example: Catching specific exceptions in Python

# declaring a variable with an integer value
a=5
# declaring another variable with an integer value
b=0
try:
    # trying to divide both values
    print (a/b)
    # an exception will occur as we are trying to divide a
    # number by zero
# except block to handle TypeError exception
except TypeError:
    print('Operation is not supported')
# except block to handle ZeroDivisionError exception
except ZeroDivisionError:
    print ('Divide a number by zero is not allowed')

Output:

Python
Divide a number by zero is not allowed

Try With Else Clause

Python also allows the use of one of the control flow statements, the else keyword, with try-except. The set of statements written in the else block is executed only when no exception occurs while executing the try block statements.
The else block is written after all the except blocks, and its statements must also be written indented. The else block with try-except syntax is given below:

Python
try:
    # Set of statements...
except:
    # optional block
    # code to handle the exception
else:
    # set of statements to be executed if no exception
    # occurred in the try block

Example: Try with the else clause

# Python program to demonstrate the else clause with try-except
# declaring a variable with an integer value
a=10
# declaring another variable with an integer value
b=5
# try block
try:
    # trying to divide both values
    print (a/b)
# except block
except:
    print('An error occurred')
# else block
else:
    print('Inside else block')

Output:

Python
2.0
Inside else block

The finally Keyword in Python

Python also allows the use of the finally keyword in exception handling. The set of statements in the finally block is always executed, whether the try block terminates normally or terminates due to an exception. The finally block is always written after all the except blocks. The finally block syntax is given below:

Python
Syntax:
try:
    # Set of statements...
except:
    # optional block
    # code to handle the exception
else:
    # set of statements to be executed if no exception
    # occurred in the try block
finally:
    # always executed
    # set of statements

Example:

# Python program to show an example of finally
# try block
try:
    # a divide by zero exception will occur
    a = 5/0
    print(a)
# block of code to handle the divide by zero exception
except ZeroDivisionError:
    print("An error occurred")
finally:
    # block of code executed always, whether an
    # exception occurred or not
    print('Finally block!! Executed Always')

Output:

Python
An error occurred
Finally block!! Executed Always

Raising Exceptions

In Python, the raise keyword is used in exception handling to force an exception to occur in the program. The argument of the raise statement specifies the exception to be raised, and we can specify any exception class or an exception instance here.

Example:

Python
# Python program to demonstrate raising an exception
# try block
try:
    # raising a named exception
    raise NameError("Hello!!")
# except block which will catch the raised NameError
except NameError:
    print ("An exception")
    # again raising the exception; this will not be handled by the except block
    raise

The above code first prints “An exception,” and then the runtime error is displayed on the console, as the raise keyword in the last line re-raises the error. The output displayed on the console is given below:

Output:

Python
An exception
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    raise NameError("Hello!!")
NameError: Hello!!

Conclusion

Hope this small guide on Python exception handling helped you grasp the basics of exception handling. Although this is quite a basic topic, save this article to brush up on your knowledge just before your next Python job interview or exam, as this is a commonly asked topic! Thanks for reading.

By Sarang S Babu
Building Real-Time Weather Dashboards With Apache Pinot
Building Real-Time Weather Dashboards With Apache Pinot

Building Real-Time Weather Dashboards With Apache NiFi, Apache Pulsar, Apache Pinot, and Apache SuperSet It is so easy to build Pulsar to Pinot applications for real-time analytics. I added another source of data for weather feeds for the U.S. I am looking at adding transit and aircraft data feeds next. The sky is the limit with Pulsar + Pinot. I will probably pull more sources from my big list from the Let's Monitor talk. Apache NiFi acquires our weather feed for the United States from NOAA. This is really easy to do, and I have it well-documented at the the source referenced. Reference: https://github.com/tspannhw/SmartWeather Reference: Weather How-To The first thing we will need to do for infrastructure—put this in your DevOps box—is to create topics. We can let the producer of the first message automatically do this if you are on your laptop. Most production clusters want this pre-defined and run by an ops person. So here you go; we will take a look at the list of existing topics and then build it. Since Apache Pulsar is multi-tenant, you can create and specify a custom tenant and namespace for the weather application if you desire. Each organization will decide how to set up and build their hierarchy of tenants and namespaces that matches their application architecture landscape. Weather Pulsar Topic bin/pulsar-admin topics list public/default bin/pulsar-admin topics create persistent://public/default/weather bin/pulsar-admin topics create persistent://public/default/aircraftweather2 As part of the general application, after the data is sent from Apache NiFi to Apache Pulsar via the NiFi Connector, I have a Java Pulsar Function that creates a new schema for it. I developed this schema so it joins well with ADSB Aircraft data. Weather Function in Pulsar to Produce Our Feed Reference: https://github.com/tspannhw/pulsar-weather-function bin/pulsar-admin functions stop --name Weather --namespace default --tenant public bin/pulsar-admin functions delete --name Weather --namespace default --tenant public bin/pulsar-admin functions create --auto-ack true --jar /Users/tspann/Documents/code/pulsar-weather-function/target/weather-1.0.jar --classname "dev.pulsarfunction.weather.WeatherFunction" --dead-letter-topic "persistent://public/default/aircraftweatherdead" --inputs "persistent://public/default/weather" --log-topic "persistent://public/default/aircraftweatherlog" --name Weather --namespace default --tenant public --max-message-retries 5 Once the data is flowing, it is easy to check it and receive the data real-time with the Pulsar client command line consumer as seen below. You can see we are getting data with extra properties and keys. The data matches the schema and is in clean JSON. You can easily verify the schema from the command line if you wish as well. Any commands you run with the Pulsar CLI can also be done via REST, the Pulsar Manager, StreamNative Console, snctl, or Java Pulsar Administration API. 
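You can also consume the topic programmatically. Here is a minimal, illustrative sketch using the Apache Pulsar Go client (github.com/apache/pulsar-client-go), rather than the Java function used elsewhere in this application; the service URL, topic, and subscription name are assumptions that mirror the CLI consumer shown next:

// Sketch: subscribe to the weather topic and print each message.
// Adjust the URL to your broker, e.g. pulsar://<your-host>:6650.
package main

import (
    "context"
    "log"

    "github.com/apache/pulsar-client-go/pulsar"
)

func main() {
    client, err := pulsar.NewClient(pulsar.ClientOptions{
        URL: "pulsar://localhost:6650",
    })
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    consumer, err := client.Subscribe(pulsar.ConsumerOptions{
        Topic:            "persistent://public/default/aircraftweather2",
        SubscriptionName: "test1",
    })
    if err != nil {
        log.Fatal(err)
    }
    defer consumer.Close()

    for {
        msg, err := consumer.Receive(context.Background())
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("key=%s payload=%s", msg.Key(), string(msg.Payload()))
        consumer.Ack(msg)
    }
}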
Consume Weather Topic from Pulsar bin/pulsar-client consume "persistent://public/default/aircraftweather2" -s test1 -n 0 ----- got message ----- key:[9a88cbf5-92df-4546-bda5-a57dba7e453f], properties:[language=Java], content:{"location":"Greenwood, Greenwood County Airport, SC","station_id":"KGRD","latitude":34.24722,"longitude":-82.15472,"observation_time":"Last Updated on Dec 8 2022, 8:56 am EST","observation_time_rfc822":"Thu, 08 Dec 2022 08:56:00 -0500","weather":"Fog","temperature_string":"61.0 F (16.1 C)","temp_f":61.0,"temp_c":16.1,"relative_humidity":100,"wind_string":"Calm","wind_dir":"North","wind_degrees":0,"wind_mph":0.0,"wind_kt":0,"pressure_string":"1023.5 mb","pressure_mb":1023.5,"pressure_in":30.24,"dewpoint_string":"61.0 F (16.1 C)","dewpoint_f":61.0,"dewpoint_c":16.1,"heat_index_f":0,"heat_index_c":0,"visibility_mi":0.25,"icon_url_base":"https://forecast.weather.gov/images/wtf/small/","two_day_history_url":"https://www.weather.gov/data/obhistory/KGRD.html","icon_url_name":"fg.png","ob_url":"https://www.weather.gov/data/METAR/KGRD.1.txt","uuid":"d429a3e3-d12d-4297-9192-81a2985d8725","ts":1670520773418} Pulsar JSON Schema { "type" : "record", "name" : "Weather", "namespace" : "dev.pulsarfunction.weather", "fields" : [ { "name" : "dewpoint_c", "type" : "double" }, { "name" : "dewpoint_f", "type" : "double" }, { "name" : "dewpoint_string", "type" : [ "null", "string" ], "default" : null }, { "name" : "heat_index_c", "type" : "int" }, { "name" : "heat_index_f", "type" : "int" }, { "name" : "heat_index_string", "type" : [ "null", "string" ], "default" : null }, { "name" : "icon_url_base", "type" : [ "null", "string" ], "default" : null }, { "name" : "icon_url_name", "type" : [ "null", "string" ], "default" : null }, { "name" : "latitude", "type" : "double" }, { "name" : "location", "type" : [ "null", "string" ], "default" : null }, { "name" : "longitude", "type" : "double" }, { "name" : "ob_url", "type" : [ "null", "string" ], "default" : null }, { "name" : "observation_time", "type" : [ "null", "string" ], "default" : null }, { "name" : "observation_time_rfc822", "type" : [ "null", "string" ], "default" : null }, { "name" : "pressure_in", "type" : "double" }, { "name" : "pressure_mb", "type" : "double" }, { "name" : "pressure_string", "type" : [ "null", "string" ], "default" : null }, { "name" : "relative_humidity", "type" : "int" }, { "name" : "station_id", "type" : [ "null", "string" ], "default" : null }, { "name" : "temp_c", "type" : "double" }, { "name" : "temp_f", "type" : "double" }, { "name" : "temperature_string", "type" : [ "null", "string" ], "default" : null }, { "name" : "ts", "type" : "long" }, { "name" : "two_day_history_url", "type" : [ "null", "string" ], "default" : null }, { "name" : "uuid", "type" : [ "null", "string" ], "default" : null }, { "name" : "visibility_mi", "type" : "double" }, { "name" : "weather", "type" : [ "null", "string" ], "default" : null }, { "name" : "wind_degrees", "type" : "int" }, { "name" : "wind_dir", "type" : [ "null", "string" ], "default" : null }, { "name" : "wind_kt", "type" : "int" }, { "name" : "wind_mph", "type" : "double" }, { "name" : "wind_string", "type" : [ "null", "string" ], "default" : null } ] } Build Pinot Schema docker exec -it pinot-controller bin/pinot-admin.sh JsonToPinotSchema \ -timeColumnName ts \ -metrics "pressure_in,temp_c,temp_f,wind_mph,relative_humidity,pressure_mb"\ -dimensions "station_id,location,latitude,longitude" \ -pinotSchemaName=weather \ -jsonFile=/data/weather.json \ -outputDir=/config In 
order to add a table to Apache Pinot, we need a schema. Instead of hand-building a schema, we can use the JsonToPinotSchema tool to convert an example JSON record into a schema. In Docker, this is easy to execute, as seen above. Now that we have a generated schema, we can quickly build a table JSON file, which is very straightforward. Once complete, we can load the schema via the AddSchema tool. For the easiest way to load the table, I utilize the REST API and curl. Choose the DevOps mechanism that meets your enterprise standards. Load Pinot Schema and Table docker exec -it pinot-controller bin/pinot-admin.sh AddSchema \ -schemaFile /config/weatherschema.json \ -exec curl -X POST "http://localhost:9000/tables" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"tableName\": \"weather\", \"tableType\": \"REALTIME\", \"segmentsConfig\": { \"timeColumnName\": \"ts\", \"schemaName\": \"weather\", \"replication\": \"1\", \"replicasPerPartition\": \"1\" }, \"ingestionConfig\": { \"batchIngestionConfig\": { \"segmentIngestionType\": \"APPEND\", \"segmentIngestionFrequency\": \"DAILY\" } }, \"tableIndexConfig\": { \"loadMode\": \"MMAP\", \"streamConfigs\": { \"streamType\": \"pulsar\", \"stream.pulsar.topic.name\": \"persistent://public/default/aircraftweather2\", \"stream.pulsar.bootstrap.servers\": \"pulsar://Timothys-MBP:6650\", \"stream.pulsar.consumer.type\": \"lowlevel\", \"stream.pulsar.fetch.timeout.millis\": \"10000\", \"stream.pulsar.consumer.prop.auto.offset.reset\": \"largest\", \"stream.pulsar.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory\", \"stream.pulsar.decoder.class.name\": \"org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder\", \"realtime.segment.flush.threshold.rows\": \"0\", \"realtime.segment.flush.threshold.time\": \"1h\", \"realtime.segment.flush.threshold.segment.size\": \"5M\" } }, \"tenants\": {}, \"metadata\": {}" Pinot Table Format JSON { "tableName": "weather", "tableType": "REALTIME", "segmentsConfig": { "timeColumnName": "ts", "schemaName": "weather", "replication": "1", "replicasPerPartition": "1" }, "ingestionConfig": { "batchIngestionConfig": { "segmentIngestionType": "APPEND", "segmentIngestionFrequency": "DAILY" } }, "tableIndexConfig": { "loadMode": "MMAP", "streamConfigs": { "streamType": "pulsar", "stream.pulsar.topic.name": "persistent://public/default/aircraftweather2", "stream.pulsar.bootstrap.servers": "pulsar://SERVERNAME:6650", "stream.pulsar.consumer.type": "lowlevel", "stream.pulsar.fetch.timeout.millis": "10000", "stream.pulsar.consumer.prop.auto.offset.reset": "largest", "stream.pulsar.consumer.factory.class.name": "org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory", "stream.pulsar.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "1h", "realtime.segment.flush.threshold.segment.size": "5M" } }, "tenants": {}, "metadata": {} } The most important pieces of this connection file is: "tableName" will be the name of your new table. "tableType" will be "REALTIME". "timeColumnName" is the field that has timestamp / UNIX time. "schemaName" is the name of the schema you specified in the schema file. "streamType" which is "pulsar". "stream.pulsar.topic.name" which is your topic. 
"stream.pulsar.bootstrap.servers" which connects to your Pulsar server Pinot Queries Against Real-Time Table To ensure that our data is working, first run a few test queries in the Query Console within Apache Pinot. select * from weather order by ts desc limit 102; select dewpoint_string,location,latitude,longitude, temperature_string, weather, wind_string, observation_time, ts from weather order by ts desc limit 102; Once our table and schema are loaded, we are connected to Pulsar and will start consuming messages from the listed Pulsar topic. List tables in Pinot Weather table Weather Query Weather Query in the Console To build a dashboard to the dash streamed to our real-time table in Apache Pinot, we will use Apache Superset. Let's connect from Superset to Pinot, it's easy. Pinot to Superset Connection In order to connect Superset to Pinot, you need to add the following URL depending on your server name or Docker IP.pinot+http://192.168.1.157:8099/query?server=192.168.1.157:9000/ pinot+http://SERVERNAME:8099/query?server=http://SERVERNAME:9000/ See: https://docs.pinot.apache.org/integrations/superset Superset Analytics Add a Pinot dataset for weather to Superset Add a dataset Add a chart Weather dashboards in Superset Example Data - data/weather.json {"location":"Union County Airport - Troy Shelton Field, SC","station_id":"K35A","latitude":34.68695,"longitude":-81.64117,"observation_time":"Last Updated on Dec 7 2022, 3:35 pm EST","observation_time_rfc822":"Wed, 07 Dec 2022 15:35:00 -0500","weather":"Overcast","temperature_string":"64.0 F (17.7 C)","temp_f":64.0,"temp_c":17.7,"relative_humidity":98,"wind_string":"Southwest at 4.6 MPH (4 KT)","wind_dir":"Southwest","wind_degrees":220,"wind_mph":4.6,"wind_kt":4,"pressure_mb":0.0,"pressure_in":30.26,"dewpoint_string":"63.1 F (17.3 C)","dewpoint_f":63.1,"dewpoint_c":17.3,"heat_index_f":0,"heat_index_c":0,"visibility_mi":10.0,"icon_url_base":"https://forecast.weather.gov/images/wtf/small/","two_day_history_url":"https://www.weather.gov/data/obhistory/K35A.html","icon_url_name":"ovc.png","ob_url":"https://www.weather.gov/data/METAR/K35A.1.txt","uuid":"5d6ac217-9c3d-4228-87d4-778cbf8561a2","ts":1670508009894} We have built a full live dashboard system, but we may have some other real-time analytics use cases so we can connect to Apache Flink. With Flink, we can run real-time SQL against Pulsar topics as each event arrives in the topic. This lets use run continuous queries, joins, inserts, updates, and advanced SQL applications. We often use this for fraud detection and trigger alerts as things happen. 
Start Flink in Docker See https://github.com/streamnative/flink-example/blob/main/docs/sql-example.md ./bin/start-cluster.sh ./bin/sql-client.sh Airport Weather Flink SQL Table CREATE CATALOG pulsar WITH ( 'type' = 'pulsar-catalog', 'catalog-service-url' = 'pulsar://Timothys-MBP:6650', 'catalog-admin-url' = 'http://Timothys-MBP:8080' ); USE CATALOG pulsar; show databases; use `public/default`; SHOW TABLES; CREATE TABLE airportweather3 ( `dewpoint_c` DOUBLE, `dewpoint_f` DOUBLE, `dewpoint_string` STRING, `heat_index_c` INT, `heat_index_f` INT, `heat_index_string` STRING, `icon_url_base` STRING, `icon_url_name` STRING, `latitude` DOUBLE, `location` STRING, `longitude` DOUBLE, `ob_url` STRING, `observation_time` STRING, `observation_time_rfc822` STRING, `pressure_in` DOUBLE, `pressure_mb` DOUBLE, `pressure_string` STRING, `relative_humidity` INT, `station_id` STRING, `temp_c` DOUBLE, `temp_f` DOUBLE, `temperature_string` STRING, `ts` DOUBLE, `two_day_history_url` STRING, `visibility_mi` DOUBLE, `weather` STRING, `wind_degrees` INT, `wind_dir` STRING, `wind_kt` INT, `wind_mph` DOUBLE, `wind_string` STRING ) WITH ( 'connector' = 'pulsar', 'topics' = 'persistent://public/default/aircraftweather2', 'format' = 'json', 'admin-url' = 'http://Timothys-MBP:8080', 'service-url' = 'pulsar://Timothys-MBP:6650' ) desc aircraftweather2; +-------------------------+--------+-------+-----+--------+-----------+ | name | type | null | key | extras | watermark | +-------------------------+--------+-------+-----+--------+-----------+ | dewpoint_c | DOUBLE | FALSE | | | | | dewpoint_f | DOUBLE | FALSE | | | | | dewpoint_string | STRING | TRUE | | | | | heat_index_c | INT | FALSE | | | | | heat_index_f | INT | FALSE | | | | | heat_index_string | STRING | TRUE | | | | | icon_url_base | STRING | TRUE | | | | | icon_url_name | STRING | TRUE | | | | | latitude | DOUBLE | FALSE | | | | | location | STRING | TRUE | | | | | longitude | DOUBLE | FALSE | | | | | ob_url | STRING | TRUE | | | | | observation_time | STRING | TRUE | | | | | observation_time_rfc822 | STRING | TRUE | | | | | pressure_in | DOUBLE | FALSE | | | | | pressure_mb | DOUBLE | FALSE | | | | | pressure_string | STRING | TRUE | | | | | relative_humidity | INT | FALSE | | | | | station_id | STRING | TRUE | | | | | temp_c | DOUBLE | FALSE | | | | | temp_f | DOUBLE | FALSE | | | | | temperature_string | STRING | TRUE | | | | | ts | BIGINT | FALSE | | | | | two_day_history_url | STRING | TRUE | | | | | uuid | STRING | TRUE | | | | | visibility_mi | DOUBLE | FALSE | | | | | weather | STRING | TRUE | | | | | wind_degrees | INT | FALSE | | | | | wind_dir | STRING | TRUE | | | | | wind_kt | INT | FALSE | | | | | wind_mph | DOUBLE | FALSE | | | | | wind_string | STRING | TRUE | | | | +-------------------------+--------+-------+-----+--------+-----------+ 32 rows in set Flink SQL Row Flink SQL Results Source Code for our Weather Application: (Pinot-Pulsar Repo) References https://github.com/streamnative/pulsar-flink-patterns https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/

By Tim Spann CORE

Top Data Experts

expert thumbnail

Oren Eini

Wizard,
Hibernating Rhinos @ayende ‏

Oren Eini (@Ayende) is the CEO of Hibernating Rhinos, an Israeli-based hi-tech company which develops RavenDB (ravendb.net), the pioneer NoSQL Document Database that’s Fully Transactional across the database. He and his team also develop productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler(l2sprof.com), Entity Framework Profiler (efprof.com), and more. twitter: @HiberRhinos
expert thumbnail

Kai Wähner

Technology Evangelist,
Confluent

Kai Waehner works as Technology Evangelist at Confluent. Kai’s main area of expertise lies within the fields of Big Data Analytics, Machine Learning / Deep Learning, Messaging, Integration, Microservices, Internet of Things, Stream Processing and Blockchain. He is regular speaker at international conferences such as JavaOne, O’Reilly Software Architecture or ApacheCon, writes articles for professional journals, and shares his experiences with new technologies on his blog (www.kai-waehner.de/blog). Contact and references: kontakt@kai-waehner.de / @KaiWaehner / www.kai-waehner.de
expert thumbnail

Gilad David Maayan

CEO,
Agile SEO

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.
expert thumbnail

Grant Fritchey

Product Advocate,
Red Gate Software

Grant Fritchey, Microsoft MVP, works for Redgate Software as their Product Evangelist. Grant has more than 20 years experience in IT including time as a developer, DBA and architect. He has written multiple books on SQL Server including "SQL Server Query Performance Tuning" and "SQL Server Execution Plans." He presents at conferences around the world.

The Latest Data Topics

article thumbnail
Real-Time Stream Processing With Hazelcast and StreamNative
In this article, readers will quickly learn about real-time stream processing with Hazelcast and StreamNative, along with demonstrations and code.
January 27, 2023
by Timothy Spann
· 1,887 Views · 2 Likes
Cloud Native London Meetup: 3 Pitfalls Everyone Should Avoid With Cloud Data
Explore this session from Cloud Native London that highlights top lessons learned as developers transitioned their data needs into cloud-native environments.
January 27, 2023
by Eric D. Schabell CORE
· 1,391 Views · 3 Likes
Unit of Work With Generic Repository Implementation Using .NET Core 6 Web API
This article reviews the Unit of Work design pattern using a generic repository and a step-by-step implementation using the .NET Core 6 Web API.
January 27, 2023
by Jaydeep Patil
· 1,332 Views · 1 Like
The 31 Flavors of Data Lineage and Why Vanilla Doesn’t Cut It
This article goes over the four critical reasons why your data quality solution needs to have data lineage.
January 27, 2023
by Lior Gavish
· 1,498 Views · 1 Like
Fraud Detection With Apache Kafka, KSQL, and Apache Flink
Explore fraud detection case studies and architectures built with Apache Kafka, KSQL, and Apache Flink, including examples, guide images, and informative details.
January 26, 2023
by Kai Wähner CORE
· 2,468 Views · 1 Like
Upgrade Guide To Spring Data Elasticsearch 5.0
Learn about the latest Spring Data Elasticsearch 5.0.1 with Elasticsearch 8.5.3, starting with the proper configuration of the Elasticsearch Docker image.
January 26, 2023
by Arnošt Havelka CORE
· 2,154 Views · 1 Like
CQRS and MediatR Pattern Implementation Using .NET Core 6 Web API
In this article, we are going to discuss the working of CQRS and MediatR patterns and step-by-step implementation using .NET Core 6 Web API.
January 26, 2023
by Jaydeep Patil
· 1,673 Views · 1 Like
The Top 3 Challenges Facing Engineering Leaders Today—And How to Overcome Them
This article offers practical solutions for engineering leaders looking to lead their teams to success.
January 26, 2023
by Jennifer Grange
· 1,733 Views · 1 Like
What Is Policy-as-Code? An Introduction to Open Policy Agent
Learn the benefits of policy as code and start testing your policies for cloud-native environments.
January 26, 2023
by Tiexin Guo
· 3,065 Views · 1 Like
Data Mesh vs. Data Fabric: A Tale of Two New Data Paradigms
Data Mesh vs. Data Fabric: Are these two paradigms really in contrast with each other? What are their differences and similarities? Find out!
January 26, 2023
by Paolo Martinoli
· 2,137 Views · 1 Like
Do Not Forget About Testing!
This article dives into why software testing is essential for developers. By the end, readers will understand why testing is needed, types of tests, and more.
January 26, 2023
by Lukasz J
· 2,804 Views · 1 Like
Handling Automatic ID Generation in PostgreSQL With Node.js and Sequelize
In this article, readers will learn four ways to handle automatic ID generation in Sequelize and Node.js for PostgreSQL, which includes simple guide code.
January 25, 2023
by Brett Hoyer
· 2,072 Views · 3 Likes
Five Key Metaverse Launch Features: Everything You Need to Know
Read the five most crucial metaverse launch features that will make your metaverse platform an instant success among Web3 participants.
January 25, 2023
by Preethi Philip
· 1,594 Views · 1 Like
The Role of Data Governance in Data Strategy: Part II
This article explains how data is cataloged and classified and how classified data is used to group and correlate the data to an individual.
January 25, 2023
by Satish Gaddipati
· 2,194 Views · 5 Likes
Revolutionizing Supply Chain Management With AI: Improving Demand Predictions and Optimizing Operations
How are AI and ML being used to revolutionize supply chain management? What are the latest advancements and best practices?
January 25, 2023
by Frederic Jacquet CORE
· 1,947 Views · 1 Like
2023 Software Testing Trends: A Look Ahead at the Industry's Future
Discover the future of AI, DevOps, cloud computing, IoT, security, performance, automation, mobile, and big data testing.
January 25, 2023
by Praveen Mishra
· 8,336 Views · 2 Likes
Best Practices to Succeed at Continuous AWS Security Monitoring
This article looks at best practices for efficiently ingesting, normalizing, and structuring AWS logs so that security teams can implement the proper detections.
January 25, 2023
by Jack Naglieri
· 1,921 Views · 1 Like
The Future of Cloud Engineering Evolves
A central cloud engineering platform defines consistent workloads, architectures, and best practices.
January 25, 2023
by Tom Smith CORE
· 2,947 Views · 2 Likes
A Brief Overview of the Spring Cloud Framework
Readers will get an overview of the Spring Cloud framework, a list of its main packages, and how they relate to microservice architectural patterns.
January 25, 2023
by Mario Casari
· 4,876 Views · 1 Like
Memory Debugging: A Deep Level of Insight
In this article, readers will better understand memory leaks and how RAM is used, gaining insight into their apps that can't be obtained any other way.
January 25, 2023
by Shai Almog CORE
· 2,611 Views · 2 Likes
