The Latest Big Data Topics

Gravity, Residency, and Latency: Balancing the Three Dimensions of Big Data
Struggling with challenges related to big data? Remember to balance data gravity, data residency, and data latency priorities.
October 31, 2022
by Jason Bloomberg
· 8,025 Views · 3 Likes
Top 10 IoT Trends That Will Impact Various Industries in the Coming Years
IoT will have a significant impact on many different industries over the next few years.
October 28, 2022
by Calvin Austins
· 9,662 Views · 3 Likes
Data Science vs. Software Engineering: A Fine Differentiation
The differences between data science and software engineering, covering career opportunities, salaries, and qualifications, along with a detailed comparison table.
October 27, 2022
by Richa Sareen
· 7,449 Views · 2 Likes
The Difference Between Predictive Analytics and Data Science
Predictive analytics supports forecasting, but data science is a distinct discipline. This post explains the difference between predictive analytics and data science.
October 26, 2022
by Smith Williams
· 7,221 Views · 1 Like
7 Alternatives to Using Segment
Review the pros and cons of Segment, including its core features Warehouses and Personas, and check out the best alternative CDPs.
Updated October 25, 2022
by Luke Kline
· 5,249 Views · 2 Likes
Why Data Analytics Is Central to Digital Adoption Optimization
When data analysts successfully bring digital adoption into data analysis, it has a significant effect on organizational transformation.
October 25, 2022
by Alon Gehlber (DZone Core)
· 5,967 Views · 1 Like
What Are the Key Applications and Benefits of IoT Fleet Management?
IoT in fleet management can genuinely help bring more agility to your distribution channels.
October 25, 2022
by Kamal R
· 6,321 Views · 2 Likes
Here Is Why You Need a Message Broker
Hopefully, by the end of this article, you will be able to understand the importance of using a message-driven architecture for building your next project.
Updated October 24, 2022
by Yaniv Ben Hemo
· 7,620 Views · 4 Likes
iOS Meets IoT: Five Steps to Building Connected Device Apps for Apple
Apple’s flagship device runs a unique operating system, iOS, that should not be ignored in app rollouts for the Internet of Things.
October 21, 2022
by Carsten Rhod Gregersen
· 7,005 Views · 2 Likes
Data Streaming for Data Ingestion Into the Data Warehouse and Data Lake
Data Streaming with Apache Kafka for Data Ingestion into Data Warehouse, Data Lake and Lakehouse.
October 20, 2022
by Kai Wähner (DZone Core)
· 6,602 Views · 3 Likes
Data Modeling in Cassandra and Astra DB
What does it take to build an efficient and sound data model for Apache Cassandra and DataStax Astra DB? Where would one start? Are there any data modeling rules to follow?
October 19, 2022
by Artem Chebotko
· 6,120 Views · 2 Likes
Can You Beat the AI? How to Quickly Deploy TinyML on MCUs Using TensorFlow Lite Micro
Do you want to know how to use machine learning on the microcontrollers you already work with? In this article, we provide an introduction to ML on microcontrollers.
October 19, 2022
by Nikolas Rieder
· 7,820 Views · 2 Likes
O11y Guide: Keeping Your Cloud-Native Observability Options Open
Take a look at the architecture-level choices being made and share the open standards with the open-source landscape.
October 19, 2022
by Eric D. Schabell
· 4,703 Views · 3 Likes
The Heart of the Data Mesh Beats Real-Time With Apache Kafka
Building a decentralized real-time data mesh with data streaming using Apache Kafka for truly decoupled, reliable, scalable microservices.
October 19, 2022
by Kai Wähner (DZone Core)
· 7,577 Views · 3 Likes
Use Apache Kafka SASL OAUTHBEARER With Python
Learn how to use the Confluent Python client with the SASL/OAUTHBEARER mechanism to produce messages to and consume messages from topics in Apache Kafka.
Updated October 18, 2022
by Abhishek Koserwal
· 11,969 Views · 4 Likes
Case Studies: Cloud-Native Data Streaming for Data Warehouse Modernization
Let's explore a few case studies for cloud-native data streaming and data warehouse modernization.
October 15, 2022
by Kai Wähner (DZone Core)
· 7,457 Views · 3 Likes
How to Read Graph Database Benchmark (Part II)
This is the second part of the How to Read Graph Database Benchmark series and is dedicated to graph query (algorithm, analytics) results validation.
October 13, 2022
by Ricky Sun
· 5,752 Views · 1 Like
Decision Guidance for Serverless Adoption
This article offers decision guidance for adopting serverless across various architectures and workloads, and shares a list of antipatterns.
October 12, 2022
by Abhay Patra
· 6,831 Views · 5 Likes
AIOps: What, Why, and How?
A Guide To Everything About AIOps: Use cases, benefits, challenges, core elements, AIOps architecture, and future.
Updated October 11, 2022
by Mahipal Nehra
· 8,474 Views · 3 Likes
Handling Big Data with HBase Part 5: Data Modeling (or, Life Without SQL)
This is the fifth in a series of blog posts introducing Apache HBase. In the fourth part, we saw the basics of using the Java API to interact with HBase to create tables, retrieve data by row key, and do table scans. This part discusses how to design schemas in HBase.
HBase has no rich query capability comparable to SQL in relational databases. It forgoes that capability, along with others such as relationships and joins, to focus on providing scalability with good performance and fault tolerance. So when working with HBase you need to design the row keys and table structure, in terms of rows and column families, to match the data access patterns of your application. This is the opposite of what you do with relational databases, where you start out with a normalized database schema and separate tables, and then use SQL to perform joins to combine data in the ways you need. With HBase you design your tables around how applications will access them, so you need to think much more up front about how data is accessed. You are much closer to the bare metal with HBase than with relational databases, which abstract away implementation details and storage mechanisms. However, for applications that need to store massive amounts of data and require inherent scalability, good performance characteristics, and tolerance to server failures, the potential benefits can far outweigh the costs.
In the last part on the Java API, I mentioned that when scanning data in HBase, the row key is critical since it is the primary means to restrict the rows scanned; there is nothing like the rich queries SQL offers in relational databases. Typically you create a scan using start and stop row keys and optionally add filters to further restrict the rows and columns returned. To have some flexibility when scanning, the row key should be designed to contain the information you need to find specific subsets of data.
In the blog and people examples we've seen so far, the row keys were designed to allow scanning via the most common data access patterns. For the blogs, the row keys were simply the posting date. This would permit scans in ascending order of blog entries, which is probably not the most common way to view blogs; you'd rather see the most recent blogs first. So a better row key design would be to use a reverse-order timestamp, which you can get using the formula (Long.MAX_VALUE - timestamp), so scans return the most recent blog posts first. This makes it easy to scan specific time ranges, for example to show all blogs in the past week or month, which is a typical way to navigate blog entries in web applications.
For the people table examples, we used a composite row key composed of last name, first name, middle initial, and a (unique) person identifier to distinguish people with the exact same name, separated by dashes. For example, Brian M. Smith with identifier 12345 would have row key smith-brian-m-12345. Scans for the people table can then be composed using start and stop rows designed to retrieve people with specific last names, last names starting with specific letter combinations, or people with the same last name and first name initial. For example, if you wanted to find people whose first name begins with B and whose last name is Smith, you could use the start row key smith-b and stop row key smith-c (the start row key is inclusive while the stop row key is exclusive, so the stop key smith-c ensures all Smiths with a first name starting with the letter "B" are included).
You can see that HBase supports the notion of partial keys, meaning you do not need to know the exact key, which provides more flexibility when creating appropriate scans. You can combine partial key scans with filters to retrieve only the specific data needed, thus optimizing data retrieval for the data access patterns specific to your application.
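The two row key ideas above, reverse-order timestamps and partial-key scans, can be sketched with nothing but the JDK. This is a simulation, not the HBase client API: a TreeMap stands in for a table because it keeps keys sorted, and the class name, helper names, and sample people other than Brian M. Smith are hypothetical. Zero-padding the reversed timestamp to 19 digits is an assumption of this sketch so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch: a sorted map stands in for an HBase table.
class RowKeySketch {

    // Reverse-order timestamp from the text: Long.MAX_VALUE - timestamp.
    // Padding to 19 digits (an assumption here) keeps string order numeric.
    static String reverseTimestampKey(long timestampMillis) {
        return String.format("%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Composite people key: last-first-middle-id, lowercased, dash-separated.
    static String personKey(String last, String first, String middle, long id) {
        return (last + "-" + first + "-" + middle + "-" + id).toLowerCase(Locale.ROOT);
    }

    // Stand-in for a scan: start row inclusive, stop row exclusive.
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> blogs = new TreeMap<>();
        blogs.put(reverseTimestampKey(1_000L), "oldest post");
        blogs.put(reverseTimestampKey(2_000L), "middle post");
        blogs.put(reverseTimestampKey(3_000L), "newest post");
        // Iteration follows key order, so the newest post comes first.
        System.out.println(blogs.values());

        TreeMap<String, String> people = new TreeMap<>();
        people.put(personKey("Smith", "Brian", "M", 12345), "Brian M. Smith");
        people.put(personKey("Smith", "Bob", "A", 1), "Bob A. Smith");
        people.put(personKey("Smith", "Carol", "X", 2), "Carol X. Smith");
        people.put(personKey("Smithson", "Bill", "Q", 3), "Bill Q. Smithson");
        // Partial-key scan: Smiths whose first name starts with "b".
        System.out.println(scan(people, "smith-b", "smith-c"));
    }
}
```

Because the stop key is exclusive, smith-c cuts the range off just before any Smith whose first name starts with "c", and smithson-... sorts after it as well since "s" is greater than "-".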
So far the examples have involved only single tables containing one type of information and no related information. HBase does not have foreign key relationships like relational databases do, but because it supports rows having up to millions of columns, one way to design tables in HBase is to encapsulate related information in the same row, a "wide" table design. It is called a "wide" design since you are storing all information related to a row together in as many columns as there are data items.
In our blog example, you might want to store comments for each blog. The "wide" way to design this would be to include a column family named comments and then add columns to the comments family where the qualifiers are the comment timestamps; the comment columns would look like comments:20130704142510 and comments:20130707163045. Even better, when HBase retrieves columns it returns them in sorted order, just like row keys. So in order to display a blog entry and its comments, you can retrieve all the data from one row by asking for the content, info, and comments column families. You could also add a filter to retrieve only a specific number of comments, adding pagination to them.
The people table column families could also be redesigned to store contact information such as separate addresses, phone numbers, and email addresses in column families, allowing all of a person's information to be stored in one row. This kind of design can work well if the number of columns is relatively modest, as blog comments and a person's contact information would be. If instead you are modeling something like an email inbox, financial transactions, or massive amounts of automatically collected sensor data, you might choose instead to spread a user's emails, transactions, or sensor readings across multiple rows (a "tall" design) and design the row keys to allow efficient scanning and pagination.
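The "wide" layout for blog comments can be sketched in plain Java as a single row that holds sorted columns per column family. This is a simulation under stated assumptions rather than HBase client code: the class name, the page method, the content:post qualifier, and the comment texts are all hypothetical, and pagination is done in memory where a real client would normally use a server-side filter.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of one "wide" row: family -> (qualifier -> value).
class WideRowSketch {
    private final Map<String, TreeMap<String, String>> families = new TreeMap<>();

    // Store a value under family:qualifier; qualifiers stay sorted.
    void put(String family, String qualifier, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
    }

    // Return one page of a family's columns in qualifier order.
    List<String> page(String family, int offset, int limit) {
        TreeMap<String, String> columns = families.get(family);
        if (columns == null) {
            return Collections.emptyList();
        }
        return columns.entrySet().stream()
                .skip(offset)
                .limit(limit)
                .map(e -> family + ":" + e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        WideRowSketch blogRow = new WideRowSketch();
        blogRow.put("content", "post", "Life without SQL");
        // Inserted out of order on purpose; timestamps as qualifiers sort them.
        blogRow.put("comments", "20130707163045", "Second comment");
        blogRow.put("comments", "20130704142510", "First comment");
        blogRow.put("comments", "20130710090000", "Third comment");

        System.out.println(blogRow.page("comments", 0, 2)); // first page
        System.out.println(blogRow.page("comments", 2, 2)); // second page
    }
}
```

Using the comment timestamp as the qualifier means the comments come back in chronological order without any explicit sort, which is what makes offset-and-limit paging over a single row straightforward.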
For an inbox the row key might look like - which would permit easily scanning and paginating a user's inbox, while for financial transactions the row key might be -. This kind of design can be called "tall" since you are spreading information about the same thing (e.g. readings from the same sensor, transactions in an account) across multiple rows, and is something to consider if there will be an ever-expanding amount of information, as would be the case in a scenario involving data collection from a huge network of sensors.
Designing row keys and table structures is a key part of working with HBase, and will continue to be given HBase's fundamental architecture. There are other things you can do to add alternative schemes for data access within HBase. For example, you could implement full-text searching via Apache Lucene either within rows or external to HBase (search Google for HBASE-3529). You can also create (and maintain) secondary indexes to permit alternate row key schemes for tables; for example, in our people table the composite row key consists of the name and a unique identifier. But if we want to access people by their birth date, telephone area code, email address, or any number of other ways, we could add secondary indexes to enable that form of interaction. Note, however, that adding secondary indexes is not something to be taken lightly; every time you write to the "main" table (e.g. people) you will also need to update all the secondary indexes! (Yes, this is something that relational databases do very well, but remember that HBase is designed to accommodate a lot more data than traditional RDBMSs were.)
Conclusion to Part 5
In this part of the series, we got an introduction to schema design in HBase (without relations or SQL).
Even though HBase is missing some of the features found in traditional RDBMSs, such as foreign keys and referential integrity, multi-row transactions, multiple indexes, and so on, many applications that need inherent HBase benefits like scaling can benefit from using HBase. As with anything complex, there are tradeoffs to be made. In the case of HBase, you are giving up some richness in schema design and query flexibility, but you gain the ability to scale to massive amounts of data by (more or less) simply adding additional servers to your cluster.
In the next and last part of this series, we'll wrap up and mention a few (of the many) things we didn't cover in these introductory blogs.
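As a closing illustration of the secondary index caveat discussed earlier in this part, here is a minimal sketch of what "update the index on every write" means in practice. It is a plain-Java simulation, not HBase code: the two sorted maps stand in for a main people table and a hypothetical email index table, and all names and addresses are made up for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: a main table plus one manually maintained index.
class SecondaryIndexSketch {
    // Main table: composite row key -> columns.
    private final TreeMap<String, Map<String, String>> people = new TreeMap<>();
    // Index table: email -> row key in the main table.
    private final TreeMap<String, String> emailIndex = new TreeMap<>();

    // Every write to the main table must also fix up the index.
    // Note: in this sketch the two updates are simply sequential, not atomic.
    void putPerson(String rowKey, String name, String email) {
        Map<String, String> previous = people.get(rowKey);
        if (previous != null) {
            emailIndex.remove(previous.get("email")); // drop the stale entry
        }
        people.put(rowKey, Map.of("name", name, "email", email));
        emailIndex.put(email, rowKey);
    }

    // Lookup by email: index first, then a get on the main table.
    Optional<Map<String, String>> findByEmail(String email) {
        return Optional.ofNullable(emailIndex.get(email)).map(people::get);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch tables = new SecondaryIndexSketch();
        tables.putPerson("smith-brian-m-12345", "Brian M. Smith", "brian@example.com");
        System.out.println(tables.findByEmail("brian@example.com"));

        // Changing the email costs two index operations plus the main write.
        tables.putPerson("smith-brian-m-12345", "Brian M. Smith", "bsmith@example.com");
        System.out.println(tables.findByEmail("brian@example.com").isPresent());
    }
}
```

Each additional index (birth date, area code, and so on) would add another map and another pair of remove and put calls to putPerson, which is the write amplification the text warns about.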
Updated October 11, 2022
by Scott Leberknight
· 19,724 Views · 3 Likes