DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Embracing Amazon Redshift as the Future of Data Warehousing

Embracing Amazon Redshift as the Future of Data Warehousing

How Amazon Redshift utilizes its scalability and easy data retrieval to improve big data-focused applications.

Richard Smith user avatar by
Richard Smith
·
May. 25, 16 · Opinion
Like (1)
Save
Tweet
Share
4.37K Views

Join the DZone community and get the full member experience.

Join For Free

When it comes to data warehousing, Amazon Redshift holds its head high. Redshift is categorized as one of the many database services offered within the AWS. This entity is fast, highly relational and flaunts petabyte scale warehousing. The best thing about Amazon Redshift is its cost-effectiveness. Better performance is also guaranteed along with integrations of several BI tools.

Figure 1: Listing out the benefits of Redshift

Quite recently, Redshift had a major showdown against the Azure data lake and it would now be appropriate to learn about each of them in detail. This article explains the benefits and architecture of Amazon Redshift.

Description

We have already described Amazon Redshift as a performant data warehousing service, best known for the cost-effectiveness. The less-expensive nature is due to negligible upfront costs as charges are levied only based on the provisioned resources.

In terms of scalability, Amazon Redshift is extremely potent— allowing users to configure databases with ease. The UI interface is used alongside the mentioned scripts or APIs.

Several data clusters can reside over Amazon Redshift but the existing one is marked 'read only' once new servers are provisioned. Right after provisioning, the existing data sets are copied to the newer servers. DNS is switched to the newly provisioned servers — giving way to a constant loop.

Figure 2: Overviewing the basic structure

Amazon Redshift has the capability of provisioning, adding software patches and even upgrading the concerned OS to a fully functional database. Server health monitoring is pretty constant and even the failed nodes are readily replaced, whenever detected.

Most importantly, the data is kept secured by enabling SSL over the wire i.e. in the active mode. Apart from this, disk storage is also available at Amazon Redshift, with necessary encryption. Encryption offers an unconventional flexibility by securing both backed up data and the one active over the wire. 

Architecture

The basic component of Amazon Redshift is the cluster — considered as a group of few compute nodes. For more than one compute node, a ‘leader node’ is also assigned to the hierarchy.

It is the leader node that takes care of query executions in most cases. In case it exists, communications with the external world is handled only by the Leader. When it comes to client BI tools, it is the ‘leader node’ that is involved.

Upon receiving the query request, the entire query plan is generated by the Leader, and the request is intuitively distributed among the featured compute nodes.

Figure 3: Architecture for Dummies

Coming back to the compute nodes, we get a built-in processing unit and the disk. The fetched queries are executed by each one of them and the result is then transferred to the leader node. Upon consolidating the set of results, the leader processes the outcome. It is to be understood that query distribution is segregated in parallel as all compute nodes are made up of slices and each one has a portion of allocated memory. Again, the quantity of slices is determined by the number of allotted CPUs.

Data Distribution

Amazon Redshift boasts columnar data storage — featuring minimum and maximum values. Data is retrieved upon request using tokens like sort keys. Sort keys are extremely helpful when the databases are searched for exact results. Blocks are saved using the sort keys with each being 1MB in size. Data overlapping between the blocks isn’t even a possibility with Redshift’s architecture as upon request the engine can directly jump onto the desired block.

Interleaved and compound sort keys are featured for helping out the datebase with data segregations into blocks. The latter comprises of multiple columns and helps join secondary and primary ones. However, the query performance drops whenever the primary column is missing,

Next in line is the sort key, which is helpful when multiple queries are concerned. Most larger tables put this sort key to use, and every column is given equal priority.

Both these sort keys bank upon re-indexing for better query performance. However, a compound sort key is assigned as the default entity. Most applications over desktops or smartphones must look to incorporate Redshift as the data warehousing service for the clarity of data retrieval. This will be beneficial as cache storage units can then be freed once the app is uninstalled without hampering the performance of the machine. Applications can retrieve data easily upon request if Amazon Redshift is taken into account.

Importing Data

Suppose you are looking to import data from SQL Server to Amazon Redshift. The first step would be to create tables in SQL. One can then extract data reserves from SQL, via a delimiter. Once coded into a textual form, data needs to be uploaded into the ‘AWS S3’ storage unit. Finally, the copy command is implemented for importing the same to Redshift.

Figure 4: Importing data into Redshift

Featured Syntax is as follows:

copy < table name> from < S3 file path>

credentials ‘aws_access_key_id=<key id>;

aws_secret_access_key=< access key>’

delimiter ‘<Delimiter Character>'

Concept of Client ‘BI Tools’

In terms of client tools, Tableau is something which connects directly to Redshift. Once this is factored or keyed in, the details are shifted over to the 'Redshift Live Data'. Dashboards can be generated using ‘RedShift table schema’. The report will then be ready to be viewed across multiple browsers.

Bottom Line

Redshift is an integral tool when it comes to performance, cost and even scalability. Most importantly, we can easily integrate client ‘BI tools’ with the same. Overall, Amazon Redshift is a well-oiled machine with leader nodes and compute nodes working in cohesion. Sort keys form the backbone and this service comes in with application patches and even the quintessential OS upgrade.

Amazon Redshift Redshift (theory) Data (computing) Relational database Database

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Real-Time Stream Processing With Hazelcast and StreamNative
  • Fraud Detection With Apache Kafka, KSQL, and Apache Flink
  • Top Five Tools for AI-based Test Automation
  • Using JSON Web Encryption (JWE)

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: