Database Fundamentals #18: The SELECT Statement
Database Fundamentals #18: The SELECT Statement
Let's take a look at the introduction of T-SQL SELECT statement and explain the details thoroughly.
Join the DZone community and get the full member experience.Join For Free
Databases are better when they can run themselves. CockroachDB is a SQL database that automates scaling and recovery. Check it out here.
In my previous Database Fundamentals post, I showed you how to use the Query Designer to build a query. That was a SELECT statement. The basic construct of all your SELECT statements will be the same. You're going to define a list of columns, the table or tables you're interested in, and some sort of filter criteria. That's the bare bones basics of how it works. But, as we all know, the devil is in the details. There are lots and lots of details. This section will introduce the T-SQL SELECT statement and start explaining some of those details.
You've been introduced to the basic concepts of the column list in the SELECT statement. It represents the information that is going to be available for display by whatever application issued the query against the database. The basic choices are to use a shortcut for getting everything, an asterisk, referred to as a star, functions for this, or to list the columns. It is a good habit to get into to list the columns. But, so you have a full set of tools, this is how you use the * to select all columns from the Management.Address table:
SELECT * FROM Management.Address;
This will return all columns and all rows for that table. But a good rule to live by is to only move the data you need and only move that data when you need to move it. This means you're going to have a column list in place in most places. This query is more realistic and shows how you can retrieve a limited set of information from the Management.Address table:
SELECT AddressLine1, AddressLine2, City, Province, Country FROM Management.Address;
Now you'll only get the columns that the business is interested in, leaving behind the AddressId and CreateDate columns.
There's more that you can do with the columns in the SELECT list. Let's assume for a moment that the business doesn't like seeing column names without spaces, Address, and Address Line 2, would be the preferred way to display the columns to the business. You could edit your table structure so that it matches the business requirements. SQL Server will allow you to put spaces in the names of columns too. But, you then can only, ever, refer to that column when it is wrapped in brackets like this, [Address Line 2]. I don't know about you, but I'm lazy and that's a lot of extra typing, not to mention making the text more difficult to read with brackets all over the place. What if there was an easy way to change the column names without modifying your data structures? There is, it's called an alias and it's fairly easy to use. This set of code will alias the AddressLine1 column to read 'Address' and the AddressLine2 column to read 'Address Line 2':
SELECT AddressLine1 AS 'Address', AddressLine2 AS 'Address Line 2', City, Province, Country FROM Management.Address;
This will now return the columns with different labels on their heads as you can see in here:
The AS keyword is not required, but it's considered best practice to use it in order to avoid confusion. The confusion can come about because you actually don't have to refer to a column or a table in the database at all in your select list. You can simply supply a string like this:
SELECT ‘My Value’ ‘This is an alias’;
This will result in a single column and row with the column head reading "This is an alias' and the value in the row reading 'My Value' but without that AS keyword, this could be very hard to read. You can combine text like this with columns in a table.
You can also start to format text if you need to. Add a new row to the Management.Address table using this code:
INSERT Management.Address (AddressLine1, City, Province, Country, CreateDate) VALUES ('1329 Prescott Street', 'San Francisco', 'California', 'usa', GETDATE());
Note that the value for the country is in lower case. This could happen and it doesn't really matter, except that the business wants to see all countries in upper case. You've already seen the UPPER function in action modifying data, well you can use it to format the output as well. This query will change all the Country values to upper case:
SELECT AddressLine1 AS 'Address', AddressLine2 AS 'Address Line 2', City, Province, UPPER(Country) AS 'Country' FROM Management.Address;
Because there was a function on the column, the column name would not have been resolved and the business would have seen a blank column head. Instead, I supplied an alias that would have been the same as what they were used to seeing.
There are lots of functions that can be used to modify the SELECT list and they can't all be covered here. One more that you may see quite a lot is the COALESCE function. This allows you to substitute a value where the value would otherwise be NULL. Frequently, seeing NULL confuses people. Since the city of London has no province or county (in the current data), NULL values are supplied there. To change this to N/A we'll modify the query as follows:
SELECT AddressLine1 AS 'Address', AddressLine2 AS 'Address Line 2', City, COALESCE(Province,'N/A') AS 'Province', UPPER(Country) AS 'Country' FROM Management.Address;
The final output of the query will now reflect all the changes made to the original to get things formatted and labeled in a way that the business can be appreciated. You can see the final output here:
The business would like to see one more change. For the output of this query, they'd like the City, Province, and Country to be combined into a single column with the values you've defined so far, but all separated by commas. This type of formatting is fairly common. This query will combine the columns into a new column. The key point to remember here is that all the columns are of the same data type. If they weren't, you would need to convert them, again using functions, so that they were. Here's the last query:
SELECT AddressLine1 AS 'Address', AddressLine2 AS 'Address Line 2', City + ' ' + COALESCE(Province,'N/A') + ', ' + UPPER(Country) AS 'Location' FROM Management.Address;
The output is exactly as you have defined it:
As you can see, there are lots of modifications that can be made to the select list of the SELECT statement. There are also a number of different functions that you can apply to the FROM clause.
The FROM Clause
Basically, the FROM clause is just a list of tables from which data is being selected. However, as with most things, you move past the basics fairly quickly. Up to now, the SELECT statements you've been shown have violated one best practice because I wanted to keep them very simple while you were introduced to the topic. The one best practice that was violated was to always show the table that a column belonged to in the same way you always show the schema that the table belonged to. This saves SQL Server from having to read multiple locations to identify where a column or table belongs. It also makes the code more readable and more understandable once you get into queries with multiple tables. Taking one of the simplest examples, you would need to modify it like this:
SELECT Address.AddressLine1, Address.Address.AddressLine2, Address.City, Address.Province, Address.Country FROM Management.Address;
As you can see though, that makes things rather messy and much more difficult to read. The good news is that you can use an alias for the table as well. This can help you to both meet the requirements of the best practice, and keep your code more readable. The query above could be modified as follows:
SELECT a.AddressLine1, a.AddressLine2, a.City, a.Province, a.Country FROM Management.Address AS a;
Note that the keyword AS is again used to signify the alias. By writing your code this way you can meet the best practice and keep your code cleaner and easier to read.
The FROM clause doesn't have to be a literal table either. There is a concept called a derived table. This is a table defined from another SELECT statement. This is generally more useful when you start to work with joins in the FROM clause, but the concept is more easily explained by itself. A derived table is just a SELECT statement treated as another table. To show a derived table of the same query it would like this:
SELECT a2.* FROM (SELECT a.AddressLine1, a.AddressLine2, a.City, a.Province, a.Country FROM Management.Address AS a) AS a2;
This doesn't become really useful until you are working with aggregation queries or TOP statements or all kinds of more complicated queries that require rearranging the data multiple times to retrieve the result you'd like. We'll cover this more in other Fundamentals blog posts.
Another type of derived table is called a Common Table Expression or CTE. This is a derived table that can be used more than once in the SELECT statement. This example is the same as the one above, but set up with a CTE so that the derived table could be used multiple times:
WITH a2 AS (SELECT a.AddressLine1, a.AddressLine2, a.City, a.Province, a.Country FROM Management.Address AS a) SELECT a2.* FROM a2;
While a CTE can be used multiple times in a query, it can only be used in the query immediately following its definition.
A Few Comments on Formatting
Formatting code really comes down to personal choices. But, there are a few agreed on standards that it's worth pointing out some of them that have been in use. It is common to put the SQL Server keywords into all capital letters. It separates them from all the other objects that you're dealing with. If you see all caps, it's a SQL Server keyword.
Breaking the columns down into individual lines increases the readability of the code because you can discern each and every column independently. This is especially important when you begin to wrap columns in functions and combine them into new columns as you did in the examples above. If you also had to figure out that there were multiple columns in place in addition to the functions, it would make the SELECT list very difficult to read.
Indentation is also provided as a means to separate different clauses from each other so that you can see, readily, where one clause ends and the next begins.
Some of this functionality is supported by SQL Server itself. You'll notice that the Intellisense will use upper case on the keywords that it can supply. Also, if you indent the lines with a tab as you type, they will stay indented as you hit return between each line until you choose to move the cursor to the left with a shift-tab or a backspace. The purpose is to break things up and organize things so that they're more readable. Luckily SQL Server helps you with that.
That is the core of the SELECT statement, what is it that we want, and where is it coming from. There is a lot more to the SELECT statement, and the next post in the series will start exploring that by getting into join operations in the FROM clause to pull data from more than one table.
Published at DZone with permission of Grant Fritchey , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.