SQL Tutorial: Improving Query Performance with SQL Indexes
This guide walks through: what database indexes are, how they are used to improve performance, and when they should be used when managing a SQL database.
In this guide, we’re going to discuss indexes in sequel. I thought quite a bit about the best way to understand indexes because if you’ve never worked with them before they may seem a little bit foreign. I think this is one of the better ways of thinking about it.
I have a set of text lines right here.
You could imagine that each one of these items is actually a record in the database. I wanted to kind of strip out the entire idea of showing my sequel workbench and all of the things around it because those things were all distracting when it came to understanding just the basic high-level concept of what an index does will go into what an index actually is right after this.
Let’s imagine that we have a ton of records so you can see that we have record after record after record. And let’s imagine a scenario where we need to find a specific record. Well, if we do not have an index then what we’re essentially going to be doing or what’s going to be happening behind the scenes is my sequels going to say “okay, is it this post? No. Is it this one? No. This one? No.”
It’s going to keep on going down to check every single post and then who knows how long it’s going to be until you find the one you want. If you had a scenario where you had five million posts and records in your database you’re going to be sifting through a lot of content. And so this is going to be really annoying.
I’m actually looking for this line.
This is the one line I put in there it’s called “what I am really looking for.” If I were to search through my database for “what I am really looking for” this could be incredibly slow when it comes to finding it. That’s what I wanted you to think about was why indexes are even necessary. And it comes down to the fact that an index will allow us to speed up this process because right now we have to look line by line. But what happens if I say I want to see numbers here? So each one of these now has a number by it
Which means it has a reference point you could say it has an index. Now if I go all the way down I can see that this line of code or this post I was looking for is on line 317. What this means for me is when I’m searching for these values I don’t need to think of just trying to find where is this one record? I can actually just skip down to the exact spot that I want this makes it much more efficient to be able to find what I’m looking for. I don’t have to go line by line because the system essentially already has a mechanism for being able to search through it.
Now the true mechanism behind this gets a little more into some more complex algorithms and data structures and I’ll show where you can find what that is. But what it allows the system to do is instead of just trying to go line by line and find it in a linear fashion. It can actually go and skip ahead and ignore a huge chunk of the records that are not applicable just by having this index so you could think of an index essentially as a reference point. In sequel workbench let’s go and see our indexes because believe it or not you’ve actually been using indexes this entire course and many of the indexes are actually created by default. Especially when you use tools like my sequel workbench.
If I click on addresses and then click on the little inspect icon
this is going to bring up all of the different data points that we’ve seen before. But you may notice here we have this indexes tab.
If I click on this it’s going to show all of the columns in the table. But it’s also going to show all of the indexes that we have in the table. So you can see that by default whenever you create a table and you define a primary key that is going to create an index and right here we have an index of primary key and we know that it is unique and so right here you can see that he even has a name of unique.
And here is the primary one and we can see it’s address is underscore IED so there are two indexes for here. And then we also have one other index we have a index on the foreign key. So by default when we define a foreign key in our table my sequel automatically creates an index for it. And so the reason why we do that is because whenever you’re running joins if you do not have an index set on that foreign key your join query is going to be incredibly slow because it’s going to not just have to go to join the tables together it’s going to go have to perform that mapping every single time which will be very slow. So by having an index it allows you to very quickly and easily be able to search through it and all of this is happening behind the scenes. That’s what I want to reiterate these indexes are happening without you in many cases even having to do anything whatsoever. We’re able to simply take advantage of the performance benefits it has. But it is still important to understand what they are because there are times where you want to have indexes for columns that aren’t just primary keys and aren’t just foreign keys.
Imagine that you wanted to very quickly be able to find postal codes. That’s something that I could see being very practical. Well, here in our addresses table we have this postal code. Now it is a varchar and I will let you know that many of our varchars aren’t going to have the same kind of performance with an index just because they’re string-based values but that doesn’t mean you can’t use them and you actually can come here and say create an index for selected columns
and you can pick whatever algorithm you want the default one is typically fine.
Now locking this is an important thing to think about. So locking what this means is the default which allows for as much currency as possible. And this means that when this index is being utilized are we locking the record. So, in other words, say that you have one user who goes and makes a change to one of the addresses and then almost simultaneously you have another user that runs a query on that.
So the first user goes and he changes the address from New York to Arizona. Another user almost at the same time runs a query to see how many users live in New York. Well if you don’t have locking in place then you’re going to run into an issue because you may have that second user run a query and the number may be off. That may not seem like a big deal but imagine if that same kind of thing would happen with an ATM and a bank transfer occurs and a user tries to pull out money. You could have a lot of kind of messy components going into that so locking is something usually you can just use the default locking scheme and you’d be perfectly fine. Almost always you’ll be able to be able to use these.
Now, non-unique is what we’d have to go here. You also have the ability to do unique which would be a horrible idea for a postal code and then full text. Now full text is what you would do if you were building a search engine kind of feature here. I don’t think that that’s going to be an issue just the regular. All the defaults are perfectly fine. And then you can click Create and it will generate an index for you.
Now, this is one of the very first questions I get whenever I am teaching about indexes is “If indexes make it so much faster to be able to query data. Why wouldn’t we just want to index every single one of the columns?” And that if you’ve never worked with indexes that’s a perfectly normal question to ask. And the reason why you would not want to do that is because for every index you have every time that you make an insert or an update what happens is not only does it update the table it also has to go and then update every single one of the indexes you have.
So for just a few indexes on a table that’s not an issue but if say you had five columns and you went and you had to update all five of those columns that could lead to very slow right speed. If you’re updating a column if you’re adding a row or updating a row or adding a new one then you’re going to maybe run into some latency whenever you’re performing those type of tasks. The rule of thumb is whenever you want to have or whenever you have a column that gets searched very often. For example a postal code or some type of search index like an e-mail address or something like that. Those can be good for indexing obviously all your primary keys and your foreign keys. You want to index those regularly. And so that is you know that’s kind of the common pattern and a good rule of thumb.
Now that is the process by which you would create one. And so I hope this has been a good kind of walkthrough on indexes and how you can generate them why they’re important when to use them when not to use them and how you can have a high-level view of what they are in databases.