Jump to content

Data Science

Sign in to follow this  
  • entries
  • comments
  • views

Contributors to this blog

Introduction to SQL

Sign in to follow this  


When you think about handling and processing the huge amount of data, what comes to mind? For many, thoughts of utilizing Python coupled with machine learning algorithms arise. What may not initially come to thought is the notion of using SQL instead. You might be scratching your head at this prospect of using SQL in Data Science or Bioinformatics when there are other alternatives. Or, perhaps, you’re not familiar enough with this particular language to jump to a conclusion. The fact of the matter is that SQL is a programming language you should familiarize yourself with if you’re looking to jump into the world of big data.

So why would somebody use SQL instead of the many alternatives? Simply put, SQL provides simplicity and robustness that you can seldom find anywhere else. Add to the equation that Data Science careers sometimes require more than handling big data. A big skill set that one can have is the ability to conduct database management on web applications; a feat for SQL and an RDMS (we’ll discuss this a little bit later.)

Database Introduction (Introduction to SQL)

For a thorough understanding of SQL and its potential role in Data Science, some basics are needed including an introduction to databases. First off, what exactly is a database? For simplicity, a database is just an organized collection of data. Within this collection, we have even more organization in the form of tables. Tables have specific bits of information stored inside them, and within these tables, there are individual columns that have even more specificity to them.

All of this may seem a little confusing, so we’ll go ahead and see a table, called “table1,” inside a database to clear things up.

Introduction to SQL

(Word of caution:  The database table and associated column names were created with simplicity in mind. You’ll probably never run across a table titled “table1,” or a column titled “address,” especially when dealing with large databases.) 

This sample table contains some information about a fabricated client base. In our actual database, there will be other tables that contain more relevant information, but let’s pretend this is all we need for the moment. As we have mentioned, each column contains a particular characteristic and here we can see the values of clientID, address, city and state.

Every single row in our table contains specific data (in this case, a particular client), whereas the columns include universal values or traits.

SQL is the language, while a Database Management System (DBMS) is the software that contains and manages the data.

Something that a lot of people get confused with is when they hear things about MySQL, SQLite or NoSQL and don’t quite understand its relation to SQL. In our example, we showcased our database table inside a simple Excel file. In a real world example, your data will more than likely be stored in some other software dedicated to database management. This idea is what we refer to as Database Management Software, or DBMS for short. MySQL, SQLite, and NoSQL are all examples of DBMS. You should not worry about mastering DBMS’ until you get the hang of SQL itself. Most DBMS for SQL follow the same protocols, with some minor changes that you can learn later.

A particular kind of DBMS named a Relational Database Management System (RDMS) and uses a specific type of modeling called a relational model. The RDMS, in particular, is called MySQL and is a popular database choice for websites. In fact, it’s what our site uses for database management.

If you are lost with some of the technical jargon, just remember: SQL is the programming language, and a DBMS is the database system we will be using to manage our data. 

Where Can I Practice My Code?

Since we aren’t going to go in-depth with the DBMS, you’re probably wondering how you’ll be playing around with SQL code. There are some programs you can download to do so, or you can utilize this free online SQL interpreter. This website allows you to test out some basic SQL code without having to download more complicated DBMS’.

The Almighty Select Statement

In any SQL or SQL in Data Science course, the first statement you’ll learn is select. It’s quite simple as to why this is the first statement you hear.  In SQL, the objective is to alter and configure the database to suit your needs. To do this, you must select certain attributes and do something with them. However, just selecting something won’t do much. You’ll need to pick something, tell the system where you are selecting it from and then what you want to do with it. This concept of actually doing something with the data leads us to the from statement. Whenever you utilize select, you’ll almost always use from as well. Take a look at the following:

Say, for example; you want to choose all of the client’s addresses from the table example from above.  You would utilize the following SQL statement (given that the table is named table1):

Select address From table1;

The results would yield you all of the addresses for every single row. You could use the same structure if you wanted to pull all of the ClientID’s or any of the other columns from the table.

Don’t Forget The Semi-Colon

The semi-colon in SQL denotes termination, so you’ll need to place it at the end of your SQL statement.

The Where Clause Makes Things Happen

While the select and from statements allow you to pick which specific data you want to handle, the where statement allows you to conduct the altering itself. There are numerous operators that the where clause can use to help you make things happen.

  1. Using the where clause for equality, non-equality, and showing greater than or less than.  (=, <>, >, <.)
  2. Using the where clause to join other clauses.
  3. Using the where clause for mathematical calculations.
  4. Using the where clause to group data.

The list goes on and on for how you can incorporate where into SQL, but just remember that it allows you to make things happen within SQL.

Taking the Basics and Applying it to Data

Our introduction to SQL was overly simplified and was aimed at providing a very brief introduction to those who have never used SQL. If you’re eager to learn more, we recommend a free MOOC provided by Stanford Lagunita.  I’ve personally taken this course, and it’s an excellent introduction to SQL!

We touched at the beginning of this article about some of the ways you might be using SQL in Data Science or Bioinformatics careers. Ultimately, it does depend on the particular job. You may never touch SQL once you are in your future career. Or, on the contrary, you may find that your position uses SQL extensively alongside other programming languages and software. 

Due to the unexpected nature of whether or not you’re going to need SQL, it’s worth a shot to know it at some level. At the minimum, I would recommend you at least have a basic understanding of SQL and how to do simple database analysis and alterations. Again, you never know if this can come in handy.

Some careers may also not care about the specific programming languages, as long as you can conduct the data analysis that they need. Whether you’re using Python, Perl or SQL doesn’t matter nearly as much as whether or not you can perform the tasks. 

Final Thoughts

If you have used any other SQL MOOC’s or have any useful materials for beginners to grasp this programming language, feel free to post the resources below! Be sure also to join us over at the YDSOA Community Forums as we discuss a wide variety of related topics!

Sign in to follow this  


Recommended Comments

Seems like most job descriptions want something like Python and SQL, so it's worth learning at this point imo.

Share this comment

Link to comment
On 6/10/2017 at 2:03 AM, Guest Reid said:

Seems like most job descriptions want something like Python and SQL, so it's worth learning at this point imo.

Thanks for the comment Reid! If you haven't considered it yet, I highly recommend becoming a YDSOA member. It's completely free and there are a lot of great benefits! If you do decide, you can create one by clicking this link

Thanks again!

Share this comment

Link to comment

What courses or books you recommend Basil?

I'm starting 'MySQL for Python' from Albert Lukaszewski.

I will implement this new ability in a project, it's how I usually learn ;)

Any other tip?

Share this comment

Link to comment
12 hours ago, George Carvalho said:

What courses or books you recommend Basil?

I'm starting 'MySQL for Python' from Albert Lukaszewski.

I will implement this new ability in a project, it's how I usually learn ;)

Any other tip?

Hi George,

There's a free online SQL course provided by Stanford Lagunita - I highly recommend it (you also get a certificate afterwards that you can add to your resume.) You can access the course here. As for books, one that I used when first learning SQL and like a lot is called "SQL in 10 Minutes, Sams Teach Yourself." The kindle version of the book costs around $14. 

Also.. implementing new abilities in projects is an excellent way to really get an in-depth understanding of the subject! 

Hope that helps! :)

(P.S. Thanks for registering to YDSOA! Please feel free to use the YDSOA Forums if you have any questions, and join in on the discussion!) 



Share this comment

Link to comment
You are commenting as a guest. If you have an account, please sign in.
Add a comment...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.