Working with MS SQL Server in Julia

SQL databases are one of the most widely deployed software around the world. They form the backbone of numerous applications ranging from business data analytics to weather forecasting. A number of client-server implementations currently exist, and Microsoft's SQL Server happens to be one of them. The fully-featured developer edition is available for free. It runs on Windows, Linux and via Docker.
Data scientists often need to interact with data stored in SQL databases. While it's easy to find guides on how to do this with languages such as Python, tutorials for Julia are quite scarce. Therefore, in this article, I will focus on how to work with SQL Server using Julia. Example code is generated using a Pluto notebook with Julia 1.9.1 running on Linux (Elementary OS).
Prerequisites
- SQL Server 2022
You need to have a SQL server running locally. Easiest way to set it up is via Docker. Instructions for SQL Server 2022 are given here. To verify if the docker container is running, use the following command:
watch -n 2 sudo docker ps -a
This will update every 2 s, and the STATUS column should show something like ‘Up X minutes' where X is the time elapsed from when the container was started.
- Microsoft ODBC driver 17 for Linux
Instructions are given here. I was unable to connect to the database using the newer driver 18, hence cannot recommend to use that one.
- sqlcmd utility (optional)
The sqlcmd utility lets you enter Transact-SQL statements, and is great to test if everything is working as expected. Follow instructions here.
Loading packages
The following Julia packages will be needed. When using a Pluto notebook, it's built-in package manager will automatically download and install them for you.
using ODBC, DBInterface, DataFrames
Check drivers
Open Database Connectivity (ODBC) drivers allow us to make connections to the Sql server. Using the ODBC.jl package, we can check the currently available drivers on our system:

It is also possible to install a driver once it's location is known.

To remove a driver, use:

Add connection
Using a full connection string, we can now connect to the locally running Sql Server, which was set up previously. The IP address, port, existing database name, user ID and password are needed. Note that in case the database name is unknown, we can connect to ‘master' as this name always exists by default.

List all existing databases
Using the conn_master object, we can now execute queries on the server. Let's list all the databases.

Create a new database
In order to create a new database, we should first check if the name already exists using the list_db function. If not, then we create it as shown below with ‘FruitsDB' as an example.

Listing all the databases again, we can verify that ‘FruitsDB' has now been created.

Create a new table
SQL Server databases can contain a number of tables, which are simply an ordered collection of data. A table itself is a collection of rows, also known as records. Before we can start populating a table, we first need to create it within an existing database. As an example, let's create a table called ‘Price_and_Origin' within ‘FruitsDB'. This table will contain three columns – Name (String), Price (Float) and Origin (String). Note that VARCHAR(50) is used to denote variable-size string data. 50 is the size in bytes, and for single-byte encoding it also represents the length of the string.
Add to new table
Once a table exists, we can add data to it. Easiest way is to use DataFrame as the source. Remember that our table ‘Price_and_Origin' expects three columns with name, price and origin. Therefore, we can use some dummy data as shown below:

To insert values, we can make use of DBInterface.executemany function, which allows passing multiple values in sequence. This can be done as shown in the function below. The finally clause ensures that database connection is closed using the DBInterface.close! function. This is generally a good practice, which helps avoid accidentally reusing the same connection for something else.

Let's verify if the database got populated as we had expected. We first set up a connection ‘conn_fruit' to connect to ‘FruitsDB' on the SQL Server. Then we can select all entries from the table ‘Price_and_Origin' and pass it to a DataFrame sink.

Updating a table
Following the same sequence as shown in the previous section, the database can now be updated with new data.

Let's verify if the new data is indeed present within the database.

Removing duplicates
Re-executing the add_to_fruit_table function above again would add duplicate rows to the table.

Using a common table expression (CTE), we can delete duplicate rows from a given table. The following function helps us achieve this:

Check if the rows are unique.

Delete records
It is often needed to remove entries (matching a certain condition) from the table within a database. For example, we can remove all fruits whose price is > 95 as shown below:

Delete table
Using the DROP statement within DBInterface.execute function, a table can be deleted. Rest of the function will remain the same as delete_rows.
DBInterface.execute(conn_db,
"DROP TABLE $table_name")
Conclusion
The DBInterface.execute function accepts valid SQL statements as an input. It is therefore possible to execute all kinds of queries as outlined here in addition to what has already been presented. As shown earlier, the results of a query can be easily passed to a Julia DataFrame sink, which can then be used to perform additional operations.
The packages ODBC.jl and DBInterface.jl are being actively maintained and seem to integrate well with existing workflows especially if they involve the use of DataFrames. This opens up exciting new possibilities for performing data analysis and visualization using Julia. I hope you found this exercise useful. Thank you for your time! Connect with me on LinkedIn or visit my Web 3.0 powered website.