Loading DrugBank SQL Files

Introduction

What is SQL?

Structured Query Language (SQL, sometimes pronounced “sequel”) is a domain-specific language (DSL) for working with relational databases. Different database vendors offer their implementation of the SQL standard; as such, there is usually non-overlapping functionality between these various implementations.

If you are unfamiliar with SQL or just need a refresher, there is a good online tutorial site that also allows you to run queries within your browser against a test dataset. For specific documentation regarding database setup and use, consult your database vendor's documentation.

DrugBank SQL support

DrugBank offers CSV-formatted data downloads with specific `.sql` files for loading the data into three SQL implementations: MySQL, PostgreSQL, and Microsoft SQL Server (MS SQL). If you use a different implementation, you may have to write custom logic to load the included CSV files.

Loading the Data

Loading the data manually

For each supported implementation, there will be a corresponding zip file available from a user's downloads portal. This zip file includes several CSV files (along with corresponding `.fmt` files for MS SQL) and four `.sql` files. These latter files include all the commands required to perform various operations and can be executed as SQL scripts from the database management system itself.

To load the data, the 'create_schema,' 'load_tables,' and 'add_constraints' files should be executed (in order). The final script, 'drop_tables,' removes all DrugBank tables from the database and can be used to clean the database before loading newer data. As such, the data can be added to any supported database type by executing each script. However, it is often desirable to automate this process; this is covered in the following sections.

A simple loading script

In this part of the guide, we focus on the most popular of our supported SQL implementations: MySQL and PostgreSQL. We provide a simple script in python 3 (written in version 3.10.2) that will accept either the starting zip file or a previously unzipped file directory. It optionally drops the database (for subsequent loads) and executes the scripts to create the schema, load the tables, and add all the key constraints:

For example, to load a MySQL database on your local machine with port 3306, you would run the following:

python3 </full/path/to/the/script.py> -z </path/to/zipfile> -t 3306 -u <user name> -p <user password> –-type MySQL
Copy to clipboard

To see all of the supported command line options, along with their possible and default values, simply call the script with the help option (either '-h' or '--help'). If you are unsure of the port to specify, start with the default (3306 for MySQL and 5432 for PostgreSQL).

We will explore a few notable points for each database implementation below, which will be most valuable for users looking to implement custom scripts. If you plan to modify the provided script or implement your own custom script, it is recommended to read the implementation notes below. Otherwise, feel free to skip to the section on considerations.

A note on connections

Regardless of where the SQL server you are using lives (on the same machine or over a network connection), you must specify some parameters to connect. At a minimum, this will be the host address and exposed port, but it should also include a user account name and password. The provided script assumes a nonlocal server connection but should work if your SQL server is hosted locally.

Implementation Notes

Notable points - MySQL

While the MySQL loader is relatively straightforward, it does have some important aspects that are worth mentioning. The first is that, as opposed to other command line arguments, the password must be specified either:

using the '-p' command, in which case the password must follow immediately after with no space in between (i.e., '-p<password>')
using the '--password' command, in which case the flag must be followed by an equals sign and the password, again with no spaces (i.e., '--password=<password>')

The other point is that the script toggles two environment variables, 'local_infile' and 'FOREIGN_KEY_CHECKS.' Setting 'local_infile' to True allows MySQL to accept data from files not hosted on the same server as the SQL server. Setting 'FOREIGN_KEY_CHECKS' to False allows for missing foreign keys, which is required to load the data. The script resets each of these to their defaults following execution (regardless of whether an error occurred) and should be considered a best practice.

A MySQL recommendation

Although MySQL is common and freely available, we recommend using MariaDB as a MySQL-like implementation. The script will work with MariaDB and we have seen that certain functionality, such as adding the '--drop' flag, works better with MariaDB containers than MySQL containers (see the below section on Docker).

Notable points - PostgreSQL

The PostgreSQL loader contains some additional logic compared to the MySQL loader. Similar to MySQL, specifying a password in the command line interface differs from most other options. In PostgreSQL, the easiest way to do this is to set the 'PGPASSWORD' environment variable. Although it would be possible to set this globally, the script sets it and passes it as an environment variable for each new process spawned by the 'subprocess' module.

In addition, PostgreSQL does not support the 'EXISTS' keyword in the 'CREATE DATABASE' statement (as of version 15.1). Therefore, the more complicated statement 'SELECT 'CREATE DATABASE' ”<database>”' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '<database>')\gexec' must be used instead. This query uses a WHERE clause on the list of existing databases to filter a SELECT statement that would create the database. Also, note the use of double quotes around the database name in the initial CREATE statement, which is required for case sensitivity.

The largest difference is how PostgreSQL handles non-local input files. Rather than copying data using normal SQL commands (COPY FROM), it is required to use the `\copy` meta-command. There is a large function ('_fix_postgresql_load_file') that converts the supplied 'load_tables' script into a script that can run over a nonlocal database connection. The most notable point about the `\copy` meta-command is that the whole command should appear on a single line within the file as it is terminated by the newline character rather than a semicolon.

Other SQL implementations

Although we have covered only MySQL and PostgreSQL here, the pattern of creating the schema, loading tables, and then adding constraints should apply to other SQL implementations. Furthermore, the provided python code can be repurposed or modified to fit specific requirements.

Considerations

General

The DrugBank database is fairly large (on the order of a few gigabytes, depending on the exact data included). As such, it is recommended to remove any local files that are no longer required after having been loaded. It is also the case that the large volume of data can lead to long execution times for the 'load_tables' SQL file.

For these reasons, you should consider how often to download/load new database versions from your download portal based on your specific use case. For applications that require the most up-to-date data possible, a daily cadence is reasonable. In contrast, for less sensitive applications, it might be possible to perform these operations on a weekly, monthly, or similar cadence.

A note on passwords

For simplicity, the provided script takes the user password as a command line input. In general, this is not a best practice. Different SQL implementations have mechanisms to use configuration files or environment variables in place of this. Depending on your setup, you may wish to alter the script to remove command line password passing and rely on one of these other mechanisms instead.

A note on permissions

When running the scripts, you may run into an error along the lines of “Access denied; you need (at least one of) the SUPER privilege(s) for this operation.” An error such as this indicates that the account specified in the script has insufficient privileges to create or modify tables within the database. If you run the script to load data into a local database, it should be sufficient to run using the 'root' user (be sure to use the right password). Alternatively, you could also grant your account super privileges (this should be covered in the documentation for your specific SQL implementation). If, instead, you are attempting to load data into a shared database, contact your administrator to discuss the best approach.

Using Docker to run your database

One common mechanism for setting up database instances is Docker, a popular application for deploying containerized applications. As an example, it is quite easy to get a version of MariaDB up and running from its official docker image. Adding a file named 'docker-compose.yml' to a directory with the following content:

version: '3.8'
services:
  db:
    image: mariadb:10.9.5
    ports:
      - 3306:3306
    volumes:
      - ./data:/var/lib/mysql
    env_file:
      - .maria_env
Copy to clipboard

and a '.maria_env' file to the same directory with the following content:

MARIADB_ROOT_PASSWORD: <your root password here>
MARIADB_USER: <your user name here>
MARIADB_PASSWORD: <your user password here>
Copy to clipboard

where the username and password are whatever you wish them to be. These variables should be sufficient to spin up the container, but should you want to provide more customization, check this page. Once these files are created, you can create a container running MariaDB by running:

docker-compose up -d
Copy to clipboard

This command will pull the image (if you do not already have it locally) and then build a container using the 'docker-compose.yml' file as instructions. You can also spin up a container without needing a compose file (see the MariaDB documentation on DockerHub); the compose file is useful when connecting your database to other services within the container.

As a quick note, the compose file above will map the container's MySQL file directory to a local subdirectory named 'data' (this is the 'volumes' declaration; the specified directory on your machine will be created if it does not already exist). This is useful for facilitating communication between your local file system and that of the container, and is useful if you need to load the files from within the container itself (see below). However, this line is not required and can be omitted if desired.

The provided script will work with Dockerized versions of MySQL and PostgreSQL - just use the correct host and exposed port (127.0.0.1 and 3306 if you copied the compose file above). If, however, you wish to execute the provided SQL scripts within the container, you must include all the files in the shared volume and then connect to the container itself (e.g., by using `docker exec -it`). This is useful if, for example, you do not have the corresponding command line client installed locally.

Conclusion

SQL is a powerful query language allowing users to query and analyze DrugBank data. This article provided an overview of the SQL support currently available within DrugBank, along with a sample script to load these data in MySQL and PostgreSQL.

Implementation

Loading DrugBank SQL Data

Documentation

Implementation

Documentation

Tutorials

Loading DrugBank SQL Files

Introduction

What is SQL?

DrugBank SQL support

Loading the Data

Loading the data manually

A simple loading script

A note on connections

Implementation Notes

Notable points - MySQL

A MySQL recommendation

Notable points - PostgreSQL

Other SQL implementations

Considerations

General

A note on passwords

A note on permissions

Using Docker to run your database

Conclusion

Drug Datasets

Structured drug data for data science & ML

Drug Datasets

Free drug data for students & profs

Clinical API

Clinical intelligence tool for your software

Drug Search

Customizable drug search options

Drug-Drug Interaction Checker

Search for drug interactions with our API

Drug Allergy

Get drug allergy and cross-sensitivities info

US Drug Labels

Integrate drug manufacturer information

Drug Discovery & Repurposing

Speed up & uncover bigger discoveries with ML

In Silico Testing

Validate targets quickly & accurately

Precision Medicine

Clinical decision support for your applications

Clinical Trial Matching

Match patients to emerging trials faster

Telehealth

Power your remote care software

Electronic Medical Records

Empower providers with reliable drug information

Help Center

Got questions? We've got answers

Blog

Front-page news and deep-dive content

Customer Stories

Learn how our customers are changing the world

Books & Webinars

Check out hours of guides, books, and more

Publications Directory

Growing list of DrugBank-cited publications

General

Downloads Support

API Support

Articles

Implementation

Loading DrugBank SQL Data

Documentation

Implementation

Documentation

Tutorials

Loading DrugBank SQL Files

Introduction

What is SQL?

DrugBank SQL support

Loading the Data

Loading the data manually

A simple loading script

A note on connections

Implementation Notes

Notable points - MySQL

A MySQL recommendation

Notable points - PostgreSQL

Other SQL implementations

Considerations

General

A note on passwords

A note on permissions

Using Docker to run your database

Conclusion