Updating Your Local DrugBank Data
Introduction
Using up-to-date data can be the difference between success and failure, which is why we update DrugBank data daily. After every update, the changes are reflected in our API and in each user's downloads portal, so you can obtain a new set of files in your chosen format(s) on a regular basis. This short guide walks you through automating the process of pulling DrugBank downloads data.
Managing Your Data Manually
DrugBank data download users can download zipped versions of any format of the DrugBank data they have access to through our downloads portal. The easiest way to manage data is to log in manually and download a newer version of the data on a cadence that makes sense for you. Your downloads portal contains a table of the files available to you.
Every available file will have a name, format, brief description, and a timestamp for when it was generated. Clicking the bright pink “download” button will download a zip file containing your data to the default location on your computer (e.g., on macOS, it should appear in your “Downloads” folder); alternatively, you can right-click and “save as.”
Once you have downloaded your data, you can then use it for downstream activities; for example, by loading it into a relational database for further use. Files can be downloaded multiple times.
Automating the Process
Another option is to automate the entire process. To make this as easy as possible, we’ve created the means to schedule data pulls (and any downstream processing) to run on a regular cadence (e.g., using cron or similar). In this way, you can ensure your DrugBank data will remain current without any manual intervention.
Instructions for pulling the data using cURL are provided when you log into the downloads portal (under the heading “getting started with automatic downloads”). These instructions list the available URLs and show how each one corresponds to a data file in your downloads portal. Any of these URLs can be called with a standard GET request to pull the latest version of the file. All you need is the username and password associated with the downloads portal account.
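For readers who would rather make the GET request from Python than from cURL, the following is a minimal sketch using only the standard library. It is an illustration, not the download script described in this guide. It assumes the portal accepts the username and password via HTTP Basic authentication, and EXAMPLE_URL is a hypothetical placeholder; use the URL listed for your export in the downloads portal.

```python
import base64
import urllib.request

# Hypothetical placeholder: replace with the URL shown for your export in the portal.
EXAMPLE_URL = "https://example.com/path/to/your/export"


def build_download_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials (assumed auth scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def download_file(url: str, username: str, password: str,
                  destination: str, timeout: float = 60.0) -> None:
    """Send the request and write the response body to `destination`."""
    request = build_download_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(destination, "wb") as out:
        # Read in 1 MiB chunks so a large archive is never held fully in memory.
        while chunk := response.read(1024 * 1024):
            out.write(chunk)
```

Calling download_file(EXAMPLE_URL, username, password, "export.zip") would then save the archive locally. urlopen raises an HTTPError on a failed request, for example if the credentials are rejected.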
To further assist users in setting up their own automated workflows, we include here a simple download script written in Python (version 3.10.2):
This script will work with a standard Python installation, but does require the user to install the requests library. The required command line arguments include the target export (listed as the "file name" in the downloads portal), the associated username and password, and a target local directory (can be an absolute or relative path).
As an example, to download and unzip a target export from a portal, you can run:
python3 </full/path/to/the/script.py> -d </path/to/target/directory> -e <export name> -u <user name> -p <user password> -t 5 --clean-zip --clean-dirs
The '--clean-zip' flag attempts to remove the downloaded zip file once its contents have been extracted, while the '--clean-dirs' flag attempts to remove previously unzipped subdirectories within the chosen target directory; both can be omitted if desired. For a full list of the command-line options, run the script with the ‘-h/--help’ flag.
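To make the two clean-up behaviours concrete, here is a rough sketch of how unzipping with optional clean-up could look in your own code. It is not taken from the provided script: the function name extract_export and its parameters are hypothetical, and it assumes each archive is unpacked into a subdirectory named after the zip file.

```python
import shutil
import zipfile
from pathlib import Path


def extract_export(zip_path: str, target_dir: str,
                   clean_zip: bool = False, clean_dirs: bool = False) -> Path:
    """Unzip `zip_path` into a subdirectory of `target_dir` named after the archive."""
    archive = Path(zip_path)
    destination = Path(target_dir) / archive.stem

    # Comparable to '--clean-dirs': remove an earlier extraction before unpacking.
    if clean_dirs and destination.is_dir():
        shutil.rmtree(destination)

    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)

    # Comparable to '--clean-zip': delete the archive once its contents are extracted.
    if clean_zip:
        archive.unlink()

    return destination
```

Removing the old directory before extraction, rather than after, means a failed download never leaves you with neither the old nor the new data, provided the archive is fetched before this function is called.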
Best Practices - Protecting Your Credentials
Although the script can be run as shown above, with the portal profile credentials specified on the command line, this can be insecure. A safer option is to set the corresponding environment variables DRUGBANK_EXPORT_NAME, DRUGBANK_USERNAME, and DRUGBANK_PASSWORD. If these are set, the ‘-e’, ‘-u’, and ‘-p’ arguments can be skipped; the script will instead pull the values from your environment. This is beneficial, especially on shared hardware, because the credentials will not be recorded in your shell command history.
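If you are writing your own tooling, the same fallback pattern is straightforward to reproduce. The sketch below is one possible way to do it with argparse, not the provided script's actual implementation. The environment variable names and short flags match those above, while the long flag names (--export, --username, --password) and the function parse_credentials are hypothetical.

```python
import argparse
import os


def parse_credentials(argv=None, env=None):
    """Resolve export name, username, and password from flags or the environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Sketch of credential handling")
    # Each flag defaults to its environment variable, so it can be left off the command line.
    parser.add_argument("-e", "--export", default=env.get("DRUGBANK_EXPORT_NAME"))
    parser.add_argument("-u", "--username", default=env.get("DRUGBANK_USERNAME"))
    parser.add_argument("-p", "--password", default=env.get("DRUGBANK_PASSWORD"))
    args = parser.parse_args(argv)

    # Fail early with a clear message if a value is set in neither place.
    missing = [name for name in ("export", "username", "password") if not getattr(args, name)]
    if missing:
        parser.error("missing value(s): " + ", ".join(missing))
    return args
```

With this arrangement, a value given on the command line takes precedence over the environment, which keeps one-off overrides simple while leaving the password out of routine invocations.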
Conclusion
In this short guide, we have discussed how to pull DrugBank data, both manually and in an automated manner using GET requests. The provided script can be used as-is, or serve as a starting point for power users to write their own logic in the language of their choice. Automating data pulls offers a simple way to keep your DrugBank data up to date, ensuring you always have the most accurate information.