Hosting SNOMED CT on Heroku

At Birdie we’re big believers in making the data we receive about older adults’ care as useful as possible in improving their quality of life. That means making it available in formats that other parts of the health and social care system can understand.

SNOMED CT is a lingua franca for encoding facts about patients in the healthcare world.

Among the 300,000+ concepts, there are encodings for drugs, e.g.:
322236009 | Product containing precisely paracetamol 500 milligram/1 each conventional release oral tablet (clinical drug) |

procedures, e.g.:
17006003 | Replacement of urinary catheter (procedure) |

and disorders, e.g.:
75591007 | Fracture of fibula (disorder) |

Our setup

The organisation behind SNOMED CT, SNOMED International, helpfully provides a Java application called Snowstorm to use as a search service. There isn’t a centrally hosted API, as you might expect for weather or mapping data, so you need to host the application yourself.

We opted to investigate Heroku as a simpler alternative to services like AWS Fargate, and we’ve been really pleased with how easy it was to get started. Heroku’s easy-to-use command line interface also makes it straightforward to document the steps we took in a post like this!

With a Heroku account set up, the first step was to create a new application and deploy Snowstorm to it. We’ll call the application “snomedct” on Heroku:

> git clone https://github.com/IHTSDO/snowstorm.git
> cd snowstorm
> heroku login
> heroku apps:create snomedct
> heroku git:remote -a snomedct
> git push -f heroku master

Unfortunately, without a backing store, the application crashes on startup. The next step was to add an Elasticsearch cluster for storing SNOMED CT concepts:

> heroku addons:create bonsai:standard-sm

This gives us an Elasticsearch cluster and automatically sets an environment variable on our application:

> heroku config -a snomedct
BONSAI_URL=https://redacted:redacted@redacted.eu-west-1.bonsaisearch.net:443

(Credentials have been removed!)

Unfortunately, this is not where our application expects to find the Elasticsearch endpoint and credentials. Snowstorm is a Spring application, so we needed to provide them in a different format:

> heroku config:set SPRING_APPLICATION_JSON='{"snowstorm": {"rest-api": {"readonly": true}}, "elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'

And with this, the application started up successfully:

> heroku logs
...
--- Snowstorm startup complete ---
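
Copying the pieces of BONSAI_URL into SPRING_APPLICATION_JSON by hand is easy to get wrong, so here is a minimal sketch of how that translation could be scripted. The helper name build_spring_json is hypothetical (it is not part of Heroku or Snowstorm), and the sketch assumes the URL has the simple scheme://user:password@host:port shape shown above, with credentials that need no URL-decoding or JSON-escaping:

```shell
# Hypothetical helper: turn a URL of the form scheme://user:password@host:port
# into the SPRING_APPLICATION_JSON value used in this post.
# Assumes the username contains no ':' and neither credential contains '@'
# or anything that needs URL-decoding or JSON-escaping.
build_spring_json() {
  url="$1"
  scheme="${url%%://*}"     # text before "://"
  rest="${url#*://}"        # user:password@host:port
  creds="${rest%%@*}"       # user:password
  hostport="${rest#*@}"     # host:port
  user="${creds%%:*}"       # text before the first ':'
  pass="${creds#*:}"        # text after the first ':'
  printf '{"snowstorm": {"rest-api": {"readonly": true}}, "elasticsearch": {"urls": "%s://%s", "username": "%s", "password": "%s"}}\n' \
    "$scheme" "$hostport" "$user" "$pass"
}

# Example use (not run here):
#   heroku config:set SPRING_APPLICATION_JSON="$(build_spring_json "$(heroku config:get BONSAI_URL)")"
```

The readonly flag is kept in the output because this is the public-facing Heroku instance; a variant for the import machine would simply drop the "snowstorm" key.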

Loading the concepts

Previous experiments with SNOMED CT had shown us that the initial import process needs a very beefy machine: 8GB of memory on my laptop wasn’t enough. It also requires uploading and extracting the SNOMED CT release files, which come to 5GB unzipped, and it quickly became clear that doing this via the Heroku-hosted application was going to be tricky.

Instead, we created a temporary AWS Lightsail virtual machine with 16GB of memory. With full SSH access we could download the release files straight to the machine and not have to worry about uploading them to Snowstorm with an HTTP POST request.

After getting a shell on the newly created VM we needed to set it up to build Java applications:

> sudo yum update
> sudo yum remove java-1.7.0-openjdk
> sudo yum install java-1.8.0-openjdk-devel
> sudo yum install git
> sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
> sudo sed -i s/\$releasever/7/g /etc/yum.repos.d/epel-apache-maven.repo
> sudo yum install -y apache-maven

With the VM set up to build the application, we could clone it and use the Maven build scripts:

> cd ~
> mkdir git
> cd git
> git clone https://github.com/IHTSDO/snowstorm.git
> cd ~/git/snowstorm
> mvn clean package

The next step was to download the SNOMED CT release files from TRUD. Note that the links below will no longer work. You’ll need to log into TRUD yourself to generate new ones, which you can find through your browser devtools:

> cd ~
> curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.0.0/UK_SCT2CL/uk_sct2cl_27.0.0_20190601000001.zip
> curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.1.0/UK_SCT2DR/uk_sct2dr_27.1.0_20190612000001.zip
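
If a link has expired, curl without --fail may save an error page under the .zip filename rather than failing outright, and that would only surface hours into the import. A cheap sanity check, sketched here with two hypothetical helpers (looks_like_zip and check_downloads are names made up for this example), is to confirm each file starts with the two-byte "PK" signature that zip archives begin with:

```shell
# Hypothetical helper: succeed only if the file begins with "PK",
# the signature at the start of a zip archive.
looks_like_zip() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Hypothetical helper: report on each file given as an argument and
# return non-zero if any of them fails the signature check.
check_downloads() {
  status=0
  for f in "$@"; do
    if looks_like_zip "$f"; then
      echo "$f: ok"
    else
      echo "$f: not a zip archive, check the download link"
      status=1
    fi
  done
  return "$status"
}

# Example use (not run here):
#   check_downloads uk_sct2cl_27.0.0_20190601000001.zip uk_sct2dr_27.1.0_20190612000001.zip
```

This only looks at the first two bytes, so it catches HTML error pages but not truncated downloads.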

As with the Heroku application, we needed to provide the Elasticsearch credentials so that this instance of Snowstorm could talk to the cluster. Note that, unlike the Heroku instance, we want this one to have read/write access to the cluster:

> export SPRING_APPLICATION_JSON='{"elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'

Finally, we could run the import process. This takes a few hours for each file, so it’s best done overnight!

> java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --delete-indices --import=uk_sct2cl_27.0.0_20190601000001.zip
> java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --import=uk_sct2dr_27.1.0_20190612000001.zip
> heroku restart --app snomedct
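
Note that only the first import above passes --delete-indices. As the flag name suggests, repeating it on the second file would presumably wipe what the first import had just loaded. With more release files that is easy to get wrong, so here is a small dry-run sketch that prints the commands in the right order without running them. The function name print_import_commands is hypothetical, and the memory settings are simply the ones we used above:

```shell
# Hypothetical dry-run helper: prints (does not run) one import command per
# release file, adding --delete-indices to the first command only.
# Usage: print_import_commands <snowstorm-jar> <release.zip>...
print_import_commands() {
  jar="$1"
  shift
  extra=" --delete-indices"
  for zip in "$@"; do
    echo "java -Xms2g -Xmx14g -jar $jar$extra --import=$zip"
    extra=""   # later files are added to the existing indices
  done
}

# Example use (not run here):
#   print_import_commands git/snowstorm/target/snowstorm*.jar \
#     uk_sct2cl_27.0.0_20190601000001.zip uk_sct2dr_27.1.0_20190612000001.zip
```

Printing rather than executing makes it easy to review the plan before committing to a multi-hour run.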

After terminating our expensive AWS Lightsail VM, we can now query our SNOMED CT API:

> curl --silent "https://snomedct.herokuapp.com/MAIN/concepts?term=asthma" | jq .

{
  "items": [
    {
      "conceptId": "278517007",
      "effectiveTime": "20040131",
      "moduleId": "900000000000207008",
      "active": false,
      "pt": {
        "term": "Asthmatic bronchitis",
        "lang": "en",
        "conceptId": "278517007"
      },
      "definitionStatus": "PRIMITIVE",
      "fsn": {
        "term": "Asthmatic bronchitis (disorder)",
        "lang": "en",
        "conceptId": "278517007"
      },
      "id": "278517007"
    },
    ...
  ]
}
