Category: Cloud

  • Hosting SNOMED CT on Heroku

    At Birdie we’re big believers in making the data we receive about older adults’ care as useful as possible in improving their quality of life. That means making it available in formats that other parts of the health and social care system can understand.

    SNOMED CT is a lingua franca for encoding facts about patients in the healthcare world.

    Among the 300,000+ concepts, there are encodings for drugs, e.g.:
    322236009 | Product containing precisely paracetamol 500 milligram/1 each conventional release oral tablet (clinical drug) |

    procedures, e.g.:
    17006003 | Replacement of urinary catheter (procedure) |

    and disorders, e.g.:
    75591007 | Fracture of fibula (disorder) |

    Our setup

    The organisation behind SNOMED CT, SNOMED International, helpfully provides a Java application called Snowstorm to use as a search service. There isn’t a centrally hosted API as you might expect with a weather or mapping API, so you need to be able to host the application yourself.

    We opted to investigate Heroku as a simpler alternative to services like AWS Fargate, and we’ve been really pleased with how quickly we got started. Heroku’s easy-to-use command line interface also makes it straightforward to document the steps we took in a post like this!

    With a Heroku account set up, the first step was to create a new application and deploy Snowstorm to it. We’ll call the application “snomedct” on Heroku:

    > git clone https://github.com/IHTSDO/snowstorm.git
    > cd snowstorm
    > heroku login
    > heroku apps:create snomedct
    > heroku git:remote -a snomedct
    > git push -f heroku master
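
    If the push succeeds, Heroku’s Java buildpack should have detected the pom.xml and built the project with Maven. You can confirm the app was created and see its public URL with the standard Heroku CLI (app name as above):

    > heroku apps:info -a snomedct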

    Unfortunately, without a backing store, the application crashes on startup. The next step was to add an Elasticsearch cluster for storing SNOMED CT concepts:

    > heroku addons:create bonsai:standard-sm

    This gives us an Elasticsearch cluster and automatically sets an environment variable on our application, which we can see with heroku config (credentials have been removed!):

    > heroku config -a snomedct
    BONSAI_URL=https://redacted:redacted@redacted.eu-west-1.bonsaisearch.net:443

    This is unfortunately not where our application expects to find the Elasticsearch endpoint and credentials. As Snowstorm is a Spring application, we needed to provide them in a different format:

    > heroku config:set SPRING_APPLICATION_JSON='{"snowstorm": {"rest-api": {"readonly": true}}, "elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'
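
    To double-check the value was stored as intended, we can echo it straight back with the standard heroku config:get command (using the app name above):

    > heroku config:get SPRING_APPLICATION_JSON -a snomedct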

    And with this, the application started up successfully:

    > heroku logs
    ...
    --- Snowstorm startup complete ---

    Loading the concepts

    Previous experiments with SNOMED CT had shown us that we need a very beefy machine to run the initial import process: 8GB of memory on my laptop wasn’t enough. It also requires uploading the SNOMED CT release files (around 5GB unzipped), and it quickly became clear that doing this via the Heroku-hosted application was going to be tricky.

    Instead, we took the approach of creating a temporary AWS Lightsail virtual machine with 16GB of memory. With full SSH access we’d be able to download the release files straight to the machine and not have to worry about uploading them to Snowstorm with an HTTP POST request.

    After getting a shell on the newly created VM, we needed to set it up to build Java applications:

    > sudo yum update
    > sudo yum remove java-1.7.0-openjdk
    > sudo yum install java-1.8.0-openjdk-devel
    > sudo yum install git
    > sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
    > sudo sed -i s/\$releasever/7/g /etc/yum.repos.d/epel-apache-maven.repo
    > sudo yum install -y apache-maven
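
    Before building, it’s worth a quick sanity check that the expected toolchain is on the PATH:

    > java -version
    > mvn -version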

    With the VM set up to build the application, we could clone it and use the Maven build scripts:

    > cd ~
    > mkdir git
    > cd git
    > git clone https://github.com/IHTSDO/snowstorm.git
    > cd ~/git/snowstorm
    > mvn clean package
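
    The build produces an executable jar under target/, which is the path used by the import commands further down:

    > ls -lh target/snowstorm*.jar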

    The next step was to download the SNOMED CT release files from TRUD. Note that the links below will no longer work; you’ll need to log into TRUD yourself to generate new ones, which can be found through your browser’s devtools:

    > cd ~
    > curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.0.0/UK_SCT2CL/uk_sct2cl_27.0.0_20190601000001.zip
    > curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.1.0/UK_SCT2DR/uk_sct2dr_27.1.0_20190612000001.zip
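
    The archives are large, so it’s worth checking they downloaded in full before starting the import:

    > ls -lh uk_sct2*.zip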

    Similar to configuring the Heroku application, we needed to provide the Elasticsearch credentials for this instance of Snowstorm to talk to the cluster. Note that unlike the Heroku instance, we want this one to have read/write access to the cluster:

    > export SPRING_APPLICATION_JSON='{"elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'

    Finally we could run the import process. This takes a few hours for each file, so best done overnight!

    > java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --delete-indices --import=uk_sct2cl_27.0.0_20190601000001.zip
    > java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --import=uk_sct2dr_27.1.0_20190612000001.zip
    > heroku restart --app snomedct
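
    Incidentally, because each import runs for hours over an SSH session, one way to protect against a dropped connection killing it is to wrap the command in nohup (or run it inside screen/tmux). A sketch for the first import:

    > nohup java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --delete-indices --import=uk_sct2cl_27.0.0_20190601000001.zip > import.log 2>&1 &
    > tail -f import.log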

    After terminating our expensive AWS Lightsail VM, we can now query our SNOMED CT API:

    > curl --silent "https://snomedct.herokuapp.com/MAIN/concepts?term=asthma" | jq .
    
    {
      "items": [
        {
          "conceptId": "278517007",
          "effectiveTime": "20040131",
          "moduleId": "900000000000207008",
          "active": false,
          "pt": {
            "term": "Asthmatic bronchitis",
            "lang": "en",
            "conceptId": "278517007"
          },
          "definitionStatus": "PRIMITIVE",
          "fsn": {
            "term": "Asthmatic bronchitis (disorder)",
            "lang": "en",
            "conceptId": "278517007"
          },
          "id": "278517007"
        },
        ...
      ]
    }
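
    The search can be narrowed with further query parameters on the same endpoint, for example filtering to active concepts and limiting the page size. The activeFilter and limit parameters below are from Snowstorm’s concept search API as we understand it, so it’s worth checking them against the version you’ve deployed:

    > curl --silent "https://snomedct.herokuapp.com/MAIN/concepts?term=asthma&activeFilter=true&limit=5" | jq '.items[].fsn.term'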
  • First steps with Azure CLI

    This guide uses Azure’s cross-platform CLI to create and destroy a virtual machine.

    First things first, let’s define some basic properties of the machine, such as the administrator credentials and its hostname, which will be given the suffix cloudapp.net:

    HOST=cccu-dev-adfs
    USERNAME=djb
    PASSWORD=Password1!
    

    To make changes to your Azure account you’ll need to authenticate yourself using an X.509 certificate. This can be generated by running the following command, which will launch your web browser:

    azure account download
    

    Once the certificate is downloaded you need to register it with the command line tool:

    azure account import "~/Downloads/credentials.publishsettings"
    

    Azure uses images (or templates) of operating systems with various applications preinstalled. To create a new VM you need to pick an image, but these are identified by a mixture of random hexadecimal and textual description. To find the right identifier you can use another API call to search images by their human-readable label. The following command downloads the list as JSON and uses the jq command line tool to find the right JSON object and return its identifier:

    SELECTOR='.[] | select(.label | contains("Windows Server 2012 R2 Datacenter")) | .name'
    IMAGE=`azure vm image list --json | jq -r -c $SELECTOR | tail -n 1`
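
    It’s worth echoing the result before using it, to check the selector actually matched an image name:

    echo $IMAGE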
    

    You can now create and start your new server:

    azure vm create $HOST $IMAGE $USERNAME $PASSWORD \
        --rdp \
        --location "West Europe" \
        --vm-size Medium
    azure vm start $HOST
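
    To check the machine has been provisioned and to see its state and DNS name, the same classic CLI has a show command (assuming your CLI version supports it):

    azure vm show $HOST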
    

    By default there won’t be any endpoints created on the Azure load balancer, save for port 3389 used by RDP. To access a web server running on your new server, use vm endpoint:

    azure vm endpoint create $HOST 80
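
    You can list the endpoints on the VM to confirm the new rule was added (again, assuming your CLI version includes the endpoint list subcommand):

    azure vm endpoint list $HOST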
    

    When you’re finished with the VM you’ll want to delete it to save a few pennies. Use the vm delete command and optionally also remove the VM’s disk storage:

    azure vm delete $HOST --blob-delete --quiet