Blog

  • Hosting SNOMED CT on Heroku

    At Birdie we’re big believers in making the data we receive about older adults’ care as useful as possible in improving their quality of life. That means making it available in the formats that other parts of the health and social care system can understand.

    SNOMED CT is a lingua franca for encoding facts about patients in the healthcare world.

    Among the 300,000+ concepts, there are encodings for drugs, e.g:
    322236009 | Product containing precisely paracetamol 500 milligram/1 each conventional release oral tablet (clinical drug) |

    procedures, e.g:
    17006003 | Replacement of urinary catheter (procedure) |

    and disorders, e.g:
    75591007 | Fracture of fibula (disorder) |

    Our setup

    The organisation behind SNOMED CT, SNOMED International, helpfully provides a Java application called Snowstorm to use as a search service. There isn’t a central host of the API, as you might expect with a weather or mapping API, so you need to be able to host the application yourself.

    We opted to investigate using Heroku as a simpler alternative to services like AWS Fargate, and we’ve been really pleased with how simple it was to get started. Heroku’s easy-to-use command line interface also makes it easy to document the steps we took in a post like this!

    With a Heroku account set up, the first step was to create a new application and deploy Snowstorm to it. We’ll call the application “snomedct” on Heroku:

    > git clone https://github.com/IHTSDO/snowstorm.git
    > cd snowstorm
    > heroku login
    > heroku apps:create snomedct
    > heroku git:remote -a snomedct
    > git push -f heroku master

    Unfortunately, without a backing store, the application crashes on startup. The next step was to add an Elasticsearch cluster for storing the SNOMED CT concepts:

    > heroku addons:create bonsai:standard-sm

    This gives us an Elasticsearch cluster and automatically sets an environment variable on our application (credentials have been removed!):

    > heroku config -a snomedct
    BONSAI_URL=https://redacted:redacted@redacted.eu-west-1.bonsaisearch.net:443
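
    Before pointing Snowstorm at it, it’s worth checking the cluster is reachable. Here’s a minimal Python sketch, assuming the requests library and that BONSAI_URL (with its embedded credentials) is available locally, which calls Elasticsearch’s standard cluster health endpoint:

    #!/usr/bin/env python3
    # Sanity check that the Bonsai Elasticsearch cluster is reachable.
    # Assumes BONSAI_URL is set locally, e.g. export BONSAI_URL=$(heroku config:get BONSAI_URL -a snomedct)
    import os
    import requests

    # The credentials are embedded in the URL, so requests uses them as basic auth
    response = requests.get(os.environ["BONSAI_URL"] + "/_cluster/health")
    response.raise_for_status()
    print(response.json()["status"])  # "green" or "yellow" means the cluster is up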

    BONSAI_URL is unfortunately not where our application expects to find the Elasticsearch endpoint and credentials. Snowstorm is a Spring application, so we needed to provide them in a different format:

    > heroku config:set SPRING_APPLICATION_JSON='{"snowstorm": {"rest-api": {"readonly": true}}, "elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'
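
    Rather than assembling that JSON by hand, you could derive it from BONSAI_URL with a short script. This is a minimal sketch, assuming Python 3; it just prints the value to pass to heroku config:set:

    #!/usr/bin/env python3
    # Build the SPRING_APPLICATION_JSON value from Bonsai's BONSAI_URL.
    # Assumes BONSAI_URL looks like https://user:password@host:443 as shown above.
    import json
    import os
    from urllib.parse import urlparse

    bonsai = urlparse(os.environ["BONSAI_URL"])

    spring_config = {
        "snowstorm": {"rest-api": {"readonly": True}},
        "elasticsearch": {
            "urls": f"{bonsai.scheme}://{bonsai.hostname}:{bonsai.port}",
            "username": bonsai.username,
            "password": bonsai.password,
        },
    }

    # Paste the output into: heroku config:set SPRING_APPLICATION_JSON='...'
    print(json.dumps(spring_config))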

    And with this, the application started up successfully:

    > heroku logs
    ...
    --- Snowstorm startup complete ---

    Loading the concepts

    Previous experiments with SNOMED CT had shown us that the initial import process needs a very beefy machine: 8GB of memory on my laptop wasn’t enough. It also involves uploading and extracting 5GB of unzipped SNOMED CT release files, so it was quickly clear that doing this via the Heroku-hosted application was going to be tricky.

    Instead, we took the approach of creating a temporary AWS Lightsail virtual machine with 16GB memory. With full SSH access we’d be able to download the release files straight to the machine and not have to worry about uploading them to snowstorm with an HTTP POST request.

    After getting a shell on the newly created VM we needed to set it up to build Java applications:

    > sudo yum update
    > sudo yum remove java-1.7.0-openjdk
    > sudo yum install java-1.8.0-openjdk-devel
    > sudo yum install git
    > sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
    > sudo sed -i s/\$releasever/7/g /etc/yum.repos.d/epel-apache-maven.repo
    > sudo yum install -y apache-maven

    With the VM set up to build the application, we could clone it and use the Maven build scripts:

    > cd ~
    > mkdir git
    > cd git
    > git clone https://github.com/IHTSDO/snowstorm.git
    > cd ~/git/snowstorm
    > mvn clean package

    The next step was to download the SNOMED CT release files from TRUD. Note that the links below will no longer work; you’ll need to log into TRUD yourself to generate new ones, which can be found through your browser’s devtools:

    > cd ~
    > curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.0.0/UK_SCT2CL/uk_sct2cl_27.0.0_20190601000001.zip
    > curl -O https://isd.digital.nhs.uk/artefact/trud3/.../SNOMEDCT2/27.1.0/UK_SCT2DR/uk_sct2dr_27.1.0_20190612000001.zip

    Similar to configuring the Heroku application, we needed to provide the Elasticsearch credentials for this instance of Snowstorm to talk to the cluster. Note that unlike the Heroku instance, we want this one to have read/write access to the cluster:

    > export SPRING_APPLICATION_JSON='{"elasticsearch":{"urls":"https://redacted.eu-west-1.bonsaisearch.net:443", "username": "redacted", "password": "redacted"}}'

    Finally we could run the import process. This takes a few hours for each file, so best done overnight!

    > java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --delete-indices --import=uk_sct2cl_27.0.0_20190601000001.zip
    > java -Xms2g -Xmx14g -jar git/snowstorm/target/snowstorm*.jar --import=uk_sct2dr_27.1.0_20190612000001.zip
    > heroku restart --app snomedct

    After terminating our expensive AWS Lightsail VM, we can now query our SNOMED CT API:

    > curl --silent "https://snomedct.herokuapp.com/MAIN/concepts?term=asthma" | jq .
    
    {
      "items": [
        {
          "conceptId": "278517007",
          "effectiveTime": "20040131",
          "moduleId": "900000000000207008",
          "active": false,
          "pt": {
            "term": "Asthmatic bronchitis",
            "lang": "en",
            "conceptId": "278517007"
          },
          "definitionStatus": "PRIMITIVE",
          "fsn": {
            "term": "Asthmatic bronchitis (disorder)",
            "lang": "en",
            "conceptId": "278517007"
          },
          "id": "278517007"
        },
        ...
      ]
    }
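
    To consume the API from application code rather than the command line, the same endpoint can be called from a script. This is a minimal sketch, assuming the requests library and the same “snomedct” Heroku app name used above, which searches for a term and prints each match’s preferred term:

    #!/usr/bin/env python3
    # Search the read-only Snowstorm API and list matching concepts.
    import requests

    BASE_URL = "https://snomedct.herokuapp.com"

    def search_concepts(term):
        response = requests.get(
            f"{BASE_URL}/MAIN/concepts",
            params={"term": term},
            headers={"Accept": "application/json"},
        )
        response.raise_for_status()
        return response.json()["items"]

    for concept in search_concepts("asthma"):
        # Each item includes a preferred term ("pt") and a fully specified name ("fsn")
        print(concept["conceptId"], concept["pt"]["term"])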
  • Favourite talks from JSConf Iceland

    A little time off from tech

    I was very fortunate to visit Reykjavik in August this year for JSConf Iceland. Apart from relaxing trips to natural hot spas, battling Vikings on rooftop bars backed by the setting sun, dramatic geysers, gigantic waterfalls and snowmobiling on glaciers, there was also the best selection of technical talks I’ve seen at any conference.

    Here are a few of my favourites.

    A cartoon guide to performance in React

    Lin Clark’s talk was a very easy-to-follow explanation of React’s core algorithm (as it stands currently).

    One quick takeaway was to avoid an anti-pattern I’ve seen (or possibly implemented!) where you add a `key` to a collection of elements just to silence a React console warning. If you simply use an incrementing integer, say the index from a call like

    props.collection.map((e, i) => <li key={i}>{e}</li>)

    then React won’t be able to easily tell that a collection has been sorted. Instead, use some form of business key (for example, a product code) that is specific to the object in the collection.

    Reactive animations

    Spinning up an electron app

    As someone primarily working in frontend web development for an e-commerce company I don’t have a big set of use cases for setting up a desktop application. However, this talk was great fun and it was nice to see how easy it is to make an app that integrates with system menus using JavaScript.

    I’ve heard BB-8 sales are through the roof!

    Arbitrary computation on the GPU using WebGL

    This was another talk where it’s difficult to imagine what the use cases could be for doing that volume of processing on the client whilst locking up the GPU. It was still fascinating to see how the GPU could be used to ‘render’ to an off-screen canvas, then collect the data back by reading the image’s colour pixel by pixel.

    This will Flow your mind

    Tryggvi Gylfason’s talk was a very succinct introduction to the Flow syntax and some of the benefits it has for larger teams.

    If typed or functional languages that transpile to JavaScript are of interest, I’d also recommend checking out recent JavaScript Air talks on TypeScript & Flow and typed functional programming.

    Progressive rendering – how to make your app render sooner

    This talk demonstrated old-school techniques for piping HTML to the browser before you have the whole page, but with NodeJS and streams.

    Pivoting to React, at scale

    How Pinterest have migrated from their Backbone/Django architecture to React. Probably some lessons here for anyone doing a large migration between frameworks, but particularly anything related to React.

    The ones I missed…

    With two tracks of talks there’s always going to be one or two you’d still love to see. These are on my playlist:

    Dirty little front end tricks

    A colleague went to this one and said it was his favourite of the conference.

    The New Mobile Web: Service Worker, Push and App Manifests

    Service workers are one of those features I’d love to use but find it hard to justify spending time on while certain popular mobile operating systems are holding back the web. I’ve heard there are lots of interesting ways to use service workers as progressive enhancement, and this talk goes into using them to produce a native-like experience.

  • Getting Started with Graylog on MacOS

    Graylog is an amazing open source tool for recording, viewing and analysing application logs. The performance and full-text indexing of the underlying Elasticsearch database means you probably won’t need any other logging tools throughout your whole organisation.

    Application developers writing to Graylog may need a local instance to retrieve diagnostic information during development. This post gives some quick instructions for setting up a local logging server using docker.

    Docker is used for managing lightweight operating system containers with isolated process and network namespaces: each container can have its own process with PID 10 and its own process using TCP port 80. Unfortunately, Mac OS X’s kernel doesn’t contain the necessary APIs to run a Linux-based container, so we need to use a virtual machine running Linux itself. Boot2docker, installed via Homebrew, comes to our rescue to make the whole process as simple as possible:

    brew install Caskroom/cask/virtualbox
    brew install Caskroom/cask/boot2docker
    brew install docker
    

    Now all the tools are installed we need to download and start up the Linux VM:

    boot2docker download
    boot2docker init
    boot2docker start
    

    Another complexity of the docker host being in a virtual machine is that the docker command line interface needs to connect across a virtual network interface. Boot2docker helps us here by telling us the required settings. I put the ‘export’ lines in my ~/.profile so that the docker command will work without any special setup in the future.

    boot2docker shellinit
    # outputs:
    Writing /Users/djb/.boot2docker/certs/boot2docker-vm/ca.pem
    Writing /Users/djb/.boot2docker/certs/boot2docker-vm/cert.pem
    Writing /Users/djb/.boot2docker/certs/boot2docker-vm/key.pem
        export DOCKER_HOST=tcp://192.168.59.103:2376
        export DOCKER_CERT_PATH=/Users/djb/.boot2docker/certs/boot2docker-vm
        export DOCKER_TLS_VERIFY=1
    

    To test your new docker installation you can run the hello-world container:

    docker run hello-world
    

    Assuming that went okay we can now get started with Graylog! Run the following command to download the image for containers we’ll create. This may take some time as there are a few hundred MB of Linux distribution and Java libraries to fetch.

    docker pull graylog2/allinone
    

    You can now create a new container. The numeric arguments are the TCP ports that will be exposed by the container.

    docker run -t -p 9000:9000 -p 12201:12201 graylog2/allinone
    

    Once the flurry of activity in your console comes to a stop, the container is fully up and running. You can visit the admin page at http://ip-of-server:9000. The IP address is that of your boot2docker virtual machine, which you found earlier using boot2docker shellinit. The default username and password for the web interface are admin and admin.

    Once you’re inside the web interface Graylog will warn you that there are no inputs running. To fix this, browse to http://ip-of-server:9000/system/inputs and add an input of type GELF HTTP running on port 12201. The name of the input isn’t important unless you have plans for hundreds of them so I’ve unimaginatively called mine GELF HTTP.

    Now we’ve got a running server configured enough for testing, but no logs! A real application will produce logs of its own but for the purpose of demonstration we can write a script to read in your local computer’s system log and send the messages to Graylog for indexing. Note that you’d never do this in production as Graylog is perfectly capable of using the syslog UDP protocol, which would avoid the need to write any code.

    #!/usr/bin/env python

    import urllib2
    import json

    # Read the local system log into memory
    path = "/var/log/system.log"
    with open(path) as log:
        lines = log.readlines()

    # Send each line to the GELF HTTP input as a separate message
    for line in lines:
        message = {}

        message['version'] = '1.1'
        message['host'] = 'localhost'
        message['short_message'] = line

        req = urllib2.Request('http://192.168.59.103:12201/gelf')
        req.add_header('Content-Type', 'application/json')
        response = urllib2.urlopen(req, json.dumps(message))
    

    Once your logs are in the server you can start searching them. Try querying for kernel to find all the messages logged by the Mac OS X kernel. Having a full-text index of our logs is useful, but it only scratches the surface of Elasticsearch’s structured storage. A more useful implementation would log application-specific information as fields in the GELF message. For example, with the following code, a complete history of logs for a particular client could be retrieved with a single click:

    message['client_ip'] = ipFromApplicationVariable
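
    Building on that idea, a structured message in the style of the script above might look like the sketch below. The host name and field values here are invented, and the underscore prefix follows the GELF convention for additional (custom) fields:

    # Hypothetical values a real application would take from its request context
    client_ip = '203.0.113.7'
    order_id = '42'

    message = {
        'version': '1.1',
        'host': 'webshop-01',
        'short_message': 'Order placed',
        # Custom GELF fields are prefixed with an underscore
        '_client_ip': client_ip,
        '_order_id': order_id,
    }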
    
  • First steps with Azure CLI

    This guide uses Azure’s cross platform CLI to create and destroy a virtual machine.

    First things first, let’s define some basic properties of the machine, such as the administrator credentials and its hostname, which will be given the suffix cloudapp.net:

    HOST=cccu-dev-adfs
    USERNAME=djb
    PASSWORD=Password1!
    

    To make changes to your Azure account you’ll need to authenticate yourself using an X509 certificate. This can be generated by running the following command, which will launch your web browser:

    azure account download
    

    Once the certificate is downloaded you need to register it with the command line tool:

    azure account import "~/Downloads/credentials.publishsettings"
    

    Azure uses images (or templates) of operating systems with varying applications preinstalled. To create a new VM you need to pick an image, but these are identified by mixtures of random hexadecimal and textual description. To find the right identifier you can use another API call to search the images by their human-readable label. The following command downloads the list as JSON and uses the jq command line tool to find the right JSON object and return its identifier:

    SELECTOR='.[] | select(.label | contains("Windows Server 2012 R2 Datacenter")) | .name'
    IMAGE=`azure vm image list --json | jq -r -c "$SELECTOR" | tail -n 1`
    

    You can now create and start your new server:

    azure vm create $HOST $IMAGE $USERNAME $PASSWORD \
        --rdp \
        --location "West Europe" \
        --vm-size Medium
    azure vm start $HOST
    

    By default there won’t be any endpoints created on the Azure load balancer, save for 3389 used by RDP. To access a web server running on your new server use vm endpoint:

    azure vm endpoint create $HOST 80
    

    When you’re finished with the VM you’ll want to delete it to save a few pennies. Use the vm delete command and optionally also remove the VM’s disk storage:

    azure vm delete $HOST --blob-delete --quiet
    
  • Switching Hibernate’s UUID Type Mapping per Database

    JDBC doesn’t include UUIDs as one of its generic SQL types and so the different relational databases supported by Hibernate have different mechanisms to save and retrieve UUIDs. PostgreSQL’s JDBC driver, for example, requires that you use Types.OTHER as a surrogate for the missing UUID constant:

    preparedStatement.setObject( index, uuid, Types.OTHER );

    Whereas H2 requires the simpler:

    preparedStatement.setObject( index, uuid );

    Developers may also choose not to use a database’s native mechanism, preferring instead to stipulate storage as a 36-character string or as an ordinary binary number. You tell Hibernate which mechanism you wish to use with an @Type annotation on the column:

    @Type(type = "pg-uuid")
    private UUID id;

    Or an @TypeDef annotation on the package (in package-info.java):

    @TypeDef(
            name = "pg-uuid",
            defaultForType = UUID.class,
            typeClass = PostgresUUIDType.class
    )
    package com.example.domain;

    This works fine with one database, but causes problems when you wish to test your application with a different database. The solution is to create another Hibernate type definition which can delegate to a database specific type depending on some environmental setting.

    The following implementation loads the database dialect in use from a properties file on the classpath. Strictly speaking, this information is already available in the persistence.xml file, but extracting it from there would entail much more effort. The lookup is done just once, when the class is loaded, so it shouldn’t result in a performance impact.

    /**
     * @author David Beaumont
     * @see org.hibernate.type.PostgresUUIDType
     */
    public class UUIDCustomType extends AbstractSingleColumnStandardBasicType<UUID> {
    
        private static final long serialVersionUID = 902830399800029445L;
    
        private static final SqlTypeDescriptor SQL_DESCRIPTOR;
        private static final JavaTypeDescriptor<UUID> TYPE_DESCRIPTOR;
    
        static {
            Properties properties = new Properties();
            try {
                ClassLoader loader = Thread.currentThread().getContextClassLoader();
                properties.load(loader.getResourceAsStream("database.properties"));
            } catch (IOException e) {
                throw new RuntimeException("Could not load properties!", e);
            }
    
            String dialect = properties.getProperty("dialect");
            if(dialect.equals("org.hibernate.dialect.PostgreSQLDialect")) {
                SQL_DESCRIPTOR = PostgresUUIDType.PostgresUUIDSqlTypeDescriptor.INSTANCE;
            } else if(dialect.equals("org.hibernate.dialect.H2Dialect")) {
                SQL_DESCRIPTOR = VarcharTypeDescriptor.INSTANCE;
            } else {
                throw new UnsupportedOperationException("Unsupported database!");
            }
    
            TYPE_DESCRIPTOR = UUIDTypeDescriptor.INSTANCE;
        }
    
        public UUIDCustomType() {
            super(SQL_DESCRIPTOR, TYPE_DESCRIPTOR);
        }
    
        @Override
        public String getName() {
            return "uuid-custom";
        }
    
    }

    Now you just have to register this new type as detailed above, using @Type(type = "uuid-custom").

  • Copying Mail using Imapsync

    Mail can be copied between IMAP servers using a graphical client such as Outlook or Thunderbird. However, these tend to be quite unreliable, often resulting in duplicates or missing messages.

    An alternative is to use the imapsync script. For example, the following command copies all your mail from Fred’s INBOX folder at example.com to the Old-Mail folder at example.net:

    imapsync --host1 mail.example.com --user1 fred --ssl1 --password1 secret1 \
             --host2 mail.example.net --user2 fred --ssl2 --password2 secret2 \
             --folder INBOX --prefix1 INBOX --prefix2 Old-Mail --dry
    

    The --dry option prevents imapsync from actually making any changes. Once you are sure you have all the options right, you can remove it to perform the sync.

  • Virtualbox Guest Additions in Fedora 16

    VirtualBox compiles kernel modules against the running kernel. This means you should update your system before you start as otherwise the Guest Additions will stop working when you upgrade in the future.

    yum clean all
    yum update
    

    I recommend doing this in a virtual terminal of the guest machine (press CTRL-ALT-F2) as in graphical mode VirtualBox seems to have a habit of making the kernel of the host OS panic. Once you are updated, make sure to reboot so that you are actually running the latest kernel:

    reboot
    

    Now install the headers and compilers needed by VirtualBox Guest Additions:

    yum install kernel-devel kernel-headers gcc
    

    And install the Guest Additions by choosing ‘Install Guest Additions…’ from VirtualBox’s menus. Reboot again and your system should come up with the Guest Additions working.

  • Adding a MySQL datasource to JBoss AS 7

    This post assumes you are using a Unix like system with a bash shell and have already downloaded and extracted the JBoss AS 7 application server.

    Retrieve the MySQL Connector

    The connector can be downloaded from: http://dev.mysql.com/downloads/connector/j/. The current version at the time of writing is 5.1.17; later sections of this guide assume that version number.

    Unzip the connector to your downloads folder and save the location of the main JAR in a variable which we can refer to later:

    cd ~/Downloads
    unzip mysql-connector-java-5.1.17.zip
    export MYSQL_JAR=~/Downloads/mysql-connector-java-5.1.17/mysql-connector-java-5.1.17-bin.jar

    Add a Module to AS 7

    AS 7 uses a module system to provide isolation in class loading. We need to create a new module which contains the MySQL Connector/J JAR. Move to the AS installation directory and create the folder structure for the new module:

    export JBOSS_HOME=~/Development/jboss-as-web-7.0.0.Final
    cd $JBOSS_HOME
    mkdir -p modules/com/mysql/main

    Copy the driver jar to the new directory and move to that directory:

    cp $MYSQL_JAR $JBOSS_HOME/modules/com/mysql/main
    cd $JBOSS_HOME/modules/com/mysql/main

    Define the module in XML. This is the key part of the process:

    vi module.xml

    If the version of the JAR has changed, remember to update it here:

    <?xml version="1.0" encoding="UTF-8"?>
    <module xmlns="urn:jboss:module:1.0" name="com.mysql">
        <resources>
            <resource-root path="mysql-connector-java-5.1.17-bin.jar"/>
        </resources>
        <dependencies>
            <module name="javax.api"/>
        </dependencies>
    </module>

    The new module directory should now have the following contents:

    module.xml
    mysql-connector-java-5.1.17-bin.jar
    

    Create a Driver Reference

    Now the module has been created, we need to make a reference to it from the main application server configuration file:

    cd $JBOSS_HOME/standalone/configuration
    vi standalone.xml
    

    Find the ‘drivers’ element and add a new driver to it:

    <drivers>
        <driver name="mysql" module="com.mysql"/>
        <driver name="h2" module="com.h2database.h2">
            <xa-datasource-class>org.h2.jdbcx.JdbcDataSource</xa-datasource-class>
        </driver>
    </drivers>

    The ‘h2’ driver is part of the default JBoss configuration. The new driver that needs adding is named ‘mysql’.

    Add the Datasource

    Go into the configuration directory and open the main configuration file:

    cd $JBOSS_HOME/standalone/configuration
    vi standalone.xml
    

    Find the datasources element and add a new datasource inside:

    <datasource jndi-name="java:/mydb" pool-name="mydb" enabled="true">
        <connection-url>jdbc:mysql://localhost:3306/mydb</connection-url>
        <driver>mysql</driver>
        <security>
            <user-name>root</user-name>
            <password></password>
        </security>
        <pool>
            <max-pool-size>100</max-pool-size>
        </pool>
    </datasource>

    Start the Application Server

    Now try running the application server. Make sure you initially run it as a user which has write access to the modules directory where you placed the MySQL connector JAR. This is because the application server seems to generate an index of the directories inside the JAR.

    cd $JBOSS_HOME
    bin/standalone.sh
    

    Hopefully, amongst the application server’s console output you should see the following:

    20:31:33,843 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-1) Bound data source [java:/mydb]