
This week at TrialGrid (Feb 10, 2017)

Something that Andrew and I have discovered in the last few weeks is that people are interested not just in what we are doing but in how we are doing it. Of course, we are keen to use our blog to promote our amazing product, TrialGrid, but every product feature has a story, and usually those stories go untold, which is a pity. We thought it might be interesting to do an occasional roundup of the week's challenges and achievements.

This week's challenge...

Large Uploads

The easiest way to get started with TrialGrid is to upload an Architect Loader Spreadsheet from Medidata Rave Architect. TrialGrid examines the file, runs various checks on it and then imports it. We noticed that large files, in the region of 50-60MB, were sometimes failing to load. We tracked this down to our hosting infrastructure, where requests were timing out after 30 seconds and disconnecting the upload.
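To put that 30-second limit in perspective, here is a rough back-of-the-envelope calculation. It assumes the whole upload has to finish inside the timeout and that 1 MB is 8 megabits; the function name is ours, purely for illustration.

```python
def min_upload_mbps(size_mb: float, timeout_s: float) -> float:
    """Minimum sustained upload speed (megabits per second) needed to
    send size_mb megabytes before a timeout of timeout_s seconds."""
    return size_mb * 8 / timeout_s


for size_mb in (50, 60):
    speed = min_upload_mbps(size_mb, 30)
    print(f"{size_mb} MB in 30 s needs at least {speed:.1f} Mbit/s")
# 50 MB in 30 s needs at least 13.3 Mbit/s
# 60 MB in 30 s needs at least 16.0 Mbit/s
```

Plenty of office and home connections cannot sustain that kind of upload speed, which would explain why the failures were intermittent rather than constant.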

We considered raising the timeout, but while the web server is busy handling an upload it's less able to serve other requests. All we were doing with the file was forwarding it on for secure storage in Amazon's S3 service. The transfer diagram looked like this:


    +--------------+            +--------------+            +---------------+
    |              |   File     |              |    File    |               |
    |   User       +------------>   TrialGrid  +------------>  Amazon S3    |
    |              |            |              |            |               |
    +--------------+            +--------------+            +---------------+

As a middleman we're not adding much value here.

Happily, Amazon offers a Direct-to-S3 option. This involves a little more hand-shaking (not shown in diagram below) but the upshot is that the user uploads their file directly to a secure location in Amazon S3. Once the upload is complete, the client tells TrialGrid that the file is uploaded so that it can begin reading the file from S3 and start the import.


    +--------------+            +--------------+            +---------------+
    |              |   File     |              |    File    |               |
    |   User       +------------>  Amazon S3   +------------>   TrialGrid   |
    |              |            |              |            |               |
    +--------------+            +--------------+            +---------------+

(Simplified diagram)
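For the curious, the extra hand-shaking usually means the server signs a short-lived upload policy and hands it to the browser, which then posts the file straight to S3. The sketch below shows that signing step for an S3 browser-based POST (AWS Signature Version 4) using only the Python standard library. It is an illustration of the general technique, not a listing of TrialGrid's code: the function name, bucket, key, region, size limit and credentials are all made-up placeholders, and in practice an SDK helper such as boto3's `generate_presigned_post` does this work for you.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_post(bucket, key, access_key, secret_key, region, now,
                 expires_in=300, max_bytes=100 * 1024 * 1024):
    """Return (url, fields) for a browser form POST straight to S3.

    Hypothetical helper; `now` must be a UTC datetime.
    """
    date = now.strftime("%Y%m%d")
    fields = {
        "key": key,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": f"{access_key}/{date}/{region}/s3/aws4_request",
        "x-amz-date": now.strftime("%Y%m%dT%H%M%SZ"),
        # Assumption: ask S3 to encrypt the object at rest (SSE-S3).
        "x-amz-server-side-encryption": "AES256",
    }
    expiration = now + datetime.timedelta(seconds=expires_in)
    # The policy lists exactly what the browser is allowed to upload.
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [{"bucket": bucket}]
        + [{name: value} for name, value in fields.items()]
        + [["content-length-range", 1, max_bytes]],
    }
    policy_b64 = base64.b64encode(
        json.dumps(policy).encode("utf-8")).decode("ascii")

    # Derive the SigV4 signing key, then sign the encoded policy.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)

    fields["policy"] = policy_b64
    fields["x-amz-signature"] = hmac.new(
        signing_key, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()
    return f"https://{bucket}.s3.{region}.amazonaws.com/", fields


# Placeholder credentials in the style of AWS's documentation examples.
url, fields = presign_post(
    bucket="example-bucket",
    key="uploads/example-draft.xls",
    access_key="AKIAIOSFODNN7EXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region="us-east-1",
    now=datetime.datetime.now(datetime.timezone.utc),
)
print(url)
print(sorted(fields))
```

The browser submits these fields, followed by the file itself, as a multipart form to the returned URL. The secret key never leaves the server, and the policy expires after a few minutes, so the user can only upload the one file they were given permission for.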

The whole process encrypts the data in transit (via HTTPS) between the user's browser and Amazon, and the file is then encrypted at rest on Amazon's servers, as this image shows:

S3 At Rest Encryption

When TrialGrid needs to access this file, it reads the file into memory, so we never even store the file on our servers.
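As a minimal sketch of what "reads the file into memory" can look like: the helper below copies any file-like stream into an in-memory buffer without touching the disk. The helper name is hypothetical, and a plain `io.BytesIO` stands in for the S3 download here; with boto3, the `Body` of a `get_object` response is a stream with a compatible `read` method.

```python
import io


def read_into_memory(body, chunk_size=1024 * 1024) -> io.BytesIO:
    """Copy a file-like stream into an in-memory buffer, chunk by chunk."""
    buffer = io.BytesIO()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the importer can read from the start
    return buffer


# Stand-in for the stream returned by S3 (an assumption for this example).
fake_s3_body = io.BytesIO(b"example spreadsheet bytes")
in_memory_file = read_into_memory(fake_s3_body)
print(len(in_memory_file.getvalue()))  # 25
```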

Resources