Researchers using electroencephalograms (EEGs) to diagnose clinical outcomes often run into computational complexity problems. In particular, extracting complex, sometimes nonlinear, features from a large number of time series often requires large amounts of processing time. In this paper, we describe a distributed system that leverages modern cloud-based technologies and tools, and we demonstrate that it can undertake clinical research effectively and efficiently. Specifically, we compare three types of clusters, showing their relative costs (in both time and money) for developing a distributed machine learning pipeline that predicts gestation time from features extracted from these EEGs.