Preterm birth is a major public health problem with profound implications for society, so there would be great value in identifying women at risk of preterm birth during the course of their pregnancy. Previous research has largely focused on individual risk factors correlated with preterm birth and less on combining these factors to understand its complex etiologies. In this paper, we use the "Preterm Prediction Study," a clinical trial dataset collected by the National Institute of Child Health and Human Development (NICHD) Maternal-Fetal Medicine Units Network (MFMU). We summarize two years of effort to collect, prepare, and process this dataset, with special emphasis on the so far elusive problem of predicting preterm birth in nulliparous (first-time) mothers. We compare two approaches for deriving predictive models: support vector machines (SVMs) with linear and non-linear kernels, and logistic regression with different model selection procedures. We demonstrate significant improvement over past work on this dataset while stressing the challenges we faced in data preparation and analysis.