Using Machine Learning to improve Quality Assurance of Behavior Change Communication Program in Madhya Pradesh, India

Tue, 05/11/2021 - 04:47

Summary:
Given the breadth of some behavior change communication (BCC) programs, health surveys are an essential way to assess their effectiveness. Quality assurance procedures for health surveys have traditionally focused on response rates and data completeness. However, applications of machine learning (ML) to assess the quality of the data captured by health surveys are rare. With the expansion of survey tools used on tablets and mobile phones, real-time data capture presents an opportunity to enhance quality assurance procedures using ML. In this study of a BCC program evaluation, a survey of 5000 women across four districts in Madhya Pradesh, India was monitored using ML algorithms. Using unsupervised learning algorithms, we identified potential outliers in terms of time to completion of certain questions or sections, number of skips, or number of "don't know" responses. Where self-completed training data or test-retest checks on longer surveys were available, we labeled the training or retest data, applied supervised methods, and compared how various algorithms classified the unlabeled data. To our knowledge, this is the first application of ML to improve the quality assurance of data used to generate evidence on the effectiveness of a BCC program. The approach offers a blueprint for using advanced data analytics to enhance survey data quality in real time. Adoption of this approach could lead to significant improvements in the quality of surveys, improving evidence generation for BCC programs globally.

Background/Objectives:
Given the breadth of some BCC programs, health surveys are an essential way to assess their effectiveness. Quality assurance procedures for health surveys have traditionally focused on response rates and data completeness. However, applications of machine learning (ML) to assess the quality of the data captured by health surveys are rare. This study aims to examine the use of ML methods to monitor data from a quantitative survey conducted for the impact evaluation of a BCC digital health program in Madhya Pradesh, India.

Description of Intervention and/or Methods/Design:
This individually randomized controlled trial enrolled 5000 pregnant women from four districts in Madhya Pradesh, India. All of the women were interviewed at baseline, and half were then enrolled to receive Kilkari audio messages from the fourth month of pregnancy until 11 months postpartum. The endline survey, which measured outcomes relevant to the program evaluation, including knowledge, decision-making, discussion, and practice, was conducted among the same women 11-14 months postpartum. Enumerators and supervisors completed short surveys detailing their characteristics, including prior survey experience, age, and place of residence. Data features such as time to completion, "don't know" rate, skip rate, respondent characteristics, and enumerator characteristics were used to classify interviews as genuinely or falsely filled. Various ML algorithms, both supervised (using labeled training and validation data) and unsupervised, were used to classify the data and to guide follow-up with enumerators.
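As an illustration of the unsupervised side of this approach, the sketch below flags interviews whose features sit far from the rest of the sample. It is a minimal example under stated assumptions, not the study's algorithm: the abstract does not name the methods used, so a median/MAD modified z-score stands in for them here. The field names (`minutes`, `dont_know_rate`, `skip_rate`), the sample values, and the 3.5 cutoff are all hypothetical.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in this feature, so nothing can be called an outlier.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outlier_interviews(interviews, features, threshold=3.5):
    """Map interview_id -> list of features on which that interview is an outlier."""
    flagged = {}
    for feature in features:
        scores = robust_z_scores([row[feature] for row in interviews])
        for row, score in zip(interviews, scores):
            if abs(score) > threshold:
                flagged.setdefault(row["interview_id"], []).append(feature)
    return flagged


# Hypothetical interview-level features (not data from the study).
FEATURES = ["minutes", "dont_know_rate", "skip_rate"]
interviews = [
    {"interview_id": "A01", "minutes": 42, "dont_know_rate": 0.05, "skip_rate": 0.10},
    {"interview_id": "A02", "minutes": 45, "dont_know_rate": 0.04, "skip_rate": 0.12},
    {"interview_id": "A03", "minutes": 39, "dont_know_rate": 0.06, "skip_rate": 0.11},
    {"interview_id": "A04", "minutes": 44, "dont_know_rate": 0.05, "skip_rate": 0.09},
    {"interview_id": "A05", "minutes": 41, "dont_know_rate": 0.07, "skip_rate": 0.10},
    {"interview_id": "A06", "minutes": 9, "dont_know_rate": 0.40, "skip_rate": 0.11},
]

flagged = flag_outlier_interviews(interviews, FEATURES)
for interview_id, reasons in flagged.items():
    print(interview_id, "flagged on", ", ".join(reasons))
```

With these made-up values, only interview A06 (very short, with an unusually high "don't know" rate) is flagged. A flag of this kind is a prompt for follow-up with the enumerator, not proof that an interview was falsely filled.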

Results/Lessons Learned:
Study findings led to the development of a comprehensive approach to quality assurance, a critical component of which was a machine learning algorithm for identifying biases in the data. Biases were assessed across both enumerators and respondents, including interviewer bias and respondent misunderstanding. Metrics used to assess these biases included sociodemographic characteristics, "don't know" rates, and skip patterns. Time stamps were used to measure the time taken to complete priority questions and sections of the tool, and these times were compared against those recorded during initial piloting and training. The results highlighted outlying interviews that could readily be followed up and remedied through discussion with the enumerator or the team supervisor.
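The time-stamp comparison described above can be sketched as follows. This is an assumed, simplified rule rather than the study's procedure: a section is flagged when it was completed in less than half of its pilot benchmark time. The section names, benchmark minutes, time stamps, and the 0.5 fraction are hypothetical.

```python
from datetime import datetime


def section_minutes(start_iso, end_iso):
    """Minutes elapsed between two ISO 8601 time stamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 60.0


def too_fast_sections(timestamps, pilot_minutes, min_fraction=0.5):
    """Return sections completed in under min_fraction of the pilot benchmark time."""
    flagged = []
    for section, (start, end) in timestamps.items():
        observed = section_minutes(start, end)
        if observed < min_fraction * pilot_minutes[section]:
            flagged.append(section)
    return flagged


# Hypothetical pilot benchmarks (minutes per section) and one interview's time stamps.
pilot_minutes = {"knowledge": 10.0, "practice": 8.0}
interview_timestamps = {
    "knowledge": ("2021-01-05T10:00:00", "2021-01-05T10:03:00"),
    "practice": ("2021-01-05T10:03:00", "2021-01-05T10:10:00"),
}

fast = too_fast_sections(interview_timestamps, pilot_minutes)
print("Sections completed unusually quickly:", fast)
```

In this made-up interview, the knowledge section took 3 minutes against a 10-minute benchmark and is flagged, while the practice section took 7 minutes against 8 and is not.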

Discussion/Implications for the Field:
To our knowledge, this is the first application of ML to improve the quality assurance of data used to generate evidence on the effectiveness of a BCC program. The approach offers a blueprint for using advanced data analytics to enhance survey data quality in real time. Adoption of this approach could lead to significant improvements in the quality of surveys, improving evidence generation for BCC programs globally.

Abstract submitted by:
Neha Shah - Johns Hopkins Bloomberg School of Public Health
Diwakar Mohan - Johns Hopkins Bloomberg School of Public Health
Kerry Scott
Amnesty LeFevre - Johns Hopkins Bloomberg School of Public Health

Source

Approved abstract for the postponed 2020 SBCC Summit in Marrakech, Morocco. Provided by the International Steering Committee for the Summit. Image credit: "Using Machine Learning to Optimize the Quality of Survey Data: Protocol for a Use Case in India"
