Evaluating App Behavior Change

Published on June 3, 2024
The gap in measuring the behavior change potential of health apps.

If you, like millions of consumers around the world, are hoping to improve yourself using mobile apps, you’re quickly faced with the daunting task of navigating the vast sea of options in app stores. But what you are shopping for is not really just digital tools. When looking for a nutrition, fitness, or meditation app, what people really want is to reach a certain personal goal, often involving some level of behavior change – they aim to eat better, move more, or establish a mindfulness practice.

However, in a world where most apps promise exactly this kind of change, along with transformative outcomes in health, wellness, and personal development, how do we really know whether they can deliver on that promise? Can we even measure it? In this article, we’ll explore these questions and consider the current ways behavior change is measured and evaluated in digital health apps.

App Store Ratings: A Flawed Indicator

Most people’s main metric for browsing and choosing an app is the app store rating. This makes sense: a rating is clear social proof, showing what other consumers like you think about the product or service. However, relying on app store ratings overlooks a crucial point: ratings aren’t based on whether an app effectively supports real-world behavior change (or even helps users reach their goals). Rather, they tend to reflect how users feel about the app itself, based on first impressions and initial perceived usability.

Furthermore, we would argue that app store ratings are fundamentally flawed even as a gauge of general app quality and effectiveness. These ratings are vulnerable to manipulation and even fabrication: some developers engage in practices designed to ensure that ratings reflect only the most positive user experiences, while others use paid reviewers to inflate their ratings [1, 2]. In fact, some research suggests that around 35% of app reviews are fake [3].

Consequently, it’s easy to see how app store ratings can be a misleading metric. At best, they reflect short-term user satisfaction while saying nothing about the app’s effectiveness in promoting lasting change. This gap can mislead even the most informed consumers.

Metrics for Evaluating Health Apps

In healthcare and medicine, the assessment of apps extends beyond user ratings to encompass criteria such as safety, security, and adherence to best practices. These criteria are crucial for apps that are recommended by healthcare professionals to patients, ensuring they are safe and reliable. However, what these evaluations often lack is any examination of whether an app effectively utilizes behavioral science to encourage behavior change. As an example, the US Digital Health Assessment Framework (DHAF) performs well in assessing privacy, safety, accessibility, and security, but lacks any behavior change component. This oversight further highlights a significant gap: the need for assessment tools that can evaluate apps' potential to make a meaningful impact in changing health behaviors.

The State of Assessing Behavior Change Apps

Over the past decade, various academic initiatives [4, 5, 6, 7] have sought to address the shortcomings of existing evaluation methods by developing frameworks that assess an app's potential to drive behavior change more accurately. Despite these efforts, several critical issues persist:

Evaluation Process
Many current academic methodologies for app evaluation lack practical considerations for real-world application. Most of them rely on brief, one-time reviews, with the total assessment time averaging around 5-10 minutes per app. This approach is likely insufficient to understand the complexities of how an app might support or hinder long-term behavior change.

Use of Behavioral Science Theories
These methodologies frequently overlook important behavioral science theories and concepts. For example, although self-determination theory (SDT) has become one of the most cited and reliable frameworks for understanding motivation, it is still omitted from many evaluation frameworks. This has led to separate frameworks focused only on this aspect [8]. The same is true for many product-specific behavior change techniques: existing methodologies tend to leave them out and instead rely heavily on traditional public health methodology.

Lack of Applied Considerations
Findings from academic assessments are often presented in a manner that is not easily digestible for product teams or non-behavioral scientists. Put simply, there is a lot of jargon. The results are also rarely aligned with user metrics or mapped against stages of the user journey, which makes it challenging for developers to apply these insights in practice.

Bridging the Gap

Addressing these limitations requires a concerted effort to develop evaluation methodologies that are not only grounded in behavioral science but also practical and accessible for those in the product development and digital health fields. We are passionate about improving how the behavior change potential of apps is assessed, and our Behavior Change Score initiative is one of the ways in which we hope to make a positive contribution.

We’d love to collaborate with product teams interested in better understanding their product's behavior change potential as well as other practitioners or researchers in the field interested in addressing this important gap.


References

  1. McGee, P. (2020, September 7). How app developers manipulate your mood to boost ranking: Higher ratings are the ‘lifeblood’ of the smartphone app world but what if they are inflated? Financial Times. https://www.ft.com/content/217290b2-6ae5-47f5-b1ac-89c6ccebab41

  2. Vincent, J. (2015, February 12). This is how App Store rankings are manipulated: Image allegedly shows racks of iPhones used to download and rate apps. The Verge. https://www.theverge.com/2015/2/12/8024861/top-10-app-store-manipulation-photo

  3. Martens, D., & Maalej, W. (2019). Towards understanding and detecting fake reviews in app stores. Empirical Software Engineering, 24(6), 3316-3355. https://doi.org/10.1007/s10664-019-09706-9

  4. Stoyanov, S., Hides, L., Kavanagh, D. J., Zelenko, O., Tjondronegoro, D., & Mani, M. (2015). Mobile app rating scale: A new tool for assessing the quality of health mobile apps. JMIR mHealth and uHealth, 3(1), e27. https://doi.org/10.2196/mhealth.3422

  5. McKay, F. H., Cheng, C., Wright, A., Shill, J., Stephens, H., & Uccellini, M. (2016). Evaluating mobile phone applications for health behaviour change: A systematic review. Journal of Telemedicine and Telecare, 24(1), 22-30. https://doi.org/10.1177/1357633x16673538

  6. McKay, F. H., Wright, A., Shill, J., Stephens, H., & Uccellini, M. (2019). Using health and well-being apps for behavior change: A systematic search and rating of apps. JMIR mHealth and uHealth, 7(7), e11926. https://doi.org/10.2196/11926

  7. Bondaronek, P., Alkhaldi, G., Slee, A., Hamilton, F. L., & Murray, E. (2018). Quality of publicly available physical activity apps: Review and content analysis. JMIR mHealth and uHealth, 6(3), e53. https://doi.org/10.2196/mhealth.9069

  8. Villalobos-Zúñiga, G., & Cherubini, M. (2020). Apps that motivate: A taxonomy of app features based on self-determination theory. International Journal of Human-Computer Studies, 140, 102449. https://doi.org/10.1016/j.ijhcs.2020.102449