Building Your Outcomes Toolkit

By: Jason Olivieri, MPH, Director of Outcomes, Med-IQ


Proper Planning Precedes Peak Performance

Where does your mind wander when you’re bored? Like “stuck-on-your-third-conference-call-in-a-row” bored. Do you catch up on emails? Read HuffPost?

Personally, I fantasize about bug-out bags.

A “bug-out bag” is that Bourne Identity-style survival kit. The kind of thing you reach for if your regularly scheduled programming is interrupted by the Emergency Broadcast System and it’s not a test. Maybe it’s Red Dawn. Maybe it’s the zombie apocalypse. Maybe your in-laws are coming to town. Whatever the reason… you’ve got to get out quick and there’s only time to grab one thing: your bug-out bag. If you’ve prepared well, you survive.

It’s an odd daydream, preparing for the apocalypse. But the point is, it’s important to be ready when the pressure is real.

Assessment Tools

There is another, more practical bag I’ve been building.

I’ve spent years trying to determine the educational impact of CME. In the spirit of preparedness, I’ve summarized four steps every CE provider should consider when building their own assessment kit.

Assuming your emergencies are more likely to come in the form of a reaccreditation surveyor than a zombie horde, this guide will moderate some of the worry when the alarm sounds.

#1 Putting First Things First

Although establishing a clear link between CE participation (cause) and an effect (e.g., a gain in knowledge, competence or performance) is beyond the scope of most activities, the methodologic principles behind establishing such a link govern every outcomes assessment. The first step is to establish that the intended effect comes after the presumed cause (aka, temporal precedence). For example: CE participants scored better on a post-activity assessment than they did pre-activity.

This is standard procedure for most CE providers and typically concludes triumphantly with claims of statistical significance. While these results are nice, we need to be clear about what significance means and how to determine it properly.

Principle: The P value is used to decide whether to reject the hypothesis that no difference exists between comparison groups.

In CE outcomes terms, the expectation (null hypothesis) is that your pre- and post-activity data for a given question are equivalent.

Interpretation: A significant P value (less than .05) means you reject the hypothesis that no difference exists. This does not mean the two groups are different: you’re testing whether groups are the same, not whether they are different. This means a significant P value isn’t an endpoint, but rather, a signal.

A P < .05 indicates additional energy is warranted to determine if your two comparison groups are indeed distinct. You can use this signal in support of a claim of temporal precedence (that the group is no longer the same as it once was), but the test isn’t conclusive.
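To make the pre/post comparison concrete, here is a minimal sketch of one way to get a P value for a single question answered by the same learners before and after an activity. The choice of test is an assumption on my part (nothing above prescribes one): McNemar's exact test, which looks only at learners whose answer changed. The function name and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact_p(wrong_to_right: int, right_to_wrong: int) -> float:
    """Two-sided exact McNemar P value for paired right/wrong answers.

    Only learners whose answer changed carry information. Under the null
    hypothesis (pre and post are equivalent), a change is equally likely
    to go in either direction, so the counts follow Binomial(n, 0.5).
    """
    n = wrong_to_right + right_to_wrong
    if n == 0:
        return 1.0  # nobody changed their answer: no evidence of a difference
    k = min(wrong_to_right, right_to_wrong)
    # Probability of a split at least this lopsided in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double for a two-sided test, cap at 1


# Hypothetical data: 15 learners improved, 4 got worse on one question
p = mcnemar_exact_p(wrong_to_right=15, right_to_wrong=4)
print(f"P = {p:.4f}")
```

With these invented counts the P value lands below .05, which (per the interpretation above) is a signal worth following up, not a conclusion.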

Additionally, the more P values you calculate for a given CE activity, the more likely you are to find a false positive.

For example: A pre/post-test consisting of only five questions has roughly a 23% chance (1 − 0.95^5, assuming the five tests are independent) of producing at least one P < .05 derived not from a characteristic of your comparison groups, but rather from running multiple comparison tests.

So assessing more questions increases your likelihood of finding statistical significance, but it decreases the likelihood you can trust your results.
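The five-question figure above comes from compounding a 5% false-positive risk across tests. A short sketch of that arithmetic, assuming the tests are independent and that no real difference exists on any question (the function name is my own):

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


for m in (1, 5, 10, 20):
    print(f"{m:>2} questions: {familywise_error_rate(m):.1%}")
# Prints 5.0%, 22.6%, 40.1%, and 64.2% for 1, 5, 10, and 20 questions
```

By 20 questions, a "significant" result somewhere in the report is more likely than not, even if the activity changed nothing.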

Luckily, there’s an easy-to-apply method to correct for this (i.e., Simes-Hochberg).
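For those who want to see the correction in action, the following is a minimal sketch of the Hochberg step-up procedure as I understand it: sort the P values, then work down from the largest, comparing each to a progressively stricter threshold. The function name and the five P values are invented for illustration.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return a True/False list: which null hypotheses to reject.

    Hochberg step-up: with m sorted P values p(1) <= ... <= p(m), find the
    largest rank k with p(k) <= alpha / (m - k + 1), then reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    cutoff_rank = 0
    for rank in range(m, 0, -1):  # start at the largest P value
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            cutoff_rank = rank
            break
    reject = [False] * m
    for i in order[:cutoff_rank]:
        reject[i] = True
    return reject


# Hypothetical P values for a five-question pre/post test
p_values = [0.004, 0.030, 0.041, 0.012, 0.200]
print(hochberg_reject(p_values))
# → [True, False, False, True, False]
```

Four of these five questions clear an uncorrected .05 threshold, but only two survive the correction.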

#2 Measurement

While our first concern is whether cause precedes effect, we must also ensure the effect is properly measured.

Most effects in CE activities are measured by multiple-choice questions (MCQs). Unfortunately, having answered countless MCQs does not translate into writing good MCQs. If you haven’t had any training in the matter, the likelihood is “extremely high” on a 5-point Likert-type scale that you’re making common mistakes.

Common flaws in question design interfere with your ability to measure what you intend and thereby weaken a claim of temporal precedence. The National Board of Medical Examiners (NBME) has developed NBME U, which covers all you need to know to write better MCQs.

I recommend these lessons:

  • Writing MCQs to Assess Application of Basic Science Knowledge (i.e., knowledge test questions with a single correct answer)
  • Writing MCQs to Assess Application of Clinical Science Knowledge (i.e., case vignettes)
  • MCQ Flaws and How to Avoid Them

Each lesson costs $15, but that’s a small investment for the ability to markedly increase the quality of your question writing.


#3 Demonstrating Covariation

Covariation can be established by showing that:

  • the desired effect is only attributable to the presumed cause (in this case, your CE activity), and/or
  • more presumed cause generates more of the desired effect.

The first of these requires a controlled study. Specifically, we need to compare CE participants to a similar non-participant group in regard to the desired effect. If participants demonstrate the desired effect and non-participants do not, we have evidence of covariation (i.e., the CE activity is responsible for the desired effect).

The study most typically employed in CE to demonstrate covariation is called a post-test only design with nonequivalent groups. In this study, both CE participants and a select non-participant (control) group are given identical assessments at a defined point after the activity.

The assumption is that if participants better demonstrate the desired effect than non-participants, the CE activity mattered. While there is logic to this assumption, it’s undermined by the following limitations.

  1. It violates temporal precedence:
    Without data in regard to the desired effect among participants or controls prior to the activity (i.e., a pre-test), it is impossible to determine whether the activity precedes the desired effect.
  2. The control group is flawed:
    Think of who participated in your CE activity. What may have motivated them to participate? What are their practices like? Where do they live? Where did they train? How long ago did they graduate medical school? The list of distinguishing characteristics is long.

Now think about those that participated in your control group assessment. Given the difficulties typically associated with attracting clinicians to surveys, why did these individuals respond? What are their practices like? Can we really expect a few multiple-choice demographic questions to balance all the potential difference between the participant and control groups? The answer is given away in the study title: post-test only design with nonequivalent groups.

Don’t fret. Every study design has its flaws. Look closely at the roads of gold promenaded by randomized controlled trials and you’ll find a few yellow painted bricks. Once we know what we want to address, we can adapt our study design.

Table 1 describes approaches frequently used in CE outcomes assessments and their corresponding strengths and weaknesses. These designs are categorized as quasi-experimental because they lack randomization of subjects into a participant or control group, which serves to balance characteristics across groups and ensure equivalence. Without equivalence, it is difficult to distinguish whether the desired effect is due to the presumed cause (i.e., the CE activity) or differences between comparison groups. This is referred to as “selection bias.” While selection bias will always be a limitation in quasi-experimental research, there are designs to reduce its effect.

Table 1: Quasi-experimental designs for CME outcomes

Design | Description | Strength and Weakness
One-group pre- vs. post-test | Pre- and post-activity surveys | Can establish temporal precedence, but without external control, cannot establish covariation
Post-test only with nonequivalent control | Post-activity tests to participants and non-participants (control) | Can suggest covariation; however, effect may be due to CE or differences between comparison groups (selection bias)
Pre- and post-test with nonequivalent control group | Pre- and post-activity surveys to both participants and non-participants (control) | Can establish temporal precedence; covariation assertions enhanced by pre-activity assessment of selection bias


As described in Table 1, we can address our previous temporal precedence concern by adding a pre-test for both participants and controls. The addition of a pre-test also provides an important variable to assess equivalence between participants and controls. For example, if participants and controls perform similarly on pre-activity assessments, it suggests a reduced likelihood of factors existing between comparison groups that could affect the desired effect.
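A rough sketch of what that pre-test buys you: with scores for both groups at both time points, you can check how far apart the groups started and how much each one moved. The class, function, and all of the counts below are hypothetical, and comparing the two gains directly is my own illustrative addition, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class GroupScores:
    """Counts of correct answers on one question, before and after."""
    pre_correct: int
    pre_total: int
    post_correct: int
    post_total: int

    @property
    def pre_rate(self) -> float:
        return self.pre_correct / self.pre_total

    @property
    def post_rate(self) -> float:
        return self.post_correct / self.post_total

    @property
    def gain(self) -> float:
        return self.post_rate - self.pre_rate


def compare_groups(participants: GroupScores, controls: GroupScores) -> dict:
    return {
        # Near zero suggests the groups started in a similar place
        "baseline_gap": participants.pre_rate - controls.pre_rate,
        "participant_gain": participants.gain,
        "control_gain": controls.gain,
        # How much more participants improved than controls did
        "gain_difference": participants.gain - controls.gain,
    }


# Invented numbers for illustration only
participants = GroupScores(pre_correct=48, pre_total=120, post_correct=84, post_total=120)
controls = GroupScores(pre_correct=41, pre_total=100, post_correct=46, post_total=100)
for name, value in compare_groups(participants, controls).items():
    print(f"{name}: {value:+.2f}")
```

A small baseline gap does not prove the groups are equivalent (they may still differ in ways the pre-test does not capture), but a large one is a clear warning sign.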

Regarding the second type of covariation (i.e., more cause equals more effect), literature suggests that sequenced CE interventions are more likely to be effective, supporting covariation in a broad sense.

If you want to demonstrate this in your own CE program, you need:

  • a sequenced intervention, and
  • an ability to conduct regression modeling.

Regression modeling is helpful for determining how an added unit of cause (e.g., participating in two activities versus only one in a CE series) affects the desired effect. For the curious, an excellent guide entitled Understanding the “Why”: Using Predictive Modeling to Inform Outcomes can be found among the on-demand webinars maintained by the Alliance.
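As a taste of what that looks like, here is a minimal sketch of the simplest case: an ordinary least-squares line relating the number of activities a learner completed to their score gain. The data are invented, the function name is my own, and a real analysis would typically include more predictors and a statistics package rather than hand-rolled arithmetic.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical learners: activities completed in a series, and score gain (points)
activities = [1, 1, 2, 2, 3, 3]
score_gain = [5, 7, 11, 9, 16, 14]

slope, intercept = fit_line(activities, score_gain)
print(f"Each additional activity is associated with {slope:.1f} more points of gain")
# → Each additional activity is associated with 4.5 more points of gain
```

The slope is the "added unit of cause" estimate: a positive value is consistent with more cause generating more effect, subject to the validity threats that any observational comparison carries.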


#4 Validity

Once temporal precedence, proper measurement and covariation are established, all other possible explanations for the desired effect (i.e., confounders) must be explored. This addresses the internal validity of your assessment (i.e., how well it avoids confounding).

There are many threats to internal validity, one of which was introduced in the previous section: selection bias. The better your study reduces threats to internal validity, the more confident you can be in the results.

There is no perfect study. Even true experimental designs (such as randomized controlled trials [RCTs]) have concerns of internal validity, although rather less so than their quasi-experimental cousins. Table 2 details common internal validity threats, as well as examples of how each might apply to CE.

Table 2: Internal Validity Threats

Threat | Description | CE example
Selection | Characteristics unique to participant or control group may explain desired effect | Participants represent those most eager to change; CE simply gave them a venue to congregate
History | External events in concurrence with presumed cause may explain effect | Participants are engaged in additional, complementary learning activities; effect is aggregate
Maturation | Effect would have occurred naturally over time | Characteristics such as advancing age and/or work experience may predispose participants to change
Mortality | Study drop-outs (and remainders) may influence desired effect | Respondents represent those most interested in learning change, but not necessarily reflective of all participants
Testing | Exposure to repeated, identical tests can affect subsequent scores | Respondents may do better on post-tests because they’ve been sensitized to the questions; doesn’t reflect actual learning
Regression | Extreme scores come back toward the mean on subsequent testing, rather than away from it | Low pre-activity assessment scores will trend toward group mean upon subsequent measurement, independent of presumed cause
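The regression threat is the least intuitive of the six, so here is a small simulation sketch. Everything in it is an assumption for illustration: each simulated learner has a stable underlying skill, each test adds random noise, and nothing at all happens between the two tests.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=2000, seed=0):
    """Return (pre mean, post mean) for the lowest-scoring quarter on the pre-test."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pre, post = [], []
    for _ in range(n):
        skill = rng.gauss(60, 10)               # stable underlying knowledge
        pre.append(skill + rng.gauss(0, 10))    # noisy pre-test score
        post.append(skill + rng.gauss(0, 10))   # noisy post-test, no education in between
    cutoff = sorted(pre)[n // 4]
    low = [i for i in range(n) if pre[i] < cutoff]  # bottom quarter on the pre-test
    return mean(pre[i] for i in low), mean(post[i] for i in low)


low_pre, low_post = simulate_regression_to_mean()
print(f"Lowest pre-test scorers: pre {low_pre:.1f}, post {low_post:.1f}")
```

The low scorers "improve" on the second test even though no education took place, because part of what made their first score low was bad luck. If you report outcomes only for learners who scored poorly at baseline, some of the apparent gain may be this effect.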

While statistical methods can make some corrections for internal validity concerns, study design is the primary determinant of internal validity. And just like that limitations section tucked into the back corner of every peer-reviewed publication, if you’re presenting CME outcomes data, you should be prepared to speak to each of these potential confounders: they lurk behind every sensational outcomes-based headline you’ve ever read.



These tools for strengthening outcomes reports should govern daily practice. Hopefully, your toolkit is now more robust and will prepare you for your version of a CE outcomes apocalypse.
