
Using cognitive load theory to evaluate and improve preparatory materials and study time for the flipped classroom



Preclinical medical education is content-dense and time-constrained. Flipped classroom approaches promote durable learning, but challenges with unsatisfactory student preparation and high workload remain. Cognitive load theory defines instructional design as “efficient” if learners can master the presented concepts without cognitive overload. We created a PReparatory Evaluation Process (PREP) to systematically assess and measure improvement in the cognitive-load efficiency of preparatory materials and impact on study time (time-efficiency).


We conducted this study in a flipped, multidisciplinary course for ~ 170 first-year students at Harvard Medical School using a naturalistic post-test design. For each flipped session (n = 97), we assessed cognitive load and preparatory study time by administering a 3-item PREP survey embedded within a short subject-matter quiz students completed before class. Over three years (2017–2019), we evaluated cognitive load- and time-based efficiency to guide iterative revisions of the materials by content experts. The ability of PREP to detect changes to the instructional design (sensitivity) was validated through a manual audit of the materials.


The average survey response rate was ≥ 94%. Content expertise was not required to interpret PREP data. Initially, students did not necessarily allocate the most study time to the most difficult content. Over time, the iterative changes in instructional design increased the cognitive load- and time-based efficiency of preparatory materials with large effect sizes (p < .01). Furthermore, this increased the overall alignment of cognitive load with study time: students shifted study time toward difficult content and away from more familiar, less difficult content, without increasing overall workload.


Cognitive load and time constraints are important parameters to consider when designing curricula. The PREP process is learner-centered, grounded in educational theory, and works independently of content knowledge. It can provide rich and actionable insights into instructional design of flipped classes not captured by traditional satisfaction-based evaluations.



In the last decades, preclinical medical education reform has focused on the transition from traditional lecture-based instruction to various forms of “flipped-classroom” teaching to provide students with more experience applying their knowledge and struggling with clinical problems [1,2,3]. The flipped classroom method aims to align learning better with human cognition and thus make learning deeper and more durable, a goal that is often referred to as “active learning” [4, 5]. The success of the flipped classroom format requires that students arrive well-prepared to participate in class [6]. Ensuring students have adequate time and effective resources to prepare for class have emerged as common challenges in implementing flipped classroom formats [6, 7].

Medical school faculty are not usually trained in instructional design and, as content experts, may struggle to accurately assess the cognitive difficulty or time required for novice learners to work through assigned materials. This phenomenon is a normal cognitive bias sometimes called the “expert blind spot” [8]. Providing students with overly comprehensive preparatory materials can convince faculty that the content is well covered, but as a result students may be overwhelmed by too much content, leading to inadequate preparation for class [9,10,11,12,13] and thus interfering with active learning.

The iterative cycle of curriculum improvement is routinely performed by faculty and requires significant time and resources. While empirical evidence regarding instructional design for the flipped classroom is emerging [14,15,16], standardized design frameworks are still lacking [17,18,19]. Satisfaction-based end-of-course evaluations are widely used in higher education to assess teaching, but they lack the granularity to assess effectiveness at the level of day-to-day instructional design [20,21,22]. Thus, methods to evaluate and improve preparatory resources for flipped classes to promote student preparation for active learning are needed.

To address this problem, we developed a learner-centered PReparatory Evaluation Process (PREP) grounded in cognitive load theory (CLT). CLT defines instruction as “efficient” when it provides the learner with sufficient guidance to successfully process novel information without overloading the limited capacity of working memory [23, 24]. The level of guidance required depends on both the learner’s prior expertise and the intrinsic complexity of the topic [25, 26]. The efficiency of instructional materials has typically been assessed by comparing performance on the learning task with the intensity of mental effort (“difficulty of the material”) in the form of efficiency graphs or metrics [27,28,29]. This method has been widely used in the field of instructional design to assess the cognitive load efficiency of learning tasks, with strong psychometric properties in various contexts [27, 29,30,31,32,33,34]. Given the time-compressed nature of undergraduate medical education and the challenges observed with managing workload in the flipped setting [7, 10], we expanded the traditional notion of cognitive load-based efficiency to also include prep time.

PREP consists of two steps: first, measuring the instructional efficiency of prep assignments to identify resources in need of revision; second, applying instructional design principles derived from CLT [24, 26] to optimize instructional efficiency. To our knowledge, this is the first study to systematically apply CLT to assess how iterative changes to the instructional design affect the self-reported cognitive load and workload of prep resources in a flipped curriculum.

Specifically, we focused on the following research questions: (1) How can the PREP tool be used to assess the cognitive-load and time-based efficiency of individual preparatory materials? (2) What is the sensitivity of the PREP tool in detecting changes in the instructional design of preparatory materials? (3) What is the overall impact of the PREP process on instructional efficiency of the entire course?


Study design

This study describes a naturalistic post-test study without a control group looking at the cognitive load- and time-based efficiency of students engaging with flipped classroom learning in the basic science component of an undergraduate medical program. The Harvard Medical School (HMS) Program in Medical Education (PME) Educational Scholarship Review Committee deemed this study not human subjects research and exempt from further IRB review. The need for written informed consent was waived by the HMS PME Educational Scholarship Review Committee due to the retrospective nature of the study. We followed the revised standards for quality improvement reporting excellence [35].


This study was conducted in the context of a multidisciplinary, pre-clinical basic science course in the Pathways program at Harvard Medical School. The 13.5-week course was taken by 170 students each year (~ 135 medical and ~ 35 dental students) as part of a long-standing joint first-year program in which students were enrolled without differentiation in the same courses. The course, Foundations, interleaved 97 individual flipped-classroom sessions across ten disciplines: cell biology, anatomy, developmental biology, histology, pathology, cancer biology, genetics, immunology, microbiology and pharmacology. Students attended class Monday, Tuesday, Thursday and Friday mornings each week (8:00 AM – 12:30 PM), while the afternoons were reserved for preparatory study and consolidation. Wednesdays were reserved for clinical training (see Appendix 1 for a sample week of the course schedule). Faculty recommended that students distribute their preparatory time across the week (including weekends), preparing for no more than two individual sessions per afternoon.

Instructional design of preparatory resources

The course faculty applied the following design principles to prep work for all flipped classroom sessions: (1) Limit prep work to ~ 2 h of study per ~ 80 min in-class session (~ 1.5× the in-class time); (2) Provide students with a short summary of the topic (5–6 sentences), learning objectives, and important keywords; (3) Where applicable, organize content derived from prior lectures into several shorter concept videos (typically narrated slide presentations); otherwise, study resources may vary, ranging from book chapters to articles or other readings; (4) Conclude each preparatory assignment with a short, open-book, multiple-choice quiz to provide students with feedback on their preparation (readiness-assessment exercise, or RAE). Students received credit for each RAE submitted before the session if they scored ≥ 50% correct. Cumulatively, the RAEs accounted for 20% of the final course grade.


To gather session-level feedback on the learner experience, we developed a 3-item survey to assess preparation time (9-point scale, from 1 h or less to 5 h or more), familiarity with content from prior courses (5-point scale, not familiar to extremely familiar), and difficulty of working through the materials as a measure of cognitive load (5-point scale, very easy to very difficult) for each prep assignment (Appendix 2). The item on cognitive load has been used extensively in various educational settings; validity evidence has been collected and published to confirm its applicability [30, 31, 36]. All 3 items were assessed by both faculty experts and students to ensure validity in our context. This 3-item PREP survey was included with each RAE starting in 2017. Completing the 3-item survey portion at the end of each RAE was optional and did not contribute to students’ grades. Students were informed during the introduction to the course that these data were collected for continuous quality improvement.


The data presented in this study were collected through three consecutive iterations of the course running between August and November in 2017 (Year 1, n = 170), 2018 (Year 2, n = 171), and 2019 (Year 3, n = 171). Students seemed to answer these questions thoughtfully, as judged by the variation in answers between sessions. Across all three years, only 7 out of 512 students showed little variation in the answers they selected (SD < 0.2), suggesting that they were “straightlining”, i.e., providing the same response to each item [31]. These responses were deleted prior to analysis. Two students repeated the course, and their data were deleted from the year they repeated, since they were much more familiar with the content than their peers, which likely reduced their cognitive load and prep time.
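The straightlining screen described above can be sketched as follows. This is a minimal pandas example on simulated responses; the column names, number of students, and the single constant-rating "straightliner" are illustrative assumptions, not the study's actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format PREP responses: one row per student per session,
# with the difficulty rating already converted to a number (1-5).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "student": np.repeat([f"s{i}" for i in range(5)], 97),
    "difficulty": rng.integers(1, 6, size=5 * 97).astype(float),
})
# Simulate one "straightliner" who selects the same rating in every session.
df.loc[df["student"] == "s0", "difficulty"] = 3.0

# Flag students whose ratings barely vary across sessions (SD < 0.2)
# and drop their responses before analysis.
sd_by_student = df.groupby("student")["difficulty"].std()
straightliners = set(sd_by_student[sd_by_student < 0.2].index)
df_clean = df[~df["student"].isin(straightliners)]
```

In practice the same screen would be run across all three survey items, not just the difficulty rating.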

Data analysis

Efficiency graphs

The item response choices on preparatory assignment time, familiarity and difficulty were converted into numbers (Appendix 2). The data were first standardized by student (z-scores) and then aggregated by session for each year. Expressing the ratings as z-scores reduced variation based on an individual student’s preferences and/or differences in overall ability [27, 30, 31, 36]. The academic performance on the RAE was then plotted versus perceived difficulty to assess the efficiency of the session materials based on cognitive load (Fig. 1A). When plotted in this way, sessions which were most efficient fell in the upper left, above the y = x line where cognitive load was moderate to low, and students performed comparably well on the RAE. Less efficient sessions were in the lower right below the y = x line where cognitive load was higher and/or students performed more poorly.

Fig. 1
figure 1

A) Efficiency graphs (EG). To produce EGs, the data were standardized by student (z-scores), aggregated by session, and mean session values were plotted. Sessions above the y = x line were considered more efficient, sessions below the line less efficient. The position on the graph with respect to the line can also be expressed as efficiency metric E = (y – x)/√2 [23]

B) Comparing cognitive load- and time-based efficiency in year 1. Each dot represents one session. Sessions were ordered alphabetically and then numbered from 1–97. To better visualize the position of each session with respect to the line, we colored each dot with the value of the efficiency metric E, for time or cognitive load respectively. In year 1, 25 out of 97 sessions were very efficient (E ≥ 0.5) in either time (n = 7), cognitive load (n = 7), or both (n = 11). Similarly, 27 out of 97 sessions were quite inefficient (defined as E ≤ − 0.5) in either time (n = 9), cognitive load (n = 11), or both (n = 7)

C) Alignment of prep time with cognitive load over the years. Cognitive load based EGs for year 1 and year 3 were plotted. Each dot represents one session color-coded by prep time in hours. Graphs show a change from year 1 to year 3 in better alignment of prep time with most difficult materials

D) EGs with cluster overlay. Cognitive load- and time-based efficiency graphs from panel B were overlaid with the cluster denomination. E) and F) Examples of iterative changes to individual sessions from year 1 to 3 in two different disciplines. The trail line illustrates the change in position on the graph over the years. The line starts with year 1; the end position in year 3 is indicated by the circular marker.

The position on the plot can also be expressed as the efficiency metric E – a compound measure, E = (y – x)/√2, that describes whether the materials fall above the y = x line (E > 0; more efficient) or below it (E < 0; less efficient) [29]. This numeric representation was used to assess the sensitivity of the PREP tool (see below). In addition to this traditional cognitive load-based efficiency, we also looked at what we called “time-based efficiency” by exchanging the difficulty rating with time spent (Fig. 1A). By assessing both metrics – cognitive load- and time-based efficiency – educators can determine whether time spent is appropriate for the intrinsic complexity of the topic.
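The standardization and efficiency computation might be sketched as follows. The study's analysis was performed in JMP; this Python version uses hypothetical responses (30 students, 4 sessions, invented score and difficulty values) purely to show the mechanics of the per-student z-scoring and the E metric:

```python
import numpy as np
import pandas as pd

# Hypothetical responses: 30 students x 4 sessions, with an RAE score and a
# difficulty rating per response (values are illustrative only).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "student": np.tile([f"s{i}" for i in range(30)], 4),
    "session": np.repeat([1, 2, 3, 4], 30),
    "score": rng.normal(0.85, 0.10, 120),
    "difficulty": rng.normal(3.0, 1.0, 120),
})

# Standardize each measure within student (z-scores) to reduce variation from
# individual rating preferences and ability, then aggregate by session.
for col in ("score", "difficulty"):
    df[col + "_z"] = df.groupby("student")[col].transform(
        lambda s: (s - s.mean()) / s.std())
by_session = df.groupby("session")[["score_z", "difficulty_z"]].mean()

# Efficiency metric: E = (performance_z - difficulty_z) / sqrt(2).
# E > 0 puts a session above the y = x line (more efficient), E < 0 below.
by_session["E"] = (by_session["score_z"]
                   - by_session["difficulty_z"]) / np.sqrt(2)
```

Swapping the difficulty z-score for a prep-time z-score yields the time-based efficiency metric in the same way.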


K-means clustering can be used to identify subgroups with common characteristics within a dataset (JMP®, Versions 14–16, SAS Institute Inc., Cary, NC, 1989–2019). We performed K-means clustering using raw scores of preparation time, difficulty, and familiarity ratings, but excluded quiz performance in order to look at the learning experience independent of outcome. Clusters were generated iteratively for each year over a range of cluster counts (2 to 5) to determine the best fit, based on the parallel coordinate plots in Appendix 3. Unlike with the efficiency graphs, raw values were used for the clustering, since we wanted to understand the absolute (not relative) values of prep time.
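A minimal NumPy sketch of the clustering step follows. The study used JMP's K-means; the synthetic session profiles below only loosely echo the three clusters later described in the Results (prep hours, difficulty, familiarity per session) and are not the real data, and the starting centroids are rough guesses supplied by hand:

```python
import numpy as np

def kmeans(X, centroids, n_iter=50):
    """Minimal Lloyd's k-means: returns labels and updated centroids."""
    centroids = centroids.astype(float).copy()
    for _ in range(n_iter):
        # Assign each session to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned sessions.
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical per-session raw means: (prep hours, difficulty, familiarity).
rng = np.random.default_rng(1)
sessions = np.vstack([
    rng.normal([1.8, 2.0, 2.0], 0.1, (30, 3)),  # short prep, easy, unfamiliar
    rng.normal([2.4, 2.8, 3.8], 0.1, (34, 3)),  # familiar, moderately difficult
    rng.normal([2.4, 4.0, 1.5], 0.1, (33, 3)),  # least familiar, most difficult
])
labels, centroids = kmeans(sessions, centroids=np.array([[2.0, 2.0, 2.0],
                                                         [2.0, 3.0, 4.0],
                                                         [2.0, 4.0, 1.0]]))
```

In a production analysis one would repeat this for k = 2 through 5 and inspect the resulting parallel coordinate plots, as described above, rather than fixing k = 3 in advance.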

Instructional design intervention

After the initial end-of-course assessment of the preparatory materials in year 1, the course leadership decided to redesign preparatory materials using an iterative approach. In their end-of-course feedback, students described the student guide, RAEs, and concept videos as very effective, but also described experiencing a lot of variation in the educational quality of individual preparatory materials across sessions and disciplines. Cognitive load theory and multimedia principles provided a framework that allowed us to understand the feedback and how to respond. Table 1 presents a detailed description of how cognitive science and multimedia learning principles informed the iterative improvement of individual preparatory materials over the years. All interventions aimed to optimize intrinsic cognitive load, while reducing extraneous cognitive load [37,38,39].

Table 1 Cognitive load and Multimedia principles used in the redesign of the preparatory materials

PREP tool sensitivity

To understand whether the PREP survey was sensitive in identifying changes in the instructional design, we performed a manual audit of the materials for each session, independently of the PREP survey results. Only major changes such as adding, removing or replacing resources were considered. In 2018, 28 out of 97 sessions (29%) underwent major revision; in 2019, 23 sessions (24%) did. Of the 51 sessions that were revised, 8 were revised in both consecutive years. We computed the difference in E scores between consecutive years (Δ), expressed it as an absolute value, and compared the median PREP scores of materials that had been revised with those that had not been altered, using non-parametric analyses. The Mann-Whitney U test and Pearson correlations were performed in JMP. Cohen’s d effect size was calculated [40].
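The sensitivity comparison might look like the following sketch. The |ΔE| values here are drawn at random and the group sizes merely mirror the audit counts above; the study ran these tests in JMP, with scipy standing in here:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical absolute year-over-year changes in the efficiency metric |ΔE|
# for revised vs. unchanged sessions (illustrative values only).
rng = np.random.default_rng(0)
revised = np.abs(rng.normal(0.60, 0.20, 28))
unchanged = np.abs(rng.normal(0.15, 0.10, 69))

# Non-parametric comparison of the two groups of |ΔE| scores.
u_stat, p_value = mannwhitneyu(revised, unchanged, alternative="two-sided")

# Cohen's d with pooled standard deviation as the effect size.
def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

effect = cohens_d(revised, unchanged)
```

A significant test with a large d, as in this simulated case, is the pattern one would expect if the survey reliably registers major revisions.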


Response rate

Students were diligent in completing RAEs in preparing for class. The average response rate for the content-based, graded portion of the RAEs was 98±0.6%. RAEs contained an average of 9±2 content items with a mean item difficulty of 0.88±0.14 (N = 846). Despite the open-book nature of the RAEs, mean item discrimination was 0.36±0.18 (calculated as point biserial, N = 846), indicating students were likely treating the RAE as the low-stakes opportunity to test themselves on their level of preparation that it was meant to be. The average response rate for the optional PREP survey items was 94±4% (time spent), 95±3% (difficulty rating), and 95±3% (familiarity rating). The consistently high response rate suggested that embedding the PREP items into a task that students did routinely minimized survey fatigue.
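Item difficulty and point-biserial discrimination of the kind reported above can be computed along these lines. The 0/1 response matrix is simulated from a latent "preparedness" score and is not the real RAE data; discrimination is computed here as a corrected item-total correlation, one common point-biserial variant:

```python
import numpy as np

# Simulate a 0/1 response matrix: 200 students x 9 items, where the chance of
# a correct answer rises with a latent preparedness score (illustrative only).
rng = np.random.default_rng(2)
ability = rng.normal(0.0, 1.0, 200)
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] + 2.0)))
responses = (rng.random((200, 9)) < p_correct).astype(int)

# Item difficulty: proportion of students answering the item correctly.
difficulty = responses.mean(axis=0)

# Point-biserial discrimination: correlation of each item (0/1) with the
# total score on the remaining items (excluding the item itself).
totals = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
    for i in range(responses.shape[1])
])
```

Excluding the item from its own total avoids inflating the correlation, which matters for short quizzes like a 9-item RAE.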

Efficiency graphs

We assessed the efficiency of prep materials based on prep time and cognitive load for each session of the course (Fig. 1B). One would expect students to spend more time on content they rated as difficult, but that was not the case. Higher cognitive load efficiency did not necessarily result in lower prep time and vice versa (Fig. 1C), and we found no statistically significant correlation between prep time and difficulty.


To understand better what might determine the allocation of study time, we sought to identify materials that shared common characteristics. Clustering into 3 groups provided the best fit, with statistically significant differences across all parameters (prep time, familiarity, and difficulty) (Table 2):

Table 2 Characteristics of prep materials in each cluster as self-reported by students
  • Materials in Cluster 1 required the least prep time, meeting our target of < 2 h on average. They were also perceived as less difficult to learn from, even though students were not particularly familiar with the content from courses prior to medical school.

  • Materials in Cluster 2 contained content that students were most familiar with compared to the other two clusters. Materials were rated somewhat more difficult than Cluster 1 materials, but preparation times exceeded our target (average preparation time > 2 h).

  • Materials in Cluster 3 were rated least familiar and most difficult, with average preparation times also exceeding our target (average preparation time > 2 h).

Familiar content stood out as a group with moderate difficulty (Fig. 2A). When plotting sessions by cluster and in sequence of occurrence (Fig. 2B), the more familiar Cluster 2 sessions occurred mostly during the first third of the course, while the later part of the course was enriched in Cluster 3 sessions. This indicated a natural progression in which earlier parts of the course built more on knowledge acquired prior to medical school than later parts.

Fig. 2
figure 2

A) Familiarity. Familiarity ratings plotted versus difficulty and overlaid with the cluster denomination. The more familiar cluster 2 sessions stood out as a group with moderate difficulty. Cluster 1 and 3 sessions both covered content unfamiliar to the students from prior courses but greatly differed in perceived difficulty of the content

B) Course design. Sessions were plotted by cluster in the order of occurrence over the time of the course. (Please note that the numbers do NOT correspond to the labels in Fig. 1). Cluster 1 represents content that is unfamiliar and least difficult. Cluster 2 content is most familiar, and moderately difficult. Cluster 3 content is most difficult and least familiar. Preparation times differ across clusters and are discussed in more depth in the text. The course progresses from more familiar to less familiar content over time. Over the years the number of cluster 1 sessions slightly increased (not statistically significant) and cluster 3 sessions were intentionally distributed more evenly across weeks to balance the weekly workload. (A week comprises 8–11 sessions).

C) Impact. Cognitive load efficiency graphs overlaid with prep time as contour plot highlight how students increasingly invest their time in the most difficult concepts over the years.

While clustering provided us with context for the course, efficiency metrics allowed us to assess which individual sessions to prioritize for improvement. To consider both aspects together, we overlaid the efficiency graphs with our cluster categorization (Fig. 1D). We found that Cluster 1 sessions were both time- and cognitive load-efficient. Cluster 2 sessions were cognitive load-efficient but less time-efficient. Students spent more time on them than one might expect for content that is relatively familiar. Cluster 3 sessions, on the other hand, were least cognitive load-efficient, but counterintuitively more time-efficient. In other words, students spent less time on the materials than seemed appropriate for the most difficult concepts. This suggested that students disengaged from the most challenging materials when making decisions on how to prioritize their study time.

Demonstrating impact on the course as a whole

Over the next two years, preparatory materials were redesigned iteratively to better align cognitive load- and time-efficiency, based on the guidelines for faculty described in Table 1. Importantly, in our faculty development efforts we encouraged faculty to apply these strategies as consistently as they could to all sessions, while prioritizing Cluster 3 materials for major revisions where possible. In addition, we intentionally spread out Cluster 3 sessions more evenly to balance the overall workload in each week of the course (Fig. 2B).

Using efficiency metrics, we were able to distinguish, with large effect sizes, materials that had undergone revision from those that had not been changed (Table 3). This suggested that the PREP process was a reliable indicator of changes to the instructional design. Average prep time by session decreased somewhat over three years (from 2.2 to 2.0 h, p < .001), but importantly, how students allocated that time changed as well. Sessions that were redesigned based on cognitive load principles (Table 1) showed a better correlation of preparation time with difficulty ratings (r (95) = 0.59, p < .0001) compared to those that were not altered (r (91) = 0.32, p < .01). The greater alignment of preparation time with difficult content is also visualized in Fig. 2C. Average prep time for the more familiar Cluster 2 materials declined (from 2.5 to 2.2 h, p < .003), while prep time for Cluster 1 (1.8 h) and Cluster 3 (2.3–2.4 h) remained the same.

Table 3 The PREP survey items detect changes in instructional design with large effect sizes (Cohen’s d)

In summary, changes in instructional design succeeded in shifting students’ allocation of preparation time away from the more familiar content towards the more difficult content.

Demonstrating impact on individual sessions

While the PREP process proved useful to assess the course as a whole, it was equally helpful in assessing individual sessions. Figure 1E illustrates the effect of iterative changes in one discipline consisting of 16 individual sessions that were overseen by one content expert. Over the course of two years the reading materials provided to the students as prep resources were shortened. Overall, the materials in this discipline were rated as very time-efficient. But for two sessions (#75 and #78), RAE performance dropped precipitously, suggesting that important information may have been omitted in the process of shortening the content, or that the RAE items were now otherwise misaligned with the revised content. This example demonstrates how efficiency metrics can be used to distinguish intended from unintended consequences, and provide faculty with suggestions for improvement without the need for content expertise.

Figure 1F shows another set of sessions in a different content area, including some of the most difficult sessions in the course per student ratings. Faculty reviewed the content and confirmed that these sessions covered very complex materials. They were concerned about the apparent lack of engagement with the materials indicated by the comparably low prep time ratings. Over the course of two years, some of the preparatory resources were converted to interactive online modules. These changes successfully increased student engagement with the content, as measured by prep time.

Given the overall time constraints of the curriculum, we conclude that the instructional design interventions succeeded at both balancing and somewhat reducing overall workload while redirecting available time towards the more difficult concepts – in other words, prep time well spent.


The efficiency metrics used in PREP allow educators to identify preparatory resources in need of revision based on their learners’ cognitive load and available time. We present this study as proof of concept that PREP can be used to assess and improve preparatory materials in the flipped classroom, and as such it presents a novel tool for course evaluation that is grounded in educational theory [31].

PREP was sensitive in detecting changes to the instructional design without the need for content expertise. This made it a particularly useful tool in our multidisciplinary setting. The familiarity measure proved helpful in guiding the sequencing and integration of course materials from more familiar to less familiar content, an important course design principle to optimize intrinsic load [26]. The metrics of time- and cognitive load-efficiency proved meaningful in identifying specific resources in need of revision. By expanding the efficiency concept to include self-reported prep time, a behavioral outcome measure of engagement with the materials, we broadened the utility of this approach to help address the long-standing problem of balancing content and time constraints in preclinical medical education [5, 7].

Through iterative revisions grounded in CLT (Table 1), we succeeded in steering our students toward spending less time on more familiar content and focusing their time on materials that were conceptually more difficult. This is consistent with the literature. The learning process is prone to many cognitive biases and illusions, such as fluency in recalling factual information, that can mislead students into thinking that learning has been achieved and can also interfere with learning [26, 41]. The success of the flipped classroom approach depends on learners preparing independently and therefore raises the stakes for instructional design. Recent reviews highlight the need for clearly structured, interactive, and engaging out-of-class assignments for the flipped classroom to succeed [42, 43]. PREP provides educators with a framework and a tool to identify preparatory assignments in need of revision and to track the impact of these changes on the quality of the out-of-class assignments.

The need of novice learners for structure and scaffolding [5, 26] is easily misunderstood by educators as an unwillingness to put in the effort to learn. The goal of this work is not to create shortcuts or “cheat sheets” for learners; cognitive load theory explicitly states that the intrinsic cognitive load of a topic cannot be changed [26]. The goal is the opposite: to sustain learners’ attention so that they stick with the hard topics. The science of instructional design helps us support our learners in managing their learning and makes it easier for them to prioritize difficult content [26]. After almost a decade of flipping the entire pre-clinical curriculum [44], our experience suggests that if we are committed to active learning, we must also be committed to effective instructional design of the preparatory assignments. Although developed and studied within a specific curriculum, we believe this method is relevant to other flipped-classroom settings.


This study presents a quality improvement project conducted at a single institution, and as such the specific data are not generalizable. For example, our finding that ~ 2 h of prep time was time-efficient might correspond to 1 or 3 h in a different curricular context. However, we believe that the PREP process itself is likely of general interest to educators in medical and higher education. Unlike traditional end-of-course evaluations, PREP data are collected in near real-time and grounded in educational theory. As such, the PREP process provides highly detailed and actionable insights into the “cognitive landscape” of the course from the perspective of the learner. The strength of this approach is its high ecological validity, though the ratings provided by the students might be prone to various biases. While we observed a reasonable degree of variation in the data and took care to normalize by student to mitigate effects of prior educational experience, we cannot be certain how much thought each student gave the ratings each time. The approach may also not be useful for small classes. Future studies should examine how learners of different backgrounds, ethnicities or socioeconomic status might differ in their experience of the course.

Despite the many changes made to individual study resources, the learning objectives taught throughout the three years of session-level data collection remained the same. The effect of changes made to individual sessions varied, with some having the intended outcomes and others indicating further need for improvement. Furthermore, the efficiency graph approach assumes that the RAE effectively measures the knowledge students acquire during prep. Select items in 9 RAEs (5 in year 2, and 4 in year 3) underwent significant revision along with changes made to prep resources. We think it unlikely that the changes to the course overall are an artifact of these specific edits to select RAE items, but for conclusions about individual sessions it will be important to take alterations in RAE content into account.


The iterative cycle of curriculum or course improvement is routinely performed by faculty and requires significant time and resources. Yet this work is often based on subjective impressions and typically lacks outcome data grounded in educational theory. The success of the flipped classroom approach depends on learners preparing independently and therefore raises the stakes for instructional design. Our data-driven PREP approach provides educators with an analytic process focused on the two most challenging domains for novice learners – cognitive load and managing time. Efficiency metrics allow educators to improve instructional resources based on their learners’ cognitive needs and available time. In addition, they provide an opportunity for educators to manage and prioritize their own time in revising content, and to demonstrate the impact of continuous curricular quality improvement to students, colleagues and administrators in ways that are otherwise difficult to demonstrate. We believe that session-level approaches like PREP fill an important gap in assessing curricula not captured by traditional satisfaction-based course evaluations.

Data Availability

The data that support the findings of this study are available from the corresponding author upon request.




We are grateful to Evan Sanders for his help with assembling the data.


The project was supported by a grant from Harvard Initiative for Learning and Teaching (HILT), and it was developed in collaboration with the Harvard Medical School (HMS) Office of Educational Quality Improvement and the HMS Academy.

Author information




KF – analyzed data, created tables, and was a major contributor in writing the manuscript. AMS – contributed to the project's conception, study design, and writing of the manuscript. APC – contributed extensively to data analysis and the creation of figures and tables. RK – conceived of the project and the survey tool and contributed to writing the manuscript. BAC – contributed to the project's conception and writing of the manuscript. HCB – conceived of the project, analyzed data, created tables and figures, and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Henrike C. Besche.

Ethics declarations

Ethical approval and consent to participate

The Harvard Medical School Institutional Review Board deemed this study exempt from formal review, as it was considered a course evaluation for the purpose of quality improvement, for which written informed consent is not required. All study methods were carried out in accordance with relevant guidelines and regulations and met all requirements of the Declaration of Helsinki.

Consent for publication

Not applicable.

Prior presentation

Parts of this manuscript were presented at the Harvard Initiative for Learning and Teaching innovation showcase in April 2019, at the International Association of Medical Science Education annual meeting in June 2021, and at the AMEE Lyon 2022 annual meeting in August 2022.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Fischer, K., Sullivan, A.M., Cohen, A.P. et al. Using cognitive load theory to evaluate and improve preparatory materials and study time for the flipped classroom. BMC Med Educ 23, 345 (2023).



  • Instructional design
  • Flipped classroom
  • Efficiency
  • Cognitive load theory
  • Educational quality improvement