The evaluations indicate that designing and implementing a competency-based assessment programme poses a considerable challenge and demands intensive preparation and perseverance. The theoretical principles provided useful guidelines, and evaluating the programme and formulating lessons learned were vital steps towards improving it. The mixed composition of the research team, comprising both clinical supervisors and educational researchers, was a key factor during the development and implementation phases. The clinical staff members on the research team played an invaluable role in translating the assessment programme from paper into practice. We will discuss the answers to each of the research questions.
Can data from multiple individual assessments be used to combine the formative and summative functions of assessment?
The evaluation data provided no conclusive answer to the question whether formative and summative functions of assessment can be combined across multiple assessment data points. Despite general acceptance of the usefulness of WBA instruments for formative assessment, their value for summative purposes is disputed [17, 18]. The definition of formative assessment as used in the FVMU assessment programme proved to be misleading. The fact that all data points ultimately contributed to the final summative decisions caused students to perceive all individual assessments as summative rather than formative. In the eyes of the students, the final summative judgement was merely postponed until after the data points from the assessments were aggregated. The mismatch between the intended purpose of individual assessments and students’ perceptions of their role may partly be explained by students’ and teachers’ insufficient preparation for and instruction about the new programme. The programme designers may have underestimated the fundamental importance of faculty development and student training. Furthermore, it seems that the criteria for the final assessment could have been explained more clearly: which performance standards were used, how data were aggregated, how the final mark was determined, which remediation programmes were possible, and which purposes were served by the assessment programme. Had students and clinical supervisors interpreted the value of individual low-stakes assessments in the same way, students might have been better able to focus on the potential learning value of WBAs rather than on their summative consequences.
Can information from individual assessment data points be aggregated meaningfully?
In the FVMU assessment programme a competency framework is used to aggregate information from individual data points of similar content [12, 15]. Since what a test or item assesses is determined not by its format but by its content, and considering that assessments should not be trivialised in the pursuit of objectivity (e.g. by designing scoring rubrics for portfolios), it seems of the utmost importance that in programmes of assessment the inevitable subjectivity be managed through the sampling procedure and by combining information from various sources in a qualitatively meaningful manner. Inevitably, this involves human judgement, implying that the quality and expertise of judges are crucial for the quality of assessment [21, 22]. This has important implications for teacher training. A single briefing, workshop, or training session does not suffice for assessors to reach the required level of expertise; on-the-job training, constant feedback, and supervision are needed. This is in line with the findings from this evaluation, and we consequently redesigned the programme to include biweekly PCW meetings for training purposes and for exchanging experiences.
Can assessment drive desirable learning?
In their theoretical model Van der Vleuten et al. defined learning and assessment activities as two separate entities whose boundaries are blurred. Assessment activities are part of the learning programme, but can they drive desirable learning? During the clinical clerkships students encountered many and varied learning activities (physical examination, history taking, ward rounds), each offering potential assessment opportunities. According to Prideaux, assessment and learning should be aligned to achieve the same goals and outcomes. This is congruent with the principle that all assessment activities, and as a consequence all learning activities, should be maximally meaningful to learning. This is consistent with the conceptual shift from assessment of learning to assessment for learning, and further still to assessment as learning. Previous studies have shown that trainees indicated a need for structure and guidance in the transition from novice to competent. A programme of assessment containing instruments structured to facilitate this process could support learning and monitor progression at higher levels of professional development [7, 8]. The FVMU assessment programme, however, appears to have failed to create an environment that gives free rein to assessment for learning. Feedback appears to have been the main stumbling block. Perceiving all WBAs as summative and as a burden to supervisors, students were reluctant to ask for assessment with feedback, while supervisors claimed that time constraints impeded high-quality feedback. This is in line with research reporting difficulties encountered while implementing tools to provide formative feedback [26, 27]. Besides the poor quality of narrative feedback and the lack of direct observation, the administrative burden was mentioned as an explanation for trainees perceiving narrative formative feedback as not very useful [26, 27].
For the coming years the main challenges will lie in creating a clinical environment that is intrinsically supportive of feedback, for instance by simplifying documentation (e.g. user-friendly assessment instruments on mobile devices), by training students and supervisors in giving and receiving feedback, and by integrating WBA within the clinical organisation, as described in earlier research.
How can reflective and self-directed learning activities be promoted?
From the literature we know that it can be quite a challenge to have students reflect upon feedback, let alone use it to plan new learning tasks [29, 30]. To address this problem Van der Vleuten and Schuwirth proposed combining scaffolding of self-directed learning with social interaction, which led to the peer group meetings in the programme. Both students and supervisors acknowledged the value of peer feedback in teams of senior and junior students. Previous research also showed potential benefits of peer-assisted learning for both junior and senior students [31, 32]. Ten Cate and Durning recognised the potential of peer-assisted learning during undergraduate clinical training, or “cognitive journeymanship”, and of incorporating valuable information from peer feedback (high-stakes assessment). The use of peer feedback is also in line with the notion that variety in instruments and sources is a prerequisite for a complete picture of learner performance [10, 33]. Recent research into students’ feedback-seeking behaviour during clinical clerkships showed that students sought information from different sources depending on a context-dependent assessment of the potential risks and benefits of feedback. Apparently, when seeking feedback to achieve certain goals, students strive to balance expected negative effects against potential benefits. We therefore propose encouraging teamwork during clinical rotations to foster students’ use of feedback skills. Furthermore, students seemed to prefer social interaction and external direction by a personal mentor. This mentor could play an important role in guiding students to reflect on their past performance and in planning new learning goals. This is in line with literature stating that scaffolding of self-directed learning requires mentoring.