Surveys of students’ perceptions of teaching: a cautionary tale
In semester 1 this year Internet Studies staff ran the very successful unit Internet Communities and Social Networks 204/504, through both Curtin and OUA. The centrepiece of this unit was a 3-week online conference in which students participated by writing conference papers, posting them to our website and then discussing both their own and others’ papers. The conference is now over, but you can observe the results at the Debating Communities and Networks site. The unit was, clearly, not your normal ‘teaching and learning experience’ – all assessment tasks, activities, resources and discussions were aligned with making the conference work – and ‘learning’ was a secondary (but very successful) outcome.
I am now, in concert with the unit controller Dr Michael Kent, doing some research into the experiences of this unit and what it might tell us about online learning, student motivation, and authentic assessment. I will be sharing some of these thoughts with you elsewhere, including giving a paper called “Going Public with Learning” at a conference in September organised at Murdoch University by Ingrid Richardson. (abstract)
However, something interesting is emerging from the research as it relates to the use and interpretation of the student surveys we use at Curtin (known as Evaluate). Because the unit ran in almost identical fashion for three different cohorts of students, at the same time and with the same teaching staff, curriculum and so on, we can compare and contrast the Evaluate results according to the differences that can be discerned among the students who responded. The only significant difference is that one cohort was likely to have also attended a physical classroom for 2 hours a week as well as doing all of the online activity.
This situation is important. As we know, evaluation of teaching at university has become standard in Australia. Some of the reasons for this are good: it is important for academics to treat their teaching as research and to inquire, empirically, into how it is working, both to improve individual units of study and also to become better all-round teachers. But some of the reasons are bad: surveys are often used in crude ways to manage teaching performance (rewards and criticisms both), or they are reported in generalised ways to show, for marketing purposes, how great an area, course or university is. And, while there may be some contestation over my characterisation of the reasons as good or bad (after all, perhaps it is good to manage performance using surveys), there can be no doubt that the validity of any research or management based on student surveys rests on the quality and sophistication of the instrument: does the survey measure what it purports to measure?
Evaluate, Curtin’s instrument, has its strengths and weaknesses which you can judge for yourself: here are the items in the survey (to which students respond using a classic Strongly Agree/Agree/Disagree/Strongly Disagree/No opinion scale):
- The learning outcomes in this unit are clearly identified
- The learning experiences in this unit help me to achieve the learning outcomes
- The learning resources in this unit help me to achieve the learning outcomes
- The assessment tasks in this unit evaluate my achievement of the learning outcomes
- Feedback on my work in this unit helps me to achieve the learning outcomes
- The workload in this unit is appropriate to the achievement of the learning outcomes
- The quality of teaching in this unit helps me to achieve the learning outcomes
- I am motivated to achieve the learning outcomes in this unit
- I make best use of the learning experiences in this unit
- I think about how I can learn more effectively in this unit
- Overall, I am satisfied with this unit
The aim, broadly speaking, is that the survey assesses the curriculum and content of the unit and the design of the learning experience, rather than specific teachers. In other words, Evaluate attempts to assess curriculum, abstracted from the specifics of the teaching and learning activities. It also attempts to provide insight into the students’ mindset through items 8-10, though in practice these items are treated at Curtin as if they were further comments by students on the quality of the unit or its teachers. Thus, in general terms, Evaluate attempts to use student perceptions as a direct measure of the actual quality of the teaching and learning experience, with students positioned as informed and reliable judges of that quality.
In most cases at Curtin there is just one cohort of students for each unit completing the Evaluate survey, and there is no demographic information to enable internal comparisons. But for NET204, in semester 1 2010, we had a very unusual situation in which the same unit was taught under three different unit codes, to three different groups, thus enabling three different and differentiated data sets to be generated. One offering was for OUA students (all external); one was for Curtin-based undergraduates (mostly internal); one was for Curtin-based graduate students (i.e. new-to-area coursework students, not higher degree students), who were mostly external. (The samples and populations were: OUA n=21, from 68 possible respondents; Curtin undergraduate n=16, from 35 possible respondents; Curtin graduate n=9, from 16 possible respondents.)
So what happens when we compare the results achieved in the Evaluate survey for these three different cohorts, remembering that, with the exception of the classroom contact for internals and some separation of students for the first third of the study period, all were treated to an effectively equivalent experience? What can we learn about Evaluate itself when we compare results for a similar activity as assessed by three different sorts of students – where the main difference in the ‘learning’ comes from the students themselves?
First of all, the immediately obvious finding is that Curtin undergrads were less likely to be satisfied with the unit overall (item 11). 95% of OUA students and 100% of Curtin graduates ‘agreed’ (either SA or A) that they were satisfied; only 75% of Curtin undergrads agreed. And, on average, these undergrads scored the unit 10% lower across all 11 items. In other words, even with caveats about sample size, response rate and so on (caveats that rarely matter for internal management in any case), we get a face-value difference that is somewhat troubling.
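For illustration only, and bearing those caveats in mind, here is a minimal sketch of the kind of quick check one could run on the overall-satisfaction item, reconstructing approximate counts from the reported percentages (roughly 12 of 16 Curtin undergraduates agreeing versus 20 of 21 OUA students). The reconstructed counts, and the choice of Fisher’s exact test, are my assumptions for the sketch, not part of the Evaluate reporting.

```python
# Illustrative sketch only: approximate counts reconstructed from the
# reported percentages (75% of n=16 Curtin undergraduates agreed ~ 12/16;
# 95% of n=21 OUA students agreed ~ 20/21). These counts are inferred
# from rounding, not taken from the raw Evaluate data.
from scipy.stats import fisher_exact

# Rows are cohorts; columns are [agreed, did not agree] on
# "Overall, I am satisfied with this unit".
table = [
    [12, 4],   # Curtin undergraduate (approx.)
    [20, 1],   # OUA (approx.)
]

odds_ratio, p_value = fisher_exact(table)
print(f"Fisher's exact test: odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# With samples this small, a p-value above 0.05 would simply confirm the
# caveat above: the face-value gap may not be statistically significant.
```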
The only reasonable conclusion I can draw from this is that the STUDENTS, not the curriculum or teaching, explain the difference. Curtin undergrads had a class *as well as* all the online work and thus can be assumed to have had a richer / better teaching experience of the same content. Yet they were less satisfied. I conclude that the most likely reason for this is that, on the whole, Curtin undergrads have a more teacher-centric approach to their studies and thus an authentic, challenging learning experience is not as satisfying for them because it does not fit their expectations.
How do I arrive at this conclusion? Well, digging deeper into the data, Curtin undergraduates were notably more likely to agree that they had made best use of the learning experiences (+7% from average) and were more likely to agree that they thought about how best to study (slightly more than OUA students; a lot more than graduate students). Graduate students and OUA students had lower scores on these self-rating items. I draw the inference that Curtin undergraduates *believe* they are studying well and perceive the difficulties to be the teacher’s fault (they are not taking responsibility for their learning as much as the others), whereas OUA and, especially, graduate students are actually studying well but take more responsibility for problems, thinking they are their own fault. They are therefore more likely to be satisfied with a unit which challenges them to be responsible for what they are learning (even if they don’t make as much of it as they could).
Let’s also look at the item on feedback: we know feedback is the most troublesome area in all student evaluations and usually the source of the worst scores on Evaluate. Remember that, in this case, all students – across the 3 groups – received exactly the same extensive feedback (including having their main assignment marked and commented on, with suggestions for improvement, and then being able to resubmit it with improvements for a better grade). Even the classroom contact would not have materially changed this situation (and might even have allowed for more feedback). Despite this equivalence, Curtin undergraduates rated feedback 19% lower than the other two groups! My interpretation is that students’ responses to the feedback item are not a reflection of the feedback given but, rather, of students’ interpretation of what feedback should be. In other words, because Curtin undergraduates got extensive and helpful feedback which required them to do more (so as to learn and improve), they actually believed that was ‘poor’ feedback – because it didn’t fit either their inflated expectations of their first attempt, or their sense that the teacher ought to have told them how to do a good job before the assessment, so that poor performance, and the critical feedback it attracts, would not be their fault in the first place.
Finally, let’s look at the key question of motivation (the unit was specifically designed to maximise motivation by giving students responsibility for their learning). Curtin undergraduates’ agreement with the motivation item was some 12% lower than that of the other cohorts – in other words, despite identical approaches to motivating students, the Curtin undergraduates felt themselves to be less motivated. What this suggests (again, not surprisingly) is that motivation is correlated with the internal dynamics of the student, and not necessarily amenable to control by what teachers do. Of course, teachers must be focused on motivating students (indeed, that is the point of authentic assessment in many cases): but surveys must be used cautiously when assessing the degree to which teachers have achieved that goal, since it is, in truth, only possible for students to be motivated when a partnership (rather than a relation of domination and control) is at least approximated.
In conclusion, this unusual situation – 3 different cohorts, all responding in significant numbers to the same survey, on the same unit, with all variables pretty much the same except for cohort membership – shows the challenge of Evaluate and similar surveys. They do a good job of assessing student perceptions of teaching and learning. With some fine analysis they can also suggest ways of managing those perceptions for the better. But what they cannot do is substitute student perceptions for measures or evaluations of actual quality.
Disclaimer: This analysis is not a rigorous statistical reading of the data. That task is, in fact, impossible because of the way the data is collected and presented and, moreover, would require different items to be asked in the first place. The variations that emerge may not be statistically significant but, on the face of it, they make me suspect that there is a major gap between what the survey purports to measure and what it actually measures. Furthermore, since the survey results are used for management purposes with little regard to good statistical practice, I am playing by the same rules as those who require the surveys of us.
Interesting results. Having four years of uni behind me before taking this unit probably helped me and the way I approached the conference. You’re expected to take more responsibility for your learning in uni than in most high schools, which is beneficial in the long run but can be hard to get used to.