Marshall Memo 614
A Weekly Round-up of Important Ideas and Research in K-12 Education
November 30, 2015
1. Is poverty the reason U.S. students don’t compare well internationally?
2. A high bar for using value-added measures to evaluate teachers
3. Tips for working with adult learners
4. David Brooks on a school that develops community and character
5. The evolution of the Pledge of Allegiance
6. Using data in world language classes
7. Graphic novels in high-school classrooms
8. Short item: A model backwards-designed Algebra I course
“All over the country there are schools and organizations trying to come up with new ways to cultivate character. The ones I’ve seen that do it best, so far, are those that cultivate intense, thick community. Most of the time character is not an individual accomplishment. It emerges through joined hearts and souls, and in groups.”
David Brooks (see item #4)
“It is the foremost task of education to insure the survival of these qualities: an enterprising curiosity, an undefeatable spirit, tenacity in pursuit, readiness for sensible denial, and above all, compassion.”
Kurt Hahn of Outward Bound (quoted in item #4)
“From Day One of language learning, we should be teaching our students how to expand on what they communicate, pushing them to do so, and rewarding them for their efforts at elaborated responses.”
Nicole Sherf and Tiesa Graf (see item #6)
“Although there may be differences in views about the desirability of using VAM for evaluation purposes, there is wide agreement that unreliable or poor-quality data, incorrect attributions, lack of reliability or validity evidence associated with value-added scores, and unsupported claims lead to misuses that harm students and educators.”
An AERA statement on value-added models for teacher evaluation (see item #2)
“America’s Mediocre Test Scores: Education Crisis or Poverty Crisis?” by Michael Petrilli and Brandon Wright in Education Next, Winter 2016 (Vol. 16, #1, p. 46-52), http://educationnext.org/americas-mediocre-test-scores-education-poverty-crisis/
This AERA (American Educational Research Association) statement in Educational Researcher reviews the literature on value-added models (VAM) for evaluating teachers, school leaders, and educator preparation programs. The authors distinguish value-added models, which measure how each teacher’s performance affects students’ standardized test scores during a school year, from status models, which measure the proportion of students who exceed a performance threshold at the end of a school year, regardless of their academic standing at the beginning of that year. “Under a status model,” say the authors, “a teacher with a higher-scoring entering class typically will be advantaged in comparison to a teacher with a lower-scoring entering class. In contrast, VAM focus on test-based changes so that teachers or leaders with higher scoring entering student cohorts are not necessarily advantaged.”
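The distinction can be made concrete with a small numerical sketch (the scores and cutoff below are hypothetical, for illustration only; real value-added models use far more elaborate statistical machinery with controls and uncertainty estimates). A status model counts how many students finish the year above a proficiency cutoff; a simple gain-score version of value-added looks instead at average fall-to-spring growth:

```python
# Hypothetical (fall, spring) test scores for two classes.
PROFICIENCY_CUTOFF = 70

class_a = [(85, 88), (90, 91), (78, 80)]   # entered high, grew a little
class_b = [(50, 62), (55, 66), (48, 61)]   # entered low, grew a lot

def status_score(scores):
    """Status model: share of students at or above the cutoff in spring."""
    return sum(spring >= PROFICIENCY_CUTOFF for _, spring in scores) / len(scores)

def value_added(scores):
    """Naive gain-score model: average fall-to-spring growth."""
    return sum(spring - fall for fall, spring in scores) / len(scores)

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(name, status_score(scores), value_added(scores))
```

Under the status model, Class A's teacher looks far stronger (every student is proficient in spring, versus none in Class B), even though Class B's students grew several times as much; the gain-score view reverses the ranking. This is the sense in which, as the authors put it, teachers with higher-scoring entering cohorts "are not necessarily advantaged" under VAM.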
But just because value-added models are superior to status models, the authors continue, “does not mean that they are ready for use in educator or program evaluation. There are potentially serious negative consequences in the context of evaluation that can result from the use of VAM based on incomplete or flawed data, as well as from the misinterpretation or misuse of the VAM results… Only if such indicators are based on high-quality, audited test data and supported by sound validation evidence for the specific purposes proposed, can they be appropriately used, along with other relevant indicators, for professional development purposes or for educator evaluation.” The authors list five prerequisites for responsible use of value-added data:
• Validity and reliability – VAM scores must be derived only from students’ scores on assessments that meet professional standards for the purpose at hand – that is, the tests used measure growth in the actual subject matter being taught and the full range of student achievement in teachers’ classrooms.
• Additional evidence – VAM scores must be accompanied by separate lines of evidence of reliability and validity that support each claim and interpretation. Does the evidence take into account the potential impact of contextual factors and selection bias?
• Evidence from several years – VAM data “should not be used unless they are derived from data obtained from sufficient numbers of students over multiple years,” say the authors. “VAM scores should always be accompanied by estimates of uncertainty to guard against overinterpretation of differences.”
• Comparability of tests over time – The transition of most states to Common Core and other revised curriculum standards in recent years can “pose a threat to the validity of the interpretation of VAM scores,” say the authors, “especially when these scores are compared before, across, and after the transition… In these instances, assessments across years may no longer be equated and the statistical links between scores are not sufficiently strong to support the validity arguments and interpretations required for VAM.”
• Student learning objectives in non-tested grades – For grades and subjects without standardized test data (e.g., K-2, art, music, physical education, health, and most high-school subjects), locally developed measures should not be used for educator accountability “unless they are accompanied by evidence of reliability and validity,” say the authors. “Because the validity of VAM scores is so dependent on the quality of the underlying assessment, they should not be implemented in grades or subjects where there is a lack of evidence of reliability and validity.”
If these technical requirements are met, the authors have three provisos for the implementation of value-added models in schools:
• Multiple measures – “VAM scores must never be used alone or in isolation in educator or program evaluation systems,” they say. “If VAM scores are used, they should be only one component in a more comprehensive educator or program evaluation. Also, their meaning should be interpreted in the context of an individual teacher’s curriculum and teaching assignments, with cautions issued regarding common interpretation problems, such as ceiling and floor effects of the tests for estimating growth for high- and low-achieving students.”
• Ongoing monitoring of technical quality – VAM analysis is relatively new, say the authors, and school districts need to keep a close eye on the quality of data and be alert to unintended consequences. “The monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use within a given evaluation system,” they say. “When there is credible evidence that there are negative consequences, every effort should be made to mitigate them.”
• Transparency – The authors believe the following elements of every VAM program should be made public:
- A description of the data and the data-quality checks used;
- The methodology, statistical models, and computational methods employed;
- A rationale and explanation of how each indicator has been incorporated into the evaluation system;
- Validity evidence to support the use of the system.
When concerns or problems are reported, they say, a review should be triggered so glitches can be fixed.
The authors have one additional note of caution: “[T]he validity of inferences from VAM scores depends on the ability to isolate the contributions of teachers and leaders to student learning from the contributions of other factors not under their control. This is very difficult, not only because of data limitations but also because of the highly nonrandom sorting of students and teachers into schools and classes within schools. Consequently, such disentangling can be accomplished only imperfectly and with an unknown degree of success. The resulting bias will not be distributed evenly among schools, given wide variation in critical factors like student mobility, and could in itself make some students, schools, and teachers appear to be underperforming.”
In sum, say the authors, “the AERA recommends that VAM (which include student gain score models, transition models, student growth percentile models, and value measures models) not be used without sufficient evidence that this technical bar has been met in ways that support all claims, interpretative arguments, and uses (e.g., rankings, classification decisions). Although there may be differences in views about the desirability of using VAM for evaluation purposes, there is wide agreement that unreliable or poor-quality data, incorrect attributions, lack of reliability or validity evidence associated with value-added scores, and unsupported claims lead to misuses that harm students and educators… Ultimately, only rigorously supported inferences about the quality and effectiveness of teachers, educational leaders, and preparation programs can contribute to improved student learning.” The authors recommend further research on value-added models, and also exploration of “promising alternatives,” including the use of teacher observation data and peer assistance and review models “that provide formative and summative assessments of teaching and honor teachers’ due process rights.”
In this article in Literacy Today, Florida administrator Sloane Castleman says that one of the most positive developments in the last decade is the shift from one-shot PD workshops to instructional coaching. Effective coaches, she believes, have the potential to give teachers “growth opportunities embedded in the workplace, relevant to the specific needs of each learning and teaching community, and sustained over time.”
The problem, she’s noticed, is that not all teachers are open to working with an instructional coach. When working with colleagues, coaches need to constantly ask themselves, “What am I doing that’s conflicting with the developmental needs of adult learners?” Castleman identifies the following:
“All over the country there are schools and organizations trying to come up with new ways to cultivate character,” says David Brooks in this New York Times column. “The ones I’ve seen that do it best, so far, are those that cultivate intense, thick community. Most of the time character is not an individual accomplishment. It emerges through joined hearts and souls, and in groups.” He describes a recent visit to the Leaders School in Bensonhurst, Brooklyn, which enrolls about 300 students who speak 22 languages, 85 percent of them living in poverty, and is organized on Outward Bound principles. This high school, says Brooks, “is a glowing example of community cohesion.” Here’s what struck him:
“Communities of Character” by David Brooks in The New York Times, November 27, 2015, http://nyti.ms/1IwESWd
In this article in The Language Educator, Nicole Sherf (Salem State University) and Tiesa Graf (South Hadley High School, Massachusetts) examine the kinds of assessments used by world language teachers. “The easiest data to collect and analyze are those that are objective and clear-cut,” they say – for example, filling in a missing word in a sentence, choosing the word that correctly conjugates a verb, or listing the right possessive adjective. A typical item:
Elena _______ alta.
a. está
b. es
c. tiene
d. hace
The correct answer is (b), and the item accurately assesses a fragment of Spanish grammar and is quick and easy to score. But, say Sherf and Graf, “the fill-in offers no way for the student to express a message that is meaningful or communicative, or to elaborate on Elena’s other physical characteristics and personality, if, in fact, Elena even exists to the teacher and students.
“[W]e have historically placed far too much emphasis on precision,” they continue. “We have valued correctness over communication, which has led to a focus on form rather than on communication in teaching… If the profession continues to rely on assessment through completion of disconnected, abstract and decontextualized sentences to practice or assess discrete grammar or vocabulary, students will not understand that the ultimate purpose of language learning is communication.”
Language educators can change the traditional dynamic, say Sherf and Graf, “by encouraging our students to have less fear in creating with the language and telling them that errors are a natural part of language learning. If they are not making mistakes, they are not trying hard enough. Taking risks is an important part of language learning… The data we collect and analyze to determine evidence of student growth must be connected to what our students can do with the language.”
For ideas on escaping the quick-and-easy assessment trap, Sherf and Graf hark back to the 2010 ACTFL goal of 90%+ classroom interaction in the target language. There are three key steps in making this happen:
The four criteria used to describe proficiency in this type of exercise are (a) the functions or tasks that are being completed, (b) the various contexts or curriculum content, (c) the text type or level of production, and (d) the level of precision or accuracy.
Sherf and Graf take the third, text type and level of production, and give examples of two levels of proficiency:
- Novice – the learner relies on memorizing words and phrases;
- Intermediate – able to create with the language at the sentence level.
What’s essential is getting students to respond at the sentence level. “From Day One of language learning,” say the authors, “we should be teaching our students how to expand on what they communicate, pushing them to do so, and rewarding them for their efforts at elaborated responses. If they are not encouraged and supported from the very beginning of language learning to include more information and provide strong, solid responses, they will have a hard time moving up the proficiency scale to the Intermediate level.” Students can be encouraged to think about who, what, when, where, and how to add details and use linking words like and, or, with, because, for, then, and next to extend their thinking. In the early stages, quantity is paramount; as students develop proficiency, they can begin to think about how to vary sentence types. Sentence starters like these are also helpful:
- My best friend is ….
- My best friend has….
- My best friend needs….
- I like my best friend because….
- I am with my best friend when….
To assess, teachers can record the number of words students write and the length of time they can sustain a conversation, and track growth over the course of a unit.
Another way to develop fluency is to have students write a weekly journal entry for a given number of minutes, answering an open-ended question on the context of the unit (for example, in a unit on the family, writing about a favorite family member, a celebrity family, or a made-up family based on TV characters). Students should keep their pencils or pens moving, not worry about correctness, avoid dictionaries, and focus on the message. Students can keep track of their word count, focusing on quantity of writing at first, and gradually transition to assessing and improving the quality of their entries – for example, the number of connected thoughts, extensions, and elaborations.
To measure students’ oral proficiency at the beginning and end of a unit, Sherf and Graf suggest having students take out their cell phones, dial a number attached to the teacher’s Gmail account, and use Google Voice to speak for one minute in response to a prompt (for example, in a vocabulary unit on houses, they might be asked to describe their ideal house, or describe what is special about a specific room in their house). Students’ messages are recorded in easy-to-access files in the teacher’s Gmail account. “Amazingly, the recordings are clear and easy to understand even though all students are speaking at the same time,” say the authors. “It is best to give the task to the students and ask them to call immediately during class, offering no time to think through their answers. This trains students to speak spontaneously and to respond to the assignment quickly, an important skill in interpersonal communication.” (Sherf and Graf add that it’s important to remind students to say their names at the beginning of their message.) If cell phones can’t be used, students might use Google Voice, Audacity, or some other voice recording application in the school’s language lab.
A model backwards-designed Algebra I course – Check out this course designed by the Alexandria City Public Schools in Virginia using the Understanding by Design framework: http://www.acps.k12.va.us/curriculum/design/sample-algebra-course.pdf
© Copyright 2015 Marshall Memo LLC
About the Marshall Memo
Mission and focus:
This weekly memo is designed to keep principals, teachers, superintendents, and others very well-informed on current research and effective practices in K-12 education. Kim Marshall, drawing on 44 years’ experience as a teacher, principal, central office administrator, and writer, lightens the load of busy educators by serving as their “designated reader.”
To produce the Marshall Memo, Kim subscribes to 64 carefully chosen publications (see list to the right), sifts through more than a hundred articles each week, and selects 5-10 that have the greatest potential to improve teaching, leadership, and learning. He then writes a brief summary of each article, pulls out several striking quotes, provides e-links to full articles when available, and e-mails the Memo to subscribers every Monday evening (with occasional breaks; there are 50 issues a year).
Individual subscriptions are $50 for a year. Rates decline steeply for multiple readers within the same organization. See the website for these rates and how to pay by check, credit card, or purchase order.
Website:
If you go to http://www.marshallmemo.com you will find detailed information on:
• How to subscribe or renew
• A detailed rationale for the Marshall Memo
• Publications (with a count of articles from each)
• Article selection criteria
• Topics (with a count of articles from each)
• Headlines for all issues
• Reader opinions (with results of an annual survey)
• About Kim Marshall (including links to articles)
• A free sample issue
Subscribers have access to the Members’ Area of the website, which has:
• The current issue (in Word or PDF)
• All back issues (also in Word and PDF)
• A database of all articles to date, searchable by topic, title, author, source, level, etc.
• A collection of “classic” articles from all 11 years
Core list of publications covered
Those read this week are underlined.
American Educational Research Journal
American Educator
American Journal of Education
AMLE Magazine
ASCA School Counselor
ASCD SmartBrief/Public Education NewsBlast
Better: Evidence-Based Education
Center for Performance Assessment Newsletter
District Administration
Ed. Magazine
Educational Evaluation and Policy Analysis
Educational Horizons
Educational Leadership
Elementary School Journal
Essential Teacher
Go Teach
Harvard Business Review
Harvard Educational Review
Independent School
Journal of Education for Students Placed At Risk (JESPAR)
Journal of Staff Development
Kappa Delta Pi Record
Knowledge Quest
Literacy Today
Middle School Journal
Peabody Journal of Education
Perspectives
Phi Delta Kappan
Principal
Principal Leadership
Principal’s Research Review
Reading Research Quarterly
Responsive Classroom Newsletter
Rethinking Schools
Review of Educational Research
School Administrator
School Library Journal
Teacher
Teaching Children Mathematics
Teaching Exceptional Children/Exceptional Children
The Atlantic
The Chronicle of Higher Education
The District Management Journal
The Journal of the Learning Sciences
The Language Educator
The Learning Principal/Learning System/Tools for Schools
The Reading Teacher
Theory Into Practice
Time Magazine
Wharton Leadership Digest