Assessments and Testing – Are We Doing It Correctly?

What Can We Learn From the Research and Clinical Sciences Models?

Several years ago, my colleague, Dr. Paul Eslinger, and I were asked to write a short piece about “high-stakes” testing in K-12 systems for The Education Policy and Leadership Center. At the time, Paul was a neuropsychologist who worked with patients suffering from various forms of cognitive impairment caused by a number of medical conditions, ranging from accident-related head trauma to stroke, brain tumors, Alzheimer’s disease, and dementia. At the same time, I was Chief of Developmental Pediatrics and Learning at the Pennsylvania State University College of Medicine. It was around this time, in 2000-2004, that I became interested in translating basic and medical neurocognitive research on human learning, with an emphasis on how such information might be applied in classrooms.

The importance of “high stakes” testing came to the national forefront as a result of the push for annual improvement on such tests inherent in the then newly approved No Child Left Behind program. The following is an abbreviated portion of our discussion that will serve nicely as background for more specific discussions of assessment in science education in upcoming blogs.

Scientists Use Tests All the Time

Science performs many tests as it tries to understand the physical and biological principles that make up the world around us. Tests and the analysis of test results, it might correctly be argued, are the indispensable tools of modern science. However, scientists know that each test is subject to artifacts involved in both the collection and interpretation of data. To help alleviate this unavoidable property of scientific investigation, researchers typically employ a variety of tests whenever possible. Multiple avenues of examination can expose anomalous results of a particular test or, even better, confirm results through several different strategies of analysis. How many tests are enough to be confident of a given physical or biological phenomenon? Simply put, one can never be too confident. The more independent lines of evidence, the better. Further, the greater the importance of a particular scientific investigation – the greater the significance of the findings – the more independent tests and analyses are appropriate.

Nowhere is this point more important than in fields of investigation in which tests directly impact humans, such as the medical and clinical sciences. Few diagnoses rely on a single test of any kind. The physical examination is coupled with multiple laboratory test results, radiological imaging studies, second opinions by independent experts, and so on, in order to achieve convergence of results and consensus of findings. In this very important, “high-stakes” endeavor of human health and welfare, no single test, or series of tests for that matter, is given simply to determine whether the patient or doctor passes or fails! Instead, standardized tests provide one means of ascertaining the physical state of an individual at a given moment in time, but they must be combined with other assessments in order to serve as a sensitive and specific guide for diagnosing a disorder. Then, some of the same tests are used again to evaluate progress, gauge treatment effectiveness, and demonstrate the ultimate cure.

Clearly, this discussion of the prudent use of converging and discriminating tests and exams in scientific investigation and medical practice has some relevance when we discuss high stakes testing in the basic education environment. If science and medicine had chosen, instead, to search for and use a single test to assess all aspects of illness and individuals, with no attempt to independently confirm its validity and predictive value, we would likely have not progressed much over the years.

Declarative and Procedural Memory

The brain has several inter-related memory systems. Two of the most important for education are declarative and procedural memory. Declarative memory refers to the conscious recollection of facts, knowledge, experiences, and events. This system mediates the general, specific and personal aspects of declarative knowledge that are acquired through intentional learning.

In contrast, procedural memory refers to sensory-motor and skill-based learning (e.g., knowledge of how to ride a bike, play the piano, use a microscope, etc.), acquired through direct experience. Procedural memory does not require explicit recollection of initial learning experiences but rather provides the accumulated benefits of hands-on learning and knowledge. Clearly, this critical memory system is not assessed well, if at all, on standardized tests.

Learning and Consolidation Processes

Learning and consolidation processes are closely linked to children’s abilities to both register and retain new information and experiences. While key aspects of learning are better understood (e.g., being prepared, having an overview, paying attention, using and manipulating new information in interesting ways, etc.), consolidation is less clear to many educators. Consolidation is the process responsible for transferring new information from short-term to long-term memory for later retrieval (see figure, below).

As new information must be “processed” for consolidation to occur, numerous teaching approaches are directed at increasing the amount of attention and processing applied. That is, appealing to students’ natural interests, previous knowledge, and “higher-order” thinking when consolidating new information will increase the amount of processing the new material receives. Simple memorization, on the other hand, requires far less processing and little critical thinking. The use of externalizations, such as “hands-on” and “inquiry-based” activities, may be particularly beneficial for increasing information processing.

Consolidation is a complex process in which the brain converts new material from fleeting traces of information to long-term memories. These are then stored in more stable form with existing knowledge in brain areas called association cortices. New information begins this process by entering the brain as new input from various stimuli. Brain structures involved at this level include the major senses of touch (somatosensory cortex), sound (auditory cortex) and sight (visual cortex).

This multi-modal information is then ready for frontal lobe analysis where executive function helps to filter, compare and interweave it with existing information, already present in long-term storage. Thus, sensory-perceptual traces must be “held” onto mentally until the consolidation processes are completed. The overall process of holding onto new information while comparing it to existing information is sometimes referred to as working memory.

Once consolidated, long-term memories are organized into knowledge structures throughout the various association cortices for later retrieval. Thus, students store information in a variety of places in the brain. Further, students retrieve stored information in different ways, largely dependent upon the manner in which the information was initially stored and the way they are asked to retrieve it. Lists of facts, names, dates, geographical locations, and so on, without developed meaning, may be difficult to retrieve in isolation – particularly as time passes. Similar information and facts that are more critically developed by the teacher and “deeply processed” by the student will be easier to recall from memory and, more importantly, will be available for more in-depth future thinking. A student’s ability to “transfer” learned material to new situations and circumstances is a good indication of how well they actually know and understand the material. It is difficult for any single standardized test to assess a student’s ability to recall and use information located in all of the many brain structures where long-term memories are stored.


The Interrelationships Among Learning, Intelligence, and Executive Function

Critical thinking and problem solving, perhaps surprisingly, are not really measured through standardized tests of intelligence, specific content knowledge, or even operational skills and judgment when they are devoid of appropriate, real-life contexts. This has been demonstrated through numerous kinds of studies in child development and neuropsychology. For example, Eslinger and Damasio (1985) described a patient who retained exceptional intellectual, reasoning, operational skill, and even judgment capacities after a brain tumor, as long as he was assessed with paper-and-pencil measures (e.g., Who was president during the Civil War? Why do we have child labor laws? What is the basis for the federal taxation system?). All of these measures were ‘out of context’ and not related to any real-life tasks or situations. His scores were in the superior range. However, in real-life settings, this person could not organize his work, formulate and follow a step-by-step plan to complete an assignment, look ahead and anticipate possibilities, or accurately monitor his own progress. In short, his executive function was impaired.

In child development and throughout adulthood, intelligence and executive function do not go hand in hand. There are virtually no correlations between measures of intelligence (encompassing specific content knowledge, some problem-solving skills, and general judgment) and measures of executive function, encompassing capacities for planning, organization, working memory, anticipation, and self-monitoring (Archibald and Kerns, 1999; Ardila, 1999; Crinella and Yu, 2000; Eslinger and Damasio, 1985; Welsh et al., 1991). Furthermore, while intelligence tests and standardized paper-and-pencil tests may adequately assess “what” a child knows, executive function is thought to underlie the “how” of learning and knowledge, such as how Gettysburg was related to the outcome of the Civil War and how energy is related to motion.

Fortunately, executive function is a teachable skill that students can acquire and is related to every content area they study (Eslinger, 1997). Executive function is demonstrated through the process of critical thinking, how children go about problem solving (e.g., identifying and utilizing resources, formulating a plan, seeking feedback, improving upon their first attempt, etc.), and the product of those collective cognitive and behavioral processes.



The utility of employing a single, annual, formalized test to ascertain student achievement will inescapably be linked to the ability of that test to assess all that we view as important in the education of a child. This is no small challenge. As discussed above, an important role of our frontal lobes is the phenomenon known as “executive function”. Executive function is intimately involved in our ability to think critically, solve problems, plan for the future, and follow and modify our plans as new situations arise, while keeping our initial goals in mind. These skills, one could argue, are at the very heart of what we would hope all educated citizens could do (Verner, 2002). While we may well demand that the development and cultivation of executive function be a priority accomplishment of public education, we cannot easily assess, through existing high-stakes testing, whether or not that has actually been achieved. Furthermore, as discussed above, evidence actually suggests that there is little, if any, correlation between executive function and IQ, another cognitive parameter assessed through formalized, paper-and-pencil tests.

Recommendations for Student Assessments Based on Cognitive Science Considerations

  • Use multiple assessment approaches in order to categorize an individual student’s many strengths and weaknesses. Do not use a single test.
  • In addition to sit-down, “paper and pencil” exams, use testing approaches that can ascertain the student’s procedural knowledge (“how”) abilities.
  • Include significant analysis of executive function capabilities in student assessments.
  • Assure that recall of information from memory is connected as closely as possible to the mechanism by which it was committed to memory. Try to develop more valid links between instruction and assessment that capitalizes on the benefits of contextual cues and setting.
  • Ideally, try to use assessment results as an integral component of a student’s instruction, rather than exclusively as a measure of success or failure. Use assessment results to direct future instruction and/or remediation of individual students.
  • When using tests in the analysis of performance of teachers, schools, or school districts, take into account the natural range of student abilities and accentuate longitudinal, multiple-parameter analysis as opposed to single, high-stakes exams. Just as it is difficult to assess an individual student’s overall success by a single test at a single point in time, it is also difficult to assess the success of an entire educational system by the same measure.
  • Given the political and financial importance associated with high stakes testing to school districts, policy makers should be aware of potential negative impacts of such tests on education. The justifiable desire to assure and monitor quality education for all children in the basic education system may inadvertently result in undesirable instructional strategies such as:
  1. “Teaching to the test” (which may ultimately lead to less content and more test-taking instruction),
  2. Minimizing the importance of procedural skills because they will not be tested on formalized tests (such as correctly operating scientific instruments, reciting poetry, participating in academic debates, drawing a schematic of a model, and so on),
  3. Minimizing the development of executive function abilities as these are not readily assessed by standardized tests,
  4. Diminishing the joy and respect for learning and the student’s desire to continue school through graduation and beyond.

Finally, the desire to assure and monitor quality education for all children is commendable. However, it is critical to employ frequent analysis using multiple forms of testing, with the intention of using results to improve the education of individual students and the instructional strategies of local educational systems. In upcoming blogs we will begin to discuss, much more specifically, how assessments can be used as an integral component of science education and a driving force behind the development of critical thinking skills.


Archibald, S.J., Kerns, K.A. (1999). Identification and description of new tests of executive functioning in children. Child Neuropsychology 5: 115-129.

Ardila, A. (1999). A neuropsychological approach to intelligence. Neuropsychology Review 9: 117-136.

Crinella, F.M., Yu, J. (2000). Brain mechanisms and intelligence. Psychometric g and executive function. Intelligence 27: 299-327.

Eslinger, P.J. (1997). Brain development and learning. Basic Education, 41, 6-8.

Eslinger, P.J., Damasio, A.R. (1985). Severe disturbance of higher cognition after bilateral frontal lobe ablation: Patient EVR. Neurology, 35, 1731-41.

Verner, K. (2001). Connections in the classroom: Brain-based learning. Basic Education 45: 3-7.

Verner, K. (2002). Transcending the status quo: Scientists and school educators need to join forces to raise student proficiency in science. HHMI Bulletin.

Welsh, M.C., Pennington, B.F., Groisser, D.B. (1991). A normative developmental study of executive function: A window on prefrontal function in children. Developmental Neuropsychology 7: 131-149.

Preschool Assessment

The preschool portion of the LabLearner program was assessed over a five-month period in eleven preschool classrooms, including Head Start and STEP classrooms, in three different states: Pennsylvania, Florida, and Virginia. The goal of the assessments was to determine whether the LabLearner Preschool Program fostered the development of critical thinking, problem solving, numeracy, fine motor control, oral language, and literacy skills in preschool children. Two areas of impact were evaluated: receptive and expressive science tool vocabulary, and science concept comprehension. A total of 100 matched cases of preschool children aged 3.5 to 5.5 years were included in the study.

Results showed that children’s receptive and expressive vocabulary for the science tool names increased over time, and these gains were statistically significant (p < 0.05). With regard to science concept comprehension, the study centered on whether children could grasp (and articulate or otherwise demonstrate) an understanding of the most important concepts contained within selected LabLearner activities. Focus Questions were developed for selected activities, and example linguistic and behavioral responses were listed against which teachers could compare the children’s responses as they reported them. Teachers reported whether or not the child could successfully answer the question. The table here provides an example of the types of focus questions used in the hands-on science investigations. The complete descriptive data for the study showed that, for the majority of the science activities, children were able to demonstrate an understanding of the central scientific concept, using either language or appropriate behaviors.

Visit a LabLearner Preschool Classroom


Pre-Post Assessments

The preK-8 LabLearner Program consists of some 60-plus individual science units called Core Experience Learning Labs (CELLs). Each CELL takes approximately four to five weeks for a class to complete, working in teams of four to six students. In grades one through eight, students complete a pretest, taken before the CELL begins, and a posttest, taken at the end of the CELL.

Pre/post assessments are fixed-response evaluations in which identical questions, presented in an adjusted sequence, appear in both the pre-test and post-test documents. Questions on the pre/post-tests reflect the science concepts taught within the investigations of a CELL. Questions require students to apply their knowledge of a CELL concept to a new situation, interpret experimental results from a table, graph, or mathematical formula, and correlate experimental observations with scientific concepts.

In a 2001 study conducted with 1268 public school students from grades one through six, student performance from pre-test to post-test increased by an average of 40%. Comparison of pre- and post-test scores using Student’s paired t-test indicated that the improvement in students’ comprehension of the science and math concepts taught in the CELLs was statistically significant (p < 0.0001) for CELLs at all grade levels. In addition, analysis at the p < 0.05 level indicated no statistically significant difference in pre/post-test performance based on student gender or on the teacher facilitating the CELLs.
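For readers curious about the statistics, a paired t-test compares each student’s post-test score against that same student’s pre-test score, rather than comparing two independent group averages. The following is a minimal sketch of the calculation using hypothetical scores (the actual study data are not reproduced here):

```python
import math

# Hypothetical pre/post scores for ten students -- illustrative only,
# not data from the 2001 study described above.
pre  = [42, 55, 38, 60, 47, 52, 45, 58, 40, 50]
post = [68, 75, 60, 82, 70, 74, 66, 80, 62, 73]

# Paired design: each difference is one student's post-test score
# minus that same student's pre-test score.
diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)

mean_d = sum(diffs) / n                                      # average gain
var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)      # sample variance
t_stat = mean_d / math.sqrt(var_d / n)                       # t with df = n - 1

print(f"mean gain: {mean_d:.1f} points, t({n - 1}) = {t_stat:.2f}")
```

The resulting t statistic is then compared against the critical value for n − 1 degrees of freedom; a large t (with a correspondingly small p-value) indicates that the pre-to-post gain is unlikely to be due to chance.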

The graphic below, presenting data obtained in 2009, is indicative of similar studies. This study, involving 285 students over the course of a year, illustrates the increase in student post-test performance in grades one through eight. In general, differences in student comprehension between pre- and post-tests tend to increase as students move from the primary to the intermediate grades. The increase in performance is maintained through the middle school grades, a time in which U.S. schools tend to see a decrease in student science comprehension. These results are typical for LabLearner schools and illustrate the significant impact of the LabLearner program on student comprehension of scientific concepts.


Blue Ribbon Awards

The U.S. Department of Education National Blue Ribbon Award is one way to assess the academic impact of a curriculum or program. The Blue Ribbon is widely considered the “highest honor a school can achieve”. This is because the Blue Ribbon is not just a measure of minimal compliance or simple standardized test scores. It is a much more comprehensive assessment, involving a multiplicity of objective criteria that allow teachers, students, parents, and community representatives to assess the school’s strengths and weaknesses and develop strategic plans for the future. Based on such criteria, few schools even receive the mandatory site visit for Blue Ribbon consideration.


“LabLearner science has been a spectacular addition to our curriculum. Our students have developed a more sophisticated desire to learn at higher levels. Their intrinsic motivation has been enhanced by the infusion of our LabLearner program. We have students who demand that their parents allow them to come to school on lab day even when they are not feeling well. That’s a fact! The LabLearner teacher support is fabulous and we would highly recommend this meaningful learning system to any and all school districts. The LabLearner system was a major ingredient in our being named a National Blue Ribbon School.” – Doug Jacobson, Superintendent, Barnes County North School District, ND

“We used science, and specifically LabLearner, as the main curriculum area in our Blue Ribbon application. It provided a unique way of making science academically advanced while making learning fun for the students. I do think it was an important piece of why we received the National Blue Ribbon Award. We also highlighted the program in applying for a number of different science and math grants. Parents and other visitors are amazed by what is available through LabLearner. The materials are excellent. The children are extremely well prepared for high school.” – Sister LaVerne King, Principal, Christ the Teacher School, DE

Standardized Assessments

Within the U.S., there is a wide diversity of standardized tests that public and private schools use to assess their students in science. LabLearner schools across the U.S. utilize a variety of standardized tests, such as the Virginia Standards of Learning (SOL), the Michigan Educational Assessment Program (MEAP), the Stanford Achievement Test, 10th edition (SAT 10), the Iowa Test of Basic Skills (ITBS), and many others.

Loudoun County Public Schools in Northern Virginia began incorporating LabLearner experiments into their curriculum in 2004. Prior to implementation of the LabLearner Program, Loudoun students scored in the mid- to high-ninetieth percentile on the Virginia science SOLs. In the nine years following the addition of the LabLearner Program, students in Loudoun County Public Schools have continued to achieve consistently high SOL science scores (mid- to high-ninetieth percentile). Science supervisors within the district have indicated that the LabLearner Program is not only an important component of the continued excellence of science scores on the SOLs, but is also responsible for the increase in inquiry-based learning and experimental design seen in their students.

The Handley School in Saginaw, Michigan uses the MEAP, the science assessment taken by all public schools in the state of Michigan, to evaluate the science proficiency of its students. In 2001, the Handley School adopted the LabLearner Program. Scores from the 2004 MEAP indicated that 67% of students at the Handley School scored as advanced in science, whereas by 2008, 86% of students scored as advanced. In 2009 and 2010, the Handley School reported that 100% of its students met or exceeded levels of proficiency in science, as compared to the state level of 78%.

The Wood Acres School in Marietta, Georgia began implementing the LabLearner Program in 2006. Wood Acres students score in the 92nd percentile on the science Stanford Achievement Test (SAT 10), a test whose normed average is the 50th percentile. The Wood Acres School emphasizes that the LabLearner Program provides a “science-math connection that is strong throughout the program and provides students with the tools to become independent researchers with problem-based projects.”

Blessed Sacrament School in Burlington, North Carolina also began implementing the LabLearner program in 2006. Blessed Sacrament assesses student learning outcomes in all subject domains using the Iowa Tests of Basic Skills (ITBS). Blessed Sacrament consistently scores in the upper 20% nationally in science. In 2012, for example, students achieved a National Percentile Rank of 83.

Student Interviews

Perhaps the best evaluation of all is simply to talk to students to find out what they really know about science. Teachers try to do this as much as possible because, rather than a static, one-way “report” of information from student to teacher, a back-and-forth exchange of thoughts occurs.

Students can ask for clarification of a question or offer information and insight peripheral to the original topic. Through such exchanges, we can see whether students can embellish core concepts with relevant facts and meaningful deeper questions. One advantage of home schooling or maintaining reasonable class sizes is that it permits more of this type of student/teacher discussion.

This method of evaluation and instruction is not used at all on standardized written assessments. This is unfortunate, because it is a very common, historically significant, and reliable assessment format even at the highest levels of science education. Typically, most Ph.D., M.D./Ph.D., and other advanced-degree science program assessments take the form of “oral” exams. And that is a good thing, as most of a scientist’s career is ultimately spent talking to others about science.

Finally, it is very clear that this form of assessment may be very useful and powerful in dealing with students with varying degrees of reading or writing difficulties or who simply “freak out” with a sharp pencil in hand and a hundred bubbles on an answer sheet.

The following two videos give a pretty good indication of the value of student interviews. In the top video, the portion containing various student interviews begins at about 3:15.

Discover the All-In-One, Hands-On Science Solution
LabLearner Video Overview