Testing as a form of monitoring students' knowledge

An important part of the learning process is monitoring the knowledge and skills of students. The gradual transition from traditional forms of control and assessment of knowledge to computer testing meets the spirit of the times and the general concept of modernization and computerization of the Russian education system.

Test (from the English "test": trial, check, sample, measure, criterion, experiment) - a short standardized trial whose result is used to evaluate a particular process.

Test functions

Testing in pedagogy performs three main interrelated functions: diagnostic, teaching and educational:

· The diagnostic function is to identify the level of knowledge, skills and abilities of the student. This is the main and most obvious testing function. In terms of objectivity, breadth and speed of diagnosis, testing surpasses all other forms of pedagogical control.

· The teaching function of testing is to motivate the student to intensify the work on mastering the educational material. To enhance the teaching function of testing, additional incentive measures can be used, such as the teacher distributing an approximate list of questions for self-preparation, the presence of leading questions and tips in the test itself, and joint analysis of the test results.

· The educational function is manifested in the frequency and inevitability of test control. This disciplines, organizes and directs the activities of students, helps to identify and eliminate gaps in knowledge, and creates a desire to develop their abilities.

Computer testing has a number of advantages over traditional forms and methods of control. It allows you to use lesson time more efficiently, cover a larger volume of content, quickly provide feedback to students and determine the results of mastering the material, focus on gaps in knowledge and skills and make adjustments to them.

The main advantages of this form of knowledge control are:

Possibility of detailed check of students’ mastery of each course topic;

Carrying out operational diagnostics of the level of mastery of educational material by each student;

Simultaneous testing of the knowledge of the entire class, which motivates students to prepare for each lesson;

A properly designed test increases interest in the subject;

Individualization of work with students;

Saving training time when monitoring knowledge and assessing learning outcomes;

Support for students' self-development through the use of tests.

But, along with the positive, there are also negative aspects to the use of tests:

Test control does not contribute to the development of students' oral and written speech;

The choice of answer can occur at random; it is impossible for the teacher to trace the logic of the students’ reasoning.

Basic forms of test tasks

1. Tasks with the choice of one or more correct answers.

Among these tasks there are such varieties as:

1.1. Selecting one correct answer according to the principle: one is correct, all others (one, two, three, etc.) are incorrect.

For example: What vitamin deficiency causes disruption of bone growth and development:

A) vitamin A

B) vitamin B

C) vitamin C

D) vitamin D

1.2. Selecting multiple correct answers.

1.3. Selecting the single most correct answer.

For example:

Organic substances include:

A) proteins

B) proteins and carbohydrates

C) proteins, carbohydrates and fats

D) proteins, carbohydrates, fats and mineral salts

Each of the answers is generally plausible, but the 1st and 2nd answers are incomplete. The 4th answer is also not correct, since mineral salts are not classified as organic substances.

2. Open form tasks.

The tasks are formulated in such a way that there is no ready answer; the test taker must formulate the answer and enter it in the space provided.

3. Matching tasks, where elements of one set need to be matched with elements of another set.

For example:

Match:

Habitat                      Organisms

1) Organismal                a) crucian carp

2) Aquatic                   b) jellyfish

3) Soil                      c) mole

4) Ground-aerial             d) earthworm

5) Terrestrial-aquatic       e) sparrow

                             f) tiger

                             g) roundworm

                             h) frog

                             i) dysentery amoeba

4. Tasks to establish the correct sequence (calculations, actions, steps, operations, terms in definitions).
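The four task forms listed above differ mainly in how an answer is compared with the key. The following is a minimal sketch of such checks, not a description of any particular testing shell; the function names and the data shapes (letters for options, dictionaries for matching) are assumptions made for illustration.

```python
def check_single(chosen, correct):
    """Single-choice task: exactly one option is correct."""
    return chosen == correct


def check_multiple(chosen, correct):
    """Multiple-choice task: the chosen set must equal the set of correct options."""
    return set(chosen) == set(correct)


def check_open(answer, accepted):
    """Open-form task: compare the typed answer, ignoring case and outer spaces."""
    normalized = answer.strip().lower()
    return normalized in {a.strip().lower() for a in accepted}


def check_matching(pairs, key):
    """Matching task: each left-hand element must get the same set of right-hand
    elements as in the key (one habitat may correspond to several organisms)."""
    def as_sets(mapping):
        return {left: set(right) for left, right in mapping.items()}
    return as_sets(pairs) == as_sets(key)


def check_sequence(order, correct_order):
    """Sequence task: the steps must appear in exactly the correct order."""
    return list(order) == list(correct_order)
```

Comparing sets in the multiple-choice and matching checks makes the result independent of the order in which the student ticked the options, while the sequence check is deliberately order-sensitive.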

The listed forms of computer representation of test tasks do not exhaust their diversity. Much depends on the skill and ingenuity of the teacher. When creating tests, it is important to take into account many circumstances, for example, the personality of the test taker, the type of control, the methodology for using tests in the educational process, etc.

The choice of form depends on:

· testing goals,

· content of the test,

· technical capabilities,

· level of teacher preparedness in the field of theory and methods of testing knowledge control.

Test developers should adhere to the following principles:

The test must be consistent with the testing objectives;

The significance of the knowledge being tested must be determined within the overall system of knowledge;

The relationship between the content and form of the test must be ensured;

Test tasks must be correct in terms of content;

The test must correspond to the level of the current state of science;

The content of the test must be comprehensive and balanced;

The content of the test should be systematic, but at the same time variable.

At the beginning of any test, brief instructions are given for completing the task, for example: “Choose the correct answer...”, “Choose the most correct answer...”, “Type the answer in the free field...”, etc. If tasks are presented in one form, the instructions are written once for the entire test. If the test includes various tasks, then before each new task a new instruction is written. The text of the task is usually written in capital letters or in bold font in order to visually immediately separate the task itself from the answer options.

The main requirements for a computer control system are as follows:

* test questions and answer options must be clear and understandable in content;

The text of tasks (and answers!) of computer tests must be kept short and concise. Brevity is ensured by a careful selection of words, symbols, and graphics, allowing the minimum of means to achieve maximum clarity of the meaning of the task. Repetitions of words, obscure, rarely used words, as well as symbols unknown to students, foreign words that make it difficult to perceive the meaning should be completely excluded.

* the computer test should be easy to use;

It is desirable to have a minimum of control buttons on the screen; instructions and tips for the student’s actions should appear only at the right time in the right place, and not be constantly present on the screen, cluttering it.

* the test system must include an assessment of the degree of correctness of the answer to each question asked of the student;

The presence of pre-developed scoring rules is one of the important requirements for testing. In the general case of using tests, one point is given for a correct answer in each task, and zero for an incorrect answer. The sum of all points received by the student gives the number of correct answers. This number is associated with the level of his knowledge and with the concept of “test score of the subject.”
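The scoring rule just described (one point for a correct answer, zero otherwise, with the test score being the sum) can be sketched as follows. The task identifiers and the dictionary layout are assumptions made for the example; an unanswered task is treated here as incorrect.

```python
def score_test(responses, answer_key):
    """Return the per-task points (1 or 0) and their sum, the test score.

    responses  -- mapping of task id to the option the student chose
    answer_key -- mapping of task id to the correct option
    """
    points = [
        1 if responses.get(task) == correct else 0
        for task, correct in answer_key.items()
    ]
    return points, sum(points)


# Hypothetical three-task test: the student answered q1 correctly,
# q2 incorrectly, and left q3 unanswered.
answer_key = {"q1": "A", "q2": "C", "q3": "B"}
responses = {"q1": "A", "q2": "D"}
points, total = score_test(responses, answer_key)
```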

* there should be enough test questions that together they cover all the material that the student must learn;

* questions should be presented to the subject in random order to exclude the possibility of mechanical memorization of their sequence;

* questions should not begin with a number or any symbolic designation in order to prevent memorization of the question in the order it follows or the symbol denoting it;

* Possible answer options should also follow in random order;

* It is necessary to keep track of the time spent on answers and limit this time.
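The requirements on random ordering and on time limits lend themselves to a short illustration. This is a sketch under stated assumptions: the question layout (a dictionary with "text", "options" and "correct" keys) and both function names are hypothetical, and the correct answer is stored as the option text rather than as a letter so that it survives shuffling.

```python
import random


def present_test(questions, rng=None):
    """Return the questions in random order, each with shuffled answer options."""
    rng = rng or random.Random()
    presented = []
    for question in rng.sample(questions, k=len(questions)):
        options = list(question["options"])  # copy, so the bank is not modified
        rng.shuffle(options)
        presented.append(
            {"text": question["text"], "options": options, "correct": question["correct"]}
        )
    return presented


def within_time_limit(started_at, answered_at, limit_seconds):
    """True if the answer arrived within the time allowed for the task."""
    return (answered_at - started_at) <= limit_seconds
```

Passing a seeded `random.Random` instance makes a particular ordering reproducible, which is convenient when a disputed result has to be reviewed later.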

Test compilation technology

The test is a list of tasks formed in a certain sequence, the number and composition of which depends on the purposes of testing. The didactic content of the test is determined by the purpose of testing and the subject area.

The technology for creating a test on a subject involves several successive stages:

1. Setting goals and control objectives.

2. Defining testing goals:

- training (independent training of students);

- current monitoring of knowledge (diagnosis of mastery of individual topics and sections);

- midterm knowledge control;

- final knowledge control (across the entire program).

This stage clearly states why the test is being designed and what functions it performs.

The purpose of entrance control is to assess the student’s initial preparedness in the subject, that is, the degree to which he has the knowledge required for successful mastery of the course. Intermediate control is a test consisting of 5-10 compact tasks, given immediately after the material has been studied and intended for rapid assessment of its assimilation. Midterm control is carried out on completion of a topic or section of the course. Final control takes place at the end of the course and covers its content as a whole. Its results serve as the basis for the certification of the student.

3. Analysis and systematization of material.

4. Development of a table of the difficulty level of test tasks, a table of concepts tested in the test in accordance with the task.

5. Development of test tasks.

6. Examination of the content and form of assignments (review) and correction.

7. Determining the volume (number of test tasks) in the test and its execution time.

8. Development of testing methodology, determination and calculation of evaluation indicators. An elementary scale is compiled: the number of test tasks presented for completion is correlated with the number of correct answers. The range of positive assessment is determined, that is, the number of correct answers required to receive a pass, a "good", an "excellent", etc.
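An elementary scale of this kind can be sketched as a mapping from the share of correct answers to a grade. The thresholds below (85%, 70% and 50%) are illustrative assumptions only; the text does not prescribe specific values, and each test developer sets them when designing the test.

```python
def grade(correct, total, thresholds=((0.85, "excellent"), (0.70, "good"), (0.50, "pass"))):
    """Map the number of correct answers to a grade label.

    thresholds -- (minimum share, label) pairs in descending order;
                  the values here are assumed for illustration.
    """
    if total <= 0:
        raise ValueError("the test must contain at least one task")
    share = correct / total
    for minimum, label in thresholds:
        if share >= minimum:
            return label
    return "fail"
```

With these assumed thresholds, 18 correct answers out of 20 give "excellent" and 9 out of 20 give "fail".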

9. Pilot testing (trial run) of the test.

10. Adjustment and addition of new tasks to improve the system-forming parameters of the test based on the testing results.

11. Formation of the final version of the test.

12. Development of instructions for students.


It is easy to organize computerized collection and analysis of test results if the test consists only of multiple-choice tasks. The results of completing tasks with constructed answers require manual processing and the involvement of experts, and therefore additional material costs and time for verification. The abundance of forms in the test complicates the student’s work and significantly complicates the statistical processing of empirical test results.

Unfortunately, the monoform requirement is not always feasible, since not all of a student’s knowledge and skills can be tested using a monoform test. In this regard, it is often necessary to combine forms, which, other things being equal, always negatively affects the measurement accuracy provided by the test. The choice of the optimal form of pretest tasks is usually associated with the specific content of the test. In this case, it is necessary to take into account the advantages and disadvantages of each of the forms (Table 1) and make a certain compromise decision in the process of such a choice.

4. COMPUTER TESTING IN EDUCATION.

4.1. Specifics of computer testing and its forms

General ideas about computer testing. Since the beginning of the 21st century, computers have become widely used in education for testing. A separate direction has appeared in pedagogical innovations - computer testing, in which the presentation of tests, assessment of student results and delivery of results to them is carried out using a PC.

The test generation stage can proceed technologically in different ways, including by entering paper-based tests into the computer. Today, there are numerous publications on computer testing; software and tools have been developed for generating and presenting tests.

When is it necessary to use computer testing? Although computer testing greatly facilitates the teacher’s work in presenting tests and assessing results, its spread is in many ways little more than a tribute to fashion, the negative consequences of which have not yet been fully identified. The choice of a computer-based exam format should rest on weightier and more valid reasons than a mere passion for innovation, since it creates many problems and puts students on an unequal footing. Computer testing should be used in cases where there is an urgent need to abandon traditional paper-based tests.

For example, computer testing is necessary when conducting the Unified State Exam in hard-to-reach areas of Russia. Gathering school graduates from remote areas at the designated time of the Unified State Exam becomes such a complex and expensive undertaking that it is simply impossible to do without computer testing and modern means of communication. Computer testing is also advisable to use when conducting exams for children with disabilities who have serious visual or hearing impairments. Using a PC, you can use larger fonts, audio recordings, additional devices for entering test data, and other devices that compensate for the potential lag of children with disabilities on exams.

Forms of computer testing. Computer testing can be carried out in various forms, differing in the technology of combining tasks into a test. Some of them have not yet received a special name in the literature on testing issues.

The first form is the simplest. The finished test, standardized or intended for routine monitoring, is entered into a special shell, the functions of which may vary in degree of completeness. Typically, during final testing, the shell allows you to present tasks on the screen, evaluate the results of their implementation, generate a matrix of test results, process it and scale the primary scores of the test takers by converting them into one of the standard scales for issuing a test score to each test taker and a protocol of his scores on the test tasks.

The second form of computer testing involves the automated generation of test variants using software tools. Variants are created before the exam or directly during it from a bank of calibrated test items with stable statistical characteristics. Calibration is achieved through lengthy preliminary work on building the bank, the item parameters of which are obtained on a representative sample of students, usually over 3-4 years of paper-based testing. The content validity and parallelism of the variants are ensured through a strictly regulated selection of tasks for each variant in accordance with the test specification.
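The regulated selection of tasks according to a specification can be illustrated with a minimal sketch. It assumes, purely for the example, that each item in the bank is tagged with a topic and a difficulty label and that the specification states how many items of each kind a variant must contain; real item banks rely on calibrated statistical parameters rather than simple labels.

```python
import random


def build_variant(bank, specification, rng=None):
    """Assemble one test variant that satisfies the specification.

    bank          -- list of items, each a dict with "id", "topic", "difficulty"
    specification -- mapping of (topic, difficulty) to the required item count
    """
    rng = rng or random.Random()
    variant = []
    for (topic, difficulty), count in specification.items():
        pool = [
            item for item in bank
            if item["topic"] == topic and item["difficulty"] == difficulty
        ]
        if len(pool) < count:
            raise ValueError(f"not enough items for {topic}/{difficulty}")
        variant.extend(rng.sample(pool, count))  # sampling without repetition
    return variant
```

Because every variant draws the same number of items from each cell of the specification, variants built this way are parallel in content, even though the individual items differ.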

The third form - computer adaptive testing - is based on special adaptive tests. Adaptive testing rests on the idea that it is useless to give a student test tasks that he will almost certainly complete without the slightest difficulty, or that he is bound to fail because they are too hard. Therefore, it is proposed to optimize the difficulty of tasks, adapting it to the level of preparedness of each test taker, and to reduce the length of the test by eliminating some tasks.

Advantages and disadvantages of computer testing. Computer testing has certain advantages over traditional paper-based testing, which are especially noticeable in mass examinations, for example national exams such as the Unified State Exam. Presenting test variants on a computer saves the money usually spent on printing and transporting paper tests.

Thanks to computer testing, it is possible to increase information security and prevent leaks of test content, owing to the high speed of information transfer and special protection of electronic files. The procedure for calculating the resulting scores is also simplified in cases where the test contains only multiple-choice tasks.

Other advantages of computer testing are manifested in ongoing monitoring, self-control and self-preparation of students: thanks to the computer, a test score can be issued immediately and prompt measures taken to correct the assimilation of new material, based on analysis of the protocols of corrective and diagnostic tests. The possibilities of pedagogical control during computer testing are significantly increased by expanding the range of measured skills and abilities in innovative types of test tasks that use the diverse capabilities of the computer, including audio and video files, interactivity, dynamic formulation of problems using multimedia tools, etc.

Thanks to computer testing, the information capabilities of the control process are increased: it becomes possible to collect additional data on how individual students progress through the test and to distinguish between skipped and not-reached test tasks.
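One way to separate these two kinds of unanswered tasks is sketched below. It assumes a common convention that the text itself does not spell out: an unanswered task followed by at least one answered task counts as skipped, while unanswered tasks after the last answered one count as not reached. Responses are listed in presentation order, with `None` marking no answer.

```python
def classify_unanswered(responses):
    """Return (skipped, not_reached) as lists of task positions.

    responses -- answers in presentation order; None means no answer given
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last_answered = max(answered, default=-1)
    skipped = [i for i, r in enumerate(responses) if r is None and i < last_answered]
    not_reached = [i for i, r in enumerate(responses) if r is None and i > last_answered]
    return skipped, not_reached
```

The distinction matters for analysis: a skipped task suggests the student saw it and chose not to answer, whereas a not-reached task more likely reflects a shortage of time.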

The need to assess and verify the level and quality of knowledge arises in any human activity. The problem of the adequacy and validity of test results becomes even more acute with the remote and widespread use of information technologies to test and verify the knowledge of students, schoolchildren, teachers and other categories of people for whom the test results have important personal significance.

Monitoring the level of knowledge is an important part of the learning process. It provides feedback in the “student-teacher” system. Knowledge control performs controlling, teaching, diagnostic, educational, motivating and other functions in the educational process. To manage the learning process at various stages, the supervisor must constantly have information about how students perceive and assimilate educational material.

Control from the teacher’s point of view is a long and labor-intensive part of the work. It can be simplified and systematized by using software tools. The problem of implementing control-related functions falls into three areas: preparing for control, conducting control, and providing feedback during the learning process. A set of tools united by a common logic and purpose may constitute a tool system. Such a computer-based tool system serves as the means of implementing computer control.

You can monitor the activities of students with special monitoring tests. Tests are a special type of task that makes it possible to quickly monitor, across a whole group, the degree to which students have acquired knowledge and skills in theoretical and industrial training classes, and to establish internal and external feedback, on the basis of which students and the teacher manage the learning process. Testing has long been used in pedagogy as a method of monitoring knowledge.

Currently, there are many computer programs available for testing. There are many products (including multimedia) with ready-made test tasks, as well as shell programs for creating tests yourself. There are a number of tool programs created by domestic and foreign specialists. Computer tests developed on their basis have the properties inherent in such systems: adaptability, openness, standardization, extensibility, the ability to carry out individual and group control of students’ knowledge, etc. The test system, due to its versatility, provides automated support for students’ independent work, allowing for monitoring and self-monitoring of the level of mastery of the material and acting as a simulator in preparation for exams.

Chapter 1. Computer testing

1.1 The essence of the concept "Test"

To understand the essence of tests, it is important to understand the system of concepts. Concepts generally form the basis of any science, and the development and effective use of tests is no exception. From the 1930s, the science of testing was branded bourgeois, and all of its goals were considered “reactionary.” Although such judgments are now regarded as out of keeping with the spirit of our time, publications still appear that try to deny tests their scientific validity.

The first scientific works on test theory appeared at the beginning of the twentieth century, at the intersection of psychology, sociology, pedagogy and other so-called behavioral sciences. Foreign psychologists call this science psychometrics, and educators call it pedagogical measurement. Since there is no general name in Russian yet, the author called this science testology, which can be pedagogical, psychological or sociological, depending on where it is applied and developed. Unclouded by ideology and politics, the interpretation of the name “testology” is simple and transparent: the science of tests. In the 21st century, Avanesov brought the name of this science in line with its name in the West: pedagogical measurements.

Let us dwell on the definition of the concept “test”, since it is currently used in a wide range.

Test (English test - sample, testing, research) is an experimental method in psychology and pedagogy, standardized tasks that allow you to measure psychophysiological and personal characteristics, as well as the knowledge, skills and abilities of the test subject.

Tests began to be used in 1864 by J. Fisher in Great Britain to test students' knowledge. The theoretical foundations of testing were developed by the English psychologist F. Galton in 1883: the application of a series of identical tests to a large number of individuals, statistical processing of results, and the identification of evaluation standards.

The term “test” was first introduced by the American psychologist J. Cattell in 1890. The series of 50 tests he proposed actually represented a program for determining primitive psychophysiological characteristics based on the most developed psychological experiments at that time (for example, measuring the strength of the right and left hands using a dynamometer, speed of reaction to sound, etc.).

The word "test" evokes a variety of ideas. Some believe that these are questions or tasks with one ready-made answer that must be guessed. Others consider the test a form of game or fun. Still others try to interpret this as a translation from the English word “test” (sample, test, check). In general, there is no consensus on this issue. Moreover, pedagogy textbooks do not write about this. And if they do write somewhere, it is often difficult to understand what is written. It is no coincidence that the range of opinions about tests turns out to be too wide: from judgments of ordinary consciousness to attempts to scientifically interpret the essence of tests.

In science, there are significant differences between the simple translation of a word and the meaning of the concept. Most often we encounter a simplified perception of the concept of “test” as a simple choice of one answer from several proposed for a task. Numerous examples of such seemingly “tests” can easily be found in newspaper and magazine periodicals, in various competitions and in numerous book publications called “Tests”. But these often turn out to be not tests, but something outwardly similar to them. Usually these are collections of questions and tasks designed to select one correct answer from among those proposed. They are only superficially similar to the real test. Differences in understanding the essence of tests give rise to differences in attitudes towards tests.

What is the wording of the concept "TEST" in dictionaries?

Big Encyclopedic Dictionary. Test (English test - sample, test, study):

1) in psychology and pedagogy - standardized tasks, the results of which are used to judge the psychophysiological and personal characteristics, as well as the knowledge, skills and abilities of the subject;

2) in physiology and medicine - test effects on the body in order to study various physiological processes in it, as well as to determine the functional state of individual organs, tissues and the body as a whole;

3) in computer technology - a control task to check the correct operation of the computer;

4) in pattern recognition - a set of functionally interdependent features that characterize a pattern (class).

Modern explanatory dictionary of the Russian language T.F. Efremova. Test:

1) a task or trial of a standard form, the results of which are used to judge a person’s abilities, predispositions, etc., as well as the knowledge and skills of the subject;

2) a method of research and diagnosis, consisting of a test effect on the body (in physiology, medicine);

3) a questionnaire used in sociological research.

4) a problem with a known solution, intended to verify the correct operation of a computer (in computer technology).

Explanatory dictionary of the Russian language, D.N. Ushakov. Test (English test) (psych.):

A psychotechnical test consisting in the fact that the subject is asked to solve one or several problems to determine certain of his abilities (memory, attention, reaction speed, etc.).

Nowadays, there are many types of tests, so it is hardly possible to give a universal definition for all of these types.

An analysis of the literature showed that there are different formulations of the concept of “test”. But regardless of the type or purpose, I would give the following definition of the test:

A test is one of the methods of knowledge control that allows the teacher to establish the factual and theoretical knowledge of students and evaluate them in a fairly short time. It should be noted that the test does not take into account the individual characteristics of a person.

1.2 Specifics of computer testing and its forms

Since the beginning of the 21st century, computer testing (CT) has become widely used in education: tests are presented, students' results are assessed, and the results are reported to them using a PC. However, computer testing should be used in cases where there is an urgent need to abandon traditional paper-based tests: when conducting exams for children with disabilities, with serious visual or hearing impairments, etc. The test generation stage can proceed technologically in different ways, including by entering paper-based tests into the computer. Today, there are numerous publications on computer testing; software and tools have been developed for generating and presenting tests.

Computer testing can be carried out in various forms, differing in the technology of combining tasks into a test. Some of them have not yet received a special name in the literature on testing issues.

The first form is the simplest. The finished test, standardized or intended for routine monitoring, is entered into a special shell, the functions of which may vary in degree of completeness. Typically, during final testing, the shell allows you to present tasks on the screen, evaluate the results of their completion, generate a matrix of test results, process it and scale the primary scores of the test takers by converting them into one of the standard scales, so that each test taker receives his own score and a protocol of assessments for the test tasks.
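The processing chain mentioned here (a matrix of results, primary scores, conversion to a standard scale) can be sketched as follows. The matrix layout (one row per test taker, one column per task, 1 for correct and 0 for incorrect) is an assumption, and the scale with mean 50 and standard deviation 10 is only one example of a standard scale; the text does not say which scale a given shell uses.

```python
from statistics import mean, pstdev


def raw_scores(matrix):
    """Primary score of each test taker: the number of correct answers in the row."""
    return [sum(row) for row in matrix]


def item_difficulty(matrix):
    """Share of test takers who solved each task (the column means)."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]


def to_standard_scale(scores, center=50, spread=10):
    """Linearly rescale primary scores to the given mean and standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:  # everyone scored the same; there is nothing to spread out
        return [float(center)] * len(scores)
    return [center + spread * (s - mu) / sigma for s in scores]


# Hypothetical results of three test takers on four tasks.
matrix = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
scaled = to_standard_scale(raw_scores(matrix))
```

The per-task shares from `item_difficulty` are the kind of figure a protocol of task-level results could draw on, while the rescaled scores make results from tests of different lengths comparable.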

The second form of computer testing involves the automated generation of test variants using software tools. Variants are created before the exam or directly during it from a bank of calibrated test items with stable statistical characteristics. Calibration is achieved through lengthy preliminary work on building the bank, the item parameters of which are obtained on a representative sample of students, usually over a period of 3-4 years of paper-based testing. The content validity and parallelism of the variants are ensured through a strictly regulated selection of tasks for each variant in accordance with the test specification.

The third form - computer adaptive testing - is based on special adaptive tests. Adaptive testing rests on the idea that it is useless to give a student test tasks that he will almost certainly complete without the slightest difficulty, or that he is bound to fail because they are too hard. Therefore, it is proposed to optimize the difficulty of tasks, adapting it to the level of preparedness of each test taker, and to reduce the length of the test by eliminating some tasks.
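The adaptive idea can be illustrated with a deliberately simple "staircase" rule: after a correct answer the next task is one level harder, after an incorrect answer one level easier. This is a sketch under stated assumptions, not the method used by real adaptive tests, which typically select items with a statistical model; the five difficulty levels, the starting level and the fixed test length are all hypothetical.

```python
def adaptive_test(answer_fn, levels=5, start=3, max_items=6):
    """Run a staircase-style adaptive session.

    answer_fn -- callable taking a difficulty level (1..levels) and returning
                 True if the test taker solved a task of that level
    Returns the history of (level, correct) pairs and the final level,
    which serves as a rough estimate of preparedness.
    """
    level = start
    history = []
    for _ in range(max_items):
        correct = bool(answer_fn(level))
        history.append((level, correct))
        if correct:
            level = min(levels, level + 1)  # harder task next
        else:
            level = max(1, level - 1)       # easier task next
    return history, level
```

A test taker who reliably solves tasks up to level 3 quickly settles into alternating between levels 3 and 4, so tasks that are far too easy or far too hard are never presented and the test stays short.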

When conducting computer testing, it is necessary to take into account the psychological and emotional reactions of students. Negative reactions are usually caused by the various restrictions that are sometimes imposed when tasks are presented in computer testing. For example, either the order in which tasks are presented is fixed, or a maximum time is set for completing each task, after which the next test task appears regardless of the test taker’s wishes. During adaptive testing, students are dissatisfied that they cannot skip the next task, review the entire test before starting work on it, or change answers to previous tasks. Sometimes students object to computer-based testing because of the difficulties that arise in performing and recording mathematical calculations, etc.

To reduce the impact of students' computer experience on test scores, it is recommended that computer testing shells include special instructions and training exercises for each innovative form of tasks. It is also necessary to familiarize students with the program interface beforehand, conduct rehearsal testing, and place students who do not have sufficient experience with a PC in separate groups in order to train them further or give them a paper-based test.

Thus, computer testing acts as a tool for managing the educational process, as an element of feedback that makes it possible to analyze the educational process and make adjustments to it, i.e. carry out full management of the learning process. The constant use of computer tests as an intermediate monitoring of progress defines the educational process as a system of continuous monitoring and self-control of students, which allows the teacher to receive “feedback” and students the opportunity to monitor the level of their preparedness throughout the entire training.

1.3 Advantages and disadvantages of computer testing

The advantages of computer testing are:

Objectivity. The factor of subjective approach on the part of the examiner is excluded. The test results are processed via a computer;

Validity. The “lottery” factor of a regular exam, in which you may get an “unlucky ticket” or task, is eliminated: a large number of test items cover the entire volume of material in a particular subject, which allows the test taker to demonstrate the breadth of his or her knowledge and not “fail” because of an accidental gap in knowledge;

Simplicity. Test questions are more specific and concise than ordinary exam papers and tasks and do not require a detailed answer or justification: it is enough to select the correct answer or establish a correspondence;

Democratic nature. All test takers are in equal conditions, and the test results are transparent;

Mass coverage and short duration. A large number of test takers can be covered by final control within a set period of time, and the time saved can be used to study new material or consolidate old material;

Technological efficiency. Conducting an exam in the form of testing is technologically efficient, as it allows the use of automatic processing;

Reliability of information about the volume of material learned and the level of its assimilation;

Reliability. The test assessment is unambiguous and reproducible;

Differentiating ability, owing to the presence of tasks of varying levels of difficulty;

Implementation of an individual approach to training. Individual testing and self-testing of students' knowledge is possible.

Along with the advantages, computer methods also have their disadvantages:

Communication between a person and a computer has its own specifics, and not everyone is equally at ease with computer testing. For example, if the testing procedure drags on or the content of the test does not interest the person, a positive attitude may give way to the opposite: the monotony of the work and the “stupidity” of the questions and tasks become tiring and irritating. Sometimes a negative attitude towards computer testing is caused by a lack of feedback. When the person being tested does not receive feedback, the likelihood of erroneous answers increases (the instructions may be misunderstood, the answer keys confused, etc.).

Special studies have been conducted to determine how people feel about computer testing. It turned out that some people experience the so-called psychological barrier effect, and some experience the overconfidence effect. Sometimes a person cannot cope with a task at all because he is “afraid” of the computer. Psychological defense mechanisms may also be triggered, associated with the reluctance of the test taker to open up, the desire to avoid excessive frankness, or the deliberate distortion of results;

During computer testing, specialists deal only with the results obtained. They do not see the test taker, do not communicate with him, which means they do not have additional information about him, cannot find out his actual amount of knowledge;

Test control does not contribute to the development of students' oral and written speech;

The breadth of coverage of topics in testing also has a downside. During testing, unlike in an oral or written exam, a student does not have enough time for any in-depth analysis of the topic;

There is an element of randomness in testing. For example, a student who does not answer a simple question may give the correct answer to a more difficult one. The reason for this could be either a random error in the first question or guessing the answer in the second. This distorts the test results and leads to the need to take into account the probabilistic component when analyzing them.
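The probabilistic component can be made concrete with a small calculation. The sketch below assumes closed-form items with exactly one correct answer among equally likely options and purely random, independent guessing; this idealization, the function name and the numbers are illustrative assumptions, not part of the original text. It estimates the chance of reaching a passing score by guessing alone.

```python
from math import comb

def prob_pass_by_guessing(n_items: int, n_options: int, pass_mark: int) -> float:
    """Probability of at least `pass_mark` correct answers out of `n_items`
    when every answer is a blind guess among `n_options` choices
    (binomial model, assumed for illustration)."""
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )

# Illustrative numbers: 10 items, 4 options each, 6 correct answers needed.
chance = prob_pass_by_guessing(10, 4, 6)
print(f"{chance:.4f}")
```

Under these assumptions, the more items a test contains and the more options each item offers, the smaller the contribution of guessing to the final score.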

Chapter 2. Computer control of knowledge

2.1 Classification of types of computer tests

Obviously, the primary task when testing the acquired knowledge should be to determine the control objectives. Thus, in universities, it is increasingly necessary to test students’ depth of knowledge of academic disciplines, the ability of future specialists to think logically, compare various objects and phenomena, draw the right conclusions and make optimal decisions. This means that the set (database) of test tasks should cover the academic discipline as completely as possible, and their thematic division should allow for stage-by-stage control in the process of studying the subject, identifying individual knowledge gaps of students, adjusting curricula, etc.

An important place in the formation of the task base is occupied by their formulation. Like any sentence, tasks are divided into explicit and implicit, interrogative and affirmative, judgments, opinions and other questions. The variety of their forms, which carries a richness of language and an abundance of special terms, depends on the skill of the teacher. The use of such tasks helps to improve students' ability to think logically, as well as the level of their general culture.

Another important point is to determine the correctness of the student’s answer to the proposed questions. There are various answer options included in the program. It is preferable for the student to “respond” to the computer as if orally, as if it were a teacher (open form of response). Perhaps such expert systems based on specially developed knowledge bases will appear in the near future with the introduction of fifth-generation computers. In the meantime, some experts have begun to create knowledge bases. This rather interesting and complex problem has a main drawback that was initially present in it - the subjectivity of the system, based on assessments of events and phenomena by individual, although sometimes very authoritative, specialists. Perhaps it would be most correct at present to master computer control systems based on databases. In this case, they usually resort to various ready-made forms of answers - templates.

A widespread form of answer is one in which the respondent is offered a pre-generated set of answers and selects one or more that, in his opinion, are correct (closed answer form). The program automatically evaluates the correctness of the choice made. In another case, the person being tested enters from the keyboard some formulations or individual words that answer the question posed (semi-open answer form). These answer options are not shown on the computer screen, but the program contains the fullest possible, in the opinion of its authors, set of answers. It is assumed that in most cases the program holds the necessary variants and, after making a comparison, can give its conclusion about the correctness of the answer. There are other options as well. Each method of collecting a response from test takers has its own advantages and disadvantages. Here one should keep to the goal of the control and choose the method most suitable for achieving it.

In this regard, it can be proposed to use a single set of answer options for all test tasks on the topic. The formulations should be general in nature and help identify the ability to think logically, which is more important and valuable than memorizing individual factual data. With sufficient skill on the part of the teacher, answer options formulated in this way also make it possible to check knowledge of individual facts and events.

In schools in developed countries, the introduction and improvement of tests has proceeded at a rapid pace. Diagnostic tests of school performance have become widespread, using the form of alternatively selecting the correct answer from several plausible ones, writing a very short answer (filling in the blanks), adding letters, numbers, words, parts of formulas, etc. With the help of these simple tasks, it is possible to accumulate significant statistical material, subject it to mathematical processing, and obtain objective conclusions within the limits of those tasks that are presented for testing. Tests are printed in the form of collections, attached to textbooks, and distributed on computer floppy disks.

Training tests are used at all stages of the didactic process. With their help, preliminary, current, thematic and final control of knowledge, skills, and recording of progress and academic achievements are effectively ensured.

Learning tests are increasingly penetrating into mass practice. Currently, almost all teachers use short-term surveys of all students in each lesson using tests. The advantage of such a check is that the whole class is busy and productive at the same time, and in a few minutes you can get a snapshot of the learning of all students. This forces them to prepare for each lesson, to work systematically, which solves the problem of efficiency and the necessary strength of knowledge. When checking, first of all, gaps in knowledge are identified, which is very important for productive self-learning. Individual and differentiated work with students to prevent academic failure is also based on current testing.

Naturally, not all the necessary characteristics of assimilation can be obtained by testing. For example, indicators such as the ability to specify one’s answer with examples, knowledge of facts, the ability to coherently, logically and demonstrably express one’s thoughts, and some other characteristics of knowledge, skills and abilities cannot be diagnosed by testing. This means that testing must necessarily be combined with other (traditional) forms and methods of verification. Those teachers who, using written tests, give students the opportunity to verbally justify their answers act correctly. Within the framework of classical test theory, the level of knowledge of test takers is assessed using their individual scores, converted into certain derived indicators. This allows us to determine the relative position of each subject in the normative sample.

Another approach to creating tests and interpreting the results of their execution is presented in the so-called modern theories of pedagogical measurement: Item Response Theory (IRT), which was widely developed in the 60s - 80s in a number of Western countries. Recent research in this direction includes the works of B.C. Avanesova, V.P. Bespalko, L.V. Makarova, V.I. Mikheeva, B.U. Rodionova, A.O. Tatura, V.S. Cherepanova, D.V. Lyusina, M.B. Chelyshkova, T.N. Rodygina, E.N. Lebedeva and others.

The most significant advantages of IRT include measuring the values of parameters of subjects and test items on the same scale, which makes it possible to correlate the level of knowledge of any subject with the degree of difficulty of each test item. Critics of the tests intuitively realized the impossibility of accurately measuring the knowledge of subjects of different levels of training using the same test. This is one of the reasons that in practice they usually strive to create tests designed to measure the knowledge of subjects of the most numerous, average level of preparedness. Naturally, with this orientation of the test, the knowledge of strong and weak subjects was measured with less accuracy.
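The idea of a common scale for subjects and items can be illustrated with the simplest IRT model. The sketch below uses the Rasch (one-parameter logistic) model, which the original text does not name; it is chosen here only as a well-known example, and the numeric values are hypothetical.

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a subject with the given ability
    answers an item of the given difficulty correctly.
    Both parameters are expressed on the same (logit) scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical subjects of low, average and high preparedness
# facing one item of average difficulty.
for ability in (-1.0, 0.0, 1.0):
    print(ability, round(p_correct(ability, difficulty=0.0), 3))
```

When ability equals difficulty, the probability of a correct answer is 0.5; an item far from a subject's level tells little about that subject, which is why a test aimed at the average level measures strong and weak subjects less accurately.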

In foreign countries, control practices often use so-called success tests, which include several dozen tasks. Naturally, this allows you to more fully cover all the main sections of the course. Two types of tasks are used:

a) requiring students to independently compose an answer (tasks with a constructive type of answer);

b) tasks with a selective response type. In the latter case, the student chooses from among those presented the answer that he considers correct.

It is important to note that these types of assignments are subject to significant criticism. It is noted that tasks with a constructive type of answer lead to biased assessments. Thus, different examiners, and often even the same examiner, give different marks for the same answer. In addition, the more freedom students have in answering, the more room teachers have for varying their assessment.

When creating knowledge control tests, you can be guided by other classifications of test types. Usually they are divided into:

"achievement" tests;

standardized "achievement" tests;

mental ability tests;

aptitude tests;

prognostic tests;

criterion-referenced tests;

tests of developed abilities.

Other existing classifications practically reduce to the types mentioned above. When making a choice, it is necessary to rely on pedagogical principles, according to which the control system must be suitable for testing professional knowledge. Its content should serve to determine both the level of intelligence and the abilities being checked in a specific area of knowledge. The verification procedure must allow for individual and/or group monitoring.

As studies have shown, for testing students it is most appropriate to use criterion-referenced tests.

A criterion-referenced test allows for more complete individual and collective program control of the volume of acquired knowledge; yields scores that make it possible to compare the level of knowledge of students both within a single group and between groups; and reveals the results achieved by each individual student during the test across a wide range of values (scores).

The choice of this type of test is also due to the fact that it makes it possible to identify the level of knowledge of students based on a predetermined volume and content of educational material that is common to all. Two components of this type of test are clearly visible: on the one hand, the possibility of obtaining data on the individual knowledge of each student; on the other hand, the possibility of comparing the obtained data across a wide range of study groups, provided that an adequate testing environment is created.

Ultimately, it is important to determine what each individual student knows and can do, not how he compares to other students. Well-formed test content ensures that each student receives a rating (an individual integral indicator), and at the same time provides the teacher with data characterizing the ability of any student to study in comparison with his fellow students. Thus, both tasks can be solved successfully at the same time. The student’s performance on the test is not assessed against some norm, but is determined by the degree of his mastery of the discipline covered by the test and by the achievement of a certain level of completion of the proposed tasks. Thus, with the help of these tests it is possible to identify how well each individual student has mastered both individual tasks and sections of the curriculum, and the level of mastery reached in a particular discipline.

2.2 Requirements for computer testing systems

Recently, teachers have been offered a fairly large number of different software tools for developing tests and conducting testing. However, many of them cannot implement modern requirements for the quality of pedagogical control materials (PCM), because they themselves do not meet the requirements for computer testing systems:

the ability to use four forms of tasks of the classical pedagogical test;

obtaining and accumulating a matrix of profiles of test subjects’ responses for dichotomous and polytomous assessment of the results of completing tasks;

adjustment and rearrangement of test tasks depending on the results of statistical processing of test results;

protection of resulting matrices from unauthorized access.

In addition, from the point of view of a subject teacher, it is desirable for computer testing systems to have the following capabilities:

use of multimedia technologies in testing. In most test shells, tasks are presented as text (sometimes using graphics). Multimedia testing systems combine text, graphics, animation and video materials in the most effective combinations and simultaneously use all communication channels to transmit information: text, image and sound. Voicing questions and answer options allows you to eliminate the test subject’s mistakes when reading the task incorrectly; and in disciplines related to the study of foreign languages, the presentation of material in audio form is mandatory. Graphics (drawing, diagram, photograph) can be included in the formulation of both questions and answer options. In this case, the graphical answer can be represented by selecting a certain area on the screen (for example, an area on the graph of a function, a point or a function). The truth or falsity of the answer chosen by the subject can also be represented graphically. The use of animated graphics and video clips makes it possible to make tasks for determining the sequence of actions more clear, to demonstrate the development of the situation depending on the answer chosen by the subject, etc.;

the use of pseudo-test tasks, for example, chain, text, situational and even non-test tasks, for example, crosswords, puzzles, etc.;

the use of a prepared test not only for control, but also for self-control of knowledge. In this case, after completing such a test, the student receives information about the success of his actions, and after completing self-control, he can return to the tasks to which he gave incorrect answers and try to answer again. In this way, a training element will be implemented;

the use of adaptive testing algorithms that determine the choice of the next task depending on the test taker’s answers to previous questions;

use of hypertext links in self-control and training modes;

conducting testing online.

The additional capabilities listed above would expand the scope of application of computer testing systems.
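The adaptive testing capability mentioned in the list can be sketched as follows. This is a deliberately simplified illustration and not a specific algorithm from the original text: the next task is the unanswered one whose difficulty is closest to the current estimate of the test taker's level, and the estimate is moved by a fixed step after each answer. The item bank, the difficulty scale and the step size are hypothetical.

```python
from typing import Dict, Optional, Set

def next_item(items: Dict[str, float], answered: Set[str], ability: float) -> Optional[str]:
    """Return the id of the unanswered item whose difficulty is closest
    to the current ability estimate, or None when the bank is exhausted."""
    candidates = {k: d for k, d in items.items() if k not in answered}
    if not candidates:
        return None
    return min(candidates, key=lambda k: abs(candidates[k] - ability))

def update_ability(ability: float, correct: bool, step: float = 0.5) -> float:
    """Crude update rule (an assumption): raise the estimate after a
    correct answer, lower it after an incorrect one."""
    return ability + step if correct else ability - step

# Hypothetical item bank: item id -> difficulty on an arbitrary scale.
bank = {"q1": -1.0, "q2": 0.0, "q3": 1.0, "q4": 2.0}
ability, answered = 0.0, set()
for correct in (True, True, False):   # simulated answers of one test taker
    item = next_item(bank, answered, ability)
    answered.add(item)
    ability = update_ability(ability, correct)
    print(item, ability)
```

Production systems usually replace the fixed step with a statistical estimate of the test taker's level, but the selection principle remains the same: each answer determines which task is offered next.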

One of the determining factors for the success of creating tests is the correct choice of hardware and software.

The term “technical teaching aids” appeared in the second half of the 60s. It was understood as the systems, complexes, devices and equipment used for presenting and processing information in the learning process in order to increase its effectiveness. According to their functional purpose, they are usually divided into three main classes: information, control, and training. Controlling technical teaching aids are designed to determine the degree and quality of mastery of educational material. The concept of informatization of education in our country defines the computer as the main material basis of modern education and its main technical means.

The parameters of the computers used largely determine the possibilities for effectively monitoring students' knowledge. The best characteristics when used in the country's universities were shown by universal PCs or computers that are capable of combining the capabilities of almost all types of technical teaching aids. Important advantages of PCs include their ability to create conditions for students to make independent decisions, i.e. individualize the learning process by creating adaptive computer programs. They allow you to successfully automate the educational process, including the knowledge control procedure. According to statistics, from 80 to 90% of computers operating in different countries of the world are IBM-compatible.

An IBM-compatible computer is the most suitable technical means of improving the quality of learning and monitoring students' knowledge in modern conditions. It works using system, instrumental and application computer programs. The second type of program is of greatest interest in the context of this work. Instrumental programs written in high-level languages allow programmers to create special-purpose programs - user, application programs. Application programs also include programs for monitoring student knowledge. The basic principles of creating such programs are that they focus on a specific course of study and allow qualified users (programmers and teachers) to create original programs for teaching and monitoring students' knowledge.

The limited scope of this work does not allow a more detailed and in-depth consideration of many important problems associated with testing. However, it is necessary to give a brief description of the properties that a testing system should have.

Adaptability – the ability of the system to adapt to changing conditions (hardware and software).

Openness is determined by the ability of the system, under the influence of a qualified user, to adapt to the control of specific academic disciplines.

Standardization of the system is expressed in the use of functions, design, etc., that are common in widely used programs. A trained user feels more comfortable, and an untrained user can draw on the experience gained when working with other programs.

Uniformity means creating a system on the basis of which similar ones can be built. A big mistake of developers of computer knowledge testing systems is the development of highly specific programs for a single academic subject. Obviously, such activities are ineffective and lead to unnecessary labor costs for both the programmer and the expert.

The need to unify control programs logically stems from the formalization of the subject area. Since it is advisable to use computer testing in the form of programmed control only in easily formalized subject areas, it therefore makes sense to develop universal ways of presenting test questions, a unified system for their assessment, and create the actual information content in the form of separate, plug-in databases.

The possibility of expanding and extending the system is also an important characteristic. Providing it gives the user confidence in the further, long-term use of the system, in its modification, and in the application of various solutions to improve it.

An equally important property is the system’s ability to carry out individual and group control of students’ knowledge. In addition to the obvious advantages, it makes it possible to use the system in various conditions, which are determined by the teacher, the author of the test, based on educational tasks.

If we take into account all of the listed properties, then the result will be a system, working with which students will have the opportunity to self-check their knowledge on each topic of the academic discipline at a convenient individual pace, identify gaps and then eliminate them. At the same time, students will become more motivated to learn and will be relieved of stressful situations to a large extent, will be provided with in-depth study of the educational material, will have confidence in their existing knowledge and the adequacy of the assessment they receive based on the control results.

In addition to these properties, the knowledge control system must also satisfy the following criteria:

work on a computer network (local and global), the ability to conduct testing simultaneously with a group of respondents;

the system must provide the ability to create new tests and analyze test results;

the system must contain algorithms for analyzing test results (validity of tests, assessment of the degree of their complexity, comparison of test results of different groups, etc.);

the system must provide high flexibility in choosing the types of questions and tasks, but at the same time must have a high degree of security;

the instrumental system must ensure differentiation of access rights to all its elements.

An important role in testing knowledge is played by objectivity, accuracy of results and minimal probability of assessment error, exclusion of the influence of any subjective factors, as well as almost identical testing conditions for all students, which is achieved in our case with the help of computers and special programs. Providing depth and completeness of control is also achieved by asking the student to answer several hundred questions. This is at least an order of magnitude higher than similar values for traditional knowledge testing. At the same time, both differentiated and integrated assessment of the level of mastery of educational material in a specific discipline is achieved. Control is carried out immediately after completing the study of each section of the curriculum. The teacher receives prompt and objective information about the results of students mastering this section. Consequently, the data obtained can be used to make appropriate adjustments to the content and methodology of the educational process.

2.3 Formation of test tasks for computer control of knowledge

In the humanities disciplines of a university, computer testing can be applied almost fully to quizzes and to the control of students’ independent work (entrance, current, thematic), and partially to colloquiums, tests and exams (midterm and final control).

Pedagogical test is a system of facet tasks of a certain content, increasing difficulty, specific form, which allows you to qualitatively assess the structure and effectively measure the level of knowledge, skills, abilities and ideas.

The property of facets in humanities is very difficult to realize due to weak formalization and non-articulation.

On the one hand, test tasks (TZ) make up a very high percentage, perhaps 80-90%, of computer testing programs in any humanities discipline. On the other hand, not all content can be transformed into the form of a test task. Many proofs and verbose descriptions are difficult to express in test form, or cannot be expressed in it at all.

The issue of filling databases seems obvious and therefore, as a rule, does not cause difficulties either in theory or in practice. At first glance, the development of testing questions and the determination of standard answers is accessible to any teacher. However, in fact, the situation in this area is exactly the opposite of what it seems at first glance.

It’s really not difficult to formulate the question. However, most developers do not ask themselves the main question: what goals does this question serve? What section of the topic does this question cover? Is the question formulated correctly, does it cause different interpretations, does it allow for ambiguous answers, how is it perceived by students not from the point of view of the teacher (who has a large amount of knowledge compared to the students), but from the point of view of the theoretical course completed by the students?

An attempt to answer these questions shows that, first of all, the database should be developed not by an enthusiastic teacher, but by a high-level specialist in a given subject area. In addition, no matter how high the level of the developer, any person is capable of making mistakes or incorrectly formulating certain provisions. Therefore, the test base, before it is put into operation, must undergo at least an assessment by the methodological council for this specialty.

However, no panel can determine how students perceive the test questions. Only real testing can show this. Technically, such an assessment is very simple: it requires only cumulative statistical analysis of the answers to each specific question. For this, each question must be uniquely identifiable. Analysis of such statistics, especially when control testing is conducted in different study groups, gives an ambiguous result: a question to which no one can give the correct answer is either incorrectly formulated or covers a topic that is extremely poorly treated in the learning process. A question that everyone answers correctly is either poorly formulated (the text of the question contains hints), or covers a topic that is very well treated in the learning process and correctly mastered by the whole group.

Such ambiguity in the analysis of statistics leads to the question of the time and form of statistical analysis and testing in general.
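The cumulative per-question statistics described above can be sketched in a few lines. The data, the question identifiers and the thresholds below are hypothetical assumptions used only for illustration; the sketch computes the share of correct answers for each uniquely identified question and flags the two suspicious cases.

```python
from typing import Dict, List

def share_correct(answers: Dict[str, List[int]]) -> Dict[str, float]:
    """Share of correct answers per question id (1 = correct, 0 = incorrect)."""
    return {qid: sum(marks) / len(marks) for qid, marks in answers.items()}

def flag_questions(shares: Dict[str, float], low: float = 0.1,
                   high: float = 0.9) -> Dict[str, str]:
    """Mark questions that almost nobody or almost everybody answers correctly.
    The thresholds are illustrative assumptions, not values from the text."""
    flags = {}
    for qid, share in shares.items():
        if share <= low:
            flags[qid] = "almost no correct answers: check wording or topic coverage"
        elif share >= high:
            flags[qid] = "almost all answers correct: check for hints in the wording"
    return flags

# Hypothetical accumulated answers, keyed by a unique question id.
answers = {
    "Q-001": [0, 0, 0, 0, 0],
    "Q-002": [1, 0, 1, 1, 0],
    "Q-003": [1, 1, 1, 1, 1],
}
shares = share_correct(answers)
flags = flag_questions(shares)
for qid, note in flags.items():
    print(qid, note)
```

The statistics only point to questions that need attention; deciding whether the wording or the teaching of the topic is at fault still requires a review by the teacher.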

The main forms of test tasks are: open-form tasks, closed-form tasks, matching (correspondence) tasks, and tasks for establishing the correct sequence.

1. Tasks with the choice of one or more correct answers. Among these tasks there are such varieties as:

1.1. Selecting one correct answer according to the principle: one is correct, all the others (one, two, three, etc.) are incorrect.

For example: a deficiency of which vitamin disrupts bone growth and development?

a) vitamin A;

b) vitamin B;

c) vitamin C;

d) vitamin D.

1.2. Selecting multiple correct answers.

1.3. Select one, most correct answer.

For example, organic substances include:

a) proteins;

b) proteins and carbohydrates;

c) proteins, carbohydrates and fats;

d) proteins, carbohydrates, fats and mineral salts.

Each of the answers is generally plausible, but the 1st and 2nd answers are incomplete. The 4th answer is also not correct, since mineral salts are not classified as organic substances.

2. Open form tasks. The tasks are formulated in such a way that there is no ready answer; the test taker must formulate the answer and enter it in the space provided.

3. Matching (correspondence) tasks, where elements of one set need to be matched with elements of another set.

For example, match:

Habitat              Organisms
1) Organismal        a) crucian carp
2) Water             b) jellyfish
3) Soil              c) mole
4) Ground-air        d) earthworm
5) Land-aquatic      e) sparrow
                     f) tiger
                     g) roundworm
                     h) frog
                     i) dysenteric amoeba

4. Tasks to establish the correct sequence (calculations, actions, steps, operations, terms in definitions).

The choice of form depends on:

testing goals;

test content;

technical capabilities;

level of teacher preparedness in the field of theory and methods of testing knowledge control.

The best test is one that has broad content and covers deeper levels of knowledge.

The test task includes:

a) the stating part, which describes a situation (it may be absent) and does not require any active actions from the test taker;

b) the procedural part, which asks the student to perform specific actions: select the correct element from the proposed set, establish a correspondence or the correct sequence, name a date, write down a name, etc. The procedural part is the information after receiving which the student is required to take active steps, not only studying and analyzing the material contained in the task, but also composing and entering an answer;

c) the elements of choice themselves.

General rules for all forms of test tasks. It is necessary to ensure that the task is formulated correctly. The test task must be formulated clearly and specifically, without allowing ambiguity in the answer. The optimal number of response elements is 5-8, but there are exceptions.

The procedural part of the test task should be as short as possible – do not exceed 5-10 words. The test task must be formulated in an affirmative form. It is not allowed to define a concept by listing elements that are not included in it.

For all forms of the test task there must be standard instructions. All elements in tasks must be selected according to some specific principle chosen by the author. A large number of test tasks that are simple in structure is preferable to a small number of complex ones.

In complex separation test tasks, all possible alternatives must be listed; otherwise the student’s idea of the classification or structure of the underlying object is distorted.

Open form tests must meet the following requirements:

the complementary word or phrase is placed at the end and must be unique;

only essential elements should be left for completion;

when formulating a task, it is desirable that the addition be in the nominative case;

all dashes for additions must be the same length;

it is advisable to give the student a sample answer.

Closed-form test tasks must meet the following requirements:

equal plausibility of elements;

it is desirable that all selection elements be equal in length;

in selection elements it is desirable to use one object or an equal number of objects;

it is necessary to exclude repeated words in the answers;

all items must be true statements, but only one of them is a correct answer to a given item, and the rest may be true to other items in this test or in other tests.

Matching (correspondence) tasks contain two sets: the right column is for choice, the left column is for the answer. The right column is given, for example, 1-3 more elements than the left one, so that at the last substitution the student still has a choice rather than an automatically substituted remainder. All elements are true statements.

In test tasks for establishing the correct sequence, the elements can be presented in alphabetical order. If the alphabetical order itself is the correct answer, the elements are arranged randomly.

To prevent answers from being borrowed from a neighbor, tasks of all forms should be formulated in 2-3 variants that are synonymous in meaning and are selected at random. In closed-form and matching tasks, the elements are presented in a random order produced by a random number generator. Elements of the task in these forms are formed according to the principle of “main” and “replacement” players. For example, when a student is shown 5 elements, the author prepares not “1 correct + 4 incorrect” but “1 correct + 4 main incorrect + 5 spare incorrect”, and the 4 incorrect elements shown are randomly selected from the 9 available.
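The “main” and “replacement” principle can be sketched as follows. The function name, the parameters and the sample strings are hypothetical; the sketch assumes that the incorrect elements shown to a student are drawn at random from the combined pool of main and spare incorrect elements and that the final order is shuffled.

```python
import random
from typing import List, Optional

def build_options(correct: str, main_wrong: List[str], spare_wrong: List[str],
                  n_shown: int = 5,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Return `n_shown` answer elements in random order:
    the correct one plus incorrect ones sampled from the whole pool."""
    rng = rng or random.Random()
    pool = main_wrong + spare_wrong                 # e.g. 4 main + 5 spare = 9
    distractors = rng.sample(pool, n_shown - 1)     # 4 of the 9, without repeats
    options = [correct] + distractors
    rng.shuffle(options)                            # random arrangement on screen
    return options

# Hypothetical task: one correct element, 4 main and 5 spare incorrect ones.
main = ["wrong 1", "wrong 2", "wrong 3", "wrong 4"]
spare = ["wrong 5", "wrong 6", "wrong 7", "wrong 8", "wrong 9"]
print(build_options("correct", main, spare))
```

Because each student receives a different subset and a different order, the position of the correct element on a neighbor's screen carries no useful information.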

Methods for assessing test quality criteria. Classical test theory is based on correlation theory, the main parameters of which are reliability and validity. Reliability is the stability of the test results obtained when using it. Validity is the suitability of the test, i.e. the ability to qualitatively measure what it was created for according to the authors’ intentions.

There is a strict scientific theory of tests that makes it possible to methodologically and methodically justify their use and processing of test results. A scientifically valid test is a method that meets established standards of reliability and validity (score from 0 to 1; the closer to 1, the better the test).
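Since classical test theory relies on correlation, one common way to obtain a reliability estimate is to correlate the scores of the same students on two administrations of a test. The original text does not specify an estimation method, so the test-retest approach and the scores below are assumptions made for illustration.

```python
from math import sqrt
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical scores of five students on two administrations of one test.
first_attempt = [12, 18, 25, 30, 41]
second_attempt = [14, 17, 27, 29, 40]
print(round(pearson(first_attempt, second_attempt), 3))
```

A value close to 1 indicates that the test ranks students consistently from one administration to the next.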

By orientation, tests are classified as norm-oriented (ranking students from strong to weak) and criterion-oriented (sorting tasks from difficult to easy).

Based on the nature of the actions, tests are divided into verbal (expressed in words) and non-verbal (represented by images).

According to the degree of homogeneity of tasks, tests are divided into homogeneous (for one discipline) and heterogeneous (in several disciplines).

By purpose of use, tests are divided into those given at the beginning of training, those tracking progress and difficulties in the learning process, and those measuring achievements at the end of training. The practice of higher education shows that the most applicable are criterion-oriented, overwhelmingly verbal, homogeneous tests, usually aimed at achievement at the end of training.

Assessment of test tasks can be polytomous (if one of the 10 elements of a task is done incorrectly, the score is 9 points) or dichotomous (all elements done correctly earn 1 point, otherwise 0 points).
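The difference between the two assessment schemes can be shown in a few lines. This is a minimal sketch; the function names are illustrative, and each task is assumed to be represented as a list of True/False results, one per element.

```python
def polytomous_score(element_results):
    """One point for every correctly completed element of the task."""
    return sum(1 for ok in element_results if ok)

def dichotomous_score(element_results):
    """One point only if every element is correct, otherwise zero."""
    return 1 if all(element_results) else 0

if __name__ == "__main__":
    results = [True] * 9 + [False]     # 10 elements, one done incorrectly
    print(polytomous_score(results))   # 9
    print(dichotomous_score(results))  # 0
```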

According to the degree of difficulty, tasks can be single-level, i.e. with a weighting coefficient equal to one, and multi-level, with a weighting coefficient from 0 to N.

The length of the test refers to the number of tasks included in the test. Classic test theory states that the longer the test, the more reliable it is. But practice shows that if the test is very long, then motivation and attention deteriorate. In practice, the length of the test should be determined empirically, taking into account validity, testing time, etc. The optimal length of the test, as theory and practice have shown, is 30-60 tasks. The ratio of the test length to the number of test items in the bank should tend to 1:10.

Each test has an optimal testing time: the time from the start of the testing procedure until the onset of fatigue. The threshold for the onset of fatigue varies widely, from 20 to 100 minutes within one age group. The main factors in fatigue are age, motivation, monotony of the work performed, and individual characteristics of the subjects. It is therefore necessary to maintain motivation at the required level, to diversify the work as much as possible by introducing all forms of tasks and non-verbal support, and to adapt the software product to the individual characteristics of the subjects. The average estimated time until fatigue for students is 50-80 minutes, which sets the maximum duration. The minimum depends on the forms, number and difficulty of the tasks and on the number of elements in each task. For example, 10-15 seconds are enough for an easy closed-form test task with the choice of one element from those proposed. The actual time limits should be refined in the course of testing.

Ratio of task forms in the test. The choice of test task form depends on the content of the course, the purpose of creating the test, and the skill of the developer. An average layout could be as follows: in a test of, for example, 60 tasks, it is recommended to have no more than 10 open-form tasks and approximately 10 each for matching and sequence, with the remaining 30 tasks given in closed form.

2.4 Types of computer control questions

Probably the biggest misconception of the developers of most testing programs is reliance on the so-called single-choice question: the student is asked a question and given several ready-made answer options (usually five, which makes it convenient to derive a grade), one of which is correct. There really is a class of control questions that can be implemented this way, and the probability of guessing (20%) is quite low, but focusing exclusively on single choice rules out the richest opportunities for using pedagogical technologies when conducting control.

In addition, it is no secret how students get around this type of control: sooner or later a printout with the correct answers falls into their hands, and the order of answers is simply memorized or written on a cheat sheet. Only a few knowledge control systems implement the function of changing the position of the correct answer at each testing session.

What types of questions does the computer version of programmed control allow you to use?

Free type, or keyboard input. This is the most powerful tool for checking various kinds of terms, constants and dates. However, its implementation is usually mathematically very complex, so most developers ignore it. The problem lies, first of all, in the fact that the entered phrase must be subjected to syntactic and, ideally, semantic analysis that models the respondent’s possible lines of thinking. In addition, a student can make a typo, and in most areas of knowledge such typos cannot be counted as errors; handling them requires very flexible program logic, which not every programmer can implement. A student may also enter various synonyms in a free answer, which the database developer may not have anticipated and which may nevertheless be fully or partially correct. Finally, a free-type question may have several possible answers.
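A small part of this problem, tolerance to typos and a list of accepted synonyms, can be sketched with the standard Python `difflib` module. This is only an illustration: the function names and the threshold of 0.85 are assumptions, and character-level similarity is in no sense the semantic analysis discussed above.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Ignore letter case and redundant whitespace."""
    return " ".join(text.casefold().split())

def check_free_answer(entered, accepted, threshold=0.85):
    """Compare the entered text with every accepted answer (including synonyms).

    Returns (is_accepted, best_similarity). The threshold is an arbitrary assumption
    and would have to be tuned for each subject area.
    """
    entered = normalize(entered)
    best = max(SequenceMatcher(None, entered, normalize(a)).ratio() for a in accepted)
    return best >= threshold, best

if __name__ == "__main__":
    accepted = ["mitochondria", "mitochondrion"]
    print(check_free_answer("  Mitocondria ", accepted))  # typo, still close to an accepted answer
    print(check_free_answer("ribosome", accepted))        # a different term
```

A threshold that is too low will accept wrong terms that merely look similar, so in subjects where one letter changes the meaning (for example, chemical names) exact matching may be the safer choice.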

There are also a number of variations of the free question type:

Entering multiple answers in a specific sequence can be used in questions about a strict sequence of operations, relative positions, etc. This type of question is just as difficult to program as the free type; it is also very difficult to construct and causes certain difficulties for students, since it requires not only error-free input of the answers but also their correct relative order. However, despite its rather rare use, this type is indispensable and is a powerful means of determining the student’s level of knowledge in matters such as the relative position of organs in topographic anatomy, the sequence of transformations of a substance in chemistry, the sequence of actions in various types of repair work, etc.;

Entering missing parts of lines or letters. Despite its apparent simplicity, this is an indispensable tool for checking the understanding of various language constructs (in Russian and foreign languages, in programming, etc.). Unlike the standard “Free” type of question, it generally assumes unambiguous answer options and is therefore easier to program;

Selective question type. The classic option, which the vast majority of developers consider necessary and sufficient for computer testing. This type of question may require one or more correct answers from those given. Some theorists divide these two varieties into different types of questions, but from the point of view of formal logic, these varieties are absolutely equivalent. The only question is the methodology for deriving results for these varieties.

The computer implementation of this type is extremely simple. Perhaps this is precisely the reason for its widespread use in various types of testing programs. To implement this type, even basic knowledge in any programming language or in programmable office systems such as Excel or Quattro is sufficient.

The selective question type also has variations:

Alternative type is the most simplified form and assumes a ready-made answer already in the text of the question. The subject can only indicate whether the answer is correct or not (i.e. answer “Yes” or “No”). Despite its apparent simplicity, this type can be successfully used in some areas of knowledge.

A variation of the selective type is a question type called "Selection". However, it differs from the standard selective type only in the result output system.

Sequential question type. The most difficult type for students, although quite simple to implement, it gives the teacher a powerful tool for assessing not only specific knowledge, but also logic.

A simplified version of the sequential type - "Rearrangement" assumes that the student is asked a question and given a set of ready-made correct answers. His task is to arrange these answers in the required sequence.

Like the “Sequence” type, this variety can be used in those subject areas where a clear knowledge of the sequence of operations, actions or the correct relative position of objects is required. However, unlike the “Sequence” type, this variety can be used much more widely, since it does not contain the “pitfalls” of students incorrectly formulating any term - all the answers are already on the screen.
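The "Rearrangement" variety is simple enough to sketch in full. The function names below are illustrative, and the partial score (the share of elements standing in their correct positions) is only one possible convention, assumed here for demonstration.

```python
import random

def present_rearrangement(correct_order, rng=None):
    """Return the ready-made answers in random order for display to the student."""
    if rng is None:
        rng = random.Random()
    shuffled = list(correct_order)
    rng.shuffle(shuffled)
    return shuffled

def hard_score(student_order, correct_order):
    """1 only if the whole sequence is reproduced exactly."""
    return 1 if list(student_order) == list(correct_order) else 0

def positional_score(student_order, correct_order):
    """Share of elements placed in their correct positions (an assumed convention)."""
    hits = sum(1 for s, c in zip(student_order, correct_order) if s == c)
    return hits / len(correct_order)

if __name__ == "__main__":
    steps = ["drain oil", "replace filter", "refill oil", "check level"]
    print(present_rearrangement(steps))
    answer = ["drain oil", "refill oil", "replace filter", "check level"]
    print(hard_score(answer, steps), positional_score(answer, steps))  # 0 0.5
```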

A more complicated version of the sequential type - "Arrangement" is the most complex of all types, both in terms of the complexity of programming and the complexity of its perception by students. However, it is this type that provides the most extensive opportunities for checking logic. The construction of a question of this type consists formally in students constructing a graph of a logical structure. The text of the question lists certain numbered provisions (points), and the text of the answers contains conclusions or facts corresponding to these points. The student is required to match the points listed in the question with the ready-made answers.

Chapter 3. Methodology for conducting a computer survey of students

3.1 Methodology for conducting a programmed survey

The problem of organizing collective forms of educational activity is particularly relevant to the specifics of conducting classes in classrooms equipped with a local computer network. The use of the network provides the teacher with new opportunities to manage the educational process, on the one hand, and, on the other hand, provides opportunities for effective independent educational work of students to complete practical tasks.

A local computer network makes it possible to present any action as a detailed sequence of operations, to show its result and the conditions for its execution, to record intermediate results, and to interpret and evaluate each step students take when performing tasks.

For a teacher, a computer network makes it possible to carry out both final and operational control, to accumulate final information related to both an individual student and the entire group as a whole. A computer network makes it possible to qualitatively change the system for checking the activities of students, while providing flexibility in managing the educational process. Working on one common database allows you to check the correctness of all tasks and not only record the error, but also determine its nature, which helps to eliminate the cause that caused its occurrence in a timely manner.

The selection of topics and possible options for test tasks are prepared in advance. The content of test tasks is formulated in such a way as to show the applicability in practice of the knowledge and skills necessary for mastering the material.

Individualization of training can be realized by differentiating the content of the presented educational material, as well as selecting test tasks according to the level of complexity.

Selecting the degree of difficulty of tasks plays an important role. Excessively simple tasks do not require mental effort from the student, and therefore inhibit the formation of the necessary skills. Correct completion of relatively easy tasks is not experienced by the learner as success. At the same time, many of the mistakes activate the creative potential of students and have a positive effect on the activation of cognitive needs and on the motivational sphere.

The means of creating educational and cognitive motivation can be both the content of the test task and the form of organizing the activity (educational-game, group, individual).

At the discretion of the teacher, students may be offered a plan for completing the test task, and they are allowed to work with workbooks and literature. The teacher can maintain the interest of students by engaging in the process of discussing individual nuances when performing a test task.

The lesson can be structured in such a way as to direct it to the maximum development of students. To do this, at the moment when they have a feeling of completion of the proposed test tasks, the teacher, in order to activate further cognitive activity of students, can pose problematic questions to them on the topic being studied, arousing cognitive interest. As a result of resolving this difficulty, students gain new knowledge and skills. Thus, the work of a group of students on completing test tasks can be carried out in the mode of sequential problem solving.

After students complete their work on the test tasks, the teacher can organize a group discussion of the tasks that caused the greatest difficulties. It is advisable to include in the discussion issues that remain unexamined and to look for possible ways to resolve them. Thus, the teacher not only exercises control but also becomes the organizer of the process of independent active acquisition of new knowledge by students.

3.2 Processing test results

The issue of deriving an assessment is probably one of the most difficult and controversial in pedagogy. In fact, it is easy to ask a question, but determining whether the student answered correctly, how correctly he answered, whether he thought correctly despite the wrong answer is a task that is far from completely solved. Accordingly, the computer analogue of deriving an estimate suffers from the same shortcomings, if not more.

In most programmed control systems, the principle of deriving the result is simple. Since such systems, as a rule, use only single-choice questions, the score is calculated simply: answered correctly is a plus, did not answer correctly is a minus. The number of pluses and minuses is then reduced to a five-point scale and a grade is displayed.

Although primitive, this principle of deriving a grade is legitimate when all the questions in the database are equivalent and of the same type. However, the direct reduction of the number of positive and negative answers to a five-point system deserves serious criticism. It is generally accepted that the passing threshold for the absorption coefficient is 70%. Under direct reduction, however, a passing grade (i.e. “satisfactory”) requires only 51% of the questions to be answered correctly, a “good” grade requires 71%, and an “excellent” grade requires 91%.
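The direct reduction criticized here amounts to a simple threshold function. The sketch below uses the percentages quoted above; the numeric grades 2 to 5 for “unsatisfactory” through “excellent” are an assumption about the scale, and the function name is illustrative.

```python
def five_point_grade(correct, total):
    """Map the share of correct answers to a five-point grade by direct reduction."""
    percent = 100 * correct / total
    if percent >= 91:
        return 5  # "excellent"
    if percent >= 71:
        return 4  # "good"
    if percent >= 51:
        return 3  # "satisfactory", i.e. the passing grade
    return 2      # "unsatisfactory"

if __name__ == "__main__":
    # 51 correct answers out of 100 already pass, well below the 70% threshold.
    print(five_point_grade(51, 100))  # 3
    print(five_point_grade(69, 100))  # 3
```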

In practice, however, the approach described above is rarely used as is, since the unequal value of the questions in the database is obvious to all developers of testing systems. Another technique is for developers to let the teacher set the “weight”, i.e. the relative importance, of each question in the database.
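A weighted result can be computed as the share of earned weight in the total weight. This is a minimal sketch under that assumption; the function name and the example weights are illustrative.

```python
def weighted_percent(results, weights):
    """Percentage of the total weight earned by correctly answered questions."""
    earned = sum(w for ok, w in zip(results, weights) if ok)
    return 100 * earned / sum(weights)

if __name__ == "__main__":
    results = [True, False, True]  # correctness of three answers
    weights = [1, 2, 3]            # weights assigned by the teacher
    print(round(weighted_percent(results, weights), 1))  # 66.7
```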

This technique, despite its apparent effectiveness, also has its drawbacks. The fact is that from the point of view of pedagogical theory there are no simple and complex questions (if we are talking specifically about questions, and not about mathematical and logical problems that require a multi-component solution). A question is always simple for those who know the answer and difficult for those who do not. Thus, by assigning “weights” to questions, the teacher actually ranks them according to his own ideas about their complexity, that is, according to his own level of competence or incompetence.

However, there are questions that require more or less time to answer. It is logical to assume that, from the point of view of psychophysiology, each question requires a greater or lesser number of mental (so-called significant) operations. Determining this number is, as a rule, not difficult: in simple types of questions it is equal to the number of proposed answer options, and the count is fully amenable to automation.

Thus, at the moment there are two ways to determine the result of an answer - by correct or incorrect answers to the question as a whole and by significant operations. When choosing an assessment principle, it should be assumed that the assessment for significant operations is more flexible and objective, since it allows you to identify incomplete, not entirely correct, partially erroneous and other similar answers and calculate them in specific figures for the absorption coefficient.

The flexibility of evaluation by significant operations lies in the possibility of introducing a so-called “soft evaluation”. Answer-based grading generally always uses a “hard evaluation”, i.e. if a student makes a mistake, the entire question does not count. However, this assessment method is not justified for all questions. For example, in most questions that have several correct answer options (the selective type of question is meant), it is not necessary to mark all the correct answers. In such questions, a partially correct answer, or an answer that merely contains no wrong options, is entirely acceptable. Evaluation by significant operations makes it possible in such questions to determine a coefficient of correctness of the answer and to count partially correct answers.
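For a selective question, the coefficient of correctness can be sketched as follows. The sketch assumes that each proposed option counts as one significant operation (the decision to mark it or leave it unmarked), in line with the remark that the number of operations equals the number of options. The function names are illustrative.

```python
def correctness_coefficient(marked, correct, options):
    """Soft evaluation: share of options on which the student decided correctly."""
    marked, correct = set(marked), set(correct)
    right_operations = sum(1 for o in options if (o in marked) == (o in correct))
    return right_operations / len(options)

def hard_evaluation(marked, correct):
    """Hard evaluation: the question counts only if the answer is entirely correct."""
    return 1 if set(marked) == set(correct) else 0

if __name__ == "__main__":
    options = ["A", "B", "C", "D", "E"]
    correct = {"A", "C"}
    marked = {"A"}  # one correct option found, nothing wrong marked
    print(hard_evaluation(marked, correct))                   # 0
    print(correctness_coefficient(marked, correct, options))  # 0.8
```

Under hard evaluation this answer is worth nothing, whereas under soft evaluation four of the five decisions are counted as correct.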

Conclusion

One of the significant trends in the development of education is the search for innovative methods of knowledge control that meet the requirements of objectivity, reliability, and technological effectiveness. At the present stage, among effective methods for assessing the abilities and achievements of students, an important role is given to computer control of knowledge, which today is successfully used in educational institutions at various levels, from schools to universities.

Compared to traditional forms of control, computer testing has a number of advantages: quick receipt of test results, freeing the teacher from the labor-intensive work of processing test results, unambiguous recording of answers, confidentiality during anonymous testing.

An analysis of modern literature on this issue identified the following requirements for a unified automated testing system:

protection against unauthorized access to test questions. This problem can be solved by means of data encryption;

unlimited test base, which is designed both for test variety and for less repetition of questions;

simplicity of the program interface. Many specialists, especially whose specialization is not related to information technology, are quite poor at handling computers and computer programs, so the understandability and accessibility of the interface is an important requirement for the testing system;

ease of test administration. This requirement is also important. The easier the environment for developing themes and tests is, the fewer questions will arise regarding working on a computer. Ease of administration is achieved by using a separate program for creating or entering topics and tests into the database and setting parameters;

full automation of the testing process. Testing must be carried out without the supervision of teaching staff over the course of testing. Therefore, the entire process - from asking test questions by the teacher, identifying a specialist, conducting testing, to evaluating the result obtained and entering this result into a data file, must take place in a completely autonomous mode;

loading speed. This criterion is important for computers with low speed. A person should not wait for a long time for the question to load. Every picture and graph must be optimized or compressed. They should not contain redundant information, but include only the necessary part;

portability to different platforms with support for Microsoft Windows GUI;

accounting of requests. Each test should be recorded to ensure control. This is necessary to account for failed testing attempts in case the test was interrupted for any reason. This will provide control over user actions;

targeting non-programming users. Using a testing program should not require experience with other applications;

The testing system must support multimedia files (graphics, video, sound, animation). This is necessary for asking complex questions, such as displaying graphs, drawings, videos, etc.

Analysis of the literature made it possible to identify the following types of computer knowledge control questions: free type, or keyboard input; entering several answers in a certain sequence (ranking); entering missing parts of lines or letters; selective question type; alternative question type; sequential question type. To effectively control knowledge, it is necessary to competently use all types of questions.

At the moment, there are two ways to determine the result of an answer - by correct or incorrect answers to the question as a whole and by significant operations. When choosing an assessment principle, it should be assumed that the assessment for significant operations is more flexible and objective, since it allows you to identify incomplete, not entirely correct, partially erroneous and other similar answers and calculate them in specific figures for the absorption coefficient.

The test system has the following important characteristics:

adaptability, i.e. the ability of the system to adapt to changing conditions (hardware and software);

openness is determined by the ability of the system to adapt to the control of specific academic disciplines;

the standardization of the system is expressed by the use of functions and design used in general use programs;

the unification lies in the fact that on the basis of this system it is possible to create similar ones.

The knowledge control system implemented in the course of this study is an automated support for students’ independent work, allowing for monitoring and self-monitoring of the level of material mastery, and acting as a simulator in preparation for exams.

The developed knowledge control system will solve the problem of automating the creation of tests and testing procedures, and can be used to control the process of students mastering the material of various academic disciplines.

Lecture 11. Computer testing in education.

1. Specifics of computer testing and its forms.

2. Innovative forms of test tasks for computer testing.

3. Computer adaptive testing.

4. Online testing, its application in distance learning.

1. Specifics of computer testing and its forms

General ideas about computer testing. Since the beginning of the 21st century, computers have been widely used in educational testing. A separate direction has appeared in pedagogical innovations, computer testing, in which the presentation of tests, the assessment of student results and the delivery of results to students are carried out using a PC.

When is it necessary to use computer testing? Although computer testing greatly facilitates the teacher’s work in presenting tests and assessing results, its spread is in many ways nothing more than a tribute to fashion, whose negative consequences have not yet been fully identified. The choice of a computer-based exam format should rest on more substantial reasons than a mere passion for innovation, since it creates many problems and puts students on an unequal footing. Computer testing should be used in cases where there is an urgent need to abandon traditional paper-based (“blank”) tests.

For example, computer testing is necessary when conducting the Unified State Exam in hard-to-reach areas of Russia. Gathering graduates of individual districts at the designated time of the Unified State Exam becomes such a complex and expensive undertaking that it is simply impossible to do without computer testing and modern means of communication. It is also advisable to use computer testing when conducting exams for children with disabilities who have serious visual or hearing impairments. With a PC, it is possible to use larger fonts, audio recordings, additional devices for entering test data, and other aids that compensate for the difficulties children with disabilities may face in exams.

Forms of computer testing. Computer testing can be carried out in various forms, differing in the technology of combining tasks into a test (Fig. 17). Some of them have not yet received a special name in the literature on testing issues.

Fig. 17. Computer testing forms

The second form of computer testing involves the automated generation of test variants using software tools. Variants are created before the exam or directly during it from a bank of calibrated test items with stable statistical characteristics. Calibration is achieved through lengthy preliminary work on forming the bank: the task parameters are obtained on a representative sample of students, usually over 3-4 years, using form tests. The content validity and parallelism of the variants are ensured by a strictly regulated selection of tasks for each variant in accordance with the test specification.

The third form, computer adaptive testing, is based on special adaptive tests. The idea of adaptivity rests on the premise that it is useless to give a student test tasks that he will certainly complete correctly without the slightest difficulty, or that he is guaranteed to fail because of their high difficulty. It is therefore proposed to optimize the difficulty of the tasks, adapting it to the level of preparedness of each test taker, and to reduce the length of the test by eliminating some tasks.
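The adaptive principle can be illustrated with a deliberately simplified “staircase” rule: after a correct answer the next task is harder, after an incorrect one it is easier. This is only a sketch under assumptions. Real adaptive tests often rely on item response theory, which is not modeled here, and the function name, the 1-5 difficulty scale and the stopping rule (a fixed number of tasks) are all hypothetical.

```python
def run_adaptive_test(bank, answer_fn, start=3, max_items=5, lo=1, hi=5):
    """Present tasks whose difficulty follows the test taker's answers.

    bank      -- list of (task_id, difficulty) pairs, difficulty on an assumed lo..hi scale
    answer_fn -- callable: task_id -> True if answered correctly
    Returns the final difficulty level and the log of presented tasks.
    """
    level, used, log = start, set(), []
    for _ in range(max_items):
        # Unused tasks ranked by closeness to the current target difficulty.
        candidates = [(abs(d - level), task_id, d) for task_id, d in bank if task_id not in used]
        if not candidates:
            break
        _, task_id, difficulty = min(candidates)
        used.add(task_id)
        correct = answer_fn(task_id)
        log.append((task_id, difficulty, correct))
        # Step up after a correct answer, step down after an incorrect one.
        level = min(hi, level + 1) if correct else max(lo, level - 1)
    return level, log

if __name__ == "__main__":
    bank = [(f"q{d}{k}", d) for d in range(1, 6) for k in "ab"]
    difficulty_of = dict(bank)
    # A simulated student who copes with tasks up to difficulty 3.
    level, log = run_adaptive_test(bank, lambda t: difficulty_of[t] <= 3)
    for entry in log:
        print(entry)
```

In this simulation the tasks of difficulty 1 and 5 are never shown: the procedure stays near the student's level, which is exactly how the test is shortened.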

Advantages and disadvantages of computer testing. Computer testing has certain advantages over traditional form testing, which are especially noticeable during mass examinations, for example, during national exams such as the Unified State Exam. Presenting test variants on a computer saves the money that is usually spent on printing and transporting blank tests.

Other advantages of computer testing are manifested in ongoing monitoring, self-control and self-preparation of students: thanks to the computer, a test score can be issued immediately, and immediate measures can be taken to correct the assimilation of new material based on an analysis of the protocols of corrective and diagnostic tests. The possibilities of pedagogical control in computer testing are significantly increased by expanding the range of measured skills through innovative types of test tasks that use the diverse capabilities of the computer: audio and video files, interactivity, dynamic presentation of problems using multimedia tools, etc.

In addition to its undeniable advantages, computer testing has a number of disadvantages, which are presented in Fig. 18.

Fig. 18. Problems arising during computer testing

Typical psychological and emotional reactions of students during computer testing. Typically, students' psychological and emotional reactions to computer testing are positive. Students like the immediate issuance of test scores, a testing protocol with results for each task, as well as the very innovative nature of control when modern hypermedia technologies are used to issue the test. Dynamic multimedia support of tasks on a computer, combined with software for presentation in an interactive mode, according to students, provides a more accurate assessment of knowledge and skills, and is more motivating to complete tasks compared to blank tests. It is also convenient that instead of filling out special forms for answers, you can simply select an answer with the mouse. If testing is carried out in adaptive mode, the exam time and test length are reduced.

Negative reactions are usually caused by various restrictions that are sometimes imposed when presenting tasks in computer testing. For example, either the order in which tasks are presented is fixed, or a maximum time is set for completing each task, after which the next test task appears regardless of the subject’s wishes. In adaptive testing, students are unhappy that they cannot skip a task, review the entire test before starting work on it, or change answers to previous tasks. Sometimes students object to computer-based testing because of the difficulties that arise in performing and recording mathematical calculations, etc.

The impact of prior level of computer experience on test performance. The results of foreign studies have shown that schoolchildren’s experience of working with computers in many cases significantly affects the validity of the test results. If the test includes non-innovative multiple-choice items, then the impact of computer experience on test scores is negligible, since such items do not require students to perform any complex actions during the test. When innovative types of tasks are presented on the screen, making extensive use of computer graphics and other innovations, the influence of previous computer experience on the test score becomes very significant. Thus, in computer-based testing, it is necessary to take into account the level of computer experience of the students for whom the test is intended.

The influence of the user interface on the results of computer testing. The user interface includes the functions available to the student and the ability to move through test tasks, elements for placing information on the screen, as well as the general visual style of presenting information. A good user interface should have a clear and logical flow of interaction with the examinee, reflecting the general principles of graphical information design. The more elaborate the interface, the less attention the student pays to it, concentrating all his efforts on completing the test tasks.

2. Innovative forms of test tasks using computer testing.

Goals of developing innovative tasks in computer testing. Innovative tasks that use the capabilities of computer testing are today the most promising direction in the development of automation of pedagogical measurements. The main reason for this is the great potential of innovative tasks to increase the information content of pedagogical measurements and increase the content validity of tests.

The main goal of developing innovative tasks for computer testing is to assess those cognitive skills, functional literacy and communication skills that remain undetected by traditional control or the use of blank tests.

The subject of assessment for innovations can be the level of analytical and synthetic activity of the student, the speed of generalization of new information, the flexibility of the thought process and many other indicators of mental activity that were formed during the learning process and cannot be assessed using conventional tests.

Possibilities of innovative tasks in computer testing. In the use of innovative tasks, two aspects can be distinguished: didactic and psychological-pedagogical. The first involves a detailed, meaningful interpretation of test results in the context of the cognitive, academic and general educational skills mastered at the time the test was presented, and the second makes it possible to assess the level of development of the student’s thought processes and identify the characteristics of his assimilation of new knowledge. Most of the innovative tasks developed to date improve measurement in both directions. Thus, innovative tasks expand the capabilities of pedagogical measurement itself by obtaining results in new, previously inaccessible areas of assessing the quality of students’ preparedness. For example, to assess the level of functional literacy, examinees can be offered a passage of text that contains errors and asked to identify and correct them by retyping sections of the text.

Innovative tasks help reduce the influence of random guessing by increasing the number of possible answers without making the test items more cumbersome. For example, when assessing reading comprehension, you might ask the student to select a key sentence in the text and point to it with a mouse click. Thus, each sentence in a text passage becomes an option to choose from instead of 4-5 answers in traditional tasks with ready-made answers. To improve the form of tasks, complex drawings and dynamic elements, including images, animation or video, are used, thereby reducing the time required to read the task statement. Testing capabilities are expanded further when sound is included, which makes it possible to conduct a dialogue with the student, evaluate the phonetic features of his pronunciation when testing in a foreign language, and check the correct interpretation of various sounds.
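The effect of the number of options on guessing can be estimated with the binomial distribution. The sketch below assumes that every guess is independent and uniformly random over the options; the function name and the example numbers (10 tasks, 6 correct answers required) are illustrative.

```python
from math import comb

def p_guess_at_least(n_tasks, n_options, need_correct):
    """Probability of getting at least `need_correct` of `n_tasks` right by pure guessing."""
    p = 1 / n_options  # chance of guessing a single task
    return sum(
        comb(n_tasks, k) * p**k * (1 - p) ** (n_tasks - k)
        for k in range(need_correct, n_tasks + 1)
    )

if __name__ == "__main__":
    # Traditional task with 5 ready-made answers versus a passage of 20 sentences.
    for n_options in (5, 20):
        print(n_options, p_guess_at_least(10, n_options, 6))
```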

The main directions of innovation in the development of tasks. Innovations in the development of computer-based testing tasks cover five interrelated areas. These include: the form of the task, the subject’s actions when answering, the level of use of multimedia technologies, the level of interactivity and the scoring method.

Innovations in the form of a task include visual and audio information series or a combination of them. Visual information can be realistic (photo, film) or synthesized (drawing, animation) in nature. The type of information in combination with the test form determines the format of the answer selected or created by the test taker. When using photographs or drawings, the information contained in test tasks is static. Film, reflecting the real world, and animation bring dynamics to the execution of the test.

The student’s actions when answering tasks depend on those innovative tools that are included in the test. When audio information is included in tasks that require a student's vocal response, a keyboard, mouse, or microphone is used to respond. A significant place in the answers is given to interactive processes. The interactive mode of work of students during computer testing means the alternate delivery of audiovisual information, in which each new statement on the part of the student or the computer is constructed taking into account previous information from both sides. When organizing an interactive mode in computer testing, an on-screen menu is mainly used, in which the student selects, creates or moves objects - components of the answer - to answer test tasks. Less often in interactive mode, voice input of a response is used.

In general, the level of interactivity provided in computer-based testing characterizes the degree to which a particular form of the task reacts or responds to input from the examinee. This level varies from the simplest case, when one step is taken, to complex, multi-step tasks with branching after each successive student answer.

Comparative characteristics of innovative forms of tasks in computer testing, for various purposes of improving pedagogical measurement, are given in Table 5.

Problems that arise when using tasks of increased difficulty in computer testing. Tasks of increased difficulty always require more time to answer, regardless of whether they are presented using computer simulation of virtual reality, whether they take the form of laboratory work, essays, or use multimedia technologies. Due to time costs, the number of complex tasks should be small - no more than 10-15%, in some cases - 20-25%. The variety of sound and visual images in computer testing leads to fatigue among schoolchildren, therefore, when including even a small number of difficult innovative tasks in the test, it is necessary to significantly reduce the length of the test, which negatively affects the content validity, reliability and information security of the pedagogical measurement.

Despite the advantages of innovative forms of tasks presented using a computer, they need to be treated with caution, and their adequacy to the measurement purposes and their appropriateness in the test should be carefully analyzed. Typically, innovative tasks of high difficulty are isolated in a separate block and placed at the end of the test. Their completion should not take up time from the weakest students, who most likely will not reach the end of the test.

Table 5

Comparative characteristics of innovative forms of tasks in computer testing

Purpose of improving pedagogical measurement	Characteristics of the response form	Main directions of innovation	Task difficulty
Reduce the guessing effect	The answer is numeric (or text), constructed by the student using the keyboard or voice input via microphone	Using a task form with a constructed answer	Usually high
Increase content validity	The answer is selected with the mouse on a graphic image, using a regular menu or hypertext	Using audiovisual series. Including media without interactivity	Low or medium
Increase construct and content validity	The answer is selected with the mouse on a graphic image; additional information is requested; hypertext is used	Using multimedia to simulate the natural environment and user actions within it. Representing objects using animation without interactivity	Medium or high
Expand the ability to measure intellectual and cognitive skills	The answer is given by moving objects on the screen and is constructed by the student using the keyboard and the left and right mouse buttons. Interactivity is possible	Using a task form with a constructed answer and the simplest level of interactivity	Medium or high
Provide the opportunity to assess creative and practical skills	When constructing an answer, students must use a two-stage or multi-stage branching interactive transition to the various stages of completing the task	Using a task form with a constructed answer and a complex level of interactivity	Medium or high
Increase construct and content validity; expand content coverage; make it possible to measure communication, intellectual and cognitive skills	The answer is modeled by the student step by step using a multi-stage branching interactive transition to the various stages of completing the task and virtual reality	Actions of the subject when answering	High

Calculating student scores. If computer testing does not use multimedia and interactive technologies, then the calculation of students' primary scores is carried out traditionally by summing the scores on individual tasks. The involvement of multimedia technologies leads to multidimensionality of test results, since the assessment of a whole range of creative, communicative, general subject and other skills using innovative forms of tasks is always associated with several measurement variables. The emergence of interactivity further complicates the procedure for calculating student scores; it becomes dependent on the examinee’s answer at each step of completing test tasks and requires polytomous assessments.

Checking the results of completing tasks with a constructed, regulated answer is carried out by comparing the examinee’s answer with a standard stored in the computer’s memory. The standard includes various synonyms for the correct answer and variants with acceptable spelling errors.
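A comparison with a stored standard of this kind might look as follows. This is a minimal sketch, not the procedure of any particular testing system: the list of accepted answers, the tolerance of one typo and the function names are all assumptions made for illustration, and spelling errors are measured here with the Levenshtein edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lower-case the answer and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def check_answer(response: str, accepted: list[str], max_typos: int = 1) -> bool:
    """True if the response matches any accepted variant (synonym)
    within the allowed number of spelling errors."""
    r = normalize(response)
    return any(edit_distance(r, normalize(s)) <= max_typos for s in accepted)


# Hypothetical standard: the correct answer and one synonym.
ACCEPTED = ["photosynthesis", "the process of photosynthesis"]

print(check_answer("Photosynthesys", ACCEPTED))  # one substituted letter
print(check_answer("respiration", ACCEPTED))
```

The tolerance is a design choice: a larger `max_typos` forgives more spelling errors but also raises the risk of accepting a different word as correct.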

Automated scoring is much more difficult for tasks with freely constructed answers (such as essays) in the humanities. Foreign testing specialists have developed special programs for automated essay checking. The assessment criteria in these programs are quite varied: from superficial characteristics of an essay, such as length and degree of completeness of the answer, to complex analysis using advances in computational linguistics. Typically, all of these automated scoring programs require expert input only at the initial stage, when qualified educators need to “train” the computer program to score long-form answers.

Fixed length tests, computer generation of parallel test versions

The main components of the automated test layout process for computer presentation. When automated test composition occurs in advance rather than in an adaptive mode, it includes the assembly (generation) of parallel options, the choice of a rule for calculating the scores of tested students, and the correction of options to meet the requirements of the theory of pedagogical measurements.

Inevitable differences in the difficulty of options that arise due to the existence of measurement errors are eliminated after testing by leveling the scales obtained by calculating test scores for individual test options. Related issues, the solution of which is also necessary for automated test layout, include work on filling the bank of test tasks and assessing the information security of testing.

Computer generation of parallel test versions of fixed length. Automated assembly of a test with a fixed number of tasks assumes a set test length, a test specification and a bank of calibrated tasks. An efficient bank that supports the generation of a multivariate test should include task frames of varying difficulty for each content element, with stable parameter estimates. With the help of special software and tools, an analogue of the traditional test form is obtained, ready for presentation a few minutes after the start of generation and providing high-quality pedagogical measurements.

The method of automated test layout for computer presentation offline (without using local computer networks or the Internet) or online (using local computer networks or the Internet) is called automated test design. The purpose of the design is to generate test options that satisfy a number of conditions, which include the number of tasks, the content structure, the frequency with which tasks are selected into options, and a number of requirements that ensure the generation of parallel test options.

Option layout technology must support systematic control over the frequency with which each task from the bank is included in the test. The number of identical tasks in parallel options, which are used to level the scales across options, should not exceed 15-20%. To control the frequency of inclusion of a task in the options, a maximum possible percentage of selection is set for each task in the bank. When it is reached, the task ceases to be used in further test generation procedures.
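The two controls described above, a cap on the share of common tasks and a cap on how often each task may be selected, can be sketched as follows. The bank layout, the function name `generate_forms` and all parameter values are assumptions for illustration; task difficulty is ignored here, and incidental overlap between non-common tasks is not controlled.

```python
import random
from collections import Counter


def generate_forms(bank, spec, anchors, n_forms,
                   max_exposure=0.5, max_anchor_share=0.2, seed=0):
    """Assemble n_forms test options of fixed length.

    bank    : dict task_id -> content topic
    spec    : dict topic -> number of tasks per option
    anchors : set of task ids common to every option (used for leveling)
    """
    rng = random.Random(seed)
    form_length = sum(spec.values())
    if len(anchors) > max_anchor_share * form_length:
        raise ValueError("too many common tasks for the chosen share")
    # Maximum number of options a non-common task may appear in.
    limit = max(1, int(max_exposure * n_forms))
    usage = Counter()
    forms = []
    for _ in range(n_forms):
        form = set(anchors)
        for topic, count in spec.items():
            fixed = sum(1 for i in anchors if bank[i] == topic)
            need = count - fixed
            # Tasks that reached the limit are no longer offered.
            pool = [i for i, t in bank.items()
                    if t == topic and i not in anchors and usage[i] < limit]
            if need < 0 or need > len(pool):
                raise ValueError(f"cannot fill topic {topic!r}")
            chosen = rng.sample(pool, need)
            form.update(chosen)
            usage.update(chosen)
        forms.append(form)
    return forms, usage


# Hypothetical bank: 3 topics with 10 tasks each.
bank = {f"{t}{n:02d}": t for t in "ABC" for n in range(1, 11)}
spec = {"A": 4, "B": 3, "C": 3}
forms, usage = generate_forms(bank, spec, anchors={"A01", "B01"}, n_forms=4)
for form in forms:
    print(sorted(form))
```

With these illustrative settings each option has 10 tasks, 2 of them common (20%), and no other task appears in more than 2 of the 4 options.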

Typically, multiple parallel or quasi-parallel test options are created offline for subsequent presentation online, including interactive interaction with students. To expand the communication capabilities of computer control in real time, it is recommended to use adaptive testing, which provides step-by-step optimization of the selection and difficulty of tasks when generating an adaptive test (see section 8.4).

3. Computer adaptive testing

Adaptive testing and its capabilities. The emergence of adaptive testing was caused by the desire to increase the efficiency of pedagogical measurements, which, as a rule, was associated with a decrease in the number of tasks, time, and cost of testing, as well as with an increase in the accuracy of student assessments. The adaptive approach is based on the individualization of the procedure for selecting test items, which, by optimizing the difficulty of the items in relation to the level of preparedness of students, ensures the generation of effective tests.

Optimizing the difficulty of tasks is usually carried out step by step. If the student completes the task correctly, he is given a more difficult task. If the task is completed incorrectly, a retreat is made to easier tasks from the bank. If three tasks in a row are not completed, the process is stopped and, using special methods (most often IRT), the student’s score is determined from the tasks completed on the adaptive test created specifically for him. Thus, in a computer adaptive presentation, the number of test tasks and their difficulty are selected individually for each examinee based on his answers, and the individual set of tasks forms an adaptive test. Adaptive tests in a group of subjects consist mainly of different tasks, and the greater the spread in preparedness within the group, the more the tests differ in the number and difficulty of tasks.
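The step-by-step rule can be sketched in a few lines. This is an illustration under stated assumptions: difficulty is reduced to a ladder of numbered levels, the bank is assumed to hold enough tasks at every level, the cap `max_items` is added so that the loop always terminates, and `answer` is a hypothetical stand-in for the student. Scoring by IRT is not shown.

```python
def run_adaptive_test(n_levels, answer, start_level,
                      max_items=20, stop_after_failures=3):
    """Present tasks one at a time, moving up one difficulty level after
    a correct answer and down one level after an incorrect one.

    answer(level) -> bool stands in for the student's response.
    Returns the list of (level, correct) pairs that were presented.
    """
    level = start_level
    failures_in_a_row = 0
    log = []
    while len(log) < max_items and failures_in_a_row < stop_after_failures:
        correct = answer(level)
        log.append((level, correct))
        if correct:
            failures_in_a_row = 0
            level = min(level + 1, n_levels - 1)   # harder task next
        else:
            failures_in_a_row += 1
            level = max(level - 1, 0)              # retreat to an easier task
    return log


# Hypothetical students: one copes with levels 0..3, the other with none.
average = run_adaptive_test(8, lambda lvl: lvl <= 3, start_level=2, max_items=6)
weak = run_adaptive_test(8, lambda lvl: False, start_level=2)
print(average)
print(weak)
```

The weak student leaves the test after three tasks, while the average student oscillates around the boundary of his competence until the length cap is reached.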

It is impossible to increase measurement efficiency on all criteria simultaneously, so when organizing adaptive testing one criterion, or at best two, usually comes to the fore. For example, in some cases, during express diagnostics in adaptive mode, the greatest attention is paid to minimizing test time and the number of tasks presented, and assessment accuracy fades into the background. In other cases, measurement accuracy may be the priority, and testing of each subject continues until the planned minimum measurement error is achieved.

The length of the adaptive test is significantly affected by the quality of the structure of students' knowledge. Typically, subjects with a clear structure of knowledge perform tasks of increasing difficulty, clarifying their assessment of preparedness with each successive correctly completed task. They complete a small number of adaptive test items and quickly reach the threshold of their competence. Students with an unclear structure of knowledge, who alternate between correct and incorrect answers, receive assignments that fluctuate in difficulty. The testing process is delayed because when the difficulty of tasks changes abruptly, there is no step-by-step increase in measurement accuracy and the number of tasks adapted for difficulty is often even greater than in a regular, traditional test.

Benefits of adaptive testing. Some of the important advantages of computerized adaptive testing include:

High efficiency;

High level of secrecy;

Individualization of the pace of test execution;

A high level of motivation for testing among the weakest students due to the exclusion of unnecessarily difficult tasks from the process of presenting them;

Communication of the result on an interval scale of test scores to each subject immediately after he completes his individually selected set of tasks in the adaptive test.

Adaptive testing strategies. Strategies for presenting test items in adaptive testing can be divided into two-step and multi-step, and different technologies for generating adaptive tests are used accordingly. A two-step strategy involves two stages. At the first stage, all subjects are given the same input test, the purpose of which is a preliminary differentiation of students along the axis of the measurement variable. Based on the results of this differentiation, at the second stage an adaptive mode is organized and adaptive tests are built.

The development of IRT, which provides a single interval scale for assessing both the parameters of test subjects and the difficulty of test items, made it possible to optimize the selection of items for effective adaptive tests in a new way. Multi-step adaptive testing strategies began to develop, in which each subject, while completing sets of tasks, moves along his own individual trajectory.

Multi-step adaptive testing strategies are divided into fixed-branching and variable-branching, depending on how the multi-step adaptive tests are constructed. If the same set of tasks, with fixed positions on the difficulty axis, is used for all subjects, but each student moves through the set individually depending on the result of completing the current task, then the adaptive testing strategy is fixed-branching.

Tasks in the set are usually placed at equal distances from each other on the difficulty axis, or a step that decreases as difficulty increases is chosen. The latter makes it possible to adjust the pace of testing to the test subject, since fatigue increases and motivation decreases as the tasks are completed.

The variable-branching adaptive testing strategy involves selecting tasks directly from the bank using algorithms that predict the optimal difficulty of the next task from the test subject’s performance on the previous task of the adaptive test. Thus, step by step, an adaptive test is assembled from individual tasks. Not only the difficulty varies, but also the step, which is determined by the difference in difficulty between two adjacent tasks of the adaptive test. A distinctive feature of the variable-branching strategy is the step-by-step reassessment of the test subject’s level of preparedness, undertaken after each completed task.

Fig. 19. Variable multi-step testing algorithm

The algorithm that implements the varying adaptive testing strategy is cyclic in nature and has the form shown in Fig. 19.
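One pass of such a cycle (select a task, record the answer, re-estimate preparedness, repeat) can be sketched as follows. The text only names IRT in general; the one-parameter (Rasch) model, the grid search for the estimate, the rule of choosing the task whose difficulty is closest to the current estimate, and all names in the code are assumptions made for this illustration and do not reproduce the figure.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that a subject with preparedness theta
    completes a task of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Bounded grid of candidate estimates (-4.0 ... 4.0 logits, step 0.1).
GRID = [x / 10 for x in range(-40, 41)]


def estimate_ability(responses):
    """Grid-search maximum-likelihood estimate from (difficulty, correct)
    pairs. The bounds keep the estimate finite for all-correct or
    all-incorrect response patterns."""
    def log_likelihood(theta):
        total = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(GRID, key=log_likelihood)


def adaptive_cycle(bank, answer, start_theta=0.0, max_items=10):
    """bank: dict task_id -> difficulty; answer(task_id, b) -> bool."""
    remaining = dict(bank)
    theta = start_theta
    responses, trajectory = [], []
    while remaining and len(responses) < max_items:
        # Pick the unused task whose difficulty is closest to the estimate.
        task = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(task)
        correct = answer(task, b)
        responses.append((b, correct))
        theta = estimate_ability(responses)   # reassessment after each task
        trajectory.append((task, b, correct, theta))
    return theta, trajectory


# Hypothetical bank and a student who copes with tasks up to difficulty 1.0.
bank = {f"t{k:02d}": -3.0 + 0.5 * k for k in range(13)}
theta, trajectory = adaptive_cycle(bank, lambda task, b: b <= 1.0, max_items=8)
for step in trajectory:
    print(step)
```

Both the difficulty of the next task and the size of the step change from one iteration to the next, which is what distinguishes this strategy from movement along a fixed ladder of tasks.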

Entering and exiting adaptive testing. The initial assessments for entering adaptive testing are chosen differently, depending on the type of strategy and the technological capabilities available when generating adaptive tests. One of the methods for determining initial assessments is to give the subjects an entrance pretest before adaptive testing begins. The pretest usually includes 5-10 tasks from different sections of the content, covering in difficulty the entire range in which the tested sample of students is expected to lie on the axis of the measurement variable. Sometimes entrance testing is replaced by a self-adaptation process, in which the test taker is offered a set of tasks of increasing difficulty and chooses the task that reflects the level of his knowledge and skills.

To exit the testing mode, either time limits or restrictions on the number of tasks are introduced, or the planned measurement accuracy is specified. The focus on accuracy when organizing adaptive cycles gives rise to a variety of individual trajectories of subjects, which can be visualized in the form of broken lines. The vertices of the broken line correspond to individual tasks of the adaptive test, the length of the link is determined by a varying step, the size of which is equal to the difference in the estimates of the difficulty parameter of two adjacent tasks of the adaptive test. Obviously, the shorter the length of the broken line, the better the structure of the student’s knowledge and the more effectively the adaptive test tasks are selected according to difficulty (Fig. 20).
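The length of such a broken line can be computed directly from the difficulties of the tasks presented. The sketch below takes the length of each link to be the absolute difference between the difficulty estimates of adjacent tasks, as described above; the two sample trajectories are hypothetical.

```python
def trajectory_length(difficulties):
    """Sum of the absolute steps between adjacent tasks of an adaptive test."""
    return sum(abs(b - a) for a, b in zip(difficulties, difficulties[1:]))


# Hypothetical trajectories on a common difficulty scale.
steady = [0.0, 0.5, 1.0, 1.5]            # well-structured knowledge
oscillating = [0.0, 1.0, 0.0, 1.0, 0.5]  # alternating right and wrong answers

print(trajectory_length(steady))
print(trajectory_length(oscillating))
```

The oscillating trajectory is longer even though it ends at a lower difficulty, which matches the interpretation of a short broken line as a sign of well-structured knowledge.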

Fig. 20. Visualization of individual trajectories of subjects: task numbers in circles

Fig. 20 shows the adaptive testing trajectories of three students who entered the adaptive mode based on the results of the pretest. The higher the first vertex of the broken line is located, the more difficult the first task of the adaptive test was. The first student showed the highest result on the pretest, so he begins adaptive testing with a more difficult task. For convenience of discussion, the figure shows non-intersecting trajectories. A “plus” is placed over the broken line where the subject completed the task correctly, and a “minus” where the subject completed it incorrectly. A simple rule was chosen as the criterion for ending testing: testing stops when a student gives three answers in a row that are all correct or all incorrect.

Despite the high initial score, the first student appears to have poorly structured knowledge, as evidenced by the alternation of correct and incorrect answers. Testing of the first student stops if he manages to cope with three consecutive tasks of the adaptive test. The second student's answer trajectory is much shorter due to well-structured knowledge. After failing the first task, he gets everything right and therefore quickly completes the adaptive test. The third student is the weakest. He starts testing with the easiest task, which he cannot cope with. He also performs the second, easier task incorrectly. Finally, after three consecutive incorrect answers, he quits the adaptive test.

The presented figure is an idealization illustrating real situations of varying multi-step strategies for generating adaptive tests, in which, after completing each task, the current assessment of the level of preparedness is recalculated to select the next adaptive test task.

Reliability, validity and test length in adaptive testing. As with traditional testing, the selection of tasks for adaptive tests is carried out in accordance with the test specification. When optimizing difficulty, you can only reduce the number of tasks presented for each section while maintaining a meaningful test plan for each subject. Thus, adaptive testing, regardless of the strategy for presenting tasks and their number, should ensure high content validity of each generated adaptive test.

Reliability in adaptive testing depends on a combination of factors. These include: the number of tasks, the presence of systematic control over the frequency of selection of bank tasks when generating an adaptive test. Reliability is also influenced by the characteristics of the bank of test items related to the quality of measurements (stability and range of variation in difficulty estimates) and the quality of the input (starting) control.

The adaptive algorithm is organized in such a way that after each next presentation of the task, the difference between the obtained and planned measurement accuracy is checked. Once the planned accuracy is achieved, the task selection algorithm is suspended, and the expected reliability of the adaptive test is achieved.
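The accuracy check in this loop can be sketched as follows. The text does not say how accuracy is measured; the sketch assumes the one-parameter (Rasch) IRT model, under which the standard error of the preparedness estimate is the inverse square root of the summed item information, and the planned value of 0.5 logits is an arbitrary illustration.

```python
import math


def standard_error(theta, difficulties):
    """Standard error of the preparedness estimate under the Rasch model:
    1 / sqrt(sum of p * (1 - p)) over the tasks presented so far."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information) if information > 0 else math.inf


def accuracy_reached(theta, difficulties, planned_se=0.5):
    """True once the obtained accuracy is at least as good as planned."""
    return standard_error(theta, difficulties) <= planned_se


# Tasks matched exactly to the subject's level are the most informative.
print(standard_error(0.0, [0.0] * 4))
print(accuracy_reached(0.0, [0.0] * 4))
print(accuracy_reached(0.0, [0.0] * 16))
```

Tasks far from the subject’s level contribute little information, so a well-targeted adaptive test reaches the planned accuracy with fewer tasks.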

5. Online testing and its use in distance learning

Levels of interactivity. In the simplest understanding of the interactive learning mode, the student has the opportunity to receive (read, watch, listen) only the information that he chooses to learn using a computer. The increasing complexity of the capabilities and technology for implementing the interactive mode leads to modeling the surrounding world and the behavior of objects in it, making it possible to simulate reality.

Of course, today, for many reasons, not all the possibilities of the interactive mode are used in teaching. In particular, according to A. G. Shmelev, who is the largest specialist in Russia in the use of interactive technologies in educational and psychological testing (Teletesting system), non-interactive forms of presenting educational information predominate on the modern Internet.

The simplest interactive mode on the local network and on the Internet. In accordance with the classification of computer networks into local and global, the simplest interactive mode is organized within one room, or educational institution, or using the Internet. As a rule, interactivity is based on asynchronous communication, when the teacher’s reaction to test results is delayed due to the time required to check the test in an automated mode and calculate students’ scores based on the results of its completion.

In the first case, when several tens or hundreds of computers are connected to a local network, a special implementing program - an instrumental shell - issues the tasks of the online test to the entire group of test takers, usually in an individual time mode. A task from one of the parallel test options appears on the screen of each computer on the local network. If information security is ensured, a single version of the test can be used for the entire group of students.

Taking an online test over the Internet does not differ fundamentally from using a local network with the simplest level of interactivity and without an adaptive mode, when all students take the same versions of the test. The overwhelming majority of tasks require students to select one or more correct answers using familiar dialog objects such as selector buttons (radio buttons). Test scores are calculated by comparing students’ answers with the key, which most often comes down to simple summation. The final test score can be sent via email.

The time spent on presenting the test result is determined by the duration of the transfer (usually from several seconds to several hours) and the time interval that will pass until the student reads the mail that came to him. In some cases, when a student requires documentary evidence of scores, test results can be delivered offline by recording onto a storage medium. Thus, a low level of interactivity is quite suitable for final testing outside the adaptive mode, when the student must work without the help of a teacher, and obtaining results may be delayed in time.

Average level of interactivity in online testing. In ongoing monitoring during distance learning, an average level of interactivity is usually implemented. Using the possibilities of synchronous exchange of information in real time via Internet pagers, the student is provided with assistance and advice from the teacher when completing tasks of corrective and diagnostic tests.

With a medium level of interactivity, the forms of test tasks become much more varied. In particular, the student has the opportunity to edit the text presented in the assignment by introducing new sentences or replacing one part of the text with another. In tasks to establish the correct sequence, immediately after the subject has selected a certain order of elements, the computer displays the new sequence on the screen, and so on. If time zones do not interfere with synchronous communication, the interactive mode immediately provides the “teacher nearby” effect, thanks to which the student receives help, an assessment or a hint from the teacher when performing current control tasks.

High level of interactivity in online testing. A high level of interactivity is ensured when sound and video are used in interaction with the teacher. This requires significant financial costs, but makes it easy to verify the identity of a student taking a test remotely.

From a pedagogical point of view, adaptive testing meets a high level of interactivity, including extensive technologies for optimizing the difficulty of tasks depending on the student’s answers to each previous task of the adaptive test.