Specific Open-Ended Assessment: Assessing Students' Critical Thinking Skill on Kinetic Theory of Gases

ABSTRACT


INTRODUCTION
Critical thinking is a skill characteristic of active students. It is related to other attributes, such as scientific communication and self-confidence (Wismath, Orr, & Zhong, 2014) and students' motivation (Hu, Jia, Plucker, & Shan, 2016). However, Hashim & Samsudin (2019) found that some aspects of students' critical thinking skills were still at a middle level.
Purwati, Hobri, & Fatahillah (2016) found a similar result: 32.2% of the students studied had low critical thinking skills and 42.8% fell in the moderate category. Matsun, Sunarno, & Masykuri (2017) found students' critical thinking skills to be at a low level, with a mean score of 65.70. These findings suggest that the level of students' critical thinking skills in Indonesia is very low. This ability has become a key factor in policymaking (Szenes, Tilakaratna, & Maton, 2015). Therefore, research on this ability is still needed, especially on specific topics in the learning process.
Scholars do not yet agree on how to measure critical thinking skills in physics. Initially, scholars suggested that a critical thinking measure should evaluate general thinking skill. However, further studies found that thinking skill is related to taking a critical point of view on particular issues. Hence, in the current era, the development of critical thinking is considered an important educational goal, and its importance keeps increasing (Kettler, 2014). Nowadays, people need a variety of skills, such as critical thinking, problem-solving, and the application of different ways of thinking (Ghazivakili et al., 2014). In today's learning environment, students should reach an advanced level of critical thinking skills to succeed in life (Kong, 2014).
As an important part of the learning process, teachers must have critical thinking skills themselves in order to teach students about critical thinking (Fuad, Zubaidah, Mahanal, & Suarsini, 2017). Critical thinking skills help teachers in the teaching process, especially in discussion and debate with students on difficult subjects (Nasution, Harahap, & Manurung, 2017), such as physics and mathematics.
A lot of effort is being made to improve intelligence and general trends towards critical thinking (Huber & Kuncel, 2015). Many studies have addressed the measurement of critical thinking. Liu, Mao, Frankel, & Xu (2016) designed an assessment to measure students' critical thinking skills in the analytical and synthetic dimensions. Pascarella et al. (2014)  and is a fundamental study for future research that focuses on the integration of CT skills in a particular subject matter. This study focuses on CT skills in the kinetic theory of gases (CTKTG).
Open-ended tests differ from interviews and questionnaires because structured questionnaires limit participants' explanations of their experiences (Tran, Porcher, Falissard, & Ravaud, 2016). The importance of an open-ended test is, first and foremost, that it can break the assumption that there is only one right solution (Klavir & Hershkovitz, 2014). Open-ended questions allow respondents to write their answers in their own words (Lee & Lutz, 2016; Popping, 2015) and do not limit their answers (Schonlau & Couper, 2016). They can provide new and valuable answers that previous researchers may not have thought of (Gurel, Eryilmaz, & McDermott, 2015). In other words, open questions provide a wealth of information to researchers. We therefore decided to measure the aspects of CT using an essay (open-ended) format. Accordingly, this study reports the reliability, validity, and other characteristics of a test developed to measure CT skills, specifically on the kinetic theory of gases. This study aimed to develop a CT test and assess the level of students' critical thinking skills.

METHODS
This study was development research using the 4D model. The 4D model consists of four stages: define, design, develop, and disseminate. The model is summarized in Figure 1.

Define
The first stage in developing the CTKTG test was defining critical thinking (CT) and selecting the CT skills to be targeted in the test. Table 1 lists the tests from various researchers collected by Tiruneh et al. (2017).

Design
The second stage was to design the format of the items and select the physics topic. In this study, we used an open-ended format. We designed the CTKTG test based on CT aspect, indicator, and sub-topic. We also designed the criteria for students' CT level on the kinetic theory of gases.

Develop
The third stage was to develop items whose CT components matched the topic of the kinetic theory of gases and then to test them on a small number of students. The CTKTG test was initially tried out with four sample groups: interviews with expert reviewers (N = 3), professional physics teachers (N = 2), graduate school students (N = 2), and secondary school students (N = 29).
All items were reviewed by experts following the criteria of Tiruneh et al. (2017): (a) Are the items suitable for measuring CT skills in the desired domain? (b) Is the item statement clear, complete, and suitable for the participant?
After reviewing the components, the reviewers were asked to perform content validation. Content validation is a psychometric method that assesses whether an instrument precisely measures what it is intended to measure (Cheng et al., 2016). It involves the subjective opinions of "experts", who judge each item in one of three categories: "important," "useful, but not important," or "unnecessary." For items rated "important", the content validity ratio (CVR) can be calculated using formula (1). Items considered "important" were then included in the final instrument, while items that failed to reach the critical level were removed (Ayre & Scally, 2014).
$$\mathrm{CVR} = \frac{n_e - N/2}{N/2} \qquad (1)$$
where n_e is the number of panelists rating the item "essential" (the "important" category) and N is the total number of expert reviewers. The minimum value of CVR is shown in Table 2. Two physics professors, one doctor, two master's students in the Graduate School Program at Yogyakarta State University, and two professional physics teachers were asked to review the 10 items. Each item was reviewed for accuracy of information and clarity of diagrams, phrases, and words.
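As a minimal sketch of the CVR calculation, the snippet below assumes the standard Lawshe form, CVR = (n_e - N/2) / (N/2), which matches the symbols defined above. The function name and the counts in the example are hypothetical; the panel size of 7 simply mirrors the seven reviewers listed above.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe-style CVR, (n_e - N/2) / (N/2). Assumed form of formula (1)."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical counts of "essential" ratings from a panel of 7 reviewers.
example_counts = {"item_1": 7, "item_2": 6, "item_3": 4}
cvr_by_item = {
    item: content_validity_ratio(count, 7) for item, count in example_counts.items()
}
for item, cvr in cvr_by_item.items():
    print(f"{item}: CVR = {cvr:.2f}")
```

A CVR of 1 means every panelist rated the item essential, 0 means exactly half did, and negative values mean fewer than half did. Each value would then be compared with the minimum in Table 2.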

Small-scale paper-pencil administration
After the review process based on expert advice was finished, the CTKTG items were administered to a small group of students (N = 29). The main purpose of this trial was to determine whether responses could be scored with the assessment guide developed and to estimate the time needed to complete the test.

Item Administration
The last step was to conduct a large-scale trial after the development stage. The test was modified based on the revisions from the initial trial. The CTKTG test was then given to a group of class XI science students (N = 55).
The administration of the test lasted 90 minutes. After all the revisions had been incorporated, the test was administered to physics students (N = 55) in the science class of a senior high school in Yogyakarta. Item administration followed the steps of Tiruneh, De Cock, Weldeslassie, Elen, & Janssen (2017): before the test began, the researcher explained to the students the purpose of the test, gave general directions on how to answer the items, instructed them to take the test seriously, and told them that it would take about one hour to complete.

RESULTS AND DISCUSSION

Define
The result of this stage was the set of critical thinking components. The components of critical thinking skills for the CTKTG test were compiled based on the Ennis-Weir CT essay test, after reviewing all the tests mentioned above against the author's criteria. The test focused on the following elements of CT skills: reasoning, argument analysis, hypothesis testing, likelihood and uncertainty analysis, and decision-making.

Design
The result of this stage was the format of the items and the physics topic. The CTKTG test was based on CT aspect, indicator, and sub-topic, as shown in Table 3. Students were asked to complete 10 questions according to aspects of CT skills. All of the items were also validated by experts. Student skills were assessed with a rubric using levels 0-4. Table 4 shows the skill levels of students based on their test results.

Develop
The results of this stage were the content validation by expert review, the review of the small-scale paper-and-pencil administration, and the CTKTG items with their reliability, validity, difficulty level, and discrimination.
The reviewers judged that the CTKTG items were suitable for assessing the targeted CT skills on the kinetic theory of gases. Their feedback on the items was used to revise the items.
Analysis of students' responses showed that no significant revisions to the CTKTG items were needed. In addition, several relevant answers were found, so the assessment guidelines were revised.

Determine which statement is right!
A. If the temperature of the gas in a closed container is bigger than before, the average velocity of the gas is also bigger than before.
B. If the average velocity of the gas is before than before, the pressure of the gas will be smaller than before.

Indicator of CT: Identifying assumptions
Bloom Taxonomy: C5 Predicting
Question 8: Rico conducted an experiment to determine the relative velocity of gases. There are two types of gas, and the two gases are assumed to have the same density and pressure. If the volume of container B is twice that of container A, determine the relative speed of gas B!

The aspect of CT: Implement strategies and tactics
Indicator of CT:
Bloom Taxonomy: C4 Analysis
Question 9: Joni wants to join a hot air balloon race. He plans to buy several supporting devices, such as a heater. However, Joni was unsure how to choose a good heater, and whether it should be the one that produces the most heat. He then concluded that the most heat-producing machine was not needed, because it would make the air around the balloon hot and waste energy. In addition, the wind helps push the hot air balloon upward. Determine the problem contained in the statement! Is Joni doing the right thing? Explain.

Indicator of CT: Choose criteria for considering possible solutions
Bloom Taxonomy: C4 Analysis
Question 10: A scientist wants to use the ideal gas concept to produce a large kinetic energy. He then performed calculations to find the greatest energy. If the initial conditions are a pressure of 100 Pa, a temperature of 300 K, and a volume of 1 m³, determine the appropriate solution for the scientist to choose.
Solution 1: change the pressure to 50 Pa, change the volume to 0.5 m³, and keep the temperature constant.
Solution 2: keep the pressure constant, change the volume to 0.5 m³, and reduce the temperature to 200 K.
Solution 3: keep the pressure and volume constant, and raise the temperature to 400 K.
In your opinion, which solution should the scientist choose to produce a large kinetic energy? Analyze the case and give your reasons.

Internal Consistency/Reliability
Internal consistency is the most basic aspect of measurement and refers to the homogeneity of the items on a test (Hajcak, Meyer, & Kotov, 2017). In other words, homogeneity or internal consistency indicates the extent to which the items measure the same thing (Davenport, Davison, Liou, & Love, 2015). We measured internal consistency with the Cronbach alpha formula:
$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum V_i}{V_t}\right)$$
where n = number of items, V_t = variance of the total scores, and V_i = variance of each item's scores. For this test, we found α = .89 (good), based on Table 6.
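The following is a minimal sketch of Cronbach's alpha in its usual form, α = n/(n-1) · (1 - ΣV_i / V_t), using the symbols defined above. The function name and the small score matrix are hypothetical and are not the study's data. Population variance is used for both V_i and V_t; the choice does not affect α as long as the same estimator is used for both.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per student, one column per item."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every student needs a score on every item")
    # V_i: variance of each item's scores across students.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # V_t: variance of the students' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical rubric scores (0-4) for four students on three items.
demo_scores = [[4, 3, 4], [2, 2, 3], [1, 1, 1], [3, 3, 2]]
demo_alpha = cronbach_alpha(demo_scores)
print(f"alpha = {demo_alpha:.2f}")
```

Items that rank students in exactly the same way give α = 1, while unrelated items push α toward 0.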

Item Difficulty
Item difficulty is an important parameter for each new item added to a test (Loukina, Yoon, Sakano, Wei, & Sheehan, 2016). It is very important in education for teachers and item writers (El Masri, Ferrara, Foltz, & Baird, 2017). Item difficulty is measured as the percentage of students who answer the question correctly, and the difficulty index ranges from 0% (very difficult) to 100% (very easy) (Tomak, Bek, & Cengiz, 2016). In other words, item difficulty compares the number of students who answer correctly with the number who answer incorrectly (X. Bai & Ola, 2017).
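A minimal sketch of the difficulty index as described above, that is, the share of students who answer an item correctly. For rubric-scored items, dividing the mean score by the maximum score is one common generalization and is an assumption here; the study itself computed difficulty with QUEST, which this sketch does not reproduce. The function name and data are hypothetical.

```python
def item_difficulty(item_scores, max_score=1):
    """Difficulty index in [0, 1]: mean score divided by the maximum score.

    With max_score=1 and 0/1 scoring this is the proportion of correct answers.
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return sum(item_scores) / (max_score * len(item_scores))


# Hypothetical dichotomous item: three of four students answered correctly.
print(item_difficulty([1, 1, 0, 1]))

# Hypothetical rubric-scored item with a maximum score of 4.
print(item_difficulty([4, 2, 0, 2], max_score=4))
```

Higher values indicate easier items, consistent with the 0% (very difficult) to 100% (very easy) range given above.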

Item Discriminant
Item difficulty is important in deciding whether to retain or reject test items. However, information about item difficulty is not enough; we must also consider item discrimination (Perkins & Frank, 2018). Item discrimination is very important in determining the quality of an item. This value provides information about differences in the measured ability of individuals based on the test (Khairani & Shamsuddin, 2016).
It is an index that shows how well an item can distinguish between people with different levels of ability, especially high- and low-level students (Tasca et al., 2016). It is used to measure the extent to which an item can predict overall performance on a test (Xue Bai & Ola, 2017). The classification of discrimination levels follows that used by Quaigrain & Arhin (2017), as shown in Table 11. The discrimination index (ID) is calculated using the following formula (Xue Bai & Ola, 2017):
$$ID = \frac{X_c - X_w}{Std}\sqrt{p(1-p)}$$
where X_c is the mean total score for students who responded correctly to the item, X_w is the mean total score for students who responded incorrectly to the item, p is the item difficulty, and Std is the standard deviation of the total exam scores. The discrimination index is shown in Table 12.
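The quantities defined above (X_c, X_w, p, and Std) are those of the point-biserial correlation, ID = (X_c - X_w) / Std · sqrt(p(1 - p)). The sketch below assumes that form and uses the population standard deviation of the total scores; both choices are assumptions, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def discrimination_index(item_correct, total_scores):
    """Point-biserial discrimination: (X_c - X_w) / Std * sqrt(p * (1 - p))."""
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    correct_totals = [t for c, t in zip(item_correct, total_scores) if c]
    wrong_totals = [t for c, t in zip(item_correct, total_scores) if not c]
    if not correct_totals or not wrong_totals:
        raise ValueError("item needs both correct and incorrect responses")
    std = pstdev(total_scores)
    if std == 0:
        raise ValueError("total scores have zero variance")
    p = len(correct_totals) / len(total_scores)  # item difficulty
    return (mean(correct_totals) - mean(wrong_totals)) / std * sqrt(p * (1 - p))


# Hypothetical data: 1 = correct, 0 = incorrect, with each student's total score.
demo_correct = [1, 1, 1, 0, 1, 0]
demo_totals = [34, 30, 28, 15, 25, 18]
print(f"ID = {discrimination_index(demo_correct, demo_totals):.2f}")
```

With the population standard deviation, this value equals the Pearson correlation between the 0/1 item score and the total score, so it lies between -1 and 1.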

Disseminate
We measured the difficulty level from the test given to the participants (N = 55). The difficulty indices of the CTKTG items ranged from 0.58 to 0.82. Most items were at a moderate level, and the discrimination level was very good. We therefore consider all items to be good items. The validity of the instrument can be obtained from the correlation between the instrument being developed and an existing instrument that has previously been considered valid. In this study, we used SPSS to determine the r value showing convergent validity (Pearson correlation) and to run a Kolmogorov-Smirnov test. The test shows that the score distribution is normal.
The r values from SPSS for all items are summarized in Table 7. Based on the r table, with N = 55 and α = .05, the critical r value is .2241, so all items are valid.
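As a rough illustration of the validity check described above, the sketch below computes a Pearson correlation by hand and compares it with the critical value .2241 quoted in the text for N = 55. The study used SPSS; the function names and the paired scores here are hypothetical, and which two score series are correlated for each item is an assumption.

```python
from math import sqrt

R_TABLE = 0.2241  # critical r quoted in the text for N = 55, alpha = .05


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def is_valid(r, r_table=R_TABLE):
    """An item counts as valid when its r exceeds the critical r."""
    return r > r_table


# Hypothetical item scores paired with scores on a reference instrument.
item_scores = [1, 2, 3, 4, 2]
reference_scores = [10, 14, 15, 20, 12]
r = pearson_r(item_scores, reference_scores)
print(f"r = {r:.3f}, valid: {is_valid(r)}")
```

The critical value depends on the sample size, so the constant above applies only to the N = 55 case reported here.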
Based on the test, students were given ten questions covering the aspects of critical thinking skills. The results are presented in Table 13, which shows the level of their answers on the test. Among the ten questions administered to the students, answers on basic classification showed the highest mean, 2.84 (average), while answers on implementing strategies and tactics showed the lowest mean, 1.37 (very low). The table indicates that the students have a very low level of critical thinking skill (mean = 1.84, SD = 0.32). These findings are similar to those of Azis, Muhammad, & Yusuf (2016), who found that the highest aspect possessed by students was basic classification (3.375) and the lowest was advanced clarification (1.875).
Based on the reliability coefficient (α = 0.89), the open-ended format was more effective than other formats, such as multiple choice, which reached only 0.78 (Hwang & Chen, 2017). Similar results were found by Harjo, Kartowagiran, & Mahmudi (2019), whose open-ended instrument showed an internal reliability of α = 0.94. In addition, open-ended tests allow us to explore, explain, or confirm students' knowledge more deeply than other tests. We also registered all the items for intellectual property rights (IPR).

CONCLUSION AND SUGGESTION
All items are valid, and the test distribution is normal. Item difficulty is at a moderate level, and item discrimination is at a very good level. The CTKTG test is therefore a good instrument for measuring CT skills in the kinetic theory of gases. However, obtaining more valid results requires a larger number of respondents drawn from several levels of education. Based on the results and discussion, students' level of CT skills is very low. Implementing strategies and tactics is the most difficult aspect of critical thinking for students, and basic classification is the easiest.

Indicator of CT: Implement strategies and tactics
Bloom Taxonomy: C4 Analysis
Question: A scientist wants to use the ideal gas concept to produce a large kinetic energy. He then performed calculations to find the greatest energy. If the initial conditions are a pressure of 100 Pa, a temperature of 300 K, and a volume of 1 m³, determine the appropriate solution for the scientist to choose.
Solution 1: change the pressure to 50 Pa, change the volume to 0.5 m³, and keep the temperature constant.
Solution 2: keep the pressure constant, change the volume to 0.5 m³, and reduce the temperature to 200 K.
Solution 3: keep the pressure and volume constant, and raise the temperature to 400 K.
In your opinion, which solution should the scientist choose to produce a large kinetic energy? Analyze the case and give your reasons.
Key: Solution 3. Based on the concept of average kinetic energy, the higher the temperature, the greater the energy.
Guide Score:
Score 1:
• If the answer and the reason are wrong.
Score 2:
• If the answer is correct, but the reason is wrong or does not follow the answer key.
• If the answer is wrong, but the reason is correct or follows the answer key.
Score 3:
• If the answer is correct, but the reason is not in accordance with the answer key.
Score 4:
• If the correct answer is accompanied by the right reason according to the answer key.
• If the answer and reason can be categorized as correct but are not listed in the answer key.
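The answer key above can be checked with a short calculation. The sketch below assumes the kinetic-theory result that the average translational kinetic energy per molecule is (3/2)·k_B·T, so only the final temperature of each solution matters. The temperatures come from the question; the function and variable names are hypothetical.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)


def mean_kinetic_energy(temperature_k: float) -> float:
    """Average translational kinetic energy per molecule, (3/2) * k_B * T."""
    if temperature_k < 0:
        raise ValueError("absolute temperature cannot be negative")
    return 1.5 * BOLTZMANN * temperature_k


# Final temperature of each solution, taken from the question text.
final_temperatures = {"Solution 1": 300.0, "Solution 2": 200.0, "Solution 3": 400.0}
energies = {name: mean_kinetic_energy(t) for name, t in final_temperatures.items()}

for name, energy in energies.items():
    print(f"{name}: {energy:.3e} J per molecule")

best = max(energies, key=energies.get)
print("Largest average kinetic energy:", best)
```

Under this per-molecule reading, Solution 3 gives the largest value because it has the highest final temperature, which agrees with the key.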

Figure 1. Summary of Research Methodology

Table 2. Minimum Value of CVR

Table 3. Component of CT

Table 4. Level of Critical Thinking Skill

Table 5. Item of CTKTG
Every year, a hot air balloon festival is held in Europe. All hot air balloons are required to meet good flight requirements. One requirement is to use a quality heater. Participants are prohibited from using a bad heater because it can be fatal during flight. Analyze the focus of the problem in the case above! Give reasons for the problem.

Table 7. Items Validity

Item difficulty was computed using an existing program (QUEST). The index range of difficulty levels and the results of the test are shown in Table 9 and Table 10.

Table 9. Index Range of Difficulty Level

Table 10. Difficulty of Items

Table 11. Index Range of Discriminant Level

Table 12. Items Discrimination

Table 13. The Level of Critical Thinking Skill of the Students