Instrument Analysis of Biology Teachers' Needs to Assess Students' Creative Thinking Skills Using RASCH Model

This study aimed to investigate the validity, reliability, scale understanding, item difficulties, and item bias of an instrument for analyzing the needs of biology teachers regarding the assessment of creative thinking skills, using the RASCH model. The instrument was modified based on the indicators of divergent and convergent thinking processes in the scientific learning stages, integrated with the dimensions of creative thinking skills, following the ADDIE research model. The participants comprised 104 teachers from the Solo Raya area: 64 Senior High School Biology teachers and 40 Junior High School Science teachers. The research instrument was an inventory with a Likert scale of 1 to 4. The instrument's validity, reliability, scale understanding, item difficulties, and item bias were analyzed by applying the RASCH model using Winsteps 3.73. The results showed that the overall validity was acceptable and that the item validity did not need improvement. Overall reliability was very good, and the item reliability was excellent. Rating scale analysis showed that respondents had understood the Likert scale of 1 to 4 well. Based on the item difficulty results of the teachers' responses, no teachers evaluated the indicators of creative thinking skills through writing or pictures in problem-solving assessment. The bias test results indicated that five items were potentially biased with respect to age and two others with respect to gender. Therefore, the development of assessment instruments for creative thinking skills with scientific and social problem-solving based assignments, as well as written and visual expression, is required.


INTRODUCTION
Creative thinking skills are the ability to create, implement, communicate, and work creatively with others (Tran et al., 2017; Trilling & Fadel, 2009). Developing people's creative thinking skills has become one of the educational goals of the 21st century, and it is therefore also essential to evaluate these skills (Lucas, 2016; Sugiharto et al., 2019; Tran et al., 2017). One of several international institutions concerned with evaluating creative thinking skills is the Global Innovation Index (GII), a global-level measuring organization for innovation (Dutta et al., 2020). Based on the results of the GII evaluation in 2022, Indonesia was placed in the second quarter, ranked 87th out of 132 countries (Dutta et al., 2020). This result indicates that Indonesia's scores for institutions, human resources and research, business sophistication, and innovative products are below average (Dutta et al., 2020). These results show that the creative thinking skills that human resources in Indonesia need in order to innovate are still low (Dutta et al., 2020).
Given the importance of measuring creative thinking skills, a needs analysis of Biology teachers in assessing creative thinking skills is essential. The indicators of creative thinking skill assessment can be used as references for recognizing someone's creative thinking skills (Sarriot et al., 2014; Yustina et al., 2020). The indicators of fluency, flexibility, originality, and elaboration (FFOE) have long been used to measure an individual's creative thinking skills. The FFOE indicators focus on divergent thinking processes, that is, thinking processes that involve imagination in creating innovation (Guilford, 1975; Jia et al., 2017; Oppezzo & Schwartz, 2014; Runco et al., 2017; Runco & Acar, 2012). Fluency is the ability to mention as many relevant ideas as possible (Guilford, 1975; Hass, 2015; Runco & Albert, 1985; Zhou et al., 2020). Flexibility is the ability to distinguish and classify ideas from different points of view (Jasim Mohammed & Ati Daham, 2021; Lia D Rubenstein et al., 2019). Originality is the novelty of the idea or product that has been created (Bart et al., 2017; Guilford, 1975; Hass, 2015). Elaboration is the ability to detail the construct of ideas until a solution is found (Bart et al., 2017; Oppezzo & Schwartz, 2014; Lia D Rubenstein et al., 2019; Runco et al., 2017).
Creative thinking skills require the ability to evaluate the ideas produced by divergent thinking processes (Barbot & Lubart, 2012; Catarino et al., 2019; Hass, 2015; Vally et al., 2019); this ability to evaluate is the convergent thinking process (Charyton et al., 2011; Oppezzo & Schwartz, 2014). The indicators of convergent thinking processes found in creative thinking skills are usefulness, evaluation, and improvement (Benedek et al., 2006; Shu-Chen et al., 2020; Vally et al., 2019). Usefulness is the ability to explain the utility of new ideas or products (Benedek et al., 2006; Charyton et al., 2011; Lia D Rubenstein et al., 2019). Evaluation is the ability to assess the advantages, disadvantages, and possibilities of implementing new ideas and products (OECD, 2021; Shu-Chen et al., 2020). Improvement is the ability to fix problems and improve new ideas or products (Nuswowati et al., 2017; OECD, 2021; Vally et al., 2019). The integration of divergent and convergent thinking indicators used to identify individual creative thinking skills has to be adjusted to the learning stages.
Evaluating creative thinking skills in biology covers several stages and dimensions (Runco et al., 2017; Zubaidah et al., 2017). The stages include formulating problems (Runco & Acar, 2012), formulating a hypothesis (Phungsuk et al., 2017), conducting experiments (Nickerson, 2014), and solving problems (Romero et al., 2017; Simper, 2018). The dimensions of creative thinking skills can be found in problem-solving using scientific and social approaches (OECD, 2021; Plucker et al., 2014; Runco et al., 2017), and in the ways individuals express themselves through pictures and writing (He et al., 2017; Listiana et al., 2016; OECD, 2021; Runco et al., 2017; Watson, 2018). However, an instrument for assessing creative thinking skills that accommodates both divergent and convergent thinking processes in natural science, especially biology, has not yet been fully formulated. Therefore, such an instrument needs to be developed. The development currently under way begins with constructing a needs analysis instrument on how creative thinking skills are measured.
The assessment of creative thinking skills in science education using the FFOE indicators of divergent thinking processes is still in use today. Jumadi et al. (2021) measured creative thinking skills using indicators of divergent thinking with a test for high school science students. The instrument was validated using the content validity ratio method. Madyani et al. (2020) measured four indicators of divergent thinking with a test for junior high school science students. The test results, analyzed descriptively, showed that originality was very low. Rudyanto et al. (2019) analyzed the validity and reliability of creative thinking skills using the FFOE indicators through a descriptive analysis in mathematics. So far, the measurement of creative thinking skills has used only divergent thinking and has focused only on students. Meanwhile, the approval of teachers, as facilitators and evaluators of creative thinking assessments, remains unknown and has received little attention.
The method often used to find out how teachers measure creative thinking skills is the interview. Matraeva et al. (2020) and As'ari et al. (2019) interviewed teachers about students' creative thinking skills. Interviews can be used to determine the extent to which creative thinking skills have been achieved. However, they are conducted only with a small number of samples. Moreover, the instruments used in teacher needs analyses have been analyzed only descriptively, without validity and reliability testing.
Based on the need for an initial analysis in the development of creative thinking skill instruments, the researcher constructed a teacher needs instrument that integrates divergent and convergent thinking indicators into the stages and dimensions of creative thinking skills. The feasibility of this teacher needs instrument requires validation. An instrument on teachers' needs for evaluating creative thinking skills requires a series of tests to ensure its reliability (Baer et al., 2014; Chevalier et al., 2020; Runco & Acar, 2012; Simper, 2018). Thus, the novelty of this study lies in analyzing the needs analysis instrument with the RASCH model.
Accordingly, in this study, the instrument's reliability is analyzed using RASCH (Nielsen, 2018; Sumintono, 2018). The RASCH model is a statistical approach used to measure performance, perception, and attitude (Bonsaksen et al., 2013; Nielsen, 2018). Evaluating creative thinking skills with the RASCH model has advantages over classical test theory because it can increase the evaluation quality in quantitative and qualitative studies (Chan et al., 2014). Several advantages of using the RASCH model are: (1) it generates a linear and one-dimensional scale; (2) it requires fit between the data and the measurement model; (3) it can compute standard errors; (4) it can estimate the person measure as well as the difficulty level of each statement item on a linear scale with standard units (logits); and (5) it can check the evaluation system logically and consistently (Planinic et al., 2019).
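For readers unfamiliar with the logit scale mentioned in point (4), one common polytomous form of the model, the rating scale formulation, can be written as below. This is a general textbook form given for orientation only; the symbols are assumptions introduced here and are not taken from the study: \theta_n is the measure of teacher n, \delta_i is the difficulty of statement item i, \tau_j is the j-th category threshold, and M + 1 is the number of rating categories (M = 3 if the 1 to 4 scale is recoded as 0 to 3). An empty sum (k = 0 or m = 0) is taken to be zero.

```latex
P(X_{ni} = k) =
  \frac{\exp\left( \sum_{j=1}^{k} (\theta_n - \delta_i - \tau_j) \right)}
       {\sum_{m=0}^{M} \exp\left( \sum_{j=1}^{m} (\theta_n - \delta_i - \tau_j) \right)},
  \qquad k = 0, 1, \dots, M
```

Because \theta_n and \delta_i enter only through their difference, persons and items are placed on the same logit scale, which is what allows the two to be compared directly.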
The RASCH model can analyze an evaluation instrument on several parameters. To benefit from the advantages of the RASCH model, an instrument has to be tested for its reliability, validity, discrimination power, appropriateness, and difficulty level (Nielsen, 2018; Sumintono & Widhiarso, 2015). These stages are crucial for obtaining a reliable evaluation instrument. Therefore, it is necessary to analyze the instrument on teachers' needs for evaluating creative thinking skills, with the instrument's reliability analyzed using the RASCH model. Thus, this study aims to analyze the validity, reliability, scale understanding, item difficulties, and item bias of the instrument for analyzing the needs of biology teachers related to the assessment of creative thinking skills by using the RASCH model.

METHOD
This research was conducted as one of the stages of ADDIE research and development. The participants were 64 Senior High School Biology teachers and 40 Junior High School Science teachers. The characteristics of the respondents based on age and gender are presented in Table 1. The research was carried out in the Surakarta Residency area, Central Java Province, from July to December 2021. The teacher needs analysis instrument was validated using the teachers' responses. Data were collected on the teachers' needs for an instrument to evaluate the divergent and convergent thinking processes of creative thinking skills in the stages of Biology and science learning in schools. The inventory instrument contains statements with a Likert scale of 1 to 4. Data were collected using Google Forms.
Divergent thinking indicators include fluency, flexibility, originality, and elaboration. Convergent thinking indicators comprise usefulness, evaluation, and improvement. These indicators can be found at the stages of formulating problems, formulating hypotheses, conducting experiments, and solving problems, as well as in how students express their learning outcomes. The instrument was then analyzed using the RASCH model with Winsteps 3.73.
The analysis stage begins with testing the validity. Validity testing includes overall validity using the summary statistics, item validity using the Item: fit order output, and construct validity. The instrument's reliability is reviewed using Cronbach's alpha and the item reliability reported in the summary statistics. Respondents' understanding of the scale is examined using the partial credit rating scale output and the probability curve. Items are analyzed by constructing a logit ruler, which is used together with the Wright map to classify item difficulty. Item bias is analyzed using DIF tables and plots.
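As an illustration of the reliability step, Cronbach's alpha can be computed directly from a respondents-by-items score matrix. The sketch below is a minimal stand-alone version of the standard formula, not the Winsteps computation used in this study; the function name and the small response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    n_items = len(responses[0])
    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical Likert responses (1 to 4) from five respondents on four items.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
]
print(round(cronbach_alpha(example), 2))
```

Alpha approaches 1 when the items rise and fall together across respondents, which is why a high value is read as internal consistency.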

RESULT AND DISCUSSION

Instrument Validity Analysis
The validity results are divided into two parts: the validity of the instrument overall and the validity of the items (Planinic et al., 2019). The results of the analysis are presented in Table 2. The summary statistics in Table 3 show whether the instrument is valid for use (Runco & Acar, 2012). Based on the outfit MNSQ value for the statement items, the instrument is appropriate for evaluation because the value of 1.03 is close to the ideal value of 1.00. Based on the outfit ZSTD values for items and persons, the data have a logical estimate because the value of 0.1 is close to the ideal value of 0.00 (Sumintono & Widhiarso, 2015).
According to AERA & APA (2014), strong validity rests on evidence, and response validity concerns how reliably the instrument performs when respondents answer it. The instrument was first validated through expert judgment and then administered directly. Validity testing with the RASCH model provides information on the quality of the instrument, so validity testing is now more reliable (AERA & APA, 2014). The results of the item dimensionality test in Table 3 show that the construct validity of the instrument meets the good criteria. The unexplained variance in the first contrast of the PCA of residuals also meets the good criteria, indicating that all statement items are appropriate. The instrument is therefore unidimensional. This means that the instrument can measure the range of the variable, namely the teachers' responses regarding their needs for measuring creative thinking skills.
The construct validity results indicate that the instrument measures what it is intended to measure. The RASCH model can thus be used to determine an instrument's construct validity. In the research conducted by Madyani et al. (2020), construct validity was not analyzed, so this test is a novel contribution.
The item-level results can be seen from the fit analysis of the statement items, which is used to identify any misfitting item. The fit analysis uses the Item: fit order output in Table 4. The results show that all statement items can be used to measure the responses (Dahlgren et al., 2017). Based on Table 4, no item requires revision to fulfill the fit criteria. The RASCH model can direct instrument makers to revise items or statements that do not fit so that the items measure reliably.
Table 4 shows the results of the reliability test. The instrument as a whole, with a Cronbach's alpha of 0.91, falls in the very good category. The reliability of the statement items is 0.99, which falls in the excellent category (Sumintono & Widhiarso, 2015). The instrument would therefore give consistent results if tested in a population (Plucker et al., 2014; Runco & Albert, 1985). The grouping of the statement items is excellent because the items separate into 14 levels of difficulty. The grouping of the respondents is good because there are five levels of respondent ability. Thus, the instrument can be used to group the statement items and the respondents in evaluating creative thinking skills during learning activities (Göçmen & Coşkun, 2019; Sumintono & Widhiarso, 2015).
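The link between a reliability coefficient and the number of distinguishable groups can be sketched with the separation and strata formulas commonly used in Rasch analysis. The text does not state how the 14 item levels and five respondent levels were obtained, so the formulas below are an assumption, and using the reported Cronbach's alpha of 0.91 in place of a person reliability is a further assumption made only for illustration.

```python
import math


def separation_index(reliability):
    """Separation G = sqrt(R / (1 - R)) for a reliability R below 1."""
    return math.sqrt(reliability / (1.0 - reliability))


def strata(separation):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# Reliability values reported in the text; the formulas themselves are assumed.
item_levels = strata(separation_index(0.99))
person_levels = strata(separation_index(0.91))
print(round(item_levels), round(person_levels))  # prints: 14 5
```

Under these assumptions the rounded values agree with the 14 item levels and five respondent levels reported above.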

The Rating Scale Understanding Analysis
The evaluation of the rating scale (1, 2, 3, 4) can be seen from the peaks of each category on the probability curve in Figure 2. The curve shows separate peaks. Table 6 shows that the logit value rises across the rating scale (1, 2, 3, 4) with appropriate differences between categories. The respondents therefore understand the scale well (Sumintono & Widhiarso, 2015).
Respondents' understanding of the scale can be examined through the RASCH model statistics. Elsewhere, the scale has only been analyzed descriptively: when a respondent fills in all the questions, the researcher concludes that the respondent understands the rating scale well, so the understanding of the scale itself has not been tested. A poor probability curve can be used to analyze the shortcomings of a scale and to address them, for example, by reducing the scale range or eliminating a neutral rating.

The Difficulty Level of the Statement Items Analysis
The difficulty level of the statement items was evaluated using the item measures. The separation, or difficulty level, of the statement items was determined by adding the standard deviation to the mean (0.00 + 1.90 = 1.90), which was used to construct the logit bar. The logit bar provides the criteria for classifying items as difficult to approve, approved, or easy to approve in Figure 4. The item and person measures both need to be considered to interpret the responses given. The person measures show an average value (M) of -0.2 logit, which is below the item mean of 0.0 logit. Thus, the teachers' tendency to approve is below the average difficulty level of the statement items (Sumintono & Widhiarso, 2015). The logit bar in Figure 3 is then combined with the item measures in Table 7 and the Wright map in Figure 4 to classify the statement items.
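The classification against the logit bar can be sketched as follows. The upper cut-off of 1.90 logit (mean plus one standard deviation) is taken from the text; the symmetric lower cut-off at mean minus one standard deviation is an assumption, since the text does not state it, and the item labels and measures in the example are hypothetical rather than values from Table 7.

```python
def classify_item(measure, mean=0.0, sd=1.90):
    """Place an item measure (in logits) into one of three approval classes."""
    if measure > mean + sd:
        return "difficult to approve"
    if measure < mean - sd:  # assumed lower cut-off, not stated in the text
        return "easy to approve"
    return "approve"


# Hypothetical item measures in logits, for illustration only.
hypothetical_measures = {"item_x": 2.5, "item_y": 0.3, "item_z": -2.0}
for label, measure in hypothetical_measures.items():
    print(label, classify_item(measure))
```

A higher logit means that teachers were less likely to endorse the statement, which is why items above the bar are read as difficult to approve.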
The results show that none of the teachers approve of evaluating the dimension of expression through pictures and writing. Therefore, the instrument development should accommodate the evaluation of creative thinking skills expressed through pictures and writing (He et al., 2017; Listiana et al., 2016; OECD, 2021). Biology teachers reportedly did not evaluate creative thinking skills expressed through pictures and writing because Biology is a natural science. Teachers therefore focus more on answers involving scientific thinking processes (Rodríguez et al., 2019; Sugiharto et al., 2019; Yustina et al., 2020).
The evaluation of creative thinking skills in problem-solving using social and scientific approaches is another dimension not approved by the teachers (Putranta & Supahar, 2019). This indicates that natural science is seen as focusing only on a scientific approach (Chien, 2017; Putranta & Supahar, 2019). However, problem-solving may also apply a social approach, namely the analysis of human behavior as well as the efforts of the community, government, and social institutions. Therefore, the evaluation instrument should be able to differentiate between problem-solving using scientific approaches and problem-solving using social approaches (Afacan, 2018). Only a few teachers approve of evaluating creative thinking skills at the experiment stage, so a special evaluation instrument is needed (Runco & Albert, 1985; Sarriot et al., 2014; Vergara et al., 2018).
The evaluation of creative thinking skills that teachers did carry out was at the stages of formulating the problem and the hypothesis. This is because Biology teachers in general already evaluate hypothesis and problem formulation. Thus, the instrument should also be able to measure the ideas generated and the ability to formulate a hypothesis (Sternberg et al., 2020). The results indicate that few teachers approve of the convergent indicators, which include usefulness, evaluation, and improvement. Therefore, the evaluation instrument needs items that accommodate these indicators (Oppezzo & Schwartz, 2014).

The Analysis of Age and Gender Bias towards the Statement Items
The bias test results using DIF in Figure 5 and Table 8 report probabilities, with a probability below 0.05 as the criterion for bias. Five items were found to be biased with respect to age, among them Item 1G (evaluation of fluency) and Item 4G (evaluation of elaboration) on expression using pictures. The difference in the teachers' perceptions occurs because evaluation using pictures in the Biology learning process is considered ineffective. Older teachers with more than ten years of service have extensive experience in using evaluation instruments. However, they have not evaluated creative thinking skills on the aspects of fluency and elaboration with pictures and writing (Lia D Rubenstein et al., 2019). Less experienced teachers under 25 years old tend to give their approval because they have higher motivation to develop learning activities and are open to innovative ideas. In this case, the evaluation of fluency and elaboration is conducted on expression using pictures (Zubaidah et al., 2017).
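The decision rule described above, flagging an item when its DIF probability falls below 0.05, can be sketched as a simple filter. The function name, item labels, and probability values below are hypothetical and are not taken from Table 8; in practice the probabilities would be read from the DIF output of the Rasch software.

```python
def flag_dif_items(dif_probabilities, alpha=0.05):
    """Return, in sorted order, the items whose DIF probability is below alpha."""
    return sorted(
        item for item, prob in dif_probabilities.items() if prob < alpha
    )


# Hypothetical DIF probabilities for three placeholder items.
hypothetical_probs = {"X1": 0.01, "X2": 0.20, "X3": 0.049}
print(flag_dif_items(hypothetical_probs))  # prints: ['X1', 'X3']
```

A flagged item is only potentially biased; whether the difference matters is judged from the size and direction of the contrast between groups, as the discussion of each item does.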
The evaluation of fluency and elaboration using pictures in Biology learning can be carried out in class by implementing a learning model with flexible timing, such as blended learning (Sugiharto et al., 2019; Tan, 2009). In blended learning, the evaluation can accommodate instruments in the form of pictures. Creative thinking evaluation of fluency can be realized with an instrument expressed in the form of lines, symbols, or pictures related to Biology content (He et al., 2017; Sternberg et al., 2020; Watson, 2018; Zhou et al., 2020). Elaboration can be evaluated from the details of, or the linkages between, the pictures created by the students.
Based on Figure 5 and Table 8, bias on Item 2E (evaluation of flexibility) on problem-solving using a social approach occurs because late-middle-aged and old-aged teachers disapproved, whereas adult-aged and middle-aged teachers approved. In this sense, social environment factors strongly influence the diversity of teachers' points of view (Boelens et al., 2018; Lisa D Rubenstein et al., 2018). Teachers who interact actively with society have a better understanding of, and more opportunity to identify, behaviors that cause problems such as environmental pollution, habitat destruction, and loss of biodiversity (Huang et al., 2019). Teachers should be able to innovate on social issues in evaluating flexibility using various social approaches (Boelens et al., 2018). Different perceptions may occur because of the different social interactions of teachers of different ages (Lisa D Rubenstein et al., 2018).
Bias on Item 3F (evaluation of originality) on written expression occurs because elderly, early-adult, and late-adult teachers do not approve, whereas late-teenage and early-old teachers approve. These results may have occurred because teachers evaluate students' writing in diverse ways. One of the factors influencing the evaluation of students' writing is teachers' subjectivity. Some teachers evaluate the appropriateness of the report's title without considering the novelty or originality of the student's writing. For example, teachers sometimes give maximum scores to intelligent and diligent students without checking the originality or novelty of the student's ideas (Gralewski & Karwowski, 2019).
Furthermore, teachers' experience is another factor influencing the evaluation of students' writing, because more experienced teachers are more familiar with students' writing and can therefore check it better (Schoevers et al., 2019; Wang et al., 2018; Watson, 2018). Originality in writing can be evaluated by checking the originality of a student's writing and the novelty of the student's ideas (Kafipour et al., 2018). Several forms can be used to express ideas in writing, such as explanation texts. Explanation text writing tasks enable teachers to identify the originality of students' writing and ideas because explanation texts are written based on the construction of students' knowledge (Göçmen & Coşkun, 2019). In this respect, Turnitin can be used as a means of checking the originality of students' writing if the teachers' experience is not yet sufficient (Matheson & Starr, 2013).
Bias on Item 5B (evaluation of usefulness) on formulating hypotheses occurs because late-teenage teachers merely approve, whereas teachers in the other age ranges approve easily (Sumintono & Widhiarso, 2015). This indicates that teachers of all ages pay attention to the usefulness of the hypotheses formulated by the students. Usefulness emphasizes the ability of individuals to state the purpose of hypothesis testing (Charyton et al., 2011). Biology is a natural science that emphasizes scientific steps, so hypothesis formulation requires creative thinking skills. Contextual learning can be used to give students the opportunity to formulate hypotheses based on real problems.
Elaboration on problem-solving with a social approach can be seen from the order in which the solution is explained from a social perspective, for instance, the relationship between the behaviors causing the problem, the social analysis of a case, and the selection of a solution (Bart et al., 2017). Based on Figure 5 and Table 9, four items are potentially gender-biased. This is indicated by a probability value below the significance level of 0.05. Item 4D shows that women approve easily of evaluating elaboration on problem-solving with a social approach. This difference in perception may occur because female teachers tend to have higher motivation for socialization than male teachers (Webb & Rule, 2014).
Based on Table 9 and Figure 6, bias on Item 6A occurs because men approve more easily than women. This difference in perception between male and female teachers may be due to several factors that differ between them, such as motivation for socialization, teaching motivation, and psychology. These differences lead to differences in the teaching and evaluation styles of male and female teachers (Wu et al., 2019). Evaluation has been assessed in problem formulation (Zubaidah et al., 2017). The purpose of evaluating students in formulating new problems is to foster the students' comprehension (Shu-Chen et al., 2020). As such, teachers should give special assignments to measure how students evaluate their ideas (Bedir, 2019).

CONCLUSION
Based on the results of this research, it can be concluded that the teacher response instrument meets the criteria for determining whether teachers approve of, and have conducted, evaluations of students' creative thinking skills across the scientific learning stages and the creative thinking dimensions. The overall validity is acceptable, and the item validity does not require improvement. The overall reliability is very good, and the item reliability is excellent. Rating scale analysis shows that respondents have understood the Likert scale of 1 to 4 well. Based on the item difficulty results of the teachers' responses, no teachers evaluate the indicators of creative thinking skills through students' written or pictorial expression in problem-solving assessment using scientific and social approaches. The bias test results indicate that five items could be biased with respect to age and two with respect to gender. Thus, the development of assessment instruments to measure creative thinking skills with scientific and social problem-solving based assignments, as well as written and visual expression, is needed.