Modeling of Rice Production in Indonesia Using Robust Regression with The Method of Moments (MM) Estimation

Indonesia is an agricultural country in which most people grow rice and consume it as a staple food. However, in the last few years, rice production in Indonesia has decreased. Data on rice production and its influencing factors, namely rice harvest area, land area affected by plant-disturbing organisms (OPT), rainfall, and population, contain outliers and have residuals that are not normally distributed, so regression analysis with the least-squares method cannot be used to estimate the amount of rice production. A robust regression model with Method of Moments (MM) estimation is used to address the outliers and the violation of the normality assumption. This study aims to determine the robust MM-estimation regression model for rice production in Indonesia and to identify the factors that significantly influence it. The robust MM-estimation regression model shows that an increase in harvested area ($X_1$), land area affected by plant pests (OPT) ($X_2$), and population ($X_4$) will increase the amount of rice production, while rainfall ($X_3$) will reduce the amount of rice production, with a high level of confidence. Harvested area ($X_1$) and population ($X_4$) have a significant effect on the amount of rice production. Based on these results, it is hoped that policies will take the factors that influence rice production into account in order to increase rice production in Indonesia.


Introduction
The staple food of the Indonesian people is rice, which makes rice the most promising crop in the agricultural sector. The average rice consumption per capita in Indonesia is 111.58 kilograms per year (Timorria, 2019). However, rice production in Indonesia in 2019 decreased by 2.63 million tons compared to the previous year (Badan Pusat Statistik, 2020).
Many factors can affect rice production. The harvested area is directly related to the amount of rice that can be produced. Rice production can decrease when the harvested area shrinks, which can be caused by attacks of plant-disturbing organisms (OPT). Rainfall is the climate factor that affects plants the most (Triatmodjo, 2010), so it can also affect rice production. Besides these agronomic factors, external factors may also affect rice production. One of them is total population. The larger the population, the higher the demand for staple food, which can trigger more rice production. However, as the population grows, the total land area tends to decline (Syaifuddin et al., 2013), and agricultural land may also be converted into residential land.
Regression analysis is a method for studying the relationship between a dependent variable and one or more independent variables. A regression model is an application of a linear model in which the dependent variable is affected by one or more quantitative variables, called factors or independent variables (Freund et al., 2006). The number of independent variables whose relationship with the dependent variable can be examined is not limited, because a dependent variable can be related to many independent variables. The parameter estimation method usually used in regression analysis is ordinary least squares (OLS).
The factors that affect rice production, and an estimate of rice production, can be determined using regression analysis. However, the rice production data for Indonesia contain outliers and violate the normality assumption, so OLS is not appropriate.
According to Freund et al. (2006), robust regression is one solution for analyzing data with influential outliers, because the resulting model is resistant to them. Robust regression can address deviations from the assumptions of the least-squares method. It can be used when the data contain outliers that make the residuals non-normal (Olive, 2005). When a fitted regression model violates the classical assumptions and a transformation cannot eliminate or weaken the influence of the outliers, so that the predictions are biased, robust regression is the best method (Susanti et al., 2014).
Robust regression with MM estimation is a method that combines high breakdown value estimation with M estimation. The MM estimation method has a breakdown value of 50% and an efficiency of 95% (Wilcox, 2005). Therefore, robust regression with MM estimation can give a good prediction model for rice production in Indonesia. Candraningtyas et al. (2013) studied the handling of outliers in multiple linear regression using robust MM estimation on data generated with Minitab software; they found three outliers and obtained a model that is robust to them. Harini et al. (2019) used linear regression analysis to determine the effect of agricultural land area on rice production in North Kalimantan and found that the area of agricultural land has a significant effect on rice production. Ishaq et al. (2017) studied rice production in East Java and found that harvested area and rainfall have a significant effect. This article discusses modeling rice production in Indonesia in 2019 using robust regression with MM estimation, with harvested area, land area affected by OPT attacks, rainfall, and total population as the factors. The model obtained can later be used to predict rice production and to support efforts to increase rice production in Indonesia in the following years.

Material and Methods
This study uses secondary data obtained from BPS and the Ministry of Agriculture. The data cover the amount of rice production, the area of harvested land, the area of land affected by pest attacks, rainfall, and population for each province of Indonesia in 2019, and are modeled using robust regression with MM estimation. The data comprise 34 observations on five variables. The amount of rice production is the dependent variable, while the area of harvested land, the area of land affected by pest attacks, rainfall, and population are the independent variables. The average rice production in Indonesia in 2019 is 1,606,001 tons; the highest is in Central Java Province, with 9,655,654 tons, and the lowest is in Riau Islands Province, with 1,151 tons.

The simple linear regression model
Simple linear regression describes the relationship between one dependent variable and one independent variable. A regression model is the application of a linear model in which the response or dependent variable is determined by the numeric values of one or more quantitative variables, called factors or independent variables (Freund et al., 2006). A simple linear regression model can be written as follows:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Where:
$\beta_0, \beta_1$: regression coefficient parameters
$Y$: dependent variable
$X$: independent variable
$\varepsilon$: error
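As a minimal sketch of how the two coefficients of a simple linear regression are obtained by least squares, the following Python function uses the standard closed-form expressions. The function name and the toy data are hypothetical and are not taken from the study.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 2 + 3x
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```

Because the toy data contain no error term, the fitted intercept and slope recover the generating values 2 and 3.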

Multiple linear regression model
Some problems involve several independent variables $X_1, X_2, \dots, X_k$ and one dependent variable $Y$. A multiple linear regression model can be used to explain the linear functional relationship between the independent variables $X_1, X_2, \dots, X_k$ and the dependent variable $Y$. The multiple linear regression model can be written as follows:
$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon \quad (1)$$
The estimated model for equation (1) is:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_k X_k \quad (2)$$
Where:
$\hat{Y}$ = estimator of $Y$
$\hat{\beta}$ = estimator of $\beta$
$X$ = independent variable
The estimated linear regression equation can be written in matrix form as follows:
$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$$
The OLS estimator is obtained by minimizing the sum of squared residuals of the linear regression model. Under the classical assumptions, the least-squares estimator is the Best Linear Unbiased Estimator (BLUE).
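For reference, the minimization behind the OLS estimator can be written out in standard matrix notation. The symbols $\mathbf{X}$ (the design matrix with a leading column of ones) and $\mathbf{Y}$ (the vector of responses) are assumed here, and the result holds when $\mathbf{X}'\mathbf{X}$ is invertible.

```latex
S(\boldsymbol{\beta})
  = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}),
\qquad
\frac{\partial S}{\partial \boldsymbol{\beta}}
  = -2\,\mathbf{X}'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}
\;\Longrightarrow\;
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
```

Setting the gradient to zero gives the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$, whose solution is the least-squares estimator.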

F-test and t-test
The F-test is a simultaneous test of whether the independent variables jointly affect the dependent variable. The independent variables jointly affect the dependent variable if the p-value is smaller than α or if the calculated F value is greater than the F value in the table.
BASC 2021 82
The F value can be found using this formula:
$$F_{\text{count}} = \frac{MSR}{MSE}$$
where $MSR$ is the regression mean square and $MSE$ is the error mean square.
Hypothesis testing:
i. The hypotheses used:
H0: the independent variables do not have a significant relationship with the dependent variable
H1: the independent variables have a significant relationship with the dependent variable
ii. Critical area: H0 is rejected if $F_{\text{count}} > F_{\text{table}}$
The F-table value is obtained using the first degree of freedom $df_1 = k - 1$ and the second degree of freedom $df_2 = n - k$, with $k$ as the number of independent variables and $n$ as the number of observations (Nugroho, 2005).
The t-test is a partial test of whether each independent variable individually affects the dependent variable. An independent variable individually affects the dependent variable if the p-value is smaller than α or if $t_{\text{count}}$ is higher than $t_{\text{table}}$.
The t value can be found using this formula:
$$t_{\text{count}} = \frac{\hat{\beta}_j}{S\sqrt{c_{jj}}}$$
where:
$k$ = the number of independent variables
$c_{jj}$ = the $(j+1)$-th diagonal element of $(X'X)^{-1}$
$S$ = root mean square error
Hypothesis testing:
i. The hypotheses used:
H0: the independent variable $X_j$ has no significant linear effect on the dependent variable
H1: the independent variable $X_j$ has a significant linear effect on the dependent variable
ii. Critical area: H0 is rejected if $|t_{\text{count}}| > t_{\text{table}}$
The t-table value is obtained using the error degrees of freedom, with $k$ as the number of independent variables and α = 5%.
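The decision rules for the two tests can be sketched in Python as follows. All input numbers are hypothetical and serve only to illustrate the arithmetic. The degrees of freedom follow the rule stated in the text with k = 4 and n = 34. The critical values 2.92 for F(3, 30) and about 2.04 for t with roughly 30 degrees of freedom are approximate 5% table values, and they are assumptions of this sketch rather than values reported by the study.

```python
import math


def f_statistic(ss_regression, ss_error, df1, df2):
    """F = (regression sum of squares / df1) / (error sum of squares / df2)."""
    return (ss_regression / df1) / (ss_error / df2)


def t_statistic(beta_hat_j, s, c_jj):
    """t = beta_hat_j / (S * sqrt(c_jj)), with c_jj a diagonal element of (X'X)^-1."""
    return beta_hat_j / (s * math.sqrt(c_jj))


def reject_h0(statistic, critical_value):
    """Reject H0 when the absolute statistic exceeds the table value."""
    return abs(statistic) > critical_value


# Hypothetical sums of squares; df1 = k - 1 = 3 and df2 = n - k = 30 for k = 4, n = 34
f_count = f_statistic(ss_regression=900.0, ss_error=100.0, df1=3, df2=30)
# Hypothetical coefficient, root mean square error and c_jj
t_count = t_statistic(beta_hat_j=4.0, s=2.0, c_jj=0.25)

F_TABLE = 2.92  # approximate 5% table value for F(3, 30)
T_TABLE = 2.04  # approximate two-sided 5% table value for about 30 degrees of freedom

print(f_count, reject_h0(f_count, F_TABLE))
print(t_count, reject_h0(t_count, T_TABLE))
```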

Classic assumption test
A regression model is considered good and usable once the data pass the classical assumption tests. These tests are used to detect deviations from the assumptions in the research data.

Normality test
The normality test checks whether the residuals of a regression model are normally distributed. The t-test and F-test used to assess the model assume that the residuals follow a normal distribution. If the normality assumption is violated, these tests are not valid. Residual normality can be assessed in two ways: graphical analysis and statistical tests. Graphical analysis is less recommended because its interpretation may differ from one observer to another. The statistical test most often used is the Kolmogorov-Smirnov test.
Hypothesis testing:
i. The hypotheses used:
H0: errors are normally distributed
H1: errors are not normally distributed
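A minimal sketch of the Kolmogorov-Smirnov statistic for residuals is given below. It computes only the statistic D, the largest gap between the empirical distribution of the standardized residuals and the standard normal distribution, and not a p-value. Standardizing with the sample mean and standard deviation is an assumption of this sketch; when these are estimated from the same residuals, the usual Kolmogorov-Smirnov critical values are conservative. Both samples are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic_normal(residuals):
    """Largest distance between the empirical CDF of the residuals and a fitted normal CDF."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    d = 0.0
    for i, r in enumerate(sorted(residuals), start=1):
        cdf = normal_cdf((r - mean) / sd)
        # Compare the normal CDF with the empirical CDF just after and just before r
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical residuals: one symmetric sample and one with a gross outlier
symmetric = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
with_outlier = [-1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 50.0]

d_symmetric = ks_statistic_normal(symmetric)
d_outlier = ks_statistic_normal(with_outlier)
print(d_symmetric, d_outlier)
```

The sample with the outlier gives a much larger D, which is the pattern that leads to rejecting H0.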

Non-autocorrelation test
The non-autocorrelation test checks whether, in a linear regression, the residual in period $t$ is correlated with the residual in period $t - 1$ or in earlier periods. Autocorrelation violates the linear regression assumption that residuals from different observations are uncorrelated (Tinungki, 2016).
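The Durbin-Watson test is used to test for autocorrelation. The conclusion is drawn by comparing the Durbin-Watson statistic $d$ with the lower and upper critical values $d_L$ and $d_U$, using the following table:
$0 < d < d_L$: positive autocorrelation
$d_L \le d \le d_U$: inconclusive
$d_U < d < 4 - d_U$: no autocorrelation
$4 - d_U \le d \le 4 - d_L$: inconclusive
$4 - d_L < d < 4$: negative autocorrelation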
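A minimal sketch of the Durbin-Watson statistic and the usual decision rule follows. The residual sequences and the bounds d_l = 1.2 and d_u = 1.7 are illustrative assumptions, not table values for this data set; only the statistic 1.8612 is taken from the results reported in this paper.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def classify(d, d_l, d_u):
    """Standard Durbin-Watson decision rule for given lower and upper bounds."""
    if d < d_l:
        return "positive autocorrelation"
    if d <= d_u:
        return "inconclusive"
    if d < 4 - d_u:
        return "no autocorrelation"
    if d <= 4 - d_l:
        return "inconclusive"
    return "negative autocorrelation"


# Hypothetical residual sequences
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips every period
d_persistent = durbin_watson([1.0, 1.0, -1.0, -1.0])   # neighbours tend to agree

# Statistic reported in the results, classified with illustrative bounds
decision = classify(1.8612, d_l=1.2, d_u=1.7)
print(d_alternating, d_persistent, decision)
```

Values of d near 2 indicate no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.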

Homoscedasticity test
The homoscedasticity test checks whether the variance of the residuals differs from one observation to another. Heteroscedasticity occurs when $\operatorname{Var}(\sigma_i^2)$ is positive, while homoscedasticity occurs when $\operatorname{Var}(\sigma_i^2) = 0$ (Klein et al., 2016). When the data are homoscedastic, every observation on the dependent variable carries the same amount of information, so all observations receive equal weight in OLS. Under heteroscedasticity, some observations carry more information than others, so those observations should receive more weight (Rawling et al., 1998).
There are many ways to test for heteroscedasticity.

Non-multicollinearity test
The non-multicollinearity test checks whether the independent variables in a regression model are correlated with one another. Daoud (2017) states that when the independent variables are correlated, the standard errors of their coefficients increase, which in turn increases the variance of the coefficient estimates.
The variance inflation factor (VIF) can be used to check for multicollinearity in the regression model. It measures how much the variance of a coefficient estimate is inflated. VIF is calculated by statistical software as part of regression analysis and appears in the VIF column of the output (Daoud, 2017). The following formula can also be used:
$$VIF_j = \frac{1}{1 - R_j^2}$$
where:
$j = 1, 2, \dots, k$
$k$ = the number of independent variables
$R_j^2$ = coefficient of determination obtained by regressing $X_j$ on the other independent variables
If the VIF value is more than 10, the data indicate multicollinearity.
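The VIF formula can be illustrated with a short Python sketch. The helper names are hypothetical. The four VIF values are those reported in the Results section of this paper, and the second function simply inverts the formula to show the coefficient of determination each value implies.

```python
def vif(r_squared_j):
    """Variance inflation factor for predictor j, given its R^2 on the other predictors."""
    return 1.0 / (1.0 - r_squared_j)


def implied_r_squared(vif_value):
    """Invert VIF = 1 / (1 - R^2) to recover R^2."""
    return 1.0 - 1.0 / vif_value


# VIF values reported in the Results section
reported_vif = [7.626, 5.419, 1.090, 7.922]
implied = [round(implied_r_squared(v), 3) for v in reported_vif]
flagged = [v > 10 for v in reported_vif]

print(implied)  # R^2 of each predictor on the others
print(flagged)  # True would indicate multicollinearity under the VIF > 10 rule
```

An R-squared of 0.9 corresponds exactly to the threshold VIF of 10, so none of the reported values crosses it.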

Outlier detection
An outlier is an observation that differs markedly from the others; it can be much smaller or much larger. It does not follow the dominant pattern and lies far from the center of the data (Widodo & Dewayanti, 2016). Outliers affect the regression model, but they cannot simply be eliminated, and their presence can cause the residuals of the model to be non-normally distributed. By identifying outliers, the researcher can gain important information that helps in drawing better conclusions about the data (Wang et al., 2019).
There are many ways to detect outliers. One of them is the difference in fits (DFFITS) value. DFFITS is a measure of influence introduced by Belsley, Kuh, and Welsch in 1980, which measures how much deleting the $i$-th observation changes its predicted value (Wang et al., 2019). The DFFITS value can be defined as follows:
$$DFFITS_i = e_i \left[\frac{n - p - 1}{JKG\,(1 - h_{ii}) - e_i^2}\right]^{1/2} \left[\frac{h_{ii}}{1 - h_{ii}}\right]^{1/2}$$
where $e_i$ is the $i$-th error, $JKG$ is the sum of squares of the errors, $p$ is the number of parameters in the model, $n$ is the number of observations, and $h_{ii}$ is the $i$-th diagonal element of $X(X'X)^{-1}X'$.
An observation is identified as an outlier if $|DFFITS_i| > 2\sqrt{p/n}$.
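The DFFITS calculation and its cutoff can be sketched as below. The formula is the standard textbook form (studentized deleted residual multiplied by a leverage factor), which is assumed here to match the authors' definition. The inputs to `dffits` are hypothetical. For the cutoff, n = 34 comes from the data description, while p = 5 assumes four predictors plus an intercept.

```python
import math


def dffits(e_i, h_ii, sse, n, p):
    """DFFITS for one observation from its residual, leverage and the error sum of squares."""
    # Studentized deleted residual
    t_i = e_i * math.sqrt((n - p - 1) / (sse * (1.0 - h_ii) - e_i ** 2))
    # Scale by the leverage factor
    return t_i * math.sqrt(h_ii / (1.0 - h_ii))


def dffits_cutoff(p, n):
    """An observation is flagged when |DFFITS| exceeds 2 * sqrt(p / n)."""
    return 2.0 * math.sqrt(p / n)


# Cutoff for n = 34 provinces and an assumed p = 5 parameters
cutoff = dffits_cutoff(p=5, n=34)

# Hypothetical single observation
value = dffits(e_i=2.0, h_ii=0.5, sse=20.0, n=10, p=2)

print(round(cutoff, 3), round(value, 3), abs(value) > dffits_cutoff(p=2, n=10))
```

Under these assumptions the cutoff for this study's data is roughly 0.77.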

Robust regression
Robust regression is one solution for analyzing data that contain influential observations such as outliers, because it produces a model that is robust, or resistant, to outliers. Robust regression can address deviations from the assumptions of the least-squares method. When a fitted regression model violates the classical assumptions and a transformation cannot eliminate or weaken the influence of the outliers, the predictions are biased. In this case, robust regression is the best method (Susanti et al., 2014).

MM estimation
MM estimation was first introduced by Yohai in 1987. This method combines high breakdown value estimation with M estimation. It has a high breakdown value and better statistical efficiency than S estimation. An S estimator with a 50% breakdown point is used as the initial estimate in MM regression (Gschwandtner & Filzmoser, 2012). The MM estimation method provides estimates with a high breakdown value and high efficiency for regression models in which the normality assumption is violated (Yohai, 1987).
The steps to get the MM estimator are as follows:
a. Determine the estimated regression coefficients from the data using the least-squares method.
b. Perform the classical assumption tests on the regression model obtained.
f. Estimate the parameter $\hat{\beta}$ using the WLS method with the weights $w_{i0}$.
g. Repeat steps c-f until $\hat{\beta}$ converges.
h. Conduct hypothesis tests to determine whether the independent variables have a significant effect on the dependent variable.
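The reweighting loop in these steps can be sketched for a single predictor as follows. This is not a full MM estimator: a Theil-Sen line is substituted for the S estimate as the robust starting point, and the scale is the normalized median absolute deviation of the initial residuals, held fixed during the iterations. The Tukey bisquare weight function with tuning constant 4.685, which corresponds to about 95% efficiency under normal errors, is an assumed choice, and all names and data are hypothetical.

```python
import statistics
from itertools import combinations

TUKEY_C = 4.685  # bisquare tuning constant for about 95% efficiency under normal errors


def weighted_line_fit(x, y, w):
    """Weighted least squares (WLS) for the line y = b0 + b1*x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def theil_sen(x, y):
    """Robust starting line: median pairwise slope, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    b1 = statistics.median(slopes)
    b0 = statistics.median([yi - b1 * xi for xi, yi in zip(x, y)])
    return b0, b1


def mad_scale(residuals):
    """Normalized median absolute deviation as a robust scale estimate."""
    med = statistics.median(residuals)
    return statistics.median([abs(r - med) for r in residuals]) / 0.6745


def bisquare_weight(u, c=TUKEY_C):
    """Tukey bisquare weight: shrinks smoothly to zero for |u| >= c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def mm_like_fit(x, y, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares from a robust start with a fixed scale."""
    b0, b1 = theil_sen(x, y)
    scale = mad_scale([yi - b0 - b1 * xi for xi, yi in zip(x, y)])
    for _ in range(max_iter):
        w = [bisquare_weight((yi - b0 - b1 * xi) / scale) for xi, yi in zip(x, y)]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical data close to y = 1 + 2x, with the last point replaced by an outlier
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.1, 4.8, 7.15, 8.95, 11.2, 12.9, 15.05, 16.85, 19.1, 60.0]

ols_b0, ols_b1 = weighted_line_fit(x, y, [1.0] * len(x))
rob_b0, rob_b1 = mm_like_fit(x, y)
print(ols_b1, rob_b1)
```

In this toy example the single outlier pulls the OLS slope far from 2, while the reweighted fit gives the outlier zero weight and stays close to the slope of the remaining points.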

Results and Discussion
This study uses data from BPS and the Ministry of Agriculture on the amount of rice production, the area of harvested land, the area of land affected by pest attacks, rainfall, and population for each province of Indonesia in 2019. Harvested area ($X_1$), area affected by pest attacks ($X_2$), rainfall ($X_3$), and population ($X_4$) were used as factors to determine the regression model for the amount of rice production ($Y$). The regression model estimated using ordinary least squares (OLS) is as follows:
$$\hat{Y} = -67759.682 + 4.976X_1 + 3.473X_2 - 74.159X_3 + 28.746X_4$$
with an R-sq value of 99.5%.
The assumption tests are carried out to see whether the model can be used to estimate the amount of rice production ($Y$). The three assumptions of homoscedasticity (p-value = 0.6942), non-autocorrelation (D = 1.8612), and non-multicollinearity (VIF = 7.626, 5.419, 1.090, 7.922) were fulfilled; only the normality assumption (p-value = 0.00) was not fulfilled. There are three outliers, namely observations 8, 13, and 20.
The p-value of the F-test for the model is 0.00, which is less than 0.05, so the linear regression model is significant. Because the normality assumption is not fulfilled and there are outliers, the model is re-estimated using robust regression with the MM-estimation method. The robust regression model with the MM-estimation method is as follows:
$$\hat{Y} = -68803.846 + 4.997X_1 + 3.465X_2 - 73.344X_3 + 28.685X_4$$
with an R-sq value of 99.56%.
The model shows that an increase of 1 hectare of harvested area will increase rice production by 5.064 tons, an increase of 1 hectare of land area affected by pest attacks will decrease rice production by 6.124 tons, an increase of 1 mm of rainfall will decrease rice production by 22.31 tons, and an increase of one thousand inhabitants will increase rice production by 75.44 tons. From Table 2 it can be seen that harvested area ($X_1$) and population ($X_4$) have a significant effect on the amount of rice production ($Y$), while the area affected by OPT attacks ($X_2$) and rainfall ($X_3$) have no significant effect on the amount of rice production ($Y$). The partial tests show that only harvested area and population are significant in the MM-estimation robust regression model, the same as the results using OLS.
The OLS model cannot be used for prediction because there are outliers and the normality assumption is not met, so the results of its significance tests are not reliable. This study uses robust regression with MM estimation to overcome this problem, so that the model can be used to predict rice production.
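As an illustration of how the fitted model could be used for prediction, the sketch below evaluates the MM-estimation equation reported above. The function name and the input values are hypothetical. The units (hectares for both areas, millimetres for rainfall, thousands of inhabitants for population, tons for production) are assumed from the interpretation given in the text.

```python
# Coefficients of the MM-estimation model as reported in this paper
MM_COEFFICIENTS = {
    "intercept": -68803.846,
    "harvested_area": 4.997,  # X1, assumed to be in hectares
    "opt_area": 3.465,        # X2, assumed to be in hectares
    "rainfall": -73.344,      # X3, assumed to be in mm
    "population": 28.685,     # X4, assumed to be in thousands of inhabitants
}


def predict_rice_production(harvested_area, opt_area, rainfall, population):
    """Predicted rice production (assumed tons) from the reported MM model."""
    c = MM_COEFFICIENTS
    return (
        c["intercept"]
        + c["harvested_area"] * harvested_area
        + c["opt_area"] * opt_area
        + c["rainfall"] * rainfall
        + c["population"] * population
    )


# Hypothetical province: 300,000 ha harvested, 10,000 ha affected by OPT,
# 2,000 mm of rainfall and 5,000 thousand inhabitants
prediction = predict_rice_production(300000, 10000, 2000, 5000)

# Change in the prediction when harvested area grows by one hectare
marginal = predict_rice_production(300001, 10000, 2000, 5000) - prediction

print(round(prediction, 3), round(marginal, 3))
```

Each coefficient is the change in predicted production for a one-unit change in its variable while the other variables are held fixed, which is what the second calculation shows for harvested area.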
The outliers in rice production are Central Java, Lampung, and West Kalimantan. This may be because production in these provinces is quite different from that of other provinces; Central Java, for example, has the highest rice production. For this reason OLS cannot be used to predict rice production. The two significant variables also influence the prediction model, which has an R-sq value of 99.56%, meaning that its explanatory power is quite high.

Conclusion
The robust regression model with the MM-estimation method to predict the amount of rice production in Indonesia is:
$$\hat{Y} = -68803.846 + 4.997X_1 + 3.465X_2 - 73.344X_3 + 28.685X_4$$
with an R-sq of 99.56%, which means that the independent variables explain 99.56% of the variation in the dependent variable, while the remaining 0.44% is influenced by other variables not included in the model. The robust MM-estimation regression model of rice production in Indonesia shows that increases in harvested area and population will increase the amount of rice production, while increases in the land area affected by plant pests and in rainfall will reduce the amount of rice production.