Survey of Livestock Counting and Tracking Methods

Conventional livestock counting methods are tedious, time-consuming, and labor-intensive for farmers, so counting is carried out only irregularly. This inability to monitor stock numbers continuously gives rustlers sufficient time to steal farm animals and leads to significant financial losses annually. To overcome this issue, research on unmanned aerial vehicles (UAVs) in pastoral farming has grown steadily over the last decade, but their use is still limited. This article gives a detailed analysis of the existing hardware-based and software-based methods. The impact of shifting from current methods to a UAV-based system is discussed using the findings of research articles, and algorithms for object detection in images and tracking in videos are also analyzed. The article concludes that there are still unexplored practical uses of UAVs in pastoral farming for monitoring, counting, and tracking farm animals, especially in countries like New Zealand, where pastoral products account for a major portion of export revenue.


Introduction
Fast pasture growth makes dairy farming low cost, sustainable, and efficient, which allows New Zealand to compete well as an exporter of food and fiber. According to the statistical survey of New Zealand for the year 2016, there are 55,473 farms with an average area of 252 hectares each; 44% are sheep or beef farms, 21% are dairy farms, and 6% are mixed farms (Beef and Lamb New Zealand, 2019). Sheep revenue alone contributes 48% of gross farm revenue, and total pastoral products accounted for almost 44% of the actual export revenue of New Zealand in 2017. For the year ended 30 September 2017, approximately 19.5 million lambs, 3.7 million sheep, and 2.4 million cattle were processed to produce 362,000, 94,000, and 633,000 tons of meat, respectively. While deer and cattle farming has increased in recent years, sheep farming has declined by 0.8% (Beef and Lamb New Zealand, 2019). The reason behind this decline is the extra labor and effort required to manage sheep farms.
With advances in pastoral and agricultural techniques, a variety of data is available to farmers, such as expected climatic changes, area temperatures, soil dampness values, pasture cover, and individual stock identification for monitoring growth rates. Yet most livestock handling tasks are still labor-intensive, and counting paddock animals is one of them. Counting is usually done only a few times a year, so stock information is updated infrequently. It is also disruptive for the animals, because they have to pass through narrow choke points to be counted. There are many cases where sheep, and especially pregnant ewes, were stolen from farms and the farmers discovered their financial loss only two to three months later, at the next sheep count (Johnstone, 2012). An easy and robust remote livestock counting and monitoring system would allow herds to be handled efficiently and could also help farmers respond to any disturbance in the livestock.

Software-based methodologies
To process the videos recorded by a UAV for animal monitoring, counting, and tracking, machine learning or deep learning-based strategies are used. Machine learning is defined as the science of techniques and approaches that artificially intelligent systems use for problem-solving, whereas deep learning is inspired by the human brain and uses multi-layered networks of artificial neurons to process data effectively. Although deep learning needs an enormous amount of data and high-end machines with discrete graphics cards for training, once trained it can produce results very quickly. The main difference between the two approaches is that deep learning performs feature extraction and classification as a single end-to-end process. Existing machine learning and deep learning algorithms for object detection and tracking in videos are discussed briefly below. The researcher can select an algorithm depending on the approach used for counting animals in the paddock: object detection can be used if only images are collected, and object tracking can be used if videos are recorded.
Research took an important turn with the introduction of convolutional neural networks (CNNs) for object detection in images, after Krizhevsky, Sutskever, and Hinton (2012) accomplished the classification task on ImageNet using a deep CNN (DCNN), AlexNet. After seeing the promising results of CNNs, researchers started exploring the impact of deeper networks and different ways of training them. Girshick et al. (2014) proposed the Region-based CNN (R-CNN) algorithm, and Fast R-CNN (Girshick, 2015) used the same network with comparatively better results. Ren et al. (2015) designed an even faster algorithm, Faster R-CNN, with somewhat higher efficiency. Deeper networks and faster algorithms were then proposed by many other researchers: you only look once (YOLO) (Redmon et al., 2016; Redmon and Farhadi, 2017), single-shot multibox detector (SSD) (Liu et al., 2016), Mask R-CNN, RetinaNet with focal loss (Lin et al., 2017), and U-Net with weighted Hausdorff distance (WHD) (Ribera et al., 2018). Multiple versions of YOLO are available; they are among the fastest algorithms for video processing and can be used for real-time processing. Figure 2 shows a brief timeline of object detection methods.
1st ICESET 2020
The prospect that livestock monitoring can help farmers identify animal-related issues has motivated various researchers during the last decade to analyze videos recorded in different areas and, after processing them, to provide the required information to the people concerned. Early research on wildlife monitoring methods includes Jachmann (1991), who estimated elephant density from aerial images. Techniques previously used in wildlife monitoring include the AdaBoost classifier (Burghardt and Calic, 2006), power spectral-based techniques (Parikh, Patel, & Bhatt, 2013), DPM and SVM (van Gemert et al., 2014), DCNN (Sarwar et al., 2020), and the template matching algorithm (Gonzalez et al., 2016).

Object tracking in videos
Object tracking in a video can be defined as the approximation of the path of an object throughout the video by keeping track of the object's spatial changes, such as variations in position, size, and shape. Like detection algorithms, object tracking algorithms are classified into two domains: machine learning and deep learning. Machine learning algorithms are further classified into three main categories: point tracking, kernel-based tracking, and silhouette-based tracking.
Point-based tracking involves detecting the object in each frame using different feature points. These algorithms are highly accurate when the object is detected successfully in an image, but they can produce false detections when occlusions occur. The main algorithms in this category include the Kalman filter (Kalman, 1960) and the particle filter (Kitagawa, 1987).
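As a rough illustration of point tracking, the sketch below runs a constant-velocity Kalman filter over noisy one-dimensional positions of a single detected point. It is a minimal sketch and is not taken from the surveyed papers: the function name `kalman_track`, the constant-velocity motion model, the diagonal process noise, and all default parameter values are assumptions chosen for illustration.

```python
def kalman_track(measurements, dt=1.0, process_var=1e-2, meas_var=1.0):
    """Smooth noisy 1-D positions with a constant-velocity Kalman filter.

    The state is [position, velocity]; only the position is measured.
    All parameter values are illustrative assumptions.
    """
    # Initial state: first measurement, zero velocity.
    x, v = measurements[0], 0.0
    # Initial 2x2 covariance P, stored as four scalars (assumed values).
    p00, p01, p10, p11 = meas_var, 0.0, 0.0, 10.0
    estimates = [x]
    for z in measurements[1:]:
        # Predict step: x' = F x with F = [[1, dt], [0, 1]].
        x = x + dt * v
        # P' = F P F^T + Q, with Q = diag(process_var, process_var).
        a00 = p00 + dt * (p01 + p10) + dt * dt * p11 + process_var
        a01 = p01 + dt * p11
        a10 = p10 + dt * p11
        a11 = p11 + process_var
        # Update step with measurement matrix H = [1, 0].
        s = a00 + meas_var          # innovation variance
        k0, k1 = a00 / s, a10 / s   # Kalman gain
        residual = z - x
        x = x + k0 * residual
        v = v + k1 * residual
        # P = (I - K H) P'
        p00, p01 = (1.0 - k0) * a00, (1.0 - k0) * a01
        p10, p11 = a10 - k1 * a00, a11 - k1 * a01
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [0.2, 0.9, 2.3, 2.8, 4.1, 5.2, 5.8, 7.1]
    print(kalman_track(noisy))
```

The filter models both the process and measurement noise as Gaussian, which is the assumption that a particle filter relaxes at a higher computational cost.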
Kernel-based tracking follows the motion of the object from one frame to the next using the primary object's region. The algorithms in this domain are categorized by the method used to track the object. A few of the most commonly used are template matching, layering, mean shift (Exner et al., 2010), and SVM (Suykens & Vandewalle, 1999).
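To make the kernel-based idea concrete, the following sketch shows template matching in its simplest form: the object's region from one frame is slid over the next frame, and the position with the smallest sum of squared differences is taken as the new location. This is an illustrative sketch only; the function name `match_template`, the use of nested lists as grayscale images, and the squared-difference score are assumptions and do not come from the cited works.

```python
def match_template(frame, template):
    """Find where `template` best matches inside `frame`.

    Both arguments are 2-D lists of grayscale values. Returns the
    (row, col) of the best top-left corner and its squared-difference
    score (lower is better).
    """
    frame_h, frame_w = len(frame), len(frame[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    # Exhaustively try every placement of the template in the frame.
    for r in range(frame_h - tmpl_h + 1):
        for c in range(frame_w - tmpl_w + 1):
            score = sum(
                (frame[r + i][c + j] - template[i][j]) ** 2
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


if __name__ == "__main__":
    # A dark 6x6 frame with one bright 2x2 blob (hypothetical data).
    frame = [[0] * 6 for _ in range(6)]
    for i in (3, 4):
        for j in (2, 3):
            frame[i][j] = 9
    template = [[9, 9], [9, 9]]
    print(match_template(frame, template))
```

Because every placement is scored, the cost grows with the frame area multiplied by the template area, which is why practical trackers usually restrict the search to a window around the previous position.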
Silhouette-based tracking algorithms are useful for tracking objects with complex shapes, such as hands and fingers, that cannot be described completely using simple geometric features. These algorithms use rough shape models of objects from the previous frame to track them in the current frame. They use predefined information for each object and can handle different complex-shaped objects as well as occlusion, merging, and splitting problems. The main subcategories are contour and shape tracking (Parekh, Thakore, & Jaliya, 2014). The most important advantage of silhouette tracking is its flexibility in handling a large variety of object shapes. However, these algorithms need prior information about all the objects under consideration.
Deep learning algorithms are used in this area both on their own and in conjunction with machine learning. Wang et al. (2015) used the features of a fully convolutional network to track objects in a video. Although they tracked a single object per video, they were able to overcome many tracking challenges. Kang et al. (2016) combined tubelets (bounding box sequences) with CNNs (T-CNN) for the same task: low-confidence detection classes were suppressed to reduce false positives, and detection results were propagated between adjacent frames to reduce false negatives. Feichtenhofer, Pinz, and Zisserman (2017) proposed a ConvNet architecture that detects and tracks the objects in a video jointly and therefore named it the Detect and Track (D&T) approach. They trained a fully end-to-end convolutional network on video frames; after computing the cross-correlation between the feature responses of adjacent frames, a region of interest (ROI) pooling layer is used for classification and regression of the proposal boxes. Table 1 shows a comparison of the discussed techniques for object tracking.

Table 1. Comparison of the discussed object tracking techniques

| Category | Algorithm | Advantages | Limitations |
|---|---|---|---|
| Point Tracking | Kalman Filter (Kalman, 1960) | Can track objects in noisy images | Assumes Gaussian distribution of all state variables |
| Point Tracking | Particle Filter (Kitagawa, 1987) | Does not need Gaussian distribution of state variables | Difficulty in tracking multiple objects and computationally expensive |
| Kernel Tracking | Template Matching (Parikh et al., 2013) | Relatively simple method and can deal with partial occlusion | High computational cost |
| Kernel Tracking | Mean Shift (Exner et al., 2010) | Does not need any prior shape information | Cannot distinguish an object from the background when both have the same color |
| Kernel Tracking | SVM (Suykens and Vandewalle, 1999) | Suitable for multiple object tracking | Needs training |
| Kernel Tracking | Layering Based (Parekh et al., 2014) | Suitable for multiple object tracking | High computational cost |
| Silhouette Tracking | Contour Matching (Patel & Patel, 2012) | Can handle multiple complex-shaped objects | Computationally expensive |
| Silhouette Tracking | Shape Matching (Parekh et al., 2014) | Resistant to noise and can handle multiple complex-shaped objects | Needs prior information of all shapes and needs training |
| Deep CNN Based Tracking | T-CNN (Kang et al., 2016) | Low computational cost and good for tracking a few objects | Difficulty in tracking many objects in a video |
| Deep CNN Based Tracking | Detect and Track (Feichtenhofer et al., 2017) | Failed detections can be recovered using detections from previous frames | Computationally expensive algorithm |
| Deep CNN Based Tracking | FCN (Wang et al., 2015) | Can handle occlusion and noise | Tracks one object at a time |
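The two ideas attributed to T-CNN above, suppressing low-confidence detections and propagating detections between adjacent frames, can be sketched in a heavily simplified form. The code below is not the T-CNN implementation: it only thresholds scores and carries forward a previous-frame box that has no overlapping match in the current frame. The function names `iou` and `refine_frame`, the score decay, and every threshold value are hypothetical choices made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_frame(prev_dets, curr_dets, min_score=0.5, min_iou=0.3, decay=0.8):
    """Clean up the detections of the current frame.

    Each detection is a (box, score) pair. Thresholds and the decay
    factor are assumed values, not taken from any paper.
    """
    # Step 1: suppress low-confidence detections (fewer false positives).
    kept = [(box, score) for box, score in curr_dets if score >= min_score]
    # Step 2: carry over previous boxes that have no overlapping match in
    # the current frame, with a reduced score (fewer false negatives).
    for prev_box, prev_score in prev_dets:
        matched = any(iou(prev_box, box) >= min_iou for box, _ in kept)
        carried_score = prev_score * decay
        if not matched and carried_score >= min_score:
            kept.append((prev_box, carried_score))
    return kept


if __name__ == "__main__":
    previous = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.9)]
    current = [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.2)]
    for box, score in refine_frame(previous, current):
        print(box, round(score, 2))
```

In this toy example the weak detection is dropped, the first animal is matched by overlap, and the second animal, which was missed in the current frame, is carried over from the previous frame with a lower score.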

Discussion
Existing hardware-based methods ultimately lead to direct farmer-animal encounters and can be categorized as manual counting methods. Although machines that read tags automatically can be installed near holding pens, gathering the full stock every day to obtain this information is not feasible. Manual counting of livestock can be time-consuming and is prone to various psychological phenomena that can lead to bias or optical illusions. Better use of technology is therefore needed to support farmers. Surveillance or motion-sensor cameras could be installed around the whole farm; however, with an average farm area of 252 hectares, many cameras would be needed for proper coverage. There would also be problems with continuous power supply, with networking to link all these cameras together, and with data transmission to a central system. Such a setup may cost a lot more than the annual financial loss.
These factors support automatic detection, counting, and remote monitoring of livestock using a UAV. This would relieve farmers of much of the work and provide relevant information very quickly. However, the design of such an automatic system is not trivial, even if the data is collected under the most favorable conditions. Variations in illumination, background, and shadows pose many challenges, and finding a general technique that handles them across different farms raises many technical issues. The main challenge for a UAV in this task is covering the whole paddock from a suitable permitted height in the minimum possible time. Paddocks vary greatly: a few have flat ground with no trees, while others have uneven terrain and many bushes. Currently, no large dataset is publicly available, so creating a good, large dataset will be one of the main contributions to this research. Figure 3 shows a few sheep farm images taken by a UAV, in which each detected sheep is marked with a blue dot and appears as a whitish blob from an altitude of 80 m. Figure 3, a sub-image of a full frame, also hints at another challenge: detecting and tracking hundreds of animals in each video frame, where each animal covers only tens of pixels and looks like a colored blob from a considerable height.
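The trade-off between flying height, ground coverage, and the apparent size of each animal can be estimated with the standard ground sample distance (GSD) relation for a downward-pointing camera, GSD = altitude x sensor width / (focal length x image width). The sketch below applies it. The camera parameters (13.2 mm sensor width, 8.8 mm focal length, 1920-pixel-wide video frame) and the 1 m animal length are hypothetical values chosen for illustration; they are not taken from the surveyed work.

```python
def ground_sample_distance(altitude_m, sensor_width_mm, focal_length_mm,
                           image_width_px):
    """Metres of ground covered by one pixel for a nadir-pointing camera."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)


def footprint_width_m(altitude_m, sensor_width_mm, focal_length_mm):
    """Width of the ground strip visible in a single frame, in metres."""
    return altitude_m * sensor_width_mm / focal_length_mm


def pixels_across(object_length_m, gsd_m_per_px):
    """Number of pixels an object of the given length spans."""
    return object_length_m / gsd_m_per_px


if __name__ == "__main__":
    # Hypothetical camera: 13.2 mm sensor, 8.8 mm lens, 1920 px wide frames.
    for altitude in (40.0, 80.0, 120.0):
        gsd = ground_sample_distance(altitude, 13.2, 8.8, 1920)
        width = footprint_width_m(altitude, 13.2, 8.8)
        span = pixels_across(1.0, gsd)  # assumed 1 m long animal
        print(f"{altitude:5.0f} m: {gsd * 100:.2f} cm/px, "
              f"{width:.0f} m footprint, {span:.1f} px per animal")
```

With these assumed values, a frame taken at 80 m covers a strip about 120 m wide at roughly 6 cm per pixel, so a 1 m long animal spans about 16 pixels. Doubling the altitude doubles the footprint, which shortens the flight, but halves the number of pixels across each animal, which makes detection harder.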

Conclusion
RFID tags and boluses for livestock detection, monitoring, and tracking have been in use for decades; however, the use of UAVs is still in its infancy. The discussed literature shows the potential, benefits, and challenges of using a UAV, along with object detection and tracking algorithms, for these tasks. There is still a lot of room for research in this area, and researchers can explore algorithms that detect and track livestock from different altitudes: the higher the altitude at which the algorithms still work, the more time-efficient the system will be. This is a new direction of research, and with improvements in UAV technology and in the efficiency of algorithms, farmers will be able to reduce the burden of many manual tasks in pastoral farming.