Tuomas Valtanen & Mikko Pajula
The objective of the BerryMachine project is to create a technological prototype that can recognize the condition of the current berry harvest based on the number of berries within a photo. This challenge will be tackled by using the principles of machine learning and deep learning, which typically require a large image dataset to train the artificial intelligence to recognize different objects in a photo efficiently.
Our technological work got its real kickstart during the summer of 2021, when the original berry photo dataset was produced. The dataset consists mostly of bilberry photos, although a portion of lingonberry photos exists as well. At the time of writing, there are approximately 10,000 photos in total, containing a large number of pictures of berry flowers as well as raw and ripe berries.
The lighting and zoom level of the photos vary throughout the dataset. The reason for this is machine learning itself, since it is optimal to have a dataset that is as diverse as possible. The changes in lighting come naturally from the weather on different days, but the zoom levels were divided into three categories: close-ups, mid-range photos and photos containing the whole measurement square frame. To add more diversity to the dataset, multiple photographers produced the material with several different devices (smartphones and SLR cameras).
The machine learning model trained to recognize berries within photos will be built by using neural networks. A neural network is a computational model that loosely mimics how the human brain functions. In other words, neural networks let us roughly model on a computer how a human brain learns. This learning process can be applied to image recognition with a specialized neural network structure.
Typically, a neural network intended for image recognition is trained with a dataset in which the objects to be recognized have been marked beforehand. In addition to taking the actual pictures, another part of preparing the dataset is therefore to mark the objects by hand. In other words, the personnel responsible go through the dataset with special software and mark all interesting recognizable objects, which in this case are berry flowers as well as raw and ripe berries. This process is commonly called “annotation”, and multiple software solutions exist for it.
In the BerryMachine project, Label Studio was selected as the annotation software. The reasons for selecting Label Studio were its clarity, the safe storage of the produced annotation data and its network capabilities. Since Label Studio works over a network, it allows several people to annotate the same material simultaneously, which saves us from doing redundant work and from moving image files and annotation data around needlessly.
A single annotation in Label Studio is made by drawing a rectangle on top of a recognized feature. The annotation process itself is simple, but it poses several challenges:
- How large should the annotated area be?
- Should we annotate unclear situations?
- How should we annotate small or distant targets?
- What should we do if, for example, we can’t tell different flowers apart?
Lingonberry was especially difficult in the recognition phase, since it produces its berries in bunches. In the flower phase, if the photo is overexposed, it is almost impossible for the human eye to tell closely bunched flowers apart. This “special case” of the lingonberry was solved by creating a separate class for a lingonberry bunch. Corresponding bunch classes were also created for the other lingonberry growth phases.
Counting berries via bunches can be challenging, since the number of flowers or berries within a bunch can be ambiguous. On the other hand, the machine learning model might recognize bunches better than single berries, which could improve the accuracy of the recognition because more bunches would be detected. However, this can lead to a compromise, where we have to estimate the average number of berries from the size of the bunch.
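The compromise described above can be illustrated with a minimal Python sketch. It assumes a single fixed average number of berries per bunch instead of a size-dependent estimate, and both the class names and the average value are hypothetical placeholders, not values from the project.

```python
from collections import Counter

# Hypothetical average; a real value would have to be measured in the field.
ASSUMED_BERRIES_PER_BUNCH = 6.0


def estimate_berry_count(labels, berries_per_bunch=ASSUMED_BERRIES_PER_BUNCH):
    """Estimate the number of ripe berries from a list of detected class labels.

    Single berries are counted as one each, while every detected bunch
    contributes the assumed average number of berries.
    """
    counts = Counter(labels)
    single_berries = counts["lingonberry_ripe"]    # hypothetical class name
    bunches = counts["lingonberry_ripe_bunch"]     # hypothetical class name
    return single_berries + bunches * berries_per_bunch


# Two single berries and two bunches detected in one photo.
detections = [
    "lingonberry_ripe",
    "lingonberry_ripe_bunch",
    "lingonberry_ripe_bunch",
    "lingonberry_ripe",
]
print(estimate_berry_count(detections))
```

A size-dependent estimate would replace the constant with a function of the bunch's bounding box, at the cost of needing calibration data.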
Within the BerryMachine project, the personnel performing the annotation were instructed to crop the target as accurately as possible, while working within acceptable time limits and ensuring that all needed information was included. Unclear targets are annotated only if the person doing the annotation knows exactly what the target is. Small and distant targets are marked only if they cover enough pixels in the source photo.
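The rule for small and distant targets could be expressed as a simple size check. The sketch below is only an illustration: the threshold of 12 pixels is a hypothetical value, since no exact limit is given here.

```python
# Hypothetical minimum side length in pixels for a target to be annotated.
MIN_SIDE_PX = 12


def is_large_enough(box_width_px, box_height_px, min_side_px=MIN_SIDE_PX):
    """Return True if the shorter side of the box has enough pixels."""
    return min(box_width_px, box_height_px) >= min_side_px


print(is_large_enough(20, 15))  # a clearly visible berry
print(is_large_enough(20, 8))   # too few pixels in height
```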
Photos taken from the highest zoom level are especially interesting. In these photos the berries are small, but still easily recognizable by the human eye. Thanks to modern smartphone cameras, the image quality and especially the pixel density allow us to distinguish small details within the photo.
The frame shown in the photo is the berry measurement square frame, which is used to calculate the number of berries within a one square meter area.
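As a rough illustration of how a count inside the frame relates to a per-square-meter figure, the following Python sketch scales the count by the frame area. The frame side length used in the example is a hypothetical value; the actual dimensions of the frame are not stated here.

```python
def berries_per_square_meter(berry_count, frame_side_m):
    """Scale a berry count inside a square frame to berries per square meter."""
    if frame_side_m <= 0:
        raise ValueError("frame_side_m must be positive")
    frame_area_m2 = frame_side_m ** 2
    return berry_count / frame_area_m2


# 30 berries counted inside a hypothetical 0.5 m x 0.5 m frame.
print(berries_per_square_meter(30, 0.5))
```

If the frame itself covers exactly one square meter, the count and the density are the same number.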
While annotating bilberry photos, we also noticed that other berries that are not recognition targets in the project (for example, crowberries) can be present in the photos. Crowberries are fairly easy to recognize, but bog bilberries, for example, are very difficult to recognize unless the person performing the annotation is an actual agrologist. For the first versions of the machine learning model, these extra berries are not taken into account at all. However, if they prove to be problematic for the recognition of the other berries, there are methods that prevent the machine learning model from recognizing them incorrectly.
When the annotation process has been finished, we can export the annotation data into a digital format, which can be used to train the selected neural network.
The output of the annotation software is a digital file that can be used in the learning process of the neural network. The data includes, for example, the person responsible for the annotation, information about the photo, and the coordinates of the recognized bilberry within the photo.
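To make the idea of the export file more concrete, here is a minimal Python sketch that reads rectangle annotations from a small JSON sample. The field names are modeled on Label Studio's JSON export, where rectangle coordinates are stored as percentages of the image size; the sample content, the file name and the label name are made up for illustration, and the project's actual export may look different.

```python
import json

# Made-up sample modeled on Label Studio's JSON export format.
SAMPLE_EXPORT = """
[
  {
    "data": {"image": "bilberry_0001.jpg"},
    "annotations": [
      {
        "completed_by": 1,
        "result": [
          {
            "type": "rectanglelabels",
            "original_width": 4000,
            "original_height": 3000,
            "value": {
              "x": 25.0, "y": 50.0, "width": 10.0, "height": 20.0,
              "rectanglelabels": ["bilberry_ripe"]
            }
          }
        ]
      }
    ]
  }
]
"""


def extract_boxes(export_text):
    """Return one dict per annotated rectangle, with pixel coordinates."""
    boxes = []
    for task in json.loads(export_text):
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                value = region["value"]
                img_w = region["original_width"]
                img_h = region["original_height"]
                boxes.append({
                    "image": task["data"]["image"],
                    "annotator": annotation.get("completed_by"),
                    "label": value["rectanglelabels"][0],
                    # Percentages (0-100) are converted to pixels.
                    "x_px": value["x"] * img_w / 100,
                    "y_px": value["y"] * img_h / 100,
                    "width_px": value["width"] * img_w / 100,
                    "height_px": value["height"] * img_h / 100,
                })
    return boxes


for box in extract_boxes(SAMPLE_EXPORT):
    print(box)
```

Converting to pixel coordinates at this stage is a design choice of the sketch; many training pipelines expect their own coordinate convention, so a further conversion step would likely be needed.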
In the first phase, the neural network will be trained to recognize the type of berries, after which the possibility of counting the number of berries within the photo will be investigated.
As the autumn proceeds, the people at the FrostBit Software Lab have a lot of work to do in order to process all of the image dataset, especially when more photos might appear later on. But so what, it’s better to have too much than too little data, at least when it comes to machine learning!
We eagerly await the results of our upcoming artificial intelligence with our current photo dataset!