ViT

ViT-Lite.zip

Click here to download the lite code zip.

VisionTransformer-master

Redirect to the GitHub full code repository.

1. Introduction

This project is a demo of the ViT (Vision Transformer) neural network architecture. It can train, test, and predict on image datasets. You can download the provided datasets (fruits, animals, and bloodcells) or run the code on your own image datasets. Note that the project is only a demo, so deploying it to industrial production is not recommended.

2. Data Preparation

Three image datasets are provided: fruits, animals, and bloodcells. You can create your own image dataset, but it must follow the same layout rules, otherwise the project code might not run. Refer to the folder and file placement structure of the three provided datasets to work out the rules.
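One way to inspect a dataset before running the project is to list its class folders and count the images in each. The sketch below assumes a layout of one subfolder per class directly under the dataset root, which is a common convention but is not confirmed by this README. The names `summarize_dataset` and `IMAGE_SUFFIXES` are illustrative and not part of the project.

```python
from pathlib import Path

# Assumption: these are the image types of interest; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_dataset(root):
    """Return {class_name: image_count} for a class-per-folder dataset.

    Assumes root/<class_name>/<image files>; files placed directly
    under root are ignored.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Comparing the result for one of the provided datasets with the result for your own dataset is a quick way to spot an empty or misplaced class folder.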

Together, the three image datasets occupy approximately 1 GB of disk space, so only the fruits dataset zip package is stored in the GitHub repository. Downloading the other two datasets via BaiduNetdisk is recommended. Remember to unzip the image datasets before you run the code!

Fruits (download link BaiduNetdisk extract password "nl86"): apple, carambola, pear, plum, tomato.
Animals (download link BaiduNetdisk extract password "nl86"): cat, dog.
Bloodcells (download link 1 BaiduNetdisk extract password "nl86", download link 2 Amazon): neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes, erythroblasts, platelets.

As an aside, you can merge the three provided image datasets into a larger dataset with 15 categories (5 + 2 + 8 = 15), then try to train and test the ViT model on it. The project workspace file tree is as follows:

3. Dependency

The project supports both CPU and GPU. If your computer has an NVIDIA GPU, you can run the "nvidia-smi" command in the terminal to view the GPU type. My GPU has 4 GB of memory, and "nvidia-smi" reports CUDA version 12.5.

>>> NVIDIA-SMI 556.12 Driver Version: 556.12 CUDA Version: 12.5

You can find the appropriate package version on the PyTorch website. I installed the torch-cu121 package on Windows 11.
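A cu121 package works here because the CUDA version shown by "nvidia-smi" is the highest version the installed driver supports, so a build targeting CUDA 12.1 runs under a driver reporting 12.5. The following sketch shows that comparison on the banner line above. The helper names `parse_cuda_version` and `wheel_is_supported` are illustrative and not part of the project.

```python
import re


def parse_cuda_version(banner):
    """Extract (major, minor) from an nvidia-smi banner line, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", banner)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_is_supported(banner, wheel_cuda):
    """True if the driver's reported CUDA version is at least the wheel's."""
    driver_cuda = parse_cuda_version(banner)
    return driver_cuda is not None and driver_cuda >= wheel_cuda


banner = "NVIDIA-SMI 556.12 Driver Version: 556.12 CUDA Version: 12.5"
print(parse_cuda_version(banner))           # (12, 5)
print(wheel_is_supported(banner, (12, 1)))  # True
```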

It is recommended to set up a Python 3.10+ environment with Anaconda, Miniconda, or Python venv.

conda

conda create -n ViTenv python==3.10.0
conda activate ViTenv
pip install matplotlib
python Preprocessing.py

venv

python -m venv venv
./venv/Scripts/activate
./venv/Scripts/pip.exe install matplotlib
./venv/Scripts/python.exe Preprocessing.py

Windows 11

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Windows 10

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install pillow

macOS

pip install torch torchvision torchaudio
pip install numpy
pip install pillow

Linux

Unknown: I have not verified the setup on Linux.
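On any of the platforms above, a quick way to confirm that the installation can use the GPU is to ask PyTorch directly. This is a minimal sketch, not code from the project: `pick_device` is an illustrative name, and the function falls back to "cpu" when PyTorch is missing or no CUDA device is visible.

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed torch package is probably a CPU-only build.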

4. Preprocessing

Before training, run the "Preprocessing.py" file to collect statistics and process the images.

>>> python ./Preprocessing.py

5. Train

The "./cache/log" folder in the project contains the "animals.pt", "fruits.pt" and "blood cells.pt" weight files. They are the best weights Illusionna obtained after training for 200 epochs. You can apply these weights directly to testing and predicting without training from scratch, since training is very time-consuming.

If you want to train from scratch on the fruits, animals, or bloodcells datasets, or on your own image dataset, execute the "Train.py" file after "Preprocessing.py" has finished.

>>> python ./Train.py

The five screenshots in the column above show the 200-epoch training process for the animals, fruits and bloodcells datasets. Training took 03:31:17 for animals, 01:54:29 for fruits, and 02:25:42 for bloodcells, which falls between the other two.
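To put these durations on a comparable footing, they can be converted to an average time per epoch. The sketch below uses only the times and the epoch count stated above; `to_seconds` is an illustrative helper name.

```python
def to_seconds(hhmmss):
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


EPOCHS = 200
training_times = {
    "animals": "03:31:17",
    "fruits": "01:54:29",
    "bloodcells": "02:25:42",
}

seconds_per_epoch = {
    name: to_seconds(duration) / EPOCHS
    for name, duration in training_times.items()
}

for name, value in seconds_per_epoch.items():
    print(f"{name}: {value:.1f} s per epoch")
# animals: 63.4 s per epoch
# fruits: 34.3 s per epoch
# bloodcells: 43.7 s per epoch
```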

The program automatically saves all training results in the "./cache" directory, and the subfolder "./cache/log" stores the training weight files. You can find the animals and fruits weights in "./cache/log". They are the best weights from my 200-epoch training runs.

6. Test

The "Illustrate.py" file plots the training process. From the plot you can identify a rough range of epochs and then look for an optimal training weight within it. You can also test directly with the two optimal weights I trained, stored in the "./cache/log" directory.

>>> python ./Illustrate.py

>>> python ./Test.py

The final test result, including the accuracy, is returned and saved. The fruits test dataset achieves 98% accuracy, which is very good. The bloodcells test dataset reaches 92% accuracy, which is also good.
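For reference, accuracy here means the fraction of test images whose predicted class matches the true class. The sketch below shows that calculation on a made-up toy example. It is not taken from "Test.py", and the labels are illustrative values, not project results.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels for illustration only.
predicted = ["apple", "pear", "plum", "tomato"]
actual = ["apple", "pear", "plum", "pear"]
print(accuracy(predicted, actual))  # 0.75
```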

7. Predict

The accuracy on the fruits test dataset is as high as 98%, which indicates that the trained weight is quite good. So we can apply this weight to predict unknown images.
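As a rough illustration of what prediction involves, a classifier's raw output scores are usually turned into probabilities with softmax, and the most probable class is reported. The sketch below is not the project's "Predict.py". The order of `FRUIT_CLASSES` and the example scores are assumptions made for illustration.

```python
import math

# Class names come from the fruits dataset; their order here is an assumption.
FRUIT_CLASSES = ["apple", "carambola", "pear", "plum", "tomato"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, class_names):
    """Return (class_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical scores for one image.
label, confidence = top_prediction([0.1, 0.2, 3.0, 0.0, -1.0], FRUIT_CLASSES)
print(label)  # pear
```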

>>> python ./Predict.py

8. Conclusion

The ViT model performs well on the three image datasets: the accuracy is 98.455% on fruits, 80.150% on animals, and 92.678% on bloodcells.

That's all for the code description of the project. If you have any questions, please contact Illusionna by email. Thanks for reading.