Vision Transformer
Friday, November 24, 2023, at 03:54 a.m.
1. Introduction
This project is a demo of the ViT (Vision Transformer) neural network architecture. It can train on, test on, and predict image datasets. Three datasets are provided for download: fruits, animals, and bloodcells. You can also run the code on your own image datasets. Note, however, that the project is only a demo, so deploying the code to industrial production is not recommended.
2. Data Preparation
Three image datasets are provided: fruits, animals, and bloodcells. You can add a new image dataset of your own, but it must follow the same folder and file layout as the three provided datasets; otherwise the project code might not run. The layout is easy to work out by looking at the provided datasets.
The three image datasets occupy approximately 1 GB of disk space in total, so only the fruits dataset zip package is stored in the GitHub repository. The other two datasets should be downloaded via BaiduNetdisk. Remember to unzip the datasets before you run the code!
- Fruits (download link: BaiduNetdisk, extraction password "nl86"): apple, carambola, pear, plum, tomato.
- Animals (download link: BaiduNetdisk, extraction password "nl86"): cat, dog.
- Bloodcells (download link 1: BaiduNetdisk, extraction password "nl86"; download link 2: Amazon): neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes, erythroblasts, platelets.
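Unzipping a downloaded archive can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `unzip_dataset` and the archive name `fruits.zip` in the usage note are illustrative assumptions, not names taken from the project.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, dest_dir="."):
    """Extract one dataset archive into dest_dir.

    Returns the sorted top-level names found in the archive,
    for example the dataset folder itself.
    """
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Zip member names always use "/" as the separator.
        return sorted({name.split("/")[0] for name in archive.namelist()})
```

For example, `unzip_dataset("fruits.zip")` would extract a (hypothetically named) fruits archive into the current directory.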
As an aside, you can merge the three provided image datasets into one larger dataset with 15 (5 + 2 + 8 = 15) categories, and then train and test the ViT model on it. The project workspace file tree is as follows:
3. Dependency
The project supports both CPU and GPU. If your computer has an NVIDIA GPU, you can run the "nvidia-smi" command in a terminal to view the GPU model. My GPU has 4 GB of memory, and "nvidia-smi" reports CUDA version 12.5.
>>> NVIDIA-SMI 556.12 Driver Version: 556.12 CUDA Version: 12.5
You can find the appropriate package version on the PyTorch website. I installed the torch package built for CUDA 12.1 (cu121) on Windows 11.
It is recommended to set up a Python 3.10+ environment with Anaconda, Miniconda, or Python venv, and to install the packages inside it.
- conda
  - conda create -n ViTenv python==3.10.0
  - conda activate ViTenv
  - pip install matplotlib
  - python Preprocessing.py
- venv
  - python -m venv venv
  - ./venv/Scripts/activate
  - ./venv/Scripts/pip.exe install matplotlib
  - ./venv/Scripts/python.exe Preprocessing.py
- Windows 11
  - pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Windows 10
  - pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - pip install pillow
- macOS
  - pip install torch torchvision torchaudio
  - pip install numpy
  - pip install pillow
- Linux
  - Not tested by the author. Check the PyTorch website for the appropriate install command.
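After installing the packages, it can help to confirm whether PyTorch actually sees the GPU. The snippet below is a small sketch, not part of the project code; `pick_device` is a hypothetical helper name, and it falls back to "cpu" when PyTorch is not installed yet.

```python
def pick_device():
    """Return "cuda" if PyTorch can use a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment yet.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed torch build is probably a CPU-only one, and reinstalling with the matching index URL from the list above may fix it.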
4. Preprocessing
Before training, run the "Preprocessing.py" file to collect statistics and process the images.
>>> python ./Preprocessing.py
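The README does not say which statistics "Preprocessing.py" collects. As an illustration only, here is a sketch of one simple statistic, the number of images per category. It assumes a dataset folder with one subfolder per class and a fixed set of image extensions; both are assumptions, and `count_images_per_class` is a hypothetical helper, not a function from the project.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(dataset_dir):
    """Count image files in each class subfolder of dataset_dir."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

A strongly unbalanced result is worth knowing about before training, because overall accuracy can hide poor performance on a small class.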
5. Train
The "./cache/log" folder in the project contains the weight files "animals.pt", "fruits.pt" and "blood cells.pt". They are the best weights obtained by Illusionna after 200 epochs of training. You can apply these weights directly to testing and predicting without training from scratch, since training is very time-consuming.
If you want to train from scratch on the fruits, animals, or bloodcells dataset, or on your own image dataset, run the "Train.py" file after "Preprocessing.py" has finished.
>>> python ./Train.py
The five screenshots above, arranged in a column, show the 200-epoch training process for the animals, fruits and bloodcells datasets. Training took 03:31:17 for animals, 01:54:29 for fruits, and 02:25:42 for bloodcells.
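To estimate how long a run on your own dataset might take, the reported totals can be converted to an average time per epoch. This is a small arithmetic sketch; `seconds_per_epoch` is an illustrative helper name.

```python
def seconds_per_epoch(elapsed, epochs=200):
    """Convert an "HH:MM:SS" total into average seconds per epoch."""
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / epochs


if __name__ == "__main__":
    reported = [
        ("animals", "03:31:17"),
        ("fruits", "01:54:29"),
        ("bloodcells", "02:25:42"),
    ]
    for name, elapsed in reported:
        print(f"{name}: {seconds_per_epoch(elapsed):.1f} s/epoch")
```

On the author's hardware this works out to roughly 63, 34 and 44 seconds per epoch for animals, fruits and bloodcells; other hardware will differ.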
The program automatically saves all training results in the "./cache" directory; its subfolder "./cache/log" stores the training weight files. The animals and fruits weights in "./cache/log" are the best weights I obtained from 200 epochs of training.
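To see which weight files are available before testing, the log folder can be listed with a few lines of Python. This is a sketch, not project code; `find_weights` is a hypothetical helper, and it only assumes that weight files end in ".pt" as described above.

```python
from pathlib import Path


def find_weights(log_dir="./cache/log"):
    """Map each weight name (file stem) to its ".pt" file path."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        # Nothing has been trained or downloaded yet.
        return {}
    return {path.stem: path for path in sorted(log_dir.glob("*.pt"))}


if __name__ == "__main__":
    for name, path in find_weights().items():
        print(name, "->", path)
```

A file found this way can typically be read with `torch.load(path, map_location="cpu")`. Whether it holds a full model or only a state dict is not stated in this README, so check "Test.py" for how the project loads it.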
6. Test
The "Illustrate.py" file plots the training process. From the plot you can identify the approximate range of epochs in which the model performs best, and then pick an optimal training weight from that range. You can also test directly with the two optimal weights I trained, found in the "./cache/log" directory.
>>> python ./Illustrate.py
>>> python ./Test.py
The final test result is returned and saved together with its accuracy. The fruits test dataset reaches 98% accuracy, which is very good! The bloodcells test dataset reaches 92% accuracy, which is also good : )
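For reference, accuracy here means the percentage of test images whose predicted category matches the true one. The sketch below shows the calculation on plain Python lists; it is an illustration, not the code in "Test.py", and `accuracy` is a hypothetical helper name.

```python
def accuracy(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    predicted = ["apple", "pear", "plum", "plum"]
    actual = ["apple", "pear", "plum", "tomato"]
    print(f"{accuracy(predicted, actual):.1f}%")  # 3 of 4 correct
```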
7. Predict
The accuracy on the fruits test dataset is as high as 98%, which indicates that the trained weight is quite good. So we can apply this weight to predicting unseen images.
>>> python ./Predict.py
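A classifier such as ViT usually outputs one raw score (logit) per category, and prediction means picking the category with the highest softmax probability. The sketch below shows that final step in plain Python. It is not the code in "Predict.py"; `predict_label` is a hypothetical helper, and the class order in `FRUIT_CLASSES` is an assumption based on the dataset list above.

```python
import math

# Assumed class order; the real order is defined by the project code.
FRUIT_CLASSES = ["apple", "carambola", "pear", "plum", "tomato"]


def predict_label(logits, class_names=FRUIT_CLASSES):
    """Return (class name, probability) for the highest-scoring class."""
    if len(logits) != len(class_names):
        raise ValueError("need exactly one logit per class")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


if __name__ == "__main__":
    label, probability = predict_label([0.1, 0.2, 3.0, 0.0, -1.0])
    print(label, round(probability, 3))
```

A low top probability suggests the image may not belong to any trained category, so such predictions deserve a second look.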
8. Conclusion
The ViT model performs very well on the three image datasets: it reaches 98.455% accuracy on fruits, 80.150% on animals and 92.678% on bloodcells.
That's all for the code description of the project. If you have any questions, please contact Illusionna by email. Thanks for reading.