Vitis AI
Getting Started
System Requirements
https://docs.xilinx.com/r/en-US/ug1414-vitis-ai
https://xilinx.github.io/Vitis-AI/3.5/html/docs/reference/system_requirements.html (3.0)
- Officially supported FPGA boards are limited, so check the list carefully
- The host side requires Ubuntu 20.04 (Debian is not supported) with CUDA 11.3, the NVIDIA Container Toolkit, and Docker (as of 2023)
- CUDA 11.4 has also been used for convenience, apparently without problems
- The host machine needs at least about 8GB of memory, or the OoM Killer terminates the process mid-run > on GCP this means at least an n1-standard-2 equivalent
IP Core / Bitstream
https://docs.xilinx.com/r/en-US/ug1414-vitis-ai/Deep-Learning-Processor-Unit
https://docs.xilinx.com/r/en-US/pg338-dpu (JP)
https://github.com/Xilinx/Vitis-AI/tree/master/dpu
https://xilinx.github.io/Vitis-AI/3.5/html/docs/workflow-system-integration (3.0)
- An IP core named DPU, an FPGA circuit packed with deep-learning processing functions, is provided
- The DPU circuit is written to the FPGA in advance and controlled through a driver named XRT
- Generate a bitstream with Vivado/Vitis
- Download the bitstream to the target board
- Install the related driver (and its dependent libraries)
- For example, on Zynq UltraScale+ MPSoC the DPUCZDX8G IP core (bitstream) is available
- Note that the architecture (including the ISA) and the Vitis AI version must also match: B512/B800/B1024/B1152/B1600/B2304/B3136/B4096
- [IMPORTANT] Vitis AI does not perform circuit synthesis automatically, so the appropriate DPU must be downloaded to the FPGA beforehand !!!
- On the Kria KV260, using the DPU-ready OS image (Option C) downloads the appropriate DPU automatically
- If you choose Ubuntu or another OS image, configure the DPU yourself or install it via apt (see Kria KV260) > watch out for version consistency
OS Image
https://docs.xilinx.com/r/en-US/ug1414-vitis-ai/Flashing-the-OS-Image-to-the-SD-Card
- Using the Vitis AI PetaLinux image tuned for the DPU requires no extra configuration (see Kria KV260)
- The "Benchmarking" mentioned in the license is a business term and does not refer to FPGA performance testing
- If you use Ubuntu or another image from the official site, the DPU must be configured separately (see Kria KV260)
Model Zoo
https://xilinx.github.io/Vitis-AI/3.5/html/docs/workflow-model-zoo.html (3.0)
https://xilinx.github.io/Vitis-AI/3.5/html/docs/reference/ModelZoo_Github_web.htm (3.0)
https://github.com/Xilinx/Vitis-AI/tree/master/model_zoo
- The available models are limited, and many come under non-commercial licenses, so check carefully
- The download URLs can be looked up in the YAML files under Vitis-AI/model_zoo; a lookup sketch follows this list
- The downloaded zip file also contains the programs for retraining
- If the matching DPU has not been downloaded to the FPGA, execution fails with a Fingerprint Failure > a partial workaround is described below
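The URL lookup can also be scripted. Below is a minimal sketch, assuming the model-list layout (model_zoo/model-list/<model>/model.yaml) and a "files" section with "download link" keys; the exact schema may differ between Model Zoo versions, so verify against your checkout.
================================
# Minimal sketch: print the download links recorded in a Model Zoo YAML file.
import yaml  # PyYAML

def list_download_links(yaml_path):
    with open(yaml_path) as f:
        meta = yaml.safe_load(f)
    for entry in meta.get("files", []):  # assumed schema; verify against your checkout
        print(entry.get("name"), "->", entry.get("download link"))

list_download_links("model_zoo/model-list/pt_yolox-nano_coco_416_416_1G_3.0/model.yaml")
================================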
Evaluation Boards
https://www.xilinx.com/products/boards-and-kits/see-all-evaluation-boards.html
https://www.xilinx.com/products/som/kria/kv260-vision-starter-kit.html
- The Kria KV260 is the least expensive option
AI Optimizer (Optional)
https://xilinx.github.io/Vitis-AI/3.5/html/docs/workflow-model-development.html#model-optimization (3.0)
- Vitis AI Optimizer is an optional tool that can significantly enhance performance in many applications
- Vitis AI Optimizer requires the developer to purchase a license
Host Setup on Ubuntu
https://xilinx.github.io/Vitis-AI/3.5/html/docs/install/install.html (3.0)
- The pre-built cpu container should only be used when a GPU is not available on the host machine
- The docker_build process may take several hours to complete
- Often simply re-running the build script will result in success
git clone https://github.com/Xilinx/Vitis-AI
#git clone -b 3.0 https://github.com/Xilinx/Vitis-AI
PyTorch (CPU Only)
cd ~/Vitis-AI
docker pull xilinx/vitis-ai-pytorch-cpu:latest
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
latest: Pulling from xilinx/vitis-ai-pytorch-cpu
Digest: sha256:f55bd069ffd56c6358cae29df19e6085f2bcf8ea5e045744aa412fd72db521ed
Status: Image is up to date for xilinx/vitis-ai-pytorch-cpu:latest
docker.io/xilinx/vitis-ai-pytorch-cpu:latest
Setting up user's environment in the Docker container...
Running as vitis-ai-user with ID 0 and group 0
==========================================
(Vitis AI ASCII-art banner)
==========================================
Docker Image Version: ubuntu2004-3.0.0.106 (CPU)
Vitis AI Git Hash: d4ec26f
Build Date: 2023-01-08
WorkFlow: pytorch
PyTorch (GPU Support)
cd ~/Vitis-AI/docker
## License is necessary to build opt_pytorch
./docker_build.sh -t gpu -f pytorch
#./docker_build.sh -t gpu -f opt_pytorch
Validating Arguments...
Your inputs: Docker-Type:gpu, FrameWork:pytorch
...
 => => writing image sha256:f555d3cf57c1562de00d29987b768f08836018fba6052a189bd1365c292d54b9
 => => naming to docker.io/xilinx/vitis-ai-pytorch-gpu:3.5.0.001-bbd45838d
The list of NVIDIA CUDA and cuDNN images on Docker Hub is available below:
https://hub.docker.com/r/nvidia/cuda/tags
#docker run --gpus all nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04 nvidia-smi
docker run --gpus all nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu20.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |    105MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
cd ~/Vitis-AI
docker images
REPOSITORY                    TAG                   IMAGE ID       CREATED       SIZE
xilinx/vitis-ai-pytorch-gpu   3.5.0.001-bbd45838d   f555d3cf57c1   2 hours ago   31GB
## The 'latest' tag does not work in Vitis AI 3.5 (as of July 2023)
#./docker_run.sh xilinx/vitis-ai-pytorch-gpu:latest
./docker_run.sh xilinx/vitis-ai-pytorch-gpu:3.5.0.001-bbd45838d
Setting up user's environment in the Docker container...
Running as vitis-ai-user with ID 0 and group 0
==========================================
(Vitis AI ASCII-art banner)
==========================================
Docker Image Version: 3.5.0.001-bbd45838d (GPU)
Vitis AI Git Hash: bbd45838d
Build Date: 2023-07-20
WorkFlow: pytorch
Troubleshooting - Exit Code 137
https://stackoverflow.com/questions/31297616/what-is-the-authoritative-list-of-docker-run-exit-codes
https://komodor.com/learn/exit-codes-in-containers-and-kubernetes-the-complete-guide/
## ERROR: failed to solve: process "..." did not complete successfully: exit code: 137
Exit code 137 indicates that the container was immediately terminated by the operating system via a SIGKILL signal
The host machine needs more system memory; see Docker and OoM Killer for more details
Another workaround is to prepare additional swap space; see Swap for instructions
Troubleshooting - Docker Image Not Found
https://github.com/Xilinx/Vitis-AI/pull/1296
The 'latest' tag does not work in Vitis AI 3.5 (as of July 2023)
## Unable to find image 'xilinx/vitis-ai-pytorch-gpu:latest' locally
## docker: Error response from daemon: pull access denied for xilinx/vitis-ai-pytorch-gpu,
## repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
docker images
REPOSITORY                    TAG                   IMAGE ID       CREATED       SIZE
xilinx/vitis-ai-pytorch-gpu   3.5.0.001-bbd45838d   f555d3cf57c1   2 hours ago   31GB
docker tag xilinx/vitis-ai-pytorch-gpu:3.5.0.001-bbd45838d xilinx/vitis-ai-pytorch-gpu:latest
docker images
REPOSITORY                    TAG                   IMAGE ID       CREATED       SIZE
xilinx/vitis-ai-pytorch-gpu   3.5.0.001-bbd45838d   f555d3cf57c1   2 hours ago   31GB
xilinx/vitis-ai-pytorch-gpu   latest                f555d3cf57c1   2 hours ago   31GB
Jupyter Notebook in Docker Image
See Jupyter for details
jupyter notebook --port=8888 --ip=0.0.0.0
Target Board Setup (Zynq UltraScale+ MPSoC - DPUCZDX8G)
Cross-Compiler on Host Machine
## Exit or detach from the vitis-ai docker container before installing the cross compiler
cd ~/Vitis-AI/board_setup/mpsoc
chmod 755 ./host_cross_compiler_setup.sh
./host_cross_compiler_setup.sh
#unset LD_LIBRARY_PATH
source ~/petalinux_sdk_2022.2/environment-setup-cortexa72-cortexa53-xilinx-linux
## Run the vitis-ai docker container after installing the cross compiler
cd ~/Vitis-AI
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
## Activate the conda environment in the docker container
conda activate vitis-ai-pytorch
## Cross compile an example
#cd ~/Vitis-AI/examples/vai_runtime/resnet50
cd /workspace/examples/vai_runtime/resnet50
#bash -x build.sh
bash build.sh
Troubleshooting - LSB Modules
https://postgresweb.com/ubuntu-no-lsb-modules-are-available
## No LSB modules are available.
sudo apt install lsb-core
lsb_release -a
OS Installation
See Kria KV260 for the installation
Checking Device
xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.36-xilinx-v2022.2
  Version              : #1 SMP Mon Oct 3 07:50:07 UTC 2022
  Machine              : aarch64
  CPU Cores            : 4
  Memory               : 3929 MB
  Distribution         : PetaLinux 2022.2_release_S10071807 (honister)
  GLIBC                : 2.34
  Model                : ZynqMP SMK-K26 Rev1/B/A

XRT
  Version              : 2.14.0
  Branch               : 2022.2
  Hash                 : 43926231f7183688add2dccfd391b36a1f000bea
  Hash Date            : 2022-10-07 05:12:02
  ZOCL                 : 2.14.0, 43926231f7183688add2dccfd391b36a1f000bea

Devices present
BDF             :  Shell  Platform UUID  Device ID     Device Ready*
----------------------------------------------------------------------
[0000:00:00.0]  :  edge   0x0            user(inst=0)  Yes

* Devices that are not ready will have reduced functionality when using XRT tools
Checking DPU Availability
export DEBUG_DPU_CONTROLLER=1
show_dpu
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0606 08:24:27.245379  1928 dpu_controller_dnndk.cpp:279] cancel register the dnndk dpu controller, because /dev/dpu is not opened
I0606 08:24:27.245836  1928 dpu_controller.cpp:42] add factory method 02_xrt
I0606 08:24:27.245878  1928 dpu_control_xrt.cpp:113] register the xrt edge dpu controller
I0606 08:24:27.258949  1928 dpu_control_xrt.cpp:53] xrt dpu cu is detected, kernel = DPUCZDX8G
I0606 08:24:27.259016  1928 dpu_control_xrt.cpp:82] create DpuControllerXrtEdge for DPUCZDX8G
I0606 08:24:27.259049  1928 dpu_control_xrt_edge.cpp:53] creating dpu controller: this=0xaaab013e8e10
I0606 08:24:27.259078  1928 dpu_controller.cpp:57] create dpu controller via 02_xrt ret= 0xaaab013e8e10
device_core_id=0 device= 0 core = 0 fingerprint = 0x101000056010407 batch = 1 full_cu_name=DPUCZDX8G:DPUCZDX8G_1
I0606 08:24:27.259140  1928 dpu_control_xrt_edge.cpp:60] destroying dpu controller: this=0xaaab013e8e10
xdputil query
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0606 08:26:45.284555  2048 dpu_controller_dnndk.cpp:279] cancel register the dnndk dpu controller, because /dev/dpu is not opened
I0606 08:26:45.285107  2048 dpu_controller.cpp:42] add factory method 02_xrt
I0606 08:26:45.285149  2048 dpu_control_xrt.cpp:113] register the xrt edge dpu controller
{
  "DPU IP Spec":{
    "DPU Core Count":1,
    "IP version":"v4.1.0",
    "generation timestamp":"2022-11-30 19-15-00",
    "git commit id":"ce8dd1",
    "git commit time":2022113019,
    "regmap":"1to1 version"
  },
  "VAI Version":{
    "libvaip-core.so":"Xilinx vaip Version: 1.0.0-a176db67b19f94b0a31f9d24ef80322efe4494ad 2022-12-27-01:24:22 ",
    "libvart-runner.so":"Xilinx vart-runner Version: 3.0.0-2efa5fe1e56c2b2c8a7e71e9fc1636242dd50a9f 2022-12-27-00:47:05 ",
    "libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 3.0.0-1cccff04dc341c4a6287226828f90aed56005f4f 2022-12-20 10:29:01 [UTC] ",
    "libxir.so":"Xilinx xir Version: xir-9204ac72103092a7b253a0c23ec7471481656940 2022-12-27-00:46:16",
    "target_factory":"target-factory.3.0.0 860ed0499ab009084e2df3004eeb9ae710c26351"
  },
  "kernels":[
    {
      "DPU Arch":"DPUCZDX8G_ISA1_B4096",
      "DPU Frequency (MHz)":300,
      "IP Type":"DPU",
      "Load Parallel":2,
      "Load augmentation":"enable",
      "Load minus mean":"disable",
      "Save Parallel":2,
      "XRT Frequency (MHz)":300,
      "cu_addr":"0xa0010000",
      "cu_handle":"0xaaaaf9957c70",
      "cu_idx":0,
      "cu_mask":1,
      "cu_name":"DPUCZDX8G:DPUCZDX8G_1",
      "device_id":0,
      "fingerprint":"0x101000056010407",
      "name":"DPU Core 0"
    }
  ]
}
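The "DPU Arch" and "fingerprint" fields above are what compiled models are checked against at load time. A minimal sketch for extracting them programmatically (it defensively skips anything printed before the JSON, since the glog lines normally go to stderr):
================================
# Minimal sketch: parse `xdputil query` output for the DPU arch and fingerprint.
import json
import subprocess

out = subprocess.run(["xdputil", "query"], capture_output=True, text=True).stdout
info = json.loads(out[out.index("{"):])  # skip any log lines before the JSON
for kernel in info.get("kernels", []):
    print(kernel.get("DPU Arch"), kernel.get("fingerprint"))
================================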
ResNet50 (Image Classification)
## Root Login (as needed)
sudo su -l root
## Download Model Zoo (as needed)
#wget https://www.xilinx.com/bin/public/openDownload?filename=resnet50-zcu102_zcu104_kv260-r3.0.0.tar.gz -O resnet50-zcu102_zcu104_kv260-r3.0.0.tar.gz
#tar -xzvf resnet50-zcu102_zcu104_kv260-r3.0.0.tar.gz
#cp resnet50 /usr/share/vitis_ai_library/models -r
## Download the sample images from Xilinx, then extract
cd
tar -xzvf vitis_ai_runtime_r3.0.0_image_video.tar.gz -C Vitis-AI/examples/vai_runtime
## Build Application (as needed)
#cd ~/Vitis-AI/examples/vai_runtime/resnet50
#chmod 755 ./build.sh
#./build.sh
## Run Example
cd ~/Vitis-AI/examples/vai_runtime/resnet50
./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0606 09:46:05.968466  6374 main.cc:292] create running for subgraph: subgraph_conv1
I0606 09:46:05.985018  6374 dpu_controller_dnndk.cpp:279] cancel register the dnndk dpu controller, because /dev/dpu is not opened
I0606 09:46:05.985324  6374 dpu_controller.cpp:42] add factory method 02_xrt
I0606 09:46:05.985356  6374 dpu_control_xrt.cpp:113] register the xrt edge dpu controller
I0606 09:46:05.998131  6374 dpu_control_xrt.cpp:53] xrt dpu cu is detected, kernel = DPUCZDX8G
I0606 09:46:05.998200  6374 dpu_control_xrt.cpp:82] create DpuControllerXrtEdge for DPUCZDX8G
I0606 09:46:05.998237  6374 dpu_control_xrt_edge.cpp:53] creating dpu controller: this=0xaaaaf72b4960
I0606 09:46:05.998266  6374 dpu_controller.cpp:57] create dpu controller via 02_xrt ret= 0xaaaaf72b4960
I0606 09:46:06.282402  6374 dpu_control_xrt_edge.cpp:115] code 0x19000000 core_idx 0 gen_reg: 0x19100000 0x1aa00000
...
Image : vitis-ai_gorilla_market.jpg
top[0] prob = 0.xxxxxx  name = Gorilla
I0606 09:46:26.501123  6374 dpu_control_xrt_edge.cpp:60] destroying dpu controller: this=0xaaaaf72b4960
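The C++ sample above is driven by the VART runtime; the same xmodel can also be run from Python. Below is a minimal sketch with the VART/XIR Python APIs, omitting the int8 pre/post-processing for brevity (see the fix-point scaling sketch later on this page):
================================
# Minimal sketch: run resnet50.xmodel through the VART Python API.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("/usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel")
# Pick the DPU subgraph; CPU subgraphs handle operators the DPU does not support.
subgraph = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
            if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(subgraph, "run")

in_t = runner.get_input_tensors()[0]
out_t = runner.get_output_tensors()[0]
input_data = [np.zeros(tuple(in_t.dims), dtype=np.int8)]   # put a preprocessed image here
output_data = [np.empty(tuple(out_t.dims), dtype=np.int8)]
job_id = runner.execute_async(input_data, output_data)
runner.wait(job_id)
print("output shape:", output_data[0].shape)
================================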
Troubleshooting - File Descriptor
## F0606 09:12:00.986861 4629 file_lock_lnx.cpp:28] Check failed: fd >= 0 (-1 vs. 0) cannot open file: /tmp/DPU_0
## Reset file descriptor
rm /tmp/DPU_0
Troubleshooting - OpenCV
https://github.com/opencv/opencv/issues/18461
## terminate called after throwing an instance of 'cv::Exception'
## what(): OpenCV(4.5.2) /usr/src/debug/opencv/4.5.2-r0/git/modules/highgui/src/window_gtk.cpp:624: error: (-2:Unspecified error) Can't initialize GTK backend in function 'cvInitSystem'
## Point DISPLAY at the board's local display when running the application via SSH or in a non-GUI environment
export DISPLAY=:0.0
Troubleshooting - Fingerprint Failure
https://support.xilinx.com/s/question/0D54U00006wDmkzSAC/info-post-about-dpu-fingerprint
- The DPU fingerprint is a unique identifier used in Vitis AI to characterize different DPU targets
- The fingerprint encodes a feature code that depends on:
- IP Core: DPUCZDX8G/DPUCVDX8G/…
- Unique Architecture: B512/B800/B1024/B1152/B1600/B2304/B3136/B4096
- Instruction Set Architecture (ISA): 0x01
- Vitis AI Version: 2.5/3.0/…
## W0608 07:27:35.154747 83456 dpu_runner_base_imp.cpp:676] CHECK fingerprint fail ! model_fingerprint 0x101000056010407 dpu_fingerprint 0x101000016010406
## F0608 07:27:35.154840 83456 dpu_runner_base_imp.cpp:648] fingerprint check failure.
## Check that the DPU architecture and fingerprint match those of the compiled model
xdputil query
cat resnet50/meta.json
## Workaround: disable the fingerprint check
env XLNX_ENABLE_FINGERPRINT_CHECK=0 ./resnet50 resnet50.xmodel
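Before disabling the check, it is worth confirming the mismatch programmatically. A minimal sketch comparing the board's DPU arch against the model's meta.json, assuming meta.json carries a "target" field naming the arch (e.g. DPUCZDX8G_ISA1_B4096, matching the compile log later on this page); check your meta.json first:
================================
# Minimal sketch: compare the board's DPU arch with the model's compile target.
import json
import subprocess

out = subprocess.run(["xdputil", "query"], capture_output=True, text=True).stdout
board_archs = {k["DPU Arch"] for k in json.loads(out[out.index("{"):]).get("kernels", [])}

with open("resnet50/meta.json") as f:
    model_target = json.load(f).get("target")  # assumed field; verify in your meta.json

print("board:", board_archs, "model:", model_target)
print("match" if model_target in board_archs else "MISMATCH: expect a fingerprint failure")
================================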
YOLOvX (Object Detection)
## Root Login (as needed)
sudo su -l root
cd ~/Vitis-AI/examples/vai_library/samples/yolovx
./test_jpeg_yolovx /usr/share/vitis_ai_library/models/yolox_nano_pt/yolox_nano_pt.xmodel vitis-ai_gorilla_market.jpg
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0606 10:08:46.059096  7557 demo.hpp:1193] batch: 0  image: vitis-ai_gorilla_market.jpg
I0606 10:08:46.059296  7557 process_result.hpp:32] RESULT: 16  78.75  25.94  502.17  505.75  0.469689
Custom Model Development (PyTorch)
https://github.com/Xilinx/Vitis-AI-Tutorials/blob/1.4/Design_Tutorials/09-mnist_pyt/README.md
https://www.paltek.co.jp/techblog/techinfo/211115_01
- Basic training flow for Vitis AI (a quantization sketch follows this list):
- Training: floating-point_model.pth
- Quantization: floating-point_model.pth > int_model.xmodel
- Compile: int_model.xmodel > int_model_kv260.xmodel
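For reference, here is a minimal sketch of the quantization step using the Vitis AI PyTorch flow (pytorch_nndct); the tutorial's quantize.py follows the same pattern. The `common` import and the MNIST input shape are hypothetical stand-ins, not the tutorial's exact code.
================================
# Minimal sketch, not the tutorial's exact quantize.py.
import torch
from pytorch_nndct.apis import torch_quantizer
from common import CNN  # hypothetical module defining the float model class

model = CNN()
model.load_state_dict(torch.load("build/float_model/f_model.pth", map_location="cpu"))
dummy_input = torch.randn(1, 1, 28, 28)  # assumed MNIST input shape

# quant_mode="calib": forward representative batches through quant_model to calibrate
quantizer = torch_quantizer("calib", model, (dummy_input,), output_dir="build/quant_model")
quant_model = quantizer.quant_model
quant_model(dummy_input)  # replace with a loop over real calibration data
quantizer.export_quant_config()

# quant_mode="test": re-create the quantizer, evaluate, then dump the xmodel
quantizer = torch_quantizer("test", model, (dummy_input,), output_dir="build/quant_model")
quant_model = quantizer.quant_model
quant_model(dummy_input)  # at least one forward pass is required before export
quantizer.export_xmodel(output_dir="build/quant_model", deploy_check=False)
================================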
Step 0 - Setting Up the Workspace
cd ~/Vitis-AI
./docker_run.sh xilinx/vitis-ai-pytorch-gpu:latest
conda activate vitis-ai-pytorch
git clone -b 1.4 https://github.com/Xilinx/Vitis-AI-Tutorials.git
cd Vitis-AI-Tutorials/Design_Tutorials/09-mnist_pyt/files/
Step 1 - Training
export BUILD=./build
export LOG=${BUILD}/logs
mkdir -p ${LOG}
vi train.py
## Remove the following lines from train.py
================================
torchvision.datasets.MNIST.resources = [
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz', 'f68b3c2dcbeaaa9fbdd348bbdeb94873'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz', 'd53e105ee54ea40749a09fcbcd1e9432'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz', '9fb629c4189551a2d022fa330f9573f3'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz', 'ec29112dd5afa0611ce80d1b7f02629c')
]
================================
## Training
python -u train.py -d ${BUILD} 2>&1 | tee ${LOG}/train.log
PyTorch version : 1.12.1
3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
-----------------------------------------
Command line options:
--build_dir  : ./build
--batchsize  : 100
--learnrate  : 0.001
--epochs     : 3
-----------------------------------------
You have 1 CUDA devices available
Device 0 : Tesla T4
Selecting device 0..
...
Epoch 1
Test set: Accuracy: 9814/10000 (98.14%)
Epoch 2
Test set: Accuracy: 9866/10000 (98.66%)
Epoch 3
Test set: Accuracy: 9898/10000 (98.98%)
Trained model written to ./build/float_model/f_model.pth
Step 2 - Quantization
## Quantize
python -u quantize.py -d ${BUILD} --quant_mode calib 2>&1 | tee ${LOG}/quant_calib.log
[VAIQ_NOTE]: Loading NNDCT kernels...
-----------------------------------------
PyTorch version : 1.12.1
3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
-----------------------------------------
Command line options:
--build_dir  : ./build
--quant_mode : calib
--batchsize  : 100
-----------------------------------------
You have 1 CUDA devices available
Device 0 : Tesla T4
Selecting device 0..
[VAIQ_NOTE]: OS and CPU information:
[VAIQ_NOTE]: Tools version information:
[VAIQ_NOTE]: GPU information:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization calibration process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cuda'.
[VAIQ_NOTE]: =>Parsing CNN...
[VAIQ_NOTE]: Start to trace and freeze model...
[VAIQ_NOTE]: The input model CNN is torch.nn.Module.
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(./build/quant_model/CNN.py)
[VAIQ_NOTE]: =>Get module with quantization.
Test set: Accuracy: 9904/10000 (99.04%)
[VAIQ_NOTE]: =>Exporting quant config.(./build/quant_model/quant_info.json)
## Evaluate Quantized Model
python -u quantize.py -d ${BUILD} --quant_mode test 2>&1 | tee ${LOG}/quant_test.log
[VAIQ_NOTE]: Loading NNDCT kernels...
-----------------------------------------
PyTorch version : 1.12.1
3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
-----------------------------------------
Command line options:
--build_dir  : ./build
--quant_mode : test
--batchsize  : 100
-----------------------------------------
You have 1 CUDA devices available
Device 0 : Tesla T4
Selecting device 0..
[VAIQ_NOTE]: OS and CPU information:
[VAIQ_NOTE]: Tools version information:
[VAIQ_NOTE]: GPU information:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization test process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cuda'.
[VAIQ_NOTE]: =>Parsing CNN...
[VAIQ_NOTE]: Start to trace and freeze model...
[VAIQ_NOTE]: The input model CNN is torch.nn.Module.
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(./build/quant_model/CNN.py)
[VAIQ_NOTE]: =>Get module with quantization.
Test set: Accuracy: 9901/10000 (99.01%)
[VAIQ_NOTE]: =>Converting to xmodel ...
[VAIQ_NOTE]: =>Successfully convert 'CNN' to xmodel.(./build/quant_model/CNN_int.xmodel)
Step 3 - Compile
vi compile.sh
## Add the following lines to compile.sh if using the KV260
================================
elif [ $1 = kv260 ]; then
      ARCH=/opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json
      TARGET=kv260
      echo "-----------------------------------------"
      echo "COMPILING MODEL FOR KV260.."
      echo "-----------------------------------------"
================================
## Compile
source compile.sh kv260 ${BUILD} ${LOG}
COMPILING MODEL FOR KV260..
-----------------------------------------
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: null
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA1_B4096
[UNILOG][INFO] Graph name: CNN, with op num: 33
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 3, DPU subgraph number 1
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "./build/compiled_model/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "./build/compiled_model/CNN_kv260.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is ed77..., and has been saved to "./build/compiled_model/md5sum.txt"
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
-----------------------------------------
MODEL COMPILED
-----------------------------------------
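The compile log above reports three device subgraphs with one DPU subgraph; operators the DPU cannot execute stay on the CPU. A minimal sketch for inspecting that partitioning with the XIR Python bindings available inside the Vitis AI container:
================================
# Minimal sketch: list the device assignment of each subgraph in the compiled xmodel.
import xir

graph = xir.Graph.deserialize("build/compiled_model/CNN_kv260.xmodel")
for sg in graph.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "(none)"
    print(f"{sg.get_name()}: device={device}")
================================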
Step 4 - Application
vi target.py
## Modify the following line of target.py if using the KV260
================================
ap.add_argument('-t', '--target', type=str, default='zcu102', choices=['zcu102','zcu104','u50','vck190','kv260'], help='Target board type')
================================
## Preparing files for the target board
python -u target.py --target kv260 -d ${BUILD} 2>&1 | tee ${LOG}/target_kv260.log
3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
------------------------------------
Command line options:
--build_dir  : ./build
--target     : kv260
--num_images : 10000
--app_dir    : application
------------------------------------
Copying application code from application ...
Copying compiled model from ./build/compiled_model/CNN_kv260.xmodel ...
Step 5 - Running on Target
## Copy resulting files to the target board
cd build
tar cvfz target_kv260.tar.gz target_kv260
scp target_kv260.tar.gz root@kv260:
## Extract the resulting tar file on the target board
tar xvfz target_kv260.tar.gz
cd target_kv260
## Installing OpenCV (as needed)
sudo pip3 install opencv-python
## Loading DPU (as needed)
sudo xmutil listapps
sudo xmutil unloadapp
sudo xmutil loadapp kv260-benchmark-b4096
## Disabling fingerprint check (as needed)
export XLNX_ENABLE_FINGERPRINT_CHECK=0
## Running the application on the target board
python3 app_mt.py -m CNN_kv260.xmodel
Command line options:
--image_dir : images
--threads   : 1
--model     : CNN_kv260.xmodel
-------------------------------
Pre-processing 10000 images...
-------------------------------
Starting 1 threads...
-------------------------------
Throughput=4641.32 fps, total frames = 10000, time=2.1546 seconds
Correct:9886, Wrong:114, Accuracy:0.9886
-------------------------------
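The tutorial's app_mt.py converts the float images to int8 using the fix-point position of the DPU input tensor. A minimal sketch of that scaling with the VART Python API (the random image below is a stand-in for real preprocessed MNIST data):
================================
# Minimal sketch of int8 input scaling: the DPU input tensor carries a
# "fix_point" attribute, and float pixels in [0, 1) are scaled by 2**fix_point.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("CNN_kv260.xmodel")
subgraph = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
            if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(subgraph, "run")

fix_point = runner.get_input_tensors()[0].get_attr("fix_point")
scale = 2 ** fix_point
image = np.random.rand(28, 28, 1).astype(np.float32)  # stand-in for a real image
quantized = (image * scale).astype(np.int8)
print("fix_point:", fix_point, "scaled dtype:", quantized.dtype)
================================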
Custom Model Development via Model Zoo (PyTorch)
- The host machine for training needs at least around 16GB of memory, plus multiple powerful GPUs
- Training the whole MS-COCO 2017 dataset for 300 epochs (the default) on a single NVIDIA Tesla T4 would take on the order of 100 days
Step 0 - Setting Up the Workspace
cd ~/Vitis-AI
./docker_run.sh xilinx/vitis-ai-pytorch-gpu:latest
conda activate vitis-ai-pytorch
## Download Model Zoo
wget https://www.xilinx.com/bin/public/openDownload?filename=pt_yolox-nano_coco_416_416_1G_3.0.zip -O pt_yolox-nano_coco_416_416_1G_3.0.zip
unzip pt_yolox-nano_coco_416_416_1G_3.0.zip
cd pt_yolox-nano_coco_416_416_1G_3.0
## Installing Python Modules
pip install --user -r requirements.txt
cd code
pip install --user -v -e .
cd ..
## Preparing Dataset of MS-COCO (as needed)
cd data/COCO
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/test2017.zip
unzip annotations_trainval2017.zip
unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
cd ../../
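Before committing to a multi-day run, it is worth sanity-checking the extracted COCO layout. A minimal sketch with pycocotools (already a dependency of the YOLOX code); the annotation paths are the standard COCO ones under data/COCO:
================================
# Minimal sketch: confirm the COCO annotations load and count images/categories.
from pycocotools.coco import COCO

for ann in ("instances_train2017.json", "instances_val2017.json"):
    coco = COCO(f"data/COCO/annotations/{ann}")
    print(ann, "images:", len(coco.imgs), "categories:", len(coco.cats))
================================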
Step 1 - Evaluation
## Evaluation
bash code/run_eval.sh
Conducting test...
2023-06-27 09:09:19 | INFO | __main__:139 - Args: Namespace(batch_size=32, ckpt='float/yolox_nano.pth', ...)
[VAIQ_NOTE]: Loading NNDCT kernels...
2023-06-27 09:09:32 | INFO | __main__:149 - Model Summary: Params: 0.91M, Gflops: 1.00
2023-06-27 09:09:32 | INFO | __main__:150 - Model Structure: YOLOX( ... )
2023-06-27 09:09:32 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-27 09:09:34 | INFO | yolox.data.datasets.coco:64 - Done (t=1.79s)
2023-06-27 09:09:34 | INFO | pycocotools.coco:86 - creating index...
2023-06-27 09:09:34 | INFO | pycocotools.coco:86 - index created!
2023-06-27 09:09:56 | INFO | __main__:165 - loading checkpoint from float/yolox_nano.pth
2023-06-27 09:09:57 | INFO | __main__:169 - loaded checkpoint done.
100%|##########| 157/157 [04:24<00:00, 1.68s/it]
2023-06-28 08:16:47 | INFO | yolox.evaluators.coco_evaluator:256 - Evaluate in main process...
2023-06-28 08:17:10 | INFO | yolox.evaluators.coco_evaluator:289 - Loading and preparing results...
2023-06-28 08:17:17 | INFO | yolox.evaluators.coco_evaluator:289 - DONE (t=6.73s)
2023-06-28 08:17:17 | INFO | pycocotools.coco:366 - creating index...
2023-06-28 08:17:18 | INFO | pycocotools.coco:366 - index created!
Running per image evaluation...
Evaluate annotation type *bbox*
COCOeval_opt.evaluate() finished in 18.23 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 2.92 seconds.
2023-06-28 08:17:43 | INFO | __main__:196 - Average forward time: 5.14 ms, Average NMS time: 0.83 ms, Average inference time: 5.97 ms
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.220
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.226
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.062
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.357
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.218
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.351
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.384
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.428
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
per class AP:
| class         | AP     | class        | AP     | class          | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 35.267 | bicycle      | 13.570 | car            | 18.173 |
| motorcycle    | 25.904 | airplane     | 43.217 | bus            | 45.145 |
| train         | 49.695 | truck        | 15.139 | boat           | 10.732 |
| traffic light | 11.231 | fire hydrant | 40.946 | stop sign      | 48.689 |
| parking meter | 24.575 | bench        | 10.999 | bird           | 13.384 |
| cat           | 37.786 | dog          | 34.574 | horse          | 32.929 |
| sheep         | 24.128 | cow          | 27.846 | elephant       | 44.797 |
| bear          | 45.493 | zebra        | 48.477 | giraffe        | 52.491 |
| backpack      | 3.131  | umbrella     | 21.010 | handbag        | 2.292  |
| tie           | 14.239 | suitcase     | 11.637 | frisbee        | 35.369 |
| skis          | 8.520  | snowboard    | 7.550  | sports ball    | 18.980 |
| kite          | 23.806 | baseball bat | 9.997  | baseball glove | 15.666 |
| skateboard    | 22.507 | surfboard    | 14.978 | tennis racket  | 21.614 |
| bottle        | 11.882 | wine glass   | 10.445 | cup            | 15.231 |
| fork          | 9.913  | knife        | 3.421  | spoon          | 1.997  |
| bowl          | 23.385 | banana       | 12.635 | apple          | 7.757  |
| sandwich      | 21.507 | orange       | 18.874 | broccoli       | 13.849 |
| carrot        | 9.995  | hot dog      | 14.199 | pizza          | 35.764 |
| donut         | 23.774 | cake         | 16.139 | chair          | 11.971 |
| couch         | 31.531 | potted plant | 10.725 | bed            | 32.832 |
| dining table  | 23.782 | toilet       | 47.803 | tv             | 42.598 |
| laptop        | 36.937 | mouse        | 31.544 | remote         | 3.964  |
| keyboard      | 30.560 | cell phone   | 15.958 | microwave      | 34.743 |
| oven          | 21.415 | toaster      | 0.446  | sink           | 21.817 |
| refrigerator  | 35.881 | book         | 5.182  | clock          | 29.555 |
| vase          | 13.550 | scissors     | 8.278  | teddy bear     | 23.986 |
| hair drier    | 0.000  | toothbrush   | 4.453  |                |        |
per class AR:
| class         | AR     | class        | AR     | class          | AR     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 46.651 | bicycle      | 27.834 | car            | 32.623 |
| motorcycle    | 41.853 | airplane     | 56.084 | bus            | 53.145 |
| train         | 61.421 | truck        | 41.232 | boat           | 27.783 |
| traffic light | 25.315 | fire hydrant | 50.891 | stop sign      | 54.400 |
| parking meter | 42.333 | bench        | 28.273 | bird           | 25.340 |
| cat           | 57.970 | dog          | 53.853 | horse          | 48.934 |
| sheep         | 42.994 | cow          | 44.247 | elephant       | 61.429 |
| bear          | 55.634 | zebra        | 59.361 | giraffe        | 62.888 |
| backpack      | 19.057 | umbrella     | 38.894 | handbag        | 18.593 |
| tie           | 27.540 | suitcase     | 35.418 | frisbee        | 47.130 |
| skis          | 27.593 | snowboard    | 20.290 | sports ball    | 27.308 |
| kite          | 38.012 | baseball bat | 24.690 | baseball glove | 31.892 |
| skateboard    | 41.061 | surfboard    | 30.637 | tennis racket  | 36.089 |
| bottle        | 29.891 | wine glass   | 20.469 | cup            | 33.017 |
| fork          | 24.558 | knife        | 16.062 | spoon          | 11.107 |
| bowl          | 45.120 | banana       | 34.459 | apple          | 32.076 |
| sandwich      | 47.797 | orange       | 42.807 | broccoli       | 42.276 |
| carrot        | 33.041 | hot dog      | 28.080 | pizza          | 51.514 |
| donut         | 40.427 | cake         | 36.968 | chair          | 35.178 |
| couch         | 57.203 | potted plant | 35.351 | bed            | 54.356 |
| dining table  | 47.094 | toilet       | 62.179 | tv             | 57.951 |
| laptop        | 51.688 | mouse        | 51.509 | remote         | 22.544 |
| keyboard      | 49.477 | cell phone   | 31.565 | microwave      | 58.000 |
| oven          | 46.503 | toaster      | 6.667  | sink           | 42.756 |
| refrigerator  | 56.825 | book         | 20.399 | clock          | 43.258 |
| vase          | 31.861 | scissors     | 21.667 | teddy bear     | 41.263 |
| hair drier    | 0.000  | toothbrush   | 13.860 |                |        |
Troubleshooting - Number of Workers
## UserWarning: This DataLoader will create 4 worker processes in total.
## Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create.
## Please be aware that excessive worker creation might get DataLoader running slow or even freeze,
## lower the worker number to avoid potential slowness/freeze if necessary.
## Reduce the number of workers
vi ./code/yolox/exp/yolox_base.py
## Modify the following line of yolox_base.py depending on the system (as needed)
================================
self.data_num_workers = 1
================================
Step 2 - Training
## Update the number of output classes
vi ./code/yolox/exp/yolox_base.py
## Modify the following line of yolox_base.py depending on the dataset and desired model (as needed)
================================
self.num_classes = 80
================================
## Training
bash code/run_train.sh
Conducting training...
2023-06-28 08:38:14 | INFO | yolox.core.trainer:130 - args: Namespace(batch_size=4, ...)
2023-06-28 08:38:14 | INFO | yolox.core.trainer:131 - exp value:
╒═══════════════════╤════════════════════════════╕
│ keys              │ values                     │
╞═══════════════════╪════════════════════════════╡
│ seed              │ None                       │
├───────────────────┼────────────────────────────┤
│ output_dir        │ './YOLOX_outputs'          │
├───────────────────┼────────────────────────────┤
│ print_interval    │ 10                         │
├───────────────────┼────────────────────────────┤
│ eval_interval     │ 10                         │
├───────────────────┼────────────────────────────┤
│ num_classes       │ 80                         │
├───────────────────┼────────────────────────────┤
│ depth             │ 0.33                       │
├───────────────────┼────────────────────────────┤
│ width             │ 0.25                       │
├───────────────────┼────────────────────────────┤
│ act               │ 'relu'                     │
├───────────────────┼────────────────────────────┤
│ data_num_workers  │ 2                          │
├───────────────────┼────────────────────────────┤
│ input_size        │ (416, 416)                 │
├───────────────────┼────────────────────────────┤
│ multiscale_range  │ 5                          │
├───────────────────┼────────────────────────────┤
│ data_dir          │ 'data/COCO'                │
├───────────────────┼────────────────────────────┤
│ train_ann         │ 'instances_train2017.json' │
├───────────────────┼────────────────────────────┤
│ val_ann           │ 'instances_val2017.json'   │
├───────────────────┼────────────────────────────┤
│ test_ann          │ 'instances_test2017.json'  │
├───────────────────┼────────────────────────────┤
│ mosaic_prob       │ 0.5                        │
├───────────────────┼────────────────────────────┤
│ mixup_prob        │ 1.0                        │
├───────────────────┼────────────────────────────┤
│ hsv_prob          │ 1.0                        │
├───────────────────┼────────────────────────────┤
│ flip_prob         │ 0.5                        │
├───────────────────┼────────────────────────────┤
│ degrees           │ 10.0                       │
├───────────────────┼────────────────────────────┤
│ translate         │ 0.1                        │
├───────────────────┼────────────────────────────┤
│ mosaic_scale      │ (0.5, 1.5)                 │
├───────────────────┼────────────────────────────┤
│ enable_mixup      │ False                      │
├───────────────────┼────────────────────────────┤
│ mixup_scale       │ (0.5, 1.5)                 │
├───────────────────┼────────────────────────────┤
│ shear             │ 2.0                        │
├───────────────────┼────────────────────────────┤
│ warmup_epochs     │ 5                          │
├───────────────────┼────────────────────────────┤
│ max_epoch         │ 300                        │
├───────────────────┼────────────────────────────┤
│ warmup_lr         │ 0                          │
├───────────────────┼────────────────────────────┤
│ min_lr_ratio      │ 0.05                       │
├───────────────────┼────────────────────────────┤
│ basic_lr_per_img  │ 0.00015625                 │
├───────────────────┼────────────────────────────┤
│ scheduler         │ 'yoloxwarmcos'             │
├───────────────────┼────────────────────────────┤
│ no_aug_epochs     │ 15                         │
├───────────────────┼────────────────────────────┤
│ ema               │ True                       │
├───────────────────┼────────────────────────────┤
│ weight_decay      │ 0.0005                     │
├───────────────────┼────────────────────────────┤
│ momentum          │ 0.9                        │
├───────────────────┼────────────────────────────┤
│ save_history_ckpt │ True                       │
├───────────────────┼────────────────────────────┤
│ exp_name          │ 'yolox_nano_deploy_relu'   │
├───────────────────┼────────────────────────────┤
│ test_size         │ (416, 416)                 │
├───────────────────┼────────────────────────────┤
│ test_conf         │ 0.01                       │
├───────────────────┼────────────────────────────┤
│ nmsthre           │ 0.65                       │
├───────────────────┼────────────────────────────┤
│ random_size       │ (10, 20)                   │
╘═══════════════════╧════════════════════════════╛
[VAIQ_NOTE]: Loading NNDCT kernels...
2023-06-28 08:38:17 | INFO | yolox.core.trainer:137 - Model Summary: Params: 0.91M, Gflops: 1.00
2023-06-28 08:38:21 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-28 08:38:44 | INFO | yolox.data.datasets.coco:64 - Done (t=23.58s)
2023-06-28 08:38:44 | INFO | pycocotools.coco:86 - creating index...
2023-06-28 08:38:45 | INFO | pycocotools.coco:86 - index created!
2023-06-28 08:39:22 | INFO | yolox.core.trainer:155 - init prefetcher, this might take one minute or less...
2023-06-28 08:40:16 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-28 08:40:17 | INFO | yolox.data.datasets.coco:64 - Done (t=1.01s)
2023-06-28 08:40:17 | INFO | pycocotools.coco:86 - creating index...
2023-06-28 08:40:17 | INFO | pycocotools.coco:86 - index created!
2023-06-28 08:40:19 | INFO | yolox.core.trainer:191 - Training start...
2023-06-28 08:40:19 | INFO | yolox.core.trainer:192 - YOLOX( ... )
2023-06-28 08:40:19 | INFO | yolox.core.trainer:203 - ---> start train epoch1
2023-06-28 08:40:40 | INFO | yolox.core.trainer:261 - epoch: 1/300, iter: 10/29572, mem: 13006Mb, iter_time: 2.057s, data_time: 0.062s, total_loss: 19.7, ...
2023-06-28 08:40:43 | INFO | yolox.core.trainer:261 - epoch: 1/300, iter: 20/29572, mem: 13006Mb, iter_time: 0.346s, data_time: 0.200s, total_loss: 14.5, ...
...
(Estimated completion time with a single NVIDIA Tesla T4 would be around 100 days...)
...
2023-06-28 21:43:02 | INFO | yolox.core.trainer:356 - Save weights to ./YOLOX_outputs/yolox_nano_deploy_relu
100%|##########| 1250/1250 [01:54<00:00, 10.92it/s]
2023-06-28 21:44:56 | INFO | yolox.evaluators.coco_evaluator:256 - Evaluate in main process...
2023-06-28 21:44:56 | INFO | yolox.evaluators.coco_evaluator:289 - Loading and preparing results...
2023-06-28 21:44:56 | INFO | yolox.evaluators.coco_evaluator:289 - DONE (t=0.02s)
2023-06-28 21:44:56 | INFO | pycocotools.coco:366 - creating index...
2023-06-28 21:44:57 | INFO | pycocotools.coco:366 - index created!
Running per image evaluation...
Evaluate annotation type *bbox*
COCOeval_opt.evaluate() finished in 11.71 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 0.66 seconds.
2023-06-28 21:45:10 | INFO | yolox.core.trainer:346 - Average forward time: 4.55 ms, Average NMS time: 0.35 ms, Average inference time: 4.90 ms
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
2023-06-28 21:45:10 | INFO | yolox.core.trainer:356 - Save weights to ./YOLOX_outputs/yolox_nano_deploy_relu
2023-06-28 21:45:10 | INFO | yolox.core.trainer:356 - Save weights to ./YOLOX_outputs/yolox_nano_deploy_relu
2023-06-28 21:45:10 | INFO | yolox.core.trainer:196 - Training of experiment is done and the best AP is 0.00
Troubleshooting - Number of GPUs
## RuntimeError: NCCL error in: ProcessGroupNCCL.cpp:1191, invalid usage, NCCL version 2.10.3
## ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
## Reduce the number of GPUs
vi ./code/run_train.sh
## Modify the following lines of run_train.sh depending on the system (as needed)
================================
export CUDA_VISIBLE_DEVICES=0
GPU_NUM=1
================================
Troubleshooting - GPU/CUDA Out of Memory
## RuntimeError: CUDA out of memory.
## Tried to allocate 170.00 MiB (GPU 0; 14.76 GiB total capacity; 13.63 GiB already allocated; 72.81 MiB free; 13.63 GiB reserved in total by PyTorch)
## If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
## See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
## Reduce the batch size (upgrading the GPUs would be preferable if possible)
## Alternatively, setting PYTORCH_CUDA_ALLOC_CONF (max_split_size_mb) may mitigate fragmentation, as the error message suggests
vi ./code/run_train.sh
## Modify the following line of run_train.sh depending on the system (as needed)
================================
BATCH=4
================================
Step 3 - Quantization
## Quantization and xmodel Dumping
bash code/run_quant.sh
[VAIQ_NOTE]: Loading NNDCT kernels...
2023-06-28 19:56:48 | INFO | __main__:148 - Args: Namespace(batch_size=32, ckpt='float/yolox_nano.pth', ...)
2023-06-28 19:56:48 | INFO | __main__:163 - Model Summary: Params: 0.91M, Gflops: 1.00
2023-06-28 19:56:48 | INFO | __main__:164 - Model Structure: YOLOX( ... )
2023-06-28 19:56:48 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-28 19:56:49 | INFO | yolox.data.datasets.coco:64 - Done (t=0.64s)
2023-06-28 19:56:49 | INFO | pycocotools.coco:86 - creating index...
2023-06-28 19:56:49 | INFO | pycocotools.coco:86 - index created!
2023-06-28 19:56:52 | INFO | __main__:181 - loading checkpoint from float/yolox_nano.pth
2023-06-28 19:56:52 | INFO | __main__:188 - loaded checkpoint done.
[VAIQ_NOTE]: OS and CPU information:
[VAIQ_NOTE]: Tools version information:
[VAIQ_NOTE]: GPU information:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization calibration process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cuda'.
[VAIQ_NOTE]: =>Parsing YOLOX...
[VAIQ_NOTE]: Start to trace and freeze model...
[VAIQ_NOTE]: The input model YOLOX is torch.nn.Module.
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/YOLOX.py)
[VAIQ_NOTE]: =>Get module with quantization.
100%|##########| 157/157 [01:51<00:00, 1.41it/s]
2023-06-28 19:58:51 | INFO | yolox.evaluators.coco_evaluator_q:270 - Evaluate in main process...
2023-06-28 19:59:11 | INFO | yolox.evaluators.coco_evaluator_q:303 - Loading and preparing results...
2023-06-28 19:59:16 | INFO | yolox.evaluators.coco_evaluator_q:303 - DONE (t=5.47s)
2023-06-28 19:59:16 | INFO | pycocotools.coco:366 - creating index...
2023-06-28 19:59:17 | INFO | pycocotools.coco:366 - index created!
Running per image evaluation...
Evaluate annotation type *bbox*
COCOeval_opt.evaluate() finished in 16.24 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 2.56 seconds.
2023-06-28 19:59:38 | INFO | __main__:236 - Average forward time: 17.37 ms, Average NMS time: 0.69 ms, Average inference time: 18.06 ms
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.137
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.262
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.132
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.043
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.153
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.228
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.158
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.267
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.300
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.476
per class AP:
...
[VAIQ_NOTE]: =>Exporting quant config.(quantize_result/quant_info.json)
[VAIQ_NOTE]: Loading NNDCT kernels...
2023-06-28 19:59:45 | INFO | __main__:148 - Args: Namespace(batch_size=32, ckpt='float/yolox_nano.pth', ...)
2023-06-28 19:59:46 | INFO | __main__:163 - Model Summary: Params: 0.91M, Gflops: 1.00
2023-06-28 19:59:46 | INFO | __main__:164 - Model Structure: YOLOX( ... )
2023-06-28 19:59:46 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-28 19:59:46 | INFO | yolox.data.datasets.coco:64 - Done (t=0.65s)
2023-06-28 19:59:46 | INFO | pycocotools.coco:86 - creating index...
2023-06-28 19:59:46 | INFO | pycocotools.coco:86 - index created!
2023-06-28 19:59:50 | INFO | __main__:181 - loading checkpoint from float/yolox_nano.pth
2023-06-28 19:59:50 | INFO | __main__:188 - loaded checkpoint done.
[VAIQ_NOTE]: OS and CPU information:
[VAIQ_NOTE]: Tools version information:
[VAIQ_NOTE]: GPU information:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization test process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cuda'.
[VAIQ_NOTE]: =>Parsing YOLOX...
[VAIQ_NOTE]: Start to trace and freeze model...
[VAIQ_NOTE]: The input model YOLOX is torch.nn.Module.
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/YOLOX.py)
[VAIQ_NOTE]: =>Get module with quantization.
100%|##########| 157/157 [00:54<00:00, 2.89it/s]
2023-06-28 20:00:51 | INFO | yolox.evaluators.coco_evaluator_q:270 - Evaluate in main process...
2023-06-28 20:01:12 | INFO | yolox.evaluators.coco_evaluator_q:303 - Loading and preparing results...
2023-06-28 20:01:17 | INFO | yolox.evaluators.coco_evaluator_q:303 - DONE (t=5.45s)
2023-06-28 20:01:17 | INFO | pycocotools.coco:366 - creating index...
2023-06-28 20:01:18 | INFO | pycocotools.coco:366 - index created!
Running per image evaluation...
Evaluate annotation type *bbox*
COCOeval_opt.evaluate() finished in 15.82 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 2.59 seconds.
2023-06-28 20:01:38 | INFO | __main__:236 - Average forward time: 5.94 ms, Average NMS time: 0.78 ms, Average inference time: 6.71 ms
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.136
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.264
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.132
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.155
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.226
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.156
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.265
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.298
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.093
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.471
per class AP:
...
[VAIQ_NOTE]: Loading NNDCT kernels...
2023-06-28 20:01:48 | INFO | __main__:148 - Args: Namespace(batch_size=32, ckpt='float/yolox_nano.pth', ...)
2023-06-28 20:01:48 | INFO | __main__:163 - Model Summary: Params: 0.91M, Gflops: 1.00
2023-06-28 20:01:48 | INFO | __main__:164 - Model Structure: YOLOX( ... )
2023-06-28 20:01:48 | INFO | yolox.data.datasets.coco:64 - loading annotations into memory...
2023-06-28 20:01:50 | INFO | yolox.data.datasets.coco:64 - Done (t=1.81s)
2023-06-28 20:01:50 | INFO | pycocotools.coco:86 - creating index...
2023-06-28 20:01:50 | INFO | pycocotools.coco:86 - index created!
2023-06-28 20:01:52 | INFO | __main__:181 - loading checkpoint from float/yolox_nano.pth
2023-06-28 20:01:54 | INFO | __main__:188 - loaded checkpoint done.
[VAIQ_NOTE]: OS and CPU information:
[VAIQ_NOTE]: Tools version information:
[VAIQ_NOTE]: Quant config file is empty, use default quant configuration
[VAIQ_NOTE]: Quantization test process start up...
[VAIQ_NOTE]: =>Quant Module is in 'cpu'.
[VAIQ_NOTE]: =>Parsing YOLOX...
[VAIQ_NOTE]: Start to trace and freeze model...
[VAIQ_NOTE]: The input model YOLOX is torch.nn.Module.
[VAIQ_NOTE]: Finish tracing.
[VAIQ_NOTE]: Processing ops...
[VAIQ_NOTE]: =>Doing weights equalization...
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/YOLOX.py)
[VAIQ_NOTE]: =>Get module with quantization.
0%| | 0/5000 [00:00<?, ?it/s]
2023-06-28 20:02:01 | INFO | __main__:236 -
[VAIQ_NOTE]: =>Converting to xmodel ...
[VAIQ_NOTE]: =>Dumping 'YOLOX_0'' checking data...
[VAIQ_NOTE]: =>Finsh dumping data.(quantize_result/deploy_check_data_int/YOLOX_0)
[VAIQ_NOTE]: =>Successfully convert 'YOLOX_0' to xmodel.(quantize_result/YOLOX_0_int.xmodel)
Step Ex. - Quantization-Aware Training (as needed)
## Quantization-Aware Training, Model Converting, and xmodel Dumping
bash code/run_qat.sh
References
https://misoji-engineer.com/archives/vitis-ai-how-to.html
https://misoji-engineer.com/archives/vitis-ai-3-0.html
https://misoji-engineer.com/archives/build-vitis-ai-gpu.html
https://www.paltek.co.jp/techblog/tag/ai
https://www.pixela.co.jp/products/pickup/dev/ai/vitisai_ai_3_model_zoo.html
https://misoji-engineer.com/archives/vitis-ai-model-zoo.html
https://tomosoft.jp/design/?p=44403
https://www.pixela.co.jp/products/pickup/dev/
https://www.paltek.co.jp/techblog/techinfo/220121_01
Acknowledgments
Daiphys is a professional-service company for research and development of leading-edge technologies in science and engineering.
Get started accelerating your business through our deep expertise in R&D with AI, quantum computing, and space development; please get in touch with Daiphys today!
Daiphys Technologies LLC - https://www.daiphys.com/