What is mace_cl_compiled_program.bin

MIUI 12 will take the security of user data to a new level

Xiaomi has developed a software platform called MACE (Mobile AI Compute Engine), which will take the security of user data to a new level. MACE will debut in MIUI 12, which is due to be unveiled very soon, on April 27.


MACE's main advantage is autonomy: it performs all processing directly on the mobile device instead of relying on the computing power of cloud servers. And since the data never leaves the device, it cannot be intercepted in transit. That is not all: MIUI 12 will also introduce a data-protection mechanism called "differential privacy". The idea is to add a small amount of harmless fake data to the user data transmitted over wireless connections, which makes the information hard to decode if it is intercepted.
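To make the mechanism concrete, here is a toy sketch of the classic differential-privacy technique of adding Laplace noise to a value before it leaves the device. It is purely illustrative and is not Xiaomi's actual implementation; every name and number in it is made up.

// Toy illustration of differential privacy: perturb a value with Laplace
// noise before transmission, so an intercepted reading says little about
// the true value.
#include <cmath>
#include <iostream>
#include <random>

// Sample Laplace(0, b) noise via the inverse-CDF method.
double laplace_noise(double b, std::mt19937 &rng) {
  std::uniform_real_distribution<double> uniform(-0.5, 0.5);
  const double u = uniform(rng);
  return -b * std::copysign(1.0, u) * std::log(1.0 - 2.0 * std::fabs(u));
}

int main() {
  std::mt19937 rng(std::random_device{}());
  const double true_value = 42.0;   // e.g. some usage statistic
  const double epsilon = 1.0;       // privacy budget: smaller means more noise
  const double sensitivity = 1.0;   // how much one user can change the value
  const double reported = true_value + laplace_noise(sensitivity / epsilon, rng);
  std::cout << "reported (noisy) value: " << reported << "\n";
}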

The "differential privacy" algorithm combined with MACE should make smartphones running MIUI 12 better protected. The good news for current Xiaomi smartphone owners is that these protection systems are implemented in software, so users of models that have been promised the MIUI 12 update will receive them automatically with the new firmware.


Jgb49 is offline

Jgb49

Forum member
Join date: Feb 2018
Posts: 326
Smartphone model: Xiaomi Redmi Note 9 Pro
Carrier: Simyo
What is mace_cl_compiled_program.bin?

Just that: this file keeps appearing on my Note 9 Pro and I don't know what it is. It shows up in internal storage and also inside the DCIM folder; I delete it and it soon reappears.

22/12/23, 09:40:28

rpla is offline

rpla

Occasional user
Join date: Apr 2012
Posts: 2
Carrier: Movistar

MACE (Mobile AI Compute Engine) is the largest artificial-intelligence development project created by Xiaomi for its smartphones. Thanks to new code and algorithms, Xiaomi says that in MIUI 12 the user's private data is kept away from its servers and stays on the user's device.

The code is on GitHub.

How to build

Pull the mace-dev docker image and then run the container with the following command.

# Create container
# Set 'host' network to use ADB
docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb --net=host \
    -v /local/path:/container/path xiaomimace/mace-dev /bin/bash

Usage

1. Pull MACE source code

git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune

# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}

It is highly recommended to use a release version instead of the master branch.

2. Model Preprocessing

  • TensorFlow

TensorFlow provides the Graph Transform Tool, which improves inference efficiency by applying various optimizations such as op folding and redundant-node removal. It is strongly recommended to apply these optimizations before the graph conversion step.

The following commands show the suggested graph transformations and optimizations for different runtimes,

# CPU/GPU:
./transform_graph \
    --in_graph=tf_model.pb \
    --out_graph=tf_model_opt.pb \
    --inputs='input' \
    --outputs='output' \
    --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
        strip_unused_nodes(type=float, shape="1,64,64,3")
        remove_nodes(op=Identity, op=CheckNumerics)
        fold_constants(ignore_errors=true)
        flatten_atrous_conv
        fold_batch_norms
        fold_old_batch_norms
        strip_unused_nodes
        sort_by_execution_order'
# DSP:
./transform_graph \
    --in_graph=tf_model.pb \
    --out_graph=tf_model_opt.pb \
    --inputs='input' \
    --outputs='output' \
    --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
        strip_unused_nodes(type=float, shape="1,64,64,3")
        remove_nodes(op=Identity, op=CheckNumerics)
        fold_constants(ignore_errors=true)
        fold_batch_norms
        fold_old_batch_norms
        backport_concatv2
        quantize_weights(minimum_size=2)
        quantize_nodes
        strip_unused_nodes
        sort_by_execution_order'

  • Caffe

The MACE converter only supports Caffe 1.0+; you need to upgrade your models with Caffe's built-in tools when necessary:

# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt

# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel

3. Build static/shared library

3.1 Overview

MACE can build either a static or a shared library (specified by linkshared in the YAML model deployment file). The following are two use cases.

    Build a well-tuned library for specific SoCs

When target_socs is specified in the YAML model deployment file, the build tool enables automatic tuning of GPU kernels. This usually takes some time, depending on the complexity of your model.

Note: You should plug in device(s) with the corresponding SoC(s).

    Build a general library for all devices

When target_socs is not specified, the generated library is compatible with general devices.

Note: There will be a performance drop of around 1–10% for the GPU runtime compared to the well-tuned library.

MACE provides a command-line tool (tools/converter.py) for model conversion, compilation, test runs, benchmarking, and correctness validation.

  1. tools/converter.py should be run at the root directory of this project.
  2. When linkshared is set to 1, build_type should be proto. Currently only Android devices are supported.
3.2 tools/converter.py usage

Commands

build: build the library and test tools.

# Build library
python tools/converter.py build --config=models/config.yaml

run: run the model(s).

# Test model run time
python tools/converter.py run --config=models/config.yaml --round=100

# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=models/config.yaml --validate

# Check the memory usage of the model (keep only one model in the configuration file)
python tools/converter.py run --config=models/config.yaml --round=10000 &
sleep 5
adb shell dumpsys meminfo | grep mace_run
kill %1

run depends on the build command; you should run after build.

benchmark: benchmark and profile the model.

# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=models/config.yaml

benchmark depends on the build command; you should benchmark after build.

Common arguments

option                 type  default  commands         explanation
--omp_num_threads      int   -1       run / benchmark  number of threads
--cpu_affinity_policy  int   1        run / benchmark  0:AFFINITY_NONE / 1:AFFINITY_BIG_ONLY / 2:AFFINITY_LITTLE_ONLY
--gpu_perf_hint        int   3        run / benchmark  0:DEFAULT / 1:LOW / 2:NORMAL / 3:HIGH
--gpu_priority_hint    int   3        run / benchmark  0:DEFAULT / 1:LOW / 2:NORMAL / 3:HIGH

Use -h to get detailed help.

python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h

4. Deployment

The build command generates the static/shared library, model files, and header files, packaged as build/${library_name}/libmace_${library_name}.tar.gz.

  • The generated static libraries are organized as follows:

build/
└── mobilenet-v2-gpu
    ├── include
    │   └── mace
    │       └── public
    │           ├── mace.h
    │           └── mace_runtime.h
    ├── libmace_mobilenet-v2-gpu.tar.gz
    ├── lib
    │   ├── arm64-v8a
    │   │   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
    │   └── armeabi-v7a
    │       └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
    ├── model
    │   ├── mobilenet_v2.data
    │   └── mobilenet_v2.pb
    └── opencl
        ├── arm64-v8a
        │   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
        └── armeabi-v7a
            └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
  • The generated shared libraries are organized as follows:

build
└── mobilenet-v2-gpu
    ├── include
    │   └── mace
    │       └── public
    │           ├── mace.h
    │           └── mace_runtime.h
    ├── lib
    │   ├── arm64-v8a
    │   │   ├── libgnustl_shared.so
    │   │   └── libmace.so
    │   └── armeabi-v7a
    │       ├── libgnustl_shared.so
    │       └── libmace.so
    ├── model
    │   ├── mobilenet_v2.data
    │   └── mobilenet_v2.pb
    └── opencl
        ├── arm64-v8a
        │   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
        └── armeabi-v7a
            └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
  1. DSP runtime depends on libhexagon_controller.so .
  2. The model .pb file will be generated only when build_type is proto.
  3. The compiled OpenCL kernel binary (e.g. mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin above) will be generated only when target_socs and the gpu runtime are specified.
  4. Generated shared library depends on libgnustl_shared.so .

The compiled OpenCL kernel binary (*_compiled_opencl_kernel.*.*.bin) depends on the OpenCL version of the device; you should maintain compatibility or configure a compiled-kernel cache store with ConfigKVStorageFactory.
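As an illustration, the sketch below configures that cache store using the same runtime calls as the example in section 5 below. The file paths are placeholders, and newer MACE releases configure the same thing through a GPU-context object instead, so treat this as a pattern rather than exact code for every version.

#include <memory>
#include <string>
#include <vector>

#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"

// Sketch: keep MACE's compiled OpenCL kernel cache (e.g. the
// mace_cl_compiled_program.bin file asked about above) in an app-writable
// location so kernels are reused across runs instead of recompiled.
void ConfigureOpenCLKernelCache() {
  // Hypothetical app-private storage path; pick any writable location.
  const std::string storage_path = "/data/data/com.example.app/files/mace_storage";
  std::shared_ptr<mace::KVStorageFactory> storage_factory(
      new mace::FileStorageFactory(storage_path));
  mace::ConfigKVStorageFactory(storage_factory);

  // Optional: also point MACE at the pre-compiled kernel binary produced at
  // build time for the target SoC (the *_compiled_opencl_kernel.*.*.bin above).
  const std::vector<std::string> opencl_binary_paths = {
      "/path/to/mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin"};
  mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}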

5. How to use the library in your project

Please refer to mace/examples/example.cc for full usage. The following lists the key steps.

// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the build_type is code
#include "mace/public/mace_engine_factory.h"

// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
  mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}

// 1. Set compiled OpenCL kernel cache, this is used to reduce the
// initialization time since the compiling is too slow. It's suggested
// to set this even when pre-compiled OpenCL program file is provided
// because the OpenCL version upgrade may also lead to kernel
// recompilations.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
    new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);

// 2. Declare the device type (must be the same as `runtime` in the configuration file)
DeviceType device_type = DeviceType::GPU;

// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};

// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
    CreateMaceEngineFromCode(model_name.c_str(),
                             nullptr,
                             input_names,
                             output_names,
                             device_type,
                             &engine);
// Create Engine from model file
create_engine_status =
    CreateMaceEngineFromProto(model_pb_data,
                              model_data_file.c_str(),
                              input_names,
                              output_names,
                              device_type,
                              &engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
  // Report error
}

// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
  // Allocate input and output
  int64_t input_size =
      std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
                      std::multiplies<int64_t>());
  auto buffer_in = std::shared_ptr<float>(new float[input_size],
                                          std::default_delete<float[]>());
  // Load input here
  // ...
  inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
  int64_t output_size =
      std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
                      std::multiplies<int64_t>());
  auto buffer_out = std::shared_ptr<float>(new float[output_size],
                                           std::default_delete<float[]>());
  outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}

// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);

© Copyright 2018, Mobile AI Compute Engine (MACE) Developers. Revision 7ac05858 .


XiaoMi / mace


CL_INVALID_KERNEL_ARGS #648

gasgallo opened this issue May 22, 2020 · 15 comments


Comments

Contributor
gasgallo commented May 22, 2020 •

Before you open an issue, please make sure you have tried the following steps:

  1. Make sure your environment is the same as the one described at https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
  2. Have you ever read the document for your usage?
  3. Check if your issue appears in HOW-TO-DEBUG or FAQ.
  4. The form below must be filled.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • NDK version(e.g., 15c): 18b
  • GCC version(if compiling for host, e.g., 5.4.0): 5.4.0
  • MACE version (Use the command: git describe --long --tags): 0.13.0
  • Python version(2.7): 3.6
  • Bazel version (e.g., 0.13.0): 0.16.0
  • CMake version: 3.16.0

Model deploy file (*.yml)

# The name of library
library_name: test
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  FE: # model tag, which will be used in model loading and must be specific.
    platform: caffe
    # path to your tensorflow model's pb file. Support local path, http:// and https://
    model_file_path: /models/FE.prototxt
    weight_file_path: /models/FE.caffemodel
    # sha256_checksum of your model's pb file.
    # use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
    model_sha256_checksum: 98f9b69a085e7d8f40704ac6b2fedae0fda876fff4658509dde3d74d883a9684
    weight_sha256_checksum: a9f5d4dfe944315511c6070e8556790409ae0f0bd9005c5db66b4fdd5c38b716
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,112,112
        input_data_formats:
          - NCHW
        output_tensors:
          - fc1
        output_shapes:
          - 1,1,1,512
    obfuscate: 0
    limit_opencl_kernel_time: 1
    runtime: cpu+gpu
    winograd: 4
  FD:
    platform: caffe
    model_file_path: /models/FD.prototxt
    weight_file_path: /models/FD.caffemodel
    model_sha256_checksum: 213d764bd605d02b1630740969ab7110a2ee0111e3f8200ce02304cf72fbd42a
    weight_sha256_checksum: c83d575645daf8541867a63197de6bfd44a7fb3bf9bf4c876cde8165c23fac0c
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,160,160
        input_data_formats:
          - NCHW
        output_tensors:
          - face_rpn_cls_prob_reshape_stride32
          - face_rpn_bbox_pred_stride32
          - face_rpn_landmark_pred_stride32
          - face_rpn_cls_prob_reshape_stride16
          - face_rpn_bbox_pred_stride16
          - face_rpn_landmark_pred_stride16
          - face_rpn_cls_prob_reshape_stride8
          - face_rpn_bbox_pred_stride8
          - face_rpn_landmark_pred_stride8
        output_shapes:
          - 1,4,5,5
          - 1,8,5,5
          - 1,20,5,5
          - 1,4,10,10
          - 1,8,10,10
          - 1,20,10,10
          - 1,4,20,20
          - 1,8,20,20
          - 1,20,20,20
        output_data_formats:
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
    obfuscate: 0
    runtime: cpu+gpu
    winograd: 0

Describe the problem

  • When integrating the compiled MACE library and model into an Android app, I get a CL_INVALID_KERNEL_ARGS error at runtime, followed by some Out of resources errors.

Any clue about what can cause this kind of error?

To Reproduce

Steps to reproduce the problem:

1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
3. python tools/converter.py run --validate --disable_tuning --config_file=/path/to/your/model_deployment_file
4. run the Android app

Error information / logs

Please include the full log and/or traceback here.

E/MACE: helper.cc:201 error: CL_INVALID_KERNEL_ARGS
E/MACE: helper.cc:246 error: CL_INVALID_KERNEL_ARGS
I/MACE: activation.cc:113 TuningOrRun3DKernel(runtime, kernel_, tuning_key, gws, lws, context->future()) failed with error: Out of resources
I/MACE: net.cc:152 op->Run(&context) failed with error: Out of resources
I/MACE: mace.cc:890 net_->Run(run_metadata) failed with error: Out of resources

Additional context

Model to reproduce the issue can be found here


Contributor Author
gasgallo commented May 22, 2020 •

@lu229 No, I'm using the original MACE code at tag v0.13.0.

Also the error doesn’t always happen. In my application I built multiple models into one library and:

  • when only the model from first post is used, then I get no error
  • when I use the model from first post in combination with other models, then I get the error

Is it possible to have a memory collision when multiple models run on the GPU?

Collaborator
lu229 commented May 22, 2020

@gasgallo There is no memory collision when multiple models run on the GPU; we've never had a problem like this before. It seems the model is using a mismatched CL cache. Could you upload the yml file that includes multiple models?

Contributor Author
gasgallo commented May 22, 2020 •

@lu229 Sure. I'll also update the first post:

# The name of library
library_name: test
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  FE: # model tag, which will be used in model loading and must be specific.
    platform: caffe
    # path to your tensorflow model's pb file. Support local path, http:// and https://
    model_file_path: /models/FE.prototxt
    weight_file_path: /models/FE.caffemodel
    # sha256_checksum of your model's pb file.
    # use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
    model_sha256_checksum: 98f9b69a085e7d8f40704ac6b2fedae0fda876fff4658509dde3d74d883a9684
    weight_sha256_checksum: a9f5d4dfe944315511c6070e8556790409ae0f0bd9005c5db66b4fdd5c38b716
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,112,112
        input_data_formats:
          - NCHW
        output_tensors:
          - fc1
        output_shapes:
          - 1,1,1,512
    obfuscate: 0
    limit_opencl_kernel_time: 1
    runtime: cpu+gpu
    winograd: 4
  FD:
    platform: caffe
    model_file_path: /models/FD.prototxt
    weight_file_path: /models/FD.caffemodel
    model_sha256_checksum: 213d764bd605d02b1630740969ab7110a2ee0111e3f8200ce02304cf72fbd42a
    weight_sha256_checksum: c83d575645daf8541867a63197de6bfd44a7fb3bf9bf4c876cde8165c23fac0c
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,160,160
        input_data_formats:
          - NCHW
        output_tensors:
          - face_rpn_cls_prob_reshape_stride32
          - face_rpn_bbox_pred_stride32
          - face_rpn_landmark_pred_stride32
          - face_rpn_cls_prob_reshape_stride16
          - face_rpn_bbox_pred_stride16
          - face_rpn_landmark_pred_stride16
          - face_rpn_cls_prob_reshape_stride8
          - face_rpn_bbox_pred_stride8
          - face_rpn_landmark_pred_stride8
        output_shapes:
          - 1,4,5,5
          - 1,8,5,5
          - 1,20,5,5
          - 1,4,10,10
          - 1,8,10,10
          - 1,20,10,10
          - 1,4,20,20
          - 1,8,20,20
          - 1,20,20,20
        output_data_formats:
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
    obfuscate: 0
    runtime: cpu+gpu
    winograd: 0

Collaborator
lu229 commented May 22, 2020

@gasgallo It seems there is nothing wrong there. Do you get the same error in mace run? If so, I want to debug this problem. By the way, can you run git status to confirm that you haven't modified the code?

Contributor Author
gasgallo commented May 22, 2020

@lu229 Yes, I'm 100% sure the code is the original from this repo, because I use Docker and fetch the source code like this:

ARG MACE_VERSION="v0.13.0"
WORKDIR /mace
RUN git clone -b "${MACE_VERSION}" https://github.com/XiaoMi/mace.git .

I will try to reproduce it in mace run and let you know.

Collaborator
lu229 commented May 22, 2020

OK, Thanks! I will analyze the code and try to find the problem.

Contributor Author
gasgallo commented May 22, 2020 •

@lu229 mace run works fine.

I don't think we can reproduce the issue in mace run, because it creates and destroys a MACE engine for every model sequentially (and in that scenario even my app works fine), whereas in my app many engines coexist at the same time, and this seems to be the reason for the crash.

root@ds017:/mace# python tools/converter.py run --config /models/test.yml CMD> bazel version WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". Build label: 0.16.0 Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar Build time: Tue Jul 31 17:01:24 2018 (1533056484) Build timestamp: 1533056484 Build timestamp as int: 1533056484 CMD> bazel build //mace/proto:mace_py WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". Loading: Loading: 0 packages loaded Analyzing: target //mace/proto:mace_py (5 packages loaded) INFO: Analysed target //mace/proto:mace_py (17 packages loaded). INFO: Found 1 target. Building: no action [2 / 8] [-----] BazelWorkspaceStatusAction stable-status.txt Target //mace/proto:mace_py up-to-date: bazel-genfiles/mace/proto/mace_pb2.py INFO: Elapsed time: 3.774s, Critical Path: 0.04s INFO: 0 processes. INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action CMD> cp -f bazel-genfiles/mace/proto/mace_pb2.py tools/python/py_proto CMD> bazel build //mace/proto:micro_mem_py WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". Loading: Loading: 0 packages loaded Analyzing: target //mace/proto:micro_mem_py (5 packages loaded) INFO: Analysed target //mace/proto:micro_mem_py (17 packages loaded). INFO: Found 1 target. [2 / 8] [-----] BazelWorkspaceStatusAction stable-status.txt Target //mace/proto:micro_mem_py up-to-date: bazel-genfiles/mace/proto/micro_mem_pb2.py INFO: Elapsed time: 2.412s, Critical Path: 0.05s INFO: 0 processes. INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action CMD> cp -f bazel-genfiles/mace/proto/micro_mem_pb2.py tools/python/py_proto CMD> bazel build //third_party/caffe:caffe_py WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". Loading: Loading: 0 packages loaded Analyzing: target //third_party/caffe:caffe_py (5 packages loaded) INFO: Analysed target //third_party/caffe:caffe_py (17 packages loaded). INFO: Found 1 target. [0 / 5] [-----] BazelWorkspaceStatusAction stable-status.txt Target //third_party/caffe:caffe_py up-to-date: bazel-genfiles/third_party/caffe/caffe_pb2.py INFO: Elapsed time: 2.268s, Critical Path: 0.04s INFO: 0 processes. INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action CMD> cp -f bazel-genfiles/third_party/caffe/caffe_pb2.py tools/python/py_proto * Build //mace/tools:mace_run_static with ABI arm64-v8a ('build', '//mace/tools:mace_run_static', '--config', 'android', '--cpu=arm64-v8a', '--define', 'neon=true', '--define', 'openmp=false', '--define', 'opencl=true', '--define', 'quantize=false', '--define', 'rpcmem=true', '--define', 'hexagon=false', '--define', 'hta=false', '--define', 'apu=false', '--config', 'optimization', '--config', 'symbol_hidden', '--per_file_copt=mace/tools/mace_run.cc@-DMODEL_GRAPH_FORMAT_CODE') WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". WARNING: The major revision of the Android NDK referenced by android_ndk_repository rule 'androidndk' is 19. The major revisions supported by Bazel are [10, 11, 12, 13, 14, 15, 16]. 
Bazel will attempt to treat the NDK as if it was r16. This may cause compilation and linkage problems. Please download a supported NDK version. INFO: Analysed target //mace/tools:mace_run_static (33 packages loaded). INFO: Found 1 target. Target //mace/tools:mace_run_static up-to-date: bazel-bin/mace/tools/mace_run_static INFO: Elapsed time: 17.480s, Critical Path: 2.11s INFO: 1 process: 1 local. INFO: Build completed successfully, 2 total actions ('build', '//mace/tools:mace_run_static', '--config', 'android', '--cpu=arm64-v8a', '--define', 'neon=true', '--define', 'openmp=false', '--define', 'opencl=true', '--define', 'quantize=false', '--define', 'rpcmem=true', '--define', 'hexagon=false', '--define', 'hta=false', '--define', 'apu=false', '--config', 'optimization', '--config', 'symbol_hidden', '--per_file_copt=mace/tools/mace_run.cc@-DMODEL_GRAPH_FORMAT_CODE') Build done! *************************************************** Run model FD on POCOF1 *************************************************** Generate input file: build/test/_tmp/FD/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a/model_input_data Generate input file done. * Run 'FD' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,) Push build/test/_tmp/FD/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a/model_input_data to /data/local/tmp/mace_run Push build/test/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run Push /tmp/cmd_file-FD-1590134041.93 to /data/local/tmp/mace_run/cmd_file-FD-1590134041.93 I mace/tools/mace_run.cc:530] model name: FD I mace/tools/mace_run.cc:531] mace version: v0.13.0-0-g9a06864 I mace/tools/mace_run.cc:532] input node: data I mace/tools/mace_run.cc:533] input shape: 1,3,160,160 I mace/tools/mace_run.cc:534] output node: face_rpn_cls_prob_reshape_stride32,face_rpn_bbox_pred_stride32,face_rpn_landmark_pred_stride32,face_rpn_cls_prob_reshape_stride16,face_rpn_bbox_pred_stride16,face_rpn_landmark_pred_stride16,face_rpn_cls_prob_reshape_stride8,face_rpn_bbox_pred_stride8,face_rpn_landmark_pred_stride8 I mace/tools/mace_run.cc:535] output shape: 1,4,5,5:1,8,5,5:1,20,5,5:1,4,10,10:1,8,10,10:1,20,10,10:1,4,20,20:1,8,20,20:1,20,20,20 I mace/tools/mace_run.cc:536] input_file: /data/local/tmp/mace_run/model_input I mace/tools/mace_run.cc:537] output_file: /data/local/tmp/mace_run/model_out I mace/tools/mace_run.cc:538] input dir: I mace/tools/mace_run.cc:539] output dir: I mace/tools/mace_run.cc:540] model_data_file: I mace/tools/mace_run.cc:541] model_file: I mace/tools/mace_run.cc:542] device: GPU I mace/tools/mace_run.cc:543] round: 1 I mace/tools/mace_run.cc:544] restart_round: 1 I mace/tools/mace_run.cc:545] gpu_perf_hint: 3 I mace/tools/mace_run.cc:546] gpu_priority_hint: 3 I mace/tools/mace_run.cc:547] omp_num_threads: -1 I mace/tools/mace_run.cc:548] cpu_affinity_policy: 1 I mace/tools/mace_run.cc:551] limit_opencl_kernel_time: 0 I mace/tools/mace_run.cc:556] opencl_queue_window_size: 0 I mace/libmace/mace.cc:500] Creating MaceEngine, MACE version: v0.13.0-0-g9a06864 I mace/libmace/mace.cc:539] Initializing MaceEngine I mace/libmace/mace.cc:682] Destroying MaceEngine I mace/tools/mace_run.cc:599] restart round 0 W ./mace/utils/tuner.h:201] Failed to read tuned param file: /data/local/tmp/mace_run/test_tuned_opencl_parameter.POCOF1.sdm845.bin I mace/libmace/mace.cc:500] Creating MaceEngine, MACE version: v0.13.0-0-g9a06864 W mace/core/kv_storage.cc:109] Failed to read kv 
store file: /data/local/tmp/mace_run/interior//mace_cl_compiled_program.bin W mace/core/runtime/opencl/opencl_runtime.cc:442] Load OpenCL cached compiled kernel file failed. Please make sure the storage directory exist and you have Write&Read permission I mace/libmace/mace.cc:539] Initializing MaceEngine I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_score_reshape_stride16 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_stride16 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_reshape_stride16 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_score_reshape_stride32 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_stride32 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_reshape_stride32 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_score_reshape_stride8 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_stride8 fall back to CPU I mace/core/net_def_adapter.cc:348] Op face_rpn_cls_prob_reshape_stride8 fall back to CPU I mace/tools/mace_run.cc:272] Create Mace Engine latency: 1252.51 ms I mace/tools/mace_run.cc:279] Total init latency: 1252.67 ms I mace/tools/mace_run.cc:373] Warm up run I mace/tools/mace_run.cc:409] 1st warm up run latency: 1648 ms I mace/tools/mace_run.cc:417] Run model I mace/tools/mace_run.cc:479] Average latency: 16.055 ms I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 with size 400 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 with size 800 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 with size 2000 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 with size 1600 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 with size 3200 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 with size 8000 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 with size 6400 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 with size 12800 done. I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 with size 32000 done. ======================================================== capability(CPU) init warmup run_avg ======================================================== time 22.686 1252.672 1647.996 16.055 I mace/libmace/mace.cc:682] Destroying MaceEngine Running finished! Dana service is not available. ************************************************ Run model FR on POCOF1 ************************************************ Generate input file: build/test/_tmp/FR/99db22d0b7ac3c58eb583284e384f174/POCOF1_sdm845/arm64-v8a/model_input_data Generate input file done. 
* Run 'FR' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,) Push build/test/_tmp/FR/99db22d0b7ac3c58eb583284e384f174/POCOF1_sdm845/arm64-v8a/model_input_data to /data/local/tmp/mace_run Push build/test/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run Push /tmp/cmd_file-FR-1590134062.05 to /data/local/tmp/mace_run/cmd_file-FR-1590134062.05 I mace/tools/mace_run.cc:530] model name: FR I mace/tools/mace_run.cc:531] mace version: v0.13.0-0-g9a06864 I mace/tools/mace_run.cc:532] input node: data I mace/tools/mace_run.cc:533] input shape: 1,3,112,112 I mace/tools/mace_run.cc:534] output node: fc1 I mace/tools/mace_run.cc:535] output shape: 1,1,1,512 I mace/tools/mace_run.cc:536] input_file: /data/local/tmp/mace_run/model_input I mace/tools/mace_run.cc:537] output_file: /data/local/tmp/mace_run/model_out I mace/tools/mace_run.cc:538] input dir: I mace/tools/mace_run.cc:539] output dir: I mace/tools/mace_run.cc:540] model_data_file: I mace/tools/mace_run.cc:541] model_file: I mace/tools/mace_run.cc:542] device: GPU I mace/tools/mace_run.cc:543] round: 1 I mace/tools/mace_run.cc:544] restart_round: 1 I mace/tools/mace_run.cc:545] gpu_perf_hint: 3 I mace/tools/mace_run.cc:546] gpu_priority_hint: 3 I mace/tools/mace_run.cc:547] omp_num_threads: -1 I mace/tools/mace_run.cc:548] cpu_affinity_policy: 1 I mace/tools/mace_run.cc:551] limit_opencl_kernel_time: 1 I mace/tools/mace_run.cc:556] opencl_queue_window_size: 0 I mace/libmace/mace.cc:500] Creating MaceEngine, MACE version: v0.13.0-0-g9a06864 I mace/libmace/mace.cc:539] Initializing MaceEngine I mace/libmace/mace.cc:682] Destroying MaceEngine I mace/tools/mace_run.cc:599] restart round 0 W ./mace/utils/tuner.h:201] Failed to read tuned param file: /data/local/tmp/mace_run/test_tuned_opencl_parameter.POCOF1.sdm845.bin I mace/libmace/mace.cc:500] Creating MaceEngine, MACE version: v0.13.0-0-g9a06864 W mace/core/kv_storage.cc:109] Failed to read kv store file: /data/local/tmp/mace_run/interior//mace_cl_compiled_program.bin W mace/core/runtime/opencl/opencl_runtime.cc:442] Load OpenCL cached compiled kernel file failed. Please make sure the storage directory exist and you have Write&Read permission I mace/libmace/mace.cc:539] Initializing MaceEngine I mace/tools/mace_run.cc:272] Create Mace Engine latency: 2086.91 ms I mace/tools/mace_run.cc:279] Total init latency: 2087.07 ms I mace/tools/mace_run.cc:373] Warm up run I mace/tools/mace_run.cc:409] 1st warm up run latency: 2475.05 ms I mace/tools/mace_run.cc:417] Run model I mace/tools/mace_run.cc:479] Average latency: 47.123 ms I mace/tools/mace_run.cc:494] Write output file /data/local/tmp/mace_run/model_out_fc1 with size 2048 done. ======================================================== capability(CPU) init warmup run_avg ======================================================== time 22.727 2087.065 2475.045 47.123 I mace/libmace/mace.cc:682] Destroying MaceEngine Running finished! Elapse time: 0.692448 minutes. * Package libs for test Start packaging 'test' libs into build/test/libmace_test.tar.gz build/test/model/ build/test/model/arm64-v8a/ build/test/model/arm64-v8a/test.a build/test/model/gpu/ build/test/include/ build/test/include/mace/ build/test/include/mace/public/ build/test/include/mace/public/mace_engine_factory.h build/test/include/mace/public/FD.h build/test/include/mace/public/FR.h Packaging Done! 
-------------------------------------------------------------------------- Library -------------------------------------------------------------------------- | key | value | ========================================================================== | MACE Model package Path| build/test/libmace_test.tar.gz| -------------------------------------------------------------------------- 

Collaborator
lu229 commented May 22, 2020

@gasgallo Thanks for your test.
You say more than one engine coexists in your app; I also think this is the cause of the problem. When you invoke SetStoragePath, have you set a different path for the different engines? If the engines share the same path, it may produce this type of error.

Contributor Author
gasgallo commented May 22, 2020

@lu229 Yes, the storage path is different for each model.

Collaborator
lu229 commented May 22, 2020

@gasgallo OK, I cannot debug this problem as it is. Would it be convenient for you to upload your app's code? In that case I could debug the problem.

Contributor Author
gasgallo commented May 27, 2020

@lu229 sorry for delay, I was busy debugging and looking for a solution.

The app is kind of messy and would be an overcomplicated example for reproducing this issue. However, thanks to your hint, I've further investigated the OpenCL binaries that MACE creates during the warm-up run and realized that something could be wrong with those.

My hunch is that the binary compilation fails or is interrupted because multiple models run concurrently (in different threads). I'm currently testing a fix for this and will update this topic as soon as I have news.

Collaborator
lu229 commented May 28, 2020

@gasgallo MACE does not support multi-threaded invocation. If you need to use MACE from multiple threads, perhaps you need to create a MACE engine per thread.

Contributor Author
gasgallo commented May 28, 2020

@lu229 That's not my case. I'm using multiple threads, with one thread per model, and each model has a separate engine, so it should be fine.

What I've tried so far is to mutex-lock the method that creates the GPU context and runs the warm-up, so that only one model can access it at a time and models on other threads have to wait for the mutex to be released before being initialized. This fix seems to work, but I want to test it more.
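For readers hitting the same problem, here is a minimal sketch of the workaround described above: serialize engine creation (and hence the OpenCL kernel compilation triggered during initialization and warm-up) behind a mutex. It is illustrative only; CreateMaceEngineFromCode and its argument order follow the usage example quoted earlier on this page (an older API), and everything else is hypothetical.

#include <memory>
#include <mutex>
#include <string>
#include <vector>

#include "mace/public/mace.h"
#include "mace/public/mace_engine_factory.h"  // generated when build_type is code

// A single lock guarding engine creation, so per-model threads compile their
// OpenCL kernels one at a time instead of concurrently.
static std::mutex g_engine_init_mutex;

mace::MaceStatus CreateEngineSerialized(
    const std::string &model_name,
    const std::vector<std::string> &input_names,
    const std::vector<std::string> &output_names,
    mace::DeviceType device_type,
    std::shared_ptr<mace::MaceEngine> *engine) {
  // Other threads block here until the current engine finishes initializing.
  std::lock_guard<std::mutex> lock(g_engine_init_mutex);
  return mace::CreateMaceEngineFromCode(model_name.c_str(),
                                        nullptr,
                                        input_names,
                                        output_names,
                                        device_type,
                                        engine);
}

If the first (warm-up) inference also triggers kernel compilation, it can be run inside the same critical section.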

Contributor Author
gasgallo commented May 30, 2020

The issue seems to be solved using the above fix. Thanks @lu229
