Debugging Numerical Errors

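Use the utils/RunONNXModel.py Python script to debug numerical errors when an inference executable compiled by onnx-mlir produces numerical results that do not match those produced by the training framework. The script runs the model through both onnx-mlir and a reference backend, then compares the intermediate results produced by the two backends layer by layer.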
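The idea behind a layer-by-layer comparison can be sketched in plain Python: walk the intermediate outputs in execution order and report the first one that diverges, since that output usually points at the operator that introduced the error. This is a minimal sketch, not the code inside RunONNXModel.py. The function names and the dict-of-flat-lists representation are hypothetical, and the tolerance test assumes the rule used by numpy.isclose (with its default rtol and atol), not necessarily the defaults of RunONNXModel.py.

```python
def within_tolerance(actual, expected, rtol=1e-5, atol=1e-8):
    """True if both sequences have the same length and every pair satisfies
    |actual - expected| <= atol + rtol * |expected| (the numpy.isclose rule).
    NaN never compares as close, so a NaN anywhere counts as a mismatch."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


def first_divergent_layer(test_outputs, ref_outputs, rtol=1e-5, atol=1e-8):
    """Return the name of the first intermediate output (in ref_outputs order)
    that is missing from test_outputs or differs beyond tolerance.
    Return None if every output matches."""
    for name, expected in ref_outputs.items():
        actual = test_outputs.get(name)
        if actual is None or not within_tolerance(actual, expected, rtol, atol):
            return name
    return None


# Hypothetical intermediate outputs: "relu1" is the first to diverge.
ref = {"conv1": [0.5, 1.0], "relu1": [0.5, 1.0], "fc": [2.0, 3.0]}
test = {"conv1": [0.5, 1.0], "relu1": [0.5, 1.1], "fc": [2.2, 3.0]}
print(first_divergent_layer(test, ref))  # relu1
```

Stopping at the first mismatch is deliberate: later layers consume the faulty output, so their differences are usually consequences rather than causes.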

Prerequisites

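Set the ONNX_MLIR_HOME environment variable to the path of the onnx-mlir HOME directory. The HOME directory is the parent folder containing the bin, lib, and other subfolders in which the ONNX-MLIR executables and libraries can be found.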
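A quick sanity check can catch a misconfigured variable before a long run. The helper below is hypothetical and not part of onnx-mlir; it only verifies that ONNX_MLIR_HOME is set and contains the bin and lib subfolders named above.

```python
import os


def check_onnx_mlir_home(env=None):
    """Return ONNX_MLIR_HOME if it is set and contains bin/ and lib/ subfolders.

    env defaults to os.environ; a plain dict can be passed for testing.
    Raises RuntimeError describing what is wrong otherwise.
    """
    env = os.environ if env is None else env
    home = env.get("ONNX_MLIR_HOME")
    if not home:
        raise RuntimeError("ONNX_MLIR_HOME is not set")
    missing = [sub for sub in ("bin", "lib")
               if not os.path.isdir(os.path.join(home, sub))]
    if missing:
        raise RuntimeError(
            f"ONNX_MLIR_HOME={home} is missing subfolder(s): {', '.join(missing)}")
    return home
```

Calling check_onnx_mlir_home() with no argument inspects the current process environment, so it can be run at the top of any wrapper script that invokes RunONNXModel.py.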

Reference backend

Outputs produced by onnx-mlir can be verified either against a reference ONNX backend or against reference inputs and outputs stored in protobuf.

To verify using a reference backend, install onnxruntime by running pip install onnxruntime. To use a different testing backend, replace the code that imports onnxruntime with code for another ONNX-compliant backend.
To verify using reference outputs, use --verify=ref --load-ref=data_folder, where data_folder is the path to a folder containing protobuf files for the inputs and outputs. This guideline explains how to create protobuf files from numpy arrays.

Usage

utils/RunONNXModel.py supports the following command-line options:

$ python ../utils/RunONNXModel.py  --help
usage: RunONNXModel.py [-h] [--log-to-file [LOG_TO_FILE]] [--model MODEL] [--compile-args COMPILE_ARGS] [--compile-only] [--compile-using-input-shape] [--print-input]
                       [--print-output] [--save-onnx PATH] [--verify {onnxruntime,ref}] [--verify-all-ops] [--verify-with-softmax] [--verify-every-value] [--rtol RTOL]
                       [--atol ATOL] [--save-so PATH | --load-so PATH] [--save-ref PATH] [--load-ref PATH | --shape-info SHAPE_INFO] [--lower-bound LOWER_BOUND]
                       [--upper-bound UPPER_BOUND]

optional arguments:
  -h, --help                  show this help message and exit
  --log-to-file [LOG_TO_FILE] Output compilation messages to file, default compilation.log
  --model MODEL               Path to an ONNX model (.onnx or .mlir)
  --compile-args COMPILE_ARGS Arguments passed directly to onnx-mlir command. See bin/onnx-mlir --help
  --compile-only              Only compile the input model
  --compile-using-input-shape Compile the model by using the shape info getting from the inputs in the reference folder set by --load-ref
  --print-input               Print out inputs
  --print-output              Print out inference outputs produced by onnx-mlir
  --save-onnx PATH            File path to save the onnx model. Only effective if --verify=onnxruntime
  --verify {onnxruntime,ref}  Verify the output by using onnxruntime or reference inputs/outputs. By default, no verification. When being enabled, --verify-with-softmax or --verify-every-value must be used to specify verification mode.
  --verify-all-ops            Verify all operation outputs when using onnxruntime
  --verify-with-softmax       Verify the result obtained by applying softmax to the output
  --verify-every-value        Verify every value of the output using atol and rtol
  --rtol RTOL                 Relative tolerance for verification
  --atol ATOL                 Absolute tolerance for verification
  --save-so PATH              File path to save the generated shared library of the model
  --load-so PATH              File path to load a generated shared library for inference, and the ONNX model will not be re-compiled
  --save-ref PATH             Path to a folder to save the inputs and outputs in protobuf
  --load-ref PATH             Path to a folder containing reference inputs and outputs stored in protobuf. If --verify=ref, inputs and outputs are reference data for verification
  --shape-info SHAPE_INFO     Shape for each dynamic input of the model, e.g. 0:1x10x20,1:7x5x3. Used to generate random inputs for the model if --load-ref is not set
  --lower-bound LOWER_BOUND   Lower bound values for each data type. Used inputs. E.g. --lower-bound=int64:-10,float32:-0.2,uint8:1. Supported types are bool, uint8, int8, uint16, int16, uint32, int32, uint64, int64,float16, float32, float64
  --upper-bound UPPER_BOUND   Upper bound values for each data type. Used to generate random inputs. E.g. --upper-bound=int64:10,float32:0.2,uint8:9. Supported types are bool, uint8, int8, uint16, int16, uint32, int32, uint64, int64, float16, float32, float64
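The --shape-info and --lower-bound/--upper-bound formats shown above can be illustrated with a small parser. This sketch uses only the standard library and draws values with Python's random module; it is not how RunONNXModel.py generates its inputs, and the function names are hypothetical. Values are returned as flat lists; a real driver would reshape them into arrays of the right dtype.

```python
import math
import random


def parse_shape_info(spec):
    """Parse '0:1x10x20,1:7x5x3' into {0: [1, 10, 20], 1: [7, 5, 3]}."""
    shapes = {}
    for item in spec.split(","):
        index, dims = item.split(":")
        shapes[int(index)] = [int(d) for d in dims.split("x")]
    return shapes


def parse_bounds(spec):
    """Parse 'int64:-10,float32:-0.2' into {'int64': -10.0, 'float32': -0.2}."""
    bounds = {}
    for item in spec.split(","):
        dtype, value = item.split(":")
        bounds[dtype.strip()] = float(value)
    return bounds


def random_input(shape, dtype, lower, upper, rng=random):
    """Return prod(shape) values drawn uniformly from [lower, upper] as a flat list."""
    count = math.prod(shape)
    if dtype == "bool":
        # Bounds are ignored for bool; each value is True with probability 0.5.
        return [rng.random() < 0.5 for _ in range(count)]
    if dtype.startswith(("int", "uint")):
        # randint includes both endpoints.
        return [rng.randint(int(lower), int(upper)) for _ in range(count)]
    return [rng.uniform(lower, upper) for _ in range(count)]
```

Passing rng=random.Random(seed) makes the generated inputs reproducible across runs, which helps when comparing two builds on identical data.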

Helper script to compare a model under two different compile options

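Built on top of utils/RunONNXModel.py, the utils/checkONNXModel.py script runs a given model twice, under two different sets of compile options, and compares the results. This makes it easy to test a new option by comparing a safe version of the compiler (e.g. -O0 or -O3) against a more aggressive one (e.g. -O3 or -O3 -march=x86-64). Specify the compile options with the --ref-compile-args and --test-compile-args flags, the model with the --model flag, and, if the model has dynamic-shape inputs, their shapes with --shape-info. All options are listed under the --help flag.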
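When such comparisons are scripted, the command line can be assembled programmatically. This sketch only builds the argument list from the flags named above; the function name, the script path relative to the onnx-mlir source root, and the assumption that checkONNXModel.py parses its flags with argparse (as RunONNXModel.py evidently does) are all hypothetical.

```python
import shlex
import sys


def check_model_command(model, ref_args, test_args, shape_info=None,
                        script="utils/checkONNXModel.py"):
    """Build the argv for one checkONNXModel.py run.

    The '--flag=value' form is used on purpose: argparse would reject a
    separate value such as '-O0' because it looks like an option.
    """
    cmd = [sys.executable, script,
           f"--model={model}",
           f"--ref-compile-args={ref_args}",
           f"--test-compile-args={test_args}"]
    if shape_info:
        cmd.append(f"--shape-info={shape_info}")
    return cmd


cmd = check_model_command("model.onnx", "-O0", "-O3 -march=x86-64",
                          shape_info="0:1x10x20")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Keeping each option string as a single argv element lets multi-word values such as "-O3 -march=x86-64" reach the script intact without shell quoting.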

Debugging the code generated for an operator

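If you know, or suspect, that a particular ONNX MLIR operator produces an incorrect result, a couple of helpful Krnl operators are provided to narrow down the problem. They print, at runtime, either the value of a tensor or a value of a primitive data type.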

To print the value of a tensor at a particular program point, inject the following code (where X is the tensor to be printed):

create.krnl.printTensor("Tensor X: ", X);

Note: currently, the content of a tensor is printed only when the tensor's rank is less than four.

To print a message followed by a single value, inject the following code (where val is the value to be printed and valType is its type):

create.krnl.printf("inputElem: ", val, valType);

Finding memory errors

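If you know, or suspect, that an inference executable compiled by onnx-mlir has memory-allocation issues, the valgrind framework or the mtrace memory tool can help with debugging. These tools trace the memory allocation and deallocation APIs and can detect memory problems such as memory leaks.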
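For the valgrind route, a small wrapper makes the check scriptable. This is a minimal sketch assuming valgrind is installed and on PATH; the executable path is hypothetical. --leak-check=full makes Memcheck report each leaked block in detail, and --error-exitcode=1 makes valgrind exit with status 1 when it reports errors (with full leak checking, definite and possible leaks count as errors by default).

```python
import shutil
import subprocess


def valgrind_command(executable, args=()):
    """Build a Memcheck command line for an inference executable."""
    return ["valgrind",
            "--leak-check=full",    # report every leaked block in detail
            "--error-exitcode=1",   # exit with status 1 if any error is reported
            executable, *args]


def run_under_valgrind(executable, args=()):
    """Run the executable under valgrind and return True if it exited with 0.

    A non-zero exit status from the program itself also yields False.
    """
    if shutil.which("valgrind") is None:
        raise FileNotFoundError("valgrind is not installed or not on PATH")
    result = subprocess.run(valgrind_command(executable, args))
    return result.returncode == 0
```

For example, run_under_valgrind("./model_driver") runs a hypothetical driver binary; valgrind's own report goes to stderr and indicates where each leaked or invalid block was allocated.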

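Problems related to memory access, however, and especially buffer overruns, are notoriously difficult to debug, because the runtime error occurs away from the point that contains the bug. The "Electric Fence library" can be used to debug these problems. It helps detect two common programming errors: software that overruns the boundaries of a malloc() allocation, and software that touches an allocation that has already been released by free(). Unlike other memory debuggers, Electric Fence detects read accesses as well as writes, and it pinpoints the exact instruction that causes the error.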

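Because the Electric Fence library is not officially supported by RedHat, you need to download, build, and install it from source yourself. After installing it, link the library with the "-lefence" option when building the inference executable. Then simply run the executable: it will hit a runtime error and stop at the location causing the memory-access problem. You can identify that location with a debugger or with the debug-print functions described in the previous section.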
