2008-01-04

Optimizing a 90-Degree Counterclockwise Rotation of an N×N Matrix





Hmm, just like last time, my optimization experiments never turn out the way I expect.

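Modern CPUs are pretty much chaos inside, and apparently even the professionals can't tell what is actually happening in there,
so the only option left is to run experiment after experiment, lol.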


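Still, I'm glad I built a driver back in the first assignment..
Writing the driver this time took no time at all. The effort I put in back then really paid off.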
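
For reference, a timing driver for this kind of experiment could look roughly like the sketch below. It is not the driver from the assignment: the names `rotate_fn`, `rotate_naive`, and `time_rotate` are made up for illustration, and it assumes a row-major `int` matrix in which a left rotation moves the element at (i, j) to (n-1-j, i).

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature shared by every rotation variant under test. */
typedef void (*rotate_fn)(int n, const int *src, int *dst);

/* Baseline: rotate 90 degrees counterclockwise, (i, j) -> (n-1-j, i). */
void rotate_naive(int n, const int *src, int *dst)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[(n - 1 - j) * n + i] = src[i * n + j];
}

/* Keeps the compiler from discarding the results as unused. */
static volatile int sink;

/* Calls fn `reps` times on an n x n matrix and returns the CPU seconds
   per call, or -1.0 if memory allocation fails. */
double time_rotate(rotate_fn fn, int n, int reps)
{
    assert(n > 0 && reps > 0);
    size_t cells = (size_t)n * (size_t)n;
    int *src = malloc(cells * sizeof *src);
    int *dst = malloc(cells * sizeof *dst);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t k = 0; k < cells; k++)
        src[k] = (int)k;

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(n, src, dst);
    clock_t end = clock();

    sink = dst[cells - 1];
    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Something like `time_rotate(rotate_naive, 1024, 20)` gives a baseline to compare every other variant against. `clock()` is coarse, so small matrices need a large `reps` before the numbers mean anything.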



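As for the optimization itself, I've finished a function that splits the N×N matrix into blocks, rotates each block 90 degrees, and assembles the results.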

Splitting into blocks of roughly 8×8 to 64×64 seems to make them fit nicely into one of the caches.
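
The blocking idea can be sketched as below. This is only an illustration, not the function from the assignment: `rotate_blocked` and its `block` parameter are hypothetical names, the matrix is assumed to be row-major `int`, and rotating and assembling are folded into a single tiled loop instead of being separate steps.

```c
#include <assert.h>

/* dst = src rotated 90 degrees counterclockwise (both n x n, row-major).
   The element at (i, j) moves to (n-1-j, i). The matrix is walked in
   block x block tiles so that each tile's reads and writes stay within
   a small region of memory. */
void rotate_blocked(int n, int block, const int *src, int *dst)
{
    assert(n > 0 && block > 0);
    for (int ii = 0; ii < n; ii += block) {
        /* Clamp the tile at the matrix edge when n % block != 0. */
        int i_end = (ii + block < n) ? ii + block : n;
        for (int jj = 0; jj < n; jj += block) {
            int j_end = (jj + block < n) ? jj + block : n;
            for (int i = ii; i < i_end; i++)
                for (int j = jj; j < j_end; j++)
                    dst[(n - 1 - j) * n + i] = src[i * n + j];
        }
    }
}
```

Passing a `block` between 8 and 64 matches the range mentioned above. Which value actually wins depends on the machine, so it has to be measured.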

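Other things to try for optimization:

・Keep variables in local scope
・Take the helper function apart and write its body directly at the call site
・Try gcc-specific options

That's about it, I guess. For now, I'll at least experiment with these.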
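
The first two items in the list could look something like the sketch below. `rotate_locals` is a hypothetical name, and the code assumes a row-major `int` matrix. The index calculation that might otherwise live in a helper function is written out directly, and the running offsets are kept in local variables so the inner loop only adds and subtracts.

```c
#include <assert.h>

/* Rotate 90 degrees counterclockwise: (i, j) -> (n-1-j, i).
   No helper calls; all loop state lives in locals. */
void rotate_locals(int n, const int *src, int *dst)
{
    assert(n > 0);
    const int last_row = (n - 1) * n;   /* offset of dst's last row */
    for (int i = 0; i < n; i++) {
        int s = i * n;                  /* offset of src(i, 0)   */
        int d = last_row + i;           /* offset of dst(n-1, i) */
        for (int j = 0; j < n; j++) {
            dst[d] = src[s];
            s += 1;                     /* next column in src    */
            d -= n;                     /* one row up in dst     */
        }
    }
}
```

With optimization turned on, the compiler may well produce the same machine code for this and for the plain indexed version, so this is also something to measure, not assume.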

Assembly code... well, I'd like to try it, but only after I finish the assignment for my programming language theory course.


In the end, winter break might as well not have existed.


Notes below @ GCC optimization options (see "man gcc" in Cygwin)

Options That Control Optimization

These options control various sorts of optimizations.


Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results.
*compilation = compiling (translating source code into an executable)

Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.
*assign = to allocate, to designate

Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.
*expense = cost (a high one)

The compiler performs optimization based on the knowledge it has of the program.

Using the -funit-at-a-time flag will allow the compiler to consider information gained from later functions in the file when compiling a function.

Compiling multiple files at once to a single output file (and using -funit-at-a-time) will allow the compiler to use information gained from all of the files when compiling each of them.
*multiple = more than one

Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed.

-O
-O1 Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

-O turns on the following optimization flags: -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers

-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.


-O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff.
*tradeoff = something given up in exchange for something else

The compiler does not perform loop unrolling or function inlining when you specify -O2.
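*inlining = expanding a function's body at the call site (the dictionary sense of in-line, "arranged in a row", does not fit here), specify = to designate, to state explicitly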

As compared to -O, this option increases both compilation time and the performance of the generated code.

-O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags: -fforce-mem -foptimize-sibling-calls -fstrength-reduce -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop -frerun-loop-opt -fgcse -fgcse-lm -fgcse-sm -fgcse-las -fdelete-null-pointer-checks -fexpensive-optimizations -fregmove -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions -fstrict-aliasing -funit-at-a-time -falign-functions -falign-jumps -falign-loops -falign-labels -fcrossjumping

Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.
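
Since the -O2 entry says the compiler does not unroll loops at that level, here is what unrolling by hand looks like, as a minimal sketch. `sum_unrolled4` is a made-up example and has nothing to do with the rotation code. The loop body is repeated four times per iteration, and a cleanup loop handles the leftover elements.

```c
#include <assert.h>
#include <stddef.h>

/* Sums len ints with the loop unrolled by a factor of 4. */
long sum_unrolled4(const int *a, size_t len)
{
    assert(a != NULL || len == 0);
    long total = 0;
    size_t i = 0;
    /* Main loop: four elements per iteration, fewer loop tests. */
    for (; i + 4 <= len; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    /* Cleanup loop: the remaining 0 to 3 elements. */
    for (; i < len; i++)
        total += a[i];
    return total;
}
```

Whether this beats the plain loop at -O2 is exactly the kind of thing that needs an experiment.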

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -fweb and -frename-registers options.
*specified = designated, prescribed, particular; having such-and-such a specification

-O0 Do not optimize. This is the default.

-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

-Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -fprefetch-loop-arrays

If you use multiple -O options, with or without level numbers, the
last such option is the one that is effective.
