Trying Out TensorFlow Benchmarks, Part 2

I bought a new laptop, so I decided to run the TensorFlow benchmarks again.

Environment

OS: Windows 10

CPU: Core i7-9750H

GPU: GeForce GTX 1660 Ti

I installed TensorFlow by following the page below.

Note that I used CUDA 10.0 rather than 10.1.

https://qiita.com/milkchocolate/items/cdedd61a64862a65b84a
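Before running the benchmarks, it is worth confirming that TensorFlow can actually see the GPU. Below is a minimal sketch, assuming the TensorFlow 1.x API that tf_cnn_benchmarks uses:

# Minimal check that TensorFlow (1.x assumed) can see the GTX 1660 Ti.
from tensorflow.python.client import device_lib

# Prints each visible device; the GPU entry should report compute capability 7.5.
for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)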


AlexNet

python tf_cnn_benchmarks.py --batch_size=512 --model=alexnet --data_format=NHWC

2019-08-15 18:18:48.333229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-08-15 18:18:48.338750: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-15 18:18:48.344278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-15 18:18:48.349214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-15 18:18:48.353082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-15 18:18:48.357386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-15 18:18:48.360002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4637 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
I0815 18:18:48.579124 18148 session_manager.py:500] Running local_init_op.
I0815 18:18:48.595082 18148 session_manager.py:502] Done running local_init_op.
Running warm up
2019-08-15 18:18:54.839782: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-08-15 18:18:54.847488: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 1475.1 +/- 0.0 (jitter = 0.0) 7.397
10 images/sec: 1471.6 +/- 0.7 (jitter = 0.2) 7.396
20 images/sec: 1470.3 +/- 0.6 (jitter = 0.2) 7.397
30 images/sec: 1469.7 +/- 0.5 (jitter = 3.0) 7.397
40 images/sec: 1469.0 +/- 0.5 (jitter = 4.0) 7.397
50 images/sec: 1468.9 +/- 0.4 (jitter = 3.3) 7.397
60 images/sec: 1468.8 +/- 0.4 (jitter = 4.2) 7.397
70 images/sec: 1468.6 +/- 0.3 (jitter = 2.7) 7.397
80 images/sec: 1468.3 +/- 0.3 (jitter = 2.7) 7.397
90 images/sec: 1468.0 +/- 0.3 (jitter = 2.7) 7.397
100 images/sec: 1467.8 +/- 0.3 (jitter = 1.7) 7.397
----------------------------------------------------------------
total images/sec: 1467.68
----------------------------------------------------------------

That is roughly 30 times faster than my previous measurement.
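For keeping track of runs like this, a small script can pull the final throughput figure out of a saved log. A rough sketch, assuming the benchmark output was redirected to a file (alexnet.log is a hypothetical name):

# Rough sketch: extract "total images/sec" from a saved tf_cnn_benchmarks log.
import re

with open("alexnet.log") as f:          # hypothetical file holding the output above
    for line in f:
        match = re.search(r"total images/sec:\s*([\d.]+)", line)
        if match:
            print(float(match.group(1)))  # 1467.68 for the AlexNet run above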


ResNet50

Next, let's try ResNet50.

python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --data_format=NHWC

Batch size 32: total images/sec: 116.25

Batch size 64: Error (the run failed, most likely because it ran out of GPU memory)
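Rather than rerunning the command by hand for each batch size, a loop like the sketch below could drive tf_cnn_benchmarks.py and record which sizes complete. This assumes the script is in the current directory and Python 3.7+ for subprocess.run with capture_output.

# Sketch: sweep batch sizes and note which runs complete (assumes tf_cnn_benchmarks.py is local).
import subprocess

for batch_size in (32, 64, 128):
    cmd = [
        "python", "tf_cnn_benchmarks.py",
        "--batch_size={}".format(batch_size),
        "--model=resnet50",
        "--data_format=NHWC",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "ok" if result.returncode == 0 else "failed (likely out of GPU memory)"
    print("batch_size={}: {}".format(batch_size, status))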


VGG16

Next, let's try VGG16.

python tf_cnn_benchmarks.py --batch_size=32 --model=vgg16 --data_format=NHWC

Batch size 32: total images/sec: 71.01

Batch size 64: total images/sec: 46.59

Batch size 128: Error
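Incidentally, tf_cnn_benchmarks also accepts --data_format=NCHW, which is generally the faster layout for cuDNN on NVIDIA GPUs. I have not measured it here, but the same run would look like this:

python tf_cnn_benchmarks.py --batch_size=32 --model=vgg16 --data_format=NCHW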


ResNet152

Next, let's try ResNet152.

python tf_cnn_benchmarks.py --batch_size=16 --model=resnet152 --data_format=NHWC

Batch size 16: total images/sec: 43.57

Batch size 24: total images/sec: 44.48

Batch size 32: Error


Inception4

Finally, let's try Inception v4.

python tf_cnn_benchmarks.py --batch_size=16 --model=inception4 --data_format=NHWC

Batch size 16: total images/sec: 34.55

Batch size 24: total images/sec: 36.14

Batch size 32: Error
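For a rough sense of what these throughput figures mean, the small calculation below converts the ResNet50 number into time per epoch, assuming the usual ImageNet-1k training set of about 1.28 million images:

# Back-of-the-envelope: time per ImageNet-1k epoch at the measured ResNet50 rate.
images_per_epoch = 1_281_167            # assumed ImageNet-1k training-set size
images_per_sec = 116.25                 # ResNet50, batch size 32, measured above
hours = images_per_epoch / images_per_sec / 3600
print("~{:.1f} hours per epoch".format(hours))  # ~3.1 hours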


That's all for now.