Using GPU in TensorFlow Model

This tutorial explains how to increase our computational workspace by making room for TensorFlow GPU.

By Rinu Gour · Mar. 11, 19 · Tutorial

In our last TensorFlow tutorial, we studied embeddings in TensorFlow. Today, we will look at how to expand our computational capacity by running TensorFlow on GPUs. We will cover device placement logging and manual device placement, discuss optimizing GPU memory, and then see how to use a single GPU in a multi-GPU system and how to use multiple GPUs in TensorFlow.

Let's begin!


GPU in TensorFlow

A typical system may comprise multiple devices for computation. TensorFlow supports both CPUs and GPUs, and it identifies each device by a string. For example:

  • If you have a CPU, it might be addressed as “/cpu:0”.
  • TensorFlow GPU strings have an index starting from zero. Therefore, to specify the first GPU, you should write “/device:GPU:0”.
  • Similarly, the second GPU is “/device:GPU:1”.
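Because the naming scheme is regular, device strings can be built and read programmatically. The sketch below uses two hypothetical helpers, `device_name` and `parse_device_name`; neither is part of TensorFlow, and the code assumes only the `/device:TYPE:index` and `/cpu:0` forms shown in the list above.

```python
def device_name(device_type, index):
    """Build a device string such as '/device:GPU:0' (hypothetical helper)."""
    if index < 0:
        raise ValueError("device index must be zero or greater")
    return "/device:{}:{}".format(device_type.upper(), index)


def parse_device_name(name):
    """Split the trailing 'TYPE:index' part of a device string into a tuple."""
    # Works for short names ('/cpu:0') and for the longer names that appear
    # in placement logs ('/job:localhost/replica:0/task:0/device:GPU:0').
    last = name.rstrip("/").split("/")[-1]
    parts = last.split(":")
    device_type, index = parts[-2], parts[-1]
    return device_type.upper(), int(index)


first_gpu = device_name("gpu", 0)
second_gpu = device_name("gpu", 1)
print(first_gpu, second_gpu)
print(parse_device_name("/cpu:0"))
print(parse_device_name("/job:localhost/replica:0/task:0/device:GPU:0"))
```

Indices start at zero, so the second GPU is index 1.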

By default, if your system has both a CPU and a GPU, TensorFlow gives priority to the GPU for any operation that has a GPU implementation.

Device Placement Logging

You can find out which devices handle particular operations by creating a session with the log_device_placement configuration option set to True.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Graph creation.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session that logs the device each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))

The output of TensorFlow GPU device placement logging is shown below:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
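To check the printed matrix by hand, the same product can be computed in plain Python. This is only a verification sketch: the `matmul` function here is a hypothetical stand-in written for this example, not TensorFlow's implementation.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows (plain Python, no TensorFlow)."""
    inner = len(right)
    if any(len(row) != inner for row in left):
        raise ValueError("inner dimensions do not match")
    cols = len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(inner)) for j in range(cols)]
        for row in left
    ]


# The same values as the constants a (shape [2, 3]) and b (shape [3, 2]).
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22, which matches the logged output.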

Manual Device Placement

At times, you may want to choose which device runs an operation. You can do this by creating a context with tf.device that names the device, either a CPU or a GPU, that should do the computation, as shown below.

import tensorflow as tf

# Graph creation: pin the two constants to the CPU.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# No device is specified for the matmul, so TensorFlow picks one.
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))

The code above pins the constants a and b to cpu:0. Because the matmul operation has no explicit device, TensorFlow chooses a GPU by default if one is available and copies the tensors between devices as required.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
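When a graph has many operations, reading placement logs by eye gets tedious. As a rough sketch, the hypothetical helper `parse_placements` below (not a TensorFlow function) turns log lines of the form shown above into a dictionary, assuming each entry looks like `op_name: /job:...`.

```python
def parse_placements(log_text):
    """Map each operation name to the device named on its log line."""
    placements = {}
    for line in log_text.splitlines():
        name, sep, device = line.partition(": /job:")
        # Skip lines that are not 'op: /job:...' entries, and skip the
        # device-mapping line, whose text continues with '->'.
        if not sep or "->" in line:
            continue
        placements[name.strip()] = "/job:" + device.strip()
    return placements


log = """\
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""
placements = parse_placements(log)
on_gpu = sorted(op for op, dev in placements.items() if "GPU" in dev)
print(on_gpu)  # ['MatMul']
```

This confirms the behavior described above: the constants stay on the CPU and only MatMul lands on the GPU.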

Optimizing TensorFlow GPU Memory

By default, TensorFlow maps nearly all of the memory of every GPU that is visible to the process. It does this to reduce memory fragmentation and so use the relatively scarce GPU memory more efficiently.
In some cases, it is preferable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two session configuration options to control this, described below:

The first option, allow_growth, allocates only as much GPU memory as runtime allocations require. It starts by allocating very little memory and, as sessions run and need more, extends the GPU memory region used by the process. The memory is not released, since releasing it can lead to worse fragmentation. Set the option through ConfigProto:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
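The grow-but-never-shrink behavior can be pictured with a toy model. The `GrowOnlyPool` class below is purely illustrative and hypothetical: it is not TensorFlow's allocator and tracks only two byte counters, under the assumption that freed memory stays reserved for reuse.

```python
class GrowOnlyPool:
    """Toy model of a memory region that grows on demand and never shrinks."""

    def __init__(self):
        self.reserved = 0   # bytes currently held by the process
        self.in_use = 0     # bytes currently handed out to tensors

    def allocate(self, size):
        self.in_use += size
        # Extend the reserved region only when demand exceeds it.
        if self.in_use > self.reserved:
            self.reserved = self.in_use

    def free(self, size):
        # Freed bytes become reusable, but the reserved region is kept.
        self.in_use = max(0, self.in_use - size)


pool = GrowOnlyPool()
pool.allocate(100)
pool.allocate(50)
pool.free(120)
pool.allocate(60)
print(pool.reserved, pool.in_use)  # 150 90
```

The reserved size peaks at 150 and stays there even after most of it is freed, which is the trade-off allow_growth makes to avoid fragmentation.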

The second option, per_process_gpu_memory_fraction, sets the fraction of the total memory that should be allocated on each visible GPU. The example below tells TensorFlow to allocate 40 percent of each GPU's memory:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This option is useful when you want a hard upper bound on the GPU memory available to the process, for instance when you already know the memory needs of the computation and are sure that they will not change during processing.
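To get a feel for what a fraction means in absolute terms, the arithmetic can be sketched as follows. The helper `memory_budget_bytes` is hypothetical, and the 12 GiB card size is an assumed figure for illustration, not a value taken from this tutorial.

```python
def memory_budget_bytes(total_bytes, fraction):
    """Return the number of bytes that a given fraction of GPU memory covers."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(total_bytes * fraction)


GIB = 1024 ** 3
total = 12 * GIB  # hypothetical card size, not taken from the article
budget = memory_budget_bytes(total, 0.4)
print("{:.1f} GiB of {:.0f} GiB".format(budget / GIB, total / GIB))
```

If the computation needs more than this budget, it will run out of memory even though the card has free space, so the fraction should be chosen with the workload in mind.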

Single GPU in Multi-GPU System

In a system with multiple GPUs, TensorFlow selects the GPU with the lowest ID by default. If you want to run on a different GPU, you need to specify it explicitly:

import tensorflow as tf

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the GPU you specify does not exist, TensorFlow raises an InvalidArgumentError, as shown below:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/device:GPU:2"]()]]

If you would like TensorFlow to automatically choose an existing and supported device when the specified one does not exist, set allow_soft_placement to True in the configuration option when the session is created, as illustrated by the code below.

import tensorflow as tf

with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Fall back to an available device if '/device:GPU:2' does not exist.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Running the operation.
print(sess.run(c))
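The difference between strict and soft placement can be sketched as a small decision function. The `resolve_device` helper below is hypothetical and deliberately simplified: it assumes the fallback is the first device in the list, whereas TensorFlow's actual choice depends on which devices support the operation.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an operation would run on in this simplified model."""
    if requested in available:
        return requested
    if allow_soft_placement and available:
        # Assumption: fall back to the first listed device. TensorFlow's own
        # choice depends on which devices support the operation.
        return available[0]
    raise ValueError(
        "Could not satisfy explicit device specification {!r}".format(requested)
    )


available = ["/device:GPU:0", "/device:GPU:1", "/cpu:0"]
print(resolve_device("/device:GPU:1", available))
print(resolve_device("/device:GPU:2", available, allow_soft_placement=True))
```

Without soft placement, the second call would raise an error, mirroring the InvalidArgumentError shown earlier.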

Using Multiple GPU in TensorFlow

To run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. Let's see an example:

import tensorflow as tf

# Build one tower per GPU and collect each tower's result.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
# Combine the tower results on the CPU.
with tf.device('/cpu:0'):
  total = tf.add_n(c)  # named total to avoid shadowing the built-in sum
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operations.
print(sess.run(total))

On a machine with four GPUs, the output is as follows:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
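The final matrix is simply the element-wise sum of the two tower results. As a plain-Python check, the sketch below uses a hypothetical `add_n` helper written for this example (not TensorFlow's op) and takes each tower's product, [[22, 28], [49, 64]], as given.

```python
def add_n(matrices):
    """Element-wise sum of equally shaped matrices given as lists of rows."""
    total = [[0.0] * len(matrices[0][0]) for _ in matrices[0]]
    for matrix in matrices:
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                total[i][j] += value
    return total


# Each tower multiplies the same a and b, so both produce the same product.
tower_results = {
    "/device:GPU:2": [[22.0, 28.0], [49.0, 64.0]],
    "/device:GPU:3": [[22.0, 28.0], [49.0, 64.0]],
}
combined = add_n(list(tower_results.values()))
print(combined)  # [[44.0, 56.0], [98.0, 128.0]]
```

In a real multi-tower model, each tower would typically process a different slice of the data, and the combining step would aggregate losses or gradients rather than identical products.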

You can try this multi-GPU approach on a simple dataset such as CIFAR10 to experiment and to understand how to work with GPUs.

Conclusion

In this tutorial, we looked at using GPUs with TensorFlow. A GPU is an array of parallel processors that work together on large computations, in contrast to a CPU. We covered how TensorFlow names and selects devices, how to log and manually control device placement, how to change the default GPU memory configuration to suit your needs, and how to use one or several GPUs in a multi-GPU system. If you have any questions or thoughts, feel free to comment below.


Published at DZone with permission of Rinu Gour. See the original article here.

Opinions expressed by DZone contributors are their own.
