Training a Keras network on multiple GPUs simultaneously
References
Official documentation: multi_gpu_model (https://keras.io/utils/#multi_gpu_model) and Google.
Misconception
Keras now supports training a network on several GPUs at the same time, and it is very easy to do, but it is not achieved with the line of code below.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
If you monitor GPU usage (nvidia-smi -l 1), you will see that although memory is allocated on several cards, essentially only one GPU is doing any computation while the others sit there occupied but idle. In other words, if your machine has several graphics cards, Keras will by default grab every GPU it can detect, with or without the line above, but it will not actually use them all for training.
That line of code is meant for the case where you only need one GPU: it hides all the other GPUs in your machine from Keras. Suppose you have three graphics cards in total, labeled 0, 1, and 2, and, so as not to interfere with other users, you only want to use one of them, say GPU 1. Then:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
Monitor GPU usage again (nvidia-smi -l 1) and you will see that only that one card is occupied while the rest stay idle. This is the misconception about Keras and multiple graphics cards: the line above does not make Keras use multiple GPUs.
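As an aside, if restricting Keras to a single GPU really is what you want, the environment variable has to be set before TensorFlow is imported; a minimal sketch (assuming the TensorFlow backend):

import os
# Hide every GPU except card 1 from TensorFlow/Keras. This must happen
# before TensorFlow (or Keras) is imported, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import keras  # imported only after CUDA_VISIBLE_DEVICES has been set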
Purpose
Why train with multiple GPUs at the same time?
A single graphics card has too little memory -> the batch size cannot be made large; sometimes even batch_size=1 runs out of memory (OUT OF MEMORY).
From my experience running deep networks, a larger batch_size is generally better: each backpropagation step then updates the weights based on more samples, so the network does not overfit to a different corner of the data at every iteration (see Don't Decay the Learning Rate, Increase the Batch Size). Of course, I have also read papers saying that it should not be set too large either; the reason is unclear to me, and I never got the chance to try it anyway. My suggestion is to keep batch_size somewhere in the range of 64~256, which should not cause any problems.
But as networks keep getting deeper, their demand for GPU memory keeps growing as well. For many newcomers, the biggest obstacle is often not the code itself: the code copied from GitHub simply will not run on their own underpowered GPU, and all they can do is shrink the batch_size, ending up unable to reproduce the reported results.
There are two solutions: either buy a single monster GPU with huge memory, or buy several ordinary GPUs and use them together.
The first option does not really work: even the best NVIDIA graphics cards currently only offer a dozen or so gigabytes of memory, the network runs out of room as soon as it gets deep, and buying a top-end card is not cost-effective. So learning to use multiple GPUs under Keras is the more practical choice.
Implementation
Very simple.
import keras
import tensorflow as tf
from keras.optimizers import Adam
from model import unet

G = 3  # use 3 GPUs simultaneously

with tf.device("/gpu:0"):
    M = unet(input_rows, input_cols, 1)

model = keras.utils.training_utils.multi_gpu_model(M, gpus=G)
model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size * G, epochs=nb_epoch,
          verbose=0, shuffle=True, validation_data=(X_valid, y_valid))
model.save_weights('/path/to/save/model.h5')
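Two notes on this snippet. First, multi_gpu_model splits each incoming batch into G sub-batches, one per GPU, which is why batch_size * G is passed to fit(): each card still processes batch_size samples. Second, depending on your Keras version the wrapper may be exported from a different module; a small fallback sketch (module paths as in recent Keras releases):

# depending on the Keras version, multi_gpu_model is exported either from
# keras.utils or from keras.utils.training_utils
try:
    from keras.utils import multi_gpu_model
except ImportError:
    from keras.utils.training_utils import multi_gpu_model

model = multi_gpu_model(M, gpus=G)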
Issues
3.1 Compile the model
For an ordinary network structure there is no problem; just compile it as above (model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['accuracy'])). However, for a multi-task network such as Faster-RCNN there are several output branches and therefore several losses, which are normally referred to by the layer names given when the network was defined; at compile time you simply point each loss at the layer name of the corresponding branch, like this:
model.compile(optimizer=optimizer,
              loss={'main_output': jaccard_distance_loss, 'aux_output': 'binary_crossentropy'},
              metrics={'main_output': jaccard_distance_loss, 'aux_output': 'acc'},
              loss_weights={'main_output': 1., 'aux_output': 0.5})
Here main_output and aux_output are the layer names you defined yourself. However, once the model is wrapped with keras.utils.training_utils.multi_gpu_model(), those names are automatically replaced by defaults such as concatenate_1, concatenate_2, and so on. So you first need to call model.summary() to print the network structure and work out which output corresponds to which branch, and then recompile the network against the new names.
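A small sketch for inspecting the renamed outputs (output_names is a standard attribute of a Keras Model):

# print the wrapped model's structure and its auto-generated output names,
# e.g. ['concatenate_1', 'concatenate_2'], to see which branch is which
model.summary()
print(model.output_names)

Once you know which auto-generated name belongs to which branch, the recompilation looks like this: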
from keras.optimizers import Adam, RMSprop, SGD

model.compile(optimizer=RMSprop(lr=0.045, rho=0.9, epsilon=1.0),
              loss={'concatenate_1': jaccard_distance_loss, 'concatenate_2': 'binary_crossentropy'},
              metrics={'concatenate_1': jaccard_distance_loss, 'concatenate_2': 'acc'},
              loss_weights={'concatenate_1': 1., 'concatenate_2': 0.5})
3.2 Save the model
Saving a model trained on multiple GPUs is a problem that Keras has not solved: even model.save() raises an error,
TypeError: can't pickle module objects
or
RuntimeError: Unable to create attribute (object header message is too large)
The situation is this: https://keras.io/utils/#multi_gpu_model clearly states that the returned model can be used like a normal model, yet it cannot be saved, which is rather absurd. You cannot even resume training, simply because you cannot save the model previously trained with multiple GPUs; and if you fall back to training on a single GPU, the other GPUs you invested in go to waste. Hopefully the developers will look into this bug as soon as possible.
Normally Keras provides a callback that automatically saves the best network (keras.callbacks.ModelCheckpoint()), but internally it calls model.save(), so it no longer works here. You need to write your own callback, say CustomModelCheckpoint(), to save the best model:
import numpy as np
import keras

class CustomModelCheckpoint(keras.callbacks.Callback):
    def __init__(self, model, path):
        # keep our own reference: Keras overwrites self.model when fit() starts
        self.model_to_save = model
        self.path = path
        self.best_loss = np.inf

    def on_epoch_end(self, epoch, logs=None):
        val_loss = logs['val_loss']
        if val_loss < self.best_loss:
            print("\nValidation loss decreased from {} to {}, saving model".format(self.best_loss, val_loss))
            # save_weights() still works where model.save() fails
            self.model_to_save.save_weights(self.path, overwrite=True)
            self.best_loss = val_loss

model.fit(X_train, y_train, batch_size=batch_size * G, epochs=nb_epoch,
          verbose=0, shuffle=True, validation_data=(X_valid, y_valid),
          callbacks=[CustomModelCheckpoint(model, '/path/to/save/model.h5')])
Even so, if the model is still too large, you will run into the error below, and you have to save the weights in npy format instead of hdf5:
RuntimeError: Unable to create attribute (Object header message is too large)
# save the model weights in npy format instead (inside the callback)
weight = self.model_to_save.get_weights()
np.save(self.path + '.npy', weight)
# load the model weights back from the npy file
weight = np.load(load_path)  # newer numpy versions may require allow_pickle=True here
model.set_weights(weight)
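Another workaround worth knowing, as a sketch (it assumes M is the original single-GPU template passed to multi_gpu_model earlier): the wrapper shares its weights with the template, so you can simply save the template instead, and the resulting file loads into an ordinary single-GPU model without any multi-GPU wrapping. The file name below is just a placeholder.

# the template model M shares weights with the multi-GPU wrapper,
# so saving M produces an ordinary single-GPU weight file
M.save_weights('/path/to/save/model_template.h5')

# later, load it into a plain single-GPU model (no multi_gpu_model needed)
M_new = unet(input_rows, input_cols, 1)
M_new.load_weights('/path/to/save/model_template.h5')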
3.3 Load the model
For the same reason, reading back a weight file (.h5) that was trained with multiple graphics cards also raises an error:
ValueError: You are trying to load a weight file containing 3 layers into a model with 1 layers.
The reason is that the internals of the .h5 file are not the same as those saved from single-GPU training, so you also need to wrap the network with keras.utils.training_utils.multi_gpu_model() before loading the weights:
import keras
import tensorflow as tf
from model import unet

with tf.device("/cpu:0"):
    M = unet(input_rows, input_cols, 1)

model = keras.utils.training_utils.multi_gpu_model(M, gpus=G)
model.load_weights(load_path)
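With the weights restored, you can pick up training where you left off, for example (reusing the variable names and imports from the earlier snippets; the hyperparameters here are just placeholders):

# resume training with the reloaded multi-GPU model
model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size * G, epochs=nb_epoch,
          shuffle=True, validation_data=(X_valid, y_valid),
          callbacks=[CustomModelCheckpoint(model, '/path/to/save/model.h5')])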
After that, there are no more problems.