Keras Tutorial
model = Sequential()
model.add(Dense(512, activation='elu', input_shape=(784,)))
… selu: applies the scaled exponential linear unit. …
from keras.models import Sequential
from keras.layers import Activation
… The data can be standardized using the code below:
x_train_scaled = preprocessing.scale(x_train)
scaler = preprocessing.StandardScaler().fit(x_train)
x_test_scaled = scaler.transform(x_test)
Here, we have normalized the data. … Step 6: Train the model. Let us train the model using the fit() method:
history = model.fit(x_train_scaled, y_train, batch_size=128, epochs=500, …
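For context, a minimal runnable sketch of the workflow this snippet excerpts: standardize the inputs with scikit-learn, then build and train a small Keras model. The input width (784) and first layer follow the snippet; the stand-in data, output layer, loss, optimizer, and reduced epoch count are illustrative assumptions, not the tutorial's exact code.

import numpy as np
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense

# Stand-in data (assumed shapes): 1000 samples of 784 features, 10 classes.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Fit the scaler on the training data only, then apply it.
scaler = preprocessing.StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)

model = Sequential()
model.add(Dense(512, activation='elu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Far fewer epochs than the snippet's 500, just to keep the sketch quick.
history = model.fit(x_train_scaled, y_train, batch_size=128, epochs=5)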
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
… the presence of sufficient labeled data. With deep learning models, the performance of the model scaled well with the number of labeled examples, since the network had a large number of parameters. Thus …
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
… to the child, we would need to update the controller as well. Second, the controller needs to be scaled with the child networks. For a large child network, a large controller is required, which would invariably …
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
… compute the query, key, and value matrices for the input sequences. Then, a softmax is applied to the scaled dot product of the query and key matrices to obtain a score matrix (figure 4-16). Finally, the values …
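The computation this snippet describes can be sketched in a few lines of NumPy. This is a single-head, unbatched illustration; the shapes are assumptions, and in the book the query, key, and value matrices come from learned projections of the input sequences.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n, d) queries, K: (m, d) keys, V: (m, d_v) values.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # scaled dot product of queries and keys
    scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax -> score matrix
    return weights @ V                               # weighted sum of the values

Q, K, V = np.random.randn(4, 64), np.random.randn(6, 64), np.random.randn(6, 32)
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 32)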
动手学深度学习 v2.0 (Dive into Deep Learning)
… if all the elements of the two vectors are independent random variables with zero mean and unit variance, then their dot product has mean 0 and variance d. To ensure that the variance of the dot product stays 1 regardless of vector length, we divide the dot product by √d, giving the scaled dot-product attention scoring function a(q, k) = q⊤k/√d. (10.3.4) In practice, we usually think in terms of minibatches for efficiency, e.g., computing attention for n queries and m key-value pairs, where the queries …
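The scaling factor follows from a one-line variance computation; a sketch in LaTeX, under the snippet's assumption of independent, zero-mean, unit-variance components:

\mathbb{E}[\mathbf{q}^\top \mathbf{k}] = \sum_{i=1}^{d} \mathbb{E}[q_i]\,\mathbb{E}[k_i] = 0, \qquad
\operatorname{Var}(\mathbf{q}^\top \mathbf{k}) = \sum_{i=1}^{d} \mathbb{E}[q_i^2]\,\mathbb{E}[k_i^2] = d, \qquad
\operatorname{Var}\!\left(\frac{\mathbf{q}^\top \mathbf{k}}{\sqrt{d}}\right) = \frac{d}{d} = 1.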
5 results in total.