      + 5) mod 8 ^ b(i + 6) mod 8 ^ b(i + 7) mod 8 ^ ci
      for 0 <= i < 8, where bi is the ith bit of the byte, and ci is the
      ith bit of a byte c with the value {63} or {01100011}.

      It is possible to traverse every possible value in a Galois field
      using what is referred to as a 'generator'. There are many
      generators (128 out of 256): 3,5,6,9,11,82 to name a few. To fully
      traverse GF we iterate 255 times, multiplying by our generator
      each time.

      On each iteration we can determine the multiplicative inverse for
      the current element.

      Suppose there is an element in GF 'e'. For a given generator 'g',
      e = g^x. The multiplicative inverse of e is g^(255 - x). It turns
      out that if we use the inverse of a generator as another generator
      it will produce all of the corresponding multiplicative inverses
      at the same time. For this reason, we choose 5 as our inverse
      generator because it only requires 2 multiplies and 1 add and its
      inverse, 82, requires relatively few operations as well.

      In order to apply the affine transformation, the multiplicative
      inverse 'ei' of 'e' can be repeatedly XOR'd (4 times) with a
      bit-cycling of 'ei'. To do this 'ei' is first stored in 's' and
      'x'. Then 's' is left shifted and the high bit of 's' is made the
      low bit. The resulting value is stored in 's'. Then 'x' is XOR'd
      with 's' and stored in 'x'. On each subsequent iteration the same
      operation is performed. When 4 iterations are complete, 'x' is
      XOR'd with 'c' (0x63) and the transformed value is stored in 'x'.
      For example:

      s = 01000001
      x = 01000001

      iteration 1: s = 10000010, x ^= s
      iteration 2: s = 00000101, x ^= s
      iteration 3: s = 00001010, x ^= s
      iteration 4: s = 00010100, x ^= s
      x ^= 0x63

      This can be done with a loop where s = (s << 1) | (s >> 7). However,
      it can also be done by using a single 16-bit (in this case 32-bit)
      number 'sx'. Since XOR is an associative operation, we can set 'sx'
      to 'ei' and then XOR it with 'ei' left-shifted 1, 2, 3, and 4 times.
      The most significant bits will flow into the high 8 bit positions
      and be correctly XOR'd with one another. All that remains will be
      to cycle the high 8 bits by XOR'ing them all with the lower 8 bits
      afterwards.

      At the same time we're populating sbox and isbox we can precompute
      the multiplications we'll need to perform MixColumns() later.
    */

    // apply affine transformation
    sx = ei ^ (ei << 1) ^ (ei << 2) ^ (ei << 3) ^ (ei << 4);
    sx = (sx >> 8) ^ (sx & 255) ^ 0x63;

    // update tables
    sbox[e] = sx;
    isbox[sx] = e;

    /* Mixing columns is done using matrix multiplication. The columns
      that are to be mixed are each a single word in the current state.
      The state has Nb columns (4 columns). Therefore each column is a
      4 byte word. So to mix the columns in a single column 'c' where
      its rows are r0, r1, r2, and r3, we use the following matrix
      multiplication:

      [2 3 1 1]*[r0,c]=[r'0,c]
      [1 2 3 1] [r1,c] [r'1,c]
      [1 1 2 3] [r2,c] [r'2,c]
      [3 1 1 2] [r3,c] [r'3,c]

      r0, r1, r2, and r3 are each 1 byte of one of the words in the
      state (a column). To do matrix multiplication for each mixed
      column c' we multiply the corresponding row from the left matrix
      with the corresponding column from the right matrix. In total, we
      get 4 equations:

      r0,c' = 2*r0,c + 3*r1,c + 1*r2,c + 1*r3,c
      r1,c' = 1*r0,c + 2*r1,c + 3*r2,c + 1*r3,c
      r2,c' = 1*r0,c + 1*r1,c + 2*r2,c + 3*r3,c
      r3,c' = 3*r0,c + 1*r1,c + 1*r2,c + 2*r3,c

      As usual, the multiplication is as previously defined and the
      addition is XOR. In order to optimize mixing columns we can store
      the multiplication results in tables. If you think of the whole
      column as a word (it might help to visualize by mentally rotating
      the equations above counterclockwise by 90 degrees) then you can
      see that it would be useful to map the multiplications performed on
      each byte (r0, r1, r2, r3) onto a word as well.
      For instance, we could map 2*r0,1*r0,1*r0,3*r0 onto a word by
      storing 2*r0 in the highest 8 bits and 3*r0 in the lowest 8 bits
      (with the other two respectively in the middle). This means that
      a table can be constructed that uses r0 as an index to the word.
      We can do the same with r1, r2, and r3, creating a total of 4
      tables.

      To construct a full c', we can just look up each byte of c in
      their respective tables and XOR the results together.

      Also, to build each table we only have to calculate the word
      for 2,1,1,3 for every byte ... which we can do on each iteration
      of this loop since we will iterate over every byte. After we have
      calculated 2,1,1,3 we can get the results for the other tables
      by cycling the byte at the end to the beginning. For instance
      we can take the result of table 2,1,1,3 and produce table 3,2,1,1
      by moving the right most byte to the left most position just like
      how you can imagine the 3 moved out of 2,1,1,3 and to the front
      to produce 3,2,1,1.

      There is another optimization in that the same multiples of
      the current element we need in order to advance our generator
      to the next iteration can be reused in performing the 2,1,1,3
      calculation. We also calculate the inverse mix column tables,
      with e,9,d,b being the inverse of 2,1,1,3.

      When we're done, and we need to actually mix columns, the first
      byte of each state word should be put through mix[0] (2,1,1,3),
      the second through mix[1] (3,2,1,1) and so forth. Then they should
      be XOR'd together to produce the fully mixed column.
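
      The table trick described above can be sketched on its own as
      follows. This is a minimal illustration, not the code used by this
      module: the names xtimeByte, buildMixOnlyTables, and mixColumnWord
      are hypothetical, and, unlike the real mix tables, no S-box
      substitution is folded in, so a lookup performs plain MixColumns()
      on a single column word.

```javascript
// GF(2^8) doubling, reduced by the AES polynomial 0x11B.
function xtimeByte(b) {
  return ((b << 1) ^ (b & 0x80 ? 0x11b : 0)) & 0xff;
}

// Build four MixColumns-only tables (assumption: no S-box folded in).
function buildMixOnlyTables() {
  var t = [[], [], [], []];
  for (var b = 0; b < 256; ++b) {
    var b2 = xtimeByte(b);
    // word for 2,1,1,3: 2*b in the high byte, 3*b in the low byte
    var word = ((b2 << 24) ^ (b << 16) ^ (b << 8) ^ (b ^ b2)) >>> 0;
    for (var n = 0; n < 4; ++n) {
      t[n][b] = word;
      // cycle the right most byte to the left most position
      // ie: 2,1,1,3 becomes 3,2,1,1
      word = ((word << 24) | (word >>> 8)) >>> 0;
    }
  }
  return t;
}

// Mix one column given as a 32-bit word with r0 in the high byte.
function mixColumnWord(t, col) {
  return (t[0][col >>> 24] ^
    t[1][(col >>> 16) & 255] ^
    t[2][(col >>> 8) & 255] ^
    t[3][col & 255]) >>> 0;
}

var mixOnly = buildMixOnlyTables();
var mixed = mixColumnWord(mixOnly, 0xdb135345);
```

      With the column db 13 53 45 as input, the four lookups XOR
      together to 8e 4d a1 bc, which matches multiplying the column by
      the 2,3,1,1 matrix directly.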
    */

    // calculate mix and imix table values
    sx2 = xtime[sx];
    e2 = xtime[e];
    e4 = xtime[e2];
    e8 = xtime[e4];
    me =
      (sx2 << 24) ^ // 2
      (sx << 16) ^ // 1
      (sx << 8) ^ // 1
      (sx ^ sx2); // 3
    ime =
      (e2 ^ e4 ^ e8) << 24 ^ // E (14)
      (e ^ e8) << 16 ^ // 9
      (e ^ e4 ^ e8) << 8 ^ // D (13)
      (e ^ e2 ^ e8); // B (11)
    // produce each of the mix tables by rotating the 2,1,1,3 value
    for(var n = 0; n < 4; ++n) {
      mix[n][e] = me;
      imix[n][sx] = ime;
      // cycle the right most byte to the left most position
      // ie: 2,1,1,3 becomes 3,2,1,1
      me = me << 24 | me >>> 8;
      ime = ime << 24 | ime >>> 8;
    }

    // get next element and inverse
    if(e === 0) {
      // 1 is the inverse of 1
      e = ei = 1;
    } else {
      // e = 2e + 2*2*2*(10e) = multiply e by 82 (chosen generator)
      // ei = ei + 2*2*ei = multiply ei by 5 (inverse generator)
      e = e2 ^ xtime[xtime[xtime[e2 ^ e8]]];
      ei ^= xtime[xtime[ei]];
    }
  }
}

/**
 * Generates a key schedule using the AES key expansion algorithm.
 *
 * The AES algorithm takes the Cipher Key, K, and performs a Key Expansion
 * routine to generate a key schedule. The Key Expansion generates a total
 * of Nb*(Nr + 1) words: the algorithm requires an initial set of Nb words,
 * and each of the Nr rounds requires Nb words of key data. The resulting
 * key schedule consists of a linear array of 4-byte words, denoted [wi],
 * with i in the range 0 <= i < Nb(Nr + 1).
 *
 * KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
 * AES-128 (Nb=4, Nk=4, Nr=10)
 * AES-192 (Nb=4, Nk=6, Nr=12)
 * AES-256 (Nb=4, Nk=8, Nr=14)
 * Note: Nr=Nk+6.
 *
 * Nb is the number of columns (32-bit words) comprising the State
 * (equivalently, the number of 32-bit words in a block). For AES, Nb=4.
 *
 * @param key the key to schedule (as an array of 32-bit words).
 * @param decrypt true to modify the key schedule to decrypt, false not to.
 *
 * @return the generated key schedule.
 */
function _expandKey(key, decrypt) {
  // copy the key's words to initialize the key schedule
  var w = key.slice(0);

  /* RotWord() will rotate a word, moving the first byte to the last
     byte's position (shifting the other bytes left).

     We will be getting the value of Rcon at i / Nk. 'i' will iterate
     from Nk to (Nb * (Nr + 1)). For AES-128, Nk = 4 (4 word key),
     Nb = 4 (4 words in a block), Nr = Nk + 6 (10). Therefore 'i' will
     iterate from 4 to 44 (exclusive). Each time we iterate 4 times,
     i / Nk will increase by 1. We use a counter iNk to keep track of
     this.
   */

  // go through the rounds expanding the key
  var temp, iNk = 1;
  var Nk = w.length;
  var Nr1 = Nk + 6 + 1;
  var end = Nb * Nr1;
  for(var i = Nk; i < end; ++i) {
    temp = w[i - 1];
    if(i % Nk === 0) {
      // temp = SubWord(RotWord(temp)) ^ Rcon[i / Nk]
      temp =
        sbox[temp >>> 16 & 255] << 24 ^
        sbox[temp >>> 8 & 255] << 16 ^
        sbox[temp & 255] << 8 ^
        sbox[temp >>> 24] ^ (rcon[iNk] << 24);
      iNk++;
    } else if(Nk > 6 && (i % Nk === 4)) {
      // temp = SubWord(temp)
      temp =
        sbox[temp >>> 24] << 24 ^
        sbox[temp >>> 16 & 255] << 16 ^
        sbox[temp >>> 8 & 255] << 8 ^
        sbox[temp & 255];
    }
    w[i] = w[i - Nk] ^ temp;
  }

  /* When we are updating a cipher block we always use the code path for
     encryption whether we are decrypting or not (to shorten code and
     simplify the generation of look up tables). However, because there
     are differences in the decryption algorithm, other than just swapping
     in different look up tables, we must transform our key schedule to
     account for these changes:

     1. The decryption algorithm gets its key rounds in reverse order.
     2. The decryption algorithm adds the round key before mixing columns
        instead of afterwards.

     We don't need to modify our key schedule to handle the first case;
     we can just traverse the key schedule in reverse order when
     decrypting.

     The second case requires a little work.
     The tables we built for performing rounds will take an input and then
     perform SubBytes() and MixColumns() or, for the decrypt version,
     InvSubBytes() and InvMixColumns(). But the decrypt algorithm requires
     us to AddRoundKey() before InvMixColumns(). This means we'll need to
     apply some transformations to the round key to inverse-mix its columns
     so they'll be correct for moving AddRoundKey() to after the state has
     had its columns inverse-mixed.

     To inverse-mix the columns of the state when we're decrypting we use a
     lookup table that will apply InvSubBytes() and InvMixColumns() at the
     same time. However, the round key's bytes are not inverse-substituted
     in the decryption algorithm. To get around this problem, we can first
     substitute the bytes in the round key so that when we apply the
     transformation via the InvSubBytes()+InvMixColumns() table, it will
     undo our substitution leaving us with the original value that we want
     -- and then inverse-mix that value.

     This change will correctly alter our key schedule so that we can XOR
     each round key with our already transformed decryption state. This
     allows us to use the same code path as the encryption algorithm.

     We make one more change to the decryption key. Since the decryption
     algorithm runs in reverse from the encryption algorithm, we reverse
     the order of the round keys to avoid having to iterate over the key
     schedule backwards when running the encryption algorithm later in
     decryption mode. In addition to reversing the order of the round keys,
     we also swap each round key's 2nd and 4th rows. See the comments
     section where rounds are performed for more details about why this is
     done. These changes are done inline with the other substitution
     described above.
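
     Moving AddRoundKey() to after InvMixColumns() works because
     InvMixColumns() is linear over XOR. A minimal sketch of that one
     property follows, assuming a single column stored as an array of 4
     bytes; gmul, invMixColumn, and xorColumns are hypothetical helpers
     (not part of this module) and the sample bytes are arbitrary.

```javascript
// Multiply two bytes in GF(2^8), reduced by the AES polynomial 0x11B.
function gmul(a, b) {
  var p = 0;
  for (var i = 0; i < 8; ++i) {
    if (b & 1) {
      p ^= a;
    }
    var carry = a & 0x80;
    a = (a << 1) & 0xff;
    if (carry) {
      a ^= 0x1b;
    }
    b >>= 1;
  }
  return p;
}

// InvMixColumns() on one column given as [r0, r1, r2, r3].
function invMixColumn(col) {
  var m = [
    [0x0e, 0x0b, 0x0d, 0x09],
    [0x09, 0x0e, 0x0b, 0x0d],
    [0x0d, 0x09, 0x0e, 0x0b],
    [0x0b, 0x0d, 0x09, 0x0e]
  ];
  return m.map(function(row) {
    return gmul(row[0], col[0]) ^ gmul(row[1], col[1]) ^
      gmul(row[2], col[2]) ^ gmul(row[3], col[3]);
  });
}

// XOR two columns byte by byte (AddRoundKey() on one column).
function xorColumns(x, y) {
  return x.map(function(v, i) { return v ^ y[i]; });
}

// arbitrary sample values, chosen only for illustration
var state = [0x8e, 0x4d, 0xa1, 0xbc];
var roundKey = [0x2b, 0x7e, 0x15, 0x16];

// AddRoundKey() then InvMixColumns(), as the decryption algorithm does
var keyFirst = invMixColumn(xorColumns(state, roundKey));
// InvMixColumns() then XOR with an inverse-mixed round key
var keyLast = xorColumns(invMixColumn(state), invMixColumn(roundKey));
```

     Both orders produce the same column, which is why the key schedule
     can carry the inverse-mixed round keys instead.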
   */
  if(decrypt) {
    var tmp;
    var m0 = imix[0];
    var m1 = imix[1];
    var m2 = imix[2];
    var m3 = imix[3];
    var wnew = w.slice(0);
    end = w.length;
    for(var i = 0, wi = end - Nb; i < end; i += Nb, wi -= Nb) {
      // do not sub the first or last round key (round keys are Nb
      // words) as no column mixing is performed before they are added,
      // but do change the key order
      if(i === 0 || i === (end - Nb)) {
        wnew[i] = w[wi];
        wnew[i + 1] = w[wi + 3];
        wnew[i + 2] = w[wi + 2];
        wnew[i + 3] = w[wi + 1];
      } else {
        // substitute each round key byte because the inverse-mix
        // table will inverse-substitute it (effectively cancel the
        // substitution because round key bytes aren't sub'd in
        // decryption mode) and swap indexes 3 and 1
        for(var n = 0; n < Nb; ++n) {
          tmp = w[wi + n];
          // (3&-n) maps n = 0,1,2,3 to 0,3,2,1
          wnew[i + (3&-n)] =
            m0[sbox[tmp >>> 24]] ^
            m1[sbox[tmp >>> 16 & 255]] ^
            m2[sbox[tmp >>> 8 & 255]] ^
            m3[sbox[tmp & 255]];
        }
      }
    }
    w = wnew;
  }

  return w;
}

/**
 * Updates a single block (16 bytes) using AES. The update will either
 * encrypt or decrypt the block.
 *
 * @param w the key schedule.
 * @param input the input block (an array of 32-bit words).
 * @param output the updated output block.
 * @param decrypt true to decrypt the block, false to encrypt it.
 */
function _updateBlock(w, input, output, decrypt) {
  /*
  Cipher(byte in[4*Nb], byte out[4*Nb], word w[Nb*(Nr+1)])
  begin
    byte state[4,Nb]
    state = in
    AddRoundKey(state, w[0, Nb-1])
    for round = 1 step 1 to Nr-1
      SubBytes(state)
      ShiftRows(state)
      MixColumns(state)
      AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
    end for
    SubBytes(state)
    ShiftRows(state)
    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
    out = state
  end

  InvCipher(byte in[4*Nb], byte out[4*Nb], word w[Nb*(Nr+1)])
  begin
    byte state[4,Nb]
    state = in
    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
    for round = Nr-1 step -1 downto 1
      InvShiftRows(state)
      InvSubBytes(state)
      AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
      InvMixColumns(state)
    end for
    InvShiftRows(state)
    InvSubBytes(state)
    AddRoundKey(state, w[0, Nb-1])
    out = state
  end
  */

  // Encrypt: AddRoundKey(state, w[0, Nb-1])
  // Decrypt: AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
  var Nr = w.length / 4 - 1;
  var m0, m1, m2, m3, sub;
  if(decrypt) {
    m0 = imix[0];
    m1 = imix[1];
    m2 = imix[2];
    m3 = imix[3];
    sub = isbox;
  } else {
    m0 = mix[0];
    m1 = mix[1];
    m2 = mix[2];
    m3 = mix[3];
    sub = sbox;
  }
  var a, b, c, d, a2, b2, c2;
  a = input[0] ^ w[0];
  b = input[decrypt ? 3 : 1] ^ w[1];
  c = input[2] ^ w[2];
  d = input[decrypt ? 1 : 3] ^ w[3];
  var i = 3;

  /* In order to share code we follow the encryption algorithm when both
    encrypting and decrypting. To account for the changes required in the
    decryption algorithm, we use different lookup tables when decrypting
    and use a modified key schedule to account for the difference in the
    order of transformations applied when performing rounds. We also get
    key rounds in reverse order (relative to encryption). */
  for(var round = 1; round < Nr; ++round) {
    /* As described above, we'll be using table lookups to perform the
      column mixing. Each column is stored as a word in the state (the
      array 'input' has one column as a word at each index). In order to
      mix a column, we perform these transformations on each row in c,
      which is 1 byte in each word.
      The new column for c0 is c'0:

               m0        m1        m2        m3
      r0,c'0 = 2*r0,c0 + 3*r1,c0 + 1*r2,c0 + 1*r3,c0
      r1,c'0 = 1*r0,c0 + 2*r1,c0 + 3*r2,c0 + 1*r3,c0
      r2,c'0 = 1*r0,c0 + 1*r1,c0 + 2*r2,c0 + 3*r3,c0
      r3,c'0 = 3*r0,c0 + 1*r1,c0 + 1*r2,c0 + 2*r3,c0

      So using mix tables where c0 is a word with r0 being its upper
      8 bits and r3 being its lower 8 bits:

      m0[c0 >> 24] will yield this word: [2*r0,1*r0,1*r0,3*r0]
      ...
      m3[c0 & 255] will yield this word: [1*r3,1*r3,3*r3,2*r3]

      Therefore to mix the columns in each word in the state we
      do the following (& 255 omitted for brevity):

      c'0,r0 = m0[c0 >> 24] ^ m1[c1 >> 16] ^ m2[c2 >> 8] ^ m3[c3]
      c'0,r1 = m0[c0 >> 24] ^ m1[c1 >> 16] ^ m2[c2 >> 8] ^ m3[c3]
      c'0,r2 = m0[c0 >> 24] ^ m1[c1 >> 16] ^ m2[c2 >> 8] ^ m3[c3]
      c'0,r3 = m0[c0 >> 24] ^ m1[c1 >> 16] ^ m2[c2 >> 8] ^ m3[c3]

      However, before mixing, the algorithm requires us to perform
      ShiftRows(). The ShiftRows() transformation cyclically shifts the
      last 3 rows of the state over different offsets. The first row
      (r = 0) is not shifted.

      s'_r,c = s_r,(c + shift(r, Nb)) mod Nb
      for 0 < r < 4 and 0 <= c < Nb and
      shift(1, 4) = 1
      shift(2, 4) = 2
      shift(3, 4) = 3.

      This causes the first byte in r = 1 to be moved to the end of
      the row, the first 2 bytes in r = 2 to be moved to the end of
      the row, the first 3 bytes in r = 3 to be moved to the end of
      the row:

      r1: [c0 c1 c2 c3] => [c1 c2 c3 c0]
      r2: [c0 c1 c2 c3]    [c2 c3 c0 c1]
      r3: [c0 c1 c2 c3]    [c3 c0 c1 c2]

      We can make these substitutions inline with our column mixing to
      generate an updated set of equations to produce each word in the
      state (note the columns have changed positions):

      c0 c1 c2 c3 => c0 c1 c2 c3
      c0 c1 c2 c3    c1 c2 c3 c0  (cycled 1 byte)
      c0 c1 c2 c3    c2 c3 c0 c1  (cycled 2 bytes)
      c0 c1 c2 c3    c3 c0 c1 c2  (cycled 3 bytes)

      Therefore:

      c'0 = 2*r0,c0 + 3*r1,c1 + 1*r2,c2 + 1*r3,c3
      c'0 = 1*r0,c0 + 2*r1,c1 + 3*r2,c2 + 1*r3,c3
      c'0 = 1*r0,c0 + 1*r1,c1 + 2*r2,c2 + 3*r3,c3
      c'0 = 3*r0,c0 + 1*r1,c1 + 1*r2,c2 + 2*r3,c3

      c'1 = 2*r0,c1 + 3*r1,c2 + 1*r2,c3 + 1*r3,c0
      c'1 = 1*r0,c1 + 2*r1,c2 + 3*r2,c3 + 1*r3,c0
      c'1 = 1*r0,c1 + 1*r1,c2 + 2*r2,c3 + 3*r3,c0
      c'1 = 3*r0,c1 + 1*r1,c2 + 1*r2,c3 + 2*r3,c0

      ... and so forth for c'2 and c'3. The important distinction is
      that the columns are cycling, with c0 being used with the m0
      map when calculating c0, but c1 being used with the m0 map when
      calculating c1 ... and so forth.

      When performing the inverse we transform the mirror image and
      skip the bottom row, instead of the top one, and move upwards:

      c3 c2 c1 c0 => c0 c3 c2 c1  (cycled 3 bytes) *same as encryption
      c3 c2 c1 c0    c1 c0 c3 c2  (cycled 2 bytes)
      c3 c2 c1 c0    c2 c1 c0 c3  (cycled 1 byte)  *same as encryption
      c3 c2 c1 c0    c3 c2 c1 c0

      If you compare the resulting matrices for ShiftRows()+MixColumns()
      and for InvShiftRows()+InvMixColumns() the 2nd and 4th columns are
      different (in encrypt mode vs. decrypt mode). So in order to use
      the same code to handle both encryption and decryption, we will
      need to do some mapping.

      If in encryption mode we let a=c0, b=c1, c=c2, d=c3, and r be
      a row number in the state, then the resulting matrix in encryption
      mode for applying the above transformations would be:

      r1: a b c d
      r2: b c d a
      r3: c d a b
      r4: d a b c

      If we did the same in decryption mode we would get:

      r1: a d c b
      r2: b a d c
      r3: c b a d
      r4: d c b a

      If instead we swap d and b (set b=c3 and d=c1), then we get:

      r1: a b c d
      r2: d a b c
      r3: c d a b
      r4: b c d a

      Now the 1st and 3rd rows are the same as the encryption matrix. All
      we need to do then to make the mapping exactly the same is to swap
      the 2nd and 4th rows when in decryption mode. To do this without
      having to do it on each iteration, we swapped the 2nd and 4th rows
      in the decryption key schedule. We also have to do the swap above
      when we first pull in the input and when we set the final output. */
    a2 =
      m0[a >>> 24] ^
      m1[b >>> 16 & 255] ^
      m2[c >>> 8 & 255] ^
      m3[d & 255] ^ w[++i];
    b2 =
      m0[b >>> 24] ^
      m1[c >>> 16 & 255] ^
      m2[d >>> 8 & 255] ^
      m3[a & 255] ^ w[++i];
    c2 =
      m0[c >>> 24] ^
      m1[d >>> 16 & 255] ^
      m2[a >>> 8 & 255] ^
      m3[b & 255] ^ w[++i];
    d =
      m0[d >>> 24] ^
      m1[a >>> 16 & 255] ^
      m2[b >>> 8 & 255] ^
      m3[c & 255] ^ w[++i];
    a = a2;
    b = b2;
    c = c2;
  }

  /*
    Encrypt:
    SubBytes(state)
    ShiftRows(state)
    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])

    Decrypt:
    InvShiftRows(state)
    InvSubBytes(state)
    AddRoundKey(state, w[0, Nb-1])
   */
  // Note: rows are shifted inline
  output[0] =
    (sub[a >>> 24] << 24) ^
    (sub[b >>> 16 & 255] << 16) ^
    (sub[c >>> 8 & 255] << 8) ^
    (sub[d & 255]) ^ w[++i];
  output[decrypt ? 3 : 1] =
    (sub[b >>> 24] << 24) ^
    (sub[c >>> 16 & 255] << 16) ^
    (sub[d >>> 8 & 255] << 8) ^
    (sub[a & 255]) ^ w[++i];
  output[2] =
    (sub[c >>> 24] << 24) ^
    (sub[d >>> 16 & 255] << 16) ^
    (sub[a >>> 8 & 255] << 8) ^
    (sub[b & 255]) ^ w[++i];
  output[decrypt ? 1 : 3] =
    (sub[d >>> 24] << 24) ^
    (sub[a >>> 16 & 255] << 16) ^
    (sub[b >>> 8 & 255] << 8) ^
    (sub[c & 255]) ^ w[++i];
}

/**
 * Deprecated.
 * Instead, use:
 *
 * forge.cipher.createCipher('AES-<mode>', key);
 * forge.cipher.createDecipher('AES-<mode>', key);
 *
 * Creates a deprecated AES cipher object. This object's mode will default to
 * CBC (cipher-block-chaining).
 *
 * The key and iv may be given as a string of bytes, an array of bytes, a
 * byte buffer, or an array of 32-bit words.
 *
 * @param options the options to use.
 *          key the symmetric key to use.
 *          output the buffer to write to.
 *          decrypt true for decryption, false for encryption.
 *          mode the cipher mode to use (default: 'CBC').
 *
 * @return the cipher.
 */
function _createCipher(options) {
  options = options || {};
  var mode = (options.mode || 'CBC').toUpperCase();
  var algorithm = 'AES-' + mode;

  var cipher;
  if(options.decrypt) {
    cipher = forge.cipher.createDecipher(algorithm, options.key);
  } else {
    cipher = forge.cipher.createCipher(algorithm, options.key);
  }

  // backwards compatible start API
  var start = cipher.start;
  cipher.start = function(iv, options) {
    // backwards compatibility: support second arg as output buffer
    var output = null;
    if(options instanceof forge.util.ByteBuffer) {
      output = options;
      options = {};
    }
    options = options || {};
    options.output = output;
    options.iv = iv;
    start.call(cipher, options);
  };

  return cipher;
}
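
/*
  A minimal sketch of how the backwards-compatible start() shim above maps
  its arguments. ByteBuffer and makeModernCipher are hypothetical stand-ins
  for forge.util.ByteBuffer and for a cipher returned by
  forge.cipher.createCipher(); they only record what start() receives and
  perform no encryption. wrapStart is likewise a hypothetical name for the
  wrapping step.

```javascript
// Hypothetical stand-in for forge.util.ByteBuffer.
function ByteBuffer() {}

// Hypothetical stand-in for a cipher with the options-based start() API.
function makeModernCipher() {
  return {
    started: null,
    start: function(options) { this.started = options; }
  };
}

// Same wrapping logic as the shim, applied to the stand-in cipher.
function wrapStart(cipher) {
  var start = cipher.start;
  cipher.start = function(iv, options) {
    var output = null;
    if (options instanceof ByteBuffer) {
      output = options;
      options = {};
    }
    options = options || {};
    options.output = output;
    options.iv = iv;
    start.call(cipher, options);
  };
  return cipher;
}

// legacy call form: start(iv, outputBuffer)
var out = new ByteBuffer();
var legacy = wrapStart(makeModernCipher());
legacy.start('0123456789abcdef', out);

// options-object form: the object's own 'output' is replaced by null
var other = wrapStart(makeModernCipher());
other.start('0123456789abcdef', {output: out});
```

  In the first call the wrapped start() receives {output: out, iv: ...}.
  In the second call the shim overwrites the caller's 'output' with null,
  so only a buffer passed directly as the second argument is forwarded.
*/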