2018/05/08

[Cryptography] pycrypto notes: the SHA256 programming assignment

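In the Week 3 programming assignment of Coursera's Cryptography I, we have to compute a file's SHA256 using chaining. The assignment provides two files:
1. For verification: 6.2.birthday.mp4_download, 16,927,313 bytes
2. The actual assignment: 6.1.intro.mp4_download, 12,494,779 bytes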



For 6.2.birthday.mp4_download, the expected chained SHA256 result (h0) is:
03c08f4ee0b576fe319338139c045c89c3e8e9409633bea29442e21425006ea8
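The post doesn't spell out the chaining step itself. As I understand the assignment (an assumption worth checking against its handout): split the file into 1024-byte blocks, hash the last block, append that 32-byte digest to the second-to-last block and hash the result, and keep going backward until the first block; the final digest is h0. Below is a minimal sketch of that assumed scheme. It uses Python's standard hashlib instead of pycrypto so it runs without extra packages, and the function name chained_hash is hypothetical.

```python
import hashlib

def chained_hash(blocks):
    """Hypothetical helper: compute h0 for a list of byte blocks, last block first.

    Assumed scheme: h_last = SHA256(B_last), h_i = SHA256(B_i || h_{i+1}), return h_0.
    """
    h = b""  # the last block has no following hash appended to it
    for block in reversed(blocks):
        # append the raw 32-byte digest (not the hex string) before hashing
        h = hashlib.sha256(block + h).digest()
    return h
```

Given the list of 1024-byte pieces read from the file, print(chained_hash(pieces).hex()) prints h0 in hex; if the assumed scheme matches the assignment, running it on 6.2.birthday.mp4_download should reproduce the value above.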


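When implementing this with pycrypto, pay special attention to reading the file. The assignment uses 1024-byte blocks, so unless the file size is an exact multiple of 1024, the last piece read from the file is a leftover shorter than 1024 bytes.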
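The block arithmetic for the files in this post can be checked with divmod. The helper name block_layout is hypothetical; it only splits a byte count into full 1024-byte blocks plus a shorter tail.

```python
def block_layout(file_size, block_size=1024):
    """Hypothetical helper: return (full blocks, size of the partial tail, total blocks)."""
    full, last = divmod(file_size, block_size)
    total = full + (1 if last else 0)  # a non-empty tail counts as one more block
    return full, last, total

print(block_layout(16927313))  # (16530, 593, 16531)  6.2.birthday.mp4_download
print(block_layout(12494779))  # (12201, 955, 12202)  6.1.intro.mp4_download
print(block_layout(1100))      # (1, 76, 2)           tttt.txt
```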

In Python, a file can be read block by block like this:
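block_size = 1024  # the assignment uses 1024-byte blocks
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        if not piece:  # read() returns b"" (empty bytes) at end of file
            break
        # ... process piece here ...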

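Pay close attention to the return value of read(). read(size) reads at most size bytes per call (1024 here); near the end of the file it returns the shorter remainder, and once the file is exhausted it returns an empty string. In Python 3 binary mode that empty value is b"" (empty bytes), which never equals "", so the check is safer written as if not piece:. If you don't handle the last block specially, make sure this empty string from the final read() is not also fed into the SHA256 computation. I made exactly this mistake, which is why I kept failing to get the correct answer...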
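To see these return values without a real file, here is a small sketch that uses io.BytesIO as an in-memory stand-in for open(filename, "rb") (an assumption for illustration; its read(size) behaves the same way here). The helper name read_sizes is hypothetical, and it deliberately records the final empty read too.

```python
import io

def read_sizes(data, block_size=1024):
    """Hypothetical helper: record len() of every read() result, including the final empty one."""
    stream = io.BytesIO(data)  # in-memory stand-in for open(filename, "rb")
    sizes = []
    while True:
        piece = stream.read(block_size)
        sizes.append(len(piece))
        if not piece:  # b"" signals end of file
            break
    return sizes

print(read_sizes(b"AAAAAZZZZZ" * 110))  # [1024, 76, 0]
print(b"" == "")  # False in Python 3: bytes never compare equal to str
```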

For example, suppose we have a file tttt.txt (1,100 bytes) containing 1,100 English letters:
AAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZ

When reading this file, the following two approaches give different results:
import os

filename = "tttt.txt"
block_size = 1024  # bytes
# Get file size in bytes
file_size = os.path.getsize(filename)
# Size of the last (partial) block; 0 if file_size is an exact multiple of block_size
last_block_size = file_size % block_size
# Number of full blocks (// keeps this an integer in Python 3)
total_blocks = file_size // block_size

pieces_of_files = []
iteration = 0
with open(filename, "rb") as in_file:
    # Read all full 1024-byte blocks
    while iteration < total_blocks:
        piece = in_file.read(block_size)
        pieces_of_files.append(piece)
        iteration = iteration + 1
    # Read the final partial block, but only if there is one
    if last_block_size > 0:
        piece = in_file.read(last_block_size)
        pieces_of_files.append(piece)
print(pieces_of_files)
num_blocks = len(pieces_of_files)
print("Number of blocks: %d" % num_blocks)

pieces_of_files2 = []
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        pieces_of_files2.append(piece)  # THE bug: appended before checking for end of file
        if not piece:  # read() returns b"" at end of file (never equal to "" in Python 3)
            break
print(pieces_of_files2)

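As you can see, the second approach stores an extra empty string at the end: the first approach yields two pieces (1,024 and 76 bytes), while the second yields those two plus an empty one. That extra piece is why I kept failing to get the correct answer.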
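Under the same assumed chaining scheme (hash the last block, append its digest to the previous block, and repeat backward), here is a small sketch of why a single extra empty piece is enough to change the final answer. It uses hashlib instead of pycrypto; the helper name sha256 is hypothetical.

```python
import hashlib

def sha256(data):
    """Hypothetical helper: raw 32-byte SHA256 digest."""
    return hashlib.sha256(data).digest()

data = b"AAAAAZZZZZ" * 110         # same 1,100 bytes as tttt.txt
b0, b1 = data[:1024], data[1024:]  # the two real pieces: 1,024 and 76 bytes

# Correct pieces [b0, b1]: h1 = SHA256(b1), h0 = SHA256(b0 || h1)
good = sha256(b0 + sha256(b1))
# Buggy pieces [b0, b1, b""]: the empty piece inserts SHA256(b"") into the chain
buggy = sha256(b0 + sha256(b1 + sha256(b"")))
print(good == buggy)  # False
```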

In other words, if you don't handle the last block specially, the correct code should be:
pieces_of_files2 = []
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        if not piece:  # end of file: stop before storing the empty piece
            break
        pieces_of_files2.append(piece)
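A more compact way to write the same loop is the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value and never yields the sentinel itself. A sketch, with a hypothetical helper read_pieces and io.BytesIO standing in for the opened file:

```python
import functools
import io

def read_pieces(stream, block_size=1024):
    """Hypothetical helper: return all non-empty block_size pieces of a binary stream."""
    # iter(callable, sentinel) calls stream.read(block_size) until it returns b"";
    # the empty b"" itself is never included in the result
    return list(iter(functools.partial(stream.read, block_size), b""))

pieces = read_pieces(io.BytesIO(b"AAAAAZZZZZ" * 110))  # in-memory stand-in for tttt.txt
print([len(p) for p in pieces])  # [1024, 76]
```

With a real file, the call would be read_pieces(in_file) inside with open(filename, "rb") as in_file:. Note that the sentinel must be b"" (bytes), not "", for a file opened in binary mode.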

After that, compute the SHA256 of each block in the order required by the first figure, and you should get the answer. A simple SHA256 example with pycrypto:
from Crypto.Hash import SHA256

print("---This is an example for SHA256---")
h = SHA256.new()
h.update(b'Hello')
print(h.hexdigest())
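For the chaining step, the value appended to the previous block is presumably the raw 32-byte digest() rather than the 64-character hexdigest() printed above (an assumption based on the assignment's 32-byte hash). The sketch below uses hashlib, whose sha256 objects provide the same update(), digest() and hexdigest() methods as pycrypto's SHA256 objects. It also shows that update() is incremental.

```python
import hashlib

h = hashlib.sha256()
h.update(b"Hello")
raw = h.digest()         # 32 raw bytes: what would be appended to the previous block
hex_str = h.hexdigest()  # 64 hex characters: convenient for printing and comparing answers
print(len(raw), len(hex_str))  # 32 64
print(raw.hex() == hex_str)    # True

# update() is incremental: feeding b"Hel" then b"lo" gives the same digest as b"Hello"
h2 = hashlib.sha256()
h2.update(b"Hel")
h2.update(b"lo")
print(h2.digest() == raw)  # True
```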

