1. For verification: 6.2.birthday.mp4_download, 16,927,313 bytes
2. Assignment target: 6.1.intro.mp4_download, 12,494,779 bytes
The SHA256 of 6.2.birthday.mp4_download is:
03c08f4ee0b576fe319338139c045c89c3e8e9409633bea29442e21425006ea8
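I implemented this with pycrypto. The part that needs particular care is reading the file: the assignment requires the computation to run over 1024-byte blocks, so unless the file size is an exact multiple of 1024, the read ends with a tail piece that is shorter than 1024 bytes.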
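As a quick check of how large that tail is, the arithmetic can be sketched with divmod. The file sizes are the ones quoted at the top of this post; the names BLOCK_SIZE and file_sizes are only illustrative.

```python
BLOCK_SIZE = 1024  # bytes per block, as the assignment requires

# File sizes quoted at the top of this post
file_sizes = {
    "6.2.birthday.mp4_download": 16927313,
    "6.1.intro.mp4_download": 12494779,
}

for name, size in file_sizes.items():
    # divmod gives (number of full blocks, bytes left over for the tail)
    full_blocks, tail = divmod(size, BLOCK_SIZE)
    print("%s: %d full blocks + %d-byte tail" % (name, full_blocks, tail))
```

Neither file is a multiple of 1024 bytes, so both end with a short final block (593 and 955 bytes respectively).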
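In Python, the file can be read like this:
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        # read() returns an empty value at end of file;
        # "not piece" works for both str and bytes
        if not piece:
            break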
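Pay close attention to the return value of read(). read(size) lets us request 1024 bytes at a time, but once the end of the file is reached it returns an empty string (an empty bytes object, b"", for a file opened in binary mode under Python 3). So unless you handle the final block explicitly, remember not to feed the empty result of that last read() into the SHA256 computation. I made exactly this mistake, and it kept me from getting the correct answer...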
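The behaviour of read() at end of file can be seen in a small sketch. It uses io.BytesIO as an in-memory stand-in for a 1,100-byte file opened in "rb" mode; that substitution is an assumption made only to keep the example self-contained.

```python
import io

# In-memory stand-in for a 1,100-byte file opened in "rb" mode
fake_file = io.BytesIO(b"A" * 1100)

first = fake_file.read(1024)   # a full block
second = fake_file.read(1024)  # the short tail
third = fake_file.read(1024)   # nothing left: an empty bytes object

print(len(first), len(second), len(third))  # 1024 76 0
print(third == b"", not third)              # True True
```

Under Python 3 the empty result is b"", which does not compare equal to "", so testing with "not piece" is the safer end-of-file check.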
For example, take a file tttt.txt (1,100 bytes) that holds 1,100 English letters, as shown below:
AAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZAAAAAZZZZZ
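To reproduce the experiment, a file like this can be generated with a few lines. This is a sketch that assumes the content is simply the 10-letter pattern AAAAAZZZZZ repeated 110 times; make_test_file is a hypothetical helper name.

```python
import os

def make_test_file(path):
    """Write the pattern AAAAAZZZZZ 110 times (1,100 bytes) and return the size."""
    with open(path, "wb") as out_file:
        out_file.write(b"AAAAAZZZZZ" * 110)
    return os.path.getsize(path)

print(make_test_file("tttt.txt"))  # 1100
```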
So when reading this file, the following two approaches give different results:
import os

filename = "tttt.txt"
block_size = 1024  # bytes

# Get file size in bytes
file_size = os.path.getsize(filename)
# Size of the last (partial) block
last_block_size = file_size % block_size
# Number of full blocks; // keeps this an integer in Python 3
total_blocks = file_size // block_size

pieces_of_files = []
iteration = 0
with open(filename, "rb") as in_file:
    while iteration < total_blocks:
        piece = in_file.read(block_size)
        pieces_of_files.append(piece)
        iteration = iteration + 1
    # Read the final partial block, if there is one
    if last_block_size > 0:
        piece = in_file.read(last_block_size)
        pieces_of_files.append(piece)

print(pieces_of_files)
num_blocks = len(pieces_of_files)
print("Total number of blocks: %d" % num_blocks)

pieces_of_files2 = []
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        pieces_of_files2.append(piece)  # This line is THE bug
        if not piece:
            break

print(pieces_of_files2)
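As you can see, the second version stores an extra empty string at the end of the list, which is why I kept failing to get the correct answer.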
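A stray empty piece is easy to miss, because joining the pieces back together still reproduces the original bytes. One way to catch it is to look for zero-length pieces directly. The helper below is hypothetical, and the two lists are built by hand to mimic what the two loops collect for the 1,100-byte example.

```python
def find_empty_pieces(pieces):
    """Return the indexes of zero-length pieces (hypothetical helper)."""
    return [i for i, piece in enumerate(pieces) if len(piece) == 0]

data = b"AAAAAZZZZZ" * 110
good = [data[:1024], data[1024:]]  # what the first loop collects
bad = good + [b""]                 # what the buggy loop collects

# Joining hides the problem: both lists rebuild the same bytes
print(b"".join(good) == b"".join(bad))                  # True
print(find_empty_pieces(good), find_empty_pieces(bad))  # [] [2]
```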
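In other words, if you do not handle the final block separately, the correct code should be written as:
pieces_of_files2 = []
with open(filename, "rb") as in_file:
    while True:
        piece = in_file.read(block_size)
        # Stop BEFORE appending, so the empty result is never stored
        if not piece:
            break
        pieces_of_files2.append(piece)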
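The same loop can also be written with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. This is an alternative idiom, not the approach used in this post, and read_blocks is a hypothetical name. It relies on binary-mode reads returning b"" at end of file, as they do in Python 3.

```python
import io
from functools import partial

def read_blocks(in_file, block_size=1024):
    """Collect blocks until read() returns the b"" sentinel, which is not kept."""
    return list(iter(partial(in_file.read, block_size), b""))

# In-memory stand-in for the 1,100-byte example file
blocks = read_blocks(io.BytesIO(b"AAAAAZZZZZ" * 110))
print([len(b) for b in blocks])  # [1024, 76]
```

With a real file, pass the object returned by open(filename, "rb") instead of the BytesIO stand-in.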
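After that, compute the SHA256 of each block in order, as required by the first figure, and you should get the answer. A simple SHA256 example looks like this:
from Crypto.Hash import SHA256

print("---This is an example for SHA256---")
h = SHA256.new()
h.update(b'Hello')
# Print the digest as a hex string
print(h.hexdigest())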
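The figure itself is not reproduced in this text, so the following is only a sketch under an assumption: that the scheme hashes the last block first and then appends each digest to the preceding block before hashing that one, ending with the hash of the first block. It also swaps in the standard-library hashlib for pycrypto so that it runs without extra installs; chained_hash is a hypothetical name.

```python
import hashlib

def chained_hash(blocks):
    """Hash the last block, then fold each digest into the block before it.

    ASSUMPTION: this mirrors the scheme in the assignment figure.
    """
    digest = b""
    for block in reversed(blocks):
        # Append the digest of the following block (empty for the last block)
        digest = hashlib.sha256(block + digest).digest()
    return digest

data = b"AAAAAZZZZZ" * 110
blocks = [data[:1024], data[1024:]]
h0 = chained_hash(blocks)
print(h0.hex())
```

Under this assumed scheme, a trailing empty piece changes the result, because the chain would then start from the hash of the empty piece instead of the hash of the real last block. Running the function over the blocks of 6.2.birthday.mp4_download and comparing against the value given at the top is the way to confirm the assumption.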