I am downloading a gzip-compressed log from a URL and storing its contents in a variable. I then want to iterate over that string line by line. If I save the file instead and open it in Notepad++, it reports the saved log file as UTF-8 encoded.
I wanted to skip saving the file and reopening it for parsing, so I assign the file contents to a variable and then use io.StringIO to iterate over each line in it. This works most of the time, but occasionally the script fails with the following error when it reaches the line return str(file_content, 'utf-8'):
Exception Raised in connect function: 'utf-8' codec can't decode byte 0xe0 in position 138037: invalid continuation byte
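For context on what the message means: in UTF-8 the byte 0xe0 can only start a three-byte sequence, so the decoder complains when the next byte is not a valid continuation byte. That can indicate that part of the log is in a single-byte encoding such as Latin-1, where 0xe0 is "à". Below is a minimal sketch of one way to inspect the bytes around the failure. The helper name describe_decode_failure and the sample bytes are made up for illustration and are not taken from the real log.

```python
def describe_decode_failure(data, context=20):
    """Return (position, reason, surrounding bytes) for the first
    UTF-8 decode failure in data, or None if it decodes cleanly."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.start / exc.end mark the offending byte range in data
        start = max(exc.start - context, 0)
        return exc.start, exc.reason, data[start:exc.end + context]
    return None


# Hypothetical sample: "voilà tout" encoded as Latin-1, not UTF-8
sample = b'voil\xe0 tout'
failure = describe_decode_failure(sample)
print(failure)
```

Running the same helper on file_content would show the raw bytes near position 138037, which should make it easier to guess which encoding the offending text is really in.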
Here is the section of code that makes the request and assigns the result to a string variable:
# Making a get request with basic authentication
request = urllib.request.Request(url)
base64string = base64.b64encode(bytes('%s:%s' % ('xxxxx', 'xxxxx'), 'ascii'))
request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
# open request and then use gzip to read the shoutcast log that is in gzip format, then save uncompressed version
with urllib.request.urlopen(request) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_content = uncompressed.read()
return str(file_content, 'utf-8')
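For the line-by-line iteration itself, one possible alternative to str(...) plus io.StringIO is to wrap the gzip stream in io.TextIOWrapper and pass an errors argument, so that a stray byte does not abort the whole read. The sketch below assumes that replacing undecodable bytes with U+FFFD is acceptable for this log. The function name iter_log_lines is hypothetical, and an in-memory io.BytesIO stands in for the HTTP response.

```python
import gzip
import io


def iter_log_lines(fileobj, encoding='utf-8', errors='replace'):
    """Yield decoded lines from a gzip-compressed binary file object.

    With errors='replace', bytes that are not valid in the given
    encoding become U+FFFD instead of raising UnicodeDecodeError.
    """
    with gzip.GzipFile(fileobj=fileobj) as uncompressed:
        with io.TextIOWrapper(uncompressed, encoding=encoding,
                              errors=errors) as text:
            for line in text:
                yield line.rstrip('\n')


# Hypothetical stand-in for the response body: two log lines,
# the second containing a byte that is not valid UTF-8
payload = gzip.compress(b'ok line\nbad \xe0 byte\n')
lines = list(iter_log_lines(io.BytesIO(payload)))
print(lines)
```

In the code above, the response object from urllib.request.urlopen could be passed in place of the io.BytesIO object. If the bad bytes turn out to be Latin-1 text, passing encoding='latin-1' would decode every byte without error, at the cost of mis-decoding any genuine multi-byte UTF-8 characters.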