Python2's String = Python3's Text Vs. Data

A significant change from Python 2 to Python 3 is the way strings are handled.
Python 3 doesn't always return a string where Python 2 did.
For example, read() in Python 2 always returns a string (str). In Python 3, it very often returns a "bytes" object instead, for instance when reading from a URL or from a file opened in binary mode.
When you print a "bytes" object, you see its literal form: ASCII characters appear as themselves, special characters as escape sequences (newline as \n), and non-ASCII characters as the escaped bytes of their encoding (ä in UTF-8 as \xc3\xa4).
This is because Python 3 differentiates between text (str) and data (bytes), as opposed to Python 2's Unicode vs. 8-bit strings. (Text Vs. Data Instead Of Unicode Vs. 8-bit)
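
To make the text/data split concrete, here is a minimal Python 3 sketch. It uses the same sample words as the page below; the variable names (text, data, mixed_ok) are only for illustration and are not from the original example.

```python
# Text (str) and data (bytes) are separate types in Python 3.
text = "stärke gläser"            # str: a sequence of Unicode code points
data = text.encode("utf-8")       # bytes: the UTF-8 encoded form of that text

print(type(text).__name__, len(text))   # str 13
print(type(data).__name__, len(data))   # bytes 15 (each "ä" takes two bytes)
print(data)                             # b'st\xc3\xa4rke gl\xc3\xa4ser'

# Decoding turns the data back into text.
assert data.decode("utf-8") == text

# Unlike Python 2, Python 3 refuses to mix the two types implicitly.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                         # False
```

The length difference (13 characters vs. 15 bytes) is why the two types cannot be treated as interchangeable: the bytes only mean something once you know the encoding.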

My localhost/index.html contains just this:
<html><body><h1>It works!. stärke gläser</h1></body></html>

Python 2.x
import urllib

url = "http://localhost"
fp = urllib.urlopen(url)
data = fp.read()
print "%s, %s" % (type(data), type(data).__name__)
print data

Python 2.x interactive session (Python 2.6.4)
~$ python
Python 2.6.4 (r264:75706, Nov 2 2009, 14:44:17)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> url = "http://localhost"
>>> fp = urllib.urlopen(url)
>>> data = fp.read()
>>> print "%s, %s" % (type(data), type(data).__name__)
<type 'str'>, str
>>> print data
<html><body><h1>It works!. stärke gläser</h1></body></html>

>>>

In Python 3, we need to explicitly convert it to a string via the str() function, specifying the encoding (data.decode('utf-8') is equivalent).
If you are getting errors using Python 3.0, you may want to update to at least Python 3.0.1; many have reported possible Unicode encoding/decoding bugs in 3.0.

Python 3.x
import urllib.request

url = "http://localhost"
fp = urllib.request.urlopen(url)
data = fp.read()
print("%s, %s" % (type(data), type(data).__name__))
print(data)
# Convert bytes to str using UTF-8. For ASCII data: str(data, 'ascii')
data = str(data, 'utf-8')
print(data)

Python 3.x interactive session (Python 3.1.1)
~$ python3
Python 3.1.1+ (r311:74480, Oct 12 2009, 02:14:03)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> url = "http://localhost"
>>> fp = urllib.request.urlopen(url)
>>> data = fp.read()
>>> print ("%s, %s" % (type(data), type(data).__name__))
<class 'bytes'>, bytes
>>> print (data)
b'<html><body><h1>It works!. st\xc3\xa4rke gl\xc3\xa4ser</h1></body></html>\n'
>>> data = str(data,'utf-8') # convert a byte datatype to string datatype using utf-8 encoding. For ASCII, data = str(data,'ascii')
>>> print (data)
<html><body><h1>It works!. stärke gläser</h1></body></html>

>>>
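
The example above hardcodes 'utf-8'. As a rough sketch of a slightly more defensive variant, the helper below takes the charset as a parameter and falls back to UTF-8. Note that decode_body is a hypothetical helper written for this post, not part of urllib, and the fallback choice is an assumption. On a current Python 3, the charset can usually be read from the response with fp.headers.get_content_charset(), which returns None when the server does not declare one.

```python
def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw response bytes to str.

    Hypothetical helper (not part of urllib). `charset` is whatever the
    server declared, or None; `fallback` is an assumed default.
    """
    encoding = charset or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")


# Same bytes as in the session above, minus the outer tags.
raw = b"<h1>It works!. st\xc3\xa4rke gl\xc3\xa4ser</h1>\n"

# str(raw, 'utf-8') and raw.decode('utf-8') give the same result.
assert str(raw, "utf-8") == raw.decode("utf-8")

text = decode_body(raw)
print(type(text).__name__)   # str
print(text)

# With a live response it might be used like this (needs a running server):
#   import urllib.request
#   fp = urllib.request.urlopen("http://localhost")
#   text = decode_body(fp.read(), fp.headers.get_content_charset())
```

Whether to replace bad bytes or let the error surface depends on the use case; for scraping pages with unreliable headers, replacing is often the more forgiving choice.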
Vanakkam !