html special characters and web forms

Sun Oct 9 21:44:00 EDT 2005

On Sun, 2005-10-09 at 19:42 -0400, Donald Leslie {74279} wrote:
> I have a web application which allows text to be input into a form.
> 
> A user went to a web page where the text contained a quote, which was
> represented in the page source as &#8217;.

>>> uc = u'\u2019'		# &#8217; is decimal: 8217 == 0x2019
>>> len(uc)
1
>>> bytes = uc.encode('utf8')
>>> len(bytes)
3
>>> map(ord,bytes)
[226, 128, 153]		# those are base-10 numbers
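
For anyone following along, here is a small sketch of how the reference in the question maps to a character and to UTF-8 bytes. It is written in Python 3 syntax (the session above is Python 2), and the variable names are only illustrative. Note that &#8217; is a decimal character reference, so the code point is 8217 decimal, which is U+2019 (RIGHT SINGLE QUOTATION MARK).

```python
import html

# "&#8217;" is a decimal character reference: 8217 == 0x2019.
entity = "&#8217;"
ch = html.unescape(entity)          # the single character U+2019

codepoint = ord(ch)
utf8_bytes = ch.encode("utf-8")     # three bytes in UTF-8

print(hex(codepoint))               # 0x2019
print(len(ch), len(utf8_bytes))     # 1 3
print(list(utf8_bytes))             # [226, 128, 153]
```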

Well, the UTF-8 encoding of that character uses three bytes.  The encoding
should be specified at the top of the HTML file, though the web server could
be specifying a different encoding in its headers.  Browsers normally
believe the web server's header when the two disagree.
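
A rough sketch of that precedence, in Python 3: take the charset from the Content-Type header if it has one, otherwise fall back to whatever the page itself declares. The function names and the "utf-8" default are my own assumptions for illustration, and real browsers do more than this (byte-order marks, content sniffing), so treat it as a simplification rather than what any browser actually runs.

```python
from email.message import Message

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def effective_charset(content_type, meta_charset, default="utf-8"):
    """Hypothetical helper: the header wins, then the page's own
    declaration, then an assumed default."""
    return charset_from_header(content_type) or meta_charset or default

# Header and page disagree: the header is believed.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# No charset in the header: the page's declaration is used.
print(effective_charset("text/html", "windows-1252"))
```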

Are you saying the database field is limited to ASCII (0 <= ord < 128)?
Or is it grumbling about the encoding error?
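
If the field really is ASCII-only, one possible workaround (a sketch, not necessarily what your database or XML layer wants) is to store non-ASCII characters as numeric character references. Python's "xmlcharrefreplace" error handler does this on encode. The sample text and the is_ascii helper below are made up for illustration, and the syntax is Python 3.

```python
def is_ascii(s):
    """True if every character falls in the ASCII range 0..127."""
    return all(ord(c) < 128 for c in s)

text = "It\u2019s a quote"          # sample text containing U+2019

# Replace anything outside ASCII with a &#NNNN; reference.
ascii_safe = text.encode("ascii", "xmlcharrefreplace")

print(is_ascii(text))               # False
print(ascii_safe)                   # b'It&#8217;s a quote'
```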

>>> x = '\222'
>>> ux = x.decode('utf8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0:
unexpected code byte
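
For what it is worth, octal \222 is byte 0x92, and in the Windows-1252 code page that byte is the same right single quote, U+2019. So the form data may well be arriving as Windows-1252 rather than UTF-8, though that is a guess from this one byte. A sketch in Python 3 that tries UTF-8 first and falls back to cp1252 (decode_form_bytes is a hypothetical helper name, and the fallback order is an assumption):

```python
def decode_form_bytes(raw):
    """Hypothetical helper: try UTF-8, then assume Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined, so replace those
        # instead of raising a second error.
        return raw.decode("cp1252", errors="replace")

raw = b"\x92"                        # same byte as '\222'
quote = decode_form_bytes(raw)

print(hex(ord(quote)))               # 0x2019
print(quote.encode("utf-8"))         # b'\xe2\x80\x99'
```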

>
> 
> When the form data is read the quote becomes \222. Where do I find
> how this is being encoded so I can put back the quote or other special
> characters? I found one reference which said that this was due to UTF-8
> encoding.
> 
> This is a problem for me since the text containing the quote is stored
> in a database. An XML database query breaks when it thinks the record
> contains binary data.
> 
> Don Leslie
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss at mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
-- 
Lloyd Kvam
Venix Corp