Questions tagged [character-encoding]
Character encoding refers to the way characters are represented as a series of bytes. Character encoding for the Web is defined in the Encoding Standard.
15,285
questions
2367
votes
41
answers
1.2m
views
How do I get a consistent byte representation of strings in C# without manually specifying an encoding?
How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?
I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know ...
1491
votes
5
answers
2.8m
views
Best way to convert string to bytes in Python 3? [closed]
TypeError: 'str' does not support the buffer interface suggests two possible methods to convert a string to bytes:
b = bytes(mystring, 'utf-8')
b = mystring.encode('utf-8')
Which method is ...
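A minimal Python 3 sketch of the two spellings quoted in the excerpt, showing that they produce the same bytes. The sample string is an illustrative assumption, not part of the original question.

```python
# Sample text is an assumption; "\u00e9" is e with acute accent.
mystring = "h\u00e9llo"

# Both calls encode the str as UTF-8 and return a bytes object.
b_constructor = bytes(mystring, "utf-8")
b_method = mystring.encode("utf-8")

print(b_constructor == b_method)
print(b_method)
```

Both forms accept the same encoding names, so the choice between them is mostly a matter of readability.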
1133
votes
22
answers
925k
views
What's the difference between UTF-8 and UTF-8 with BOM?
What's the difference between UTF-8 and UTF-8 with BOM?
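As a rough illustration in Python 3, the byte-level difference can be seen by comparing the "utf-8" and "utf-8-sig" codecs. The sample text is an assumption made for the sketch.

```python
import codecs

text = "hi"  # sample text, an illustrative assumption

plain = text.encode("utf-8")         # no byte order mark
with_bom = text.encode("utf-8-sig")  # three-byte signature prepended

# The encoded forms differ only by the leading bytes EF BB BF.
print(with_bom == codecs.BOM_UTF8 + plain)

# Decoding BOM-prefixed bytes as plain UTF-8 keeps U+FEFF in the text;
# the "utf-8-sig" codec strips it.
kept = with_bom.decode("utf-8")
stripped = with_bom.decode("utf-8-sig")
print(repr(kept), repr(stripped))
```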
803
votes
17
answers
854k
views
MySQL: Get character-set of database or table or column?
What is the (default) charset for:
MySQL database
MySQL table
MySQL column
754
votes
20
answers
407k
views
What is the difference between UTF-8 and Unicode?
I have heard conflicting opinions from people - according to the Wikipedia UTF-8 page.
They are the same thing, aren't they? Can someone clarify?
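A short Python 3 sketch of the distinction: Unicode assigns each character a number (a code point), while UTF-8, UTF-16 and UTF-32 are different ways of writing that number as bytes. The euro sign is an arbitrary sample chosen for the sketch.

```python
euro = "\u20ac"  # EURO SIGN, an illustrative sample character

# Unicode: the abstract code point assigned to the character.
code_point = ord(euro)

# Encodings: three different byte sequences for the same code point.
utf8 = euro.encode("utf-8")
utf16 = euro.encode("utf-16-le")
utf32 = euro.encode("utf-32-le")

print(hex(code_point), utf8, utf16, utf32)
```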
542
votes
20
answers
646k
views
How to convert an entire MySQL database character set and collation to UTF-8?
How can I convert an entire MySQL database's character set and collation to UTF-8?
523
votes
5
answers
343k
views
What is the difference between utf8mb4 and utf8 charsets in MySQL?
What is the difference between utf8mb4 and utf8 charsets in MySQL?
I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings;
but I'm curious to know what's the difference of utf8mb4 group of ...
512
votes
8
answers
608k
views
What is the difference between UTF-8 and ISO-8859-1? [closed]
What is the difference between UTF-8 and ISO-8859-1?
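One way to see the difference, sketched in Python 3 with sample characters chosen for illustration: ISO-8859-1 stores each of its 256 characters in a single byte, while UTF-8 uses one to four bytes and covers all of Unicode.

```python
e_acute = "\u00e9"  # sample character present in both encodings

utf8_bytes = e_acute.encode("utf-8")         # two bytes in UTF-8
latin1_bytes = e_acute.encode("iso-8859-1")  # one byte in ISO-8859-1

# The euro sign is outside ISO-8859-1, so encoding it fails there.
euro = "\u20ac"
try:
    euro.encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

# Reading UTF-8 bytes as ISO-8859-1 yields two wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(utf8_bytes, latin1_bytes, euro_fits_latin1, repr(mojibake))
```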
461
votes
2
answers
1.1m
views
Working with UTF-8 encoding in Python source [duplicate]
Consider:
$ cat bla.py
u = unicode('d…')
s = u.encode('utf-8')
print s
$ python bla.py
File "bla.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding ...
450
votes
13
answers
164k
views
Why do we use Base64?
Wikipedia says
Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This ...
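A small Python 3 sketch of that idea: arbitrary bytes, including ones that are not printable text, become a string of ASCII characters and can be recovered exactly. The input bytes are an arbitrary assumption.

```python
import base64

# Arbitrary binary data (an illustrative assumption), not valid text.
raw = bytes([0x00, 0xFF, 0x10])

# Encode to Base64; the result contains only ASCII characters.
encoded = base64.b64encode(raw)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)

print(encoded, decoded == raw)
```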
425
votes
8
answers
221k
views
No line-break after a hyphen
I'm looking to prevent a line break after a hyphen - on a case-by-case basis that is compatible with all browsers.
Example:
I have this text: 3-3/8" which in HTML is this: 3-3/8&rdquo;
The ...
413
votes
18
answers
853k
views
Setting the default Java character encoding
How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?
I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that ...
409
votes
2
answers
357k
views
Unicode, UTF, ASCII, ANSI format differences
What is the difference between the Unicode, UTF8, UTF7, UTF16, UTF32, ASCII, and ANSI encodings?
In what way are these helpful for programmers?
387
votes
7
answers
810k
views
What does "Content-type: application/json; charset=utf-8" really mean?
When I make a POST request with a JSON body to my REST service I include Content-type: application/json; charset=utf-8 in the message header. Without this header, I get an error from the service. I ...
375
votes
21
answers
1.0m
views
"for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte
Here is my code,
for line in open('u.item'):
    # Read each line
Whenever I run this code it gives the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: ...
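A hedged Python 3 sketch of one common cause: the file is not UTF-8, and byte 0xE9 is "é" in Latin-1. The excerpt does not state the real encoding of u.item, so Latin-1 here is an assumption, and the file contents are made up for the demonstration.

```python
import os
import tempfile

# Build a stand-in file containing byte 0xE9 (contents are hypothetical).
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write(b"caf\xe9\n")

# Reading it as UTF-8 fails, because 0xE9 starts a multi-byte sequence
# that the following byte does not complete.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Passing the encoding explicitly (assumed Latin-1) lets the loop run.
with open(path, encoding="latin-1") as f:
    lines = [line.rstrip("\n") for line in f]

print(utf8_ok, lines)
```

If the true encoding is something else, decoding will still succeed with Latin-1 but may show the wrong characters, so it is worth confirming the source of the file.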
367
votes
19
answers
622k
views
Change MySQL default character set to UTF-8 in my.cnf?
Currently we are using the following commands in PHP to set the character set to UTF-8 in our application.
Since this is a bit of overhead, we'd like to set this as the default setting in MySQL. Can ...
342
votes
18
answers
530k
views
Is there an upside down caret character?
I have to maintain a large number of classic ASP pages, many of which have tabular data with no sort capabilities at all. Whatever order the original developer used in the database query is what you'...
333
votes
26
answers
442k
views
Detect encoding and make everything UTF-8
I'm reading out lots of texts from various RSS feeds and inserting them into my database.
Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1.
...
306
votes
10
answers
189k
views
What is a vertical tab?
What was the original historical use of the vertical tab character (\v in the C language, ASCII 11)?
Did it ever have a key on a keyboard? How did someone generate it?
Is there any language or ...
303
votes
7
answers
298k
views
What encoding/code page is cmd.exe using?
When I open cmd.exe on Windows, what encoding is it using?
How can I check which encoding it is currently using?
Does it depend on my regional setting or are there any environment
variables to check?
...
297
votes
13
answers
814k
views
How to convert Strings to and from UTF8 byte arrays in Java
In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java ...
284
votes
11
answers
464k
views
What is ANSI format?
What is ANSI encoding format? Is it a system default format?
In what way does it differ from ASCII?
278
votes
20
answers
318k
views
How do you echo a 4-digit Unicode character in Bash?
I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit it, or any other, ...
266
votes
11
answers
151k
views
PHP DOMDocument loadHTML not encoding UTF-8 correctly
I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me).
$profile = "<div><p>various japanese characters</p&...
254
votes
9
answers
488k
views
Write to UTF-8 file in Python
I'm really confused with the codecs.open function. When I do:
file = codecs.open("temp", "w", "utf-8")
file.write(codecs.BOM_UTF8)
file.close()
It gives me the error
UnicodeDecodeError: 'ascii' ...
253
votes
8
answers
399k
views
Writing Unicode text to a text file?
I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page).
It has some non-ASCII symbols. How can I convert these safely to ...
240
votes
15
answers
425k
views
Do I really need to encode '&' as '&amp;'?
I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.
http://validator.w3.org is giving me ...
217
votes
3
answers
416k
views
Change the encoding of a file in Visual Studio Code
Is there any way to change the encoding of a file?
For example UTF-8 to ISO 8859-1?
Setting Example Sublime Text:
"default_encoding": "UTF-8"
215
votes
6
answers
74k
views
Why are charset names not constants?
Charset issues are confusing and complicated by themselves, but on top of that you have to remember exact names of your charsets. Is it "utf8"? Or "utf-8"? Or maybe "UTF-8"? When searching the internet ...
204
votes
12
answers
540k
views
Convert Unicode to ASCII without errors in Python
My code just scrapes a web page, then converts it to Unicode.
html = urllib.urlopen(link).read()
html.encode("utf8","ignore")
self.response.out.write(html)
But I get a UnicodeDecodeError:
Traceback ...
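For comparison, a minimal sketch in Python 3 terms (an assumption, since the excerpt's code is Python 2): the "ignore" error handler drops characters that ASCII cannot represent, and normalizing to NFKD first keeps the base letter of accented characters. The sample text is made up.

```python
import unicodedata

text = "caf\u00e9"  # sample text, an illustrative assumption

# "ignore" silently drops anything outside ASCII.
dropped = text.encode("ascii", "ignore")

# NFKD splits the accented letter into a base letter plus a combining
# mark; the mark is then dropped and the base letter survives.
decomposed = unicodedata.normalize("NFKD", text)
folded = decomposed.encode("ascii", "ignore")

print(dropped, folded)
```

Either approach loses information, so it only suits cases where an ASCII approximation is acceptable.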
199
votes
7
answers
638k
views
How can I transform string to UTF-8 in C#?
I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks ...
195
votes
10
answers
59k
views
What's the difference between encoding and charset?
I am confused about the text encoding and charset. For many reasons, I have to
learn non-Unicode, non-UTF8 stuff in my upcoming work.
I find the word "charset" in email headers as in "ISO-2022-JP", ...
189
votes
3
answers
310k
views
Changing PowerShell's default output encoding to UTF-8
By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be ...
187
votes
7
answers
116k
views
What is the difference between encode/decode?
I've never been sure that I understand the difference between str/unicode decode and encode.
I know that str().decode() is for when you have a string of bytes that you know has a certain character ...
186
votes
4
answers
141k
views
Why specify @charset "UTF-8"; in your CSS file?
I've been seeing this instruction as the very first line of numerous CSS files that have been turned over to me:
@charset "UTF-8";
What does it do, and is this at-rule necessary?
Also, if I ...
178
votes
13
answers
373k
views
PHP: Convert any string to UTF-8 without knowing the original character set, or at least try
I have an application that deals with clients from all over the world, and, naturally, I want everything going into my databases to be UTF-8 encoded.
The main problem for me is that I don't know what ...
172
votes
10
answers
88k
views
Can I make git recognize a UTF-16 file as text?
I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-...
170
votes
23
answers
240k
views
How do I remove ï»¿ from the beginning of a file?
I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿
PHP removes ...
168
votes
10
answers
149k
views
How many characters can UTF-8 encode?
If UTF-8 is 8 bits, does it not mean that there can be only maximum of 256 different characters?
The first 128 code points are the same as in ASCII. But it says UTF-8 can support up to million of ...
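A quick Python 3 sketch of why the "8" does not cap the repertoire at 256: UTF-8 uses 8-bit code units, but a single character may take one to four of them. The sample characters are arbitrary choices for illustration.

```python
# One sample character per encoded length (illustrative choices).
samples = ["A", "\u00e9", "\u20ac", "\U0001f600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]

# Unicode code points run from U+0000 to U+10FFFF.
total_code_points = 0x10FFFF + 1

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and are not
# valid in UTF-8, so they are excluded from the encodable count.
surrogates = 0xDFFF - 0xD800 + 1
encodable = total_code_points - surrogates

print(lengths, total_code_points, encodable)
```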
161
votes
16
answers
390k
views
Java : How to determine the correct charset encoding of a stream
With reference to the following thread:
Java App : Unable to read iso-8859-1 encoded file correctly
What is the best way to programmatically determine the correct charset encoding of an inputstream/...
155
votes
13
answers
428k
views
How to change the default encoding to UTF-8 for Apache
I am using a hosting company and it will list the files in a directory if the file index.html is not there. It uses ISO 8859-1 as the default encoding.
If the server is Apache, is there a way to set ...
153
votes
5
answers
259k
views
JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10)
I'm trying to use org.apache.httpcomponents to consume a Rest API, which will post JSON format data to API.
I get this exception:
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal
...
152
votes
7
answers
171k
views
Is ASCII code as a matter of fact 7-bit or 8-bit?
My teacher told me ASCII is an 8-bit character coding scheme. But it is defined only for 0-127 codes which means it can be fitted into 7 bits. So can't it be argued that ASCII is actually a 7-bit code?...
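The arithmetic in the question can be checked with a short Python 3 sketch: the highest ASCII code, 127, fits in 7 bits, and ASCII text stored one character per byte always leaves the top bit clear. The sample string is an assumption.

```python
ascii_text = "Hello, ASCII!"  # sample text, an illustrative assumption

# Every ASCII character has a code in the range 0..127.
max_code = max(ord(ch) for ch in ascii_text)

# 127 is 0b1111111, so 7 bits are enough for the whole range.
highest_ascii_code = 127
bits_needed = highest_ascii_code.bit_length()

# Stored in 8-bit bytes, the most significant bit is always 0.
as_bytes = ascii_text.encode("ascii")
top_bit_clear = all(b & 0x80 == 0 for b in as_bytes)

print(max_code, bits_needed, top_bit_clear)
```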
148
votes
10
answers
146k
views
How can I find non-ASCII characters in MySQL?
I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to ...
145
votes
14
answers
140k
views
How to check if a String contains only ASCII?
The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?
145
votes
12
answers
287k
views
How to support UTF-8 encoding in Eclipse
How can I add UTF-8 support in Eclipse? I want to add, for example, the Russian language, but Eclipse won't support it. What should I do? Please guide me.
145
votes
2
answers
359k
views
How many bits or bytes are there in a character? [closed]
How many bits or bytes are there per "character"?
144
votes
5
answers
191k
views
How to change the default charset of a MySQL table?
There is a MySQL table which has this definition taken from SQLYog Enterprise :
Table Create Table
----------------- ------------------------...
143
votes
3
answers
87k
views
.NET Core doesn't know about Windows 1252, how to fix?
This program works just fine when compiled for .NET 4 but does not when compiled for .NET Core. I understand the error about encoding not supported but not how to fix it.
Public Class Program
...
140
votes
16
answers
294k
views
Who sets response content-type in Spring MVC (@ResponseBody)
I have an annotation-driven Spring MVC Java web application running on a Jetty web server (currently in the Maven Jetty plugin).
I'm trying to do some AJAX support with one controller method ...