
Questions tagged [character-encoding]

Character encoding refers to the way characters are represented as a series of bytes. Character encoding for the Web is defined in the Encoding Standard.

2367 votes
41 answers
1.2m views

How do I get a consistent byte representation of strings in C# without manually specifying an encoding?

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding? I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know ...
Agnel Kurian (58.9k)
1491 votes
5 answers
2.8m views

Best way to convert string to bytes in Python 3? [closed]

TypeError: 'str' does not support the buffer interface suggests two possible methods to convert a string to bytes: b = bytes(mystring, 'utf-8') b = mystring.encode('utf-8') Which method is ...
Mark Ransom
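For the Python 3 question above, here is a minimal sketch showing that the two forms produce identical bytes. The sample string `mystring` is an assumption for illustration.

```python
mystring = "naïve café"  # sample text (assumed); any str works

b1 = bytes(mystring, "utf-8")  # constructor form
b2 = mystring.encode("utf-8")  # method form; "utf-8" is also the default in Python 3

print(b1 == b2)  # True
print(b2)        # b'na\xc3\xafve caf\xc3\xa9'
```

Both are equivalent. `.encode()` is the more common idiom and reads as the inverse of `bytes.decode()`.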
1133 votes
22 answers
925k views

What's the difference between UTF-8 and UTF-8 with BOM?

What's different between UTF-8 and UTF-8 with BOM?
simple (11.5k)
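A short Python sketch of the difference: the BOM is just the three bytes `EF BB BF` prepended to otherwise identical UTF-8 data. Python's `utf-8-sig` codec writes it on encode and strips it on decode, while plain `utf-8` keeps it as the character U+FEFF. The sample text is an assumption.

```python
import codecs

text = "hello"
plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # the utf-8-sig codec prepends EF BB BF

assert with_bom == codecs.BOM_UTF8 + plain
print(with_bom)                            # b'\xef\xbb\xbfhello'
print(repr(with_bom.decode("utf-8")))      # '\ufeffhello' (BOM survives as U+FEFF)
print(repr(with_bom.decode("utf-8-sig")))  # 'hello' (BOM stripped)
```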
803 votes
17 answers
854k views

MySQL: Get character-set of database or table or column?

What is the (default) charset for a MySQL database, a MySQL table, and a MySQL column?
Amandasaurus (59.8k)
754 votes
20 answers
407k views

What is the difference between UTF-8 and Unicode?

I have heard conflicting opinions from people - according to the Wikipedia UTF-8 page. They are the same thing, aren't they? Can someone clarify?
sarsnake (27.3k)
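One way to see the distinction, sketched in Python: Unicode assigns each character an abstract number (a code point), and UTF-8 is one of several encodings that turn that number into bytes. The euro sign is an arbitrary example character.

```python
ch = "€"  # EURO SIGN, chosen as an example

code_point = ord(ch)          # the Unicode code point, an abstract number
print(f"U+{code_point:04X}")  # U+20AC

# The same code point serialized by three different Unicode encodings:
utf8 = ch.encode("utf-8")       # bytes E2 82 AC (3 bytes)
utf16 = ch.encode("utf-16-be")  # bytes 20 AC (2 bytes)
utf32 = ch.encode("utf-32-be")  # bytes 00 00 20 AC (4 bytes)
```

All three byte strings decode back to the same single character; only the byte layout differs.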
542 votes
20 answers
646k views

How to convert an entire MySQL database characterset and collation to UTF-8?

How can I convert an entire MySQL database's character set and collation to UTF-8?
Dean (7,985)
523 votes
5 answers
343k views

What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between utf8mb4 and utf8 charsets in MySQL? I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings; but I'm curious to know what's the difference of utf8mb4 group of ...
Mojtaba Rezaeian
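The practical consequence: MySQL's legacy `utf8` stores at most 3 bytes per character, so it cannot hold code points above U+FFFF (most emoji, for instance), while `utf8mb4` allows the full 4 bytes of UTF-8. A minimal Python sketch of a pre-insert check; `needs_utf8mb4` is a hypothetical helper, not part of any MySQL driver.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8, i.e. lies above U+FFFF."""
    return any(ord(c) > 0xFFFF for c in text)

print(needs_utf8mb4("café"))      # False: every character fits in at most 3 bytes
print(needs_utf8mb4("hi 😀"))     # True: U+1F600 needs 4 bytes
print(len("😀".encode("utf-8")))  # 4
```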
512 votes
8 answers
608k views

What is the difference between UTF-8 and ISO-8859-1? [closed]

What is the difference between UTF-8 and ISO-8859-1?
Jagadesh (6,659)
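A compact Python illustration: ISO-8859-1 uses exactly one byte per character and covers only U+0000 to U+00FF, while UTF-8 uses one to four bytes and covers all of Unicode. Decoding one as the other produces the familiar mojibake. The sample characters are assumptions.

```python
s = "é"

latin1 = s.encode("iso-8859-1")  # one byte: E9
utf8 = s.encode("utf-8")         # two bytes: C3 A9

# Reading UTF-8 bytes as ISO-8859-1 yields mojibake.
mojibake = utf8.decode("iso-8859-1")  # 'Ã©'

# ISO-8859-1 has no euro sign, so encoding it fails.
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("€ is not representable in ISO-8859-1")
```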
461 votes
2 answers
1.1m views

Working with UTF-8 encoding in Python source [duplicate]

Consider: $ cat bla.py u = unicode('d…') s = u.encode('utf-8') print s $ python bla.py File "bla.py", line 1 SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding ...
Nullpoet (11.2k)
450 votes
13 answers
164k views

Why do we use Base64?

Wikipedia says Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This ...
Lazer (93.2k)
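A small Python sketch of why this matters: raw bytes such as NUL, LF and CR can be altered or rejected by text-oriented channels, whereas Base64 maps every 3 input bytes to 4 characters drawn from a 64-character ASCII alphabet (plus `=` padding), at roughly a 33% size cost. The input bytes are arbitrary examples.

```python
import base64

raw = bytes([0x00, 0xFF, 0x0A, 0x0D])  # arbitrary binary: NUL, 0xFF, LF, CR

encoded = base64.b64encode(raw)  # only A-Z, a-z, 0-9, '+', '/' and '=' appear
decoded = base64.b64decode(encoded)

assert decoded == raw                # lossless round trip
print(encoded.decode("ascii"))       # AP8KDQ==
print(len(raw), "->", len(encoded))  # 4 -> 8 (4 chars per 3 bytes, padded)
```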
425 votes
8 answers
221k views

No line-break after a hyphen

I'm looking to prevent a line break after a hyphen - on a case-by-case basis that is compatible with all browsers. Example: I have this text: 3-3/8" which in HTML is this: 3-3/8” The ...
Sparky (98.5k)
413 votes
18 answers
853k views

Setting the default Java character encoding

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically? I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that ...
409 votes
2 answers
357k views

Unicode, UTF, ASCII, ANSI format differences

What is the difference between the Unicode, UTF8, UTF7, UTF16, UTF32, ASCII, and ANSI encodings? In what way are these helpful for programmers?
web dunia (9,719)
387 votes
7 answers
810k views

What does "Content-type: application/json; charset=utf-8" really mean?

When I make a POST request with a JSON body to my REST service I include Content-type: application/json; charset=utf-8 in the message header. Without this header, I get an error from the service. I ...
DenaliHardtail
375 votes
21 answers
1.0m views

"for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Here is my code, for line in open('u.item'): # Read each line Whenever I run this code it gives the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: ...
SujitS (11.3k)
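The byte 0xE9 is "é" in Latin-1/Windows-1252 but is not valid on its own in UTF-8, which suggests the file is not UTF-8. A minimal Python 3 sketch, assuming (not known from the question) that `u.item` is Latin-1: it writes a stand-in file, shows the UTF-8 read failing, then reads it with an explicit `encoding=`. The helper `read_lines` is hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for 'u.item': Latin-1 text, where 'é' is the single byte 0xE9.
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write("Café au lait\n".encode("latin-1"))

def read_lines(path, encoding):
    # Passing encoding= explicitly avoids depending on the platform default.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

try:
    read_lines(path, "utf-8")  # 0xE9 starts a 3-byte UTF-8 sequence, but ' ' follows
except UnicodeDecodeError as exc:
    print("utf-8 failed at byte", exc.start)

lines = read_lines(path, "latin-1")  # every byte is valid Latin-1, so this succeeds
print(lines)
```

If the real encoding cannot be determined, `open(..., errors="replace")` avoids the exception at the cost of replacing undecodable bytes with U+FFFD.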
367 votes
19 answers
622k views

Change MySQL default character set to UTF-8 in my.cnf?

Currently we are using the following commands in PHP to set the character set to UTF-8 in our application. Since this is a bit of overhead, we'd like to set this as the default setting in MySQL. Can ...
Jorre (17.5k)
342 votes
18 answers
530k views

Is there an upside down caret character?

I have to maintain a large number of classic ASP pages, many of which have tabular data with no sort capabilities at all. Whatever order the original developer used in the database query is what you'...
Joel Coehoorn
333 votes
26 answers
442k views

Detect encoding and make everything UTF-8

I'm reading out lots of texts from various RSS feeds and inserting them into my database. Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1. ...
caw (31.3k)
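Reliable detection is impossible in general, but a common heuristic for the UTF-8/ISO 8859-1 mix described above is: keep the input if it decodes as strict UTF-8, otherwise assume a single-byte fallback. A Python sketch under that assumption; `to_utf8` is a hypothetical helper, and Windows-1252 is chosen as the fallback because it is a superset of ISO 8859-1's printable range.

```python
def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Heuristic: valid UTF-8 is kept; anything else is assumed to be `fallback`."""
    try:
        text = data.decode("utf-8")  # strict: fails on most non-UTF-8 8-bit text
    except UnicodeDecodeError:
        text = data.decode(fallback, errors="replace")
    return text.encode("utf-8")

print(to_utf8("café".encode("utf-8")))       # b'caf\xc3\xa9' (already UTF-8)
print(to_utf8("café".encode("iso-8859-1")))  # b'caf\xc3\xa9' (converted)
```

The heuristic works because random 8-bit text rarely forms valid UTF-8 by accident; it cannot distinguish between different single-byte encodings.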
306 votes
10 answers
189k views

What is a vertical tab?

What was the original historical use of the vertical tab character (\v in the C language, ASCII 11)? Did it ever have a key on a keyboard? How did someone generate it? Is there any language or ...
dmazzoni (13k)
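For reference, Python spells the vertical tab the same way C does. A quick sketch of how the standard library treats it today: as whitespace and as a line boundary.

```python
VT = "\v"  # same escape as in C

assert VT == "\x0b" == chr(11)  # ASCII 11 (0x0B)
print(VT.isspace())             # True: counted as whitespace
print("a\vb".split())           # ['a', 'b']
print("a\vb".splitlines())      # ['a', 'b']: also a line boundary for str.splitlines
```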
303 votes
7 answers
298k views

What encoding/code page is cmd.exe using?

When I open cmd.exe on Windows, what encoding is it using? How can I check which encoding it is currently using? Does it depend on my regional setting or are there any environment variables to check? ...
Dan Gøran Lunde
297 votes
13 answers
814k views

How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java ...
284 votes
11 answers
464k views

What is ANSI format?

What is ANSI encoding format? Is it a system default format? In what way does it differ from ASCII?
web dunia (9,719)
278 votes
20 answers
318k views

How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit it, or any other, ...
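The terminal ultimately needs the character's UTF-8 bytes. A Python sketch that looks up U+2620 and derives the `\xHH` escapes, which Bash's `printf` builtin accepts. The generated command is printed as a string here, not executed.

```python
import unicodedata

skull = "\u2620"                  # equivalently chr(0x2620)
print(unicodedata.name(skull))    # SKULL AND CROSSBONES
print(skull.encode("utf-8"))      # b'\xe2\x98\xa0'

# Build the Bash command text from the UTF-8 bytes.
escaped = "".join(f"\\x{b:02x}" for b in skull.encode("utf-8"))
print(f"printf '{escaped}'")      # printf '\xe2\x98\xa0'
```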
266 votes
11 answers
151k views

PHP DOMDocument loadHTML not encoding UTF-8 correctly

I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me). $profile = "<div><p>various japanese characters</p&...
Slightly A. (2,855)
254 votes
9 answers
488k views

Write to UTF-8 file in Python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' ...
John Jiang (11.3k)
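In Python 3 the built-in `open()` with an `encoding` argument covers what `codecs.open` did, and the `utf-8-sig` codec writes the BOM itself, so `codecs.BOM_UTF8` never has to be written by hand. A minimal sketch; the file name `temp` follows the question but is placed in a temporary directory, and the sample text is an assumption.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "temp")

# "utf-8-sig" emits the BOM before the first write.
# newline="" disables newline translation so the bytes are the same on every platform.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("héllo\n")

with open(path, "rb") as f:
    raw = f.read()

print(raw[:3] == codecs.BOM_UTF8)  # True
print(raw)                         # b'\xef\xbb\xbfh\xc3\xa9llo\n'
```

Use `encoding="utf-8"` instead if no BOM is wanted.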
253 votes
8 answers
399k views

Writing Unicode text to a text file?

I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page). It has some non-ASCII symbols. How can I convert these safely to ...
simon (6,047)
240 votes
15 answers
425k views

Do I really need to encode '&' as '&amp;'?

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles. http://validator.w3.org is giving me ...
Haroldo (37.2k)
217 votes
3 answers
416k views

Change the encoding of a file in Visual Studio Code

Is there any way to change the encoding of a file? For example UTF-8 to ISO 8859-1? Setting Example Sublime Text: "default_encoding": "UTF-8"
Fernando Tholl
215 votes
6 answers
74k views

Why charset names are not constants?

Charset issues are confusing and complicated by themselves, but on top of that you have to remember exact names of your charsets. Is it "utf8"? Or "utf-8"? Or maybe "UTF-8"? When searching internet ...
serg (111k)
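At least in Python, the exact spelling rarely matters: `codecs.lookup` normalizes case and punctuation and resolves aliases to one canonical codec. A short sketch; the alias list is an arbitrary sample.

```python
import codecs

aliases = ["utf8", "UTF-8", "utf_8", "UTF8"]
names = {codecs.lookup(a).name for a in aliases}
print(names)  # {'utf-8'}

# Different spellings of Latin-1 also resolve to the same codec.
assert codecs.lookup("latin-1").name == codecs.lookup("ISO-8859-1").name
```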
204 votes
12 answers
540k views

Convert Unicode to ASCII without errors in Python

My code just scrapes a web page, then converts it to Unicode. html = urllib.urlopen(link).read() html.encode("utf8","ignore") self.response.out.write(html) But I get a UnicodeDecodeError: Traceback ...
themirror (10.2k)
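The code in the question looks like Python 2, where calling `.encode` on a byte string first decodes it implicitly as ASCII, which is a likely source of the `UnicodeDecodeError`. In Python 3 the text is decoded first and then encoded with an error handler. A sketch with an assumed sample string, showing the `ignore` and `replace` handlers and an NFKD step that keeps base letters of accented characters.

```python
import unicodedata

s = "Café – naïve"  # sample text (assumed)

print(s.encode("ascii", "ignore"))   # b'Caf  nave'    (non-ASCII dropped)
print(s.encode("ascii", "replace"))  # b'Caf? ? na?ve' (non-ASCII -> '?')

# NFKD splits accented letters into base letter + combining mark,
# so the base letter survives the ASCII filter.
folded = unicodedata.normalize("NFKD", s).encode("ascii", "ignore")
print(folded)                        # b'Cafe  naive'
```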
199 votes
7 answers
638k views

How can I transform string to UTF-8 in C#?

I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface. Due to incorrect encoding, a piece of my string looks ...
Gaara (2,187)
195 votes
10 answers
59k views

What's the difference between encoding and charset?

I am confused about the text encoding and charset. For many reasons, I have to learn non-Unicode, non-UTF8 stuff in my upcoming work. I find the word "charset" in email headers as in "ISO-2022-JP", ...
TK. (27.8k)
189 votes
3 answers
310k views

Changing PowerShell's default output encoding to UTF-8

By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8. It can be ...
rwallace (32.7k)
187 votes
7 answers
116k views

What is the difference between encode/decode?

I've never been sure that I understand the difference between str/unicode decode and encode. I know that str().decode() is for when you have a string of bytes that you know has a certain character ...
ʞɔıu (48.1k)
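The question uses Python 2 names (`str` for bytes, `unicode` for text). In Python 3 terms the rule is simpler: `str` holds code points, `bytes` holds raw bytes, `encode` goes str to bytes, and `decode` goes bytes to str. A minimal sketch with an assumed sample word.

```python
text = "naïve"               # str: a sequence of code points
data = text.encode("utf-8")  # encode: str -> bytes
back = data.decode("utf-8")  # decode: bytes -> str

print(type(data).__name__, len(data))  # bytes 6 ('ï' takes 2 bytes)
print(type(back).__name__, len(back))  # str 5
assert back == text
```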
186 votes
4 answers
141k views

Why specify @charset "UTF-8"; in your CSS file?

I've been seeing this instruction as the very first line of numerous CSS files that have been turned over to me: @charset "UTF-8"; What does it do, and is this at-rule necessary? Also, if I ...
rsturim (6,816)
178 votes
13 answers
373k views

PHP: Convert any string to UTF-8 without knowing the original character set, or at least try

I have an application that deals with clients from all over the world, and, naturally, I want everything going into my databases to be UTF-8 encoded. The main problem for me is that I don't know what ...
Grim... (16.8k)
172 votes
10 answers
88k views

Can I make git recognize a UTF-16 file as text?

I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-...
skiphoppy (101k)
170 votes
23 answers
240k views

How do I remove  from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it:  PHP removes ...
Matt (11.3k)
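The characters `ï»¿` are the UTF-8 byte order mark (`EF BB BF`) displayed as Windows-1252. The question is about PHP; this Python sketch only illustrates the byte-level check, which is the same idea in any language. The CSS content and the helper `strip_utf8_bom` are hypothetical.

```python
import codecs

raw = codecs.BOM_UTF8 + b"body { color: red; }"  # a UTF-8 CSS file saved with a BOM

# The three BOM bytes read as Windows-1252 are the stray characters from the question.
print(raw[:3].decode("cp1252"))  # ï»¿

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

clean = strip_utf8_bom(raw)
print(clean)  # b'body { color: red; }'
```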
168 votes
10 answers
149k views

How many characters can UTF-8 encode?

If UTF-8 is 8 bits, does it not mean that there can be only a maximum of 256 different characters? The first 128 code points are the same as in ASCII. But it says UTF-8 can support up to million of ...
eMRe (3,217)
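The "8" in UTF-8 is the size of one code unit, not of a character: UTF-8 is variable-length and uses one to four bytes per code point. A Python sketch with arbitrary sample characters; the total of 1,114,112 code points includes 2,048 surrogates that UTF-8 cannot encode, leaving 1,112,064 encodable values.

```python
samples = ["A", "é", "€", "😀"]  # U+0041, U+00E9, U+20AC, U+1F600
for ch in samples:
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")
# prints 1, 2, 3 and 4 bytes respectively

MAX = 0x10FFFF                        # highest Unicode code point
print(len(chr(MAX).encode("utf-8")))  # 4
print(MAX + 1)                        # 1114112 code points in total
```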
161 votes
16 answers
390k views

Java : How to determine the correct charset encoding of a stream

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly What is the best way to programmatically determine the correct charset encoding of an inputstream/...
Joel (29.9k)
155 votes
13 answers
428k views

How to change the default encoding to UTF-8 for Apache

I am using a hosting company and it will list the files in a directory if the file index.html is not there. It uses ISO 8859-1 as the default encoding. If the server is Apache, is there a way to set ...
nonopolarity
153 votes
5 answers
259k views

JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10)

I'm trying to use org.apache.httpcomponents to consume a Rest API, which will post JSON format data to API. I get this exception: Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal ...
jian zhong (1,531)
152 votes
7 answers
171k views

Is ASCII code in matter of fact 7 bit or 8 bit?

My teacher told me ASCII is an 8-bit character coding scheme. But it is defined only for 0-127 codes which means it can be fitted into 7 bits. So can't it be argued that ASCII is actually a 7-bit code?...
Anurag Kalia (4,768)
148 votes
10 answers
146k views

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to ...
Ed Mays (1,740)
145 votes
14 answers
140k views

How to check if a String contains only ASCII?

The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?
TambourineMan
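The question is about Java; a Python sketch of the same check follows. ASCII is a 7-bit code, so a string is pure ASCII exactly when every character lies in U+0000 to U+007F. The helper `is_ascii` is hypothetical; `str.isascii()` is the built-in equivalent in Python 3.7+.

```python
def is_ascii(s: str) -> bool:
    # ASCII is a 7-bit code: every character must be below 128.
    return all(ord(c) < 128 for c in s)

print(is_ascii("Hello, world!"))  # True
print(is_ascii("naïve"))          # False
print("naïve".isascii())          # False (built-in, Python 3.7+)
```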
145 votes
12 answers
287k views

How to support UTF-8 encoding in Eclipse

How can I add UTF-8 support in Eclipse? I want to add, for example, Russian, but Eclipse won't support it. What should I do? Please guide me.
Katty (1,727)
145 votes
2 answers
359k views

How many bits or bytes are there in a character? [closed]

How many bits or bytes are there per "character"?
RedKing (1,613)
144 votes
5 answers
191k views

How to change the default charset of a MySQL table?

There is a MySQL table which has this definition taken from SQLYog Enterprise : Table Create Table ----------------- ------------------------...
pheromix (18.9k)
143 votes
3 answers
87k views

.NET Core doesn't know about Windows 1252, how to fix?

This program works just fine when compiled for .NET 4 but does not when compiled for .NET Core. I understand the error about encoding not supported but not how to fix it. Public Class Program ...
Joshua (42.2k)
140 votes
16 answers
294k views

Who sets response content-type in Spring MVC (@ResponseBody)

I have an annotation-driven Spring MVC Java web application running on the Jetty web server (currently the Maven Jetty plugin). I'm trying to do some AJAX support with one controller method ...
Hurda (4,695)
