4.1.10

Character Sets

Text data is made up of characters. Character sets allow us to store characters digitally.

Character sets

  • Text data is made up of characters.
  • Each character is assigned its own character code.
  • A character set is a collection of all the characters that a computer recognises, along with their binary codes.
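
As a minimal sketch of the idea above, Python's built-in `ord` and `chr` convert between a character and its character code. The sample characters are chosen only for illustration.

```python
# Map a few sample characters to their character codes.
codes = {ch: ord(ch) for ch in "Aa0"}
print(codes)                    # {'A': 65, 'a': 97, '0': 48}

# Go back from a code to its character.
print(chr(65))                  # A

# The same code written in binary, as it would be stored.
binary_a = format(ord("A"), "b")
print(binary_a)                 # 1000001
```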

What's in a character set?

  • Character sets include:
    • Alphanumeric characters, i.e. letters and digits.
    • Symbols e.g. punctuation marks.
    • Special (control) characters e.g. new line.
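
Special characters have character codes too, even though they do not print as a visible symbol. A short sketch, using new line and tab as assumed examples:

```python
# Special characters have codes but no visible glyph.
special = {"new line": "\n", "tab": "\t"}
special_codes = {name: ord(ch) for name, ch in special.items()}
print(special_codes)            # {'new line': 10, 'tab': 9}

# Python can report whether a character is printable.
print("\n".isprintable())       # False
print("A".isprintable())        # True
```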
Examples of character sets

  • There are two main character sets in use:
    • American Standard Code for Information Interchange.
    • Unicode.

American Standard Code for Information Interchange

The American Standard Code for Information Interchange (ASCII) is one of the oldest and most widely used character sets.

ASCII

  • Each character in ASCII is represented by a seven-bit binary code.
    • That means there is a maximum of 128 characters.
  • ASCII includes all commonly used letters and symbols in the English language.
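
The 128-character limit follows directly from the seven bits, since 2 to the power of 7 is 128. The sketch below shows this; the helper name `to_ascii_bits` is made up for this example.

```python
BITS = 7
max_characters = 2 ** BITS      # number of distinct 7-bit patterns
print(max_characters)           # 128

def to_ascii_bits(ch: str) -> str:
    """Return the 7-bit binary code for a single ASCII character."""
    code = ord(ch)
    if code >= max_characters:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # pad with leading zeros to 7 bits

print(to_ascii_bits("A"))       # 1000001
print(to_ascii_bits("z"))       # 1111010
```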

7-bit letters?

  • Each character is represented by seven bits.
  • This is useful because when used in an 8-bit system, the extra bit can be used as a parity bit for error checking.
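
One common way to use the spare eighth bit as a check is a parity bit. The sketch below assumes even parity (the total number of 1s in the byte must be even) and puts the check bit at the front; both choices and the function names are illustrative, not something the notes above specify.

```python
def add_even_parity(ch: str) -> str:
    """Return 8 bits: an even-parity bit followed by the 7-bit ASCII code."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    seven_bits = format(code, "07b")
    parity_bit = seven_bits.count("1") % 2   # 1 only if the count of 1s is odd
    return str(parity_bit) + seven_bits

def parity_ok(byte_bits: str) -> bool:
    """A received byte passes if it contains an even number of 1s."""
    return byte_bits.count("1") % 2 == 0

sent = add_even_parity("C")     # 'C' is 1000011, which has three 1s
print(sent)                     # 11000011
print(parity_ok(sent))          # True

corrupted = sent[:-1] + "0"     # simulate one bit flipping in transit
print(parity_ok(corrupted))     # False
```

A single parity bit can detect one flipped bit but cannot say which bit changed, and two flipped bits cancel each other out.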

Limitations of ASCII

  • 128 characters is perfectly fine for the English language. But it does not leave space for characters from other languages.
  • Extended ASCII sets were released which used all eight bits, giving 256 characters, but this was still not enough.
  • This led to the release of Unicode.
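
The limitation can be seen by trying to encode text in each character set. In this sketch, Latin-1 stands in for an 8-bit extended set (an assumption, since no particular extended set is named above), and `can_encode` is a helper written for this example.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character in text exists in the given character set."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode("hello", "ascii"))      # True
print(can_encode("café", "ascii"))       # False: é is not in 7-bit ASCII
print(can_encode("café", "latin-1"))     # True: the 8-bit set adds é
print(can_encode("日本語", "latin-1"))    # False: 256 characters is not enough
print(can_encode("日本語", "utf-8"))      # True
```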

Unicode

Unicode is a character set which was created to standardise character sets internationally.

Unicode

  • Unicode aims to represent every possible character in the world.
  • The most common Unicode encoding is UTF-8, which uses between 8 and 32 bits (one to four bytes) to represent each character.
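
The variable length of UTF-8 can be checked by encoding a few characters and counting the bits. The sample characters below are chosen only to hit each possible length.

```python
samples = ["A", "é", "€", "😀"]

# Number of bits UTF-8 uses for each sample character.
utf8_bits = {ch: len(ch.encode("utf-8")) * 8 for ch in samples}

for ch, bits in utf8_bits.items():
    print(ch, bits, "bits")
# A 8 bits
# é 16 bits
# € 24 bits
# 😀 32 bits
```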

Compatibility with ASCII

  • The first 128 characters in Unicode are identical to ASCII, and the first 256 match the Latin-1 extended ASCII set. This makes UTF-8 backwards compatible with documents encoded in ASCII.
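
A quick way to see the backwards compatibility is to encode ASCII text both ways and compare the bytes. This is a small sketch with an arbitrary sample string.

```python
text = "Hello"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# ASCII text produces exactly the same bytes in UTF-8.
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)                       # True

# So a file saved as ASCII can be read back as UTF-8.
print(ascii_bytes.decode("utf-8"))      # Hello

# This holds for every one of the 128 ASCII codes.
all_match = all(
    bytes([i]).decode("ascii") == bytes([i]).decode("utf-8")
    for i in range(128)
)
print(all_match)                        # True
```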

Types of characters

  • Unicode represents characters from all major alphabets of the world.
  • Unicode is also used to represent emojis!
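
Every Unicode character, whatever its alphabet, has a single numeric code, usually written as U+ followed by hexadecimal digits. The sketch below prints the code for a Latin letter, a Greek letter, a Cyrillic letter, and an emoji; the characters are arbitrary examples.

```python
samples = ["A", "Ω", "ж", "😀"]

# Format each character's code in the usual U+XXXX notation.
code_points = {ch: f"U+{ord(ch):04X}" for ch in samples}

for ch, cp in code_points.items():
    print(ch, cp)
# A U+0041
# Ω U+03A9
# ж U+0436
# 😀 U+1F600
```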
