Friday, October 16, 2009
Encoding Rules
There are currently five encoding methods recognized for encoding ASN.1 objects into streams of bytes:
• BER, the Basic Encoding Rules.
• DER, the Distinguished Encoding Rules.
• CER, the Canonical Encoding Rules.
• PER, the Packed Encoding Rules.
• XER, the XML Encoding Rules.
I mention all of them for completeness. As you can see, the number of methods suggests that people have had a few goes at producing encoding methods to date, and there will probably be more. Fortunately, the only two methods of interest here are BER encoding and DER encoding.
BER Encoding
BER stands for Basic Encoding Rules. As you've
probably guessed from the example encodings you've seen so far, BER
encoding follows the tag-length-value (TLV) convention. A tag is used
to identify the type, a value defining the length of the content is
next, and then the actual value of the content follows.
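To make the TLV layout concrete, here is a minimal Python sketch that splits one tag-length-value triple out of a byte string. It assumes single-octet tags and definite lengths (both the short and long length forms described later in this section); read_tlv is a hypothetical helper, not part of any library.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Split one TLV starting at `offset` and return (tag, value, next_offset).

    Sketch only: single-octet tags and definite lengths are assumed.
    """
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-octet tag numbers are not handled here")
    first = data[offset + 1]
    pos = offset + 2
    if first < 0x80:
        length = first                      # short form: the octet is the length
    elif first == 0x80:
        raise ValueError("indefinite lengths are not handled here")
    else:
        count = first & 0x7F                # long form: number of length octets
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    value = data[pos:pos + length]
    if len(value) != length:
        raise ValueError("encoding is truncated")
    return tag, value, pos + length


# INTEGER 5 followed by OCTET STRING "hi", back to back.
data = bytes([0x02, 0x01, 0x05, 0x04, 0x02, 0x68, 0x69])
tag, value, nxt = read_tlv(data)         # (2, b'\x05', 3)
print(read_tlv(data, nxt))               # (4, b'hi', 7)
```

Because each call returns the offset just past the value, consecutive TLVs can be walked in a loop without any other bookkeeping.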
BER encoding offers three methods for encoding an ASN.1 object:
• Primitive definite-length
• Constructed definite-length
• Constructed indefinite-length
Simple types use the primitive definite-length method, bit string and character string types use whichever method is most expedient, and structured types use one of the constructed methods.
If an object is tagged with the IMPLICIT style, the encoding used is the same as that used for the type of the object being tagged. If an object is tagged with the EXPLICIT style, one of the constructed methods will be used to encode the tagging.
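As an illustration, the following sketch tags the INTEGER 5 with the context-specific tag [0] in both styles. The helper tlv is hypothetical and handles only short-form lengths; the class bits (0x80 for context-specific) and the constructed bit (0x20) follow the X.690 identifier-octet layout.

```python
INTEGER_TAG = 0x02      # UNIVERSAL 2, primitive
CONTEXT_CLASS = 0x80    # bits 8-7 = 10: context-specific class
CONSTRUCTED = 0x20      # bit 6: constructed encoding


def tlv(tag: int, body: bytes) -> bytes:
    # Short-form length only, which is enough for these small examples.
    if len(body) > 127:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([tag, len(body)]) + body


inner = tlv(INTEGER_TAG, b"\x05")                           # 02 01 05

# [0] IMPLICIT INTEGER: the INTEGER tag is replaced, the encoding stays primitive.
implicit = tlv(CONTEXT_CLASS | 0, b"\x05")                  # 80 01 05

# [0] EXPLICIT INTEGER: a constructed wrapper around the complete INTEGER encoding.
explicit = tlv(CONTEXT_CLASS | CONSTRUCTED | 0, inner)      # a0 03 02 01 05

print(implicit.hex(" "))   # 80 01 05
print(explicit.hex(" "))   # a0 03 02 01 05
```

The explicit form costs two extra octets but keeps the original INTEGER tag, so a reader can recover the underlying type without knowing the ASN.1 definition.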
How is it decided which method is most expedient?
Strictly speaking, the decision is made on the basis of whether you
know how long the encoding of the object will be when you start writing
it out. However, in some cases, standards do specify BER
indefinite-length, so in situations like that, you will end up with
objects that are indefinite-length encoded regardless of whether it
would have been possible to hold the object in memory. To fully
understand what this means, you need to take a look at the three
methods in more detail.
The Primitive Definite-Length Method
The definite-length methods all require that you
know the length of what you are trying to encode in advance. The
primitive definite-length method is appropriate for any nonstructured
type, or implicitly tagged versions of the same, and an encoding of
this type is created by first encoding the tag assigned to the object,
encoding the length, and then writing out the encoding of the body.
You'll look at how the bodies are encoded in more detail later, but the encoding of the length is worth looking at here. If the length is less than or equal to 127, a single octet is written out containing the actual length as a 7-bit number. If the length is greater than 127, the first octet written out has bit 8 set, and bits 7-1 give the number of octets that follow and contain the actual length. The length is then written out one octet at a time, high-order octet first.
For example, a length of 127 will produce a 1-byte length encoding with the value 0x7f, a length of 128 will produce a 2-byte encoding with the values 0x81 and 0x80, and a length of 1,000 will produce a 3-byte encoding: 0x82, 0x03, and 0xe8. This is the simplest method of encoding and, as you will see, is required for DER encodings.
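The length rules and the three worked examples can be checked with a short Python sketch. encode_length and encode_primitive are illustrative names, and single-octet tags are assumed.

```python
def encode_length(length: int) -> bytes:
    """Encode a definite length in the minimum number of octets."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 127:
        # Short form: one octet holding the length as a 7-bit number.
        return bytes([length])
    # Long form: big-endian length octets, preceded by 0x80 | (number of octets).
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length too large to encode")
    return bytes([0x80 | len(body)]) + body


def encode_primitive(tag: int, body: bytes) -> bytes:
    """Primitive definite-length encoding: tag, then length, then body."""
    return bytes([tag]) + encode_length(len(body)) + body


for n in (127, 128, 1000):
    print(n, encode_length(n).hex(" "))
# 127 7f
# 128 81 80
# 1000 82 03 e8
```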
The Constructed Definite-Length Method
Length octets in this case are generated the same
way as for the primitive definite-length method, but the initial byte
in the tag associated with any object encoded in this fashion will have
bit 6 set, indicating the encoding is of the constructed type.
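Bits in the identifier octet are numbered 8 down to 1 from the most significant end, so bit 6 is the 0x20 bit. A minimal sketch, assuming single-octet identifiers, that pulls the fields apart; describe_identifier is a hypothetical helper.

```python
CLASSES = ("universal", "application", "context-specific", "private")


def describe_identifier(octet: int) -> tuple:
    """Split one identifier octet into (class, constructed?, tag number)."""
    tag_class = CLASSES[octet >> 6]      # bits 8-7
    constructed = bool(octet & 0x20)     # bit 6
    number = octet & 0x1F                # bits 5-1 (31 means a multi-octet tag follows)
    return tag_class, constructed, number


print(describe_identifier(0x04))  # ('universal', False, 4)   primitive OCTET STRING
print(describe_identifier(0x24))  # ('universal', True, 4)    constructed OCTET STRING
print(describe_identifier(0x30))  # ('universal', True, 16)   SEQUENCE
```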
As you would imagine, the regular structured types such as SEQUENCE and SET, or implicitly tagged objects derived from them, are still encoded as the concatenation of the BER encodings of the objects that make them up. Likewise, explicitly tagged objects are encoded using the BER encoding of the object that was tagged. Things become different when bit string and character string types, or implicit types derived from them, are encoded using the constructed definite-length method.
When this happens, the original bit string or character string is encoded as a series of substrings that are of the same base type as the constructed string. For example, if you are encoding a byte array as an OCTET STRING using the constructed definite-length method, the encoding will start with an OCTET STRING tag with bit 6 set, indicating that it is constructed. Then, after the length octets, the body of the encoding will be made up of a series of smaller OCTET STRING encodings using the primitive definite-length method, the concatenation of which is the byte array you were originally trying to encode.
You might use this method when a single ASN.1 value is made up of several pieces that exist separately before you create the encoding. For example, an e-mail address may be defined as a single ASN.1 type but be assembled from the parts "name" + "@" + "domain" prior to encoding, and those parts can be encoded as substrings of a constructed string representing the full address.
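A minimal sketch of the e-mail example, encoding the parts as a constructed definite-length OCTET STRING. To stay close to the OCTET STRING case described above, it treats the address as raw bytes rather than using a character string type; constructed_octet_string is a hypothetical helper.

```python
def encode_length(length: int) -> bytes:
    # Definite length: short form up to 127, long form otherwise.
    if length <= 127:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def constructed_octet_string(parts) -> bytes:
    """Constructed definite-length OCTET STRING built from separate parts."""
    # Each part becomes a primitive OCTET STRING substring (tag 0x04).
    body = b"".join(b"\x04" + encode_length(len(p)) + p for p in parts)
    # Outer tag 0x24 is OCTET STRING (0x04) with bit 6 (0x20) set.
    return b"\x24" + encode_length(len(body)) + body


enc = constructed_octet_string([b"name", b"@", b"domain"])
print(enc.hex(" "))
# 24 11 04 04 6e 61 6d 65 04 01 40 04 06 64 6f 6d 61 69 6e
```

Note that the outer length (0x11, or 17) counts the substring encodings including their own tag and length octets, not just the 11 bytes of the address.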
The Constructed Indefinite-Length Method
Unlike the previous two methods, the constructed indefinite-length method does not require you to know the length of the encoding you are trying to construct in advance. With this method, the tag is encoded following the same procedure as for the constructed definite-length method; however, the length is written out as the single octet 0x80. Instead of using the length of the encoding to determine when you reach the end of the contents in the body, you look for an end-of-contents marker: two octets of the value 0x00, which equate to tag 0, length 0. Other than the requirement for, and the presence of, the end-of-contents marker, objects are encoded in much the same way as for constructed definite-length.
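A reader finds the end of a constructed body differently in the two forms: by counting octets when the length is definite, or by watching for the 00 00 end-of-contents octets when it is indefinite. The following is a minimal sketch for OCTET STRING only, assuming single-octet tags; read_length and read_octet_string are hypothetical names.

```python
def read_length(data: bytes, pos: int):
    """Return (length, or None for indefinite, and the position after the length octets)."""
    first = data[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    if first == 0x80:
        return None, pos                       # indefinite form
    count = first & 0x7F
    return int.from_bytes(data[pos:pos + count], "big"), pos + count


def read_octet_string(data: bytes, pos: int = 0):
    """Decode a primitive or constructed OCTET STRING; return (value, next_pos)."""
    tag = data[pos]
    if tag & 0xDF != 0x04:                     # mask off bit 6 before checking the type
        raise ValueError("not an OCTET STRING")
    length, pos = read_length(data, pos + 1)
    if not tag & 0x20:                         # primitive: the body is the value
        if length is None:
            raise ValueError("primitive encodings must use a definite length")
        return data[pos:pos + length], pos + length
    value = bytearray()
    if length is None:
        # Indefinite: read substrings until the end-of-contents octets 00 00.
        while data[pos:pos + 2] != b"\x00\x00":
            chunk, pos = read_octet_string(data, pos)
            value += chunk
        return bytes(value), pos + 2
    # Definite: read substrings until the stated length is used up.
    end = pos + length
    while pos < end:
        chunk, pos = read_octet_string(data, pos)
        value += chunk
    return bytes(value), end


indefinite = bytes.fromhex("248004046e616d650401400406646f6d61696e0000")
print(read_octet_string(indefinite))   # (b'name@domain', 21)
```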
This method of encoding is useful when the length of the value is not known at the time the tag and length for the value are written. It is common when encodings are very large and memory or efficiency constraints prevent the entire value from being buffered to determine its length before encoding it.
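The streaming case can be sketched as a Python generator that emits the tag, the 0x80 length octet, each substring as it arrives, and finally the end-of-contents marker, without ever computing a total length. stream_octet_string is a hypothetical name, and the file names in the commented usage are placeholders.

```python
from typing import Iterable, Iterator


def encode_length(length: int) -> bytes:
    # Definite length for each substring: short form up to 127, long form otherwise.
    if length <= 127:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def stream_octet_string(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield a constructed indefinite-length OCTET STRING piece by piece."""
    yield b"\x24\x80"                  # constructed OCTET STRING tag, then 0x80: length unknown
    for chunk in chunks:
        # Each chunk becomes a primitive definite-length substring.
        yield b"\x04" + encode_length(len(chunk)) + chunk
    yield b"\x00\x00"                  # end-of-contents marker


# Hypothetical usage: stream a large file in 4 KiB pieces.
# with open("big.bin", "rb") as src, open("big.ber", "wb") as dst:
#     for piece in stream_octet_string(iter(lambda: src.read(4096), b"")):
#         dst.write(piece)
```

Only one chunk is held in memory at a time, which is exactly the property that makes the indefinite-length form attractive for very large values.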
DER Encoding
The Distinguished Encoding Rules, or DER, are so called because they ensure that identical data under identical ASN.1 definitions always reduces to an identical binary encoding. This is particularly important in security applications where the binary data will be digitally signed. BER encodings also make an interesting covert channel possible, in which equivalent but different BER encodings are used to transmit extra information. For example, an octet string representing encrypted data could be encoded using a constructed method, with the lengths of the substrings that make up the encrypted data used to leak information about either the data itself or the key used to encrypt it. As DER always reduces a value to the same encoding, no such covert channel is possible.
DER adds the following restrictions to BER encoding to make this possible:
• Only definite-length encoding is allowed.
• Only SEQUENCE, SET, implicitly tagged objects derived from SEQUENCE and SET, and all EXPLICIT tags use the constructed definite-length method.
• The length must be encoded in the minimum number of octets possible. For example, leading zero octets that add length without changing the value are not allowed.
• Fields that are set to their default value are not included in the encoding.
• The objects contained in a SET are sorted.
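The first and third restrictions concern the length octets alone, so they are easy to check mechanically. A sketch of such a check follows; is_der_length is a hypothetical helper and covers only the length rules, not default omission or SET ordering.

```python
def is_der_length(data: bytes, pos: int = 0) -> bool:
    """Return True if the length octets at `pos` obey the DER length rules."""
    first = data[pos]
    if first < 0x80:
        return True                  # short form is always minimal
    if first == 0x80:
        return False                 # indefinite length: not allowed in DER
    if first == 0xFF:
        return False                 # reserved value in X.690
    count = first & 0x7F
    octets = data[pos + 1:pos + 1 + count]
    if len(octets) != count or octets[0] == 0:
        return False                 # truncated, or padded with a leading zero octet
    # Lengths up to 127 must use the short form, so a long form must exceed 127.
    return int.from_bytes(octets, "big") > 127


for enc in (b"\x7f", b"\x80", b"\x81\x7f", b"\x82\x00\x80", b"\x82\x03\xe8"):
    print(enc.hex(" "), is_der_length(enc))
# 7f True
# 80 False
# 81 7f False
# 82 00 80 False
# 82 03 e8 True
```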
DER encoding is the most common form of encoding you will encounter, and it is also the simplest to perform. The only complication is sorting the objects contained in SET objects. A DER-encoded SET is sorted by ordering the objects inside it according to their encoded values in ascending order. Encodings are compared after padding them with trailing zeros so they are all the same length, with the result that a DER-encoded SET is ordered primarily on the tag of each object. (Strictly, X.690 specifies this encoding-based comparison for SET OF; the components of a SET are ordered by their tags.) Be careful about relying on this, though. A BER-encoded SET is not necessarily sorted, so if you are writing code to handle both BER- and DER-encoded SET objects, it is a mistake to rely on the ordering taking place.
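The padded comparison can be sketched in a few lines of Python. The example uses a SET OF INTEGER, where every element has the same tag, so the remaining octets decide the order; note that the result follows the encodings, not the numeric values. der_sort and der_set are hypothetical helpers, and der_set handles only short-form lengths.

```python
def der_sort(encodings):
    """Sort element encodings ascending, comparing them as octet strings
    padded with trailing zero octets to a common length."""
    width = max((len(e) for e in encodings), default=0)
    return sorted(encodings, key=lambda e: e + b"\x00" * (width - len(e)))


def der_set(encodings) -> bytes:
    body = b"".join(der_sort(encodings))
    if len(body) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x31, len(body)]) + body     # 0x31: SET / SET OF, constructed


elements = [
    bytes.fromhex("020105"),     # INTEGER 5
    bytes.fromhex("02020100"),   # INTEGER 256
    bytes.fromhex("0201ff"),     # INTEGER -1
    bytes.fromhex("020101"),     # INTEGER 1
]
print(der_set(elements).hex(" "))
# 31 0d 02 01 01 02 01 05 02 01 ff 02 02 01 00
```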