What is a Letter in CSS's first-letter Pseudo-element?

By Xah Lee.

CSS has a :first-letter pseudo-element selector. [see CSS: ::first-letter]

p.xyz:first-letter {
  font-size: 2rem;
  color: red;
}

However, if the paragraph starts with a punctuation mark or a math symbol such as ∑ (U+2211: N-ARY SUMMATION), the style may or may not be applied.

The CSS spec is not precise about which characters count as a “letter”. It also specifies that if the paragraph starts with a quotation mark, both the quotation mark and the first letter are styled. For example:

“once upon a time …”
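To see what kinds of characters are involved, here is a minimal Python sketch that prints the Unicode General Category of a few characters a paragraph might start with. Python is used only for illustration; neither the CSS spec nor any browser is implemented this way. The sample list and the helper name general_category are assumptions made for this page.

```python
import unicodedata

# Sample leading characters, chosen for illustration only.
samples = ["A", "8", "+", "\u2211", "\u201C", "'"]


def general_category(ch: str) -> str:
    """Return the two-letter Unicode General Category of one character."""
    return unicodedata.category(ch)


for ch in samples:
    # Print code point, category, and the official character name.
    print(f"U+{ord(ch):04X}  {general_category(ch)}  {unicodedata.name(ch)}")
```

Here A is in category Lu (a letter), 8 is Nd (a digit), + and ∑ are both Sm (math symbol), the opening quotation mark “ is Pi (initial punctuation), and the apostrophe is Po (other punctuation). A browser that styles only letters, digits, and punctuation would plausibly skip ∑.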

Different browsers behave differently.

Here is a screenshot of what Firefox 3.5 shows:

[Screenshot: Firefox 3.5 rendering of CSS first-letter]

Try this test page to see what your browser shows: CSS first-letter pseudo-element test page.

Is Firefox 3.5's Behavior Correct by W3C?

The question is, is Firefox 3.5's behavior correct by W3C? The spec is here: http://www.w3.org/TR/CSS2/selector.html#first-letter

However, the spec is not exactly clear on this.

I think it's better if it applies to all Unicode chars, and just the first char, no exceptions.

Emulating traditional print typography is quite vague and problematic. For example, the spec says that if the first letter is preceded by a double quote, then that double quote should also be cap'd along with the letter after it. Also, it says that if the first char is a digit such as 3, it should be cap'd too.

These rules are problematic and inconsistent. If emulating tradition is the goal, then digits shouldn't be included. And including the quotation char as part of the “first-letter” is also problematic, because it breaks simplicity and consistency.

It would be much simpler and more consistent, and probably more applicable to today's digital world, if the pseudo-element “first-letter” were “first-char” instead. Those who need to cap the first punctuation char along with the letter that follows could write special CSS for those cases. Or, W3C really should have pseudo-elements like second-letter and last-letter (for the last punctuation, usually a closing quotation char).
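The difference between the two ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: first_char and spec_style_first_letter are hypothetical names, “letter” is taken to mean General Category L*, “digit” is taken to mean N*, and punctuation is taken to mean P*. Real browsers may decide differently, which is the whole problem.

```python
import unicodedata


def first_char(text: str) -> str:
    """The 'first-char' idea: just the first code point, whatever it is."""
    return text[:1]


def spec_style_first_letter(text: str) -> str:
    """One possible reading of the spec rule (an assumption, not the spec):
    leading punctuation, then one letter or digit, then trailing punctuation."""

    def major(ch: str) -> str:
        # First letter of the General Category: L, N, P, S, Z, ...
        return unicodedata.category(ch)[0]

    i = 0
    while i < len(text) and major(text[i]) == "P":
        i += 1  # skip leading punctuation such as an opening quote
    if i == len(text) or major(text[i]) not in ("L", "N"):
        return ""  # nothing letter-like at the start, so select nothing
    i += 1  # take the one letter or digit
    while i < len(text) and major(text[i]) == "P":
        i += 1  # include punctuation that directly follows it
    return text[:i]


for text in ["“once upon a time …”", "67 million dollars", "∑ of all terms"]:
    print(repr(first_char(text)), repr(spec_style_first_letter(text)))
```

With these assumptions, first_char always returns exactly one character (“, 6, and ∑ for the three samples), while the spec-style reading returns “o, then 6, then an empty string for the line that starts with ∑.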

Jukka K Korpela gives a good analysis of the situation:

Newsgroups: comp.infosystems.www.authoring.stylesheets
Date: Tue, 28 Jul 2009 19:25:57 +0300
Subject: Re: firefox 3.5 broke css :first-letter

Source: groups.google.com

Jukka K. Korpela 7/28/09

Swifty wrote:
> Xah Lee wrote:
>> Interesting. Seems Firefox 3.5 only apply the css to letters and not
>> other chars?
>>
>> All other browser of current version seems to do the summation sign.
>> Which behavior is correct?

> That will come down to your interpretation of the word “letter”.

In part, yes. The CSS “specifications” are vague in this matter. In a sense, the meaning of “letter” might be the easiest part. The “specs” (CSS 2.0 which is official but not recommended by anyone; CSS 2.1 which is often cited as de-facto standard but itself forbids that, nominally; and the excuse for a draft sketch CSS 3.0 Selectors, all saying essentially the same in this matter) refer to Unicode properties in the discussion of punctuation characters. Thus, it would be very natural to interpret the word “letter” as referring to the Unicode characters that have a General Category value beginning with “L”, for “Letter”, thus including ideographs, syllable characters, and far more characters than most of us ever heard of. The summation sign is surely not a letter in that sense; its General Category is “Symbol, Math”.

> Is “A” a letter?
> Is “8” a letter?
> if “+” a letter?

> I would say “yes” to only the first.

That is correct. And “8” is a digit. But what other letters and digits are there?

The additional difficulties include these:

(1) The CSS “specs” say that :first-letter also includes any leading punctuation character, which is somewhat odd - if a line begins with a quotation mark and a letter, then these two are included into the pseudo-element :first-letter.

(2) The CSS “specs” also say:

The ‘:first-letter’ also applies if the first letter is in fact a digit, e.g., the “6” in “67 million dollars is a lot of money.”

That's weird, really, and it raises the question what constitutes a digit. (Anything with General Category value beginning with “N”, for “Number” - which is a lot more than most of us would think.)

(3) For further confusion, they say:

“Some languages may have specific rules about how to treat certain letter combinations. In Dutch, for example, if the letter combination “ij” appears at the beginning of a word, both letters should be considered within the :first-letter pseudo-element.”

That's really wild. It makes things language-dependent and even language version dependent; e.g. it's an open question whether “ch” in Spanish is one letter in some sense. (And what about English “th”?) And it leaves things wide open - are browsers really supposed to know the rules of all written languages in such matters? What _are_ the rules, really?

Let's end this with one more foolishness:

If the letters that would form the first-letter are not in the same element, such as “'T” in <p>'<em>T..., the UA may create a first-letter pseudo-element from one of the elements, both elements, or simply not create a pseudo-element.

Is that just idle babbling, or is it supposed to be part of a specification?

Oh, wait… they also “specify”:

“The :first-letter pseudo-element must select the first letter of the first line of a block, if it is not preceded by any other content (such as images or inline tables) on its line”

So if we have #%&*Foo at the start of a line, then :first-letter is the "F", right? Is this supposed to be useful?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
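A side note to the post above (not part of it): the remark that the “L” and “N” categories hold far more characters than most people expect can be checked with a short Python sketch. The function name count_major_categories is hypothetical, and the exact numbers depend on the Unicode version bundled with the Python build, so none are quoted here.

```python
import sys
import unicodedata
from collections import Counter


def count_major_categories() -> Counter:
    """Count code points by the first letter of their General Category.
    Unassigned code points and surrogates are counted under 'C'."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        counts[unicodedata.category(chr(cp))[0]] += 1
    return counts


counts = count_major_categories()
print("Unicode version:", unicodedata.unidata_version)
print("L (letters):", counts["L"])
print("N (numbers):", counts["N"])
print("P (punctuation):", counts["P"])

# Two "numbers" that are not the ASCII digits 0-9:
print(unicodedata.category("\u00BD"))  # VULGAR FRACTION ONE HALF
print(unicodedata.category("\u2167"))  # ROMAN NUMERAL EIGHT
```

The last two lines print No and Nl, so a fraction such as ½ and a Roman numeral such as Ⅷ would both count as a “digit” under the reading where any N category qualifies.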
