| Title: | Revised proposal on the Hebrew vowel HOLAM |
| Source: | Peter Kirk |
| Status: | Individual Contribution |
| Action: | For consideration by the UTC |
| Date: | Fourth draft of revised proposal 2004-07-08 |
[Note that the material up to the section "Design Goals" is
essentially unchanged from the original proposal to the June 2004 UTC
meeting, http://qaya.org/academic/hebrew/Holam.html,
except that the last paragraph of "Summary" and Figure 5 have been
added.]
The Hebrew point HOLAM combines in two different
ways with the Hebrew letter VAV. In
the first combination, known as Holam Male,
the VAV is not pronounced as a consonant, and HOLAM
and VAV together serve as the vowel associated with the
preceding consonant. In the second combination, known as Vav Haluma,
the HOLAM is the vowel of a
consonantal VAV. In
high quality typography Holam Male
is distinguished from Vav Haluma: Holam Male is written
with the HOLAM dot above the right side
or above the centre of VAV;
and Vav Haluma
is written with HOLAM above the top left
of VAV. The distinction is clear and significant in
some texts, dating from the 10th century CE to the present day. But in
lower quality typography Holam Male and Vav Haluma are not distinguished,
and both are usually rendered with the HOLAM dot above the
centre of VAV. Holam Male
is very common in pointed Hebrew texts; Vav Haluma is much less common but
not extremely rare.
Note carefully that this is not
a proposal to encode a phonetic
distinction which is not made graphically. Rather, it is a proposal to
encode a graphical
distinction with a 1000 year history. This graphical distinction is
often, although not always, made in modern texts, and it must be made
when the phonetic distinction needs to be indicated unambiguously.
Unicode does not currently specify how to distinguish between Holam Male, Vav Haluma, and the
undifferentiated combination. Several different ways have been used in
existing texts,
or recommended for use with Unicode Hebrew fonts. To avoid
proliferation of ad hoc
solutions, it is proposed here that the UTC
specify encodings for the three cases.
[*** Following paragraph to be rewritten ***]
Several options are outlined below. The
preferred option is to encode Holam
Male, when
distinguished from Vav Haluma,
as
the sequence <VAV,
ZWJ, HOLAM>. This option is proposed
to the UTC.
The current proposal is a revised version of the proposal made to
the June 2004 UTC meeting as document L2/04-??? [*** number to be found
***] (also available as http://qaya.org/academic/hebrew/Holam.pdf),
with adjusted options and clarifications to meet the objections
expressed at that meeting by UTC members.
There are two ways of indicating vowels in Hebrew script, which may
be
used either separately or in combination. The ancient system, which
does not fully distinguish the vowel sounds, is to insert the Hebrew
letters ALEF, HE,
VAV and YOD, which can therefore
function as vowels as well as consonants. When "silent", i.e. used to
indicate vowels,
these letters are known as mothers of
reading (imot qeri'a or
ehevi in Hebrew, matres lectionis in Latin). In the
early mediaeval period several different systems of pointing were
introduced to specify the vowel sounds more precisely. Only one of
these systems, the Tiberian system, is in current use, and this is the
only one currently encoded in Unicode. (Proposals for the other systems
are currently being prepared.) This system is normally used for
the biblical and other ancient texts (although not for synagogue
scrolls, which are unpointed) and for some modern Hebrew texts. Most
modern Hebrew is unpointed, but makes good use of mothers of reading.
One of the Tiberian vowel points, U+05B9 HEBREW POINT HOLAM,
consists of a dot
usually
written above the left side of a Hebrew base character. This
represents a long O sound pronounced after the base character. When
there is no associated mother of
reading, this way
of writing a long O sound is known as Holam
Haser, i.e. Defective Holam.
In old manuscripts, the dot is often positioned over the space
between the preceding and following base characters, and sometimes
above the right side of the following (to the left) base character. In
printed texts,
the regular position of the dot is above the left side of the preceding
base character.
In pointed Hebrew text the same vowel is often represented both by a
vowel point and by a mother of
reading. The latter has no vowel point of its own, because the
vowel is associated with the preceding consonant. The commonest mother of
reading for a long O sound is VAV. Therefore the
combination of HOLAM with a VAV mother of reading is common in
pointed texts. This
combination is known as Holam Male
(Male is pronounced as two
syllables, mah-leh), i.e. Full Holam.
The HOLAM dot is logically associated with the
preceding base character, the consonant for which it indicates the
vowel sound; the VAV
is redundant because the vowel is fully indicated by the HOLAM.
Thus the VAV may be considered silent, corresponding to
the general rule for pointed texts that a non-final base character with
no point is silent; an alternative analysis is that the VAV
and the HOLAM together indicate the vowel sound.
In the oldest manuscripts which use this pointing scheme, dating from
the 10th century CE, the dot
was positioned above the space between the preceding base character and
the VAV, but it has gradually shifted on to the
redundant VAV.
In modern high quality typography the dot is positioned above the VAV,
usually above its right edge or its centre. However, the HOLAM
dot is not shifted on to a following VAV when the VAV
is not silent but consonantal, except sometimes in rendering the divine
name.
The difficulty arises because VAV can also be a consonant, and as such can be followed, like every other consonant, by Holam Haser (or by Holam Male, but this causes no special difficulty). Therefore the HOLAM dot can combine in two logically different ways with VAV. The combination of VAV with Holam Haser is known as Vav Haluma, and is pronounced VO (or in some traditions WO). A combination of VAV with HOLAM could be a Holam Male, where the VAV is silent and the letter VAV and the point HOLAM together represent the vowel; or it could be the letter VAV with a Holam Haser, where the VAV is a consonant and the HOLAM point is a vowel. There is no difference in pronunciation between Holam Male and Holam Haser.
In high quality typography, especially of the Hebrew Bible and other religious texts, of educational materials, and of poetry, a careful distinction is made between Holam Male and Vav Haluma: in Holam Male, the HOLAM dot is positioned above the right side of the VAV, or sometimes centred above the VAV; but in Vav Haluma, Holam Haser is rendered in its normal position above the left side of VAV. This seems to have been the original practice, as witnessed in manuscripts and printed editions from the 10th to 19th centuries CE. But, because VAV is a rather narrow letter, and because Vav Haluma is rare in modern Hebrew (in which long O is usually written as Holam Male), many modern typographers of general texts make no distinction, rendering both Holam Male and Vav Haluma by VAV with a HOLAM dot usually centred above it.
The distinction between Holam Male
and Vav Haluma is an
important and
semantically significant one. This is especially true for religious
texts; the
distinction is made in most Hebrew Bible editions, and in texts quoting
from the Bible. It is also important in educational materials and in
poetry, wherever the exact pronunciation must be marked unambiguously.
See the
examples in Figures 1, 2 and 3 below, in which Holam Male and Vav Haluma are distinguished in six
Hebrew Bible editions and in two other works.
This distinction is not a rare one. Holam Male is very common in the
Hebrew Bible, occurring about 34,808 times or in about 13% of all
words. Vav Haluma is much
less
common, occurring about 421 times.
| Codex Leningradensis (1006-7) | Lisbon Bible (1492) | Rabbinic Bible (1524-5) |
| Ginsburg/BFBS edition (1908) | Biblia Hebraica Stuttgartensia (1976) | Stone edition of Tanach (1996) |
Figure 1: Holam Male (marked in red) and Vav
Haluma (marked in blue)
distinguished in ancient
and modern editions of the Hebrew Bible - these words are from Genesis
4:13.
(If the colours are not visible: In each image, the third base
character from the right, with the
dot above its right side or its centre, is Holam Male; the third base
character from the left, with the dot above
its left side, is Vav Haluma.)
Figure 2: Holam Male (left, twice, red, from
p.529) and Vav Haluma
(right, blue, from p.528) contrasted
in Keil & Delitzsch Commentary
on the Old Testament,
vol.1, reprint by Hendrickson, 1996 (Hebrew words quoted in English
text).

Figure 3: Holam Male (right Hebrew word, red) and Vav
Haluma (left word, blue)
contrasted
in Langenscheidt's Pocket Hebrew
Dictionary, p.243.
Figure 4: Comparison of positions of HOLAM after HE and with VAV in Biblia Hebraica Stuttgartensia. Left: regular Holam Male, from Joshua 10:3. Centre: HOLAM dot not shifted on to consonantal VAV, as this is not Holam Male, from Ezekiel 7:26. Right: HOLAM dot shifted to Holam Male position on a consonantal VAV in the divine name, although this is not Holam Male, from Exodus 13:15.

Figure 5: Holam Male (red) written with a different glyph from a
regular VAV (blue),
from Siddur Tikkun Meir Hashalem,
R. Greenfield, 1982.
The Unicode Hebrew block is based on the Israeli national standard
SI 1311. This standard was originally designed for unpointed modern
Hebrew texts, although later extended to cover points (SI 1311.1) and
accents (SI 1311.2) (see http://qsm.co.il/Hebrew/stdisr.htm
for further details), but was not designed for full support of biblical
Hebrew. As a result there are some minor inadequacies in the Unicode
support for biblical Hebrew.
The most significant of these inadequacies, because it is the only
one which affects the vowel points rather than only the accents, is
that there is no support for the distinction between Holam Male and Vav Haluma. There is a single VAV
character and a single HOLAM character, and only one
way of combining these two, the sequence <VAV, HOLAM>,
which is apparently intended to be used for both Holam Male
and Vav Haluma. There is
thus no defined way of distinctively encoding either Holam Male or Vav Haluma.
The alphabetic presentation form U+FB4B HEBREW LETTER VAV WITH HOLAM cannot be used for Holam Male distinct from Vav Haluma, because it is canonically equivalent to the sequence <VAV, HOLAM>, i.e. it has a canonical decomposition (which cannot be changed) to 05D5 05B9. It is included in Unicode for compatibility purposes.
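The canonical equivalence described above can be checked mechanically. The following is a minimal sketch using Python's standard unicodedata module; the variable names are illustrative only and are not part of the proposal.

```python
import unicodedata

VAV = "\u05D5"             # HEBREW LETTER VAV
HOLAM = "\u05B9"           # HEBREW POINT HOLAM
VAV_WITH_HOLAM = "\uFB4B"  # HEBREW LETTER VAV WITH HOLAM (presentation form)

# The presentation form has a canonical decomposition to <VAV, HOLAM>.
decomposition = unicodedata.decomposition(VAV_WITH_HOLAM)

# Both normalization forms replace the presentation form with the
# two-character sequence, so it cannot carry a distinct meaning.
nfd = unicodedata.normalize("NFD", VAV_WITH_HOLAM)
nfc = unicodedata.normalize("NFC", VAV_WITH_HOLAM)

print(decomposition)
print(nfd == VAV + HOLAM, nfc == VAV + HOLAM)
```

Because any normalizing process may silently replace U+FB4B with <VAV, HOLAM>, text that relied on the presentation form to mark Holam Male would lose the distinction.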
Because there is a real need to
distinguish between Holam Male
and Vav Haluma, but there is
no standard way of doing so, various ad hoc solutions have been used by
text providers and by font developers. The Hebrew Bible text from
Mechon Mamre (at Genesis 4:13, http://www.mechon-mamre.org/c/ct/c0104.htm#13)
uses <VAV, HOLAM> for Holam Male and <VAV,
ZWJ, HOLAM> for Vav Haluma. The "alpha release"
text at http://whi.wts.edu/WHI/Members/klowery/eL/leningradCodex-alpha.zip
and the text at http://users.ntplx.net/~kimball/Tanach/Genesis.xml
use (again at Genesis 4:13) <HOLAM, VAV>
(actually <HOLAM, accent, VAV>
according to canonical ordering) for Holam
Male and <VAV, HOLAM> for Vav Haluma, and this is also the
encoding recommended in the documentation for the fonts SBL Hebrew and
Ezra SIL. There is however a larger body of existing data, including
pointed modern Hebrew and some biblical texts (e.g. the one at http://www.anastesontai.com/b-cantilee/en-cant.asp?A=1&listeB=4),
in which Holam Male and Vav Haluma are not distinguished
but are both encoded as <VAV, HOLAM>.
To avoid this inconsistency and potential confusion, it is proposed
here that
the UTC should specify distinctive character sequences for
representation of Holam Male
and Vav Haluma, for use when
these two
need
to be distinguished. Various options for these distinctive sequences
are
discussed below and in the Appendix. It is noted that
although Option B1 in the Appendix can technically be chosen without
UTC
involvement, because it involves only a spelling rule, the other
options do require UTC approval as they involve sequences with ZWJ
or ZWNJ, or variation sequences, or new characters.
The options for distinctive sequences have been chosen in an attempt
to meet the following design goals and preferences which have been
expressed:
In this regard there is a specific issue concerning use of ZWJ and ZWNJ. According to TUS version 4.0.0, these characters are not permitted within combining character sequences, but according to the (preliminary) minutes of the February 2004 UTC meeting (http://www.unicode.org/consortium/utc-minutes/UTC-098-200402.html) this restriction is being lifted:
The original version of this proposal,
presented to the June 2004 UTC meeting, relied on this consensus, and
in several of its options (equivalent to A1a/b/c and B2a in the
Appendix), including the
preferred options, ZWJ and ZWNJ were
used in combining character sequences. However, at the June meeting UTC
members seemed reluctant to accept such use of ZWJ and ZWNJ.
Therefore in this version of the proposal new options (A3a/b/c, A4a/b/c
and B3a in the Appendix) have been added in which ZWJ
and ZWNJ
are used strictly as defined in TUS
version 4.0.0, as well as an option (A2a in the Appendix) in which a
variation selector
is used instead of ZWJ and according to the TUS rules for use of variation
selectors.
The sequence <VAV, HOLAM> should continue to be a valid representation of both Holam Male and Vav Haluma when there is no need to distinguish them, as commonly in modern Hebrew text.
Specifically, the new sequences for Holam Male and, if applicable, Vav Haluma should be displayed
legibly and as far as possible correctly, although necessarily without
every fine typographical distinction, by existing rendering systems and
fonts which currently display Hebrew without distinguishing Holam Male from Vav Haluma.
This design goal is feasible if careful use is made of default ignorable characters such as ZWJ, ZWNJ and variation selectors, which according to existing Unicode principles should be ignored by rendering systems and fonts which do not recognise specific sequences using these characters.
This unusual design goal requires some
justification. An important motivation for bringing this issue to the
UTC as a matter of some urgency is the proliferation of ad hoc
solutions described above. These have been developed to meet a
perceived need to make available to the general public, on the Internet
and by other means, standard electronic texts of the Hebrew Bible and
other ancient Hebrew texts. The proposers consider it important that
this proliferation is stopped as quickly as possible. The first
requirement for stopping such proliferation is that a standard
representation of distinctive Holam
Male is agreed, and that is the main purpose of this proposal.
However, proliferation will be halted only when the new standard
representation becomes widely supported, at least to a sufficiently
close approximation to satisfy most users. There is also a strong
resistance to solutions which formalise
distinctions between ancient and modern Hebrew; modern Hebrew readers
are likely to reject on these grounds any solution which makes the
Hebrew Bible text unreadable on their existing systems. Therefore
priority is given
in this proposal to options which can already be rendered at least
approximately by existing rendering systems and fonts, without the need
to wait for a number of years for updated fonts to be installed
worldwide.
Note that this distinction differs from
some others made in Unicode, for example between HYPHEN
and MINUS, in that it must be made not only for special
typographical purposes. For example, whereas HYPHEN and
MINUS do not need to be distinguished in general purpose
electronic texts, Holam Male
does need to be distinguished from Vav
Haluma in some such texts because the distinction affects the
interpretation and pronunciation of the text; therefore with certain
texts it is not an optional matter. There are also special issues of
integrity and authority with the biblical text which make it
undesirable that different versions of the text should be distributed
for different groups of end users. Modern Hebrew readers require the
ability to view the full Hebrew text with all the points, accents and
fine distinctions, even though they are not able to understand all of
these distinctions.
Other processes should fall back to treating Holam Male and Vav Haluma as identical when no deliberate distinction is being made. Thus, for example, the new sequences for Holam Male and, if applicable, Vav Haluma should by default collate together with <VAV, HOLAM> except at the binary level.
The choice of sequence should meet the objections of UTC members to the options in the original June 2004 proposal.
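The fallback goal stated above, that the new sequences compare equal to <VAV, HOLAM> except at the binary level, can be illustrated with a small sketch. This is not the Unicode Collation Algorithm; the function fallback_key is a hypothetical helper which simply drops ZWJ, ZWNJ and the variation selectors U+FE00..U+FE0F before comparison, and the sample sequence uses VARIATION SELECTOR-1 purely as an example.

```python
VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ZWJ = "\u200D"    # ZERO WIDTH JOINER
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
VS1 = "\uFE00"    # VARIATION SELECTOR-1

# Marker characters that a non-distinguishing process would ignore
# (assumption for this sketch: only these, not every default ignorable).
IGNORED = {ZWJ, ZWNJ} | {chr(cp) for cp in range(0xFE00, 0xFE10)}


def fallback_key(text: str) -> str:
    """Hypothetical comparison key: the text with marker characters removed."""
    return "".join(ch for ch in text if ch not in IGNORED)


undifferentiated = VAV + HOLAM
marked = VAV + VS1 + HOLAM  # one possible distinctive sequence

# Equal under the fallback key, different at the binary level.
print(fallback_key(marked) == fallback_key(undifferentiated))
print(marked == undifferentiated)
```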
There is no single solution which ideally meets all of these design
goals. For this and other reasons many options have been considered for
representation of Holam Male
and Vav Haluma, offering
various trade-offs between the design goals. All the options considered
worthy of serious consideration are listed in the Appendix below, with
a comparative table of their advantages and disadvantages. Because
there is no clear consensus among the user community on any one option,
a small number of these options have been chosen as preferred options
and are described in the following section.
The following preferred options are presented to the UTC for
consideration as the preferences of Hebrew users among the Unicode
community, and as meeting the objections of UTC members to the options
in the original June 2004 proposal.
In these preferred options ZWJ and variation
selectors are used only as defined in The
Unicode
Standard version 4.0.1, e.g. ZWJ is not used
within a combining character sequence. However, in Preferred Options 1
and 2 ZWJ or the variation selector does have semantic
significance, in that Holam Male
is semantically as well as graphically distinct from Vav Haluma. In Preferred Option 3
ZWJ does not have clear semantic significance because the distinction
between Holam Male and Holam Haser followed by consonantal
VAV can be understood as only graphical, and so this
option may be theoretically preferable.
Also in all of these preferred options the
recommended encoding for Vav Haluma
is simply <VAV,
HOLAM>. Thus Vav
Haluma is identified with undifferentiated VAV
with HOLAM.
Although Vav Haluma is less
common than Holam Male, this
corresponds to the regular use of HOLAM with other
Hebrew consonants; this is the reason for proposing the specially
marked, and in most cases longer,
encoding for the more common case. For each of these preferred options
there are two alternatives in the Appendix, one (options ending in b
rather than a) in which Holam Male
is identified with undifferentiated VAV
with HOLAM and a distinctive sequence is used for Vav Haluma, and one (options ending
in c) in which a full three-way distinction is made.
It is the position of the proposers that it is undesirable to define a new character for representation of Holam Male, because this is in conflict with the design goal of legibility with existing fonts and rendering systems. Less Preferred Options 4 and 5, which are the least objectionable of the new character options, are listed here with the preferred options to allow the UTC to consider new character options alongside the other preferred options.
This is Option A4a in the Appendix.
This option effectively takes Holam
Male as a variant form of a grapheme cluster which is more
closely connected to the preceding grapheme cluster than the default
case, Vav Haluma or
undifferentiated VAV with HOLAM. This
more connected rendering is indicated by inserting ZWJ
between the two combining character sequences. Graphically, the more
closely connected rendering is indicated by the shift of the HOLAM
dot towards the preceding grapheme cluster. Logically, Holam Male is more closely
connected to what comes before because it is the vowel associated with
the previous consonant, whereas Vav
Haluma is a separate syllable.
This is Option A2a in the Appendix.
This option effectively takes the VAV in Holam Male as a graphical variant
of a regular VAV. This difference is not generally
visible in the actual glyph; see however the slight glyph difference in
Figure 5 above. The graphical difference is rather that the variant VAV
is positioned differently relative to the HOLAM dot.
This is not simply a way of getting round the prohibition against using
variation selectors with combining characters; it is genuinely true,
from a theoretical viewpoint, that Holam
Male and Vav Haluma
are made up of the same HOLAM character combined with
different varieties of VAV, one a vowel and the other a
consonant. See also the discussion of Less Preferred Option 5, in which
a new character is used instead of a variation sequence.
This is Option B3a in the Appendix.
In this option the dot in Holam
Male
is represented according to its logical association as HOLAM
combined with the preceding base character. ZWJ is
inserted between this combining character sequence and the following VAV
to indicate a more closely connected rendering or a kind of ligature
between the grapheme clusters, in which the HOLAM dot
which is logically part of the first grapheme cluster is graphically
positioned above the second one.
One problem with this option is that canonical reordering and
normalisation may cause HOLAM to be separated from <ZWJ,
VAV>, in practical cases by as many as three other
combining characters (DAGESH, SHIN DOT
or SIN DOT, and an accent). This is not a difficulty in
theory, but it may considerably complicate practical implementation.
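The reordering problem can be reproduced with a short sketch using Python's standard unicodedata module. The base character and accent chosen here (SHIN and QADMA) form an artificial sequence selected only to show the effect; they are assumptions of this example and are not taken from a particular biblical word.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT
DAGESH = "\u05BC"    # HEBREW POINT DAGESH OR MAPIQ
HOLAM = "\u05B9"     # HEBREW POINT HOLAM
QADMA = "\u05A8"     # HEBREW ACCENT QADMA
ZWJ = "\u200D"       # ZERO WIDTH JOINER
VAV = "\u05D5"       # HEBREW LETTER VAV

# As entered: HOLAM is placed last, immediately before <ZWJ, VAV>.
entered = SHIN + DAGESH + SHIN_DOT + QADMA + HOLAM + ZWJ + VAV

# Canonical ordering sorts the marks by combining class, and HOLAM
# has the lowest class of the four, so it moves next to the base.
normalized = unicodedata.normalize("NFD", entered)

for ch in normalized:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

# Number of characters now standing between HOLAM and ZWJ.
gap = normalized.index(ZWJ) - normalized.index(HOLAM) - 1
print(gap)
```

An implementation matching <HOLAM, ZWJ, VAV> therefore has to look past intervening marks rather than test for three adjacent characters.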
This is Option C2a in the Appendix.
In this option a new combining character is defined, HEBREW
POINT RIGHT HOLAM, which is to be used only in combination with
VAV
to form Holam Male.
The existing HOLAM character is to be used only for Holam Haser, when combined with any
Hebrew consonant, and for undifferentiated VAV with HOLAM.
A significant problem with this option is that it makes an arbitrary
distinction between two types of HOLAM, when in fact
from a logical viewpoint there is only one HOLAM which
may be combined with two varieties of VAV. This option
also has bad fallback behaviour with existing fonts and rendering
engines, in that the very common Holam
Male is completely illegible.
This option is based on the observation that Holam Male differs from Vav Haluma not in the HOLAM
but in the VAV. Therefore it is theoretically
preferable to define a new VAV character rather than a
new HOLAM
character. Unicode does not encode distinctions between consonants and
vowels when there is no graphical distinction; thus there is only one LATIN
SMALL LETTER Y. However, there is a graphical distinction
between the VAVs in Holam
Male and Vav Haluma,
in that they are positioned differently relative to HOLAM;
also a distinctive VAV glyph is occasionally used in Holam Male, as shown in Figure 5.
There is thus justification for encoding a separate character HEBREW
LETTER VAV VOWEL, for use primarily as the base character in Holam Male, and possibly also as
the base character in Vav Shruqa
(<VAV, DAGESH> functioning as a
vowel).
This option shares with Less Preferred Option 4 the disadvantage of bad fallback behaviour.
This table summarises the advantages and disadvantages of each of
the preferred and less preferred options. Further details are given in
the fuller descriptions of the options in the Appendix.
| Preferred Option | Summary | Fallback Behaviour | Advantages | Disadvantages |
| 1 (A4a) | Holam Male = <ZWJ, VAV, HOLAM> | Excellent | Use of ZWJ corresponds to logical structure of script; best fallback behaviour | ZWJ used with semantic significance; long sequence for a common character |
| 2 (A2a) | Holam Male = <VAV, variation selector, HOLAM> | Excellent | Doesn't use ZWJ or ZWNJ; best fallback behaviour | Variation selector used with semantic significance; long sequence for a common character |
| 3 (B3a) | Holam Male = <HOLAM, ZWJ, VAV> | Legible | Best fit to the logical structure of Hebrew script; ZWJ used as defined in TUS 4.0.1 | Long sequence for a common character; difficulties with canonical reordering |
| 4 (C2a) | New character RIGHT HOLAM | Holam Male illegible | Doesn't use ZWJ, ZWNJ or variation sequence | Bad fallback behaviour; unity of HOLAM lost |
| 5 (C4) | New character VAV VOWEL | Holam Male illegible | Doesn't use ZWJ, ZWNJ or variation sequence | Bad fallback behaviour |
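For concreteness, the three sequence-based options in the table can be written out as code points. The following sketch is illustrative only: the function holam_male is a hypothetical helper, HE is used merely as a sample preceding consonant, and VARIATION SELECTOR-1 is assumed for Option 2 as suggested in the Appendix.

```python
HE = "\u05D4"     # HEBREW LETTER HE (sample preceding consonant only)
VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ZWJ = "\u200D"    # ZERO WIDTH JOINER
VS1 = "\uFE00"    # VARIATION SELECTOR-1 (assumed choice of selector)

# In all three options Vav Haluma keeps the plain sequence.
VAV_HALUMA = VAV + HOLAM


def holam_male(option: int, consonant: str = HE) -> str:
    """Hypothetical helper: a consonant followed by Holam Male."""
    if option == 1:   # A4a: ZWJ between the two combining character sequences
        return consonant + ZWJ + VAV + HOLAM
    if option == 2:   # A2a: variation sequence on the VAV
        return consonant + VAV + VS1 + HOLAM
    if option == 3:   # B3a: HOLAM stays on the preceding consonant
        return consonant + HOLAM + ZWJ + VAV
    raise ValueError("only options 1 to 3 are sequence-based")


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for option in (1, 2, 3):
    print(option, code_points(holam_male(option)))
```

Options 1 and 2 keep VAV before HOLAM, which is why a renderer that ignores ZWJ or the variation selector still shows the undifferentiated combination; in Option 3 the HOLAM is attached to the preceding consonant.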
[*** This section to be rewritten completely ***]
Assuming that the objections to sequences with ZWJ
or ZWNJ between base characters and combining
characters no longer apply, all of the options above can be considered.
Theoretically the neatest of these is Option A1a. It is also easy to
implement in practice. The only significant objection to it is the
2% increase in the length of the text. But that objection should not be
given too much weight, given that storage is cheap and compression can
be used for transmission.
Options A1b, A1c and B2a are considered to be acceptable
alternatives
to Option A1a. Option B1 is rejected because its apparent simplicity
masks serious complications. And all of the new character solutions are
rejected because of their incompatibility with existing fonts and
implementations.
In the options with a final "a", and in Options B1 and C1, the recommended encoding for Vav Haluma is simply <VAV, HOLAM>. Thus Vav Haluma is identified with undifferentiated VAV with HOLAM. Although Vav Haluma is less common than Holam Male, this corresponds to the regular use of HOLAM with other Hebrew consonants; this is the reason for proposing the specially marked, and in most cases longer, encoding for the more common case.
In the options with a final "b" Holam Male is identified with undifferentiated VAV with HOLAM. These have the advantage that the longer and more complex sequence or the new character is used for the less common combination, Vav Haluma (or with the logical structure options Holam Haser followed by consonantal VAV), but the disadvantage that Holam Haser is treated differently when adjacent to consonantal VAV from when adjacent to all other Hebrew consonants.
The options with a final "c" allow typesetters to make a three-way distinction, distinguishing undifferentiated VAV with HOLAM both from Holam Male and from Vav Haluma (or both from Holam Male and from Holam Haser followed by consonantal VAV). It is uncertain whether this is ever necessary, and so whether the extra complexity of these options can be justified.
Following this list of options and a summary of their advantages and
disadvantages, the preferred encoding
and proposal to the UTC is given.
These options are called "graphical structure solutions" because
they represent the dot in Holam Male
according to its graphical association with the VAV.
[Was Option A1 in the original proposal.]
This option effectively takes Holam Male as a variant of <VAV, HOLAM> with "a more connected rendering" (to quote from The Unicode Standard, version 4.0, section 15.3, p.390). This more connected rendering is indicated by inserting U+200D ZERO WIDTH JOINER (ZWJ) between VAV and HOLAM. This option was earlier rejected because ZWJ and ZWNJ were not permitted between a base character and a combining character. But this restriction was partially relaxed at the February 2004 UTC meeting. This option depends on a small further relaxation of this restriction.
This encoding has the advantage that the fallback behaviour should be automatically as required. One disadvantage is that as a layout control character ZWJ is intended for making rendering distinctions which have no other semantic significance. However, there are already several defined uses of ZWJ and ZWNJ with Arabic and Indic scripts which do have other semantic significance. There are similar objections to any possible variant of this option using Variation Selectors.
There are no known existing implementations of this option. However,
it would be simple to support in fonts.
This option, as well as Options B1 and B2, implies that undifferentiated VAV with HOLAM will be rendered like Vav Haluma, not like Holam Male. In fact it seems that many typesetters who do not generally distinguish Vav Haluma from Holam Male render the HOLAM dot above VAV further to the right than the HOLAM dot indicating Holam Haser when used with other letters, for example with YOD whose upper part is usually the same as that of VAV. This suggests that if in a particular text these typesetters did need to distinguish Vav Haluma from Holam Male, the glyph they would use for Vav Haluma would not be the one which they used for undifferentiated VAV with HOLAM.
Another disadvantage of this option is that each Holam Male consists of three
Unicode characters, including ZWJ which takes three
bytes in UTF-8. This increases the size of the encoded Hebrew Bible,
relative to Options A1b and B1 (in which Holam Male consists of two
characters), by 34,000 characters and more than 100,000
UTF-8 bytes, i.e. around 2% of its total length.
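The size estimate can be reproduced with a few lines of arithmetic. This sketch uses the count of about 34,808 occurrences of Holam Male quoted earlier in this proposal; the variable names are illustrative.

```python
VAV = "\u05D5"    # HEBREW LETTER VAV
ZWJ = "\u200D"    # ZERO WIDTH JOINER
HOLAM = "\u05B9"  # HEBREW POINT HOLAM

plain = VAV + HOLAM           # two-character encoding
with_zwj = VAV + ZWJ + HOLAM  # Option A1a encoding of Holam Male

# Each Hebrew character takes two bytes in UTF-8; ZWJ takes three.
plain_bytes = len(plain.encode("utf-8"))
zwj_bytes = len(with_zwj.encode("utf-8"))
extra_bytes_each = zwj_bytes - plain_bytes

occurrences = 34_808  # approximate count of Holam Male in the Hebrew Bible
extra_characters = occurrences * (len(with_zwj) - len(plain))
extra_bytes = occurrences * extra_bytes_each

print(plain_bytes, zwj_bytes, extra_bytes_each)
print(extra_characters, extra_bytes)
```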
[Was Option A2 in the original proposal.]
This option differs from
Option A1a in that the
simple
sequence <VAV, HOLAM> is used for Holam Male, rather than for Vav Haluma. The proposed sequence
for Vav Haluma uses U+200C
ZERO WIDTH
NON-JOINER (ZWNJ), because Vav
Haluma is a less
connected rendering than Holam Male.
This option has the advantage
that the longer and more complex sequence is used for the less common Vav Haluma, but the disadvantage
that consonantal VAV is treated differently from all
other Hebrew consonants in how it combines with Holam Haser. The fallback behaviour
of this option should be as required.
This sequence was rejected earlier for the same theoretical
reasons as Option A1a, but for the same reasons it can now be
considered acceptable.
This option implies that undifferentiated VAV with HOLAM will be rendered like Holam Male, not like Vav Haluma. It may therefore represent more closely than Options A1a, B1 or B2 the practice of typesetters who do not normally distinguish Vav Haluma from Holam Male but may have to for certain special texts.
The encoding already used by Mechon Mamre is similar to this option except that ZWNJ is replaced by ZWJ. This encoding is apparently supported by some existing fonts and rendering engines, but this support may be largely accidental, because the ZWJ unintentionally breaks a rule to position HOLAM centrally over VAV. The long term encoding of text should not be determined in this way by unintended features of current implementations.
[Was Option A3 in the original proposal.]
This option differs from Options A1a and A1b in that explicit
sequences with ZWJ or ZWNJ are used to distinguish both Holam Male and Vav Haluma from the
undifferentiated VAV with HOLAM. Again,
the fallback behaviour of this option
should be as required. Otherwise, this option seems to have the
disadvantages of both Options A1a and A1b.
This is Preferred Option 2. [As proposed by Ted Hopp 2004-06-18.]
This option differs from Option A1a in that a variation selector
(the specific proposal is for U+FE00 VARIATION SELECTOR-1,
or VS1, but an alternative variation selector would be
acceptable) is used in place of ZWJ. The point has
been made that in Options A1a-A1c ZWJ and ZWNJ
are used where a variation selector is more appropriate. This option is
intended to respond to that point. On the one hand, it can be argued
that the variation sequence <VAV, VS1>
should indicate a variant form of VAV rather than a
variant positioning of the HOLAM dot. On the other
hand, the logical difference between Holam
Male and Vav Haluma is
not so much in the HOLAM as in the VAV.
At the glyph level, the VAV in Holam Male differs from a regular
consonantal VAV in having a different attachment point
for the HOLAM dot. There is also occasional use of a
slightly different VAV glyph in Holam Male, as in Figure 5 above.
There are possible corresponding Options A2b and A2c in which the VAV
in Vav Haluma is represented
by a variation sequence, either instead of or as well as the VAV
in Holam Male. These options
do not seem to have any real advantages, and so are not described
further here.
Arguably it would make more sense to use a variation selector with HOLAM
rather than with VAV, but the definition of variation
selectors does not allow them to be used with combining characters.
[A3 set as proposed by Mark Shoulson 2004-06-18.]
The A3 and A4 options differ from the A1 options in that ZWJ
and ZWNJ are used only between combining character
sequences, and not within them. This corresponds to the usage of these
characters defined in The Unicode
Standard version 4.0.1, and avoids both the use of combining character
sequences which are technically defective and reliance on the extended mechanisms
tentatively accepted at the February 2004 UTC meeting. In these options
ZWJ and ZWNJ are used, according to the definitions in TUS 4.0.1 section 15.2, to indicate
renderings in which whole combining character sequences are
respectively more or less closely connected in rendering.
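The difference between the A1 options and the A3 and A4 options can be checked mechanically. The following is only a minimal sketch: the helper name `joiner_inside_sequence` is hypothetical, and it assumes that "within a combining character sequence" can be approximated as "a ZWJ or ZWNJ immediately followed by a combining mark".

```python
import unicodedata

ZWNJ, ZWJ = "\u200C", "\u200D"
VAV, HOLAM, TAV = "\u05D5", "\u05B9", "\u05EA"


def joiner_inside_sequence(text: str) -> bool:
    """Return True if a ZWJ or ZWNJ is immediately followed by a combining mark.

    This is a simplification (an assumption of this sketch): such a joiner
    sits inside a combining character sequence rather than between two.
    """
    for current, following in zip(text, text[1:]):
        if current in (ZWJ, ZWNJ) and unicodedata.category(following).startswith("M"):
            return True
    return False


print(joiner_inside_sequence(VAV + ZWJ + HOLAM))         # Option A1a: True
print(joiner_inside_sequence(VAV + HOLAM + ZWNJ + TAV))  # Option A3a: False
print(joiner_inside_sequence(TAV + ZWJ + VAV + HOLAM))   # Option A4a: False
```

TAV is used here only as an arbitrary neighbouring base character.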
Option A3a is based on an understanding of Holam Male as a rendering of VAV
with HOLAM which is less connected with the following
base character than Vav Haluma.
It is therefore distinguished from Vav
Haluma by insertion of ZWNJ before the following
base character.
This option differs from Option A3a in that Holam Male is taken as the default
case, and Vav Haluma as a
special case in which the VAV with HOLAM
is taken as more closely connected with the following base character.
One advantage of this is that Vav
Haluma is not normally used word finally, at least in the Hebrew
language, whereas Holam Male
is commonly word final; and so a theoretically problematic common use
of word final ZWNJ is avoided. This option also has the
same advantages and disadvantages relative to Option A3a as Option A1b
does relative to Option A1a.
This option relates to Options A3a and A3b in the same way as Option
A1c relates to Options A1a and A1b.
This is Preferred Option 1. [A4 set as proposed by Peter Kirk
2004-06-19.]
The A3 options are based on VAV with HOLAM
being either more or less connected with the following base character and its
combining character sequence, but this difference in connection is not a
real one. There is, however, a real difference in how Holam Male and Vav Haluma are connected with the preceding base character and its
combining character sequence. Within the logical structure of the
Hebrew abjad, Holam Male acts as the vowel for
the preceding base character and as part of the same syllable; indeed,
if it were a separate character (as in Option C1) a good case could be
made for defining it as a spacing combining mark, comparable to such
marks in Indic scripts. It thus has a closer logical connection with
the preceding base character than does Vav Haluma, which represents a
separate syllable. Graphically, the closer connection is commonly
indicated by the positioning of the HOLAM dot over the
space between the base characters.
Option A4a is based on this understanding of Holam Male as a rendering of VAV
with HOLAM which is more connected with the preceding base character
and its combining character sequence than Vav Haluma. It is therefore
distinguished from Vav Haluma
by insertion of ZWJ between this and the preceding combining character
sequence.
This option differs from Option A4a in that Holam Male is taken as the default case, and Vav Haluma as a special case in which the VAV with HOLAM is taken as less closely connected with the preceding combining character sequence. This seems to accord less well with the logical structure of the script. This option also has the same advantages and disadvantages relative to Option A4a as Option A1b does relative to Option A1a.
This option relates to Options A4a and A4b in the same way as Option A1c relates to Options A1a and A1b.
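As a rough illustration of why the A options degrade gracefully, the sketch below lists the marked sequences for the ZWJ and ZWNJ based options and checks that each one reduces to the undifferentiated <VAV, HOLAM> once the joiners are removed. It assumes that a renderer which does not support the convention simply ignores ZWJ and ZWNJ; the names `MARKED_SEQUENCES` and `without_joiners` are hypothetical.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
VAV, HOLAM = "\u05D5", "\u05B9"

# Marked sequences for the ZWJ/ZWNJ based A options described in this proposal.
MARKED_SEQUENCES = {
    "A1a Holam Male": VAV + ZWJ + HOLAM,
    "A1b Vav Haluma": VAV + ZWNJ + HOLAM,
    "A3a Holam Male": VAV + HOLAM + ZWNJ,
    "A3b Vav Haluma": VAV + HOLAM + ZWJ,
    "A4a Holam Male": ZWJ + VAV + HOLAM,
    "A4b Vav Haluma": ZWNJ + VAV + HOLAM,
}


def without_joiners(text: str) -> str:
    """Model a renderer that ignores ZWJ and ZWNJ (an assumption of this sketch)."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


for label, sequence in MARKED_SEQUENCES.items():
    # Every marked sequence falls back to the plain <VAV, HOLAM>.
    assert without_joiners(sequence) == VAV + HOLAM
    print(label, [f"U+{ord(ch):04X}" for ch in sequence])
```

The A1c, A3c and A4c options combine the corresponding pairs of sequences, so the same fallback applies to them.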
These options are called "logical structure solutions" because they
represent the dot in Holam Male
according to its logical association with the preceding base character.
In all of these solutions Vav Haluma
and undifferentiated VAV with HOLAM are
represented as <VAV, HOLAM>.
[Was Option B1 in the original proposal.]
In this option Holam Male
is distinguished from Vav Haluma
in that HOLAM is encoded before VAV.
This appears to be a breach of the Unicode rule that combining
characters must follow their associated base characters. But it is not
really a breach of the rule, because the HOLAM in Holam Male can be
understood as logically associated with the preceding base character,
for which it is the associated vowel, and the VAV is a separate silent
letter. On this analysis Holam Male is analogous to Hiriq Male, i.e.
HIRIQ followed by silent YOD, in which the HIRIQ is written below the
preceding base character; also to the sequence of HOLAM with silent
ALEF, which is encoded unambiguously in this order although the HOLAM
is often rendered above the top right side of the ALEF.
With this encoding, the HOLAM is for Unicode purposes linked with the preceding base character in a combining character sequence. The HOLAM will often become separated from the VAV by DAGESH and/or an accent character, because within a combining character sequence DAGESH and accents are sorted after vowel points in canonical ordering and also in the specific orderings recommended for certain fonts.
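The separation caused by canonical ordering can be observed with Python's standard `unicodedata` module. This is a small sketch rather than part of the proposal: MEM and ETNAHTA are arbitrary choices of base character and accent, and the variable names are illustrative.

```python
import unicodedata

MEM, VAV = "\u05DE", "\u05D5"
HOLAM, DAGESH, ETNAHTA = "\u05B9", "\u05BC", "\u0591"

# A consonant with DAGESH and an accent, followed by Holam Male encoded
# as <HOLAM, VAV> according to Option B1.
typed = MEM + DAGESH + ETNAHTA + HOLAM + VAV
normalized = unicodedata.normalize("NFD", typed)

for label, text in (("typed", typed), ("normalized", normalized)):
    print(label, [f"U+{ord(ch):04X}" for ch in text])

# Canonical combining classes determine the order of the marks after MEM.
for ch in (HOLAM, DAGESH, ETNAHTA):
    print(unicodedata.name(ch), unicodedata.combining(ch))
```

After normalization the HOLAM is sorted ahead of DAGESH and the accent, so it is no longer adjacent to the VAV on to which it is to be shifted.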
The fallback behaviour of this encoding, with a font which has not
been set up to work with it, is not ideal but still legible: the Holam Male will be broken up, with
the HOLAM being rendered above the left side of the
preceding base character.
Some existing texts use this encoding, and it is supported in
OpenType fonts
such as SBL Hebrew and Ezra SIL, with Microsoft Windows only. However,
this implementation proved to be very complex, and may be beyond the
capabilities of other rendering systems.
The complicating factor is the rule that Holam Male is not formed, and so HOLAM
is not shifted on to a following VAV, if the VAV
is consonantal and followed by a vowel, except in the divine
name. This rule, which is illustrated in Figure 4 above, is complex and
not entirely conditioned by the immediate glyph or character
environment. In most cases it is possible in principle, although rather
complex, to determine within the font which VAVs are
silent and so may form Holam Male;
the rule is that if VAV is followed by any Hebrew point
or accent it is not silent. But there are two cases where this is not
possible. Firstly, a VAV followed by Holam Male or by Vav Shruqa (i.e. VAV
with DAGESH acting as a vowel; but this combination may
also be consonantal) is consonantal and so cannot form Holam Male, but any attempt to
distinguish these cases within a font is potentially recursive and well
beyond the capabilities of existing rendering systems. (This situation
does not occur in the Hebrew Bible, but it can do in modern Hebrew.)
Secondly, in at least one major edition of the Hebrew Bible, when the
divine name is written with HOLAM (which is in a small
minority of cases) the HOLAM dot is positioned over the
VAV as in Holam Male
although the VAV is consonantal and carries another
vowel point and usually an accent; this case can be distinguished from
a similar word in which the HOLAM is not positioned as
in Holam Male only from the
remote context, in a way which is clearly outside the scope of any
rendering system - see the centre and right hand images in Figure 4.
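The basic rule stated above (a VAV followed by any Hebrew point or accent is not silent) can be sketched as a purely local test. The function names below are hypothetical, and the sketch deliberately covers only the simple rule: it cannot resolve the two problem cases just described, which depend on further or remote context.

```python
import unicodedata

VAV = "\u05D5"


def is_hebrew_mark(ch: str) -> bool:
    """True for a Hebrew point or accent, taken here to be a nonspacing
    mark in the range U+0591..U+05C7 (an assumption of this sketch)."""
    return "\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn"


def vav_may_be_silent(text: str, index: int) -> bool:
    """Apply the local rule: a VAV followed by a point or accent is not silent.

    A True result means only that the VAV is a candidate for Holam Male; a
    VAV followed by Holam Male or Vav Shruqa, and the divine name, are not
    handled correctly.
    """
    if text[index] != VAV:
        raise ValueError("character at index is not VAV")
    following = text[index + 1:index + 2]
    return not (following and is_hebrew_mark(following))


LAMED, MEM, HOLAM, PATAH = "\u05DC", "\u05DE", "\u05B9", "\u05B7"
print(vav_may_be_silent(LAMED + HOLAM + VAV + MEM, 2))          # True
print(vav_may_be_silent(LAMED + HOLAM + VAV + PATAH + MEM, 2))  # False
```

Even this simplified test needs lookahead beyond the HOLAM and VAV themselves, which gives some idea of why the full rule is hard to implement within a font.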
Since it is beyond the reasonable scope of a rendering system to determine in every case whether Holam Male should be formed or not, there is a need to define more specific encodings at least for certain marginal cases. Thus, for example, formation of Holam Male could be inhibited by the sequence <ZWJ, HOLAM, VAV> or <HOLAM, ZWNJ, VAV>, which would indicate Holam Haser followed by consonantal VAV; but this formation could be promoted by the sequence <ZWNJ, HOLAM, VAV> or <HOLAM, ZWJ, VAV>, which would indicate the rendering of the divine name as in the right hand image in Figure 4. The implication of this is that Option B1 does not in fact have the simplicity which it appears to have at first sight.
[Was Option B2 in the original proposal.]
This option differs from Option B1 in that HOLAM is
preceded by ZWNJ to separate it from the preceding
combining character sequence. Again, this is a sequence which was
rejected earlier for the same theoretical
reasons as Option A1a, but for the same reasons it can now be
considered acceptable. The HOLAM is technically and
logically combined with the preceding base character as in Option B1,
but the intervening ZWNJ can be understood as
indicating that it should not be combined graphically.
With this proposal, any accents and other combining characters which
are graphically as well as logically associated with the preceding base
character should be encoded before the ZWNJ. The ZWNJ,
which is in combining class 0, inhibits canonical reordering, and so
these other combining characters will never be moved to between HOLAM
and VAV. The ZWNJ also explicitly
signals that the HOLAM is to be shifted to form Holam Male or as in the divine
name, and so distinguishes
this from the cases in which the HOLAM dot remains on
the
preceding base character before consonantal VAV. This
implies that it is significantly simpler to
implement Option B2 than Option B1.
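The effect of ZWNJ on canonical reordering can be checked with Python's standard `unicodedata` module. The following is an illustrative sketch only; MEM and ETNAHTA are arbitrary choices of base character and accent, and the helper `survives_normalization` is a hypothetical name.

```python
import unicodedata

MEM, VAV = "\u05DE", "\u05D5"
HOLAM, DAGESH, ETNAHTA = "\u05B9", "\u05BC", "\u0591"
ZWNJ = "\u200C"


def survives_normalization(text: str) -> bool:
    """True if canonical decomposition (NFD) leaves the sequence unchanged."""
    return unicodedata.normalize("NFD", text) == text


# Marks belonging to the preceding base character, then Holam Male.
option_b1 = MEM + DAGESH + ETNAHTA + HOLAM + VAV          # <HOLAM, VAV>
option_b2 = MEM + DAGESH + ETNAHTA + ZWNJ + HOLAM + VAV   # <ZWNJ, HOLAM, VAV>

print(unicodedata.combining(ZWNJ))        # 0: ZWNJ blocks reordering across it
print(survives_normalization(option_b1))  # False: HOLAM is moved before DAGESH
print(survives_normalization(option_b2))  # True: HOLAM stays next to VAV
```

Because the ZWNJ has combining class 0, the marks encoded before it and the HOLAM encoded after it are never sorted relative to one another.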
This option has the same disadvantage as Options A1a and A1c that
the
length of a text is significantly increased. Its fallback behaviour
should be the same as that of Option B1.
There are possible corresponding Options B2b and B2c in which ZWJ is inserted before HOLAM and VAV when these do not combine to form Holam Male, either instead of or as well as using this sequence for Holam Male. These options do not seem to have any real advantages, and so are not described further here.
This is Preferred Option 3. [As proposed by Mark Shoulson 2004-06-17.]
This option differs from Option B1 in that HOLAM is
followed by ZWJ. This sequence has the advantage over
the one in Option B2 that ZWJ is used between combining
character sequences, according to the definitions in TUS version 4.0.1. ZWJ
is properly used to indicate a more closely connected rendering of the
two combining character sequences, in that the HOLAM
dot which logically belongs to the former is graphically shifted on to
the latter. ZWJ can be omitted where the HOLAM
dot is not to be shifted, but included in the anomalous cases of the
divine name. Therefore, again, this option is significantly simpler to
implement than Option B1. But it does not have the advantage of Option
B2 of inhibiting canonical reordering, and so the implementation
advantage is less.
This option has the same disadvantage as Options A1a and A1c that
the
length of a text is significantly increased. Its fallback behaviour
should be the same as that of Option B1.
There are possible corresponding Options B3b and B3c in which ZWNJ is inserted before VAV following HOLAM when these do not combine to form Holam Male, either instead of or as well as using this sequence for Holam Male. These options do not seem to have any real advantages, and so are not described further here.
The common factor with these options is that one or more new Unicode
characters are proposed, for use only when Holam
Male is to be distinguished from Vav Haluma. They have the common
disadvantage that they have very poor
fallback behaviour when used with fonts which do not support the new
character. Some experts have commented that any of these solutions has
the effect of making existing uses of HOLAM illegal. In
fact the definitions could be carefully written so that existing uses
are not made illegal but only deprecated. Nevertheless, this effect on
existing texts is a significant argument against any of these new
character solutions.
[Was Option C1 in the original proposal.]
In some ways the simplest option of all is to define a new Unicode
character HEBREW LETTER HOLAM MALE, which might have a
compatibility decomposition to <VAV, HOLAM>.
This would certainly be simple to implement, and would reduce the size
of the encoded text. But it would have no suitable fallback behaviour
with fonts which do not support this new character. This solution also
loses the essential identity of the HOLAM and the VAV
in Holam Male with HOLAM
and VAV in other contexts.
This is Less Preferred Option 4. [Was Option C2 in the original
proposal.]
This is the first of four options based on defining one or two new combining characters for variants of HOLAM. Thus one variant of HOLAM can be used for the dot in Holam Male, and another variant can be used in Vav Haluma. These options are reasonably simple to implement. They have the small advantage over Option C1 that the identity of VAV, though not of HOLAM, is preserved.
In this option, the new combining character is HEBREW
POINT RIGHT HOLAM, and is to be used only in combination with VAV
to form Holam Male.
The existing HOLAM character is to be used only for Holam Haser, when combined with any
Hebrew consonant, and for undifferentiated VAV with HOLAM.
The fallback behaviour is good for Holam
Haser but not for Holam Male.
[Was Option C4 in the original proposal.]
In this option, the new combining character is HEBREW
POINT LEFT HOLAM, and is to be used only in combination with VAV
to form Vav Haluma. The
existing HOLAM character is to be used in combination
with VAV
to form Holam Male, and for Holam Haser in combination with
consonants other than VAV, and for undifferentiated VAV
with HOLAM. The fallback behaviour is
good except for the relatively rare Vav
Haluma, i.e. Holam Haser
with VAV. But this option introduces an entirely
illogical
distinction between Holam Haser
with VAV and Holam
Haser with other
consonants, which is justified neither by character semantics nor by
typography.
In this option two new combining characters are defined: HEBREW
POINT RIGHT HOLAM to be used as in Option C2a and HEBREW
POINT LEFT HOLAM to be used as in Option C2b. The existing HOLAM
character is to be used with VAV only for
undifferentiated VAV with HOLAM. The
fallback behaviour is uniformly bad for all cases of VAV
with HOLAM, and it introduces the same illogical
distinctions as Option C2b. The only advantage of defining a second new
combining character is that it would make possible support for a
three-way distinction in HOLAM positioning for which no
requirement has been demonstrated.
[Was Option C3 in the original proposal.]
This option differs from the C2 options, and indeed from all the
other options in this proposal, in proposing a change in the
representation of HOLAM even when not associated with VAV.
In this option, the new combining character is HEBREW POINT
HOLAM HASER,
and is to be used for Holam Haser
when combined with any Hebrew consonant, not only with VAV.
The existing HOLAM
character is to be used only in combination with VAV to
form Holam Male, and for
every HOLAM if Holam
Male is not differentiated from Vav Haluma. The
fallback behaviour is good for Holam
Male but not for Holam Haser;
this may be preferable to the fallback behaviour of Option C2a because Holam
Male is commoner than Holam
Haser in modern Hebrew.
This is Less Preferred Option 5.
This option is based on the observation that Holam Male differs from Vav Haluma not in the HOLAM
but in the VAV. Therefore it is theoretically preferable
to define a new VAV character rather than a new HOLAM
character. Unicode does not encode distinctions between consonants and
vowels when there is no graphical distinction; thus there is only one LATIN
SMALL LETTER Y. However, there is a graphical distinction
between the VAVs in Holam
Male and Vav Haluma,
in that they are positioned differently relative to HOLAM;
also a distinctive VAV glyph is occasionally used in Holam Male, as shown in Figure 5.
There is thus justification for encoding a separate character HEBREW
LETTER VAV VOWEL, for use primarily as the base character in Holam Male, and possibly also as
the base character in Vav Shruqa.
However, this option shares with all the new character solutions the
disadvantage of bad fallback behaviour.
| Option | Summary | Fallback Behaviour | Advantages | Disadvantages |
| A1a | Holam Male = <VAV, ZWJ, HOLAM> | Excellent | Best fit to the graphical structure of Hebrew script; best fallback behaviour | ZWJ used within combining character sequence and with semantic significance; long sequence for a common character |
| A1b | Vav Haluma = <VAV, ZWNJ, HOLAM> | Excellent | Best fit to the graphical structure of Hebrew script; best fallback behaviour; long sequence only for a rare combination | ZWNJ used within combining character sequence and with semantic significance; arbitrary use of different sequence for Holam Haser in the context of VAV |
| A1c | Holam Male = <VAV, ZWJ, HOLAM> and Vav Haluma = <VAV, ZWNJ, HOLAM> | Excellent | Best fit to the graphical structure of Hebrew script; best fallback behaviour; support for conjectured three-way HOLAM positioning distinction | ZWJ and ZWNJ used within combining character sequence and with semantic significance; long sequence for a common character; arbitrary use of different sequence for Holam Haser in the context of VAV |
| A2a | Holam Male = <VAV, VS1, HOLAM> | Excellent | Doesn't use ZWJ or ZWNJ; best fallback behaviour | Variation selector used with semantic significance; long sequence for a common character |
| A3a | Holam Male = <VAV, HOLAM, ZWNJ> | Excellent | Best fallback behaviour | ZWNJ used arbitrarily with semantic significance; long sequence for a common character |
| A3b | Vav Haluma = <VAV, HOLAM, ZWJ> | Excellent | Best fallback behaviour; long sequence only for a rare combination | ZWJ used arbitrarily with semantic significance; arbitrary use of different sequence for Holam Haser in the context of VAV |
| A3c | Holam Male = <VAV, HOLAM, ZWNJ> and Vav Haluma = <VAV, HOLAM, ZWJ> | Excellent | Best fallback behaviour; support for conjectured three-way HOLAM positioning distinction | ZWJ and ZWNJ used arbitrarily with semantic significance; long sequence for a common character; arbitrary use of different sequence for Holam Haser in the context of VAV |
| A4a | Holam Male = <ZWJ, VAV, HOLAM> | Excellent | Use of ZWJ corresponds to logical structure of script; best fallback behaviour | ZWJ used with semantic significance; long sequence for a common character |
| A4b | Vav Haluma = <ZWNJ, VAV, HOLAM> | Excellent | Best fallback behaviour; long sequence only for a rare combination | ZWNJ used arbitrarily with semantic significance; arbitrary use of different sequence for Holam Haser in the context of VAV |
| A4c | Holam Male = <ZWJ, VAV, HOLAM> and Vav Haluma = <ZWNJ, VAV, HOLAM> | Excellent | Best fallback behaviour; support for conjectured three-way HOLAM positioning distinction | ZWJ and ZWNJ used arbitrarily with semantic significance; long sequence for a common character; arbitrary use of different sequence for Holam Haser in the context of VAV |
| B1 | Holam Male = <HOLAM, VAV> | Legible | Best fit to the logical structure of Hebrew script; doesn't use ZWJ, ZWNJ or variation sequence; existing implementations and texts | Most complex implementation; difficulties with unusual combinations e.g. the divine name; difficulties with canonical reordering |
| B2a | Holam Male = <ZWNJ, HOLAM, VAV> | Legible | Best fit to the logical structure of Hebrew script; implementation much easier than Option B1 | ZWNJ used within combining character sequence, but with only graphical significance; long sequence for a common character |
| B3a | Holam Male = <HOLAM, ZWJ, VAV> | Legible | Best fit to the logical structure of Hebrew script; ZWJ used as defined in TUS 4.0.1; implementation easier than Option B1 | Long sequence for a common character; difficulties with canonical reordering |
| C1 | New character HOLAM MALE | Holam Male illegible | Doesn't use ZWJ, ZWNJ or variation sequence; simplest implementation | Bad fallback behaviour; unity of HOLAM lost |
| C2a | New character RIGHT HOLAM | Holam Male illegible | Doesn't use ZWJ, ZWNJ or variation sequence | Bad fallback behaviour; unity of HOLAM lost |
| C2b | New character LEFT HOLAM | Vav Haluma illegible | Doesn't use ZWJ, ZWNJ or variation sequence; few characters affected by bad fallback behaviour | Unity of HOLAM and of Holam Haser lost; arbitrary use of different character for Holam Haser in the context of VAV |
| C2c | Two new characters RIGHT HOLAM and LEFT HOLAM | All VAV with HOLAM combinations illegible | Doesn't use ZWJ, ZWNJ or variation sequence; support for conjectured three-way HOLAM positioning distinction | Worst fallback behaviour; unity of HOLAM and of Holam Haser lost; arbitrary use of different character for Holam Haser in the context of VAV; unnecessary new character defined |
| C3 | New character HOLAM HASER | Holam Haser illegible | Doesn't use ZWJ, ZWNJ or variation sequence | Bad fallback behaviour; unity of HOLAM lost |
| C4 | New character VAV VOWEL | Holam Male illegible | Doesn't use ZWJ, ZWNJ or variation sequence | Bad fallback behaviour |