Discussion:
bug#21904: date->string duff ISO 8601 format for non-4-digit years
Zefram
2015-11-13 14:22:29 UTC
The date->string function from (srfi srfi-19), used on ISO 8601 formats
"~1", "~4", and "~5", gets the formatting of year numbers wrong when the
year number doesn't have exactly four digits. There are multiple cases:

scheme@(guile-user)> (date->string (julian-day->date 1500000 0) "~1")
$1 = "-607-10-04"
scheme@(guile-user)> (date->string (julian-day->date 1700000 0) "~1")
$2 = "-59-05-05"
scheme@(guile-user)> (date->string (julian-day->date 1720000 0) "~1")
$3 = "-4-02-05"

For year numbers -999 to -1 inclusive, date->string is using the minimum
number of digits to express the number, but ISO 8601 requires the use
of at least four digits, with zero padding on the left. So one should
write "-0059" rather than "-59", for example. Note that this range is
also affected by the off-by-one error in the selection of the year number
that I described in bug #21903, but that's not the subject of the present
bug report. Here I'm concerned with how the number is represented in
characters, not with how the year is represented numerically.

scheme@(guile-user)> (date->string (julian-day->date 1722000 0) "~1")
$4 = "2-07-29"
scheme@(guile-user)> (date->string (julian-day->date 1730000 0) "~1")
$5 = "24-06-23"
scheme@(guile-user)> (date->string (julian-day->date 2000000 0) "~1")
$6 = "763-09-18"

For year numbers 1 to 999 inclusive, again date->string is using the
minimum number of digits to express the number, but ISO 8601 requires the
use of at least four digits. If no leading "+" sign is used then the
number must be exactly four digits, and that is the appropriate format
to use in this situation. So one should write "0024" rather than "24",
for example.

The year number 0, representing the year 1 BC, logically also falls into
this group, and should be represented textually as "0000". Currently this
case doesn't arise in the function's output, because the off-by-one bug
has it erroneously emit "-1" for that year.

scheme@(guile-user)> (date->string (julian-day->date 10000000 0) "~1")
$7 = "22666-12-20"
scheme@(guile-user)> (date->string (julian-day->date 100000000 0) "~1")
$8 = "269078-08-07"

For year numbers 10000 and above, it is necessary to use more than four
digits for the year, and that's permitted, but ISO 8601 requires that
more than four digits are preceded by a sign. For positive year numbers
the sign must be "+". So one should write "+22666" rather than "22666",
for example.

The formatting of year numbers for ISO 8601 purposes is currently only
correct for numbers -1000 and lower (though the choice of number is off
by one) and for year numbers 1000 to 9999 inclusive.
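
The three rules above (zero-pad to at least four digits, always keep "-" on negative years, add "+" once the year needs more than four digits) can be sketched in a few lines. This is a Python illustration only, not the Guile code or the attached patch; the function name iso8601_year is hypothetical, and it takes the year number as given, leaving aside the off-by-one issue from bug #21903.

```python
def iso8601_year(year: int) -> str:
    """Format a year number following the reading of ISO 8601 described above.

    Hypothetical helper for illustration; not part of SRFI-19 or Guile.
    """
    digits = str(abs(year)).zfill(4)  # at least four digits, zero-padded on the left
    if year < 0:
        return "-" + digits           # negative years always carry a "-" sign
    if len(digits) > 4:
        return "+" + digits           # more than four digits needs an explicit "+"
    return digits                     # 0..9999: exactly four digits, no sign


if __name__ == "__main__":
    for y in (-607, -59, 0, 24, 763, 2015, 22666):
        print(y, "->", iso8601_year(y))
```

Under these assumptions the year -59 comes out as "-0059", 24 as "0024", and 22666 as "+22666", matching the examples given in the report.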

-zefram
Zefram
2017-04-20 00:04:37 UTC
A patch to fix this is attached. The ISO 8601 date formats were
implemented by using the ~Y formatter for the year portion, but SRFI-19
doesn't require ~Y to follow ISO 8601, so this raises the question of
whether ~Y should. It could be fixed by changing ~Y to conform to
ISO 8601, retaining the existing factoring of the formatters. Or a
separate internal formatting function could be instituted to do ISO
8601 year formatting, with ~1 et al using that and ~Y left unchanged.
I chose the former strategy, partly because the funny non-linear year
number doesn't seem a useful thing to support in date->string at all,
but more strongly because it's useful to have access to ISO 8601 year
formatting on its own. There isn't any other format specifier for that
job; it looks like SRFI-19 imagines that ~Y will fill that need.

-zefram
Zefram
2017-04-20 00:07:02 UTC
Post by Zefram
I chose the former strategy, partly because the funny non-linear year
number doesn't seem a useful thing to support in date->string at all,
Sorry, this comment is misplaced. It relates to bug#21903; the choice
about ~Y applies to both of these bugs.

-zefram
Mark H Weaver
2018-10-20 22:41:05 UTC
Post by Zefram
$4 = "2-07-29"
$5 = "24-06-23"
$6 = "763-09-18"
This particular subset of bugs, for years 0-9999, was fixed in the
upstream SRFI-19 reference implementation, and so I included the same
fix in commit 5106377a3460e1e35daf14ea6edbe80426347155. That fix pads
the year to have at least 4 characters with the requested padding
character (0 by default). However, it does not handle adding the sign
where mandated by ISO 8601.

As with your related bug <https://bugs.gnu.org/21903>, I think this bug
should be reported to upstream SRFI-19, and hopefully they will take it
seriously. I'm reluctant to have Guile deviate from most (all?) other
SRFI-19 implementations in this respect.

There's also the issue that 'string->date' would need to be fixed to
successfully parse the years as printed by 'date->string'.

Would you like to report these issues to upstream SRFI-19?

Regards,
Mark
Mark H Weaver
2018-10-21 00:34:10 UTC
Post by Zefram
For year numbers 10000 and above, it is necessary to use more than four
digits for the year, and that's permitted, but ISO 8601 requires that
more than four digits are preceded by a sign. For positive year numbers
the sign must be "+". So one should write "+22666" rather than "22666",
for example.
I skimmed a draft of ISO 8601 that I was able to find gratis online:

https://web.archive.org/web/20171019211402/https://www.loc.gov/standards/datetime/ISO_DIS%208601-1.pdf
https://web.archive.org/web/20171020000043/https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf

and also the ISO 8601 Wikipedia page:

https://en.wikipedia.org/wiki/ISO_8601#Years

and I'm left with a different interpretation about what the standard
permits. As the Wikipedia page says:

To represent years before 0000 or after 9999, the standard also
permits the expansion of the year representation but only by prior
agreement between the sender and the receiver.[19] An expanded year
representation [±YYYYY] must have an agreed-upon number of extra year
digits beyond the four-digit minimum, and it must be prefixed with a +
or − sign[20] [...]

Note the words "but only by prior agreement between the sender and the
receiver", and "must have an agreed-upon number of extra year digits".

You seem to have reached the conclusion that the sender can choose the
number of digits dynamically, leaving the receiver to auto-detect the
number of digits, but that seems to contradict the requirements given
above.

My interpretation is that although ISO 8601 permits the use of expanded
year formats, it seems to require that in a given format, the year must
have a fixed number of digits, and it must _always_ include a sign. In
other words, the receiver should know ahead of time, by prior agreement,
how many digits to expect, and there should _always_ be a sign, even if
the year happens to be in the range 0-9999.

In order to support years outside the range 0-9999 and in accordance
with ISO 8601, I think that 'date->string' and 'string->date' would need
to be extended to allow the caller to specify how many digits to use in
the expanded 'year' format, presumably by adding a new format escape.
If the specified number of digits is greater than 4, then a sign would
*always* be printed. 'string->date' would know how many digits to
expect, and whether to expect a sign.

Ideally, such an extension of 'date->string' and 'string->date' would be
adopted by upstream SRFI-19. However, if that's unsuccessful, I'd be
open to unilaterally adding such an extension. There's precedent for
this in Guile, e.g. see our (srfi srfi-9 gnu) extensions to SRFI-9.

Another question is whether or not we should raise an exception when
attempting to print a year that cannot be represented in the requested
year format.
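
To make the fixed-width reading concrete, here is a rough Python sketch of a formatter and parser that both take the agreed number of year digits as a parameter. Everything in it is an assumption for illustration: the function names, the digits parameter, and the choice to raise ValueError for years that do not fit are not part of SRFI-19 or Guile, and no new format escape is being proposed here.

```python
def format_expanded_year(year: int, digits: int) -> str:
    """Format a year with an agreed, fixed number of digits (hypothetical helper).

    digits == 4 gives the basic unsigned format; digits > 4 always emits a sign.
    """
    if digits < 4:
        raise ValueError("ISO 8601 years use at least four digits")
    if digits == 4:
        if not 0 <= year <= 9999:
            raise ValueError(f"year {year} needs an expanded format")
        return f"{year:04d}"
    if abs(year) >= 10 ** digits:
        raise ValueError(f"year {year} does not fit in {digits} digits")
    sign = "-" if year < 0 else "+"
    return sign + str(abs(year)).zfill(digits)


def parse_expanded_year(text: str, digits: int) -> int:
    """Parse a year written by format_expanded_year with the same digits value."""
    if digits == 4:
        if len(text) != 4 or not (text.isascii() and text.isdigit()):
            raise ValueError(f"expected four digits, got {text!r}")
        return int(text)
    body = text[1:]
    if (len(text) != digits + 1 or text[0] not in "+-"
            or not (body.isascii() and body.isdigit())):
        raise ValueError(f"expected a sign and {digits} digits, got {text!r}")
    value = int(body)
    return -value if text[0] == "-" else value
```

With digits set to 6, the year 24 would be written "+000024" and read back as 24, so the receiver never has to guess the width; asking for year 22666 with digits set to 4 raises an error in this sketch instead of emitting a non-conforming string.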

What do you think?

Mark
Mark H Weaver
2018-10-21 03:53:24 UTC
Post by Mark H Weaver
Another question is whether or not we should raise an exception when
attempting to print a year that cannot be represented in the requested
year format.
I thought about it some more, and I'm now inclined to think that the
approach in your patches is reasonable, or at least it's the least bad
thing we can do when asked to print a year that doesn't fit within the
standard format, given the existing SRFI-19 API.

I also just noticed that the SRFI-19 reference implementation's
formatting of negative years is very badly broken (e.g. it prints "00-2"
when the year field is -2) and Guile had the same behavior after I
applied the fix from upstream to pad the year to 4 digits.
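
One plausible way output like "00-2" arises is padding the already-signed string to four characters, so the pad characters land in front of the sign. The Python sketch below illustrates that failure mode and a sign-aware alternative; it is not the reference implementation's code (which is Scheme), the function names are hypothetical, and where the sign should go when the pad character is not "0" is an assumption here.

```python
def pad_naive(year: int, pad: str = "0") -> str:
    # Pads the whole signed string, so padding is inserted before the "-".
    return str(year).rjust(4, pad)


def pad_sign_aware(year: int, pad: str = "0") -> str:
    # Pads only the magnitude, then puts the sign back in front.
    sign = "-" if year < 0 else ""
    return sign + str(abs(year)).rjust(4, pad)


if __name__ == "__main__":
    print(pad_naive(-2))       # 00-2
    print(pad_sign_aware(-2))  # -0002
```

Both functions agree for non-negative years (24 becomes "0024" either way); they differ only once a sign is present.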

So, for now, I went ahead and implemented the behavior that you
recommended, with one difference: where you hardcode the padding
character to #\0 when formatting years, I use the padding character
specified by the user, following the SRFI-19 reference implementation.

What do you think?

Mark