Format strings module

Format string description.

The format string syntax is heavily influenced by {fmt} and std::format, and is largely compatible with them. Scanning functions, such as scn::scan and scn::input, use the format string syntax described in this section.

Format strings consist of:

Replacement fields, which are surrounded by curly braces {}.
Non-whitespace characters (except { and }; for literal braces, use {{ and }}), which consume exactly one identical character from the input.
Whitespace characters, which consume any and all available consecutive whitespace from the input.

Literal characters are matched by code point one-to-one, with no normalization being done. Ä (U+00C4, UTF-8 0xc3 0x84) only matches another U+00C4, and not, for example, the decomposed sequence U+0041 (LATIN CAPITAL LETTER A) followed by U+0308 (COMBINING DIAERESIS).
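The one-to-one matching rule can be illustrated with a minimal sketch. The helper name `literal_matches` is hypothetical and not part of the library; it simply compares UTF-8 code units, which is equivalent to comparing code points when both sides are valid UTF-8.

```cpp
#include <string_view>

// Hypothetical helper: checks whether the input starts with the given literal.
// The comparison is done code unit by code unit, with no Unicode
// normalization, so canonically equivalent but differently encoded
// sequences do not match.
bool literal_matches(std::string_view input, std::string_view literal) {
    return input.substr(0, literal.size()) == literal;
}
```

With this sketch, the precomposed "\xC3\x84" (U+00C4) matches itself, but does not match "A\xCC\x88" (U+0041 followed by U+0308), even though both render as Ä.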

A character (code point) is considered to be whitespace if it has the Unicode Pattern_White_Space property, as defined by UAX31-R3a. These code points are:

ASCII whitespace characters ("\t\n\v\f\r ")
U+0085 (NEXT LINE)
U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR)
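The list above is short enough to write out as a predicate. The following is a sketch only; the function name `is_pattern_white_space` is an assumption made for illustration and is not taken from the library.

```cpp
// Hypothetical predicate: returns true if the code point is one of the
// Pattern_White_Space code points listed above.
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x09 && cp <= 0x0D)        // \t \n \v \f \r
           || cp == 0x20                     // space
           || cp == 0x85                     // NEXT LINE
           || cp == 0x200E || cp == 0x200F   // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
           || cp == 0x2028 || cp == 0x2029;  // LINE / PARAGRAPH SEPARATOR
}
```

Note that some code points commonly thought of as spaces, such as U+00A0 (NO-BREAK SPACE), are not in this list and are therefore treated as ordinary literal characters.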

The grammar for a replacement field is as follows:

replacement-field   ::= '{' [arg-id] [':' format-spec] '}'
arg-id              ::= nonnegative-integer

format-spec         ::= [fill-and-align]
                        [width] [precision]
                        ['L'] [type]
fill-and-align      ::= [fill] align
fill                ::= any character other than
                        '{' or '}'
align               ::= one of '<' '>' '^'
width               ::= positive-integer
precision           ::= '.' nonnegative-integer
type                ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' |
                        'e' | 'E' | 'f' | 'F' | 'g' | 'G' |
                        'o' | 'p' | 's' | 'x' | 'X' | 'i' | 'u'
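To make the grammar concrete, here is a rough sketch of a parser for the format-spec part of a replacement field (the text after the ':' and before the closing brace). Everything in it is hypothetical: the names `format_spec` and `parse_format_spec` are not library API, the fill character is limited to a single `char` (the real fill may be any code point), and only the single-letter types listed in the grammar above are recognized.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical parsed representation of a format-spec.
struct format_spec {
    char fill = ' ';
    char align = '\0';  // '<', '>', '^', or '\0' if absent
    std::optional<unsigned> width;
    std::optional<unsigned> precision;
    bool localized = false;
    char type = '\0';  // '\0' if absent
};

namespace sketch_detail {
inline bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Reads a run of decimal digits at pos and advances pos past it.
// No overflow checking is done in this sketch.
inline std::optional<unsigned> read_number(std::string_view s, std::size_t& pos) {
    if (pos >= s.size() || !is_digit(s[pos])) return std::nullopt;
    unsigned n = 0;
    while (pos < s.size() && is_digit(s[pos])) {
        n = n * 10 + static_cast<unsigned>(s[pos++] - '0');
    }
    return n;
}
}  // namespace sketch_detail

// Returns std::nullopt if the text does not fit the grammar.
std::optional<format_spec> parse_format_spec(std::string_view s) {
    using namespace sketch_detail;
    format_spec spec;
    std::size_t pos = 0;

    // [fill] align: a fill is only recognized when followed by an align.
    if (s.size() >= 2 && is_align(s[1]) && s[0] != '{' && s[0] != '}') {
        spec.fill = s[0];
        spec.align = s[1];
        pos = 2;
    } else if (!s.empty() && is_align(s[0])) {
        spec.align = s[0];
        pos = 1;
    }

    spec.width = read_number(s, pos);
    if (spec.width && *spec.width == 0) return std::nullopt;  // positive-integer

    if (pos < s.size() && s[pos] == '.') {
        ++pos;
        spec.precision = read_number(s, pos);
        if (!spec.precision) return std::nullopt;  // '.' requires digits
    }

    if (pos < s.size() && s[pos] == 'L') {
        spec.localized = true;
        ++pos;
    }

    if (pos < s.size()) {
        constexpr std::string_view types = "aAbBcdeEfFgGopsxXiu";
        if (types.find(s[pos]) == std::string_view::npos) return std::nullopt;
        spec.type = s[pos++];
    }

    if (pos != s.size()) return std::nullopt;  // trailing characters
    return spec;
}
```

For example, under these assumptions "*>8.3Ld" parses as fill '*', alignment '>', width 8, precision 3, localized, type 'd'.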

Argument IDs

The arg-id specifier can be used to index arguments manually. If manual indexing is used, all of the indices in a format string must be stated explicitly. The same arg-id can appear in the format string only once, and must refer to a valid argument.

// Format string equivalent to "{0} to {1}"
auto a = scn::scan<int, int>("2 to 300", "{} to {}");
// a->values() == (2, 300)

// Manual indexing
auto b = scn::scan<int, int>("2 to 300", "{1} to {0}");
// b->values() == (300, 2)

// INVALID:
// Automatic and manual indexing are mixed
auto c = scn::scan<int, int>("2 to 300", "{} to {0}");

// INVALID:
// Same argument is referred to multiple times
auto d = scn::scan<int, int>("2 to 300", "{0} to {0}");

// INVALID:
// {2} does not refer to an argument
auto e = scn::scan<int, int>("2 to 300", "{0} to {2}");
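The three rules above (no mixing of automatic and manual indexing, no repeated arg-id, no out-of-range arg-id) can be sketched as a standalone check. This is an illustration only, not how the library validates format strings: `check_arg_ids` and `indexing_error` are hypothetical names, and the sketch skips over the format-spec by searching for the next '}', which would not be correct for character sets or regexes that contain a closing brace.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

enum class indexing_error { none, mixed_indexing, duplicate_id, out_of_range, malformed };

// Hypothetical check of the argument-indexing rules for a format string
// that is used with arg_count arguments.
indexing_error check_arg_ids(std::string_view fmt, std::size_t arg_count) {
    std::vector<bool> seen(arg_count, false);
    bool any_auto = false;
    bool any_manual = false;
    std::size_t next_auto = 0;

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {  // escaped "{{"
            ++i;
            continue;
        }

        // Read an optional run of digits: the arg-id.
        std::size_t j = i + 1;
        bool has_id = false;
        std::size_t id = 0;
        while (j < fmt.size() && fmt[j] >= '0' && fmt[j] <= '9') {
            id = id * 10 + static_cast<std::size_t>(fmt[j] - '0');
            has_id = true;
            ++j;
        }

        if (has_id) {
            any_manual = true;
        } else {
            any_auto = true;
            id = next_auto++;
        }
        if (any_auto && any_manual) return indexing_error::mixed_indexing;
        if (id >= arg_count) return indexing_error::out_of_range;
        if (seen[id]) return indexing_error::duplicate_id;
        seen[id] = true;

        // Skip the rest of the field (simplification, see the note above).
        while (j < fmt.size() && fmt[j] != '}') ++j;
        if (j == fmt.size()) return indexing_error::malformed;
        i = j;
    }
    return indexing_error::none;
}
```

Applied to the five format strings above with two arguments, the sketch reports none, none, mixed_indexing, duplicate_id, and out_of_range, in that order.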

Fill and align

Alignment allows for skipping characters before and/or after a value. There are three possible values for alignment:

Alignment options
Option	Meaning
`<`	Align the value to the left (skips fill characters after the value)
`>`	Align the value to the right (skips fill characters before the value)
`^`	Align the value to the center (skips fill characters both before and after the value)

The fill character can be any Unicode code point, except for { and }. The default fill is the space character ' '.

For format type specifiers other than c (default for char and wchar_t, available for string and string_view), [...], and the regex /.../, the default alignment is >. Otherwise, the default alignment is <.

In addition to the skipping of fill characters, for format type specifiers with the > default alignment, preceding whitespace is automatically skipped. This preceding whitespace isn't counted as part of the field width, as described below.

The number of fill characters consumed can be controlled with the width and precision specifiers.
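As a rough illustration of which side the fill characters are skipped on, the sketch below strips fill from a field that has already been cut out of the input. The function `strip_fill` is hypothetical; a real scanner works incrementally on the source range and also applies the width and precision limits, which this sketch ignores.

```cpp
#include <string_view>

// Hypothetical helper: removes fill characters from an already delimited
// field according to the alignment.
//   '<' : the value is on the left, so fill is skipped after it
//   '>' : the value is on the right, so fill is skipped before it
//   '^' : the value is centered, so fill is skipped on both sides
std::string_view strip_fill(std::string_view field, char align, char fill = ' ') {
    if (align == '>' || align == '^') {
        while (!field.empty() && field.front() == fill) field.remove_prefix(1);
    }
    if (align == '<' || align == '^') {
        while (!field.empty() && field.back() == fill) field.remove_suffix(1);
    }
    return field;
}
```

For instance, with fill '*', the field "**42**" yields "42**" under '>' alignment, "**42" under '<', and "42" under '^'.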

Width

Width specifies the minimum number of characters that will be read from the source range. It can be any unsigned integer. Any fill characters skipped are included in the width.

For the purposes of width calculation, the same algorithm is used as in {fmt}. Every code point has a width of 1, except for the following, which have a width of 2:

any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX#44
U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
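The width rule can be sketched as a small function. The three explicit ranges come from the list above; the East_Asian_Width check is reduced here to a handful of well-known wide and fullwidth blocks as a stand-in for the full Unicode table, so this is an approximation. The names `code_point_width` and `text_width` are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical approximation of the width of a single code point.
constexpr int code_point_width(char32_t cp) {
    // Ranges listed explicitly above.
    const bool listed = (cp >= 0x4DC0 && cp <= 0x4DFF)       // Yijing Hexagram Symbols
                        || (cp >= 0x1F300 && cp <= 0x1F5FF)  // Misc. Symbols and Pictographs
                        || (cp >= 0x1F900 && cp <= 0x1F9FF); // Supplemental Symbols and Pictographs
    // Abbreviated stand-in for East_Asian_Width W/F (NOT the complete table).
    const bool east_asian_wide = (cp >= 0x1100 && cp <= 0x115F)     // Hangul Jamo initials
                                 || (cp >= 0x4E00 && cp <= 0x9FFF)  // CJK Unified Ideographs
                                 || (cp >= 0xAC00 && cp <= 0xD7A3)  // Hangul Syllables
                                 || (cp >= 0xFF01 && cp <= 0xFF60); // Fullwidth forms
    return (listed || east_asian_wide) ? 2 : 1;
}

// Sums the widths of all code points in a UTF-32 string.
std::size_t text_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += static_cast<std::size_t>(code_point_width(cp));
    return total;
}
```

Under this sketch, a field containing one ASCII letter and one CJK ideograph counts as three width units, not two.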

Precision

Precision specifies the maximum number of characters that will be read from the source range. Characters are counted in the same way as for the width field, described above.

Localized

The L flag enables localized scanning. Its effects are different for each type it is used with:

For integers, it enables locale-specific thousands separators
For floating-point numbers, it enables locale-specific thousands and radix (decimal) separators
For booleans, it enables locale-specific textual representations (for true and false)
For other types, it has no effect
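To show what a locale-specific thousands separator means in practice, the sketch below uses the C++ standard library's own locale machinery (std::numpunct with iostreams) rather than the scanning library. It is an analogy only and says nothing about how the L flag is implemented. The facet `comma_grouping` and the function `parse_grouped_integer` are hypothetical names.

```cpp
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical facet: ',' as the thousands separator, digits grouped by three.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Parses an integer using a locale that carries the facet above.
std::optional<long> parse_grouped_integer(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet.
    in.imbue(std::locale(std::locale::classic(), new comma_grouping));
    long value = 0;
    if (!(in >> value)) return std::nullopt;
    return value;
}
```

With this locale, the separators in "1,234,567" are consumed as part of the number. In the classic "C" locale, parsing would instead stop at the first comma.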

Type specifier

The type specifier determines how the data is to be scanned. The type of the argument to be scanned determines which type specifiers are valid.

Type specifier: strings

String types (`std::basic_string` and `std::basic_string_view`)
Type	Meaning
none, `s`	Copies from the input until a whitespace character is encountered, or, if using the `<` (left) or `^` (center) alignment, a fill character is encountered.
`c`	Copies from the input until the field precision (the maximum number of characters) is exhausted. Doesn't skip preceding whitespace. Errors if no field precision is provided.
`[...]`	Character set matching: copies from the input until a character not specified in the set is encountered. Character ranges can be specified with `-`, and the entire selection can be inverted with a prefix `^`. Matches and supports arbitrary Unicode code points. Doesn't skip preceding whitespace.
`/<regex>/<flags>`	Regular expression matching: copies from the input until the input does not match the regex. Doesn't skip preceding whitespace. See also: Regular expressions.
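The character set semantics (ranges with `-`, inversion with a leading `^`, no whitespace skipping) can be sketched for ASCII input as follows. The functions `in_char_set` and `scan_char_set` are hypothetical, take the set contents without the surrounding brackets, and ignore escapes and non-ASCII code points, which the real specifier supports.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: tests whether c belongs to the set described by spec,
// e.g. "a-z0-9_" or "^,;" (ASCII only, no escape sequences).
bool in_char_set(char c, std::string_view spec) {
    const bool inverted = !spec.empty() && spec.front() == '^';
    if (inverted) spec.remove_prefix(1);

    bool found = false;
    for (std::size_t i = 0; i < spec.size(); ++i) {
        if (i + 2 < spec.size() && spec[i + 1] == '-') {
            // Range of the form "lo-hi".
            if (c >= spec[i] && c <= spec[i + 2]) found = true;
            i += 2;
        } else if (c == spec[i]) {
            found = true;
        }
    }
    return found != inverted;
}

// Returns the longest prefix of input whose characters are all in the set.
// Preceding whitespace is not skipped.
std::string_view scan_char_set(std::string_view input, std::string_view spec) {
    std::size_t n = 0;
    while (n < input.size() && in_char_set(input[n], spec)) ++n;
    return input.substr(0, n);
}
```

An inverted set is a convenient way to read up to a delimiter: with the set "^=", the input "key=value" yields "key".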

Type specifier: integers

Integer values are scanned as if by using std::from_chars, except that a positive + sign is always allowed, and a base prefix (like 0x) is allowed where noted in the table below.

Integer types (`signed` and `unsigned` variants of `char`, `short`, `int`, `long`, and `long long`)
Type	Meaning
`b`, `B`	`std::from_chars` with base `2`. The base prefix is `0b` or `0B`.
`o`, `O`	`std::from_chars` with base `8`. The base prefix is `0o` or `0O`, or just `0`.
`x`, `X`	`std::from_chars` with base `16`. The base prefix is `0x` or `0X`.
`d`	`std::from_chars` with base `10`. No base prefix allowed.
`u`	`std::from_chars` with base `10`. No base prefix or `-` sign allowed.
`i`	Detect the base from a possible prefix, defaulting to decimal (base-10).
`rXX` (where XX = [2, 36])	Custom base, without a base prefix (r stands for radix).
`c`	Copies a character (code unit) from the input.
none	Same as `d`.
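The table can be read as "std::from_chars plus sign and prefix handling". The sketch below follows that reading for a subset of the specifiers ('b', 'o', 'x', 'd', 'u', 'i', lowercase only). It is an assumption-laden illustration, not the library's implementation: `scan_integer` and `strip_base_prefix` are hypothetical names, the result type is fixed to `long long`, and the most negative value cannot be represented because the magnitude is parsed separately from the sign.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Removes a "0<letter>" base prefix (either case, ASCII) and reports
// whether it was present.
inline bool strip_base_prefix(std::string_view& text, char letter) {
    if (text.size() >= 2 && text[0] == '0' && (text[1] | 0x20) == letter) {
        text.remove_prefix(2);
        return true;
    }
    return false;
}

// Hypothetical scanner for a single integer at the start of text.
std::optional<long long> scan_integer(std::string_view text, char type = 'd') {
    bool negative = false;
    if (!text.empty() && (text.front() == '+' || text.front() == '-')) {
        negative = text.front() == '-';
        text.remove_prefix(1);
    }
    if (negative && type == 'u') return std::nullopt;  // 'u' allows no '-' sign

    int base = 10;
    switch (type) {
        case 'b': base = 2; strip_base_prefix(text, 'b'); break;
        case 'o': base = 8; strip_base_prefix(text, 'o'); break;  // a bare "0" is a valid octal digit
        case 'x': base = 16; strip_base_prefix(text, 'x'); break;
        case 'i':  // detect the base from the prefix, defaulting to decimal
            if (strip_base_prefix(text, 'x')) base = 16;
            else if (strip_base_prefix(text, 'b')) base = 2;
            else if (strip_base_prefix(text, 'o')) base = 8;
            else if (text.size() >= 2 && text[0] == '0') base = 8;
            break;
        default: break;  // 'd' and 'u': base 10, no prefix
    }

    // The sign was handled above, so reject a second one.
    if (text.empty() || text.front() == '-') return std::nullopt;

    long long magnitude = 0;
    const auto result =
        std::from_chars(text.data(), text.data() + text.size(), magnitude, base);
    if (result.ec != std::errc{}) return std::nullopt;
    return negative ? -magnitude : magnitude;
}
```

As with std::from_chars, parsing stops at the first character that is not a valid digit in the chosen base, so trailing input is left for the rest of the format string.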

Type specifier: characters

Character types (`char` and `wchar_t`), and code points (`char32_t`)
Type	Meaning
none, `c`	Copies a character (code point for `char32_t`, code unit otherwise) from the input.
`b`, `B`, `d`, `i`, `o`, `O`, `u`, `x`, `X`	Same as for integers; see Type specifier: integers above. Not allowed for `char32_t`.

Type specifier: floating-point values

Floating-point values are scanned as if by using std::from_chars, except that a positive + sign is always allowed, and a base prefix (like 0x) is allowed where noted in the table below.

Floating-point types (`float`, `double`, and `long double`)
Type	Meaning
`a`, `A`	`std::from_chars` with `std::chars_format::hex`. Prefix `0x`/`0X` is allowed.
`e`, `E`	`std::from_chars` with `std::chars_format::scientific`.
`f`, `F`	`std::from_chars` with `std::chars_format::fixed`.
`g`, `G`	`std::from_chars` with `std::chars_format::general`.
none	`std::from_chars` with `std::chars_format::general | std::chars_format::hex`. Prefix `0x`/`0X` is allowed.

Type specifier: booleans

`bool`
Type	Meaning
`s`	Allows for the textual representation (`true` or `false`).
`b`, `B`, `d`, `i`, `o`, `O`, `u`, `x`, `X`	Allows for the integral/numeric representation (`0` or `1`).
none	Allows for both the textual and the integral/numeric representation.
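The interaction between the specifier and the accepted representations can be sketched as follows. The function `scan_bool` is hypothetical: it treats `'s'` as text-only, no specifier (`'\0'`) as both, and any other value as numeric-only, and it ignores localized names.

```cpp
#include <optional>
#include <string_view>

// Hypothetical scanner for a bool at the start of text.
//   type == 's'  : textual representation only
//   type == '\0' : textual or numeric representation
//   otherwise    : numeric representation only (stands in for b, d, i, ...)
std::optional<bool> scan_bool(std::string_view text, char type = '\0') {
    const bool allow_text = type == '\0' || type == 's';
    const bool allow_numeric = type != 's';

    if (allow_text) {
        if (text.substr(0, 4) == "true") return true;
        if (text.substr(0, 5) == "false") return false;
    }
    if (allow_numeric && !text.empty()) {
        if (text.front() == '1') return true;
        if (text.front() == '0') return false;
    }
    return std::nullopt;
}
```

With this sketch, "1" is rejected under 's' and "true" is rejected under a numeric specifier, while both are accepted when no specifier is given.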

Classes

template <typename CharT, typename Source, typename... Args> class scn::basic_scan_format_string
template <typename CharT> struct scn::detail::basic_runtime_format_string
template <typename T> struct scn::discard

Functions

auto runtime_format(std::string_view s) → detail::basic_runtime_format_string<char>

Function documentation

detail::basic_runtime_format_string<char> runtime_format(std::string_view s)

Creates a runtime format string.

Can be used to avoid compile-time format string checking.