Format strings module

Format string description.

The format string syntax is heavily influenced by {fmt} and std::format, and is largely compatible with them. Scanning functions, such as scn::scan and scn::input, use the format string syntax described in this section.

Format strings consist of:

  • Replacement fields, which are surrounded by curly braces {}.
  • Non-whitespace characters (except {}; for literal braces, use {{ and }}), which consume exactly one identical character from the input.
  • Whitespace characters, which consume any and all available consecutive whitespace from the input.
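The two literal-matching rules above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not how the library is implemented: match_literals and is_ascii_space are hypothetical names, only ASCII whitespace is recognized, and replacement fields are rejected rather than parsed.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// ASCII-only whitespace test ("\t\n\v\f\r "); a simplification for this sketch.
inline bool is_ascii_space(char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Hypothetical helper: matches the literal and whitespace parts of `fmt`
// against `input`. Returns the number of input characters consumed, or
// std::nullopt on a mismatch. Replacement fields are not handled here.
inline std::optional<std::size_t> match_literals(std::string_view fmt,
                                                 std::string_view input) {
    std::size_t in = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (is_ascii_space(c)) {
            // Whitespace in the format consumes all consecutive whitespace.
            while (in < input.size() && is_ascii_space(input[in])) ++in;
            continue;
        }
        if (c == '{' || c == '}') {
            // Only the escaped forms "{{" and "}}" are accepted in this sketch.
            if (i + 1 >= fmt.size() || fmt[i + 1] != c) return std::nullopt;
            ++i;  // skip the second brace of the escape
        }
        // Any other character must match exactly one identical character.
        if (in >= input.size() || input[in] != c) return std::nullopt;
        ++in;
    }
    return in;
}
```

For instance, match_literals("a b", "a   b") consumes all five input characters, and match_literals("{{x}}", "{x}") consumes three.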

Literal characters are matched by code point one-to-one, with no normalization being done. Ä (U+00C4, UTF-8 0xc3 0x84) only matches another U+00C4, and not, for example, the decomposed sequence U+0041 (LATIN CAPITAL LETTER A) followed by U+0308 (COMBINING DIAERESIS).

Characters (code points) are considered to be whitespace characters by the Unicode Pattern_White_Space property, as defined by UAX31-R3a. These code points are:

  • ASCII whitespace characters ("\t\n\v\f\r ")
  • U+0085 (next line)
  • U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
  • U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR)
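The list above translates directly into a predicate over code points. The function name is_pattern_white_space is a hypothetical choice for this sketch and is not taken from the library.

```cpp
// Returns true for the code points listed above (Pattern_White_Space).
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x0009 && cp <= 0x000D)    // "\t\n\v\f\r"
        || cp == 0x0020                      // space
        || cp == 0x0085                      // next line
        || cp == 0x200E || cp == 0x200F      // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
        || cp == 0x2028 || cp == 0x2029;     // LINE / PARAGRAPH SEPARATOR
}
```

Note that some code points often thought of as spaces, such as U+00A0 (NO-BREAK SPACE), are not in the list and therefore return false here.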

The grammar for a replacement field is as follows:

replacement-field   ::= '{' [arg-id] [':' format-spec] '}'
arg-id              ::= nonnegative-integer

format-spec         ::= [fill-and-align]
                        [width] [precision]
                        ['L'] [type]
fill-and-align      ::= [fill] align
fill                ::= any character other than
                        '{' or '}'
align               ::= one of '<' '>' '^'
width               ::= positive-integer
precision           ::= '.' nonnegative-integer
type                ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' |
                        'e' | 'E' | 'f' | 'F' | 'g' | 'G' |
                        'o' | 'p' | 's' | 'x' | 'X' | 'i' | 'u'
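To make the format-spec grammar concrete, here is a minimal parser sketch for the text between ':' and '}'. It is an illustration only: the format_spec struct and parse_format_spec are hypothetical names, the fill is limited to a single-byte character, and number overflow is not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch; not part of any library API.
struct format_spec {
    char fill = ' ';         // only meaningful when align is set
    char align = 0;          // '<', '>', '^', or 0 when absent
    int width = 0;           // 0 when absent
    int precision = -1;      // -1 when absent
    bool localized = false;  // 'L' flag
    char type = 0;           // 0 when absent
};

inline bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Reads a run of decimal digits starting at `pos`; returns -1 if there is none.
inline int read_number(std::string_view s, std::size_t& pos) {
    if (pos >= s.size() || !std::isdigit(static_cast<unsigned char>(s[pos])))
        return -1;
    int n = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        n = n * 10 + (s[pos++] - '0');
    return n;
}

inline std::optional<format_spec> parse_format_spec(std::string_view s) {
    format_spec spec;
    std::size_t pos = 0;
    // [fill] align: a fill character is present only if an align follows it.
    if (s.size() >= 2 && is_align(s[1]) && s[0] != '{' && s[0] != '}') {
        spec.fill = s[0];
        spec.align = s[1];
        pos = 2;
    } else if (!s.empty() && is_align(s[0])) {
        spec.align = s[0];
        pos = 1;
    }
    // [width]: a positive integer.
    if (int w = read_number(s, pos); w >= 0) {
        if (w == 0) return std::nullopt;
        spec.width = w;
    }
    // [precision]: '.' followed by a nonnegative integer.
    if (pos < s.size() && s[pos] == '.') {
        ++pos;
        spec.precision = read_number(s, pos);
        if (spec.precision < 0) return std::nullopt;
    }
    // ['L']
    if (pos < s.size() && s[pos] == 'L') {
        spec.localized = true;
        ++pos;
    }
    // [type]: one of the characters listed in the grammar.
    constexpr std::string_view types = "aAbBcdeEfFgGopsxXiu";
    if (pos < s.size() && types.find(s[pos]) != std::string_view::npos)
        spec.type = s[pos++];
    // Anything left over is not covered by the grammar.
    if (pos != s.size()) return std::nullopt;
    return spec;
}
```

With this sketch, "*>8.3Ld" parses as fill '*', alignment '>', width 8, precision 3, the L flag set, and type 'd'.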

Argument IDs

The arg-id specifier can be used to index arguments manually. If manual indexing is used, all of the indices in a format string must be stated explicitly. The same arg-id can appear in the format string only once, and must refer to a valid argument.

// Format string equivalent to "{0} to {1}"
auto a = scn::scan<int, int>("2 to 300", "{} to {}");
// a->values() == (2, 300)

// Manual indexing
auto b = scn::scan<int, int>("2 to 300", "{1} to {0}");
// b->values() == (300, 2)

// INVALID:
// Automatic and manual indexing are mixed
auto c = scn::scan<int, int>("2 to 300", "{} to {0}");

// INVALID:
// Same argument is referred to multiple times
auto d = scn::scan<int, int>("2 to 300", "{0} to {0}");

// INVALID:
// {2} does not refer to an argument
auto e = scn::scan<int, int>("2 to 300", "{0} to {2}");
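The three indexing rules illustrated above can be expressed as a small standalone checker. This is a sketch under stated assumptions, not the library's validation code: arg_id_error and check_arg_ids are hypothetical names, and everything between an arg-id and the closing brace is skipped without being inspected.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class arg_id_error { none, mixed_indexing, duplicate_id, out_of_range };

// Hypothetical checker: walks the replacement fields of `fmt` and applies the
// indexing rules for a call with `arg_count` arguments. "{{" is treated as an
// escaped literal brace; stray '}' characters are ignored.
inline arg_id_error check_arg_ids(std::string_view fmt, std::size_t arg_count) {
    std::vector<bool> seen(arg_count, false);
    std::size_t next_auto = 0;
    bool used_auto = false;
    bool used_manual = false;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
            ++i;  // escaped brace, not a field
            continue;
        }
        std::size_t j = i + 1;
        std::size_t id = 0;
        const bool manual =
            j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j]));
        while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j])))
            id = id * 10 + static_cast<std::size_t>(fmt[j++] - '0');
        if (manual) {
            used_manual = true;
        } else {
            used_auto = true;
            id = next_auto++;  // automatic indexing counts fields left to right
        }
        if (used_auto && used_manual) return arg_id_error::mixed_indexing;
        if (id >= arg_count) return arg_id_error::out_of_range;
        if (seen[id]) return arg_id_error::duplicate_id;
        seen[id] = true;
        // Skip the rest of the field, up to its closing brace.
        while (j < fmt.size() && fmt[j] != '}') ++j;
        i = j;
    }
    return arg_id_error::none;
}
```

Applied to the examples above with two arguments, "{} to {}" and "{1} to {0}" pass, "{} to {0}" reports mixed indexing, "{0} to {0}" reports a duplicate, and "{0} to {2}" reports an out-of-range index.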

Fill and align

Alignment allows for skipping characters before and/or after a value. There are three possible values for alignment:

Alignment options

Option  Meaning
<       Align the value to the left (skips fill characters after the value)
>       Align the value to the right (skips fill characters before the value)
^       Align the value to the center (skips fill characters both before and after the value)

The fill character can be any Unicode code point, except for { and }. The default fill is any whitespace character, as specified above.

For all format type specifiers except c (the default for char and wchar_t, also available for string and string_view), [...], and the regex specifier /.../, the default alignment is >. In practice, this means that leading whitespace is skipped by default. For the c format type specifier, there's no default alignment, and no fill characters are skipped, including whitespace.

The number of fill characters consumed can be controlled with the width and precision specifiers.
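As a rough illustration of which side each alignment option skips, the following sketch strips fill characters from a field that has already been delimited. That is a simplification: real scanning works on a stream and decides where the value ends as it goes. The name strip_fill is hypothetical, and the fill is limited to a single-byte character.

```cpp
#include <string_view>

// Hypothetical helper: removes fill characters from `field` on the side(s)
// selected by `align` ('<', '>' or '^') and returns the remaining value text.
inline std::string_view strip_fill(std::string_view field, char fill, char align) {
    if (align == '>' || align == '^') {
        // Right- or center-aligned: fill characters precede the value.
        while (!field.empty() && field.front() == fill) field.remove_prefix(1);
    }
    if (align == '<' || align == '^') {
        // Left- or center-aligned: fill characters follow the value.
        while (!field.empty() && field.back() == fill) field.remove_suffix(1);
    }
    return field;
}
```

For example, strip_fill("**42**", '*', '^') returns "42", whereas strip_fill("**42**", '*', '>') returns "42**" because only the leading fill is skipped.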

Width

Width specifies the minimum number of characters that will be read from the source range. It can be any unsigned integer. Any fill characters skipped are included in the width.

For the purposes of width calculation, the same algorithm is used as in {fmt}. Every code point has a width of one, except for the following, which have a width of two:

  • any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX#44
  • U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
  • U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
  • U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
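A width function following the rules above might look like the sketch below. The three explicitly listed blocks are covered exactly; the East_Asian_Width check is an assumption-laden stand-in that covers only a small sample of W and F ranges, since the full property needs a Unicode data table. The names code_point_width and text_width are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool in_range(char32_t cp, char32_t lo, char32_t hi) {
    return cp >= lo && cp <= hi;
}

// Width of a single code point: 2 for the ranges below, 1 otherwise.
constexpr int code_point_width(char32_t cp) {
    const bool wide =
        in_range(cp, 0x4DC0, 0x4DFF) ||    // Yijing Hexagram Symbols
        in_range(cp, 0x1F300, 0x1F5FF) ||  // Miscellaneous Symbols and Pictographs
        in_range(cp, 0x1F900, 0x1F9FF) ||  // Supplemental Symbols and Pictographs
        // Assumption: an incomplete sample of East_Asian_Width W/F ranges.
        in_range(cp, 0x4E00, 0x9FFF) ||    // CJK Unified Ideographs (W)
        in_range(cp, 0xAC00, 0xD7A3) ||    // Hangul Syllables (W)
        in_range(cp, 0xFF01, 0xFF60);      // Fullwidth forms (F)
    return wide ? 2 : 1;
}

// Total width of a sequence of code points.
constexpr std::size_t text_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += static_cast<std::size_t>(code_point_width(cp));
    return total;
}
```

Under these assumptions, text_width(U"ab") is 2, while a string of two CJK ideographs such as U"\u4E2D\u6587" has a width of 4.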

Precision

Precision specifies the maximum number of characters that will be read from the source range. Characters are counted in the same way as for the width field, described above.

Localized

The L flag enables localized scanning. Its effects are different for each type it is used with:

  • For integers, it enables locale-specific thousands separators
  • For floating-point numbers, it enables locale-specific thousands and radix (decimal) separators
  • For booleans, it enables locale-specific textual representations (for true and false)
  • For other types, it has no effect
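What "locale-specific separators" means in practice can be shown with the C++ standard library alone. The sketch below does not use the scanning library and makes no claim about how the L flag is implemented; it only demonstrates a std::locale whose numpunct facet uses '.' for thousands and ',' for the radix. The names european_numpunct, make_european_locale and parse_localized are hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' groups thousands and ',' is the decimal separator.
struct european_numpunct : std::numpunct<char> {
    char do_thousands_sep() const override { return '.'; }
    char do_decimal_point() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }  // groups of three
};

// Builds a locale from the classic "C" locale plus the facet above.
// The std::locale object takes ownership of the facet.
inline std::locale make_european_locale() {
    return std::locale(std::locale::classic(), new european_numpunct);
}

// Parses `text` with the number-parsing rules of `loc`.
template <typename T>
bool parse_localized(const std::string& text, const std::locale& loc, T& out) {
    std::istringstream in(text);
    in.imbue(loc);
    return static_cast<bool>(in >> out);
}
```

With make_european_locale(), the text "1.234.567" parses as the integer 1234567 and "1.234,5" parses as the double 1234.5.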

Type specifier

The type specifier determines how the data is to be scanned. The type of the argument to be scanned determines what flags are valid.

Type specifier: strings

String types (std::basic_string and std::basic_string_view)

Type              Meaning
none, s           Copies from the input until a whitespace character is encountered, or, if using the < (left) or ^ (center) alignment, a fill character is encountered.
c                 Copies from the input until the field width is exhausted. Has no default alignment (doesn't skip preceding whitespace, if no alignment is specified). Errors if no field precision is provided.
[...]             Character set matching: copies from the input until a character not specified in the set is encountered. Character ranges can be specified with -, and the entire selection can be inverted with a prefix ^. Matches and supports arbitrary Unicode code points. Has no default alignment (doesn't skip preceding whitespace, if no alignment is specified).
/<regex>/<flags>  Regular expression matching: copies from the input until the input does not match the regex. Has no default alignment (doesn't skip preceding whitespace, if no alignment is specified).
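The character set behaviour described for [...] (ranges with - and inversion with a leading ^) can be sketched as follows. This is a simplified model, not the library's matcher: in_char_set and take_while_in_set are hypothetical names, only single-byte characters are handled, and no other set syntax is modelled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: tests `c` against the body of a set such as "a-z0-9"
// or "^,". `set` is the text between the square brackets.
inline bool in_char_set(std::string_view set, char c) {
    bool inverted = false;
    if (!set.empty() && set.front() == '^') {
        inverted = true;
        set.remove_prefix(1);
    }
    bool found = false;
    for (std::size_t i = 0; i < set.size() && !found; ++i) {
        if (i + 2 < set.size() && set[i + 1] == '-') {
            // A range such as "a-z".
            found = set[i] <= c && c <= set[i + 2];
            i += 2;
        } else {
            // A single literal character.
            found = set[i] == c;
        }
    }
    return found != inverted;
}

// Copies from the input until a character outside the set is encountered.
inline std::string_view take_while_in_set(std::string_view set,
                                          std::string_view input) {
    std::size_t n = 0;
    while (n < input.size() && in_char_set(set, input[n])) ++n;
    return input.substr(0, n);
}
```

For example, take_while_in_set("a-z", "hello world") returns "hello", and the inverted set in take_while_in_set("^,", "key,value") returns "key".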

Type specifier: integers

Integer values are scanned as if by using std::from_chars, except a positive + sign and a base prefix (like 0x) are always allowed to be present.

Integer types (signed and unsigned variants of char, short, int, long, and long long)

Type                       Meaning
b, B                       std::from_chars with base 2. The base prefix is 0b or 0B.
o, O                       std::from_chars with base 8. The base prefix is 0o or 0O, or just 0.
x, X                       std::from_chars with base 16. The base prefix is 0x or 0X.
d                          std::from_chars with base 10. No base prefix allowed.
u                          std::from_chars with base 10. No base prefix or - sign allowed.
i                          Detect the base from a possible prefix, defaulting to decimal (base-10).
rXX (where XX = [2, 36])   Custom base, without a base prefix (r stands for radix).
c                          Copies a character (code unit) from the input.
none                       Same as d.
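The b, o, x, d, u and i rows can be modelled on top of std::from_chars roughly as shown below. This is a sketch for int only, with several assumptions that are not taken from the library: parse_int is a hypothetical name, the whole input must be consumed, and for i a bare leading 0 is treated as an octal prefix (an interpretation of the "or just 0" note in the o row).

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

inline std::optional<int> parse_int(std::string_view s, char type) {
    // A leading '+' is accepted in addition to what std::from_chars allows.
    bool negative = false;
    if (!s.empty() && (s[0] == '+' || s[0] == '-')) {
        negative = s[0] == '-';
        s.remove_prefix(1);
    }
    if (type == 'u' && negative) return std::nullopt;  // no '-' sign for u

    // True if `s` starts with "0" + letter (either case) and a digit follows.
    auto has_prefix = [&](char lower) {
        return s.size() > 2 && s[0] == '0' && (s[1] | 0x20) == lower;
    };
    int base = 10;
    switch (type) {
        case 'b': case 'B': base = 2;  if (has_prefix('b')) s.remove_prefix(2); break;
        case 'o': case 'O': base = 8;  if (has_prefix('o')) s.remove_prefix(2); break;
        case 'x': case 'X': base = 16; if (has_prefix('x')) s.remove_prefix(2); break;
        case 'i':  // detect the base from a possible prefix
            if (has_prefix('b'))      { base = 2;  s.remove_prefix(2); }
            else if (has_prefix('o')) { base = 8;  s.remove_prefix(2); }
            else if (has_prefix('x')) { base = 16; s.remove_prefix(2); }
            else if (s.size() > 1 && s[0] == '0') { base = 8; s.remove_prefix(1); }
            break;
        default: break;  // d, u and none: base 10, no prefix
    }
    if (s.empty() || s[0] == '-') return std::nullopt;  // reject a second sign

    // Re-attach the sign so that std::from_chars handles the full range.
    std::string digits = negative ? "-" : "";
    digits.append(s.data(), s.size());
    int value = 0;
    const char* last = digits.data() + digits.size();
    auto [ptr, ec] = std::from_chars(digits.data(), last, value, base);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

With this sketch, parse_int("0x1F", 'x') yields 31 and parse_int("0o17", 'i') yields 15, while parse_int("0x10", 'd') fails because d allows no base prefix.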

Type specifier: characters

Character types (char and wchar_t), and code points (char32_t)

Type                        Meaning
none, c                     Copies a character (code point for char32_t, code unit otherwise) from the input.
b, B, d, i, o, O, u, x, X   Same as for integers, see above Type specifier: integers. Not allowed for char32_t.

Type specifier: floating-point values

Floating-point values are scanned as if by using std::from_chars, except a positive + sign and a base prefix (like 0x) are always allowed to be present.

Floating-point types (float, double, and long double)

Type   Meaning
a, A   std::from_chars with std::chars_format::hex. Prefix 0x/0X is allowed.
e, E   std::from_chars with std::chars_format::scientific.
f, F   std::from_chars with std::chars_format::fixed.
g, G   std::from_chars with std::chars_format::general.
none   std::from_chars with std::chars_format::general | std::chars_format::hex. Prefix 0x/0X is allowed.

Type specifier: booleans

bool

Type                        Meaning
s                           Allows for the textual representation (true or false).
b, B, d, i, o, O, u, x, X   Allows for the integral/numeric representation (0 or 1).
none                        Allows for both the textual and the integral/numeric representation.
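The three rows of the bool table amount to choosing which representations are accepted. A minimal sketch, assuming the non-localized spellings "true"/"false" and "0"/"1" and a field that has already been isolated; parse_bool is a hypothetical name, and a type of 0 stands for "none".

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: `type` is 's', one of the integer type characters,
// or 0 when no type specifier is given.
inline std::optional<bool> parse_bool(std::string_view s, char type = 0) {
    const bool allow_text = type == 0 || type == 's';
    const bool allow_numeric = type != 's';  // any integer type, or none
    if (allow_text) {
        if (s == "true") return true;
        if (s == "false") return false;
    }
    if (allow_numeric) {
        if (s == "1") return true;
        if (s == "0") return false;
    }
    return std::nullopt;  // representation not allowed for this type
}
```

Here parse_bool("true") and parse_bool("1") both succeed with no type given, while parse_bool("true", 'd') and parse_bool("1", 's') return no value.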

Classes

template <typename CharT, typename Source, typename... Args>
class scn::basic_scan_format_string
template <typename CharT>
struct scn::detail::basic_runtime_format_string
template <typename T>
struct scn::discard

Functions

auto runtime_format(std::string_view s) →  detail::basic_runtime_format_string<char>

Function documentation

detail::basic_runtime_format_string<char> runtime_format(std::string_view s)

Creates a runtime format string.

Can be used to avoid compile-time format string checking.