Format strings module
Format string description.
The format string syntax is heavily influenced by {fmt} and std::, and is largely compatible with it. Scanning functions, such as scn:: and scn::, use the format string syntax described in this section.
Format strings consist of:
- Replacement fields, which are surrounded by curly braces {}.
- Non-whitespace characters (except {}; for literal braces, use {{ and }}), which consume exactly one identical character from the input.
- Whitespace characters, which consume any and all available consecutive whitespace from the input.
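The matching rules for literal and whitespace characters can be illustrated with a minimal sketch. This is not scnlib's implementation: `match_literals` and `is_ascii_space` are hypothetical helpers, replacement fields are not handled, and only ASCII whitespace is recognized.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of scnlib): ASCII-only whitespace test.
inline bool is_ascii_space(char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Matches the literal parts of a format string against the input.
// Returns the number of input characters consumed, or std::nullopt on a
// mismatch. Replacement fields are not supported in this sketch.
inline std::optional<std::size_t> match_literals(std::string_view fmt,
                                                 std::string_view input) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char f = fmt[i];
        if (is_ascii_space(f)) {
            // Whitespace consumes all consecutive input whitespace
            // (possibly none).
            while (pos < input.size() && is_ascii_space(input[pos])) ++pos;
            continue;
        }
        if (f == '{' || f == '}') {
            // "{{" and "}}" stand for one literal brace.
            if (i + 1 >= fmt.size() || fmt[i + 1] != f) return std::nullopt;
            ++i;
        }
        // Any other character consumes exactly one identical character.
        if (pos >= input.size() || input[pos] != f) return std::nullopt;
        ++pos;
    }
    return pos;
}
```

In this sketch, `match_literals("a b", "a   b")` consumes all five input characters, while `match_literals("ab", "ac")` reports a mismatch.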
Literal characters are matched by code point one-to-one, with no normalization being done. Ä (U+00C4, UTF-8 0xc3 0x84) only matches another U+00C4, and not, for example, U+00A8 (DIAERESIS) and U+0041 (LATIN CAPITAL LETTER A).
Characters (code points) are considered to be whitespace characters by the Unicode Pattern_White_Space property, as defined by UAX31-R3a. These code points are:
- ASCII whitespace characters ("\t\n\v\f\r ")
- U+0085 (next line)
- U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
- U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR)
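The list above translates directly into a predicate. The function name `is_pattern_white_space` is chosen for illustration and is not taken from the library.

```cpp
// Returns true if cp is one of the Pattern_White_Space code points
// listed above. Illustrative helper, not a scnlib function.
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x09 && cp <= 0x0D)        // "\t\n\v\f\r"
           || cp == 0x20                     // space
           || cp == 0x85                     // next line
           || cp == 0x200E || cp == 0x200F   // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
           || cp == 0x2028 || cp == 0x2029;  // LINE / PARAGRAPH SEPARATOR
}
```

Note that U+00A0 (NO-BREAK SPACE) is not in the list, so this predicate returns false for it.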
The grammar for a replacement field is as follows:
    replacement-field ::= '{' [arg-id] [':' format-spec] '}'
    arg-id            ::= positive-integer
    format-spec       ::= [fill-and-align] [width] [precision] ['L'] [type]
    fill-and-align    ::= [fill] align
    fill              ::= any character other than '{' or '}'
    align             ::= one of '<' '>' '^'
    width             ::= positive-integer
    precision         ::= '.' nonnegative-integer
    type              ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' | 'e' | 'E' | 'f' | 'F' | 'g' | 'G' | 'o' | 'p' | 's' | 'x' | 'X' | 'i' | 'u'
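To make the grammar concrete, here is a rough sketch of a parser for the format-spec production. It is not scnlib's parser: `parsed_spec` and `parse_format_spec` are hypothetical names, the fill is limited to a single `char`, only the single-letter types from the grammar are accepted, and numeric overflow is not checked.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch; not a scnlib type.
struct parsed_spec {
    char fill = ' ';
    char align = '\0';    // '<', '>', '^', or '\0' if absent
    int width = 0;        // 0 if absent
    int precision = -1;   // -1 if absent
    bool localized = false;
    char type = '\0';     // '\0' if absent
};

inline bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }
inline bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

// Reads a run of decimal digits starting at s[i], advancing i.
inline int parse_number(std::string_view s, std::size_t& i) {
    int n = 0;
    while (i < s.size() && is_ascii_digit(s[i])) n = n * 10 + (s[i++] - '0');
    return n;
}

// Parses the text between ':' and the closing '}' of a replacement field.
inline std::optional<parsed_spec> parse_format_spec(std::string_view s) {
    parsed_spec out;
    std::size_t i = 0;
    // fill-and-align: a fill is only present if an align character follows.
    if (s.size() >= 2 && is_align(s[1]) && s[0] != '{' && s[0] != '}') {
        out.fill = s[0];
        out.align = s[1];
        i = 2;
    } else if (!s.empty() && is_align(s[0])) {
        out.align = s[0];
        i = 1;
    }
    if (i < s.size() && is_ascii_digit(s[i])) {
        out.width = parse_number(s, i);
        if (out.width == 0) return std::nullopt;  // width is a positive integer
    }
    if (i < s.size() && s[i] == '.') {
        ++i;
        if (i >= s.size() || !is_ascii_digit(s[i])) return std::nullopt;
        out.precision = parse_number(s, i);
    }
    if (i < s.size() && s[i] == 'L') {
        out.localized = true;
        ++i;
    }
    if (i < s.size()) {
        constexpr std::string_view types = "aAbBcdeEfFgGopsxXiu";
        if (types.find(s[i]) == std::string_view::npos) return std::nullopt;
        out.type = s[i++];
    }
    if (i != s.size()) return std::nullopt;  // trailing garbage
    return out;
}
```

For example, `"*>8.12Lx"` would parse as fill `*`, alignment `>`, width 8, precision 12, the `L` flag, and type `x`.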
Argument IDs
The arg-id specifier can be used to index arguments manually. If manual indexing is used, all of the indices in a format string must be stated explicitly. The same arg-id can appear in the format string only once, and must refer to a valid argument.
    // Format string equivalent to "{0} to {1}"
    auto a = scn::scan<int, int>("2 to 300", "{} to {}");
    // a->values() == (2, 300)

    // Manual indexing
    auto b = scn::scan<int, int>("2 to 300", "{1} to {0}");
    // b->values() == (300, 2)

    // INVALID:
    // Automatic and manual indexing is mixed
    auto c = scn::scan<int, int>("2 to 300", "{} to {0}");

    // INVALID:
    // Same argument is referred to multiple times
    auto d = scn::scan<int, int>("2 to 300", "{0} to {0}");

    // INVALID:
    // {2} does not refer to an argument
    auto e = scn::scan<int, int>("2 to 300", "{0} to {2}");
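The three rules on argument IDs (no mixing of automatic and manual indexing, no repeated ID, no out-of-range ID) can be checked with a small standalone validator. The sketch below is an illustration only: `id_check` and `check_arg_ids` are hypothetical names, and the format-spec part of each field is simply skipped up to the next `}`, which would not be correct for specs that themselves contain `}` (for example inside a regex).

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

enum class id_check { ok, mixed_indexing, duplicate_id, out_of_range, malformed };

// Validates the argument indexing of a format string that is meant to scan
// `arg_count` arguments. Hypothetical helper, not part of scnlib.
inline id_check check_arg_ids(std::string_view fmt, std::size_t arg_count) {
    std::vector<bool> seen(arg_count, false);
    bool any_manual = false;
    bool any_auto = false;
    std::size_t next_auto = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '}') {
            // Outside a field, only the "}}" escape is valid.
            if (i + 1 < fmt.size() && fmt[i + 1] == '}') { ++i; continue; }
            return id_check::malformed;
        }
        if (fmt[i] != '{') continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') { ++i; continue; }  // "{{"
        ++i;
        std::size_t id = 0;
        bool has_digits = false;
        while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9') {
            id = id * 10 + static_cast<std::size_t>(fmt[i] - '0');
            has_digits = true;
            ++i;
        }
        // Skip an optional ":format-spec" up to the closing brace.
        while (i < fmt.size() && fmt[i] != '}') ++i;
        if (i == fmt.size()) return id_check::malformed;
        if (has_digits) {
            any_manual = true;
        } else {
            any_auto = true;
            id = next_auto++;
        }
        if (any_manual && any_auto) return id_check::mixed_indexing;
        if (id >= arg_count) return id_check::out_of_range;
        if (seen[id]) return id_check::duplicate_id;
        seen[id] = true;
    }
    return id_check::ok;
}
```

Applied to the examples above with two arguments, this sketch accepts `"{} to {}"` and `"{1} to {0}"` and rejects the three invalid format strings with `mixed_indexing`, `duplicate_id`, and `out_of_range` respectively.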
Fill and align
Alignment allows for skipping characters before and/or after a value. There are three possible values for alignment:
Option | Meaning |
---|---|
< | Align the value to the left (skips fill characters after the value) |
> | Align the value to the right (skips fill characters before the value) |
^ | Align the value to the center (skips fill characters both before and after the value) |
The fill character can be any Unicode code point, except for { and }. The default fill is the space character ' '.
For format type specifiers other than c (default for char and wchar_t, available for string and string_view), [...], and the regex /.../, the default alignment is >. Otherwise, the default alignment is <.
In addition to the skipping of fill characters, for format type specifiers with the > default alignment, preceding whitespace is automatically skipped. This preceding whitespace isn't counted as part of the field width, as described below.
The number of fill characters consumed can be controlled with the width and precision specifiers.
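As a rough illustration of the three alignment options, the following sketch strips fill characters from an already delimited field. It is a simplification under stated assumptions: `strip_fill` is a hypothetical helper, the fill is a single `char`, and the interaction with width, precision, and automatic whitespace skipping is left out.

```cpp
#include <string_view>

// Removes fill characters from a field according to the alignment:
// '<' skips fill after the value, '>' skips fill before it,
// '^' skips fill on both sides. Hypothetical helper, not part of scnlib.
inline std::string_view strip_fill(std::string_view field, char align,
                                   char fill = ' ') {
    if (align == '>' || align == '^') {
        while (!field.empty() && field.front() == fill) field.remove_prefix(1);
    }
    if (align == '<' || align == '^') {
        while (!field.empty() && field.back() == fill) field.remove_suffix(1);
    }
    return field;
}
```

With `*` as the fill, `"***42"` under `>`, `"42***"` under `<`, and `"*42*"` under `^` all reduce to `"42"` in this sketch.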
Width
Width specifies the minimum number of characters that will be read from the source range. It can be any unsigned integer. Any fill characters skipped are included in the width.
For the purposes of width calculation, the same algorithm is used as in {fmt}. Every code point has a width of one, except the following, which have a width of two:
- any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX#44
- U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
- U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
- U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
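The width rule can be sketched as follows. The three explicit ranges come from the list above; the East_Asian_Width lookup needs Unicode data tables, so this sketch takes it as a caller-supplied predicate. The names `wide_predicate`, `code_point_width`, and `field_width` are hypothetical, and this is not the code used by scnlib or {fmt}.

```cpp
#include <cstddef>
#include <string_view>

// Caller-supplied stand-in for an East_Asian_Width ("W" or "F") lookup.
using wide_predicate = bool (*)(char32_t);

// Width of a single code point: 2 for the ranges listed above or when the
// predicate reports the code point as wide, 1 otherwise.
inline std::size_t code_point_width(char32_t cp, wide_predicate is_east_asian_wide) {
    const bool wide = (cp >= 0x4DC0 && cp <= 0x4DFF)      // Yijing Hexagram Symbols
                      || (cp >= 0x1F300 && cp <= 0x1F5FF) // Misc. Symbols and Pictographs
                      || (cp >= 0x1F900 && cp <= 0x1F9FF) // Supplemental Symbols and Pictographs
                      || is_east_asian_wide(cp);
    return wide ? 2 : 1;
}

// Total width of a sequence of code points.
inline std::size_t field_width(std::u32string_view s, wide_predicate is_east_asian_wide) {
    std::size_t width = 0;
    for (char32_t cp : s) width += code_point_width(cp, is_east_asian_wide);
    return width;
}
```

A real implementation would replace the predicate with a table derived from the Unicode Character Database.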
Precision
Precision specifies the maximum number of characters that will be read from the source range. Characters are counted in the same way as for the width field, described above.
Localized
The L flag enables localized scanning. Its effects are different for each type it is used with:
- For integers, it enables locale-specific thousands separators
- For floating-point numbers, it enables locale-specific thousands and radix (decimal) separators
- For booleans, it enables locale-specific textual representations (for true and false)
- For other types, it has no effect
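To give an idea of what locale-specific thousands separators mean for integers, the sketch below reads the separator from a `std::locale` through the standard `std::numpunct` facet and ignores it while parsing. This only illustrates the concept with the standard library; it is not how scnlib implements the `L` flag. `parse_localized_int` and the demo facet `dot_separated` are hypothetical, and the placement of the separators is not validated.

```cpp
#include <charconv>
#include <locale>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Parses a decimal integer, ignoring the locale's thousands separator.
// Separators are only accepted if the locale actually defines a grouping.
inline std::optional<long long> parse_localized_int(std::string_view input,
                                                    const std::locale& loc) {
    const auto& punct = std::use_facet<std::numpunct<char>>(loc);
    const bool has_grouping = !punct.grouping().empty();
    const char sep = punct.thousands_sep();

    std::string digits;
    for (char c : input) {
        if (has_grouping && c == sep) continue;  // drop thousands separators
        digits.push_back(c);
    }
    long long value = 0;
    const char* last = digits.data() + digits.size();
    auto [ptr, ec] = std::from_chars(digits.data(), last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}

// Demo facet for this sketch: '.' as thousands separator, groups of three.
struct dot_separated : std::numpunct<char> {
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};
```

A locale carrying the demo facet can be built with `std::locale(std::locale::classic(), new dot_separated)`; with it, `"1.234.567"` parses as 1234567, while the classic locale rejects the same input.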
Type specifier
The type specifier determines how the data is to be scanned. The type of the argument to be scanned determines what flags are valid.
Type specifier: strings
Type | Meaning |
---|---|
none, s | Copies from the input until a whitespace character is encountered, or, if using the < (left) or ^ (center) alignment, a fill character is encountered. |
c | Copies from the input until the field width is exhausted. Doesn't skip preceding whitespace. Errors if no field precision is provided. |
[...] | Character set matching: copies from the input until a character not specified in the set is encountered. Character ranges can be specified with -, and the entire selection can be inverted with a prefix ^. Matches and supports arbitrary Unicode code points. Doesn't skip preceding whitespace. |
/<regex>/<flags> | Regular expression matching: copies from the input until the input does not match the regex. Doesn't skip preceding whitespace. |
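The character set behavior from the table can be approximated with a short sketch. It is ASCII-only and handles just ranges and a leading `^`, so it covers far less than the library's `[...]` specifier; `in_set` and `scan_set` are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if c is matched by the set body, e.g. "a-z0-9_".
inline bool in_set(char c, std::string_view set) {
    for (std::size_t i = 0; i < set.size(); ++i) {
        if (i + 2 < set.size() && set[i + 1] == '-') {
            // Range such as "a-z".
            if (c >= set[i] && c <= set[i + 2]) return true;
            i += 2;
        } else if (c == set[i]) {
            return true;
        }
    }
    return false;
}

// Copies from the input while characters match the set. A leading '^'
// inverts the set. Preceding whitespace is not skipped.
inline std::string_view scan_set(std::string_view input, std::string_view set) {
    const bool inverted = !set.empty() && set.front() == '^';
    if (inverted) set.remove_prefix(1);
    std::size_t n = 0;
    while (n < input.size() && in_set(input[n], set) != inverted) ++n;
    return input.substr(0, n);
}
```

For instance, `scan_set("key=value", "^=")` yields `"key"`, and `scan_set("abc123 def", "a-z0-9")` yields `"abc123"`.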
Type specifier: integers
Integer values are scanned as if by using std::, except a positive + sign and a base prefix (like 0x) are always allowed to be present.
Type | Meaning |
---|---|
b, B | std:: with base 2. The base prefix is 0b or 0B. |
o, O | std:: with base 8. The base prefix is 0o or 0O, or just 0. |
x, X | std:: with base 16. The base prefix is 0x or 0X. |
d | std:: with base 10. No base prefix allowed. |
u | std:: with base 10. No base prefix or - sign allowed. |
i | Detect the base from a possible prefix, defaulting to decimal (base-10). |
rXX (where XX = [2, 36]) | Custom base, without a base prefix (r stands for radix). |
c | Copies a character (code unit) from the input. |
none | Same as d. |
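The combination of an optional sign, an optional base prefix, and a fixed base can be sketched on top of the standard `std::from_chars`. That the library builds on this particular function is an assumption of the sketch, not something stated above, and `scan_integer` is a hypothetical helper. It covers only bases 2, 8, 10, and 16 with an explicit base, does not implement the base detection of the `i` specifier, and cannot represent the most negative `long long`.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Scans the whole string as an integer in the given base (2, 8, 10 or 16).
// A leading '+' or '-' and, for non-decimal bases, a base prefix are allowed.
inline std::optional<long long> scan_integer(std::string_view s, int base) {
    bool negative = false;
    if (!s.empty() && (s.front() == '+' || s.front() == '-')) {
        negative = s.front() == '-';
        s.remove_prefix(1);
    }
    // Optional base prefix; decimal has none. A bare "0" prefix for octal
    // needs no special handling, since a leading zero is a valid digit.
    const char prefix = base == 2 ? 'b' : base == 8 ? 'o' : base == 16 ? 'x' : '\0';
    if (prefix != '\0' && s.size() >= 2 && s[0] == '0' &&
        (s[1] == prefix || s[1] == prefix - ('a' - 'A'))) {
        s.remove_prefix(2);
    }
    // std::from_chars would accept a '-' here; reject a second sign.
    if (s.empty() || s.front() == '-') return std::nullopt;

    long long magnitude = 0;
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), last, magnitude, base);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return negative ? -magnitude : magnitude;
}
```

In this sketch, `scan_integer("+0x1F", 16)` gives 31 and `scan_integer("0x10", 10)` fails, mirroring the rule that `d` allows no base prefix.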
Type specifier: characters
Type | Meaning |
---|---|
none, c | Copies a character (code point for char32_t, code unit otherwise) from the input. |
b, B, d, i, o, O, u, x, X | Same as for integers; see Type specifier: integers above. Not allowed for char32_t. |
Type specifier: floating-point values
Floating-point values are scanned as if by using std::, except a positive + sign and a base prefix (like 0x) are always allowed to be present.
Type | Meaning |
---|---|
a, A | std:: with std::chars_format::hex. Prefix 0x/0X is allowed. |
e, E | std:: with std::chars_format::scientific. |
f, F | std:: with std::chars_format::fixed. |
g, G | std:: with std::chars_format::general. |
none | std:: with std::chars_format::general \| std::chars_format::hex. Prefix 0x/0X is allowed. |
Type specifier: booleans
Type | Meaning |
---|---|
s | Allows for the textual representation (true or false). |
b, B, d, i, o, O, u, x, X | Allows for the integral/numeric representation (0 or 1). |
none | Allows for both the textual and the integral/numeric representation. |
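The table above can be summarized in a few lines of code. The sketch below is illustrative only: `scan_bool` is a hypothetical helper that looks at a complete token, treats any type other than `s` and "none" as an integer type specifier, and does not handle the locale-specific names enabled by `L`.

```cpp
#include <optional>
#include <string_view>

// type: 's' for textual only, '\0' (none) for both representations,
// anything else is treated as an integer type specifier (numeric only).
inline std::optional<bool> scan_bool(std::string_view token, char type = '\0') {
    const bool allow_text = type == '\0' || type == 's';
    const bool allow_numeric = type != 's';
    if (allow_text) {
        if (token == "true") return true;
        if (token == "false") return false;
    }
    if (allow_numeric) {
        if (token == "1") return true;
        if (token == "0") return false;
    }
    return std::nullopt;
}
```

With no type given, both `"true"` and `"1"` are accepted; with `s`, `"1"` is rejected, and with `d`, `"false"` is rejected.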
Classes
- template <typename CharT, typename Source, typename... Args> class scn::basic_scan_format_string
- template <typename CharT> struct scn::detail::basic_runtime_format_string
- template <typename T> struct scn::discard
Functions
- auto runtime_format(std::string_view s) → detail::basic_runtime_format_string<char>
Function documentation
detail::basic_runtime_format_string<char> runtime_format(std::string_view s)
Create a runtime format string.
Can be used to avoid compile-time format string checking.