Format strings module
Format string description.
The format string syntax is heavily influenced by {fmt} and std::, and is largely compatible with it. Scanning functions, such as scn:: and scn::, use the format string syntax described in this section.
Format strings consist of:
- Replacement fields, which are surrounded by curly braces {}.
- Non-whitespace characters (except {}; for literal braces, use {{ and }}), which consume exactly one identical character from the input.
- Whitespace characters, which consume any and all available consecutive whitespace from the input.
Literal characters are matched by code point one-to-one, with no normalization being done. Ä (U+00C4, UTF-8 0xc3 0x84) only matches another U+00C4, and not, for example, the decomposed sequence U+0041 (LATIN CAPITAL LETTER A) followed by U+0308 (COMBINING DIAERESIS).
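As a small illustration of matching without normalization, the sketch below compares the precomposed and decomposed spellings of Ä code point by code point. It uses only the C++ standard library and does not call scn; the names precomposed, decomposed, and same_code_points are hypothetical.

```cpp
#include <string_view>

// Precomposed form: the single code point U+00C4.
constexpr std::u32string_view precomposed = U"\u00C4";
// Decomposed form: U+0041 followed by U+0308 (COMBINING DIAERESIS).
constexpr std::u32string_view decomposed = U"A\u0308";

// Hypothetical helper: code-point-wise comparison, no normalization applied.
constexpr bool same_code_points(std::u32string_view a, std::u32string_view b) {
    return a == b;
}

// The two spellings render alike but are different code point sequences.
static_assert(same_code_points(precomposed, U"\u00C4"));
static_assert(!same_code_points(precomposed, decomposed));
```

Input that may contain decomposed sequences would therefore need to be normalized by the caller before scanning, if such matches are wanted.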
A character (code point) is considered whitespace if it has the Unicode Pattern_White_Space property, as defined by UAX31-R3a. These code points are:
- ASCII whitespace characters ("\t\n\v\f\r ")
- U+0085 (next line)
- U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
- U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR)
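The matching rules for literal and whitespace characters can be sketched in standard C++ as follows. This is a minimal illustration and not the library's implementation: is_pattern_white_space and match_literal are hypothetical names, replacement fields and brace escapes are not handled, and the sketch assumes that a whitespace run in the format string may also match zero whitespace characters in the input.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Code points with the Pattern_White_Space property, as listed above.
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x0009 && cp <= 0x000D)   // \t \n \v \f \r
        || cp == 0x0020                     // space
        || cp == 0x0085                     // NEXT LINE
        || cp == 0x200E || cp == 0x200F     // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
        || cp == 0x2028 || cp == 0x2029;    // LINE / PARAGRAPH SEPARATOR
}

// Hypothetical helper: matches a format string made only of literal and
// whitespace characters against the input. Returns the number of input
// code points consumed, or std::nullopt on a mismatch.
constexpr std::optional<std::size_t> match_literal(std::u32string_view input,
                                                   std::u32string_view format) {
    std::size_t in = 0;
    std::size_t f = 0;
    while (f < format.size()) {
        if (is_pattern_white_space(format[f])) {
            // Whitespace in the format consumes all consecutive whitespace
            // in the input (assumption: possibly none).
            while (f < format.size() && is_pattern_white_space(format[f])) ++f;
            while (in < input.size() && is_pattern_white_space(input[in])) ++in;
        } else {
            // Any other character consumes exactly one identical code point.
            if (in >= input.size() || input[in] != format[f]) return std::nullopt;
            ++in;
            ++f;
        }
    }
    return in;
}

static_assert(match_literal(U"a   b", U"a b").value() == 5);
static_assert(!match_literal(U"ax", U"ab").has_value());
```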
The grammar for a replacement field is as follows:
    replacement-field ::= '{' [arg-id] [':' format-spec] '}'
    arg-id            ::= positive-integer
    format-spec       ::= [width] ['L'] [type]
    width             ::= positive-integer
    type              ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' | 'e' | 'E' | 'f' | 'F' | 'g' | 'G' | 'o' | 'p' | 's' | 'x' | 'X' | 'i' | 'u'
Argument IDs
The arg-id specifier can be used to index arguments manually. If manual indexing is used, all of the indices in a format string must be stated explicitly. The same arg-id can appear in the format string only once, and must refer to a valid argument.
    // Format string equivalent to "{0} to {1}"
    auto a = scn::scan<int, int>("2 to 300", "{} to {}");
    // a->values() == (2, 300)

    // Manual indexing: the first field fills argument 1, the second argument 0
    auto b = scn::scan<int, int>("2 to 300", "{1} to {0}");
    // b->values() == (300, 2)

    // INVALID:
    // Automatic and manual indexing is mixed
    auto c = scn::scan<int, int>("2 to 300", "{} to {0}");

    // INVALID:
    // Same argument is referred to multiple times
    auto d = scn::scan<int, int>("2 to 300", "{0} to {0}");

    // INVALID:
    // {2} does not refer to an argument
    auto e = scn::scan<int, int>("2 to 300", "{0} to {2}");
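The three indexing rules can be expressed as a small checker. The sketch below is a standalone illustration in standard C++, not the library's own validation: id_check, max_args, and check_arg_ids are hypothetical names, the format-spec is skipped up to the next closing brace without being inspected, and at most 32 arguments are supported.

```cpp
#include <cstddef>
#include <string_view>

enum class id_check { ok, mixed_indexing, duplicate_id, out_of_range, malformed };

constexpr std::size_t max_args = 32;  // arbitrary limit for this sketch

// Hypothetical checker for the arg-id rules described above.
constexpr id_check check_arg_ids(std::string_view fmt, std::size_t arg_count) {
    bool seen[max_args] = {};
    bool any_manual = false;
    bool any_automatic = false;
    std::size_t next_automatic = 0;

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
            ++i;  // "{{" is a literal brace, not a replacement field
            continue;
        }
        ++i;  // step past '{'

        // Optional arg-id: a run of decimal digits.
        std::size_t id = 0;
        bool has_id = false;
        for (; i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9'; ++i) {
            if (id <= max_args) id = id * 10 + static_cast<std::size_t>(fmt[i] - '0');
            has_id = true;
        }

        // Skip the rest of the field; the format-spec is not inspected.
        while (i < fmt.size() && fmt[i] != '}') ++i;
        if (i == fmt.size()) return id_check::malformed;  // no closing brace

        if (has_id) {
            any_manual = true;
        } else {
            any_automatic = true;
            id = next_automatic++;  // automatic indexing counts up from 0
        }
        if (any_manual && any_automatic) return id_check::mixed_indexing;
        if (id >= arg_count || id >= max_args) return id_check::out_of_range;
        if (seen[id]) return id_check::duplicate_id;
        seen[id] = true;
    }
    return id_check::ok;
}

static_assert(check_arg_ids("{1} to {0}", 2) == id_check::ok);
static_assert(check_arg_ids("{} to {0}", 2) == id_check::mixed_indexing);
```

Because the function is constexpr, a check of this kind can run at compile time, which is the general idea behind compile-time format string checking.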
Width
Width specifies the maximum number of characters that will be read from the source range. It can be any unsigned integer. When using the 'c' type specifier for strings, specifying the width is required.
auto r = scn::scan<std::string>("abcde", "{:3}"); // r->value() == "abc"
For the purposes of width calculation, the same algorithm is used as in {fmt}. Every code point has a width of one, except for the following, which have a width of two:
- any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX#44
- U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
- U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
- U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
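A rough sketch of how a width limit interacts with double-width code points is given below. It is deliberately incomplete: only the three explicitly listed blocks are treated as wide, the East_Asian_Width lookup is omitted, and stopping before a code point that would exceed the limit is an assumption rather than documented behavior. All function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Only the explicitly listed blocks are checked here; a full version would
// also consult the East_Asian_Width property (omitted in this sketch).
constexpr bool in_listed_wide_block(char32_t cp) {
    return (cp >= 0x4DC0 && cp <= 0x4DFF)     // Yijing Hexagram Symbols
        || (cp >= 0x1F300 && cp <= 0x1F5FF)   // Miscellaneous Symbols and Pictographs
        || (cp >= 0x1F900 && cp <= 0x1F9FF);  // Supplemental Symbols and Pictographs
}

constexpr std::size_t code_point_width(char32_t cp) {
    return in_listed_wide_block(cp) ? 2 : 1;
}

// Number of leading code points of `input` whose total width fits within
// `max_width`. Assumption: a code point that would exceed the limit is not read.
constexpr std::size_t code_points_within_width(std::u32string_view input,
                                               std::size_t max_width) {
    std::size_t used = 0;
    std::size_t count = 0;
    for (char32_t cp : input) {
        const std::size_t w = code_point_width(cp);
        if (used + w > max_width) break;
        used += w;
        ++count;
    }
    return count;
}

// "abcde" with width 3 yields three code points, matching the "{:3}" example.
static_assert(code_points_within_width(U"abcde", 3) == 3);
// Two double-width code points do not both fit in a width of 3.
static_assert(code_points_within_width(U"\U0001F300\U0001F300a", 3) == 1);
```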
Localized
The L flag enables localized scanning. Its effects are different for each type it is used with:
- For integers, it enables locale-specific thousands separators
- For floating-point numbers, it enables locale-specific thousands and radix (decimal) separators
- For booleans, it enables locale-specific textual representations (for true and false)
- For other types, it has no effect
Type specifier
The type specifier determines how the data is to be scanned. The type of the argument to be scanned determines what flags are valid.
Type specifier: strings
Type | Meaning |
---|---|
none, s | Copies from the input until a whitespace character is encountered. Preceding whitespace is skipped. |
c | Copies from the input until the field width is exhausted. Does not skip preceding whitespace. Errors if no field width is provided. |
[...] | Character set matching: copies from the input until a character not specified in the set is encountered. Character ranges can be specified with -, and the entire selection can be inverted with a prefix ^. Matches and supports arbitrary Unicode code points. Does not skip preceding whitespace. |
/<regex>/<flags> | Regular expression matching: copies from the input until the input does not match the regex. Does not skip preceding whitespace. |
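The difference between the s and c specifiers can be sketched in standard C++ as follows. This is an illustration only: is_ascii_space, scan_word, and scan_chars are hypothetical names, only ASCII whitespace is recognized, and returning a shorter result when the input runs out before the width is exhausted is an assumption.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_ascii_space(char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// 's' (or no type): skip preceding whitespace, then copy until whitespace.
constexpr std::string_view scan_word(std::string_view in) {
    std::size_t begin = 0;
    while (begin < in.size() && is_ascii_space(in[begin])) ++begin;
    std::size_t end = begin;
    while (end < in.size() && !is_ascii_space(in[end])) ++end;
    return in.substr(begin, end - begin);
}

// 'c': copy up to `width` characters, without skipping preceding whitespace.
constexpr std::string_view scan_chars(std::string_view in, std::size_t width) {
    return in.substr(0, width);
}

static_assert(scan_word("  hello world") == "hello");
static_assert(scan_chars("  hello", 4) == "  he");
```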
Type specifier: integers
Integer values are scanned as if by using std::, except:
- A positive + sign and a base prefix (like 0x) are always allowed to be present
- Preceding whitespace is skipped.
Type | Meaning |
---|---|
b, B | std:: with base 2. The base prefix is 0b or 0B. |
o, O | std:: with base 8. The base prefix is 0o or 0O, or just 0. |
x, X | std:: with base 16. The base prefix is 0x or 0X. |
d | std:: with base 10. No base prefix allowed. |
u | std:: with base 10. No base prefix or - sign allowed. |
i | Detect the base from a possible prefix, defaulting to decimal (base-10). |
rXX (where XX = [2, 36]) | Custom base, without a base prefix (r stands for radix). |
c | Copies a character (code unit) from the input. |
none | Same as i. |
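Base detection for the i specifier can be sketched as below. This is a hand-written illustration in standard C++ and not the library's parser: digit_value and scan_int_detect_base are hypothetical names, only ASCII whitespace is skipped, overflow checking is omitted, and accepting a leading - sign is an assumption based on the note that only u forbids it.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Value of a digit character, or 36 if it is not a digit in any base up to 36.
constexpr int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return 36;
}

// Hypothetical sketch of the 'i' specifier: detect the base from a prefix.
constexpr std::optional<long long> scan_int_detect_base(std::string_view in) {
    std::size_t i = 0;
    // Preceding whitespace is skipped (ASCII only in this sketch).
    while (i < in.size() && (in[i] == ' ' || (in[i] >= '\t' && in[i] <= '\r'))) ++i;

    bool negative = false;
    if (i < in.size() && (in[i] == '+' || in[i] == '-')) {
        negative = in[i] == '-';
        ++i;
    }

    int base = 10;  // default when no prefix is present
    if (i + 1 < in.size() && in[i] == '0') {
        const char p = in[i + 1];
        if (p == 'b' || p == 'B') { base = 2; i += 2; }
        else if (p == 'o' || p == 'O') { base = 8; i += 2; }
        else if (p == 'x' || p == 'X') { base = 16; i += 2; }
        else { base = 8; }  // bare leading 0; the 0 itself is read as a digit
    }

    long long value = 0;
    std::size_t digits = 0;
    for (; i < in.size() && digit_value(in[i]) < base; ++i, ++digits) {
        value = value * base + digit_value(in[i]);  // overflow checking omitted
    }
    if (digits == 0) return std::nullopt;
    return negative ? -value : value;
}

static_assert(scan_int_detect_base("  +0x1F").value() == 31);
static_assert(scan_int_detect_base("017").value() == 15);
```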
Type specifier: characters
Type | Meaning |
---|---|
none, c | Copies a character (code point for char32_t, code unit otherwise) from the input. |
b, B, d, i, o, O, u, x, X | Same as for integers, see Type specifier: integers above. Not allowed for char32_t. |
Type specifier: floating-point values
Floating-point values are scanned as if by using std::, except:
- A positive + sign and a base prefix (like 0x) are always allowed to be present
- Preceding whitespace is skipped.
Type | Meaning |
---|---|
a, A | std:: with std::chars_format::hex. Prefix 0x/0X is allowed. |
e, E | std:: with std::chars_format::scientific. |
f, F | std:: with std::chars_format::fixed. |
g, G | std:: with std::chars_format::general. |
none | std:: with std::chars_format::general | std::chars_format::hex. Prefix 0x/0X is allowed. |
Type specifier: booleans
Type | Meaning |
---|---|
s | Allows for the textual representation (true or false). |
b, B, d, i, o, O, u, x, X | Allows for the integral/numeric representation (0 or 1). |
none | Allows for both the textual and the integral/numeric representation. |
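The three rows of the table correspond to three sets of accepted representations, which the following standalone sketch models. It is not the library's implementation: bool_repr and scan_bool are hypothetical names, and neither whitespace skipping nor the localized (L) representations are modeled.

```cpp
#include <optional>
#include <string_view>

// Which representations are accepted: 's', an integer specifier, or none.
enum class bool_repr { textual, numeric, both };

// Hypothetical sketch: reads a boolean from the start of the input.
constexpr std::optional<bool> scan_bool(std::string_view in, bool_repr allowed) {
    const bool text_ok = allowed != bool_repr::numeric;
    const bool num_ok = allowed != bool_repr::textual;
    if (text_ok && in.substr(0, 4) == "true") return true;
    if (text_ok && in.substr(0, 5) == "false") return false;
    if (num_ok && !in.empty() && in[0] == '1') return true;
    if (num_ok && !in.empty() && in[0] == '0') return false;
    return std::nullopt;  // representation not allowed, or not a boolean
}

static_assert(scan_bool("true", bool_repr::textual).value());
static_assert(!scan_bool("1", bool_repr::textual).has_value());
static_assert(scan_bool("1", bool_repr::both).value());
```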
Classes
- template <typename CharT, typename Source, typename... Args> class scn::basic_scan_format_string
- template <typename CharT> struct scn::detail::basic_runtime_format_string
- template <typename T> struct scn::discard
Functions
- auto runtime_format(std::string_view s) → detail::basic_runtime_format_string<char>
Function documentation
detail::basic_runtime_format_string<char> runtime_format(std::string_view s)
Create a runtime format string.
Can be used to avoid compile-time format string checking.