Format strings module

Format string description.

The format string syntax is heavily influenced by {fmt} and std::format, and is largely compatible with them. Scanning functions, such as scn::scan and scn::input, use the format string syntax described in this section.

Format strings consist of:

  • Replacement fields, which are surrounded by curly braces {}.
  • Non-whitespace characters (except { and }; for literal braces, use {{ and }}), each of which consumes exactly one identical character from the input.
  • Whitespace characters, which consume any and all available consecutive whitespace from the input.
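The literal and whitespace rules above can be sketched with the standard library alone. This is only an illustration of the matching rules, not scnlib's implementation: match_literals and is_ascii_space are hypothetical names, replacement fields and escaped braces are not handled, only ASCII whitespace is recognized, and the sketch assumes that a whitespace character in the format string may also match zero whitespace characters in the input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of scnlib): ASCII-only whitespace test.
// '\t' through '\r' covers "\t\n\v\f\r".
constexpr bool is_ascii_space(char ch) {
    return ch == ' ' || (ch >= '\t' && ch <= '\r');
}

// Hypothetical sketch: matches `input` against a format string made up of
// literal and whitespace characters only (no replacement fields, no "{{").
// Returns true if every character of the format string was matched.
constexpr bool match_literals(std::string_view input, std::string_view fmt) {
    std::size_t pos = 0;
    for (char f : fmt) {
        if (is_ascii_space(f)) {
            // Whitespace in the format string: consume all consecutive
            // whitespace from the input (assumed here to possibly be none).
            while (pos < input.size() && is_ascii_space(input[pos])) {
                ++pos;
            }
        } else {
            // Any other character must match exactly one identical character.
            if (pos == input.size() || input[pos] != f) {
                return false;
            }
            ++pos;
        }
    }
    return true;
}
```

With this sketch, the format string "a b" matches the inputs "a b" and "a \t\n b", but not "a-b".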

Literal characters are matched by code point one-to-one, with no normalization being done. Ä (U+00C4, UTF-8 0xc3 0x84) only matches another U+00C4, and not, for example, its decomposed form U+0041 (LATIN CAPITAL LETTER A) followed by U+0308 (COMBINING DIAERESIS).

A character (code point) is considered to be whitespace if it has the Unicode Pattern_White_Space property, as defined by UAX31-R3a. These code points are:

  • ASCII whitespace characters ("\t\n\v\f\r ")
  • U+0085 (NEXT LINE)
  • U+200E and U+200F (LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
  • U+2028 and U+2029 (LINE SEPARATOR and PARAGRAPH SEPARATOR)
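The list above translates directly into a predicate over code points. The following is a minimal sketch, assuming the input has already been decoded into char32_t code points; is_pattern_white_space is a hypothetical name, not a documented scnlib function.

```cpp
// Hypothetical helper (not part of scnlib's documented API):
// true for exactly the code points listed above.
constexpr bool is_pattern_white_space(char32_t cp) {
    return (cp >= 0x0009 && cp <= 0x000D)    // "\t\n\v\f\r"
           || cp == 0x0020                   // space
           || cp == 0x0085                   // NEXT LINE
           || cp == 0x200E || cp == 0x200F   // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
           || cp == 0x2028 || cp == 0x2029;  // LINE / PARAGRAPH SEPARATOR
}
```

Note that code points such as U+00A0 (NO-BREAK SPACE) are not in the list, so under this definition they are treated as ordinary literal characters.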

The grammar for a replacement field is as follows:

replacement-field   ::= '{' [arg-id] [':' format-spec] '}'
arg-id              ::= positive-integer

format-spec         ::= [width] ['L'] [type]
width               ::= positive-integer
type                ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' |
                        'e' | 'E' | 'f' | 'F' | 'g' | 'G' |
                        'o' | 'p' | 's' | 'x' | 'X' | 'i' | 'u'

Argument IDs

The arg-id specifier can be used to index arguments manually. If manual indexing is used, all of the indices in a format string must be stated explicitly. The same arg-id can appear in the format string only once, and must refer to a valid argument.

// Format string equivalent to "{0} to {1}"
auto a = scn::scan<int, int>("2 to 300", "{} to {}");
// a->values() == (2, 300)

// Manual indexing
auto b = scn::scan<int, int>("2 to 300", "{1} to {0}");
// b->values() == (300, 2)

// INVALID:
// Automatic and manual indexing are mixed
auto c = scn::scan<int, int>("2 to 300", "{} to {0}");

// INVALID:
// Same argument is referred to multiple times
auto d = scn::scan<int, int>("2 to 300", "{0} to {0}");

// INVALID:
// {2} does not refer to an argument
auto e = scn::scan<int, int>("2 to 300", "{0} to {2}");
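The three rules for argument IDs (no mixing of automatic and manual indexing, no repeated IDs, no out-of-range IDs) can be illustrated with a small standalone checker. This is a sketch under stated assumptions, not scnlib's own validation: check_arg_ids and indexing_check are hypothetical names, at most 64 arguments are tracked, and the sketch simply skips to the next '}' after the arg-id, so format specs that themselves contain '}' (for example inside a character set) are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical result type for this sketch.
enum class indexing_check { ok, mixed, duplicate, out_of_range, malformed };

// Hypothetical checker: walks the replacement fields of `fmt` and applies
// the arg-id rules for a scan call with `arg_count` arguments.
constexpr indexing_check check_arg_ids(std::string_view fmt,
                                       std::size_t arg_count) {
    std::uint64_t seen = 0;  // bit i is set once argument i was referenced
    bool used_auto = false;
    bool used_manual = false;
    std::size_t next_auto = 0;

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') continue;  // literals and "}}" are ignored here
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
            ++i;  // escaped "{{"
            continue;
        }
        // Read the optional decimal arg-id.
        std::size_t j = i + 1;
        std::size_t id = 0;
        bool has_id = false;
        while (j < fmt.size() && fmt[j] >= '0' && fmt[j] <= '9') {
            id = id * 10 + static_cast<std::size_t>(fmt[j] - '0');
            has_id = true;
            ++j;
        }
        // Skip the rest of the field up to the closing brace.
        const std::size_t close = fmt.find('}', j);
        if (close == std::string_view::npos) return indexing_check::malformed;

        if (has_id) {
            used_manual = true;
        } else {
            used_auto = true;
            id = next_auto++;  // automatic indexing counts up from 0
        }
        if (used_auto && used_manual) return indexing_check::mixed;
        if (id >= arg_count || id >= 64) return indexing_check::out_of_range;
        const std::uint64_t bit = std::uint64_t{1} << id;
        if ((seen & bit) != 0) return indexing_check::duplicate;
        seen |= bit;
        i = close;
    }
    return indexing_check::ok;
}
```

Applied to the format strings above with two arguments, the sketch reports ok for "{} to {}" and "{1} to {0}", mixed for "{} to {0}", duplicate for "{0} to {0}", and out_of_range for "{0} to {2}".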

Width

Width specifies the maximum number of characters that will be read from the source range. It can be any unsigned integer. When using the 'c' type specifier for strings, specifying the width is required.

auto r = scn::scan<std::string>("abcde", "{:3}");
// r->value() == "abc"

For the purposes of width calculation, the same algorithm is used as in {fmt}. Every code point has a width of 1, except for the following, which have a width of 2:

  • any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX#44
  • U+4DC0 – U+4DFF (Yijing Hexagram Symbols)
  • U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)
  • U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)
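The width rule can be sketched as a function from a code point to 1 or 2. The sketch below is deliberately incomplete: it covers the three explicitly listed blocks, but stands in for the full East_Asian_Width "W"/"F" property with only a few well-known ranges, which is an assumption made to keep the example short. code_point_width and display_width are hypothetical names, and neither {fmt}'s nor scnlib's actual table is reproduced here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch: width of a single code point.
constexpr std::size_t code_point_width(char32_t cp) {
    const bool wide =
        (cp >= 0x4DC0 && cp <= 0x4DFF)        // Yijing Hexagram Symbols
        || (cp >= 0x1F300 && cp <= 0x1F5FF)   // Miscellaneous Symbols and Pictographs
        || (cp >= 0x1F900 && cp <= 0x1F9FF)   // Supplemental Symbols and Pictographs
        // Partial stand-in for East_Asian_Width "W"/"F" (NOT the full property):
        || (cp >= 0x4E00 && cp <= 0x9FFF)     // CJK Unified Ideographs (W)
        || (cp >= 0xAC00 && cp <= 0xD7A3)     // Hangul Syllables (W)
        || (cp >= 0xFF01 && cp <= 0xFF60);    // Fullwidth ASCII variants (F)
    return wide ? 2 : 1;
}

// Hypothetical sketch: total width of a sequence of code points.
constexpr std::size_t display_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += code_point_width(cp);
    }
    return total;
}
```

Under this sketch, "abc" has a width of 3, while a string of two CJK ideographs has a width of 4, so a field width of 3 leaves room for only one of them.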

Localized

The L flag enables localized scanning. Its effects are different for each type it is used with:

  • For integers, it enables locale-specific thousands separators
  • For floating-point numbers, it enables locale-specific thousands and radix (decimal) separators
  • For booleans, it enables locale-specific textual representations (for true and false)
  • For other types, it has no effect

Type specifier

The type specifier determines how the data is to be scanned. The type of the argument to be scanned determines what flags are valid.

Type specifier: strings

String types (std::basic_string and std::basic_string_view)

Type               Meaning
none, s            Copies from the input until a whitespace character is encountered. Preceding whitespace is skipped.
c                  Copies from the input until the field width is exhausted. Does not skip preceding whitespace. Errors if no field width is provided.
[...]              Character set matching: copies from the input until a character not specified in the set is encountered. Character ranges can be specified with -, and the entire selection can be inverted with a prefix ^. Matches and supports arbitrary Unicode code points. Does not skip preceding whitespace.
/<regex>/<flags>   Regular expression matching: copies from the input until the input does not match the regex. Does not skip preceding whitespace.

Type specifier: integers

Integer values are scanned as if by using std::from_chars, except:

  • A positive + sign and a base prefix (like 0x) are always allowed to be present
  • Preceding whitespace is skipped.

Integer types (signed and unsigned variants of char, short, int, long, and long long)

Type                       Meaning
b, B                       std::from_chars with base 2. The base prefix is 0b or 0B.
o, O                       std::from_chars with base 8. The base prefix is 0o or 0O, or just 0.
x, X                       std::from_chars with base 16. The base prefix is 0x or 0X.
d                          std::from_chars with base 10. No base prefix allowed.
u                          std::from_chars with base 10. No base prefix or - sign allowed.
i                          Detect the base from a possible prefix, defaulting to decimal (base-10).
rXX (where XX = [2, 36])   Custom base, without a base prefix (r stands for radix).
c                          Copies a character (code unit) from the input.
none                       Same as i.
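Because integer scanning is described relative to std::from_chars, it can help to see what std::from_chars does on its own. The sketch below uses only the standard library; parse_int is a hypothetical helper that requires the whole input to be consumed. It shows the baseline behavior that the exceptions listed above relax: plain std::from_chars rejects a leading + sign, does not understand base prefixes, and does not skip leading whitespace.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses all of `text` as an int in the given base
// using plain std::from_chars. Returns std::nullopt on any failure,
// including unparsed trailing characters.
inline std::optional<int> parse_int(std::string_view text, int base) {
    assert(base >= 2 && base <= 36);  // precondition of std::from_chars
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value, base);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

For example, parse_int("ff", 16) yields 255 and parse_int("-42", 10) yields -42, while "+42", " 42", and "0xff" (with base 16) are all rejected by this helper.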

Type specifier: characters

Character types (char and wchar_t), and code points (char32_t)

Type                        Meaning
none, c                     Copies a character (code point for char32_t, code unit otherwise) from the input.
b, B, d, i, o, O, u, x, X   Same as for integers, see "Type specifier: integers" above. Not allowed for char32_t.

Type specifier: floating-point values

Floating-point values are scanned as if by using std::from_chars, except:

  • A positive + sign and a base prefix (like 0x) are always allowed to be present
  • Preceding whitespace is skipped.

Floating-point types (float, double, and long double)

Type   Meaning
a, A   std::from_chars with std::chars_format::hex. Prefix 0x/0X is allowed.
e, E   std::from_chars with std::chars_format::scientific.
f, F   std::from_chars with std::chars_format::fixed.
g, G   std::from_chars with std::chars_format::general.
none   std::from_chars with std::chars_format::general | std::chars_format::hex. Prefix 0x/0X is allowed.
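The std::chars_format values referenced in the table differ in which textual forms they accept. The following standard-library-only sketch shows those differences for plain std::from_chars; parse_double is a hypothetical helper that requires the whole input to be consumed, and the example assumes a standard library that implements the floating-point overloads of std::from_chars. As with integers, plain std::from_chars rejects a leading + sign and the 0x prefix, which the scanning rules above additionally permit.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses all of `text` as a double with the given
// std::chars_format. Returns std::nullopt on any failure, including
// unparsed trailing characters.
inline std::optional<double> parse_double(std::string_view text,
                                          std::chars_format fmt) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value, fmt);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

With this helper, "1.5e3" parses under scientific and general but not under fixed (the exponent is left unparsed), "1.5" parses under fixed and general but not under scientific (the exponent is required), and "1p4" parses as 16.0 under hex.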

Type specifier: booleans

bool

Type                        Meaning
s                           Allows for the textual representation (true or false).
b, B, d, i, o, O, u, x, X   Allows for the integral/numeric representation (0 or 1).
none                        Allows for both the textual and the integral/numeric representation.
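The three rows of the table can be read as three acceptance modes. The sketch below illustrates them for the non-localized case, using only the standard library. bool_repr and parse_bool are hypothetical names, the input must match one of the representations exactly, and whitespace skipping, base prefixes, and the L flag are deliberately left out.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enumeration of the accepted representations:
// text ~ 's', numeric ~ integer specifiers, any ~ no specifier.
enum class bool_repr { text, numeric, any };

// Hypothetical sketch: exact-match parsing of a boolean.
constexpr std::optional<bool> parse_bool(std::string_view s,
                                         bool_repr allowed) {
    if (allowed != bool_repr::numeric) {
        if (s == "true") return true;
        if (s == "false") return false;
    }
    if (allowed != bool_repr::text) {
        if (s == "1") return true;
        if (s == "0") return false;
    }
    return std::nullopt;
}
```

In this sketch, "true" is accepted in the text and any modes but rejected in numeric mode, and "1" is accepted in the numeric and any modes but rejected in text mode.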

Classes

template <typename CharT, typename Source, typename... Args>
class scn::basic_scan_format_string
template <typename CharT>
struct scn::detail::basic_runtime_format_string
template <typename T>
struct scn::discard

Functions

auto runtime_format(std::string_view s) →  detail::basic_runtime_format_string<char>

Function documentation

detail::basic_runtime_format_string<char> runtime_format(std::string_view s)

Creates a runtime format string.

Can be used to avoid compile-time format string checking, for example when the format string is only known at run time.