scnlib  0.2.0
Tutorial

Basics

The most basic operation is reading from stdin, which can be achieved with scn::input. The function takes a format string as its first parameter, which gives the library instructions on how to read the values. The remaining arguments are references to the values to read.

int i;
// Reads an int from stdin
scn::input("{}", i);
// Equivalent to:
// std::cin >> i;
// scanf("%d", &i);

In this case the format string is "{}". The syntax is familiar from fmtlib, or from Python, where it originated. This format string tells scn::input to read a single value with default options.

Notice how you don't have to pass any type information in the format string, like you have to do with scanf. This information is preserved through the use of variadic templates, which gives the library stronger type safety.
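
To see why no type specifiers are needed, here is a minimal standard-library sketch of how a variadic template can recover the type of every argument at compile time. It is not how scnlib is implemented; the names specifier_for and deduced_specifiers are hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical overload set: picks the scanf specifier matching each type.
inline std::string specifier_for(const int&) { return "%d"; }
inline std::string specifier_for(const double&) { return "%lf"; }
inline std::string specifier_for(const std::string&) { return "%s"; }

// The parameter pack carries the static type of every argument, so the
// caller never has to repeat the types inside a format string.
template <typename... Args>
std::vector<std::string> deduced_specifiers(const Args&... args) {
    return {specifier_for(args)...};
}
```

Passing an int, a double and a std::string to deduced_specifiers yields the three specifiers a scanf call would have needed, without the caller writing any of them.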

You can read multiple values with a single call to scn::input:

int i;
double d;
scn::input("{} {}", i, d);
// Equivalent to:
// std::cin >> i >> d;
// scanf("%d %lf", &i, &d);

The preceding snippet reads an integer, followed by whitespace (any combination of spaces, newlines, tabs, what have you) that gets discarded, and a floating-point value.

To make common usage easier, scnlib also provides scn::prompt. It is otherwise equivalent to scn::input, but it takes a string as its first argument, which it prints to stdout. This can be used to tell the user what kind of input you're expecting.

int i;
// Prints "Gimme integer pls " and reads an int
scn::prompt("Gimme integer pls ", "{}", i);
// Equivalent to:
// std::cout << "Gimme integer pls ";
// std::cin >> i;
// or
// fputs("Gimme integer pls ", stdout);
// scanf("%d", &i);

Ranges

We can, of course, read from sources other than stdin. In fact, with scnlib, we can read from any range, as long as it fulfills certain requirements. If you're not familiar with C++20 ranges, don't worry; conceptually, they're quite simple. A range is simply something that one can call begin() and end() on. For example, std::string and std::vector are ranges.

The library can't work with every range, though. Most importantly, it needs to be a view, meaning that it doesn't own its elements, and is fast to copy. Examples of views are std::string_view and std::span.

This range can then be passed as the first parameter to scn::scan:

int i;
// A string literal is something one _can_ pass to scn::scan
scn::scan("42", "{}", i);
// i == 42

scn::scan takes the input by forwarding reference. This means that if it's given a modifiable lvalue (T&), the same variable can easily be used in multiple calls to scn::scan.

auto input = scn::string_view("123 foo");
int i;
scn::scan(input, "{}", i);
// i == 123
// input == " foo"
std::string str;
scn::scan(input, "{}", str);
// str == "foo"
// input is empty
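
The idea of a view that shrinks as it is read can be sketched with the standard library alone. The following is not scnlib code: consume_int is a hypothetical helper built on std::string_view and std::from_chars (C++17), and unlike scn::scan it does not skip leading whitespace.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical helper, not part of scnlib: parses a leading decimal int
// from `input` and shrinks the view to the unread remainder.
// On failure, neither `input` nor `out` is modified.
inline bool consume_int(std::string_view& input, int& out) {
    const char* first = input.data();
    const char* last = first + input.size();
    int value = 0;
    const auto parsed = std::from_chars(first, last, value);
    if (parsed.ec != std::errc{}) {
        return false;
    }
    out = value;
    input.remove_prefix(static_cast<std::size_t>(parsed.ptr - first));
    return true;
}
```

Calling consume_int on a view of "123 foo" stores 123 and leaves the view at " foo", which mirrors the state of input after the first scn::scan call in the snippet above.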

A convenience function, scn::make_view, is provided, which makes converting a range to an appropriate view easier.

std::string str = ...;
auto view = scn::make_view(str);
scn::scan(view, ...);

Note that const char* is not a range, but const char(&)[N] is. This has the unfortunate consequence that this works:

// "foo" is a const char(&)[4]
scn::scan("foo", ...);

But this doesn't:

auto str = "foo";
// str is a const char*
scn::scan(str, ...);
// Error will be along the lines of
// "Cannot call begin on a const char*"

This is caused by the way string literals and array decay work in the language.
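
The language rule behind this can be checked without scnlib. The sketch below uses only the standard library; literal_length and auto_decays_to_pointer are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Binds only to arrays, so the length N (including the terminating '\0')
// is part of the parameter type.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}

// A string literal is an lvalue of array type...
static_assert(std::is_same<decltype("foo"), const char (&)[4]>::value,
              "a string literal is a const char[4] lvalue");

// ...but `auto` deduction decays it to a pointer, and the length is lost.
inline bool auto_decays_to_pointer() {
    auto str = "foo";
    return std::is_same<decltype(str), const char*>::value;
}
```

literal_length("foo") compiles because the argument is still an array; passing the decayed pointer str to it would not compile, which is the same kind of failure the scn::scan(str, ...) call above runs into.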

This can be worked around with scn::make_view:

auto str = scn::make_view("foo");
// str is a scn::string_view
scn::scan(str, ...);

Reading from files is also supported, with a range wrapping a FILE*.

auto f = std::fopen(...);
// Non-owning wrapper around a FILE*
auto file = scn::file(f);
// scn::file does _not_ sync with the underlying FILE* by default
// call .sync() if you wish to use scnlib in conjunction with <cstdio>
file.sync();
// scn::file doesn't take ownership, and doesn't close
std::fclose(f);

scn::cstdin() returns a scn::file pointing to stdin.

Alternative tuple-based API

By including <scn/tuple_return.h>, an alternative API becomes available, returning a std::tuple instead of taking references.

// Use structured bindings with C++17
auto [result, i] = scn::scan_tuple<int>(range, "{}");
// result is a `scan_result`, similar to the return value of `scn::scan`
// Error handling is further touched upon later
// i is an `int`, scanned from the range
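
If the shape of a tuple-returning reader is unfamiliar, the following sketch shows the general pattern with <iostream> instead of scnlib. read_tuple is a hypothetical function, and its first tuple element is a plain bool rather than anything resembling scan_result.

```cpp
#include <istream>
#include <sstream>
#include <tuple>
#include <utility>

// Hypothetical iostream-based analogue: reads one value per type in Ts and
// returns them in a tuple, preceded by a success flag.
template <typename... Ts>
std::tuple<bool, Ts...> read_tuple(std::istream& in) {
    std::tuple<Ts...> values{};
    // Expands to ((in >> v1) >> v2) >> ... for every element of the tuple.
    std::apply([&in](auto&... v) { (in >> ... >> v); }, values);
    return std::tuple_cat(std::make_tuple(static_cast<bool>(in)),
                          std::move(values));
}
```

With C++17 structured bindings, auto [ok, i, d] = read_tuple<int, double>(stream); unpacks the flag and both values in one statement.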

Strings and getline

Reading a std::string with scnlib works the same way it does with operator>> and <iostream>: the input range is read until a whitespace character or EOF is found. This effectively means that scanning a std::string reads a word at a time.

auto source = scn::make_view("Hello world!");
std::string word;
scn::scan(source, "{}", word);
// word == "Hello"
scn::scan(source, "{}", word);
// word == "world!"

If reading word-by-word isn't what you're looking for, you can use scn::getline. It works pretty much the same way as std::getline does for std::strings.

// Using the source range from the earlier example
std::string word;
// A third parameter could be given, denoting the delimiter
// Defaults to '\n'
scn::getline(source, word);
// word == "Hello world!"
// The delimiter is not included in the output
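
For comparison, this is the <iostream> behavior that both snippets are modeled on. The helper names first_word and first_line are hypothetical; only operator>> and std::getline come from the standard library.

```cpp
#include <sstream>
#include <string>

// operator>> on a std::string stops at the first whitespace character.
inline std::string first_word(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    in >> word;
    return word;
}

// std::getline reads up to the delimiter, which is consumed but not stored.
inline std::string first_line(const std::string& text, char delim = '\n') {
    std::istringstream in(text);
    std::string line;
    std::getline(in, line, delim);
    return line;
}
```

Given "Hello world!", first_word returns "Hello" while first_line returns the whole string.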

Error handling

scnlib does not use exceptions for error handling. Instead, scn::scan and others return a scn::scan_result, which is an object that contains:

  • an integer, telling the number of arguments successfully read
  • a range, denoting the unused part of the input range
  • an scn::error object

// successful read:
int i{};
auto ret = scn::scan("42 leftovers", "{}", i);
// ret == true
// ret.value() == 1
// ret.range() == " leftovers"
// ret.error() == true
// failing read:
int i{};
auto ret = scn::scan("foo", "{}", i);
// ret == false
// ret.value() == 0
// ret.range() == "foo"
// ret.error() == false

The scn::error object can be examined further. It contains an error code of type scn::error::code, accessible with the member function code(), and a message, which can be retrieved with msg().

auto ret = scn::scan(range, "{}", value);
if (!ret) {
    std::cout << "Read failed with message: '" << ret.error().msg() << "'\n";
}

Please note that EOF is also an error, with error code scn::error::end_of_range.

If the error is so severe that it cannot be recovered from, the range becomes bad, and the member function is_recoverable() of scn::error will return false. This means that the range is unusable and in an indeterminate state.

See scn::error for more details about the error codes.

Error guarantees

Should the reading of any of the arguments fail, and the range is not bad, the state of the range will be reset to what it was before the reading of said argument. Also, the argument will not be written to.

int i{}, j{};
// "foo" cannot be read to an integer, so this will fail
auto ret = scn::scan("123 foo", "{} {}", i, j);
assert(!ret);
// First read succeeded
assert(ret.value() == 1);
assert(i == 123);
// Second read failed, value was not touched
assert(j == 0);
assert(ret.error().code() == scn::error::invalid_scanned_value);
// std::string so operator== works
assert(ret.range() == std::string{" foo"});
// The range now contains " foo",
// as it was reset to the state preceding the read of j
std::string s{};
ret = scn::scan(ret.range(), "{}", s);
// This succeeds
assert(ret);
assert(ret.value() == 1);
assert(s == "foo");
assert(ret.range().empty() == true);
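
The guarantee can be pictured as "parse into a scratch copy, commit only on success". The following standard-library sketch follows that idea under stated assumptions: read_result, read_error and read_ints are hypothetical names, only ints are supported, and nothing here reflects how scnlib implements it internally.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <initializer_list>
#include <string_view>
#include <system_error>

enum class read_error { good, invalid_value, end_of_input };

// Hypothetical result type: number of values read, the unread input, an error.
struct read_result {
    int count = 0;
    std::string_view rest;
    read_error error = read_error::good;
    explicit operator bool() const { return error == read_error::good; }
};

// Reads whitespace-separated ints into the given targets, in order.
inline read_result read_ints(std::string_view input,
                             std::initializer_list<int*> targets) {
    read_result result;
    result.rest = input;
    for (int* target : targets) {
        // Work on a copy, so a failure leaves result.rest where it was
        // before this argument was attempted.
        std::string_view attempt = result.rest;
        std::size_t ws = 0;
        while (ws < attempt.size() &&
               std::isspace(static_cast<unsigned char>(attempt[ws])) != 0) {
            ++ws;
        }
        attempt.remove_prefix(ws);
        if (attempt.empty()) {
            result.error = read_error::end_of_input;
            return result;
        }
        int value = 0;
        const auto parsed = std::from_chars(
            attempt.data(), attempt.data() + attempt.size(), value);
        if (parsed.ec != std::errc{}) {
            result.error = read_error::invalid_value;
            return result;  // *target is left untouched
        }
        // Commit: write the value and advance the remaining input.
        *target = value;
        attempt.remove_prefix(
            static_cast<std::size_t>(parsed.ptr - attempt.data()));
        result.rest = attempt;
        ++result.count;
    }
    return result;
}
```

Called as read_ints("123 foo", {&i, &j}), the sketch reports one value read, leaves j alone, and hands back " foo" as the remaining input, matching the assertions in the snippet above.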

Exceptions

No exceptions will ever be thrown by scnlib functions (save for a std::bad_alloc, but that's probably your fault). Should any user-defined operations, like operator* on an iterator, or operator>>, throw, the behavior is undefined.

The library can be compiled with -fno-exceptions and -fno-rtti.

scan_value

If you only wish to scan a single value with all default options, you can save some cycles and use scn::scan_value. Instead of taking its argument by reference, it returns the read value. It is functionally equivalent to scn::scan(range, scn::default_tag, value).

auto ret = scn::scan_value<int>("42 leftovers");
// ret == true
// ret.value() == 42
// ret.range() == " leftovers"

Wide ranges

Ranges can also be wide (terminology borrowed from iostreams), meaning that their character type is wchar_t instead of char. This has some usage implications.

The format string must be wide:

scn::scan(range, L"{}", value);

chars and std::strings cannot be read from a wide range, but wchar_ts and std::wstrings can.

std::wstring word;
scn::scan(range, L"{}", word);

Ranges with character types other than char and wchar_t are not supported, due to lacking support for them in the standard library. Converting between character types is out of scope for this library.

Encoding and Unicode

Because of the rather lackluster Unicode support of the standard library, this library doesn't have any significant Unicode support either.

Narrow ranges are expected to be ASCII encoded, and using multibyte encodings (like UTF-8) with them is probably going to cause problems (blame std::locale). If you need some sort of Unicode support, your best bet is going to be wide ranges, encoded in the way your platform expects (UTF-32 on POSIX, the thing resembling UCS-2 on Windows).

Format string

Every value to be scanned from the input range is marked with a pair of curly braces "{}" in the format string. Inside these braces, additional options can be specified. The syntax is not dissimilar from the one found in fmtlib.

The information inside the braces consists of two parts: the index and the scanning options, separated by a colon ':'.

The index part can either be empty, or be an integer. If the index is specified for one of the arguments, it must be set for all of them. The index tells the library which argument the braces correspond to.

int i;
std::string str;
scn::scan(range, "{1} {0}", i, str);
// Reads from the range in the order of:
// string, whitespace, integer
// That's because the first format string braces have index '1', pointing to
// the second passed argument (indices start from 0), which is a string

After the index comes a colon and the scanning options. The colon only has to be there if any scanning options are specified.
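
To make the two-part structure concrete, here is a small sketch that splits a single replacement field such as "{1}" or "{0:x}" into its index and options. It is not scnlib's parser; field_spec and parse_field are hypothetical, and the options part is returned as an uninterpreted string.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical representation of one "{index:options}" field.
struct field_spec {
    std::optional<int> index;  // empty when no index was written
    std::string options;       // text after the colon, if any
};

// Returns std::nullopt when `field` is not of the form "{[index][:options]}".
inline std::optional<field_spec> parse_field(std::string_view field) {
    if (field.size() < 2 || field.front() != '{' || field.back() != '}') {
        return std::nullopt;
    }
    field.remove_prefix(1);
    field.remove_suffix(1);

    field_spec spec;
    const std::size_t colon = field.find(':');
    const std::string_view index_part = field.substr(0, colon);
    if (!index_part.empty()) {
        int idx = 0;
        const char* last = index_part.data() + index_part.size();
        const auto parsed = std::from_chars(index_part.data(), last, idx);
        if (parsed.ec != std::errc{} || parsed.ptr != last || idx < 0) {
            return std::nullopt;
        }
        spec.index = idx;
    }
    if (colon != std::string_view::npos) {
        spec.options = std::string(field.substr(colon + 1));
    }
    return spec;
}
```

For example, "{1}" gives index 1 and no options, while "{:n}" gives no index and the options string "n".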

For spans, there are no supported scanning options.

Integral types

There are localization specifiers:
  • n: Use thousands separator from the given locale
  • l: Accept characters specified as digits by the given locale. Implies n
  • (default): Use , as thousands separator and [0-9] as digits
And base specifiers:
  • d: Decimal (base-10)
  • x: Hexadecimal (base-16)
  • o: Octal (base-8)
  • b..: Custom base; b followed by one or two digits (e.g. b2 for binary). Base must be between 2 and 36, inclusive
  • (default): Detect base. 0x/0X prefix for hexadecimal, 0 prefix for octal, decimal by default
  • i: Detect base. Argument must be signed
  • u: Detect base. Argument must be unsigned
And other options:
  • ': Accept thousands separator characters, as specified by the given locale (only with custom-scanning method)
  • (default): Thousands separator characters aren't accepted
These specifiers can be given in any order, with up to one from each category.
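
The prefix-based base detection described above follows the same convention as the C library's strtol family when base 0 is requested, so it can be tried out with std::stoi. The wrappers below are hypothetical names; note that std::stoi throws on invalid input, which scnlib does not.

```cpp
#include <string>

// Base 0 asks the standard library to detect the base from the prefix:
// 0x/0X means hexadecimal, a leading 0 means octal, otherwise decimal.
inline int parse_detecting_base(const std::string& text) {
    return std::stoi(text, nullptr, 0);
}

// An explicit base between 2 and 36, comparable to the b.. specifier.
inline int parse_in_base(const std::string& text, int base) {
    return std::stoi(text, nullptr, base);
}
```

With these, "0x1f" is read as 31, "017" as 15 and "42" as 42, while parse_in_base("101", 2) gives 5.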

Floating-point types

First, there's a localization specifier:
  • n: Use decimal and thousands separator from the given locale
  • (default): Use . as decimal point and , as thousands separator
After that, an optional a, A, e, E, f, F, g or G can be given, which has no effect.

bool

First, there are a number of specifiers that can be given, in any order:
  • a: Accept only true or false
  • n: Accept only 0 or 1
  • l: Implies a. Expect boolean text values as specified as such by the given locale
  • (default): Accept 0, 1, true, and false; equivalent to giving both a and n
After that, an optional b can be given, which has no effect.

Strings

The only supported option is s, which has no effect.

Characters

The only supported option is c, which has no effect.

Whitespace

Any amount of whitespace in the format string tells the library to skip until the next non-whitespace character is found in the range. Not finding any whitespace in the range is not an error.

Literal characters

To scan literal characters and immediately discard them, just write the characters in the format string. The scanf-like []-wildcard is not supported. To read a literal { or }, write {{ or }}, respectively.

std::string bar;
scn::scan("foobar", "foo{}", bar);
// bar == "bar"

Default format string

If you don't need any custom parsing options, you should probably pass scn::default_tag instead. This will increase performance, as a useless format string doesn't need to be parsed.

scn::scan(range, scn::default_tag, value);
// Equivalent to:
// scn::scan(range, "{}", value);

Localization

To scan localized input, a std::locale can be passed as the first argument to scn::scan_localized.

auto loc = std::locale("fi_FI");
int a, b;
scn::scan_localized(loc,
                    range, "{} {:n}", a, b);

Only reading of b will be localized, as it has {:n} as its format string.
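
To get a feel for what a locale-provided thousands separator does to number parsing, the same mechanism can be tried with <iostream>. This sketch avoids depending on a named locale such as fi_FI being installed by supplying its own std::numpunct facet; dot_grouping and parse_grouped are hypothetical names, and the choice of '.' as separator is purely an assumption for the example.

```cpp
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical facet: '.' separates groups of three digits, ',' is the
// decimal point (as in several European locales).
struct dot_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    char do_decimal_point() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Parses an integer written with the grouping defined above.
inline std::optional<long> parse_grouped(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet.
    in.imbue(std::locale(std::locale::classic(), new dot_grouping));
    long value = 0;
    if (!(in >> value)) {
        return std::nullopt;
    }
    return value;
}
```

With this facet imbued, the text "1.234.567" is read as the single number 1234567 instead of stopping at the first '.'.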

Semantics of scanning a value

In the beginning, with every scn::scan (or similar) call, the library calls begin() on the range, getting an iterator. This iterator is advanced until a non-whitespace character is found.

After that, the format string is scanned character-by-character, until an unescaped '{' is found, after which the part after the '{' is parsed, until a ':' or '}' is found. If the parser finds an argument id, the argument with that id is fetched from the argument list, otherwise the next argument is used.

The parse() member function of the appropriate scn::scanner specialization is called, which parses the parsing options-part of the format string argument, setting the member variables of the scn::scanner specialization to their appropriate values.

After that, the scan() member function is called. It reads the range, starting from the aforementioned iterator, into a buffer until the next whitespace character is found (except for char/wchar_t: just a single character is read; and for span: span.size() characters are read). That buffer is then parsed with the appropriate algorithm (plain copy for strings, the method determined by the options object for ints and floats).

If some of the characters in the buffer were not used, these characters are put back to the range, meaning that operator-- is called on the iterator.
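
The read-into-a-buffer, parse, put-back sequence described above can be sketched for ints with plain pointers. This is an illustration under assumptions, not the library's code: scan_int is a hypothetical function, and std::from_chars stands in for whatever parsing method the real scanner selects.

```cpp
#include <cctype>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical sketch of scanning one int from [it, end).
// On success `it` points just past the characters that were actually used;
// on failure `it` and `out` are left unchanged.
inline bool scan_int(const char*& it, const char* end, int& out) {
    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    const char* cursor = it;
    // 1. Skip leading whitespace.
    while (cursor != end && is_space(*cursor)) ++cursor;
    // 2. Read into a buffer until the next whitespace character.
    std::string buffer;
    while (cursor != end && !is_space(*cursor)) buffer.push_back(*cursor++);
    // 3. Parse the buffer.
    int value = 0;
    const char* buffer_end = buffer.data() + buffer.size();
    const auto parsed = std::from_chars(buffer.data(), buffer_end, value);
    if (parsed.ec != std::errc{}) return false;
    // 4. Put back every buffered character the parser did not use.
    for (auto unused = buffer_end - parsed.ptr; unused > 0; --unused) --cursor;
    it = cursor;
    out = value;
    return true;
}
```

Scanning "  123abc def" buffers the token "123abc", parses 123 from it, and puts "abc" back, so the next read starts at "abc def".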

Because of how the range is read until a whitespace character, and how the unused part of the buffer is simply put back to the range, some interesting situations may arise. Please note that the following behavior is consistent with both scanf and <iostream>.

char c;
std::string str;
// No whitespace character after first {}, no range whitespace is skipped
scn::scan("abc", "{}{}", c, str);
// c == 'a'
// str == "bc"
// Not finding whitespace to skip from the range when whitespace is found in
// the format string isn't an error
scn::scan("abc", "{} {}", c, str);
// c == 'a'
// str == "bc"
// Because there are no non-whitespace characters between 'a' and the next
// whitespace character ' ', `str` is empty
scn::scan("a bc", "{}{}", c, str);
// c == 'a'
// str == ""
// Nothing surprising
scn::scan("a bc", "{} {}", c, str);
// c == 'a'
// str == "bc"

Using scn::default_tag is equivalent to using "{}" in the format string as many times as there are arguments, separated by whitespace.

scn::scan(range, scn::default_tag, a, b);
// Equivalent to:
// scn::scan(range, "{} {}", a, b);

ignore

scnlib has various functions for skipping characters from a range.

scn::ignore_until(range, ch) will skip until ch is read.

scn::ignore_n_until(range, n, ch) will skip until either n characters have been skipped or ch is read.

User types

To make your own types scannable with scnlib, you can specialize the struct template scn::scanner.

struct my_type {
    int i{};
    double d{};
};

template <typename Char>
struct scn::scanner<Char, my_type>
    : public scn::empty_parser<Char> {
    template <typename Context>
    error scan(my_type& val, Context& c) {
        return scn::scan(c.range(), "[{}, {}]", val.i, val.d);
    }
};
// Input: "[123, 4.56]"
// ->
// my_type.i == 123
// my_type.d == 4.56

Inheriting from scn::empty_parser means only an empty format string "{}" is accepted. You can also implement a parse() method, or inherit from a scn::scanner for another type (like scn::scanner<Char, int>) to get access to additional options.
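
If you're coming from <iostream>, the scanner specialization plays the role that an overloaded operator>> plays there. For comparison only, here is how the same "[123, 4.56]" input could be handled with the standard library; this sketch does not use scnlib at all.

```cpp
#include <istream>
#include <sstream>

struct my_type {
    int i{};
    double d{};
};

// iostream counterpart: expects input of the form "[<int>, <double>]".
// `val` is only written to if the whole pattern matched.
inline std::istream& operator>>(std::istream& in, my_type& val) {
    char open{}, comma{}, close{};
    my_type tmp;
    if (in >> open >> tmp.i >> comma >> tmp.d >> close &&
        open == '[' && comma == ',' && close == ']') {
        val = tmp;
    } else {
        in.setstate(std::ios::failbit);
    }
    return in;
}
```

The operator>> version has to check each literal character by hand, whereas the scnlib version above states the expected layout once, in the format string "[{}, {}]".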

Scanning temporaries

scnlib provides a helper type for scanning into a temporary value: scn::temporary, which can be created with the helper function scn::temp. This is useful, for example, for scanning a scn::span.

// Doesn't work, because arguments must be lvalue references
scn::scan(range, "{}", scn::make_span(...));
// Workaround
auto span = scn::make_span(...);
scn::scan(range, "{}", span);
// Using scn::temporary
// Note the () at the end
scn::scan(range, "{}", scn::temp(scn::make_span(...))());

scanf-like format strings

With scn::scanf, a scanf-like format string syntax can be used instead. scn::ranges::scanf is also available. The syntax is not 100% compatible with C scanf, as it uses the exact same options as the regular format string syntax. The following snippet demonstrates the syntax.

int i;
double d;
std::string s;
scn::scanf(range, "%i %f %s", i, d, s);
// How C scanf would do it:
// scanf(range, "%i %lf", &i, &d);
// reading a dynamic-length string is not possible with scanf
// How scn::scan would do it:
// scn::scan(range, "{} {} {}", i, d, s);
// or to be more explicit:
// scn::scan(range, "{:i} {:f} {:s}", i, d, s);

Notice how the options map exactly to the ones used with scn::scan: %d maps to {:d}, %f to {:f} and %s to {:s}. Notice also how the syntax is not fully compatible with C scanf: a double is read with %f rather than %lf, and C scanf doesn't support dynamic-length strings.

To read a literal %-character and immediately discard it, write %% ({{ and }} with the default format string syntax).
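
The one-to-one mapping between the two syntaxes can be written down as a small string transformation. The function below is a hypothetical illustration, not something scnlib provides, and it assumes every option is a single character following the %.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical converter from the scanf-like syntax to the brace syntax:
// "%x" becomes "{:x}", "%%" becomes a literal '%', and literal braces are
// doubled so that they stay literal in the brace syntax.
inline std::string to_brace_format(std::string_view fmt) {
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (c == '%' && i + 1 < fmt.size()) {
            const char next = fmt[++i];
            if (next == '%') {
                out += '%';
            } else {
                out += "{:";
                out += next;
                out += '}';
            }
        } else if (c == '{' || c == '}') {
            out.append(2, c);
        } else {
            out += c;
        }
    }
    return out;
}
```

For the format string used above, to_brace_format("%i %f %s") produces "{:i} {:f} {:s}", the explicit scn::scan form shown in the snippet.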