arg_router
1.4.0
C++ command line argument parsing and routing
arg_router is a C++17/20 command line parser and router. It uses policy-based objects hierarchically, so the parsing code is self-describing. Rather than just providing a parsing service that returns a map of variants/anys, it allows you to bind Callable instances to points in the parse structure, so complex command line arguments can directly call functions with the expected arguments - rather than you having to do this yourself.
Callable, or tweak its formatting using policies
Callables inline for specific arguments, or you can implement a specialisation to cover all instances of that type
It's not immediately obvious why defining a parse tree at compile-time would bring any benefits, so before we show you how arg_router is used, let us show you why. Here is a very contrived parse tree defined using the very popular and well-made argparse:
This parser has an easy-to-make error: both flags have the same short name, resulting in an ambiguous parse tree. In this example it stands out as the whole program is so small, but it wouldn't be so obvious in a larger, more real-world program. Let's build and run it:
There is no runtime error detection so you would need to write a test for this scenario. Let's try the equivalent in arg_router
:
Let's build it:
"Policy must be unique in the parse tree up to the nearest mode or root" is a formal way of saying that you've got a duplicate policy, and it's a policy::short_name_t. By having the tree defined at compile-time, it can verify itself via C++ template metaprogramming and cause a build failure. This is a simple example, but there are a great many more checks done during compilation to help you write safer code with less explicit testing.
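The idea of a tree that checks itself during compilation can be illustrated without the library. The following is a minimal standalone sketch, not arg_router's implementation: flag_sketch, root_sketch and short_names_unique are hypothetical names invented for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a flag whose short name is part of its type
template <char ShortName>
struct flag_sketch {
    static constexpr char short_name = ShortName;
};

// Returns true when no short name appears more than once
template <typename... Flags>
constexpr bool short_names_unique()
{
    // The trailing '\0' keeps the array non-empty when no flags are given
    constexpr char names[] = {Flags::short_name..., '\0'};
    constexpr std::size_t count = sizeof...(Flags);
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = i + 1; j < count; ++j) {
            if (names[i] == names[j]) {
                return false;
            }
        }
    }
    return true;
}

// Hypothetical root: instantiating it with duplicate short names is a build error
template <typename... Flags>
struct root_sketch {
    static_assert(short_names_unique<Flags...>(),
                  "Short names must be unique in the parse tree");
};
```

Declaring a variable of type root_sketch<flag_sketch<'f'>, flag_sketch<'f'>> fails to compile with the static_assert message, whereas distinct short names build cleanly.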
There are several ways to install arg_router; the most appropriate one depends on your project's configuration.
arg-router (note the hyphen) to dependencies in your vcpkg.json
arg_router to your requirements
include/arg_router directory or package config location share/arg_router
cmake ../arg_router -DINSTALLATION_ONLY=ON; cmake --install . (see developer installation)
If you're a library user, you will need the following dependencies in order to build:
If you're a vcpkg user, then these will be brought in automatically for you. arg_router
is header-only (due to all the templates) and so are the above dependencies.
Note currently arg_router
requires exception support, but not RTTI.
Let's start simple, with this cat
-like program:
Let's start from the top. As the name suggests, root is the root of the parse tree and provides the parse(argc, argv) method. Only children of the root can (and must) have a router policy (except for nested modes, more on that later) and therefore act as access points into the program. The root's children are implicitly mutually exclusive, so trying to pass --version --help in the command line is a runtime error.
The arp::validation::default_validator
instance provides the default validator that the root uses to validate the parse tree at compile-time. It is a required policy of the root
. Unless you have implemented your own policy or tree node you will never need to specify anything else.
The help
node is used by the root
to generate the argument documentation for the help output, by default it just prints directly to the console and then exits, but a router
can be attached that accepts the formatted output to do something else with it. The optional program_name_t
and program_version_t
policies add a header to the help output.
Now let's introduce some 'policies'. Policies define common behaviours across node types, a basic one is long_name_t
which provides long-form argument definition. By default, a standard unix double hyphen prefix for long names is added automatically. short_name_t
is the single character short-form name, by default a single hyphen is prefixed automatically. arg_router
supports short name collapsing for flags, so if you have defined flags like -a -b -c
then -abc
will be accepted or -bca
, etc. (note short-form name collapsing is disabled if the library has been configured to have the same long and short prefix, which is common on Windows).
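As a rough illustration of what collapsing means, here is a standalone sketch that expands a collapsed token into individual short flags. It is not how arg_router does it internally; expand_collapsed is a hypothetical helper, and it assumes the default "-" short prefix and "--" long prefix.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Expands "-abc" into {"-a", "-b", "-c"}.  Assumes "-" is the short prefix and
// "--" is the long prefix; anything else is passed through unchanged
inline std::vector<std::string> expand_collapsed(std::string_view token)
{
    if (token.size() < 2 || token[0] != '-' || token[1] == '-') {
        return {std::string{token}};
    }

    std::vector<std::string> result;
    for (const char c : token.substr(1)) {
        result.push_back(std::string{'-', c});
    }
    return result;
}

// Usage: the order of the collapsed flags is preserved
inline void expand_collapsed_example()
{
    const auto flags = expand_collapsed("-bca");
    assert(flags.size() == 3);
    assert(flags.front() == "-b");
}
```

If the long and short prefixes were the same, a token such as -abc could not be told apart from a long name, which is why collapsing has to be disabled in that configuration.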
Compile-time strings are created via the ""_S string literal, which creates an instance of an ar::str type. Note: the advanced NTTP language support that allows for this is not present until C++20, see compile-time string support for what to do in C++17.
In order to group arguments under a specific operating mode, you put them under a mode
instance. In this case our simple cat program only has one mode, so it is anonymous i.e. there's no long name or description associated with it - it is a build error to have more than one anonymous mode under the root of a parse tree.
arg<T> does exactly what you would expect: it defines an argument that expects a value to follow it on the command line. If an argument is not required then it may have a default_value (if neither is set, then a default initialised value is used instead); this is passed to the router's Callable on parsing if it isn't specified by the user on the command line.
A flag
is essentially an arg<bool>{default_value{false}}
, except that it doesn't expect an argument value to follow on the command line as it is the value. Flags cannot have default arguments or be marked as required.
An alias
policy allows you to define an argument that acts as a link to other arguments, so in our example above passing -A
on the command line would actually set the -E
and -n
flags to true. You can use either the long or short name of the aliased flag, but the value_type
s (bool
for a flag) must be the same.
By default whitespace is used to separate out the command line tokens, this is done by the terminal/OS invoking the program, but often '=' is used as a name/value token separator. arg_router supports this with the value_separator_t policy as used in the arg<int> node in the example.
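Conceptually, a value separator turns one command line token into a name token and a value token. The sketch below shows that split in isolation; split_on_separator is a hypothetical helper and does not reflect the library's internals.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Splits "--max-lines=5" into {"--max-lines", "5"}.  When the separator is
// absent the whole token is the name and there is no value
constexpr std::pair<std::string_view, std::optional<std::string_view>>
split_on_separator(std::string_view token, char separator = '=')
{
    const auto pos = token.find(separator);
    if (pos == std::string_view::npos) {
        return {token, std::nullopt};
    }
    return {token.substr(0, pos), token.substr(pos + 1)};
}
```

Only the first separator is significant in this sketch, so a value that itself contains '=' is kept intact.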
positional_arg<T> does not use a 'marker' token on the command line for which its value follows, the value's position in the command line arguments determines what it is for. The order that arguments are specified on the command line normally doesn't matter, but for positional arguments it does; for example in our cat program the files must be specified after the arguments, so passing myfile.hpp -n would trigger the parser to land on the positional_arg for myfile.hpp which would then greedily consume the -n, causing the application to try to open the file -n... We'll cover constrained positional_args in later examples. The display_name_t policy is used when generating help or error output - it is not used when parsing.
Assuming parsing was successful, the final router
is called with the parsed argument e.g. if the user passed -E file1 file2
then the router
is passed (true, false, -1, {"file1", "file2"})
.
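To make the shape of that call concrete, here is a standalone sketch of a Callable with a matching signature, invoked with the values from the example. cat_router and router_example are hypothetical names, the parameter names are illustrative, and std::apply merely stands in for whatever mechanism the library uses to forward the parsed values.

```cpp
#include <string>
#include <string_view>
#include <tuple>
#include <vector>

// Hypothetical router target: one parameter per argument in the mode, in
// declaration order.  It returns a summary so the result can be inspected
inline std::string cat_router(bool show_ends,
                              bool show_non_printing,
                              int max_lines,
                              const std::vector<std::string_view>& files)
{
    return "show_ends=" + std::to_string(static_cast<int>(show_ends)) +
           " show_non_printing=" + std::to_string(static_cast<int>(show_non_printing)) +
           " max_lines=" + std::to_string(max_lines) +
           " files=" + std::to_string(files.size());
}

// Mimics the call made for "-E file1 file2": unset flags are false and the
// unset max-lines argument takes its default of -1
inline std::string router_example()
{
    return std::apply(cat_router,
                      std::make_tuple(true,
                                      false,
                                      -1,
                                      std::vector<std::string_view>{"file1", "file2"}));
}
```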
You may have noticed that the nodes are constructed with parentheses whilst the policies use braces, this is necessary due to CTAD rules that affect nodes which return a user-defined value type. This can be circumvented using a function to return the required instance, for example the actual type of a flag is flag_t
, flag(...)
is a factory function.
Although explicit, which may make it easier to read, the name policies are also verbose. To ease this most of the built-in nodes support implicit string-to-policy mapping, which allows bare compile-time strings to be passed to the node factory functions which are then mapped to appropriate built-in policies. The rules vary from node to node, but typically they are:
policy::long_name_t
policy::description_t
policy::short_name_t
The above are unicode aware. The strings can be passed in any order relative to the other policies, but it is recommended to put them first to ease reading.
Re-writing the cat-like program using implicit strings shortens it considerably:
This documentation will use implicit string policies going forward, but new nodes will always be introduced with explicit policies.
Let's add another feature to our cat program where we can handle lines over a certain length differently.
We've defined a new argument --max-line-length
but rather than using -1
as the "no limit" indicator like we did for --max-lines
, we specify the argument type to be std::optional<std::size_t>
and have the default value be an empty optional - this allows the code to define our intent better.
What do we do with lines that reach the limit if it has been set? In our example we can either skip the line output, or truncate it with a suffix. It doesn't make any sense to allow both of these options, so we declare them under a one_of
node. Under this node, only one is valid when parsing at runtime, if the user specifies both then it is an error. A one_of
must be marked as required or have a default value in case the user passes none of the arguments it handles.
To express the 'one of' concept better in code, the one_of
node has a single representation in the router
's arguments - a variant that encompasses all the value types of each entry in it. In our example's case, a bool for the --skip-line
flag and a string_view
for the --line-suffix
case.
What happens if a user passes --skip-line
or --line-suffix
without --max-line-length
? Normally the developer will have to check that max-line-length
is not empty and either ignore or throw if it is. But by specifying the one_of
as dependent_t
on max-line-length
, arg_router
will throw on your behalf in this scenario. To be clear, the dependent_t
policy just means that the owner cannot appear on the command line without the arg
or flag
named in its parameters also appearing on the command line.
The smart developer may have noticed an edge case here. If both the children in the above example had the same value_type
, then the variant will be created with multiples of the same type. std::variant
supports this but it means that the common type-based visitation pattern will become ambiguous on those identical types i.e. you won't know which flag was used. This may or may not be important to you, but if it is, you will have to switch
on std::variant::index()
instead.
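A minimal sketch of index-based dispatch on a variant with repeated types follows. The flag names returned are hypothetical; only std::variant itself is used, so this applies to any variant regardless of where it came from.

```cpp
#include <string_view>
#include <variant>

// Both alternatives are bool, so std::get<bool> and type-based visitors
// cannot tell them apart
using choice_t = std::variant<bool, bool>;

// Dispatches on the alternative's position rather than its type
constexpr std::string_view chosen_flag(const choice_t& choice)
{
    switch (choice.index()) {
    case 0: return "--first-flag";   // hypothetical name of the first child
    case 1: return "--second-flag";  // hypothetical name of the second child
    default: return "valueless";     // std::variant_npos
    }
}
```

The value itself is then read with std::get<0> or std::get<1>, which remain unambiguous because they are also index-based.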
Firstly, custom parsing is only needed if the type does not have a constructor that takes a std::string_view
(or a type implicitly convertible from it).
Custom parsing for arg
and positional_arg
types can be done in one of two ways, the first is using a custom_parser
policy. Let's add theme coloring to our console output.
This is a convenient solution to one-off type parsing in-tree, I think it's pretty self-explanatory. However this is not convenient if you have lots of arguments that should return the same custom type, as you would have to copy and paste the lambda or bind calls to the conversion function for each one. In that case we can specialise on the arg_router::parse
function.
With this declared in a place visible to the parse tree declaration, theme_t
can be converted from a string without the need for a custom_parser
. It should be noted that custom_parser
can still be used, and will be preferred over the parse()
specialisation.
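Whichever route is taken, the conversion itself is just a function from a string token to the target type. Below is a standalone sketch of such a conversion; the enumerators of theme_t and the function name parse_theme are assumptions made for illustration, and how the function is attached to the tree (custom_parser or a parse specialisation) is not shown.

```cpp
#include <stdexcept>
#include <string_view>

// Hypothetical theme enumeration; the enumerator names are illustrative
enum class theme_t { none, classic, solarized };

// Converts a command line token into a theme_t, throwing on unknown input
constexpr theme_t parse_theme(std::string_view token)
{
    if (token == "none") {
        return theme_t::none;
    }
    if (token == "classic") {
        return theme_t::classic;
    }
    if (token == "solarized") {
        return theme_t::solarized;
    }
    throw std::invalid_argument{"Unknown theme argument"};
}
```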
Another Unix feature that is fairly common is flags that are repeatable, i.e. you can declare one multiple times on the command line and its value will increase with each repeat. A classic example of this is 'verbosity levels' for program output:
The number of times -v
appears in the command line will increment the returned value, e.g. -vv
will result in verbosity_level_t::INFO
.
Note that even though we are using a custom enum, we haven't specified a custom_parser
. counting_flag
will count up in std::size_t
and then static_cast
to the user-specified type, so as long as your requested type is explicitly convertible to/from std::size_t
it will just work. If it isn't, then you'll have to modify your type or not use counting arguments, as attaching a custom_parser
to any kind of flag will result in a build failure.
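The count-then-cast behaviour described above can be sketched in isolation as follows. This is not the library's code: count_flag is a hypothetical helper, and the enumerator order of verbosity_level_t is an assumption chosen so that two repeats give INFO, as in the text.

```cpp
#include <cstddef>
#include <string_view>

// Assumed enumerator order: ERROR = 0, WARNING = 1, INFO = 2, DEBUG = 3
enum class verbosity_level_t { ERROR, WARNING, INFO, DEBUG };

// Counts occurrences of a short name in a collapsed flag string (without its
// prefix), then casts the std::size_t count to the requested type
template <typename T>
constexpr T count_flag(std::string_view collapsed_flags, char short_name)
{
    std::size_t count = 0;
    for (const char c : collapsed_flags) {
        if (c == short_name) {
            ++count;
        }
    }
    return static_cast<T>(count);
}
```

Because the cast is a static_cast, any type explicitly convertible from std::size_t works, which is why no custom parser is needed for the enum.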
Short name collapsing still works as expected, so passing -Evnv
will result in show_ends
and show_non_printing
being true, and verbosity_level
will be verbosity_level_t::INFO
in the router
call.
We can constrain the number of flags the user can provide by using the max_value policy, so passing -vvvv will result in a runtime error. There are min/max and min variants too. Here we are using the compile-time variant of the policy, we can do that because the value type is an integral/enum, but if your value type cannot be used as a template parameter then there are equivalent runtime variants that take the parameters as function parameters instead. The compile-time variants should be used when possible due to extra checks, e.g. setting the max value less than the min.
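The difference between the two flavours can be shown with a small standalone sketch. Both types below are hypothetical and are not the library's policies; they only illustrate why template parameters allow an extra build-time check that constructor-style parameters cannot.

```cpp
// Compile-time flavour: the bounds are template parameters, so an inverted
// range is rejected during the build
template <auto Min, auto Max>
struct min_max_value_sketch {
    static_assert(Min <= Max, "Minimum value must not exceed the maximum");

    template <typename T>
    static constexpr bool accepts(T value)
    {
        return (value >= Min) && (value <= Max);
    }
};

// Runtime flavour: works for types that cannot be template parameters in
// C++17 (e.g. double), but an inverted range can only be noticed at runtime
template <typename T>
struct runtime_min_max_value_sketch {
    T min;
    T max;

    constexpr bool accepts(const T& value) const
    {
        return !(value < min) && !(max < value);
    }
};
```

Writing min_max_value_sketch<3, 0> and using it fails the build, while runtime_min_max_value_sketch<double>{3.0, 0.0} compiles and simply rejects every value.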
The long_name_t
policy is allowed but usually leads to ugly invocations, however there is a better option:
We can declare a new arg
that takes a string equivalent of the enum and put them both into an alias_group
, so now you can use --verbose=INFO
or -vv
. Short name collapsing still works as expected.
What's this new alias_group
? policy::alias
is an input alias, it works by duplicating the value tokens to each of the aliased nodes it refers to i.e. it forms a one-to-many aliasing relationship. The limitations of that are:
alias_group is almost the opposite of policy::alias, it is an output alias. Each of its child nodes is an alias of the same output value, i.e. it forms a many-to-one aliasing relationship. You can also think of alias_group as a variation on one_of, but instead of the output being a std::variant of all the child node output types, alias_group requires that they all have the same output type.
The positional_arg shown in the first example is unconstrained, i.e. it will consume all command line tokens that follow. This isn't always desired, so we can use policies to constrain the number of tokens consumed. Let's use a file copier as an example:
Our simple copier takes multiple source paths and copies the files to a single destination directory, a 'force' flag will overwrite existing files silently.
It was noted in the Basics section that ordering does matter for positional arguments, so we can see here that the destination path is specified first and therefore comes first on the command line. There can only be one destination path so we set the count to one. Because the count is 1 the return type doesn't need to be a container.
Following the destination path are the source paths, we need at least one so we mark it as required, as our range is unbounded the value_type
needs to be a container. positional_arg
uses a push_back
call on the container, so std::vector
is the typical type to use.
Only the last positional_arg may be of variable length - unless a token_end_marker policy is used. A runtime error will only occur if there are no unbounded variable length positional_args and there are more arguments than the maximum or fewer than the minimum.
It should be noted that setting a non-zero minimum count (min_count
, fixed_count
, or min_max_count
) does not imply a requirement, the minimum count check only applies when there is at least one argument for the node to process. So as with an arg
, you should use a required
policy to explicitly state that at least one argument needs to be present, or a default_value
policy - otherwise a default initialised value will be used instead. For positional_arg
nodes that are marked as required
, it is a compile-time error to have a minimum count policy value of 0.
The token_end_marker policy allows for multiple adjacent variable length positional_arg nodes to be defined. It does this by defining a token that marks the end of the value token list for that node. A trivial example could be a simple launcher application (which is available as a buildable example) e.g.:
Where the argument tokens are used to invoke the programs as child processes. Here the -- token is used to separate the two adjacent positional_args:
There is no limit to the number of positional_arg nodes that can be chained together like this, but they must still follow positional_arg rules such as being at the end of the child node list.
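The effect of an end marker on a token list can be sketched without the library. split_at_end_marker below is a hypothetical helper that splits once at the first marker; the program names used in the usage function are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string_view>
#include <utility>
#include <vector>

using token_list = std::vector<std::string_view>;

// Splits the tokens at the first occurrence of the marker.  The marker itself
// is dropped; if it is absent, everything goes to the first list
inline std::pair<token_list, token_list> split_at_end_marker(const token_list& tokens,
                                                             std::string_view marker = "--")
{
    const auto it = std::find(tokens.begin(), tokens.end(), marker);

    token_list first(tokens.begin(), it);
    token_list second;
    if (it != tokens.end()) {
        second.assign(std::next(it), tokens.end());
    }
    return {std::move(first), std::move(second)};
}

// Usage: two variable length token lists separated by "--"
inline void split_at_end_marker_example()
{
    const auto parts = split_at_end_marker({"prog1", "-a", "--", "prog2", "-b"});
    assert(parts.first.size() == 2);
    assert(parts.second.size() == 2);
}
```

Chaining more than two lists amounts to applying the same split again to the second list.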
Another variation of multi-value arguments are multi_arg
and forwarding_arg
. multi_arg
is almost the same as arg
but accepts multiple (at least one) value tokens from the command line, you can tune the min/max value token count using the min_max_count_t
policy - although you can't use a value_separator
policy.
forwarding_arg is similar to multi_arg except that it is tuned for simply forwarding arguments, so has no naming restrictions, no min/max count, and has a fixed value_type of vector<std::string>. You could re-write the launcher example above to use it:
Which would look like this on the command line:
As noted in Basics, mode
s allow you to group command line components under an initial token on the command line. A common example of this developers will be aware of is git
, for example in our parlance git clean -ffxd
; clean
would be the mode and ffxd
would be the flags that are available under that mode.
As an example, let's take the simple-copy
above and split it into two modes:
The name of a mode
can only be the none version, as it doesn't use a prefix. We now have two named modes: copy
and move
.
Named mode
s can be nested too! Only one mode can be invoked, so attempting to use flags from parent modes is a runtime failure. Another stipulation is that every mode
needs a router
unless all of its children are mode
s as well.
An obvious ugliness to the above example is that we now have duplicated code. We can split that out, and then use copies in the root declaration.
ar::list
is a simple arg
and flag
container that mode
and root
instances detect and add the contents to their child/policy lists. Also don't be afraid of the copies, the majority of arg_router
types hold no data (the advantage of compile-time!) and those that do (e.g. default_value
) generally have small types like primitives or std::string_view
.
Sometimes features or parameters only make sense within certain environments or scenarios that can only be detected at runtime. You can use policy::runtime_enable
to dynamically make a node 'disappear' from the parsing process and help output by the value set at runtime in the policy's constructor. A trivial example is given by the runtime_node_enable example:
Here an environment variable is used to detect if a license is installed (do not use this method in production...) and if it is, unlocks 'advanced' features. If the license is available then the entire advanced
mode is available and so is the --advance-foo
std::string_view
arg
in the default mode.
The help output adjusts to match:
And with the license:
As shown in prior sections, a help
node can be a child of the root
(and only the root
!), which acts like an arg
, and generates the help output when requested by the user. This node is optional, without it there is no help command line argument. As the node is arg
-like, it requires a long and/or short name.
The output can be embellished with the following policies:
program_name_t, the name of the program as it should be displayed to the user
program_version_t, version string. This is not shown if a program_name_t is not specified
program_intro_t, used to give some more information on the program, before the argument output
program_addendum_t, used to add supplementary text after the argument output
flatten_help_t, by default only top-level arguments and/or those in an anonymous mode are displayed. Child modes are shown by requesting the mode's 'path' on the command line (e.g. app --help mode sub-mode). The presence of this policy will make the entire requested subtree's (or root's, if no mode path was requested) help output be displayed
Unlike string data everywhere else in the library, the formatted help output is created at runtime so we don't need to keep duplicate read-only text data.
For example the slightly embellished tree from above:
Would output this to the terminal:
As you can see positional arguments are wrapped in angle brackets, and counts are displayed using interval notation.
Removing the flatten_help
policy would change it to this:
In either case specifying the mode as an argument to the help argument displays just the sub-arguments of that mode:
By default when parsed, help
will output its contents to std::cout
and then exit the application with EXIT_SUCCESS
. Obviously this won't always be desired, so a router
policy can be attached that will pass a std::ostringstream
to the user-provided Callable
. The stream will have already been populated with the help data shown above, but it can now be appended to or converted to a string for use somewhere else.
Often programmatic access is desired for the help output outside of the user requesting it, for example if a parse exception is thrown, generally the exception error is printed to the terminal followed by the help output. This is exposed by the help()
or help(std::ostringstream&)
methods of the root object.
Help output can be customised in several ways:
router policy to capture the output and modify it. This is useful for appending string data, but anything more sophisticated becomes a chore
help node. This is the nuclear option as it gives maximal control but is a lot of work, it is very rarely necessary to do this as the node is primarily just the tree_node implementation, the actual formatting is delegated
help node. The help node delegates the formatting to a policy, if no formatter is specified when defining a help node specialisation the default_help_formatter is used. This is still non-trivial as the formatter policy needs to perform the compile-time tree iteration in order to process the per-node help data
default_help_formatter further delegates formatting to three sub-policies, one that generates the 'preamble' text (i.e. the program name, version, intro), another that generates each argument in the argument output, and a final one to generate the addendum. default_help_formatter uses help_formatter_component::default_preamble_formatter, help_formatter_component::default_line_formatter, and help_formatter_component::default_addendum_formatter respectively by default
A faintly ridiculous example of Unicode support from the just_cats example:
arg_router only supports UTF-8 encoding, and will stay that way for the medium term. If you want other encodings (e.g. UTF-16 on Windows), then you'll need to convert the input tokens to UTF-8 before calling parse(..). The compile-time strings used in the parse tree must be UTF-8 because that's the only encoding supported by arg_router's string checkers and line-break algorithm. Likewise the help output generated will be UTF-8, so you'll need to capture the output (by attaching a policy::router) and then convert before printing.
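For the input direction, a UTF-16 to UTF-8 conversion can be written with the standard library alone. The sketch below is one possible approach and is not part of arg_router; append_utf8 and utf16_to_utf8 are hypothetical helpers, and unpaired surrogates are replaced with U+FFFD as a design choice of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Appends one Unicode code point to a UTF-8 encoded string
inline void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 to UTF-8, combining surrogate pairs into one code point
inline std::string utf16_to_utf8(std::u16string_view in)
{
    constexpr char32_t replacement = 0xFFFD;

    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: only valid when followed by a low surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = replacement;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = replacement;  // Unpaired low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}

// Usage: ASCII input is passed through unchanged
inline void utf16_to_utf8_example()
{
    assert(utf16_to_utf8(u"--help") == "--help");
}
```

Each converted token would then be stored in a std::string that outlives the parse call, with pointers to those strings passed in place of the original argv entries.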
We didn't want to... Normally an application will link to ICU for its Unicode needs, but unfortunately we can't do that here as ICU is not constexpr
and therefore cannot be used for our compile-time needs - so we need to roll our own.
arg_router
has support for runtime language selection, using multi_lang::root
. This wraps around a Callable
that returns a root
instance 'tweaked' for a given language. Let's take the start of the simple file copier/mover application and convert it to use multi_lang::root
- start by defining translations of the tree's strings:
There is nothing special about the translation
type, the unspecialised version will simply static assert if used to remind you to implement specialisations for all language IDs (the template parameter type) - otherwise it is just an empty type. Its use is still recommended though as functionality may be added to it in the future.
multi_lang::root
takes a set of supported language identifiers (ISO language/country codes are recommended for readability/ease, but not required), these are used as the language IDs for translation
specialisation. The runtime-selected translation
instance is then passed to the Callable
provided to multi_lang::root
, the type of which is then used to access the translated compile-time string. Any missing string translation will cause a compilation error.
The first argument to multi_lang::root
is a string provided by the user that should match one of the language identifiers, if it doesn't arg_router
will fall back to using the first defined language (UK English in this case). As a convenience arg_router
provides a simple function that takes the OS locale name and standardises it to an ISO language/country identifier.
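The selection and fallback behaviour described above can be sketched independently of the library. Everything in the block is hypothetical: supported_languages, strip_locale_suffix and select_language are names invented here, the language IDs are examples, and the real convenience function may normalise locale names differently.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Example language IDs; the first entry is the fallback
inline constexpr std::array<std::string_view, 3> supported_languages{"en_GB", "fr", "ja"};

// Drops an encoding or modifier suffix, e.g. "en_GB.UTF-8" becomes "en_GB"
constexpr std::string_view strip_locale_suffix(std::string_view locale)
{
    return locale.substr(0, locale.find_first_of(".@"));
}

// Returns the supported ID matching the requested locale, or the first
// supported ID when there is no match
template <std::size_t N>
constexpr std::string_view select_language(std::string_view requested,
                                           const std::array<std::string_view, N>& supported)
{
    static_assert(N > 0, "At least one language must be supported");

    const auto id = strip_locale_suffix(requested);
    for (const auto& candidate : supported) {
        if (candidate == id) {
            return candidate;
        }
    }
    return supported[0];
}
```

Falling back to the first entry means the order of the language list doubles as a priority statement, so the most widely understood language should be listed first.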
multi_lang::root
's interface is designed to mimic ar::root
, so it can be used as a drop in replacement.
The above code (which is available as a buildable example) will yield the following help output:
Optionally, you can also provide translations for the exception messages by defining an error_code_translations
subtype that consists of a tuple of pairs that form a mapping between the error code and translation string. If this isn't provided then an internal en_GB
one is automatically used instead.
Could yield:
An annoyance of the above is that the translation types are verbose and difficult to read, so as of v1.4 a CMake function is provided by the package config that allows translation type headers to be generated from TOML.
For example the Japanese translation above looks like this:
The TOML file name is the language ID. Calling the CMake function like this:
Creates a header for each language with the corresponding translation specialisation that can be included in your root source file. You can see this in the buildable example. This is now the recommended approach for writing translations, but is certainly not (and never will be) a requirement.
multi_lang::root_wrapper
from v1.0 is still present and supported, but is now marked as deprecated - new code should use multi_lang::root
. It is not supported when using C++20 compile-time strings (see compile-time string support).
In v1.0 the library exclusively used the S_() macro for compile-time string generation, this was an almost necessary convenience as that version of C++ had relatively poor NTTP support. In v1.1 we added an alternative for those using C++20 and above, which is the str type and its ""_S string literal.
For those targeting C++20 with existing v1.0 code, upgrading to a newer library version will cause a compilation failure as the two methods are not compatible. But don't worry! Code changes aren't necessary, but you will need to add the define AR_DISABLE_CPP20_STRINGS=true to your build. Those still targeting C++17 will not need to do anything.
Note C++17 will be supported as a first class citizen until v2.0, after that C++20 will be the minimum so I can strip out a ton of code and get better diagnostics by using concepts.
Currently arg_router only supports exceptions as error handling. If parsing fails for some reason an arg_router::parse_exception is thrown carrying information on the failure.
Low-level tweaking of the library is achieved via some defines and/or CMake variables, documented here.
The CI system attached to this repo builds the unit tests and examples with:
You can build the unit tests on Windows using MSBuild but you must set the CMake variable DEATH_TEST_PARALLEL to 1, otherwise the parallel tests will attempt to write to the project-wide lastSuccessfulBuild file simultaneously, which causes the build to fail. MSVC is supported but only when using the C++20-style compile-time strings due to this MSVC bug.
Other compiler versions and platform combinations may work, but I'm currently limited by the built-in GitHub runners and how much I'm willing to spend on Actions!
The parse tree is very expensive to construct due to all the compile-time checking and meta-programming shenanigans, so do NOT define it in a header and have multiple source files include it - it will cause the tree to be built/checked in every source file it is included in.
This is especially important on Windows as windows.h
is included (needed for calculating terminal column width) with NOMINMAX
and WIN32_LEAN_AND_MEAN
set by default.
Despite not using typeid or dynamic_cast in the library, compilers will still generate class name data if RTTI is enabled, because it is used in the standard library implementations (e.g. std::function on Clang). Due to the highly nested templates that make up the parse tree, these class names can become huge and occupy a large amount of static storage in the executable. As an example, the basic_cat project in the repo will create ~100KB of class name data in the binary - this data is not used and cannot be stripped out.
Disabling RTTI is rarely feasible for most projects, but it is possible to disable RTTI for a single CMake target. So if it was deemed worth it for the size reduction, the command line parsing could be the application's executable (compiled without RTTI) and then the wider application logic could be in a static library (compiled with RTTI). This does not affect exceptions, as their type information is always added by the compiler regardless of RTTI status.
This may seem like an obvious point, but people need reminding: The rise of TMP use has been quicker than the compiler optimisations for it. In practice this means that although arg_router
can be compiled on e.g. GCC v9, it will use staggeringly more memory than e.g. GCC v11.
Complete Doxygen-generated API documentation is available here. Examples are provided in the examples
directory of the repo or online here. Doxygen theming is provided by Doxygen Awesome CSS.
The latest unit test coverage report is found here.
Take a look at the issues page for all upcoming features and fixes.