arg_router  1.4.0
C++ command line argument parsing and routing
arg_router::utility::utf8 Namespace Reference

Namespaces

 code_point
 
 no_break_rules
 

Classes

class  iterator
 
class  line_iterator
 

Enumerations

enum class  grapheme_cluster_break_class : std::uint8_t
 
enum class  line_break_class : std::uint8_t
 

Functions

std::size_t levenshtein_distance (std::string_view a, std::string_view b)
 
template<typename Node >
vector< parsing::token_type > closest_matching_child_node (const Node &node, parsing::token_type token)
 
constexpr std::size_t count (std::string_view str) noexcept
 
constexpr bool is_whitespace (std::string_view str) noexcept
 
constexpr bool contains_whitespace (std::string_view str) noexcept
 
constexpr std::size_t terminal_width (std::string_view str) noexcept
 

Variables

constexpr auto double_width_table
 
constexpr auto grapheme_cluster_break_table
 
constexpr auto line_break_table
 
constexpr auto whitespace_table
 
constexpr auto zero_width_table
 

Detailed Description

Namespace for UTF-8 encoded string functions.

Enumeration Type Documentation

◆ grapheme_cluster_break_class

enum arg_router::utility::utf8::grapheme_cluster_break_class : std::uint8_t
strong

Grapheme cluster break classes, and their values in the encoded code points in grapheme_cluster_break_table.

Do not change the order or values, as they must match scripts/unicode_table_generators.py.

Definition at line 18 of file grapheme_cluster_break.hpp.

◆ line_break_class

enum arg_router::utility::utf8::line_break_class : std::uint8_t
strong

Line break classes, and their values in the encoded code points in line_break_table.

Do not change the order or values, as they must match scripts/unicode_table_generators.py.

Definition at line 15 of file line_break.hpp.

Function Documentation

◆ closest_matching_child_node()

template<typename Node >
vector<parsing::token_type> arg_router::utility::utf8::closest_matching_child_node ( const Node &  node,
parsing::token_type  token 
)

Uses the Levenshtein distance algorithm to find the child node closest to the given token, along with its parents (if any).

Note
This function requires Node to have at least one child.
Template Parameters
Node  Parent node type
Parameters
node  Parent node instance
token  Token being queried; its prefix type is considered during the distance calculation
Returns
Closest matching child node token_type and any parents, or an empty vector if all available children are runtime disabled

Definition at line 70 of file levenshtein_distance.hpp.
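The following is a minimal, self-contained sketch of the selection step, not the library's implementation. It assumes a hypothetical token struct (a prefix plus a name) standing in for parsing::token_type, compares the joined prefix and name byte-wise using a Levenshtein distance, and returns a single best match rather than the vector holding the match and its parents that the real function returns. Ties keep the earliest candidate. This is one plausible policy; the library's tie-breaking is not documented on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for parsing::token_type: a prefix such as "--" or "-"
// plus the token's name.
struct token {
    std::string prefix;
    std::string name;

    // Prefix and name joined, so a prefix mismatch also adds to the distance.
    std::string full() const { return prefix + name; }
};

// Byte-wise Levenshtein distance using a single rolling row of the DP table.
std::size_t edit_distance(std::string_view a, std::string_view b)
{
    std::vector<std::size_t> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        row[j] = j;  // distance from the empty prefix of a
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // D[i-1][j-1]
        row[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t up = row[j];  // D[i-1][j]
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }
    return row[b.size()];
}

// Returns the candidate closest to query; ties keep the earliest candidate.
// Like the documented function, it requires at least one candidate.
const token& closest_match(const std::vector<token>& candidates, const token& query)
{
    assert(!candidates.empty());
    const std::string target = query.full();
    const token* best = &candidates.front();
    std::size_t best_distance = edit_distance(best->full(), target);
    for (const token& candidate : candidates) {
        const std::size_t d = edit_distance(candidate.full(), target);
        if (d < best_distance) {
            best = &candidate;
            best_distance = d;
        }
    }
    return *best;
}
```

With children --verbose, --version and -v, a mistyped --verbos resolves to --verbose (distance 1), while -x resolves to -v.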

◆ contains_whitespace()

constexpr bool arg_router::utility::utf8::contains_whitespace ( std::string_view  str)
inline constexpr noexcept

Returns true if str contains whitespace.

Parameters
str  Input string
Returns
True if whitespace is present

Definition at line 304 of file utf8.hpp.

◆ count()

constexpr std::size_t arg_router::utility::utf8::count ( std::string_view  str)
inline constexpr noexcept

Number of UTF-8 grapheme clusters in the string.

Parameters
str  Input string
Returns
Number of grapheme clusters

Definition at line 278 of file utf8.hpp.
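A grapheme cluster is what a reader perceives as one character, so a single cluster can span several code points. The sketch below illustrates the difference under strong simplifying assumptions: the input is already decoded into code points, only UAX #29 rule GB9 (no break before Extend) is applied, and the Extend set is cut down to U+0300 to U+036F. The real count() relies on grapheme_cluster_break_table and the full rule set. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Illustrative subset of Grapheme_Cluster_Break=Extend: only the Combining
// Diacritical Marks block. The real property covers far more code points.
constexpr bool is_extend(char32_t cp) noexcept
{
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts grapheme clusters in already-decoded code points, applying only
// rule GB9; every other UAX #29 rule is omitted.
constexpr std::size_t count_clusters_sketch(std::u32string_view cps) noexcept
{
    std::size_t clusters = 0;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        // A cluster always starts at the beginning of the text; after that,
        // a new cluster starts at every code point that is not Extend.
        if (i == 0 || !is_extend(cps[i])) {
            ++clusters;
        }
    }
    return clusters;
}
```

For example, U"e\u0301" (e followed by COMBINING ACUTE ACCENT) is two code points but a single grapheme cluster.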

◆ is_whitespace()

constexpr bool arg_router::utility::utf8::is_whitespace ( std::string_view  str)
inline constexpr noexcept

True if the leading code point of str is one of the known whitespace characters.

Parameters
str  Input string; only its leading code point is examined
Returns
True if the leading code point is whitespace; false otherwise, including when str is empty or does not contain enough bytes to read the entire code point

Definition at line 289 of file utf8.hpp.
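The empty and truncated cases in the Returns entry are easiest to see with an explicit decoder. This sketch (hypothetical names, not the library's code) decodes the leading code point, returns false for empty, truncated, or malformed input, and then checks the ranges listed in whitespace_table. For brevity it does not reject overlong encodings or surrogates.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

struct range { char32_t first; char32_t last; };  // inclusive bounds

// Same values as whitespace_table.
constexpr std::array<range, 11> whitespace_ranges{{
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085}, {0x00A0, 0x00A0},
    {0x1680, 0x1680}, {0x2000, 0x200A}, {0x2028, 0x2028}, {0x2029, 0x2029},
    {0x202F, 0x202F}, {0x205F, 0x205F}, {0x3000, 0x3000},
}};

// Decodes the leading code point of str, or returns std::nullopt if str is
// empty, starts with an invalid lead byte, is truncated, or has a bad
// continuation byte. Overlong forms and surrogates are not rejected.
constexpr std::optional<char32_t> leading_code_point(std::string_view str) noexcept
{
    if (str.empty()) {
        return std::nullopt;
    }
    const auto lead = static_cast<unsigned char>(str[0]);
    if (lead < 0x80) {
        return char32_t{lead};  // ASCII
    }
    std::size_t len = 0;
    if ((lead & 0xE0) == 0xC0) {
        len = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        len = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        len = 4;
    } else {
        return std::nullopt;  // stray continuation byte or invalid lead byte
    }
    if (str.size() < len) {
        return std::nullopt;  // not enough bytes for the whole code point
    }
    // Payload bits of the lead byte, then 6 bits per continuation byte.
    char32_t cp = lead & (0x7Fu >> len);
    for (std::size_t i = 1; i < len; ++i) {
        const auto byte = static_cast<unsigned char>(str[i]);
        if ((byte & 0xC0) != 0x80) {
            return std::nullopt;
        }
        cp = (cp << 6) | (byte & 0x3Fu);
    }
    return cp;
}

constexpr bool is_whitespace_sketch(std::string_view str) noexcept
{
    const std::optional<char32_t> cp = leading_code_point(str);
    if (!cp) {
        return false;
    }
    for (const range& r : whitespace_ranges) {
        if (*cp >= r.first && *cp <= r.last) {
            return true;
        }
    }
    return false;
}
```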

◆ levenshtein_distance()

std::size_t arg_router::utility::utf8::levenshtein_distance ( std::string_view  a,
std::string_view  b 
)
inline

Calculates the Levenshtein distance between a and b.

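Levenshtein distance gives a measure of similarity between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. Identical strings have a distance of 0.

Parameters
a  First string
b  Second string
Returns
'Distance' metric as an integer; smaller values mean more similar strings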

Definition at line 22 of file levenshtein_distance.hpp.
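For reference, the standard dynamic-programming recurrence for the Levenshtein distance between a (length m) and b (length n) is given below, where [a_i ≠ b_j] is 1 when the i-th unit of a differs from the j-th unit of b and 0 otherwise. Whether this implementation compares bytes, code points, or grapheme clusters is not stated on this page.

```latex
D_{i,0} = i, \qquad D_{0,j} = j \\
D_{i,j} = \min\bigl( D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \neq b_j] \bigr),
\quad 1 \le i \le m,\ 1 \le j \le n \\
\operatorname{lev}(a, b) = D_{m,n}
```

Worked example: "kitten" to "sitting" has distance 3 (substitute k with s, substitute e with i, insert g).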

◆ terminal_width()

constexpr std::size_t arg_router::utility::utf8::terminal_width ( std::string_view  str)
inline constexpr noexcept

Returns the terminal width (i.e. the number of columns) required by str.

This is equivalent to https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, but constexpr.

Parameters
str  Input string
Returns
Terminal width

Definition at line 321 of file utf8.hpp.
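Conceptually, a wcwidth-style width is a sum of per-code-point widths. The sketch below assumes already-decoded code points and uses two tiny hypothetical excerpts in place of zero_width_table and double_width_table: zero-width code points contribute 0 columns, double-width ones 2, and everything else 1. Unlike wcwidth.c, it gives control characters a width of 1 rather than -1. How the library treats them is not covered here.

```cpp
#include <cstddef>
#include <string_view>

struct range { char32_t first; char32_t last; };  // inclusive bounds

// Tiny illustrative excerpts, NOT the full zero_width_table/double_width_table.
constexpr range zero_width_excerpt[] = {{0x0300, 0x036F}, {0x200B, 0x200F}};
constexpr range double_width_excerpt[] = {{0x1100, 0x115F}, {0x4E00, 0x9FFF}};

template <std::size_t N>
constexpr bool in_table(const range (&table)[N], char32_t cp) noexcept
{
    for (const range& r : table) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}

// Column count over already-decoded code points: zero-width adds 0,
// double-width adds 2, anything else adds 1.
constexpr std::size_t terminal_width_sketch(std::u32string_view cps) noexcept
{
    std::size_t columns = 0;
    for (const char32_t cp : cps) {
        if (in_table(zero_width_excerpt, cp)) {
            continue;
        }
        columns += in_table(double_width_excerpt, cp) ? 2 : 1;
    }
    return columns;
}
```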

Variable Documentation

◆ double_width_table

constexpr auto arg_router::utility::utf8::double_width_table
constexpr

Double-width code points, i.e. those that occupy 2 terminal columns when rendered.

Each entry is an inclusive range of code points.

This table is generated using scripts/unicode_table_generators.py from http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt v14.0.0.

Definition at line 18 of file double_width.hpp.

◆ grapheme_cluster_break_table

constexpr auto arg_router::utility::utf8::grapheme_cluster_break_table
constexpr

Grapheme cluster break class table.

Each entry is an inclusive range of code points, with each range's break class value encoded as metadata.

This table is generated using scripts/unicode_table_generators.py from https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt v14.0.0 and https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt v14.0.0.

Definition at line 244 of file grapheme_cluster_break.hpp.

◆ line_break_table

constexpr auto arg_router::utility::utf8::line_break_table
constexpr

Line break class table.

Each entry is an inclusive range of code points, with each range's break class value encoded as metadata.

This table is generated using scripts/unicode_table_generators.py from https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt v14.0.0.

Definition at line 700 of file line_break.hpp.

◆ whitespace_table

constexpr auto arg_router::utility::utf8::whitespace_table
constexpr
Initial value:
= std::array<code_point::range, 11>{{
{0x000009, 0x00000D},
{0x000020, 0x000020},
{0x000085, 0x000085},
{0x0000A0, 0x0000A0},
{0x001680, 0x001680},
{0x002000, 0x00200A},
{0x002028, 0x002028},
{0x002029, 0x002029},
{0x00202F, 0x00202F},
{0x00205F, 0x00205F},
{0x003000, 0x003000},
}}

Whitespace code points.

Each entry is an inclusive range of code points.

This table is generated using scripts/unicode_table_generators.py from http://www.unicode.org/Public/UNIDATA/PropList.txt v14.0.0.

Definition at line 18 of file whitespace.hpp.
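Because the ranges are sorted and non-overlapping, membership can be tested with a binary search. A minimal constexpr sketch using the same values follows. The range struct is a hypothetical stand-in for code_point::range, and this page does not say which lookup strategy the library actually uses.

```cpp
#include <array>
#include <cstddef>

struct range { char32_t first; char32_t last; };  // inclusive bounds

// Same values as whitespace_table: sorted and non-overlapping.
constexpr std::array<range, 11> whitespace_ranges{{
    {0x000009, 0x00000D}, {0x000020, 0x000020}, {0x000085, 0x000085},
    {0x0000A0, 0x0000A0}, {0x001680, 0x001680}, {0x002000, 0x00200A},
    {0x002028, 0x002028}, {0x002029, 0x002029}, {0x00202F, 0x00202F},
    {0x00205F, 0x00205F}, {0x003000, 0x003000},
}};

// Binary search over the half-open index interval [lo, hi): O(log n) per lookup.
template <std::size_t N>
constexpr bool in_ranges(const std::array<range, N>& table, char32_t cp) noexcept
{
    std::size_t lo = 0;
    std::size_t hi = N;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (cp < table[mid].first) {
            hi = mid;  // every later range starts even higher
        } else if (cp > table[mid].last) {
            lo = mid + 1;  // every earlier range ends even lower
        } else {
            return true;  // first <= cp <= last
        }
    }
    return false;
}
```

The same lookup shape applies to the other range tables on this page; for grapheme_cluster_break_table and line_break_table, the matching range would also yield its encoded break class.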

◆ zero_width_table

constexpr auto arg_router::utility::utf8::zero_width_table
constexpr

Zero-width code points, i.e. those that occupy 0 terminal columns when rendered.

Each entry is an inclusive range of code points.

This table is generated using scripts/unicode_table_generators.py from http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt v14.0.0.

Definition at line 18 of file zero_width.hpp.