API to parse the MeTTa language from text into Atoms.
|
| struct | tokenizer_t |
| | Represents a handle to a Tokenizer, capable of recognizing meaningful Token substrings in text.
|
| struct | sexpr_parser_t |
| | Represents an S-Expression Parser state machine, to parse input text into an Atom.
|
| struct | syntax_node_t |
| | Represents a component in a syntax tree created by parsing MeTTa code.
|
| struct | token_api_t |
| | A table of callback functions to implement custom atom parsing.
|
|
| struct tokenizer_t | tokenizer_new (void) |
| | Creates a new Tokenizer, without any registered Tokens.
|
| void | tokenizer_free (struct tokenizer_t tokenizer) |
| | Frees a Tokenizer handle.
|
| void | tokenizer_register_token (struct tokenizer_t *tokenizer, const char *regex, const struct token_api_t *api, void *context) |
| | Registers a new custom Token in a Tokenizer.
|
| struct tokenizer_t | tokenizer_clone (const struct tokenizer_t *tokenizer) |
| | Performs a "deep copy" of a Tokenizer.
|
| struct sexpr_parser_t | sexpr_parser_new (const char *text) |
| | Creates a new S-Expression Parser.
|
| void | sexpr_parser_free (struct sexpr_parser_t parser) |
| | Frees an S-Expression Parser.
|
| atom_t | sexpr_parser_parse (struct sexpr_parser_t *parser, const struct tokenizer_t *tokenizer) |
| | Parses the text associated with an sexpr_parser_t, and creates the corresponding Atom.
|
| const char * | sexpr_parser_err_str (const struct sexpr_parser_t *parser) |
| | Returns the error string associated with the last sexpr_parser_parse call.
|
| struct syntax_node_t | sexpr_parser_parse_to_syntax_tree (struct sexpr_parser_t *parser) |
| | Parses the text associated with an sexpr_parser_t, and creates a syntax tree.
|
| void | syntax_node_free (struct syntax_node_t node) |
| | Frees a syntax_node_t.
|
| struct syntax_node_t | syntax_node_clone (const struct syntax_node_t *node) |
| | Creates a deep copy of a syntax_node_t.
|
| void | syntax_node_iterate (const struct syntax_node_t *node, c_syntax_node_callback_t callback, void *context) |
| | Performs a depth-first iteration of all child syntax nodes within a syntax tree.
|
| enum syntax_node_type_t | syntax_node_type (const struct syntax_node_t *node) |
| | Returns the type of a syntax_node_t.
|
| bool | syntax_node_is_null (const struct syntax_node_t *node) |
| | Returns true if a syntax node represents the end of the stream.
|
| bool | syntax_node_is_leaf (const struct syntax_node_t *node) |
| | Returns true if a syntax node is a leaf (has no children) and false otherwise.
|
| void | syntax_node_src_range (const struct syntax_node_t *node, uintptr_t *range_start, uintptr_t *range_end) |
| | Returns the beginning and end positions in the parsed source of the text represented by the syntax node.
|
This interface facilitates parsing textual representations of MeTTa into atom representations, and can be extended to parse custom atom types with specialized syntax.
◆ c_syntax_node_callback_t
| typedef void(* c_syntax_node_callback_t) (const struct syntax_node_t *node, void *context) |
Function signature for a callback providing access to a syntax_node_t.
- Parameters
-
| [in] | node | The syntax_node_t being provided. This node should not be modified or freed by the callback. |
| [in] | context | The context state pointer initially passed to the upstream function initiating the callback. |
◆ syntax_node_type_t
The type of language construct represented by a syntax_node_t.
| Enumerator |
|---|
| COMMENT | A Comment, beginning with a ';' character.
|
| VARIABLE_TOKEN | A variable. A symbol immediately preceded by a '$' sigil.
|
| STRING_TOKEN | A String Literal. All text between non-escaped '"' (double quote) characters.
|
| WORD_TOKEN | Word Token. Any other whitespace-delimited token that isn't a VARIABLE_TOKEN or STRING_TOKEN.
|
| OPEN_PAREN | Open Parenthesis. A non-escaped '(' character indicating the beginning of an expression.
|
| CLOSE_PAREN | Close Parenthesis. A non-escaped ')' character indicating the end of an expression.
|
| WHITESPACE | Whitespace. One or more whitespace chars.
|
| LEFTOVER_TEXT | Leftover Text that remains unparsed after a parse error has occurred.
|
| EXPRESSION_GROUP | A Group of nodes between an OPEN_PAREN and a matching CLOSE_PAREN.
|
| ERROR_GROUP | A Group of nodes that cannot be combined into a coherent atom due to a parse error, even if some of the individual nodes could represent valid atoms.
|
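The enumerator descriptions above imply a simple first-character rule for the leaf token kinds. The following is a minimal sketch of that rule only, not the library's parser: `sketch_node_type_t` and `sketch_classify` are hypothetical names introduced for illustration, and the sketch assumes the lexeme has already been isolated from the surrounding text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the leaf subset of syntax_node_type_t; the
 * SKETCH_ prefix avoids clashing with the library's own enumerators. */
typedef enum {
    SKETCH_COMMENT,
    SKETCH_VARIABLE_TOKEN,
    SKETCH_STRING_TOKEN,
    SKETCH_WORD_TOKEN,
    SKETCH_OPEN_PAREN,
    SKETCH_CLOSE_PAREN,
    SKETCH_WHITESPACE
} sketch_node_type_t;

/* Classifies one non-empty, already-isolated lexeme by its first character,
 * following the enumerator descriptions above. */
sketch_node_type_t sketch_classify(const char *lexeme)
{
    assert(lexeme != NULL && lexeme[0] != '\0');
    unsigned char first = (unsigned char)lexeme[0];

    if (first == ';')  return SKETCH_COMMENT;        /* comment marker   */
    if (first == '$')  return SKETCH_VARIABLE_TOKEN; /* variable sigil   */
    if (first == '"')  return SKETCH_STRING_TOKEN;   /* string literal   */
    if (first == '(')  return SKETCH_OPEN_PAREN;
    if (first == ')')  return SKETCH_CLOSE_PAREN;
    if (isspace(first)) return SKETCH_WHITESPACE;
    return SKETCH_WORD_TOKEN;                        /* everything else  */
}
```

The group types (EXPRESSION_GROUP, ERROR_GROUP) and LEFTOVER_TEXT depend on parser state rather than on a single lexeme, so they are left out of this sketch.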
◆ sexpr_parser_err_str()
| const char * sexpr_parser_err_str (const struct sexpr_parser_t *parser) |
Returns the error string associated with the last sexpr_parser_parse call.
- Parameters
-
| [in] | parser | A pointer to the Parser, which is associated with the text to parse |
- Returns
- A pointer to the C-string containing the parse error that occurred, or NULL if no parse error occurred
- Warning
- The returned pointer should NOT be freed. It must never be accessed after the sexpr_parser_t has been freed, or after any subsequent call to sexpr_parser_parse or sexpr_parser_parse_to_syntax_tree has been made.
◆ sexpr_parser_free()
Frees an S-Expression Parser.
- Parameters
-
| [in] | parser | The sexpr_parser_t to free |
◆ sexpr_parser_new()
Creates a new S-Expression Parser.
- Parameters
-
| [in] | text | A C-style string containing the input text to parse |
- Returns
- The new sexpr_parser_t, ready to parse the text
- Note
- The returned sexpr_parser_t must be freed with sexpr_parser_free() or passed to another function that takes ownership.
- Warning
- The returned sexpr_parser_t borrows a reference to the text, so the returned sexpr_parser_t must be freed before the text is freed or allowed to go out of scope.
◆ sexpr_parser_parse()
Parses the text associated with an sexpr_parser_t, and creates the corresponding Atom.
- Parameters
-
| [in] | parser | A pointer to the Parser, which is associated with the text to parse |
| [in] | tokenizer | A pointer to the Tokenizer used to interpret atoms within the expression |
- Returns
- The new atom_t, which may be an Expression atom with many child atoms. Returns a none atom if parsing is finished, or an error expression atom if a parse error occurred.
- Note
- The caller must take ownership responsibility for the returned atom_t, and ultimately free it with atom_free() or pass it to another function that takes ownership responsibility.
-
If this function encounters an error, the error may be accessed with sexpr_parser_err_str().
◆ sexpr_parser_parse_to_syntax_tree()
Parses the text associated with an sexpr_parser_t, and creates a syntax tree.
- Parameters
-
| [in] | parser | A pointer to the Parser, which is associated with the text to parse |
- Returns
- The new syntax_node_t representing the root of the parsed tree
- Note
- The caller must take ownership responsibility for the returned syntax_node_t, and ultimately free it with syntax_node_free().
◆ syntax_node_clone()
Creates a deep copy of a syntax_node_t.
◆ syntax_node_free()
Frees a syntax_node_t.
◆ syntax_node_is_leaf()
Returns true if a syntax node is a leaf (has no children) and false otherwise.
- Parameters
-
| [in] | node | A pointer to the syntax_node_t |
- Returns
- The boolean value indicating if the node is a leaf
◆ syntax_node_is_null()
Returns true if a syntax node represents the end of the stream.
- Parameters
-
| [in] | node | A pointer to the syntax_node_t |
- Returns
- The boolean value indicating if the node is a null node
◆ syntax_node_iterate()
Performs a depth-first iteration of all child syntax nodes within a syntax tree.
- Parameters
-
| [in] | node | A pointer to the top-level syntax_node_t representing the syntax tree |
| [in] | callback | A function that will be called once for each syntax node visited during the iteration |
| [in] | context | A pointer to a caller-defined structure to facilitate communication with the callback function |
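The callback and context parameters work together: the context pointer is handed back unchanged on every callback invocation. A minimal sketch of that pattern follows, assuming nothing beyond the c_syntax_node_callback_t signature documented above. `node_counter_t` and `count_node` are hypothetical names, and the forward declaration and typedef are repeated here only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from the reference above so the sketch compiles on its own;
 * real code would get both from the library header instead. */
struct syntax_node_t;
typedef void (*c_syntax_node_callback_t)(const struct syntax_node_t *node,
                                         void *context);

/* Hypothetical caller-defined state shared with the callback. */
typedef struct {
    size_t visited; /* number of nodes seen so far */
} node_counter_t;

/* Callback matching c_syntax_node_callback_t: counts each visited node.
 * The node is only borrowed, so it is neither modified nor freed here. */
void count_node(const struct syntax_node_t *node, void *context)
{
    (void)node; /* not inspected in this sketch */
    assert(context != NULL);
    node_counter_t *counter = (node_counter_t *)context;
    counter->visited += 1;
}

/* Intended use, with `root` obtained from sexpr_parser_parse_to_syntax_tree():
 *
 *     node_counter_t counter = { 0 };
 *     syntax_node_iterate(&root, count_node, &counter);
 */
```

Keeping all mutable state in the context structure avoids global variables and lets the same callback be reused across several iterations.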
◆ syntax_node_src_range()
| void syntax_node_src_range (const struct syntax_node_t *node, uintptr_t *range_start, uintptr_t *range_end) |
Returns the beginning and end positions in the parsed source of the text represented by the syntax node.
- Parameters
-
| [in] | node | A pointer to the syntax_node_t |
| [out] | range_start | A pointer to a value, into which the starting offset of the range will be written |
| [out] | range_end | A pointer to a value, into which the ending offset of the range will be written |
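Once the two offsets are known, the matching source text can be recovered from the original input string. The helper below is a minimal sketch under two assumptions that the reference does not state: that the offsets are byte offsets into the text passed to sexpr_parser_new(), and that range_end is exclusive. `copy_src_range` is a hypothetical name, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the bytes text[range_start .. range_end) into out as a C string.
 * Assumption: the offsets are byte offsets and range_end is exclusive.
 * Returns the number of bytes copied (excluding the terminator), or 0 when
 * the range is empty, inverted, out of bounds, or does not fit in out. */
size_t copy_src_range(const char *text, uintptr_t range_start,
                      uintptr_t range_end, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);
    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    size_t text_len = strlen(text);
    if (range_start >= range_end || range_end > text_len) {
        return 0;
    }
    size_t len = (size_t)(range_end - range_start);
    if (len >= out_size) {
        return 0; /* not enough room for the slice plus terminator */
    }
    memcpy(out, text + range_start, len);
    out[len] = '\0';
    return len;
}
```

In real code the two offsets would be filled in by a call such as `syntax_node_src_range(node, &range_start, &range_end)` before being passed to the helper.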
◆ syntax_node_type()
Returns the type of a syntax_node_t.
◆ tokenizer_clone()
Performs a "deep copy" of a Tokenizer.
- Parameters
-
| [in] | tokenizer | A pointer to the Tokenizer to clone |
- Returns
- The new Tokenizer, containing all registered Tokens belonging to the original Tokenizer
- Note
- The returned tokenizer_t must be freed with tokenizer_free().
◆ tokenizer_free()
Frees a Tokenizer handle.
- Parameters
-
| [in] | tokenizer | The tokenizer_t handle to free |
- Note
- When the last tokenizer_t handle for an underlying Tokenizer has been freed, the Tokenizer will be deallocated.
◆ tokenizer_new()
Creates a new Tokenizer, without any registered Tokens.
- Returns
- A tokenizer_t handle to access the newly created Tokenizer
- Note
- The returned tokenizer_t handle must be freed with tokenizer_free().
◆ tokenizer_register_token()
| void tokenizer_register_token (struct tokenizer_t *tokenizer, const char *regex, const struct token_api_t *api, void *context) |
Registers a new custom Token in a Tokenizer.
- Parameters
-
| [in] | tokenizer | A pointer to the Tokenizer in which to register the Token |
| [in] | regex | A regular expression to match the incoming text, triggering this token to generate a new atom |
| [in] | api | A table of functions to manage the token |
| [in] | context | A caller-defined structure to communicate any state necessary to implement the Token parser |
- Note
- Hyperon uses the Rust RegEx engine and syntax, as described in the documentation of the Rust regex crate.