Hyperon C
Tokenizer and Parser Interface

API to parse the MeTTa language from text into Atoms.

Classes

struct  tokenizer_t
 Represents a handle to a Tokenizer, capable of recognizing meaningful Token substrings in text.
struct  sexpr_parser_t
 Represents an S-Expression Parser state machine, to parse input text into an Atom.
struct  syntax_node_t
 Represents a component in a syntax tree created by parsing MeTTa code.
struct  token_api_t
 A table of callback functions to implement custom atom parsing.

Typedefs

typedef void(* c_syntax_node_callback_t) (const struct syntax_node_t *node, void *context)
 Function signature for a callback providing access to a syntax_node_t

Enumerations

enum  syntax_node_type_t {
  COMMENT, VARIABLE_TOKEN, STRING_TOKEN, WORD_TOKEN,
  OPEN_PAREN, CLOSE_PAREN, WHITESPACE, LEFTOVER_TEXT,
  EXPRESSION_GROUP, ERROR_GROUP
}
 The type of language construct represented by a syntax_node_t.

Functions

struct tokenizer_t tokenizer_new (void)
 Creates a new Tokenizer, without any registered Tokens.
void tokenizer_free (struct tokenizer_t tokenizer)
 Frees a Tokenizer handle.
void tokenizer_register_token (struct tokenizer_t *tokenizer, const char *regex, const struct token_api_t *api, void *context)
 Registers a new custom Token in a Tokenizer.
struct tokenizer_t tokenizer_clone (const struct tokenizer_t *tokenizer)
 Performs a "deep copy" of a Tokenizer.
struct sexpr_parser_t sexpr_parser_new (const char *text)
 Creates a new S-Expression Parser.
void sexpr_parser_free (struct sexpr_parser_t parser)
 Frees an S-Expression Parser.
atom_t sexpr_parser_parse (struct sexpr_parser_t *parser, const struct tokenizer_t *tokenizer)
 Parses the text associated with an sexpr_parser_t, and creates the corresponding Atom.
const char * sexpr_parser_err_str (const struct sexpr_parser_t *parser)
 Returns the error string associated with the last sexpr_parser_parse call.
struct syntax_node_t sexpr_parser_parse_to_syntax_tree (struct sexpr_parser_t *parser)
 Parses the text associated with an sexpr_parser_t, and creates a syntax tree.
void syntax_node_free (struct syntax_node_t node)
 Frees a syntax_node_t.
struct syntax_node_t syntax_node_clone (const struct syntax_node_t *node)
 Creates a deep copy of a syntax_node_t
void syntax_node_iterate (const struct syntax_node_t *node, c_syntax_node_callback_t callback, void *context)
 Performs a depth-first iteration of all child syntax nodes within a syntax tree.
enum syntax_node_type_t syntax_node_type (const struct syntax_node_t *node)
 Returns the type of a syntax_node_t
bool syntax_node_is_null (const struct syntax_node_t *node)
 Returns true if a syntax node represents the end of the stream.
bool syntax_node_is_leaf (const struct syntax_node_t *node)
 Returns true if a syntax node is a leaf (has no children) and false otherwise.
void syntax_node_src_range (const struct syntax_node_t *node, uintptr_t *range_start, uintptr_t *range_end)
 Returns the beginning and end positions in the parsed source of the text represented by the syntax node.

Detailed Description

API to parse the MeTTa language from text into Atoms.

This interface facilitates parsing textual representations of MeTTa into atom representations, and can be extended to parse custom atom types with specialized syntax.

Typedef Documentation

◆ c_syntax_node_callback_t

typedef void(* c_syntax_node_callback_t) (const struct syntax_node_t *node, void *context)

Function signature for a callback providing access to a syntax_node_t

Parameters
[in] node: The syntax_node_t being provided. The callback must not modify or free this node.
[in] context: The context pointer that was originally passed to the function initiating the callback (for example, syntax_node_iterate()).
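As a standalone sketch of the context-pointer idiom, the callback below has the c_syntax_node_callback_t signature and updates caller-defined state through context while leaving the node untouched. The names node_counter and count_node are hypothetical, and struct syntax_node_t is only forward-declared because the callback never dereferences it.

```c
#include <stddef.h>

/* Incomplete type: a callback only ever receives a pointer to it. */
struct syntax_node_t;

/* Same signature as c_syntax_node_callback_t documented above. */
typedef void (*c_syntax_node_callback_t)(const struct syntax_node_t *node, void *context);

/* Hypothetical caller-defined state, threaded through the context pointer. */
struct node_counter {
    size_t visited;
};

/* Matches c_syntax_node_callback_t. It never modifies or frees the node;
 * a real callback might inspect it with syntax_node_type(node). */
void count_node(const struct syntax_node_t *node, void *context)
{
    (void)node;
    struct node_counter *counter = (struct node_counter *)context;
    counter->visited++;
}
```

In a program linked against Hyperon, it would be passed as syntax_node_iterate(&root, count_node, &counter), where root is a syntax_node_t returned by sexpr_parser_parse_to_syntax_tree().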

Enumeration Type Documentation

◆ syntax_node_type_t

The type of language construct represented by a syntax_node_t.

Enumerator
COMMENT 

A Comment, beginning with a ';' character.

VARIABLE_TOKEN 

A variable. A symbol immediately preceded by a '$' sigil.

STRING_TOKEN 

A String Literal. All text between non-escaped '"' (double quote) characters.

WORD_TOKEN 

Word Token. Any other whitespace-delimited token that isn't a VARIABLE_TOKEN or STRING_TOKEN.

OPEN_PAREN 

Open Parenthesis. A non-escaped '(' character indicating the beginning of an expression.

CLOSE_PAREN 

Close Parenthesis. A non-escaped ')' character indicating the end of an expression.

WHITESPACE 

Whitespace. One or more whitespace characters.

LEFTOVER_TEXT 

Leftover Text that remains unparsed after a parse error has occurred.

EXPRESSION_GROUP 

A Group of nodes between an OPEN_PAREN and a matching CLOSE_PAREN.

ERROR_GROUP 

A Group of nodes that cannot be combined into a coherent atom due to a parse error, even if some of the individual nodes could represent valid atoms.
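The leaf token types above can largely be told apart by their first character. The following sketch maps a single token's text to the name of the matching enumerator; leaf_token_kind is a hypothetical helper, not part of the Hyperon API and not Hyperon's lexer. It ignores escapes, unterminated strings, LEFTOVER_TEXT, and the group types, which depend on parser state rather than on one token.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative only: returns the name of the syntax_node_type_t enumerator
 * whose description matches the first character of a single leaf token,
 * or NULL for an empty token. */
const char *leaf_token_kind(const char *token)
{
    if (token == NULL || token[0] == '\0')
        return NULL;
    switch (token[0]) {
    case ';': return "COMMENT";
    case '$': return "VARIABLE_TOKEN";
    case '"': return "STRING_TOKEN";
    case '(': return "OPEN_PAREN";
    case ')': return "CLOSE_PAREN";
    default:
        /* The cast avoids undefined behavior for negative char values. */
        if (isspace((unsigned char)token[0]))
            return "WHITESPACE";
        return "WORD_TOKEN";
    }
}
```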

Function Documentation

◆ sexpr_parser_err_str()

const char * sexpr_parser_err_str ( const struct sexpr_parser_t * parser)

Returns the error string associated with the last sexpr_parser_parse call.

Parameters
[in] parser: A pointer to the Parser, which is associated with the text to parse
Returns
A pointer to the C string describing the parse error that occurred, or NULL if no parse error occurred.
Warning
The returned pointer must NOT be freed. It must never be accessed after the sexpr_parser_t has been freed, or after any subsequent call to sexpr_parser_parse() or sexpr_parser_parse_to_syntax_tree() has been made.
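Because the error string remains owned by the parser, a caller that needs it later can copy it into its own memory first. A minimal sketch using only the C standard library; copy_parse_error is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Copies a parser error string (for example, the result of
 * sexpr_parser_err_str()) into caller-owned memory, so it stays valid after
 * the parser is freed or parses again. Returns NULL if err is NULL (no error)
 * or if allocation fails. The caller releases the copy with free(). */
char *copy_parse_error(const char *err)
{
    if (err == NULL)
        return NULL;
    size_t len = strlen(err) + 1; /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, err, len);
    return copy;
}
```

For example, char *saved = copy_parse_error(sexpr_parser_err_str(&parser)); before the next parse call, followed eventually by free(saved).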

◆ sexpr_parser_free()

void sexpr_parser_free ( struct sexpr_parser_t parser)

Frees an S-Expression Parser.

Parameters
[in] parser: The sexpr_parser_t handle to free

◆ sexpr_parser_new()

struct sexpr_parser_t sexpr_parser_new ( const char * text)

Creates a new S-Expression Parser.

Parameters
[in] text: A C-style string containing the input text to parse
Returns
The new sexpr_parser_t, ready to parse the text
Note
The returned sexpr_parser_t must be freed with sexpr_parser_free() or passed to another function that takes ownership.
Warning
The returned sexpr_parser_t borrows a reference to the text, so the returned sexpr_parser_t must be freed before the text is freed or allowed to go out of scope.

◆ sexpr_parser_parse()

atom_t sexpr_parser_parse ( struct sexpr_parser_t * parser,
const struct tokenizer_t * tokenizer )

Parses the text associated with an sexpr_parser_t, and creates the corresponding Atom.

Parameters
[in] parser: A pointer to the Parser, which is associated with the text to parse
[in] tokenizer: A pointer to the Tokenizer used to interpret atoms within the expression
Returns
The new atom_t, which may be an Expression atom with many child atoms. Returns a none atom if parsing is finished, or an error expression atom if a parse error occurred.
Note
The caller takes ownership of the returned atom_t, and must ultimately free it with atom_free() or pass it to another function that takes ownership.
If this function encounters an error, the error may be accessed with sexpr_parser_err_str().

◆ sexpr_parser_parse_to_syntax_tree()

struct syntax_node_t sexpr_parser_parse_to_syntax_tree ( struct sexpr_parser_t * parser)

Parses the text associated with an sexpr_parser_t, and creates a syntax tree.

Parameters
[in] parser: A pointer to the Parser, which is associated with the text to parse
Returns
The new syntax_node_t representing the root of the parsed tree
Note
The caller takes ownership of the returned syntax_node_t, and must ultimately free it with syntax_node_free().

◆ syntax_node_clone()

struct syntax_node_t syntax_node_clone ( const struct syntax_node_t * node)

Creates a deep copy of a syntax_node_t

Parameters
[in] node: A pointer to the syntax_node_t
Returns
The syntax_node_t representing the cloned syntax node
Note
The caller takes ownership of the returned syntax_node_t, and must ultimately free it with syntax_node_free().

◆ syntax_node_free()

void syntax_node_free ( struct syntax_node_t node)

Frees a syntax_node_t.

Parameters
[in] node: The syntax_node_t to free

◆ syntax_node_is_leaf()

bool syntax_node_is_leaf ( const struct syntax_node_t * node)

Returns true if a syntax node is a leaf (has no children) and false otherwise.

Parameters
[in] node: A pointer to the syntax_node_t
Returns
The boolean value indicating if the node is a leaf

◆ syntax_node_is_null()

bool syntax_node_is_null ( const struct syntax_node_t * node)

Returns true if a syntax node represents the end of the stream.

Parameters
[in] node: A pointer to the syntax_node_t
Returns
The boolean value indicating if the node is a null node

◆ syntax_node_iterate()

void syntax_node_iterate ( const struct syntax_node_t * node,
c_syntax_node_callback_t callback,
void * context )

Performs a depth-first iteration of all child syntax nodes within a syntax tree.

Parameters
[in] node: A pointer to the top-level syntax_node_t representing the syntax tree
[in] callback: A function that will be called once for each node visited during the iteration, receiving that node and the context pointer
[in] context: A pointer to a caller-defined structure to facilitate communication with the callback function
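To illustrate what a depth-first iteration with a callback and context pointer looks like, the sketch below walks a hypothetical demo_node tree; it is not Hyperon's implementation, since syntax_node_t is an opaque handle. This sketch reports each node before its children (pre-order). Whether syntax_node_iterate() reports a parent before or after its children is not stated in this documentation, so callers should not depend on that ordering.

```c
#include <stddef.h>

/* Hypothetical stand-in for a syntax tree node. Hyperon's syntax_node_t is
 * opaque and is not laid out like this. */
struct demo_node {
    const char *label;
    const struct demo_node *children;
    size_t child_count;
};

typedef void (*demo_callback_t)(const struct demo_node *node, void *context);

/* Depth-first, pre-order walk: report the node, then recurse into each child
 * subtree from left to right, passing the same context pointer every time. */
void demo_iterate(const struct demo_node *node, demo_callback_t callback, void *context)
{
    callback(node, context);
    for (size_t i = 0; i < node->child_count; i++)
        demo_iterate(&node->children[i], callback, context);
}

/* Example callback: records the first character of each visited label. */
struct label_trace {
    char buf[32];
    size_t len;
};

void trace_label(const struct demo_node *node, void *context)
{
    struct label_trace *trace = context;
    if (trace->len + 1 < sizeof trace->buf) {
        trace->buf[trace->len++] = node->label[0];
        trace->buf[trace->len] = '\0';
    }
}
```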

◆ syntax_node_src_range()

void syntax_node_src_range ( const struct syntax_node_t * node,
uintptr_t * range_start,
uintptr_t * range_end )

Returns the beginning and end positions in the parsed source of the text represented by the syntax node.

Parameters
[in] node: A pointer to the syntax_node_t
[out] range_start: A pointer to a value, into which the starting offset of the range will be written
[out] range_end: A pointer to a value, into which the ending offset of the range will be written
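A common use of the source range is to recover the exact text a node covers, for example for syntax highlighting. The sketch below assumes the offsets are byte offsets into the string passed to sexpr_parser_new() and that range_end is exclusive (a half-open range); neither detail is stated above. copy_src_range is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies source[range_start, range_end) into out as a NUL-terminated string.
 * Assumes byte offsets and an exclusive range_end.
 * Returns 0 on success, -1 if the range is invalid or out is too small. */
int copy_src_range(const char *source, uintptr_t range_start, uintptr_t range_end,
                   char *out, size_t out_size)
{
    size_t source_len = strlen(source);
    if (range_start > range_end || range_end > source_len)
        return -1;
    size_t n = (size_t)(range_end - range_start);
    if (n + 1 > out_size) /* room for the text plus the terminating NUL */
        return -1;
    memcpy(out, source + range_start, n);
    out[n] = '\0';
    return 0;
}
```

In a program linked against Hyperon, the two offsets would come from syntax_node_src_range(&node, &start, &end), with source being the text the parser was created from.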

◆ syntax_node_type()

enum syntax_node_type_t syntax_node_type ( const struct syntax_node_t * node)

Returns the type of a syntax_node_t

Parameters
[in] node: A pointer to the syntax_node_t
Returns
The syntax_node_type_t representing the type of the syntax node

◆ tokenizer_clone()

struct tokenizer_t tokenizer_clone ( const struct tokenizer_t * tokenizer)

Performs a "deep copy" of a Tokenizer.

Parameters
[in] tokenizer: A pointer to the Tokenizer to clone
Returns
The new Tokenizer, containing all registered Tokens belonging to the original Tokenizer
Note
The returned tokenizer_t must be freed with tokenizer_free().

◆ tokenizer_free()

void tokenizer_free ( struct tokenizer_t tokenizer)

Frees a Tokenizer handle.

Parameters
[in] tokenizer: The tokenizer_t handle to free
Note
When the last tokenizer_t handle referring to an underlying Tokenizer is freed, the Tokenizer is deallocated.

◆ tokenizer_new()

struct tokenizer_t tokenizer_new ( void )

Creates a new Tokenizer, without any registered Tokens.

Returns
A tokenizer_t handle to access the newly created Tokenizer
Note
The returned tokenizer_t handle must be freed with tokenizer_free().

◆ tokenizer_register_token()

void tokenizer_register_token ( struct tokenizer_t * tokenizer,
const char * regex,
const struct token_api_t * api,
void * context )

Registers a new custom Token in a Tokenizer.

Parameters
[in] tokenizer: A pointer to the Tokenizer in which to register the Token
[in] regex: A regular expression matched against the incoming text; a match triggers this Token to generate a new atom
[in] api: A table of functions to manage the Token
[in] context: A pointer to a caller-defined structure that communicates any state necessary to implement the Token parser
Note
Hyperon uses the Rust regex engine and syntax, as documented for the Rust regex crate.