Refactoring C: Implementing Parsing

In C, implementing the parsing of messages is actually a very complex operation. Let's get started.

By Oren Eini · Dec. 14, 18 · Tutorial

So far in this series, I’ve done a lot of work just to build the basic infrastructure for a trivial echo server with SSL. But the protocol I have in mind is a lot more complex. Let’s get started with actually implementing the parsing of messages.

To start with, we need to implement parsing of lines. In C, this is a decidedly non-trivial operation, because you need to read the data from the network into a buffer somewhere and then parse it. This area is rife with errors, so that is going to be fun.

Here is a simple raw message:

GET employees/1-A employees/2-B
Timeout: 30
Sequence: 293
Include: ReportsTo


The structure goes:

CMD args1 argN\r\n


And then header lines with:

Name: value\r\n


The message ends with \r\n\r\n: the \r\n that closes the last line, followed by an empty line.

To make things simple for myself, I’m going to define the maximum size of a message as 8KB (a common default limit for HTTP request headers as well). Here is how I decided to represent it in memory:

[Image: the in-memory representation of a message]
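As a textual stand-in for that layout, here is a minimal sketch of structures consistent with how the parsing code later in this post uses them. The field names `argc`, `argv`, `headers`, `headers_count`, `key`, and `value` come from that code; the `raw` field, the exact types, and the `cmd_release()` helper are assumptions made for illustration, not the post's actual definitions.

```c
#include <stdlib.h>

#define MSG_SIZE 8192 /* maximum size of one message (8KB, per the text) */

/* One "Name: value" header; both pointers point into the message copy. */
struct header {
    char* key;
    char* value;
};

/* A parsed command. Field names follow their use in parse_command();
   the `raw` field is an assumption added for this sketch. */
struct cmd {
    int argc;                /* number of tokens on the command line */
    char** argv;             /* argv[0] is the command, e.g. "GET" */
    int headers_count;       /* number of entries in headers */
    struct header* headers;
    char* raw;               /* assumed: the private, NUL-terminated copy
                                that argv and headers point into */
};

/* Hypothetical cleanup helper; the post's own cmd_drop() is not shown. */
void cmd_release(struct cmd* cmd) {
    if (cmd == NULL)
        return;
    free(cmd->raw);     /* the strings themselves live in this one block */
    free(cmd->argv);    /* only the pointer arrays are separate allocations */
    free(cmd->headers);
    free(cmd);
}
```

Because every string points into a single copied block, releasing a command in this sketch takes a fixed number of `free()` calls regardless of how many arguments or headers it has.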

The key here is that I want to minimize the amount of work I need to do and the complexity I need to manage. That is why the entire message is limited to 8KB. I’m also simplifying how I’m going to be handling things from an API perspective. All the strings are actually C strings, null terminated, and I’m using the argv, argc convention for naming, just like in the main function.

This means that I can simply read from the network until I find a “\r\n\r\n” in there. Here is how I do this:

struct cmd* read_message(struct connection * c) {
    int rc, to_read, to_scan = 0;
    do
    {
        // first, check whether a complete message has already
        // been read from the network
        if (c->used_buffer > 0) {
            // search only the bytes that hold data, starting at to_scan
            char* final = strnstr(c->buffer + to_scan, "\r\n\r\n",
                c->used_buffer - to_scan);
            if (final != NULL) {
                struct cmd* cmd = parse_command(c, c->buffer,
                    final - c->buffer + 2 /* include one \r\n */);
                // now move the rest of the buffer that doesn't belong
                // to this command to the front, adding 4 for the length
                // of the msg separator (\r\n\r\n)
                c->used_buffer -= (final + 4) - c->buffer;
                memmove(c->buffer, final + 4, c->used_buffer);
                return cmd;
            }
            // the separator may be split across two reads, so the
            // next scan starts 3 bytes before the end of the data
            to_scan = max(c->used_buffer - 3, 0);
        }
        to_read = MSG_SIZE - c->used_buffer;
        if (to_read == 0) {
            push_error(EINVAL, "Message size is too large, after 8KB, "
                "couldn't find \\r\\n\\r\\n separator, aborting connection.");
            return NULL;
        }
        rc = connection_read(c, c->buffer + c->used_buffer, to_read);
        // 0 is a broken connection, probably; a negative value is
        // treated the same way so it can never shrink used_buffer
        if (rc <= 0)
            return NULL;
        c->used_buffer += rc;
    } while (1);
}


There is a bit of code here, but the gist of it is pretty simple. The problem is that I need to handle partial state. That is, a single message may come in two separate packets, or multiple messages may come in a single packet. I don’t have a way to control that, so I need to be careful about tracking past state. The connection has a buffer that is used to hold the state in memory, whose size is large enough to hold the largest possible message. I’m reading from the network to a buffer and then scanning to find the message separator.
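To make the partial-state handling concrete, here is a minimal, self-contained sketch of the same idea without the network or the `strnstr()` call (which is a BSD extension rather than standard C). The `rx_buffer` type and the `rx_append()` and `rx_take_message()` functions are hypothetical names for this illustration; they are not part of the post's code.

```c
#include <stddef.h>
#include <string.h>

#define MSG_SIZE 8192 /* assumed to match the 8KB limit in the text */

/* Hypothetical receive buffer; the post's struct connection is not shown. */
struct rx_buffer {
    char data[MSG_SIZE];
    size_t used;    /* bytes currently held in data */
    size_t scanned; /* offset where the next separator search starts */
};

/* Append bytes as if they had just arrived from the network.
   Returns 0 on success, -1 if they would not fit in the buffer. */
int rx_append(struct rx_buffer* rx, const char* bytes, size_t len) {
    if (len > MSG_SIZE - rx->used)
        return -1;
    memcpy(rx->data + rx->used, bytes, len);
    rx->used += len;
    return 0;
}

/* If a complete message is buffered, copy it into out (keeping one
   trailing \r\n), NUL-terminate it, drop it from the buffer, and return
   its length. Returns 0 if more data is needed.
   out must have room for MSG_SIZE bytes. */
size_t rx_take_message(struct rx_buffer* rx, char* out) {
    static const char sep[] = "\r\n\r\n";
    const size_t sep_len = sizeof(sep) - 1;
    size_t i;

    for (i = rx->scanned; i + sep_len <= rx->used; i++) {
        if (memcmp(rx->data + i, sep, sep_len) == 0) {
            size_t msg_len = i + 2;        /* keep one \r\n */
            size_t consumed = i + sep_len; /* message plus separator */
            memcpy(out, rx->data, msg_len);
            out[msg_len] = '\0';
            /* Move whatever follows (the next message, or part of it)
               to the front of the buffer. */
            rx->used -= consumed;
            memmove(rx->data, rx->data + consumed, rx->used);
            rx->scanned = 0;
            return msg_len;
        }
    }
    /* Not found: the last 3 bytes could be the start of a separator,
       so the next search resumes just before them. */
    rx->scanned = rx->used >= sep_len - 1 ? rx->used - (sep_len - 1) : 0;
    return 0;
}
```

Feeding this sketch a message split across two `rx_append()` calls yields nothing after the first call and the full message after the second, and two messages appended in a single call come out one at a time on successive `rx_take_message()` calls.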

If the separator is not there yet, I record the last location where it could start, issue another network read, and search for \r\n\r\n again. Once it is found, the code calls the parse_command() function, which operates on the entire command in memory (which is much easier). With that done, the message parsing itself is quite easy from a conceptual point of view, although I’ll admit that C makes it a bit long.

static struct cmd* parse_command(struct connection* c, char* buffer, size_t len) {
    char* line_ctx = NULL, *ws_ctx = NULL, *line, *arg, *value;
    char** argv;
    struct header* headers;
    struct cmd* cmd = NULL;
    char* copy = malloc(len+1);
    if (copy == NULL) {
        push_error(ENOMEM, "Unable to allocate command memory");
        goto error_cleanup;
    }
    // now we need to have our own private copy of this
    memcpy(copy, buffer, len);
    copy[len] = 0; // ensure null terminator!

    cmd = calloc(1, sizeof(struct cmd));
    if (cmd == NULL) {
        push_error(ENOMEM, "Unable to allocate command memory");
        goto error_cleanup;
    }
    line = strtok_s(copy, "\r\n", &line_ctx);
    if (line == NULL) {
        push_error(EINVAL, "Unable to find \\r\\n in the provided buffer");
        goto error_cleanup;
    }
    arg = strtok_s(line, " ", &ws_ctx);
    if (arg == NULL) {
        push_error(EINVAL, "Invalid message command line: %s", line);
        goto error_cleanup;
    }

    do
    {
        // grow through a temporary so that a failed realloc
        // doesn't lose the existing array
        argv = realloc(cmd->argv, sizeof(char*) * (cmd->argc + 1));
        if (argv == NULL) {
            push_error(ENOMEM, "Unable to allocate command memory");
            goto error_cleanup;
        }
        cmd->argv = argv;
        cmd->argv[cmd->argc++] = arg;
        arg = strtok_s(NULL, " ", &ws_ctx);
    } while (arg != NULL);

    while (1)
    {
        line = strtok_s(NULL, "\r\n", &line_ctx);
        if (line == NULL)
            break;
        // strtok_s() returns the whole line when there is no ':',
        // so look for the separator explicitly
        value = strchr(line, ':');
        if (value == NULL) {
            push_error(EINVAL, "Header line does not contain ':' separator: %s", line);
            goto error_cleanup;
        }
        *value++ = 0; // terminate the key

        while (*value == ' ')
            value++; // skip initial space

        headers = realloc(cmd->headers, sizeof(struct header) * (cmd->headers_count + 1));
        if (headers == NULL) {
            push_error(ENOMEM, "Unable to allocate command memory");
            goto error_cleanup;
        }
        cmd->headers = headers;
        cmd->headers[cmd->headers_count].key = line;
        cmd->headers[cmd->headers_count].value = value;
        cmd->headers_count++;
    }
    return cmd;

error_cleanup:
    if (copy != NULL)
        free(copy);
    if (cmd != NULL) {
        cmd_drop(cmd);
    }
    return NULL;
}


I’m copying the memory from the network buffer to my own location. This is important because the read_message() function will overwrite it in a bit, and it also allows me to modify the memory freely, which is required for using strtok_s(), since it writes null terminators into the string it tokenizes. This basically allows me to tokenize the message into its component parts: first line by line, then splitting the first line on spaces and treating the remaining lines as header lines.
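The in-place tokenizing can also be sketched with standard C string functions only, which sidesteps the fact that `strtok_s()` has a three-argument form in Microsoft's C runtime and a four-argument form in C11 Annex K. The helpers `next_line()` and `split_header()` below are hypothetical names for this illustration, not functions from the post. Unlike `strtok_s()` with a `"\r\n"` delimiter set, this sketch requires the exact two-byte `\r\n` sequence and does not skip empty lines.

```c
#include <stddef.h>
#include <string.h>

/* Cut the next \r\n-terminated line out of *cursor, in place.
   Returns the line, or NULL when no text is left. */
char* next_line(char** cursor) {
    char* line = *cursor;
    char* end;
    if (line == NULL || *line == '\0')
        return NULL;
    end = strstr(line, "\r\n");
    if (end != NULL) {
        *end = '\0';       /* overwrite '\r' to terminate the line */
        *cursor = end + 2; /* continue after "\r\n" */
    } else {
        *cursor = line + strlen(line); /* last line, no terminator */
    }
    return line;
}

/* Split "Name: value" in place.
   Returns 0 on success, -1 if the ':' separator is missing. */
int split_header(char* line, char** key, char** value) {
    char* colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    *colon++ = '\0';      /* terminate the key */
    while (*colon == ' ')
        colon++;          /* skip spaces before the value */
    *key = line;
    *value = colon;
    return 0;
}
```

Both helpers modify the text they are given, so they need a private, writable, NUL-terminated copy of the message, which is the same reason the copy is made in parse_command().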

I added the ability to reply to a command, which means that we are almost done. You can see the current state of the code here.

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
