Last week one of my clients sent me an email asking for some help parsing data. He had a bunch of documents used in his industry all stored in memo fields. The memos look like the contents of a standard Word document. They start out with a table of contents (complete with page numbers and leading periods). Each of the sections inside the memo has a header included in the table of contents.
My mission was to come up with the code to parse the memo into individual records, one for each section inside the memo. If the memo has 53 entries in the table of contents, I should have 53 records in the resulting record set. The contents of each section vary: there may be no text, several paragraphs of text, or something in between. If the table of contents has an entry, there will at minimum be a header in the text. The code needs to put the header entry in one column and the section contents in a memo field.
Want to guess how long it would take you to write this and how many lines of code? Go ahead, take a wild guess. My initial guess was 60 minutes. I did not guess how many lines of code.
I wrote the initial cut of the solution in less than an hour. I parsed out the table of contents using ALINES(), then parsed the section headers out of the table of contents by taking all the text in front of the leading periods. With the section headers in hand, I extracted the text between consecutive headers in the rest of the memo using STREXTRACT(). Unfortunately, some of the words in the memo were duplicated in the table of contents, so I was getting the table of contents in the parsed text. It took a little while to work around the duplicates issue and to clean up the extra spaces and carriage returns.
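For readers without VFP handy, here is a rough Python sketch of the same idea. It is not the code in ParseMemoViaTableOfContents.prg, and it makes some assumptions the post does not confirm: that each table of contents line looks like "Header.....12", that the table of contents is the first block of the memo, and that each header appears on a line by itself in the body. The names `TOC_LINE` and `parse_memo` are made up for illustration. Searching only the body, after the table of contents ends, is one possible way around the duplicates problem described above.

```python
import re

# Assumed TOC line shape: header text, a run of leading periods, a page number.
TOC_LINE = re.compile(r"^(?P<header>.+?)\s*\.{2,}\s*\d+\s*$")


def parse_memo(memo):
    """Return one (header, contents) pair per table of contents entry."""
    lines = memo.splitlines()  # handles both CR+LF and LF line endings

    # Step 1: collect headers from the leading TOC block and note where
    # the body starts (first non-blank, non-TOC line after a TOC entry).
    headers = []
    body_start = len(lines)
    for index, line in enumerate(lines):
        match = TOC_LINE.match(line)
        if match:
            headers.append(match.group("header").strip())
        elif line.strip() and headers:
            body_start = index
            break
    body = lines[body_start:]

    # Step 2: find each header in the body only, in TOC order, so the
    # copies of the headers inside the TOC are never matched.
    positions = []
    cursor = 0
    for header in headers:
        found = None
        for index in range(cursor, len(body)):
            if body[index].strip() == header:
                found = index
                break
        positions.append(found)
        if found is not None:
            cursor = found + 1

    # Step 3: a section's contents run from its header to the next header.
    records = []
    for i, header in enumerate(headers):
        start = positions[i]
        if start is None:
            records.append((header, ""))  # header missing from the body
            continue
        later = [p for p in positions[i + 1:] if p is not None]
        end = later[0] if later else len(body)
        contents = "\n".join(body[start + 1:end]).strip()
        records.append((header, contents))
    return records
```

Calling `parse_memo(memo_text)` on a memo with 53 table of contents entries should give back 53 pairs, with an empty string for any section that has a header but no text. Each pair maps to one record: the header in one column, the contents in the memo field.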
The solution I prototyped for him can be downloaded here: ParseMemoViaTableOfContents.prg
All done in 61 lines of code (including minimal comments and some white space). VFP’s string parsing capabilities absolutely rock! Don’t you agree?
I knew the client was satisfied when he called it the “cat’s meow”. {g}




