The Cheater’s Code, Pt. III: Wildcards

Introduction

Time to finish off this Find and Replace macro by introducing wildcards. This post is the last in a three-part series. If you want to know how we got here, check out Part I and Part II.

Scope of Application

Today’s post relies on Microsoft Word 2013 running on a Windows 8.1 machine. But what I’ll cover applies at least as far back as Word 2007 and Windows XP and the equivalent versions of Word for the Mac. If you don’t have Word, you can get a trial copy from Microsoft’s website.

Where We Left Off

I’ve had a bit of a lull here. Things at work got pretty busy, and I spent several nights working on a library to help analyze lists of files against collections of files to see if they match up. It’s called VennDexer, and the source code is up on GitHub if you’re interested. (If you don’t know what GitHub is and don’t understand what Google is trying to tell you about it, I’ll help you out in an upcoming post where I talk about tools of the trade you’ll want to know.) I also built a lot more macros, so I’ve got a lot of material to post in the near future. I won’t be analyzing them in depth like I have our Find and Replace macro, but I think some of them are good use cases for programming basic repeatable tasks.

But to get back to where we left off, a quick recap:

  • In Part I, we recorded a macro that found a period followed by a space and replaced it with a period followed by two spaces. We also covered what a macro is, in general terms, and why they’re useful. However, our macro had a slight problem. Even when a period was already followed by two spaces, it added a space.
  • The recording process in Part I automatically generated some code for us, so in Part II we examined that code in detail to understand how it worked.
  • At the end of Part II, we looked at the source of the problem in our original macro: our logic. The computer was doing exactly what we told it to: finding a period followed by a space and replacing it with a period followed by two spaces. The computer takes our instructions very literally. We didn’t tell it to look for a period followed by one space and only one space. So whether the period is followed by 1, 3, or 100 spaces, our macro is going to find the match and do its job.
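
To watch that literal-mindedness in action outside of Word’s Find dialog, here is a minimal sketch using VBA’s built-in Replace function on a plain string. It is not the recorded macro; the Sub name and the sample sentence are made up for illustration.

```vba
' Illustration only: ShowLiteralReplace and the sample text are hypothetical.
Sub ShowLiteralReplace()
    Dim original As String
    Dim result As String

    ' "One." is already followed by two spaces; "Two." is followed by one.
    original = "One.  Two. Three."

    ' Literal instruction: every period + space becomes period + two spaces.
    result = Replace(original, ". ", ".  ")

    ' Prints to the Immediate window (Ctrl+G in the VBA editor).
    Debug.Print result
End Sub
```

The result has three spaces after “One.” and two after “Two.” The spot that was already correct got an extra space anyway, because a period followed by two spaces still starts with a period followed by a space.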

Knowing that the computer is going to take you literally to a fault is important in programming. Sometimes the programming language will help make up for this a little bit, but generally speaking the computer does exactly what you tell it to do, not exactly what you expect it to do. It’s like the old communication exercise where you try to explain to someone how to make a sandwich. Every time you say something, it’s wrong.

You: Get the bread.

Student: Where is it?

You: In the cupboard.

Student: What’s a cupboard?

Really, dumb dumb?

But it gets worse. You have to explain to the student to go to the cupboard, open it, grab the bag with the bread (and you have to explain which bag that is), and so on and so forth. It’s a really useful technique for teaching us how much we take for granted in our communication, and that’s really good to remember when you’re programming. If the computer isn’t doing something you want it to do, you should double check that you told it to do that thing. If it’s doing something you don’t want it to do, make sure that you didn’t tell it to do that. It won’t always be your fault, but I’d say at least 85% of my programming problems are the result of miscommunication. The computer is always doing what I say instead of what I mean. (The other 15% of my programming problems stem from the computer being incapable of making a sandwich.)

So we need a way to tell the computer to look for a period followed by a single space and only a single space. Another way to put it is that we need the computer to look for a period followed by a space, followed by any character that is not a space.

I’m not going to run you through the logic of all the bad options. Obviously it doesn’t make sense to tell the computer to search for “. A”, “. B”, “. C” one after the other until we’ve accounted for every possible set. Not only is that terribly inefficient, but it would be almost impossible to account for every single character. Also, and this is another key to programming logic, we can safely assume that we are not the first people in the history of coding who have had this problem. Just like we didn’t have to make up a computer language to enjoy automated find and replace functionality, we don’t have to make up the solution to matching large, vague sets of characters. A tool exists.

Wildcards: Not Just for Poker Anymore

You’ve probably already used wildcards, whether you knew them by that name or not. If you’ve ever gone into a file folder to look for all your JPEG images, you might have typed *.jpg in the search bar. The asterisk, or star, tells Windows that you don’t care what comes before the dot. You want to match any file whose name ends in .jpg. So if you had the following files in a directory

hippopotamus.jpg

todoatthezoo.docx

penguins.jpg

passwords.txt

tigers.jpg

directions.docx

zoobudget.xlsx

and then you searched for *.jpg, you’d get back the following list:

hippopotamus.jpg

penguins.jpg

tigers.jpg
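
The same filtering can be sketched in VBA with the Like operator, which understands a star in much the same way the search bar does. This is just an illustration: the Sub name is made up, the file names are typed in by hand rather than read from a folder, and Like follows its own pattern rules, which are separate from the ones Word’s Find and Replace uses.

```vba
' Illustration only: ListJpegs is a hypothetical name, and the list is
' hard-coded instead of being read from a real directory.
Sub ListJpegs()
    Dim fileNames As Variant
    Dim fileName As Variant

    fileNames = Array("hippopotamus.jpg", "todoatthezoo.docx", "penguins.jpg", _
                      "passwords.txt", "tigers.jpg", "directions.docx", _
                      "zoobudget.xlsx")

    For Each fileName In fileNames
        ' "*" stands in for any run of characters in front of ".jpg".
        If fileName Like "*.jpg" Then
            Debug.Print fileName   ' written to the Immediate window
        End If
    Next fileName
End Sub
```

Running it prints hippopotamus.jpg, penguins.jpg, and tigers.jpg. With VBA’s default comparison setting the match is case-sensitive, so a file named TIGERS.JPG would be skipped.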

The star (*) is a wildcard. In short, that just means it stands in for something else. Specifically, in this context, the star stands in for anything else. I won’t be covering all the wildcards or how they could be used, but here’s an article that expands on the topic. Also, I found a really well-done blog post on how to do advanced Find and Replace in Word, which includes some info on using wildcards. For something a little more advanced, read this.

You’ll often see wildcards mentioned in the same breath as regular expressions. Just to be clear, Word’s wildcards do not function the same way as the wildcards in regular expressions. There is some overlap, but it is minimal. As you get deeper into programming, you’ll likely come across some regular expressions. They may look a lot like some of Word’s wildcards, but don’t get the two confused. It is best to think of them as estranged cousins who don’t play well together, who maybe even give each other the evil eye on occasion. Don’t invite them to the same barbecue.

There’s More Than One Wildcard

Just to get another pet peeve off of my chest, the star or asterisk is not the only wildcard. A lot of folks get the impression that it is. The question mark, at sign (@), and angled brackets (<>) can also be wildcards, in addition to other characters. This is more of a snotty nerd issue, but armed with this knowledge, when someone tells you to “use the wildcard,” you have enough snotty nerd clout to say, “Which one?”

But I’ve Searched for the * Before and Found It

Remember that Word’s Find and Replace has additional settings. If you open the Find and Replace dialog (Ctrl+H) in Word and search for the asterisk, it will match only the asterisk, not any character. But when you hit the “More>>” button, you’ll see an option to use wildcards.

[Image: advanced Find and Replace options, with “Use wildcards” checked]

When that option is turned on, the asterisk stops being a literal character and will match any string of characters.

How Wildcards Solve Our Problem

My gut instinct might be to immediately add the star to the string we’re trying to find.

.Text = ". *"

Of course, it will only take you a second to recognize the flaw in my dummy logic. The star matches anything. That means it will match another space. So as soon as Word finds a period followed by a space, followed by anything–including another space–it will consider it a match and substitute our replacement text.

But the problem is a little bigger than that. This would also match “. A” or “. t”, or any combination of period, space, letter. When it replaces that text, it will also replace the letter we found.

Poo.

To solve this problem, we need to learn one more wildcard feature. I think it might be a little easier to see the solution and work backward from there. The solution will look like this:

.Text = ".*<"

.Replacement.Text = ".  "

The angled brackets are wildcards that represent the beginning (<) or end (>) of a word, respectively. So what we’re saying here is “Hey, Word! Find a period, followed by anything leading up to the beginning of a word. Replace it with period, space, space.”

Why Does It Work?

The wildcard expression works because it stops matching when it gets to the beginning of a word, which Microsoft Word considers to be any character. It doesn’t match on the character. Just everything leading up to it. This actually means that the expression is pulling double-duty. Not only will it find the combination period-space-character, but it will find period-space-space-space-character. In fact, it will find any number of spaces between a period and the beginning of the next word. So it will ensure that periods with too many spaces are also replaced by a period followed by two spaces.

Actually, It Doesn’t Work… Yet

If you try to run the macro now, you probably won’t get any matches. You definitely won’t if you’re using the source text I used in Part I. You know why, though. We have to turn wildcard matching on:

.MatchWildcards = True

Now it works. If it doesn’t work, have a couple Oreos to calm yourself down and go back through the posts.
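
Put together, the finished macro might look something like the sketch below. The Sub name is made up, and the extra settings (.Forward, .Wrap, .Format) are assumed to be the sort of thing the macro recorder writes, so treat your own recorded code from Parts I and II as the authority.

```vba
' A sketch, not the original recorded macro: the Sub name is hypothetical and
' the settings around .Text are assumed to resemble the recorder's output.
Sub TwoSpacesAfterPeriod()
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ".*<"                ' a period, then anything up to the start of a word
        .Replacement.Text = ".  "    ' a period followed by two spaces
        .Forward = True
        .Wrap = wdFindContinue       ' carry on when the end of the document is reached
        .Format = False
        .MatchWildcards = True       ' without this, * and < are just literal characters
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Try it on a copy of your document first. Replace All changes everything in one pass, and although Undo will usually bail you out, a spare copy is cheaper than regret.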

We’ve Still Got One Problem

Let’s say this sentence is in the third paragraph of our document:

Dr. Smith called Ms. Johnson and asked her to come in for an appointment the following Monday a.m.

Dr. Smith and Ms. Johnson are going to give us a problem. We’re going to end up with some added space between “Dr.” and “Smith,” as well as between “Ms.” and “Johnson.” Other abbreviations show up in the middle of sentences all the time, too: Inc., Rd., St., Mr., Jr., etc. Again, the computer is going to take us absolutely literally. If we want it to do something different with abbreviations than it does with the end of sentences, we have to give it specific instructions. But I think I’ve given you enough ammunition at this point to leave you with two hints. The first is this: in all but a few rare cases, there are more sentences than there are abbreviations. Fix the spaces after a sentence first. The second hint is the solution:

.Text = "([DIJMNOPRS][cdnoprst]{1,3})(.  )(<?)"
.Replacement.Text = "\1. \3"
.MatchWildcards = True
.MatchCase = True

That wildcard expression isn’t perfect. I was still learning how to use wildcards efficiently when I put that together. But it works. I’ll leave it to you to figure out why it works. If you really get stuck and have no idea what’s going on, feel free to leave a comment or two. I’ll be happy to point you in the right direction.
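
For anyone who wants to see the two hints chained together, here is one way it might be sketched. ReplaceAllWildcard is a hypothetical helper, both Sub names are made up, and running against ActiveDocument.Content instead of the Selection is an assumption of this sketch, not something from the recorded macro. The two search expressions are copied from this post as they are.

```vba
' Hypothetical helper: one wildcard Replace All over the whole document.
Private Sub ReplaceAllWildcard(findText As String, replaceText As String, _
                               Optional caseSensitive As Boolean = False)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop           ' the range is the whole document, so stop at its end
        .Format = False
        .MatchWildcards = True
        .MatchCase = caseSensitive
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Hypothetical driver: sentences first, abbreviations second.
Sub FixSpacingInTwoPasses()
    ' Pass 1: two spaces after every period.
    ReplaceAllWildcard ".*<", ".  "

    ' Pass 2: revisit the short abbreviations that pass 1 just widened.
    ReplaceAllWildcard "([DIJMNOPRS][cdnoprst]{1,3})(.  )(<?)", "\1. \3", True
End Sub
```

The order matters: the second expression looks for a period followed by two spaces, and those only exist after the first pass has run.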

Love and kisses,

Tyler


2 thoughts on “The Cheater’s Code, Pt. III: Wildcards”

    • Good call. There’s only one space after the period in the expression. There should be two. WordPress formatted some of the extra whitespace out. This should work:

      ([DIJMNOPRS][cdnoprst]{1,3})(.  )(<?)
