Anatomy of a Stackoverflow Discussion

Introduction

I’ve mentioned Stackoverflow in a couple of other posts. It’s not the only place the cool kids in coding hang out, but it’s a pretty important one. And its sister sites aren’t just for coders. I had an interesting exchange there over the last couple of days that I thought would fit my “Learn to Code” series. After this, though, I’m going to spend some time digging into OTRS ITSM and writing about that. I’ve had several requests for it, and I’ve noticed that my blog is getting the most traffic for the OTRS stuff. I feel like the people have spoken. If you come around here for the Learn to Code posts, let me know where you’d like those to head next.

Scope of Application

This post references Microsoft Excel 2013 running on a Windows 8.1 machine. I’ll also be referencing Stackoverflow as experienced through Firefox 28.0 running in Sandboxie 4.08.

The Question

Stackoverflow, to severely oversimplify, is a Q&A site. It’s a really sophisticated Q&A site with some conventions that its users can be fairly zealous about, but overall it’s an awesome community that makes getting answers to programming questions about as easy as it could possibly be without having Dennis Ritchie as a personal advisor.

Every post starts with a question. In this case, the question had to do with an Excel macro. I won’t rehash the whole question for you here. You can follow the link and take a look. The format is fairly familiar: header, body, some up-/downvote arrows and a star to “favorite” the question. At the time that I’m writing this, the question has a score of “-3.” You can hover over the up-/downvote arrows to see how the site designers intended those features to be used. It’s hard to know exactly why questions get downvoted sometimes, but it’s also important to note that questions can be edited over time–don’t always trust a negative score. New users who are inexperienced programmers can have an especially hard time getting their first couple questions right on Stackoverflow, but many of them make quick edits and improve the question.

My guess is that this one was down-voted because the community at large didn’t find that the original question was framed well, or that it didn’t show enough leg work. A lot of folks get on the site and ask things as general as, “How do you add two numbers together?” (Ok. That’s a slight exaggeration… but only a slight one.) If you can Google the answer fairly easily, the community generally doesn’t like that.

The Answer

I don’t think my answer is perfect, but I did see a solution to the problem based on some similar issues I’d faced in the past. So I wrote and tested a quick macro that did what I understood to be the goal of the asker. It took a little more back and forth, and because the person asking the question is fairly new to macros, last time I checked, he still hadn’t quite gotten it to work with the real data. Because of my recent posts on macros, I thought I’d take this opportunity to explain my code for his sake and for anyone else who might be working at the same level.

The entirety of the macro is already on Stackoverflow, so I’m going to go ahead and go into the detail.

Terms

Some folks might be used to talking about Excel files and tabs. The official jargon for an Excel file is a workbook, and the tabs are worksheets. Knowing this makes understanding the code a little easier.

The Code

Here are the first few lines:

Dim rowCount1 As Long
Dim rowCount2 As Long

rowCount1 = ThisWorkbook.Sheets(1).Range("A2").SpecialCells(xlCellTypeLastCell).Row
rowCount2
= ThisWorkbook.Sheets(2).Range("A2").SpecialCells(xlCellTypeLastCell).Row

“Dim” is a keyword in VBA that you use whenever you want to declare a variable. (If you want to know why, Google has the answer.) So we’re declaring and assigning two variables as “Long” data types. When you declare a variable in VBA, you have to tell it whether the data is going to be a string of characters, a number, or something else. So you’ll see “Dim [myVariable] As [someType]” everywhere in VBA.

A “Long” is just a type of number. We want to be able to loop through a column of cells without having to tell Excel explicitly how many rows we’ve used in that column. We want Excel to figure out where the last used row is. And that’s what we’re doing with the next two lines. Their practically identical, but they’re looking at different sheets.  The line is pretty self-explanatory, but let’s take a quick close look:

  • ThisWorkbook
    • Refers to the workbook that contains the macro
  • Sheets(1)
    • This is how you tell Excel which sheet to look at. You can also pass it a name: Sheets(“Payments”)
  • Range(“A2”)
    • A Range in Excel is a group of cells. In this case, we’re referencing only a single cell, but you can reference more, as we’ll see later
  • SpecialCells(xlCellTypeLastCell)
    • This might look a little cryptic, but it’s pretty simple. “SpecialCells” is a method on the Range that takes one of a few specific arguments predefined by VBA. In this case, we’re telling it to find the last cell in the column with any information in it–xlCellTypeLastCell.
  • Row
    • SpecialCells is going to return a cell address. “Row” grabs the row number off of that address. Now our “rowCount” variables refer to the last row in Column A on their respective sheets.

The next section is really similar:

Dim rng1 As Range
Dim rng2 As Range

Set rng1 = ThisWorkbook.Sheets(1).Range("A2:A" & rowCount1)
Set rng2 = ThisWorkbook.Sheets(2).Range("A2:A" & rowCount2)

Here, we declare two new variables of the Range data type. That means that we’re going to refer to a couple of groups of cells. When you assign a Range variable, you have to use the “Set” keyword. Notice that we didn’t have to do that to assign the Long variables. This has to do with the different ways Excel handles different data types. I’m not going to get into the details of that here. For now, just know that if you try to set a Range variable without the Set keyword, you’re gonna have a bad time.

You already know what “ThisWorkbook.Sheets(1).Range” means, but the range reference is a little different than the one we saw before. Instead of simply A2, we have “A2:A” & rowCount1. If you’re familiar with Excel, you’ve probably used or seen notation like “A2:A10” in a formula before. This refers to the range of cells in Column A, rows 2 through 10. If the notation were “A2:C10”, it would include all the cells in Columns A, B, C, Rows 1 through 10. The notation here means the exact same thing. “A2:A” tells Excel to refer to Column A, Row 2 through Column A… What?

You may have seen the rest of this notation in a formula somewhere, as well. The ampersand (&) is the concatenation symbol–concatenate being a fancy word for “smoosh together.” If you concatenate “antidisestablishment” with “arianism,” you get “antidisestablishmentarianism.” Simple as that. So that last piece of our assignment is telling Excel to assign rng1 to all the cells in Column A from row 2 through the last used row–the value we stored in that variable earlier.

The next variable is a little more complex, and I’m not going to go into a lot of detail on it. If you understand based on the explanation I give, great. If not, don’t worry about it. I’m not exactly starting at the beginning.

Dim sheet1() As Variant
ReDim sheet1(rowCount1 - 1,2)

Here I declare sheet1 as an array. You can tell it’s an array because of the parentheses–which up to this point we’ve only seen as part of a function or method. (See my post on the coin slot).  They also show up with array variables, because arrays hold collections of values instead of a single value, and we need a way to tell it which one we want. It’s not entirely wrong to think of an array as a kind of mini-function: You call its name and pass in an argument, and it gives back the value that matches that argument.

The Variant data type means that the array might (and in our case, definitely will) hold different types of data. They won’t all be numbers or all be letters. We’re going to mix it up.

The next line has a “ReDim” statement. If you really want to understand that now, you’ll have to Google it. I’ll leave my explanation at this: VBA needs you to tell it what size an array is going to be so it can reserve the appropriate amount of space. We didn’t reserve any space when we declared it, so we’re reserving that space with the ReDim statement.

I’m also using a multidimensional array, meaning each entry in the array will consist of multiple pieces of data. That’s why the coin slot has two values: rowCount1 – 1, and 2. The first dimension, rowCount1-1, is the size of the array, or the number of elements it will have. In this case, I need to store one element in the array for each row of data in my used range on Sheet1. Why do I subtract 1 from the row count? Let me explain roundaboutly. The second dimension, 2, indicates that each of my elements will have 3 pieces of data.

“Wait… it sounds like you just said 2 = 3.” Well, that’s because I did. You know how 0 is supposed to mean nothing? Well, not with arrays in programming. Arrays treat 0 like something. It’s the first element in an array. So 0 = 1. It’s called base 0, and it’s kind of annoying sometimes. But it will make more sense in a few lines.

Just to give you a more concrete idea of how a multidimensional array works, let’s say I have the following data on the characters of 30 Rock:

  • Liz Lemon
    • University of Maryland
    • Theater Tech
  • Jack Donaghy
    • Harvard
    • MBA
  • Kenneth Parcell
    • Kentucky Mountain Bible College
    • Television Studies and Bible Sexuality

These are related sets of data that I want to be able to reference together. For this data set, my multidimensional array would need to contain three elements, each of which has three pieces of data. I could declare this like so:

Dim 30Rock() As Variant
ReDim 30Rock(2,2)

 Cool? Cool.

We then declare another number variable and set it to 0.

Dim n As Long
n = 0

And now, something a little more interesting happens:

For Each cell In rng1.Cells
sheet1
(n,0)= cell.Value
sheet1
(n,1)= cell.Offset(0,2).Value
sheet1
(n,2)= cell.Offset(0,5).Value
Debug
.Print cell.Value
n
= n + 1
Next cell

We haven’t covered “For Each” loops yet, but you’re smarter than me. You understand that these tell the computer to loop through a certain section of code for each element in some kind of collection. In our case, we’re telling it to repeat these lines of code for each cell in the collection of cells in the range we assigned to rng1 several lines ago.

First, we get the value of the cell in Column A, which is the column our range is set to. “cell.Value” means whatever is in the cell. It may not be what is displayed on the screen. If the cell contains a formula, the value is the formula. The result of that formula would be in the text property: cell.Text.

We assign that value as the first piece of data in the first element of the array. Remember that we set n to 0, so what that first line actually means is “sheet1(0,0).” You’ll notice that we increment n by 1 at the end of each loop. That way we’re always moving up one element in the array. The second time we pass through, the first line would mean “sheet1(1,0)”, the next time “sheet1(2,0)”, and so forth.

The next line gets the value from the cell 2 columns to the right. The “Offset” property allows you to create relative references. You pass it a row number and column number, respectively, and it finds the cell at that location. For example, in the first pass of the above code, the address of “cell” is “A2.” So cell.Offset(0,2).Value says to Excel, “go 0 rows up or down, and go 2 columns to the right.” (Positive numbers go down and to the right, negative numbers go up and to the left.) This will refer to “C2.”

The “Debug.Print” statement is just a way to show in the Immediate Window of the VBA Editor if I’m grabbing the value of the cell that I expect to get.

The reason I’m gathering these values is rooted in the original question. Based on the sample data in the original question, you can see that the asker has values in columns A, C, and F on sheet 1 that he would like to compare to data in Columns A, B, and C on sheet 2. That’s why I’m using the offsets. I only have to refer to one column by its letter, an then I can use relative references to get the values out of the other columns.

It’s worth asking why I’m collecting these values into a multidimensional array instead of just comparing the two ranges in the two sheets directly. The same task could be accomplished either way. To be perfectly honest, I don’t know why I did it except that that’s how it occurred to me to do it. Maybe I’m too tired now to remember why, or maybe I was in too much of a hurry then to consider all of my options, but I did it this way, it worked, and so I passed my answer along. Other than possibly making the final loop a little easier to read, this approach doesn’t actually offer any major advantage that I can see. Thinking about it a little more, I may have thought there was a performance advantage to holding those cell values in memory rather than reading them out of the ell each time. I don’t think that’s accurate. So learn from my mistakes: don’t add unnecessary code!

Okay, next step. We declare another variable and set it to 1:

Dim currentRow As Long
currentRow
=1

The asker wants to take any data from sheet1 that matches data from sheet2 and copy the row to sheet3. We’ll use this variable to track which row on sheet3 we’re pasting into.

Finally, the heavy lifting:

For n = 0 To UBound(sheet1)
For Each cell In rng2.Cells
If cell.Value = sheet1(n,0) And cell.Offset(0,1).Value = sheet1(n,1) And cell.Offset(0,2).Value = sheet1(n,2) Then
ThisWorkbook
.Sheets(1).Rows(n + 2).Copy Destination:=ThisWorkbook.Sheets(3).Range("A"& currentRow)
currentRow
= currentRow +1
GoTo NextIteration
End If
Next
cell
NextIteration
:
Next n

First, we have another For loop. This one is just a tad different. Rather than saying For Each, we tell it to count. You can start from any value and count to any value. Since we’re looping through an array, we start at 0. We then tell it to count to the upper boundary (UBound) of an array–meaning its highest value. UBound is a function that takes an array as an argument. We give it sheet1.

Now we do something really inefficient. We nest a For Each loop inside a For loop. I haven’t thought of a better way to address this particular problem, but be cautious about doing this. Let’s say we have an array with 1,000 elements in it, and we run a For loop on every element in that array. For each element in our array, we compare it against each value in another array to see if it matches. Well, what if that second array has 1,000,000 elements? And what if nothing matches? We end up running that comparison a thousand million times. If there’s any way to avoid running so many stinkin’ loops, we should figure it out!

You can see where this is going. We want to check everything in our original array (the values from sheet 1) against the corresponding cells in sheet 2. So if the value in sheet 2, Column A matches the first piece of data in the first element of our array AND the value in sheet 2, Column B matches the second piece of data in the first element of our array AND the value in sheet 2, Column C matches the third piece of data in the first element of our array, THEN do the following stuff.

The next line should be fairly self-explanatory. We refer to the matching row back on sheet 1, and tell Excel to copy that row. We find the row number by adding 2 to n. Remember that our data started in the second row, but n starts at 0. This corrects for that difference. The “Copy” method takes a Range as its “Destination” argument, so we point it to the first unused row on the third sheet using the currentRow variable we set up earlier. Then we increment that to make sure the next time we copy, we don’t overwrite what’s already there.

“GoTo NextIteration”… goes to… the NextIteration label… I don’t know how to explain that any better. But the reason we need it is because otherwise Excel would stay inside that inner loop even after it finds a match. By sending the code outside of that loop, we can move to the next element in our array without looping through all the cells in the second sheet.

That’s it! Easy peasy, right?

Discrepancy

Now, the sample data included in the question didn’t match the data the asker was actually using. If you go to the question now, you’ll find a link to the data file. That wasn’t there the first time. So the macro didn’t work with the live data. It needed some edits. Below is the revised macro, taking into account the realization I had that the multidimensional array wasn’t really necessary. I’m not going to explain all the changes. I’ve explained everything else in enough detail that you ought to be able to look at the sample data file and the code below and figure out why the changes work. Or maybe I’m wrong and they don’t. Then I guess it’s up to you to fix it and let me know.

Sub Test()

Dim rowCount1 As Long
 Dim rowCount2 As Long
rowCount1 = ThisWorkbook.Sheets(1).Range("B20").SpecialCells(xlCellTypeLastCell).Row
 rowCount2 = ThisWorkbook.Sheets(2).Range("C2").SpecialCells(xlCellTypeLastCell).Row
Dim rng1 As Range
 Dim rng2 As Range
Set rng1 = ThisWorkbook.Sheets(1).Range("B20:B" & rowCount1)
 Set rng2 = ThisWorkbook.Sheets(2).Range("C2:C" & rowCount2)
Dim currentRow As Long
 currentRow = 1
For Each cell In rng1.Cells
 For Each cell2 In rng2.Cells
 If cell2.Value = cell.Value And cell2.Offset(0, -1).Value = cell.Offset(0, 5).Value And cell2.Offset(0, -2).Value = cell.Offset(0, 2).Value Then
 ThisWorkbook.Sheets(1).Rows(cell.Row).Copy Destination:=ThisWorkbook.Sheets(3).Range("A" & currentRow)
 currentRow = currentRow + 1
 GoTo NextIteration
 End If
 Next cell2
 NextIteration:
 Next cell
End Sub

Love and kisses,

Tyler

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s