Preempting ‘gotchas’

Steve Condylios
2 min read · Dec 15, 2019

“Gotchas” are realisations that don’t come naturally or intuitively, and which we often find very painful or time-consuming to learn “the hard way”.

A classic “gotcha” in the world of R programming was (until R 4.0.0) accidentally treating strings as factors.

Here’s an example: a data.frame with a single value, the string "5"…

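library(magrittr)  # provides the %>% pipe
df <- data.frame(col1 = "5")  # R < 4.0.0: strings become factors by default (stringsAsFactors = TRUE)
df$col1[1] %>% as.integer  # returns the factor's internal level code, not the label
[1] 1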

…but when we access that single value and convert it to an integer, we get 1 rather than the 5 we’d expect. Huh?!

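This happens because, before R 4.0.0, data.frame() converted strings to factors by default (stringsAsFactors = TRUE). The column holds a factor with a single level, "5", and as.integer returns that level’s internal code (1) rather than the number its label spells out. This can be confusing for those who aren’t aware of it.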

It’s a painful lesson almost every R coder has encountered and can relate to and laugh about. But since it stumps almost everyone, wouldn’t it be handy to have a way to know about this ‘gotcha’ and the others in advance, rather than waiting for them to spring up when we’re not expecting them?

Enter Stack Exchange Data Explorer

I stumbled upon the awesomeness of the Stack Exchange Data Explorer (SEDE) a short while ago, and discovered I could use it to find the most common pain points for any technology.

SEDE lets users write their own T-SQL queries and returns up to 50k rows of results. Very cool indeed.

Here’s one of the queries I ran to identify the ‘gotchas’ with Google BigQuery. The query takes a few seconds to execute, and returns links to the questions in the first column of the results — so we can easily check out what issues people are having with the technology!

Sub in whatever Stack Overflow tag you like, and change the ordering if you prefer (this query is ordered by score — that’s upvotes less downvotes — but I also found average views per day useful).

A variation on the query I used
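For readers who want a starting point, here is a minimal sketch of what a query along these lines might look like. It is an illustration, not necessarily the query I ran: it assumes the public SEDE tables Posts, PostTags and Tags, uses the google-bigquery tag as the example, and the ViewsPerDay column name is made up for this sketch.

```sql
-- Sketch for the Stack Exchange Data Explorer (T-SQL), run against the Stack Overflow site.
-- Assumption: the tag is matched through the PostTags and Tags tables.
SELECT TOP 100
    p.Id AS [Post Link],   -- SEDE renders a column with this alias as a link to the question
    p.Score,               -- upvotes less downvotes
    p.ViewCount,
    -- average views per day since the question was asked (NULLIF avoids dividing by zero)
    p.ViewCount * 1.0 / NULLIF(DATEDIFF(day, p.CreationDate, GETDATE()), 0) AS ViewsPerDay
FROM Posts AS p
JOIN PostTags AS pt ON pt.PostId = p.Id
JOIN Tags AS t ON t.Id = pt.TagId
WHERE p.PostTypeId = 1                -- 1 = question
  AND t.TagName = 'google-bigquery'   -- sub in whatever tag you like
ORDER BY p.Score DESC;                -- or: ORDER BY ViewsPerDay DESC
```

Swapping the tag name changes the technology being examined, and swapping the ORDER BY column switches between ranking by score and ranking by average views per day.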

After perusing the top 50–100 questions on the technology, I had an idea of the most common problems people were having. I’d discovered the common misconceptions and stumbling blocks, and had some excellent ideas around what I, too, would soon need help with.

And best of all, this works for any Stack Overflow tag! This simple exercise gave me a reproducible way of foreseeing the most common stumbling blocks for any new technology!

See also:

The full Stack Exchange Database Schema
