Shuffling the deck: an interview question

Here is a story about an interesting interview question and how I approached it.

The company in question wasn't interested in actually looking at my code, since I apparently tried to answer the wrong question.

Given a deck of n unique cards, cut the deck c cards from the top and perform a perfect shuffle. A perfect shuffle is where you put down the bottom card from the top portion of the deck followed by the bottom card from the bottom portion of the deck. This is repeated until one portion is used up. The remaining cards go on top.

Determine the number of perfect shuffles before the deck returns to its original order. This can be done in any language. A successful solution will solve the problem for 1002 cards and a cut size of 101 in under a second even on a slow machine.

I looked at that and did what they tell you to do for interviews, and coding in general, especially when you don't know where to start: start with the naive, simple approach.

Step 1. make_deck

    def make_deck(n):
        return [x for x in range(1, n+1)]

Step 2. shuffle

    from collections import deque

    def shuffle(cards, c):
        """
        :param cards: the deck to shuffle (list of ints)
        :param c: where to cut the deck (int)
        :return: the shuffled deck (deque)
        """
        top = cards[0:c]
        bottom = cards[c:]

        stopping_criteria = min(len(top), len(bottom))

        newstack = deque()

        for i in range(stopping_criteria):
            newstack.append(top.pop())
            newstack.append(bottom.pop())

        if (len(top) == 0) and (len(bottom) == 0):
            return newstack

        elif len(top) > 0:
            newstack.extendleft(top)  # note: extendleft prepends items in reverse order
        elif len(bottom) > 0:
            newstack.extendleft(bottom)

        return newstack

Step 3. shuffle_recursive

    def shuffle_recursive(cards, c, shuffle_count):
        """
        Shuffle until the original order is restored, and count as you go.
        Assumes the original order is sequential and the first card is always 1.

        :param cards: the deck to shuffle (list of ints)
        :param c: cut size to pass to the shuffle function (int)
        :param shuffle_count: running count of shuffles so far (int)
        :return: shuffle_count (int), the number of shuffles it took
        >>> shuffle_recursive([1,2,3,4,5], 3, 0)
        4
        """
        newstack = shuffle(cards, c)

        shuffle_count += 1

        if list(newstack) == [x for x in range(1, len(cards)+1)]:  # stopping criteria
            return shuffle_count

        else:
            return shuffle_recursive(list(newstack), c, shuffle_count)

So I did that, and was surprised to get a recursion depth error.

Then I realized it only works until it hits Python's default recursion limit (1000).

Also, it was obviously too slow.

So I did some profiling, and found that the majority of time was spent in these 3 lines:

    for i in range(stopping_criteria):
        newstack.append(top.pop())
        newstack.append(bottom.pop())

And that kind of surprised me, since I thought the whole point of deque() is that it's supposed to be faster.
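
For reference, the built-in cProfile module is enough to find this kind of hot spot; a minimal sketch, using the deck size and cut from the challenge:

    import cProfile

    # profile 1000 shuffles of a 1002-card deck with a cut of 101
    profiler = cProfile.Profile()
    cards = list(range(1, 1003))

    profiler.enable()
    for _ in range(1000):
        shuffle(cards, 101)
    profiler.disable()

    profiler.print_stats(sort='cumulative')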

So then I spent some time thinking about how I could possibly make the code go faster.

Ultimately I ended up directly creating the interleaved bottom part of the deck, and then added the top. I noticed that the tricky part was dealing with the leftover cards. I also noticed that it took a lot fewer iterations to get back to the starting order if I reversed the top cards before I put them back.

Then I hooked that up to run iteratively, so I could control the number of times it ran, for debugging, etc.

The code is here if you want to see what I did.
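
As a rough idea of the shape of the iterative version, here's a minimal sketch of a driver that reuses the simple shuffle() above (not the faster slicing version), with a cap on the number of shuffles so a runaway case can't loop forever:

    def count_shuffles(n, c, max_shuffles=1000000):
        """Shuffle iteratively until the deck returns to its original order.

        Returns the number of shuffles, or raises if max_shuffles is exceeded.
        """
        original = list(range(1, n + 1))
        cards = list(original)

        for shuffle_count in range(1, max_shuffles + 1):
            cards = list(shuffle(cards, c))
            if cards == original:
                return shuffle_count

        raise RuntimeError("deck did not return to its original order within max_shuffles")

Making the loop explicit sidesteps the recursion limit, and the shuffle_count variable makes it easy to stop early or log intermediate decks while debugging.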

I wrote a bunch of tests while I was doing this, like I always do, and I couldn't help noticing that there were some weird edge cases that never worked.

I tried to read some advanced math articles, which led me to understand that the weird edge cases I was seeing were probably harmonics.

Then, because I'm really a data scientist at heart, I wanted to see what that looked like.

I wrote a couple of methods to help me visualize the results.
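
Those plotting methods aren't reproduced here, but a quick version is just the shuffle count for every possible cut size of a fixed deck, using the count_shuffles sketch above and matplotlib:

    import matplotlib.pyplot as plt

    def plot_shuffle_counts(n):
        """Plot how many shuffles restore a deck of n cards, for each possible cut size."""
        cuts = list(range(1, n))
        counts = [count_shuffles(n, c) for c in cuts]

        plt.plot(cuts, counts, '.')
        plt.xlabel('cut size')
        plt.ylabel('shuffles until original order')
        plt.title('deck of {} cards'.format(n))
        plt.show()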

Overall, I'd say it was a great coding challenge, really interesting and I learned a lot.

However. When I went to turn in my work, the response was less than encouraging.

I wrote:

I came up with a simple, very slow (10 second+ run-time) solution fairly quickly, and then spent 3-4x more time coming up with a 10x faster solution.

What I have right now meets the requirement for 1002 cards with cut size 101 in under a second on my mac laptop (see below - not sure what you define as a "slow machine"?).

And the reply came back:

What answer did your solution arrive at for the test case? Is it 790034? That's not correct, so if that's the case you should take another look. It should only take a tenth of a second or so.

Apparently I was so annoyed at the way this exchange ended that I deleted both my response (let's consider it redacted) and theirs. I said something about how if the point was that it was a coding exercise, maybe they'd want to see my code even if I got a different answer (I did)?

They said I should have known I wasn't supposed to try to actually make the decks based on how the question was worded.

I did not know that. I'm not sure why it's so hard to just ask a straightforward question instead of including, as part of the challenge, that I should be able to read your mind.

Anyway, they did not want to see my code.

Shortly thereafter, I asked a friend who is more of an algorithms person and he said "Oh yeah, all you do is write the equation for a single card to get back to its original position, and then you have the answer."

Of course, I found that confusing, because based on what I did, I don't think it's really that simple. I think it depends on how you do the shuffling, e.g. whether you reverse the top half when you add it back on. Which the original question said nothing about.

And some cards (as the edge cases show) will take a much longer time to get back to their original position, depending on where you cut the deck and how many shuffles you do.
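
That said, the cycle idea does carry over to whatever shuffle you actually implement: one shuffle defines a permutation, every card travels around its own cycle, and the whole deck is restored after the least common multiple of the cycle lengths. A minimal sketch of that, reusing the shuffle() above:

    from math import gcd
    from functools import reduce

    def shuffles_until_restored(n, c):
        """Count shuffles via the cycle lengths of the permutation defined by one shuffle."""
        deck = list(range(1, n + 1))
        one_shuffle = list(shuffle(deck, c))
        # perm[i] = original position of the card that is now at position i
        perm = [deck.index(card) for card in one_shuffle]

        seen = [False] * n
        cycle_lengths = []
        for start in range(n):
            if seen[start]:
                continue
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
                length += 1
            cycle_lengths.append(length)

        # the deck is back in order only when every cycle completes at once: the LCM
        return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths, 1)

The cards on the longest cycles are exactly the edge cases that take forever to come home.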

So, my shuffles might be imperfect, and my ability to read interviewers' minds hasn't improved much. But hey, those harmonics are pretty interesting.

Validating results

I don't believe truth is a finite value. Truth is what we know right now. Every ten years or so, a major discovery gets overturned. Scientists are just people, and we're wrong a lot.

So one of the scariest things about doing research, or predictions, is trying to convince yourself, and other people, that what you think you've discovered is 'real'.

Or at least real enough, right now, to be believable. Whenever I do a project, I hope my findings will stand the test of time, at least long enough to be useful.

One piece of advice that has always stuck with me is that there's no such thing as the perfect experiment (or model). No matter how well you do it, no matter how many controls you do, no matter how many replicates, every approach has limitations. It might be biased in ways you know about, and in ways that are not obvious because of variables you don't know about yet.

How do we get around that, and generate solid insights without wasting a lot of time?

  1. Start from what is known. Don't reinvent the wheel. Trust, but verify.
  2. Design your experiments wisely.
  3. Use orthogonal approaches to cover your bases, and test your assumptions.
  4. Get input from other people.

Start from what is known.

Read Everything

When I was in college, the Chair of my Department was big on reading. He gave us 2 thick textbooks the summer before our senior seminar with him, and told us to read them cover to cover. Then when we wrote our final papers, he told us to read every paper we could find on our subject of choice.

It sounds ridiculous, but it's great advice. If you don't know what's been done before, how do you know you're actually doing something new? If it's been attempted before, how do you know you won't make all the same mistakes? How do you know you're not missing out on all kinds of tips and tricks that might save you a lot of time and confusion?

Trust but verify

Perhaps most importantly, if there's some way to verify your assumptions, like say if someone has done part or all of your project before, try to get your hands on their results. You're going to want to know how your results compare with theirs. Can you understand why your results might be different from what they reported? Have you improved on previous methods? Are you able to recapitulate their results using their methods? Using yours? Maybe their results are completely unreproducible. In that case, you shouldn't be trying to make your approach look like theirs. Or maybe your methods aren't very robust, in which case you have more work to do.

Design your experiments wisely

Pick two

Choose at least two approaches. It's never a good idea to base a major conclusion on a single measurement, or even a single type of measurement made in triplicate.

Ideally, it's best to have a handful of orthogonal approaches. Then if the conclusions from each of these are consistent, you can feel pretty confident about your overall conclusion.

Have a testable hypothesis

Do you know what you expect? Try examining your assumptions, and turn those into hypotheses. Are you trying to confirm a known result? Then you are testing whether you can reproduce that result. Set boundaries on how exact your answer should be. Does it have to be perfect? Or within 5%? It really can be that simple.
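
In code, that boundary can be a single explicit check. A hypothetical "within 5%" test might be nothing more than:

    import math

    expected = 100.0   # the published or previously known value (placeholder)
    observed = 97.2    # what your experiment or model actually produced (placeholder)

    # "within 5%" written down as a testable boundary
    assert math.isclose(observed, expected, rel_tol=0.05)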

Know the pros and cons of the approaches you choose

Maybe you're debating about the best way to get data into your system. Your options are things like:

a) scan PDF files and have a 3rd party library parse them into text

b) have someone type in the values by hand

c) pay for access to a 3rd party database

The advantage of using the PDF files is that it's probably the cheapest solution. The disadvantage is that the parsing probably won't be perfect, and will require some additional validation, probably by a person, possibly by a person writing code to check the results.

Having someone type in the values by hand may be more accurate, but it depends on the person. You'll probably still need a second person, or some other method, to check for typos.

Paying for access to a 3rd party database might be the most expensive, but it also might be the most scalable long-term.

It also depends on whether the data is going to be used as samples (for experimentation, where there might not be a 'right answer'), vs. as reference data (where you need the data to be as stable and accurate as possible).

Design an experiment that will let you see things you didn't know to look for

In the example above, options (b) and (c) might provide additional insights that (a) won't give you. Computers only know to look for what you tell them, but people tend to notice everything. Databases often contain additional data that you didn't think you would need, but which might be interesting to you and relevant to your project. Choosing your data sources and process of collection can provide additional insights that you might otherwise miss out on.

Sample selection

Choose samples that represent:

Positive controls: known responders. Examples that reproducibly support your original hypothesis.

Negative controls: known non-responders. Examples that reproducibly do not fit in your target category. They should be examples of things that cannot respond for different reasons. These are perhaps the most important to get right, especially in multivariate systems. If you're not sure what to use, they can be "leave-one-out", if the thing you're leaving out is critical to the event you're observing.

For example, if you were doing a PCR to detect DNA in a sample of water, your negative controls include:

  1. a sample of a different piece of DNA in water
  2. a sample of clean water
  3. a sample that lacks the polymerase enzyme for the detection reaction
  4. a sample that lacks the oligonucleotide primers specific for your target DNA

Edge cases: examples of things that you know are hard to identify

To use the PCR example again, this might include a sample that you know contains your target DNA, but with a rearrangement that destroys half of it.

Edge cases are critical for determining the sensitivity and selectivity of your approach. It's also essential to set your metrics based on whether you want to include or filter out certain types of edge cases.

Main samples: examples of what you expect to occur most commonly, which contain your target category and which can be sorted or otherwise identified (if only you can get the right methods and models in place!). Usually this is a mixed population, and you have some hypotheses about what variables might play a role in segmentation, even if your hypothesis is just "there is more than one sub-population in this group." (Hint: if you look carefully, there is usually more than one sub-population in a group)

Sample size

Choose samples that are big enough to be representative. For edge cases, sometimes one is enough. For positive and negative controls, you usually need a few, to check for variability within the populations, and also to compare against your main sample. But it all depends on the distribution of your system. If it's not a normal distribution, or your controls are Gaussian but your main samples are not, you're going to need a bigger sample to train your model, if you want it to perform well in real tests.

It's also worth thinking about how your test population and controls, which are usually static, will represent the real thing, which might be streaming or otherwise changing over time. Are there known variables, like season, that might require you to periodically update your test samples? Your model can't account for what it hasn't seen.

Metrics to evaluate whether it worked or not

Before you begin any experiment, think about how you're going to evaluate what you observe. If you have a hypothesis, your metric for success might be whether your data are consistent with your hypothesis. If you don't, maybe you have a threshold for what you'd consider a believable observation. How are you going to measure that? Is it going to affect the sample sizes you choose for your test populations (probably)?

Get input

Ask other people to discuss the models you're considering. Look at what other people have done in similar situations. Review your work with interested parties, while it's still in progress, so you can make modifications if any of your assumptions are wrong, if new insights become available, or if any of the priorities have changed.

And then when you think you're done, review it again. And again. Get other people to help you look for possible mistakes. Everybody makes mistakes. It doesn't mean you're not working hard or not good at your job. Sometimes it just means you're too close to the problem, or you've been looking at it too long. Make sure to build time into your process so you can take a break, even if it's just to sleep on it for one night, and then look everything over again with fresh eyes.

You might still be wrong

Always remember that you might be wrong. Always re-examine all your assumptions, and all the shortcomings of the approaches you've used. Sometimes this will open up new avenues for investigation, or help you shore up your findings. Sometimes someone else will come to you with evidence that you've made a mistake. There will be bugs in your code. That's ok. Just take a deep breath and see what's changed with this new information.

Ultimately, all you can do is your best, and then you have to let it go. And you have to be ok with that. If you're not ok with knowing that you're human and imperfect, you'll be paralyzed with fear and unable to do anything that hasn't already been done before. That's ok, it just means doing this kind of work is probably not the right path for you.

Test-driven data pipelining

When to test, and why:

• Write a test for every method.

• Write a test any time you find a bug! Then make sure the test passes after you fix the bug.

• Think of tests as showing how your code should be used, and write them accordingly. The next person who's going to edit your code, or even just use your code, should be able to refer to your tests to see what's happening.

• Think of tests as the way you'll make sure things don't break later when you have to refactor.

• It's perfectly okay to write the name of the test first, and then figure out how to write the rest later. In fact, sometimes it helps to write the test so that you know it will fail, and then use that to write better code. That's test-driven development.

    def test_whether_examples_return_likes(self):
        pass 

Give your test a MEANINGFUL name.

The name should say exactly what the expected behavior is.

    def test_savings_meets_minimum_requirement(self):
        savings_pct = calculate_savings()
        self.assertGreater(savings_pct, 5)

Tests should be independent of each other.

If you're thinking about writing a test, first check whether a similar test already exists. Don't duplicate existing tests, because that can create confusion when looking at how many tests fail (creating the impression that things are more broken than they actually are).

What to Test

Start with user stories. What do they expect to see? Test for that.

    def test_user_visits_landing_page(self):
        self.assertIn(title, landing_page)

• Confirm expected data types

    def test_answer_dict_creation(self):
        named = answer_dict()
        self.assertTrue(isinstance(named, dict))

• Confirm expected attributes on objects

    def test_profile1_has_no_names(self):
        self.assertEqual(len(profile1.names), 0)

    def test_profile2_has_three_names(self):
        self.assertEqual(len(profile2.names), 3)

Features you haven't finished writing yet (Test-Driven Development)

    def test_user_horoscope_is_accurate(self):
        prediction = None
        self.assertEqual(reality, prediction)

It's okay (and faster) to group assertions together in one test if they all refer to the same object:

    for name, lookup in get_answers(question):
        self.assertTrue(isinstance(name, str))
        self.assertTrue(isinstance(lookup, dict))
        self.assertTrue(isinstance(lookup[name], pd.DataFrame))
        self.assertIn('answer_name', lookup[name].columns)

Test pipelines sequentially and swap in positive controls for all variables except the one you're testing. Then in the final tests, test the whole pipeline and use only test data. You might want to do this for more complicated features. It ends up looking something like this:

    def test_query_returns_objects(self):
        actual_obj_list = query()
        self.assertEqual(set(expected_obj_list), set(actual_obj_list))

    def test_make_dataframes_from_query_objects(self):
        expected_obj_list = query()
        query_obj_dataframe = pd.DataFrame.from_records(expected_obj_list.values())
        self.assertTrue(isinstance(query_obj_dataframe, pd.DataFrame))

    def test_query_returns_objects_and_makes_dataframes(self):
        actual_obj_list = query()
        query_obj_dataframe = pd.DataFrame.from_records(actual_obj_list.values())
        self.assertTrue(isinstance(query_obj_dataframe, pd.DataFrame))
        self.assertIn('total success', query_obj_dataframe.columns)

Write tests for how to handle expected obstacles and failures

Write tests for things like "test_data_cleaning" and "test_handle_missing_values". Don't expect to rely on inserting a debugger every time something breaks. If the same piece of code breaks more than once, you've already wasted time by not writing a test.
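
For example (handle_missing_values is a hypothetical helper name; the point is that the expected cleanup behavior gets pinned down in a test):

    def test_handle_missing_values(self):
        raw = pd.DataFrame({'score': [1.0, None, 3.0]})
        cleaned = handle_missing_values(raw)   # hypothetical cleaning step
        # in this sketch, no missing scores should survive the cleaning step
        self.assertFalse(cleaned['score'].isnull().any())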

Don't

• Don't test too many things in one test, unless you have sufficient coverage of all the supporting parts. The whole point of unittests is to help isolate the causes of problems, particularly when you change things later, i.e. to help speed up refactoring and adding new features.

• Don't write useless tests. If the test fails and tells you nothing about why it failed, you did it wrong.

• It's generally considered poor form to put assertions in main code. Assertions are best used in testing. If you want to check for something in the code, use an if statement. However, if a condition is expected to only occur if something has gone horribly wrong, you should raise a custom exception. This can be tremendously helpful for debugging, for example, if database queries are failing.
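
For example, a custom exception for a query that should never come back empty (EmptyQueryError, get_profiles, and query are made up for illustration):

    class EmptyQueryError(Exception):
        """Raised when a query that should never be empty comes back empty."""

    def get_profiles():
        rows = query()   # hypothetical query helper
        if not rows:
            raise EmptyQueryError("profile query returned no rows - check the database connection")
        return rows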

Don't do this:

    def assertNoErrors(self, resp, msg=''):
        return self.assertEqual(len(resp['messages']['error']), 0, msg)

Better:

    def assertNoErrors(self, resp, msg=''):
        try:
            self.assertEqual(len(resp['messages']['error']), 0, msg)
        except AssertionError:
            # re-raise with the actual error messages, not just the count mismatch
            raise AssertionError(resp['messages']['error'])

Running tests

  1. Run tests locally. Whether you're running plain unittest tests, Django tests, or some other library (nose, pytest, etc.), you can run single tests, whole test folders, or the entire test suite.
  2. Set up and run automated tests through something like Jenkins. Keep in mind that Jenkins may not be using the same database that you're using locally.
  3. Test manually both locally and remotely.
  4. Get someone else to test manually both specifically (black-box testing) and randomly (smoke-testing).
