
I have an experimental study with a list of demographic and related questions, and in order to identify data from participants who were potentially just answering the questions at random (to get through them more quickly, I would assume), I've included two very similar 7-point Likert-scale questions at different points in the survey. My assumption is that, since the questions are reflective, the answers participants give should be at least somewhat similar between the two questions (e.g., it should be very unlikely that a participant answers 7 to one question yet 1 to the other).

I haven't yet collected the data, but I would like to have a method for determining which sets of data are suspicious (and might be considered for exclusion from analysis) based on these control questions. One method might be simply to determine where the data fall on a Gaussian distribution. However, I think the limited discriminating power of a 7-point scale would make this an improper test. My other idea was to do a cluster analysis on the data, looking for five groups: three along the line of correlation (between the questions), and two to capture unusually high/low and low/high pairs of values. I thought this could provide better suggestions for which data sets might be unusual, since it wouldn't rely on somewhat arbitrary comparisons; it would only use the data given.
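
To make this concrete, here is a rough sketch of both ideas in Python (the column names q1 and q2, the discrepancy cut-off, and the random example data are all placeholders for illustration; nothing here comes from the actual study):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data frame with the two 7-point control items as columns q1 and q2
rng = np.random.default_rng(0)
df = pd.DataFrame({"q1": rng.integers(1, 8, 100),
                   "q2": rng.integers(1, 8, 100)})

# Idea 1: flag large discrepancies between the two (reflective) items.
# The cut-off of 4 points is arbitrary and only for illustration.
df["discrepancy"] = (df["q1"] - df["q2"]).abs()
df["flag_discrepancy"] = df["discrepancy"] >= 4

# Idea 2: five-cluster k-means on the two items; clusters whose centres sit far
# off the diagonal (high/low or low/high) would be the suspicious ones.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(df[["q1", "q2"]])
df["cluster"] = km.labels_

print(km.cluster_centers_)
print(df["flag_discrepancy"].mean())  # proportion of flagged respondents
```

Whether the off-diagonal clusters turn out to be meaningful will presumably depend on how many careless responders there actually are in the sample.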

I'd really appreciate any suggestions for a better method or improvements I could make, as well as any comments toward more "standard" practices in this area, since I'm somewhat new to research.

Jeromy Anglim
Ryan Lang
  • I don't know what your control questions are, but you might want to consider that questions that (to you) have a similar meaning might not appear so similar to your subjects. Also, there might be positioning effects (priming) related to preceding questions. You should test your questionnaire with attentive subjects in a closely monitored setting and see if the control questions actually score equally. If there is even a slight variance in this test of your test, you should be extremely careful how you interpret a larger variance in a situation you do not monitor closely. –  Mar 10 '13 at 16:32

3 Answers


You seem to be concerned with reliability, and more specifically internal reliability. Internal reliability is the degree to which different questions are measuring the same construct. This concept is used often in psychology and is usually measured using Cronbach's alpha. However, it is typically used to measure the reliability of a test, and not the reliability of an individual.
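
For reference, here is a minimal sketch of how Cronbach's alpha is typically computed from a respondents-by-items matrix (Python; the example responses are made up). Note that it yields a single number for the set of items as a whole, not a score per respondent, which is exactly why it doesn't solve your problem:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up responses from 5 people on 3 items
responses = [[4, 5, 4],
             [2, 3, 2],
             [6, 6, 7],
             [3, 3, 4],
             [5, 4, 5]]
print(cronbach_alpha(responses))
```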

As Jeromy Anglim points out, I think it's important to consider the goal here. Using a two-question Likert-scale check is probably not good enough to reliably detect outliers: what if the respondent checked all '4s' on a 7-point Likert scale? Reversing the scale would have no effect.

One alternative approach is to employ an instructional manipulation check (Oppenheimer et al., 2009). The gist of the technique is to trap participants into answering a question in a specific way that they could only have managed by reading the instructions carefully. Here is an example from a survey administered by Facebook:

[Screenshot of the Facebook survey: an "Almost done" section heading with instruction text beneath it, followed by two questions.]

While this technique may throw out a few good participants, it will almost certainly raise the signal-to-noise ratio of your data by only including participants who followed instructions and read questions before answering.

Another tried-and-true technique is to use a computer-administered test and look at reaction times. You may be able to throw out a few responses (or whole participants) by simply looking for response times that are implausibly fast, i.e. well below the mean.
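
If it helps, here is a minimal sketch of that kind of screening (Python; the simulated data, the 600 ms floor, and the two-standard-deviation cut-off are illustrative assumptions, not established thresholds):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical response times in milliseconds: rows = participants, cols = items
rt = rng.lognormal(mean=7.5, sigma=0.4, size=(50, 20))

FLOOR_MS = 600  # illustrative: faster than this is unlikely to reflect reading the item

# Per-response flag: implausibly fast answers
too_fast = rt < FLOOR_MS

# Per-participant flag: median response time far below the sample as a whole
participant_medians = np.median(rt, axis=1)
cutoff = participant_medians.mean() - 2 * participant_medians.std()
suspicious_participants = np.where(participant_medians < cutoff)[0]

print(too_fast.sum(), "implausibly fast responses")
print(suspicious_participants, "participants flagged")
```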

Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45(4), 867-872.

Jeff
  • "Internal reliability tests the degree to which different questions are measuring the same construct" does not seem quite right. You can have 2 underlying dimensions and have high Cronbach's alpha. See: http://psycnet.apa.org/journals/pas/8/4/350/ – RJ- Mar 11 '13 at 05:14
  • @RJ that means Cronbach's alpha might not be measuring internal reliability, not that the definition of internal reliability is wrong. According to the paper you cite, "Internal consistency refers to the interrelatedness of a set of items" which seems in line with what I am saying. – Jeff Mar 11 '13 at 05:47
  • I am taking issue mainly with "measuring the same construct". The paper also points out that "measuring the same construct" is different from "inter-relatedness" which is what Cronbach's alpha is measuring. – RJ- Mar 11 '13 at 05:50
  • Ah, perhaps I should change the wording to "Internal reliability is the degree to which..." and "Cronbach's alpha tests..." I can see how my definition is not in line with what Cronbach's alpha is testing, but still think it's an accurate description of what internal reliability is. – Jeff Mar 11 '13 at 05:58
  • 1
    The FB example is rather problematic. The text under "Almost done" does not visually relate to the following two questions, and the meaning of "almost done" does not signal relevant instructions. I would never in my life read it, and it took me a good minute to understand the nature of this example! This would only work if the instructions where placed between the question heading and the question. –  Mar 11 '13 at 13:45
  • @what That is sort of the point. Many psychology experiments contain blocks of text that explain how to answer the following questions. It's important that you read them to answer correctly. If you skip sections that you don't "expect to have relevant instructions", then, well, you may be skipping relevant instructions. Case in point, the Facebook survey. However, this is not the only example of an IMC; try also reading the Oppenheimer reference or do a Google Scholar search for "instructional manipulation check". – Jeff Mar 11 '13 at 19:36
  • @Jeff I (think I) understand, but from my viewpoint a good survey needs to adhere to basic rules of user interface design. A heading means that something new is coming. That is what we learn from elementary school onwards. You cannot simply use it to connect components. The instruction for a question belongs between the heading for a question and the question itself, or beside the question. You can't just break user expectations that have been built by filling out thousands of online forms and reading books and magazines and expect meaningful results. –  Mar 11 '13 at 20:56
  • I agree that good UI is important. There are many reasons why a participant may not follow instructions, and bad UI is one of them. But from the data analyst's standpoint, it really doesn't matter what the reason is. The IMC detects failure to follow instructions for any reason. – Jeff Mar 11 '13 at 21:35
  • As an aside, in many experiments, instructions will apply to a block of questions. Putting instructions in between a question and its answers is not feasible. I do agree that the heading title itself "Almost done!" is filler, which probably leads a lot of people to skip that section. It should probably say "Instructions". But I don't think the placement of any of the elements is a problem here. In any case, this is a debate for chat or for ux.SE and doesn't really have a bearing on my answer IMHO. – Jeff Mar 11 '13 at 21:39
  • You are right, the idea behind your answer is perfect (and I have upvoted it). The problem is that the question is formatted to appear as a heading of the same semantic level as the section heading ("Almost done"). –  Mar 12 '13 at 16:18
  • (-1) I think this is terrible advice. The question is only tangentially related to reliability. Cronbach's alpha, besides being much less useful than usually thought and often misinterpreted, does not address it at all. Alpha, internal consistency or reliability all come up when building or interpreting a scale and can only be computed over a set of scores. None of this helps to select observations. – Gala May 07 '13 at 18:33
  • The only relevant part is the Facebook trap question but it looks like the cure is worse than the disease here. You risk confusing a great number of sincere respondents (and you certainly can't assume that the “good” respondents you exclude are picked at random so that you are not only reducing power and sample size but also introducing bias) for a benefit that is very doubtful. – Gala May 07 '13 at 18:37
  • Researchers are prone to worry about it, but I have yet to see evidence that satisficing is generally such a big problem (I don't see it in my research, and I have run all sorts of psychological experiments with students, usability tests, long interviews with people of all ages, crowdsourcing studies on Mechanical Turk, Internet surveys in more than a dozen countries, etc.) – Gala May 07 '13 at 18:43
  • @GaëlLaurans I am not suggesting the use of Cronbach's alpha here. I mentioned it because it seems like that's the solution OP was trying to employ, but as I said in my post it is inappropriate here. As for the Facebook example-- don't take it too literally. I'm promoting the general idea behind an IMC, and not advocating for a specific implementation. There are probably more effective examples. – Jeff May 07 '13 at 19:53
  • As for the issue of satisficing, I'm simply answering the OP's question. He is concerned about it, and other researchers have voiced concerns as well. I never claimed that it was a widespread problem, but it appears to be a problem in at least some circumstances. – Jeff May 07 '13 at 19:54

Preventing random responding: An important first step is to think about ways to prevent random responding from occurring in the first place. A few ideas include: administer the survey face to face; have an experimental invigilator present; communicate the importance of the research to participants and the importance of participants taking the research seriously; use financial remuneration.

That said, there are situations where participants do not take a study seriously (for example, by responding randomly). This seems to be a particular issue when collecting data online.

General approach: My overall approach to this is to develop multiple indicators of problematic participation. I'll then assign penalty points to each participant based on the severity of the indicators. Participants with penalty points above a threshold are excluded from analyses.

The choice of what counts as problematic depends on the type of study:

  • If a study is performed in a face to face setting, the experimenter can take notes recording when participants engage in problematic behaviour.
  • In online survey-style studies I record the reaction time for each item. I then see how many items are answered more quickly than the person could conceivably have read and responded to the item. For example, answering a personality test item in less than about 600 or even 800 milliseconds suggests that the participant has not actually read it. I then count up the number of times this occurs and set a cut-off (see the sketch after this list).
  • In performance based tasks, other participant actions may imply distraction or not taking the task seriously. I'll try to develop indicators for this.
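
Here is a rough sketch of how such indicators might be combined into penalty points (Python; the indicator names, weights, and threshold below are arbitrary placeholders for illustration, not values I am recommending):

```python
import pandas as pd

# Hypothetical per-participant indicators (column names are placeholders)
indicators = pd.DataFrame({
    "n_too_fast_items":        [0, 5, 1, 12],  # items answered faster than the RT cut-off
    "failed_catch_items":      [0, 1, 0, 2],   # e.g. agreed that "there are 30 days in February"
    "self_report_not_serious": [0, 0, 0, 1],   # said they did not take the study seriously
})

# Arbitrary illustrative weights reflecting the severity of each indicator
weights = {"n_too_fast_items": 0.5,
           "failed_catch_items": 2.0,
           "self_report_not_serious": 3.0}

indicators["penalty"] = sum(indicators[col] * w for col, w in weights.items())
THRESHOLD = 4.0  # placeholder cut-off
indicators["exclude"] = indicators["penalty"] > THRESHOLD
print(indicators)
```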

Mahalanobis distance is often a useful tool to flag multivariate outliers. You can further inspect the cases with the largest values to think about whether they make sense. There is a bit of an art in deciding which variables to include in the distance calculation. In particular, if you have a mix of positively and negatively worded items, carelessness is often indicated by a lack of movement between the poles of a scale as you move from positively to negatively worded items.
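
A minimal sketch of that flagging step (Python; the simulated data and the chi-square cut-off at the .999 quantile are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical item responses: rows = participants, columns = items
X = rng.integers(1, 8, size=(200, 10)).astype(float)

mean = X.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances

# Under multivariate normality, d2 is roughly chi-square with df = number of items;
# with discrete Likert items this is only an approximation, so treat the cut-off loosely.
cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
flagged = np.where(d2 > cutoff)[0]
print(flagged)
```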

I also often include items at the end of the test asking participants whether they took the experiment seriously.

Discussion in the Literature

Osborne and Blanchard (2010) discuss random responding in the context of multiple-choice tests. They mention the strategy of including items that all participants should answer correctly. To quote:

These can be content that should not be missed (e.g., 2+2=__), behavioral/attitudinal questions (e.g., I weave the fabric for all my clothes), non-sense items (e.g., there are 30 days in February), or targeted multiple-choice test items [e.g., “How do you spell ‘forensics’?” (a) fornsis, (b) forensics, (c) phorensicks, (d) forensix].
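
As a minimal sketch, failures on such catch items can be scored against an answer key like this (Python; the item names, key, and responses are invented in the spirit of the quoted examples):

```python
# Hypothetical answer key for catch items of the kind quoted above
catch_key = {"two_plus_two": "4",
             "days_in_february": "disagree",   # "there are 30 days in February"
             "spell_forensics": "forensics"}

# One participant's hypothetical responses
responses = {"two_plus_two": "4",
             "days_in_february": "agree",
             "spell_forensics": "forensix"}

n_failed = sum(responses[item] != correct for item, correct in catch_key.items())
print(f"failed {n_failed} of {len(catch_key)} catch items")
```

The count of failed catch items can then feed into the penalty-point scoring described above.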

References

Osborne, J. W., & Blanchard, M. R. (2010). Random responding from participants is a threat to the validity of social science research results. Frontiers in Psychology, 1, 220.

Jeromy Anglim
  • For surveys, how do you use an "invigilator" or "take notes recording when participants engage in problematic behaviour," without violating the anonymity of the participant? – Ryan Lang Mar 10 '13 at 08:51
  • @RyanLang Anonymity is preserved as long as there is no identifying information attached to the data. You may well note information about the subject, such as uncommon behavior, to make your data more meaningful. E.g., taking note that a subject appeared intoxicated might help explain their slow reaction times and better help you decide whether to exclude the data. Consider that usually the data is not collected by the same person who evaluates it, and both might be different from the person designing the study. I would go so far as to say that it is a must to note down anything uncommon about a participant. –  Mar 10 '13 at 16:39
  • What you must usually keep separate from the data are names, addresses, birth dates etc., that are more or less unique to a person. You may even collect these within your data, if they are necessary for your research, but you have to be extremely careful with this information and delete it as soon as it is no longer used. Usually your ethics commission will decide if they allow the collection of this information within your data. (This is German law. The law of other countries will most certainly be different.) –  Mar 10 '13 at 16:43
  • @Jeromy Managing the attitude of subjects towards test taking is an important part of test design. Good practices are: (1) creating interest in the subjects by providing an engaging explanation ("story") and, if possible, a relevant outcome (e.g. display or discuss results that they would like to know); (2) be friendly (this can and needs to be done in online surveys, too); (3) create short tests that don't tire or bore your subjects; (4) make your test visually appealing and easy to "parse"; (5) ask your grandmother if she understands your questions; (6) don't pay for participation. –  Mar 10 '13 at 16:54

This is not directly an answer to your question but, in line with my comments to another answer, my main advice would be “don't worry about it”.

Jeromy Anglim's tips are all good but I am still unconvinced that this is an important issue for most people. Since you are new to research, there are probably dozens of other things you should worry about.

Furthermore, if you do see evidence that there is a problem (extremely short response times, contradictory answers, a large number of respondents providing absurd answers to open-ended questions), I would argue that you should first step back and ask yourself whether what you are asking is reasonable (Does the task make sense? Can people be expected to have an opinion about the topic you are investigating? Are you demanding too much effort?) rather than trying to sort out “bad” respondents.

If you really want to dig into the subject and look up some literature, another name for this phenomenon is “satisficing”. “Response set” is a related idea that might be of interest.

Gala