A Paper A Day
- Finished Guo et al. (2014)
I admire this piece of research very much. Again, it’s the ideal kind I am striving for: simple, straightforward, easy to understand, and yet impactful.
- Lawrence, P. A. (2007). The mismeasurement of science. Current Biology, 17(15), R583-R585.
Guo, P. J., Kim, J., & Rubin, R. (2014, March). How video production affects student engagement: An empirical study of MOOC videos. In Proceedings of the first ACM conference on Learning@ scale conference (pp. 41-50).
If I am asked to design a course for Coursera, I’d better:
- Segment videos into short chunks (< 6 minutes);
- Have my head recorded. Presentations should be inserted at opportune times or simply be presented with a picture-in-picture view;
- Film in an informal setting where I can make eye contact with the potential audience, just like in an office hour talk;
- If I don’t want my head to be filmed, I’d better use Khan-style tutorials rather than slides;
- Plan my lessons “specifically for an online video format” (p. 10) [Edited on 2020-10-22].
Börner et al. (2018). Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy. Proceedings of the National Academy of Sciences, 115(50), 12630-12637.
Main takeaway: Soft skills are in high demand by the industry.
My issue: I like data viz. However, I feel visualizations in this paper are a little bit too much.
Fei-Fei, L., & Perona, P. (2005, June). A bayesian hierarchical model for learning natural scene categories. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 2, pp. 524-531). IEEE.
It’s all Greek to me.
Larivière, Ni, Gingras, Cronin, & Sugimoto. (2013). Bibliometrics: Global gender disparities in science. Nature News, 504(7479), 211.
Barriers to women in science remain widespread worldwide.
In the most productive countries, papers with women in dominant author positions, i.e., sole author, first author, and last author, are cited less than those with men in the same positions;
South America and Eastern Europe had greater gender parity in terms of proportion of authorships.
Disciplines dominated by women all have to do with “care”, for example, nursing; speech, language, and hearing; education.
Natural sciences and humanities are dominated by men. Social sciences had a higher proportion of female authors.
“Female collaborations are more domestically oriented than are the collaborations of males from the same country” (p. 213)
My issue: How did the authors assign gender to each author? It seems to me that it’s a very difficult job, especially when the names are of a non-Western origin.
Geman, D., & Geman, S. (2016). Opinion: Science in the age of selfies. Proceedings of the National Academy of Sciences, 113(34), 9384-9387.
My thoughts are here.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205.
Major takeaway: Big data research can learn from, and collaborate with, small data research, which offers data that big data does not contain.
I started to think about my selfie studies. Specifically, I looked at 1) whether there are cultural differences between Chinese women’s selfies on China’s Weibo and White women’s selfies on Twitter. For example, is it true that Chinese women focus on their face whereas White women focus on their body in their selfies? Do Chinese women’s selfies show more cuteness?
I also looked at 2) whether there are gender differences between men’s selfies and women’s selfies. For example, do women show more self-touching in selfies?
I used a small-data approach. Although I downloaded over 30,000 images from Twitter and 8,000 images from Weibo, I only selected 200 from each platform for analysis, simply because I didn’t have the manpower to analyze them all.
Talking about big data and small data research, I think I can combine the two here. Human content analysis can offer some insights and then directions for big data research. After all, there are so many things to detect in a selfie: the gender of the person, his or her mood, surroundings, posture, facial expressions, etc. Deep learning algorithms need some directions so that they can give us the analysis we need.
Finished Bollen, Mao, & Zeng. (2011)
- Finished Giles (2012).
Thoughts: Professor Granovetter is right in pointing out that data itself might not help us have a deeper understanding of our society. After all, his seminal paper on “weak ties” is based on theoretical thinking rather than data.
What are most of the data in research papers used for? To test theories. But theories arise from thinking, not data. Data is limited. It’s extremely difficult for most scholars to get high-quality large-scale data. That shouldn’t become a barrier to theoretical advances. Scholars who cannot get access to quality data can focus on theoretical thinking.
- Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of computational science, 2(1), 1-8.
- Finished Sarma & Kay (2020).
“Weakly informed priors” are popular among scholars practicing Bayesian inference. However, scholars might have different interpretations of this concept and different strategies to implement it.
Innovative prior elicitation interfaces can help novice Bayesian practitioners set priors.
- Giles, J. (2012). Making the links. Nature, 488(7412), 448-450.
2020-10-12 [Completed on 2020-10-13]
Sarma & Kay. (2020, April). Prior Setting In Practice: Strategies and rationales used in choosing prior distributions for Bayesian analysis. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-12).
Shen & Williams (2011). Unpacking time online: Connecting internet and massively multiplayer online game use with psychosocial well-being. Communication Research, 38(1), 123-149.
Main Takeaway: The psychological impacts of Internet activities are nuanced.
Finished Centola & Macy. (2007). Main Takeaway:
The strength of weak ties should not be simply generalized to complex contagions, which require affirmation from multiple sources. Therefore, not only the length, but also, and maybe more importantly, the width of the ties influences complex contagions.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American journal of Sociology, 113(3), 702-734.
p. 702 - p. 711
Centola, D. (2010).
Within the unstructured condition, there were more non-obese adopters than obese adopters, in both number and percentage;
Across conditions: homophily boosted adoption among both the obese (P < 0.01) and the non-obese people (P < 0.05), using the Mann-Whitney U test.
We can see that homophily had a significant effect on adoption of healthy behaviors. However, is it because obese people are more likely to be exposed to the behavior, or because those who are exposed are more likely to adopt these behaviors in a homophilous group?
It turns out that within both conditions, the relative percentage of the obese and the non-obese did not differ significantly.
Across conditions: homophily boosted both the number and the fraction of the obese who were exposed to the behavior (P < 0.05), using the Mann-Whitney U test. This happened even though obese people initially had greater exposure in the unstructured networks.
Did homophily affect the adoption rate among those exposed? The effect was significant among the exposed obese people (P < 0.01), using the Mann-Whitney U test, but not among the exposed non-obese individuals.
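For readers who want to see what a Mann-Whitney U comparison across two conditions looks like, here is a minimal pure-Python sketch. The adoption fractions below are hypothetical numbers invented for illustration, not data from the paper, and the p-value uses the normal approximation without tie or continuity corrections, which is rough for samples this small.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_a, u_b, p

# Hypothetical adoption fractions, one value per independent network (made up).
homophilous = [0.42, 0.51, 0.47, 0.55, 0.49]
unstructured = [0.31, 0.28, 0.36, 0.33, 0.30]

u1, u2, p = mann_whitney_u(homophilous, unstructured)
print(u1, u2, round(p, 3))
```

The test only uses ranks, so it makes no normality assumption, which is presumably why it suits a handful of network-level observations. With real data of this size, an exact version of the test would be more appropriate than the approximation used here.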
I like this study: simple, and impactful.
- Finished Eubank et al. (2004)
Time of withdrawal to the home is by far the most important factor (in a disease outbreak in cities), followed by delay in response. This indicates that targeted vaccination is feasible when combined with fast detection. Ironically, the actual strategy used is much less important than either of these factors. – Eubank et al. (2004)
- Centola, D. (2011). An experimental study of homophily in the adoption of health behavior. Science, 334(6060), 1269-1272.
- Within the homophilous condition, a higher percentage of obese people than non-obese people adopted the behavior (P < 0.05).
- Finished Schmälzle et al. (2017).
Social exclusion correlates with increased connectivity in the brain’s mentalizing system;
When excluded, people whose friends are sparsely connected with each other showed increased connectivity within key brain systems.
Overall, social exclusion / inclusion is related to connectivity within one’s brain networks. Also, the density of one’s friendship network has an effect on the connectivity change.
- Eubank et al. (2004). Modelling disease outbreaks in realistic urban social networks. Nature, 429(6988), 180-184.
Schmälzle et al. (2017). Brain connectivity dynamics during social interaction reflect social network structure. Proceedings of the National Academy of Sciences, 114(20), 5153-5158.
p. 5153 - p. 5156
Finished Chambliss. (1989).
Superlative performance is really a confluence of dozens of small skills or activities, each one learned or stumbled upon, which have been carefully drilled into habit and then are fitted together in a synthesized whole. — Chambliss, D. F. (p. 81)
Excellence requires qualitative differentiation.
Those who are more successful are doing different things, rather than more of the same things. Quantitative changes do bring success, but only within the world you are currently in. You cannot go to another world by doing more of what you have been doing. Top performers are better seen as different rather than as better.
Talent is not the reason for excellence.
First of all, factors other than talent predict success more precisely.
Second, you cannot distinguish talent from its effects, i.e., you cannot realize there is talent until someone succeeds.
Third, the amount of talent needed for excellence is surprisingly small.
Excellence is mundane.
- Success is ordinary. Success is simply doing small tasks consistently and correctly.
Note: Below are the notes from 2020-10-04
Motivation is also ordinary. Gold medalists did not think too far ahead. Instead, they focused on the most immediate goals, the so-called “small wins”. For example, Steve Lundquist, who won two gold medals in swimming in the Los Angeles Olympics, set a goal that he would win every single swim in every single practice. Small wins added up to excellence and success.
Don’t take what you do as too important. You should maintain mundanity. If you are going to deliver a commencement speech in front of an audience of thousands, you should know that almost nobody cares about or remembers what you have to say. When you are writing your doctoral thesis, you should also be aware that few people will read what you write.
Finished Bullmore & Sporns. (2009)
Chambliss, D. F. (1989). The mundanity of excellence: An ethnographic report on stratification and Olympic swimmers. Sociological theory, 7(1), 70-86.
Bullmore & Sporns. (2009). p6-p9.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature reviews neuroscience, 10(3), 186-198.
Stivers et al. (2009)
There is a universal pattern for turn-taking. People aim to minimize gap and overlap in conversations.
- Nonanswer responses
- Disconfirmation responses
- Responses without a visible component (e.g., head nods, shrugs, head shakes, blinks, or eyebrow flashes)
Faster: Questions with gaze from the questioner
Stivers et al. (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences, 106(26), 10587-10592.
Liljeros et al. (2001)
For both males and females, the cumulative distribution of the number of partners in the previous 12 months almost perfectly followed a straight line, indicating scale-free power-law characteristics;
For both genders, the cumulative distribution of the total number of sexual partners in the entire lifetime followed a straight line only when $k > 20$.
The network of sexual partners is a scale-free one, meaning that you cannot assume, for example, 90% of the individuals have 3 - 10 partners. This is simply because there is no inherent scale. It’s a crazy world, literally. I cannot believe that there are people who have over 100, even 1000 partners in their lifetime. Isn’t this a crazy world?
Thanks to this paper, I now know that for a power-law distribution to show a straight line, I need to plot the CDF (cumulative distribution function) on log-log axes.
One thing I didn’t understand is how the authors could conclude that “the rich get richer” by simply looking at Figure 2a. I don’t think it is a rigorous remark.
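The point about the cumulative distribution can be checked with a small sketch. It assumes a continuous power law $p(x) \propto x^{-\alpha}$ for $x \ge x_{\min}$, for which the fraction of observations at or above $x$ falls off as $x^{-(\alpha - 1)}$, so the log-log slope of the cumulative curve should be about $-(\alpha - 1)$. The data are evenly spaced quantiles of that distribution (a deterministic stand-in for a random sample), and `ALPHA = 2.5` is an arbitrary illustrative value, not an estimate from the paper.

```python
import math

def empirical_ccdf(values):
    """Return sorted values and, for each, the fraction of observations >= it.

    Assumes all values are distinct (true for the synthetic data below).
    """
    xs = sorted(values)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

def pareto_quantiles(alpha, n, x_min=1.0):
    """Evenly spaced quantiles of a continuous power law with exponent alpha."""
    return [x_min * (1 - (i + 0.5) / n) ** (-1 / (alpha - 1)) for i in range(n)]

ALPHA = 2.5  # illustrative exponent, chosen arbitrarily
xs, ccdf = empirical_ccdf(pareto_quantiles(ALPHA, 2000))
slope = loglog_slope(xs, ccdf)
print(round(slope, 2))  # should be close to -(ALPHA - 1)
```

Working with the cumulative curve avoids binning the sparse tail, which is where a raw histogram of a heavy-tailed sample gets noisy.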
1. Del Vicario et al. (2016).
This piece is a little bit too technical for me, especially the second part that involves modeling. Also, I had difficulty understanding the conceptualization of “homogeneity” and “polarization”.
Major takeaways from this paper:
Information on social media reaches around 20% of the people it will eventually reach within 2 hours, and around 40% within 5 hours. This is true for both science and rumors.
Science news is usually quickly diffused. However, long-lasting interest doesn’t correspond to the size of the interest. This means, even though people keep sharing it, not a lot of people will be interested in it.
Conspiracy rumors diffused slowly, and their cascade size is positively correlated with their lifetime, meaning that the longer a rumor lasts, the more people become interested in it.
2. Liljeros, F., Edling, C. R., Amaral, L. A. N., Stanley, H. E., & Åberg, Y. (2001). The web of human sexual contacts. Nature, 411(6840), 907-908.
This is the kind of study I admire: short, interesting, and impactful.
Del Vicario et al. (2016).
- Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
Among 7 million distinct URLs shared by 10 million Facebook users in the US, 13% were hard news;
Around 20% of a person’s friends had the opposite political affiliation;
Liberals had fewer friends who shared news from the other side;
Controlling for the position of the news feed, it seemed conservatives were more likely to click on cross-cutting content, i.e., news that came from the other side. This result surprised me.
- Del Vicario et al. (2016). The spreading of misinformation online. Proceedings of the National Academy of Sciences, 113(3), 554-559.
Finished Kay et al. (2016).
Helping researchers in different fields set priors might be something worth doing in the future.
Hullman et al. (2017)
Kay, M., Nelson, G. L., & Hekler, E. B. (2016, May). Researcher-centered design of statistics: Why Bayesian statistics better fit the culture and incentives of HCI. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 4521-4532).
Bayesian approaches make knowledge accrual possible without meta-analysis approaches
Even though scholars use effect sizes and confidence intervals, the ultimate goal of looking for small p-values will ruin everything.
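The knowledge-accrual point can be illustrated with a toy conjugate model: the posterior from one study becomes the prior for the next. This is a minimal sketch assuming a Beta prior on a success rate and binomial data; the three "studies" are hypothetical counts made up for illustration, and the paper itself is not limited to this model.

```python
def update_beta(prior, successes, failures):
    """Beta-binomial conjugate update: add observed counts to the Beta parameters."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)

prior = (1, 1)  # flat Beta(1, 1) prior, an illustrative choice

# Hypothetical (successes, failures) from three successive studies.
studies = [(12, 8), (30, 20), (7, 3)]

# Accrue knowledge study by study: each posterior is the next study's prior.
posterior = prior
for successes, failures in studies:
    posterior = update_beta(posterior, successes, failures)

# Analysing all the data at once from the original prior gives the same answer.
pooled = update_beta(
    prior,
    sum(s for s, _ in studies),
    sum(f for _, f in studies),
)

print(posterior, pooled, round(beta_mean(posterior), 3))
```

In this simple setting, chaining the updates and pooling the data agree exactly, which is the sense in which later studies can build on earlier ones without a separate meta-analysis step.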
Hullman, J., Kay, M., Kim, Y. S., & Shrestha, S. (2017). Imagining replications: Graphical prediction & discrete visualizations improve recall & estimation of effect uncertainty. IEEE transactions on visualization and computer graphics, 24(1), 446-456.
Continue from 2nd para. of 3.2 (Evaluations with Users) tomorrow.
Vosoughi et al. (2018)
The work is indeed significant. It compared the spreading of true and false news on Twitter and concluded that the false spread faster, deeper, and farther than the truth. False political news, in particular, is diffused especially broadly and deeply.
- Was it because those who spread the false were more influential or active?
Not really. Those who spread false news had fewer followers, followed fewer people on Twitter, were less likely to be verified, and had been on Twitter for less time.
Was it because false news was more novel and users were more likely to retweet information with more novelty?
- False rumors were indeed more novel than the truth;
- False news was objectively more novel, but did users get it?
- Yes, replies to false news showed greater surprise and disgust, whereas the truth inspired more sadness and joy.
Was it because of selection bias? I mean, the tweets from the six organizations might not be representative of all tweets.
- The authors verified a second sample of tweets, which were labeled by three undergraduate students as true, false, or mixed. Again, the results were the same.
Did false news spread faster, deeper, farther, and more broadly because of bot activities? I mean, was it because bots crazily retweeted and replied to false news?
- Two bot-detection algorithms were applied independently to detect and remove bots before data analysis. Results were the same. This has significant implications: that false news travelled faster and farther not because of bots, but because of humans.
I had several issues:
- Bad data visualization
At first glance, data visualization in this article is good. However, most of the figures used only red and green and therefore are not friendly to color-blind people.
- Content analysis
They should report Krippendorff’s alpha rather than an agreement of 90%, I believe.
- No hypotheses beforehand
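On the content-analysis point, a small sketch shows why raw agreement can look better than a chance-corrected measure. It implements Krippendorff’s alpha for the simplest case only (two coders, nominal categories, no missing codes), and the labels are hypothetical, invented so that raw agreement comes out at 90%; they are not data from the article.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders chose the same category."""
    return sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    n = 2 * len(coder_a)  # total number of codes assigned
    # Each disagreeing unit contributes two ordered mismatched pairs.
    observed = 2 * sum(x != y for x, y in zip(coder_a, coder_b))
    totals = Counter(coder_a) + Counter(coder_b)
    # Sum of n_c * n_k over ordered pairs of distinct categories.
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected

# Hypothetical labels for 20 items: 17 both "false", 1 both "true", 2 disagreements.
coder_a = ["false"] * 17 + ["true"] + ["false", "true"]
coder_b = ["false"] * 17 + ["true"] + ["true", "false"]

agreement = percent_agreement(coder_a, coder_b)
alpha = krippendorff_alpha_nominal(coder_a, coder_b)
print(agreement, round(alpha, 3))
```

Because one category dominates in this made-up example, two coders would agree often by chance alone, so alpha lands well below the 90% raw agreement.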
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151.
Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96-104.
Hilbert, M., & López, P. (2011). The world’s technological capacity to store, communicate, and compute information. Science, 332(6025), 60-65.
González-Bailón, S., Borge-Holthoefer, J., Rivero, A., & Moreno, Y. (2011). The dynamics of protest recruitment through an online network. Scientific reports, 1, 197.
Study goal: Study whether and how social network sites encourage recruitment in social movements.
Why wasn’t it published in Nature or Science: A first look at this paper made me feel that it should have been published in Nature or Science. I believe the authors must have tried. After reading the whole paper, I concluded that a lack of sufficient evidence might have been the reason why it didn’t make it. As the authors mention in their limitations section, there were so many factors other than Twitter that influenced the movement in question, and it was impossible to single them out.
Lazer et al. (2009). Life in the network: the coming age of computational social science. Science, 323(5915), 721.
The potential of computational social science and how to make preparations for its future.
p 1-3. Lazer et al. (2018). The science of fake news. Science, 359(6380), 1094-1096.
Increasing partisan preferences in the US created a context for fake news to attract huge audiences;
We don’t know the exact ratio of fake news against real news, and we don’t know the medium-to-long-run effect of exposure to fake news on people’s attitudes.
Bots on social media are hard to detect. Once a detecting technique is developed, bots will upgrade themselves.
- Encouraging people to use fact checking. However, we are not sure whether this is useful or not, partly due to people’s confirmation bias and desirability bias.
- Internet oligopolies should collaborate with academia to understand how pervasive fake news is. Also, these oligopolies’ power should be contained by, for example, legal systems.
Lazer et al. (2020)
- Computational social science: language, location, movement, networks, images, and video, using statistical models that capture multifarious dependencies.
- Interdisciplinary research is not encouraged enough, especially research that involves cooperation between social and computer scientists, due to unfavorable policies at universities;
- Proprietary data unavailable to researchers.
- Available data is not intended for research and won’t be shared with other researchers, which impedes reproducibility.
- Lack of regulatory guidance from university IRBs about collecting and analyzing sensitive data.
- Collaborate and negotiate with private companies for data;
- Build infrastructures that provide data as well as preserve participants’ privacy;
- Develop new ethical guidelines;
- Reorganize universities so that 1) multi-disciplinary collaboration is professionally or financially rewarded, and 2) ethical research is enforced;
- Researchers make sure that they do public good.
p1. Lazer et al. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060-1062.
Recapping Centola (2010):
- Contribution: An experimental design that ran contrary to previous findings regarding the strength of weak ties.
- Conclusion: networks with local clustering are conducive to behavioral diffusion.
- Method: An experiment with two groups. One group found themselves in a random network, and the other group in a clustered-lattice network. The degree distributions of the two networks are identical.
- Why could it be published in Science: Maybe the first empirical test of two competing hypotheses regarding the effect of network topology on behavior spreading.
- My question: I didn’t see many long ties in the “small-world network” in Figure 1.
- Improvements: I didn’t know all of the statistical tests used in this paper. I know Mann-Whitney U test but I don’t know Kolmogorov-Smirnov. I am wondering whether the study could be conducted using Bayesian statistics.
- p5-p12. Cha et al. (2007, October).
- p1-p4. Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194-1197.
- p1-p4. Cha et al. (2007, October). I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (pp. 1-14).