Understanding the CC License Selection Behavior of Flickr Users

David Wiley
The Center for Open and Sustainable Learning
Department of Instructional Technology
Utah State University
david[dot]wiley[at]usu[dot]edu

Creative Commons

As a matter of law, all creative works produced in the US are copyrighted as soon as they are created. (And while I'm not an expert on international law, since the US does such a great job evangelizing its copyright policies, this is increasingly the case everywhere you go.) Because the law works in this manner, an individual who wishes to provide access to their creative works under any arrangement other than "All Rights Reserved" must use a legal mechanism to express this alternative arrangement. The Creative Commons (CC) project creates and distributes licenses that allow creators to easily express a variety of "Some Rights Reserved" arrangements to consumers. Given the availability of this variety of licenses, the description and explanation of selection behavior becomes an important component of understanding the open content community.

In explaining the license selection process the Creative Commons website states:

With a Creative Commons license, you keep your copyright but allow people to copy and distribute your work provided they give you credit -- and only on the conditions you specify here.
Each of the six licenses is a different configuration of four conditions: Attribution (By), NonCommercial (NC), NoDerivs (ND), and ShareAlike (SA). The six standard CC licenses combine these conditions as follows: By, By-ND, By-NC-ND, By-NC, By-NC-SA, and By-SA.

The WiSH

Once so many configurations of conditions - or licenses - are available, the description and explanation of selection behavior becomes an important component of understanding the open content community. In predicting selection behavior, we can begin from the assumption in law that all authors and artists want to reserve all rights, and broaden this view to encompass a spectrum of people and a spectrum of rights reserved. We may make the qualitative prediction that there will be extremely few creators who are willing to give away all rights to their materials, a small group of creators willing to give away most of the rights to their materials, a large group of creators willing to give away only a few of the rights to their materials, and that the largest group of all is not willing to give away any rights. In shorter form, Wiley's Stinginess Hypothesis (WiSH) states:
Proportion of creators choosing the license ∝ Proportion of rights reserved in the license

One approach to testing WiSH in the context of CC licenses would be to predict that licenses that impose three conditions on use should have more adopters than licenses that impose two conditions on use, which should in turn have more adopters than the license that imposes only one condition. So, in terms of the proportion of selections, we will expect to find the following relationship:

( By-NC-ND + By-NC-SA ) > ( By-NC + By-ND + By-SA ) > By
We will call the comparison of this hypothesis with data the Simple Test of WiSH.

A more sensitive test would allow comparisons between licenses with the same number of conditions. This, however, requires us to distinguish between the relative proportion of rights reserved by the NC, ND, and SA conditions (since every license imposes the By condition we can ignore it here). Because ND reserves the right to all derivative works, and SA allows derivative works under certain conditions, we can assert that ND reserves more rights than SA.

All CC licenses include a provision allowing users to copy, distribute, display, and perform the licensed work. The ND clause prohibits an additional kind of use (specifically, making derivative works), but does not impact this base set of four freedoms. The NC clause, however, restricts the licensee's ability to take advantage of the four basic freedoms to copy, distribute, display, and perform the work in certain situations. For this reason, we may assert that NC reserves more rights than ND.

In summary, we can hypothesize the following relationship among license conditions in terms of the proportion of rights they reserve:

NC > ND > SA
And we may, therefore, hypothesize the following order of license selection based on the degree of rights reserved by each license:
By-NC-ND > By-NC-SA > By-NC > By-ND > By-SA > By
We will call the test of this hypothesis with data the Sensitive Test of WiSH.
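The hypothesized order can be reproduced mechanically. The following is a minimal Python sketch, assuming each license is scored first by its number of conditions and then, as a tie-breaker, by the ranks of its optional conditions under NC > ND > SA. The names `CONDITION_RANK` and `restrictiveness` and the numeric ranks are illustrative assumptions; only the ordering itself comes from the argument above.

```python
# Assumed ordinal ranks for the optional conditions, encoding NC > ND > SA.
# The numbers are arbitrary; only their relative order matters.
CONDITION_RANK = {"NC": 3, "ND": 2, "SA": 1}

LICENSES = ["By", "By-ND", "By-NC-ND", "By-NC", "By-NC-SA", "By-SA"]


def restrictiveness(license_name):
    """Sort key: number of conditions, then optional-condition ranks (highest first)."""
    conditions = license_name.split("-")
    optional = [c for c in conditions if c != "By"]
    ranks = sorted((CONDITION_RANK[c] for c in optional), reverse=True)
    return (len(conditions), ranks)


# Most rights reserved first.
hypothesized_order = sorted(LICENSES, key=restrictiveness, reverse=True)
print(" > ".join(hypothesized_order))
```

Counting conditions first keeps this ordering consistent with the Simple Test; the NC > ND > SA ranks only break ties among licenses with the same number of conditions.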

Flickr

The online photo sharing site Flickr provides an excellent opportunity to test these hypotheses empirically. Flickr is a free service by which users can upload photos for public viewing (there is a paid service as well). Flickr offers its users the range of six standard CC licenses, which allow users to choose to reserve a broad range of rights for their photos. As of August 2, 2005, Flickr contains just under 3.5 million photos licensed under a Creative Commons license, which provides ample data for analysis. Finally, Flickr makes information about the total number of photographs licensed with each configuration of options readily available.

One weakness of choosing Flickr for this research is that Flickr provides no historical view of their data. If the preference ranking of licenses as expressed in user selections varies significantly over time, comparison of data from any point in time with the hypotheses will be meaningless. Fortunately, the Internet Archive archived the Flickr CC page in October of 2004, giving us a second data point with which to compare current data. Table 1 presents the raw selection data for October 11, 2004 and August 2, 2005. Figure 1 shows the same data displayed as pie charts.

Table 1. Raw Data for selection of CC Licenses for photos by Flickr users in October 2004 and August 2005.

License      October 2004    August 2005
By                  7,841        338,543
By-ND               2,329        100,237
By-NC-ND           23,731      1,040,879
By-NC              11,954        502,296
By-NC-SA           29,866      1,212,885
By-SA               5,369        271,212
Total              81,090      3,466,052

Figure 1. Selection of CC Licenses for photos by Flickr users in October 2004 and August 2005 as Pie Charts.
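Since the pie charts display each license's share of the total, those shares can be recomputed directly from Table 1. The sketch below is illustrative only: the dictionary and function names are assumptions, and the counts are copied from the table.

```python
# Photo counts per license, copied from Table 1.
OCT_2004 = {
    "By": 7841, "By-ND": 2329, "By-NC-ND": 23731,
    "By-NC": 11954, "By-NC-SA": 29866, "By-SA": 5369,
}
AUG_2005 = {
    "By": 338543, "By-ND": 100237, "By-NC-ND": 1040879,
    "By-NC": 502296, "By-NC-SA": 1212885, "By-SA": 271212,
}


def percent_shares(counts):
    """Return each license's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares_2004 = percent_shares(OCT_2004)
shares_2005 = percent_shares(AUG_2005)

# One row per license: share in October 2004, then share in August 2005.
for name in OCT_2004:
    print(f"{name:<9} {shares_2004[name]:5.1f}%  {shares_2005[name]:5.1f}%")
```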

There is something absolutely amazing about this data. Even though the collection grows by a factor of 43 over the ten months between these two data points, selection behavior seems to be almost entirely stable. A Pearson correlation of the 10/04 data and the 08/05 data gives a result of 0.997.
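Both figures in this paragraph can be checked against Table 1. The following is a small sketch, assuming the correlation is computed over the six per-license counts in each snapshot (the paper does not say exactly how it was calculated); the helper `pearson` is written out by hand so the block has no dependencies beyond the standard library.

```python
import math

# Counts in the order By, By-ND, By-NC-ND, By-NC, By-NC-SA, By-SA (Table 1).
OCT_2004 = [7841, 2329, 23731, 11954, 29866, 5369]
AUG_2005 = [338543, 100237, 1040879, 502296, 1212885, 271212]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


growth_factor = sum(AUG_2005) / sum(OCT_2004)
r = pearson(OCT_2004, AUG_2005)

print(f"growth factor: {growth_factor:.1f}")
print(f"Pearson r:     {r:.3f}")
```

With only six data points a high correlation is easier to obtain than with a large sample, so the coefficient is best read as a summary of how similar the two distributions look rather than as a formal significance test.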

One might think that this stability is just a result of the same users making the same selections as they license additional photos, but this is almost certainly not the case. The increase in licensed photos is not dissimilar from the increase in Flickr users themselves, a number which grew from around 175,000 users in December 2004 to over 1,000,000 by August 2005.

The Simple Test of WiSH

Confident that the August 2005 data represent a stable set of license preference expressions by Flickr users, we can conduct the Simple Test of WiSH. Table 2 shows selection data clustered by number of conditions in each license.

Table 2. Licenses clustered by number of conditions

Conditions   License(s)                  Photos
3            By-NC-ND + By-NC-SA      2,253,764
2            By-NC + By-ND + By-SA      873,745
1            By                         338,543

As the data show the predicted sequence of "more users adopt licenses which reserve more rights," and as each group of selections is more than twice the size of the group with one fewer condition, we may say that the Simple Test of WiSH seems to support the hypothesis.
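The Simple Test amounts to a grouping and a comparison. Here is a minimal sketch, assuming the number of conditions in a license can be read off its hyphenated short name; the variable names are illustrative, and the counts are the August 2005 figures from Table 1.

```python
from collections import defaultdict

# August 2005 photo counts per license (Table 1).
AUG_2005 = {
    "By": 338543, "By-ND": 100237, "By-NC-ND": 1040879,
    "By-NC": 502296, "By-NC-SA": 1212885, "By-SA": 271212,
}

# Sum photos by the number of conditions in each license name.
photos_by_condition_count = defaultdict(int)
for name, photos in AUG_2005.items():
    photos_by_condition_count[len(name.split("-"))] += photos

three, two, one = (photos_by_condition_count[k] for k in (3, 2, 1))

# WiSH (Simple Test): more conditions should mean more adopters.
simple_test_holds = three > two > one

print(f"3 conditions: {three:,}")
print(f"2 conditions: {two:,}")
print(f"1 condition:  {one:,}")
print(f"ratios: {three / two:.2f}, {two / one:.2f}")
print("Simple Test holds:", simple_test_holds)
```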

The Sensitive Test of WiSH

The Sensitive Test requires us to compare the hypothesized order of license selections and the actual order of those selections. Table 3 presents this information as well as the raw number of actual selections from the August 2005 data.

Table 3. Hypothesized and actual orders of license selections, with raw data

Hypothesized Order   Actual Order   Actual Selections
By-NC-ND             By-NC-SA               1,212,885
By-NC-SA             By-NC-ND               1,040,879
By-NC                By-NC                    502,296
By-ND                By                       338,543
By-SA                By-SA                    271,212
By                   By-ND                    100,237

Things appear all out of whack here. Only two of the predictions appear close to accurate. First, By-NC is the most popular of the two-condition licenses and the third most selected license overall. Second, By-SA is the fifth most popular license (although it is out of its predicted position with respect to By-ND and By).
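The comparison in Table 3 can be sketched in a few lines of Python. This is an illustration rather than the procedure used to build the table: the actual order is obtained by sorting the August 2005 counts, and a prediction is counted as a match only when a license lands in exactly its hypothesized position.

```python
# Hypothesized order from the Sensitive Test, most selected first.
HYPOTHESIZED = ["By-NC-ND", "By-NC-SA", "By-NC", "By-ND", "By-SA", "By"]

# August 2005 photo counts per license (Table 1).
AUG_2005 = {
    "By": 338543, "By-ND": 100237, "By-NC-ND": 1040879,
    "By-NC": 502296, "By-NC-SA": 1212885, "By-SA": 271212,
}

# Actual order: licenses sorted by number of photos, most selected first.
actual_order = sorted(AUG_2005, key=AUG_2005.get, reverse=True)

# Licenses whose actual position equals their hypothesized position.
matches = [h for h, a in zip(HYPOTHESIZED, actual_order) if h == a]

for position, (h, a) in enumerate(zip(HYPOTHESIZED, actual_order), start=1):
    print(f"{position}. predicted {h:<9} actual {a:<9} {AUG_2005[a]:>9,}")
print("exact matches:", matches)
```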

Discussion of Results

The incredible stability of user preferences for various licenses, even as the user community has expanded at an enormous rate, is both surprising and promising.

While WiSH holds up when licenses are aggregated according to the number of conditions comprising them, there appears to be very little support for WiSH at the grain size of individual licenses.

The most interesting deviation from the WiSH prediction is the popularity of By, which beats out both the two-condition By-ND and By-SA.

Temporarily ignoring By, as all licenses include the By condition, Flickr users are expressing preferences between NC, ND, and SA as:

NC > SA > ND
As opposed to the predicted:
NC > ND > SA
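One way to read this preference ordering out of the data is to look at the two-condition licenses, each of which pairs By with exactly one optional condition. The sketch below takes that approach as an assumption (the paper does not spell out the derivation) and cross-checks it against the two three-condition licenses, which differ only in SA versus ND.

```python
# August 2005 photo counts per license (Table 1).
AUG_2005 = {
    "By": 338543, "By-ND": 100237, "By-NC-ND": 1040879,
    "By-NC": 502296, "By-NC-SA": 1212885, "By-SA": 271212,
}

# Two-condition licenses, keyed by their single optional condition.
two_condition = {
    name.split("-")[1]: photos
    for name, photos in AUG_2005.items()
    if name.count("-") == 1
}

# Observed preference among optional conditions, most selected first.
condition_order = sorted(two_condition, key=two_condition.get, reverse=True)

# Cross-check: with NC held fixed, is SA chosen more often than ND?
sa_beats_nd_with_nc = AUG_2005["By-NC-SA"] > AUG_2005["By-NC-ND"]

print(" > ".join(condition_order))
print("SA > ND among NC licenses:", sa_beats_nd_with_nc)
```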

What does it all mean? WiSH does a fair job of describing behavior at a high level. At a lower level, a different explanatory framework is necessary. My first impression of the NC > SA > ND ordering in the data is this: the people interested in Creative Commons licenses associate CC and other artifacts of the open source movement with anti-commercial models that are more about sharing than anything else. Further study will be required, but it may be that the best explanation for selection behavior has more to do with sentiments that resonate with the terms "noncommercial" and "sharing" than with complicated theoretical structures regarding the proportion of rights reserved by a given license.

Threats to Validity and Weaknesses of the Study

First, one may suspect that the result of the Simple Test is an artifact of how the licenses are clustered. If users randomly selected licenses for their photos (i.e., if license selection were randomly distributed over the photos) we would expect larger clusters of licenses to have a greater proportion of selections than smaller clusters or than a single license. From this perspective, we would expect the three licenses with two conditions (By-NC + By-ND + By-SA) to have the greatest proportion of selections, while the two licenses with three conditions (By-NC-ND + By-NC-SA) would have the next greatest proportion, followed by the sole license with one condition (By). The data in Table 2 show that this is clearly not the case, as the two three-condition licenses have been selected more than twice as frequently as the three two-condition licenses.
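The random-selection baseline can be made concrete. The sketch below assumes "random" means each of the six licenses is equally likely to be chosen for any photo, so a cluster's expected share is simply the number of licenses it contains divided by six; the observed shares come from Table 2. Variable names are illustrative.

```python
# Number of licenses in each cluster, keyed by number of conditions.
LICENSES_PER_CLUSTER = {3: 2, 2: 3, 1: 1}

# Observed August 2005 photo counts per cluster (Table 2).
OBSERVED_PHOTOS = {3: 2253764, 2: 873745, 1: 338543}

total_licenses = sum(LICENSES_PER_CLUSTER.values())
total_photos = sum(OBSERVED_PHOTOS.values())

# Expected share under uniform random choice among the six licenses.
expected_share = {k: n / total_licenses for k, n in LICENSES_PER_CLUSTER.items()}
# Share actually observed in the data.
observed_share = {k: n / total_photos for k, n in OBSERVED_PHOTOS.items()}

for k in (3, 2, 1):
    print(f"{k} condition(s): expected {expected_share[k]:.3f}, "
          f"observed {observed_share[k]:.3f}")
```

Under this baseline the two-condition cluster would hold half of all photos, whereas in the data it holds roughly a quarter and the three-condition cluster holds nearly two thirds.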

Second, one may suspect that the order in which the license options are presented to users in the Flickr license selection interface influences individuals' selections. In fact, the Flickr selection tool presents the licenses in the following order: By, By-ND, By-NC-ND, By-NC, By-NC-SA, By-SA, as illustrated in Figure 2.

Figure 2. The Flickr license selection interface

This order bears almost no resemblance to the selection order in the data.
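To put a rough number on "almost no resemblance", one option is a Spearman rank correlation between the interface order and the observed selection order. This statistic is not part of the original analysis; it is offered here as an illustrative check, using the standard no-ties formula, with the helper name `spearman_rho` being an assumption of this sketch.

```python
# Order in which the Flickr interface lists the licenses.
INTERFACE_ORDER = ["By", "By-ND", "By-NC-ND", "By-NC", "By-NC-SA", "By-SA"]

# August 2005 photo counts per license (Table 1).
AUG_2005 = {
    "By": 338543, "By-ND": 100237, "By-NC-ND": 1040879,
    "By-NC": 502296, "By-NC-SA": 1212885, "By-SA": 271212,
}

# Observed selection order, most selected first.
actual_order = sorted(AUG_2005, key=AUG_2005.get, reverse=True)


def spearman_rho(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    rank_b = {name: i for i, name in enumerate(order_b)}
    d_squared = sum((i - rank_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(INTERFACE_ORDER, actual_order)
same_position = [a for a, b in zip(INTERFACE_ORDER, actual_order) if a == b]

print(f"Spearman rho: {rho:.3f}")
print("licenses in the same position:", same_position)
```

A value near +1 would suggest users tend to pick licenses in the order they are listed; a value near zero or below suggests the listing order does not explain the selections. With only six items, the coefficient should be treated as descriptive.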

One weakness of this study is that it works from aggregate data without access to individual user decisions. This weakness could be overcome either by crawling all of Flickr or by Flickr simply providing me with data because they're nice guys. Another weakness is the fact that the study looks at selection behavior for a single community (Flickr). If you know of other large collections of CC-licensed materials that make aggregate statistics available OR have any other feedback on this paper, please leave a comment here.

This paper is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.