Introduction

The objective of this project is to analyze a dataset on university faculty perceptions and practices of using Wikipedia as a teaching resource. The data come from a survey sent to part-time and full-time professors at two Spanish universities in 2012-2013: Universitat Oberta de Catalunya (UOC) and Universitat Pompeu Fabra (UPF). First, using PCA, we will look for relationships between the survey items and check whether the survey items cluster according to any of the teachers’ attributes. Many variables have missing values, some systematic and some at random, so we will need a way to remedy them. After this, we will use classification techniques such as logistic regression, LDA, QDA, and/or kNN to predict the “use behavior” of Wikipedia by teachers based on the teachers’ attributes and responses to survey items. Each of these techniques has its own limitations: kNN has a high computational cost, the effect of an individual variable in logistic regression can be difficult to interpret, and LDA/QDA will not perform well if their assumptions are not met.

Data

The data set we will be using pertains to ongoing research on university faculty perceptions and practices of using Wikipedia as a teaching resource. Based on a Technology Acceptance Model, the relationships within the internal and external constructs of the model are analyzed. Both the perception of colleagues’ opinion about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model.

Attribute Information:

AGE: numeric
GENDER: 0=Male; 1=Female
DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics; 6=Social Science
PhD: 0=No; 1=Yes
YEARSEXP (years of university teaching experience): numeric
UNIVERSITY: 1=UOC; 2=UPF
UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
OTHER_POSITION (main job in another university for part-time members): 1=Yes; 2=No
OTHERSTATUS (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
USERWIKI (Wikipedia registered user): 0=No; 1=Yes
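
Because these attributes are stored as integer codes, R will treat them as numbers unless told otherwise. A minimal sketch of attaching the labels listed above, using a small made-up data frame (`toy` and its values are illustrative and not taken from the survey):

```r
# Toy data frame with made-up values; column names and codes follow the list above
toy <- data.frame(
  GENDER = c(0, 1, 1, 0),
  DOMAIN = c(1, 6, NA, 4),
  PhD    = c(1, 0, 1, 1)
)

# Convert the integer codes to labelled factors
toy$GENDER <- factor(toy$GENDER, levels = c(0, 1), labels = c("Male", "Female"))
toy$DOMAIN <- factor(toy$DOMAIN, levels = 1:6,
                     labels = c("Arts & Humanities", "Sciences", "Health Sciences",
                                "Engineering & Architecture", "Law & Politics",
                                "Social Science"))
toy$PhD <- factor(toy$PhD, levels = c(0, 1), labels = c("No", "Yes"))

# summary() now reports counts per category instead of a mean of the codes
summary(toy)
```

Whether to convert depends on the later analysis: PCA needs numeric input, while the classification methods can take factors for the categorical attributes.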

The following survey items are on a Likert scale (1-5), ranging from strongly disagree / never (1) to strongly agree / always (5).

Perceived Usefulness
PU1: The use of Wikipedia makes it easier for students to develop new skills
PU2: The use of Wikipedia improves students’ learning
PU3: Wikipedia is useful for teaching

Perceived Ease of Use
PEU1: Wikipedia is user-friendly
PEU2: It is easy to find in Wikipedia the information you seek
PEU3: It is easy to add or edit information in Wikipedia

Perceived Enjoyment
ENJ1: The use of Wikipedia stimulates curiosity
ENJ2: The use of Wikipedia is entertaining

Quality
QU1: Articles in Wikipedia are reliable
QU2: Articles in Wikipedia are updated
QU3: Articles in Wikipedia are comprehensive
QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources
QU5: I trust in the editing system of Wikipedia

Visibility
VIS1: Wikipedia improves visibility of students’ work
VIS2: It is easy to have a record of the contributions made in Wikipedia
VIS3: I cite Wikipedia in my academic papers

Social Image
IM1: The use of Wikipedia is well considered among colleagues
IM2: In academia, sharing open educational resources is appreciated
IM3: My colleagues use Wikipedia

Sharing attitude
SA1: It is important to share academic content in open platforms
SA2: It is important to publish research results in other media than academic journals or books
SA3: It is important that students become familiar with online collaborative environments

Use behaviour
USE1: I use Wikipedia to develop my teaching materials
USE2: I use Wikipedia as a platform to develop educational activities with students
USE3: I recommend my students to use Wikipedia
USE4: I recommend my colleagues to use Wikipedia
USE5: I agree my students use Wikipedia in my courses

Profile 2.0
PF1: I contribute to blogs
PF2: I actively participate in social networks
PF3: I publish academic content in open platforms

Job relevance
JR1: My university promotes the use of open collaborative environments in the Internet
JR2: My university considers the use of open collaborative environments in the Internet as a teaching merit

Behavioral intention
BI1: In the future I will recommend the use of Wikipedia to my colleagues and students
BI2: In the future I will use Wikipedia in my teaching activity

Incentives
INC1: To design educational activities using Wikipedia, it would be helpful: a best practices guide
INC2: To design educational activities using Wikipedia, it would be helpful: getting instruction from a colleague
INC3: To design educational activities using Wikipedia, it would be helpful: getting specific training
INC4: To design educational activities using Wikipedia, it would be helpful: greater institutional recognition

Experience
EXP1: I consult Wikipedia for issues related to my field of expertise
EXP2: I consult Wikipedia for other academic related issues
EXP3: I consult Wikipedia for personal issues
EXP4: I contribute to Wikipedia (editions, revisions, articles improvement…)
EXP5: I use wikis to work with my students

Data Cleaning

We will first take a look at our data and see if there are any unusual patterns.

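# Set the working directory to the folder holding the data file (path is machine-specific)
setwd("C:/Users/crzys/Documents")
# Fields are separated by ";" and missing responses are coded as "?"
wiki = read.csv("wiki4HE.csv", header=TRUE, sep=";", na.strings="?")
summary(wiki)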
##       AGE            GENDER          DOMAIN           PhD        
##  Min.   :23.00   Min.   :0.000   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:36.00   1st Qu.:0.000   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :42.00   Median :0.000   Median :5.000   Median :0.0000  
##  Mean   :42.25   Mean   :0.425   Mean   :4.098   Mean   :0.4644  
##  3rd Qu.:47.00   3rd Qu.:1.000   3rd Qu.:6.000   3rd Qu.:1.0000  
##  Max.   :69.00   Max.   :1.000   Max.   :6.000   Max.   :1.0000  
##                                  NA's   :2                       
##     YEARSEXP       UNIVERSITY     UOC_POSITION   OTHER_POSITION 
##  Min.   : 0.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 5.00   1st Qu.:1.000   1st Qu.:6.000   1st Qu.:1.000  
##  Median :10.00   Median :1.000   Median :6.000   Median :2.000  
##  Mean   :10.87   Mean   :1.124   Mean   :5.406   Mean   :1.589  
##  3rd Qu.:15.00   3rd Qu.:1.000   3rd Qu.:6.000   3rd Qu.:2.000  
##  Max.   :43.00   Max.   :2.000   Max.   :6.000   Max.   :2.000  
##  NA's   :23                      NA's   :113     NA's   :261    
##   OTHERSTATUS       USERWIKI           PU1             PU2      
##  Min.   :1.000   Min.   :0.0000   Min.   :1.000   Min.   :1.00  
##  1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:2.00  
##  Median :4.000   Median :0.0000   Median :3.000   Median :3.00  
##  Mean   :4.209   Mean   :0.1375   Mean   :3.138   Mean   :3.15  
##  3rd Qu.:7.000   3rd Qu.:0.0000   3rd Qu.:4.000   3rd Qu.:4.00  
##  Max.   :7.000   Max.   :1.0000   Max.   :5.000   Max.   :5.00  
##  NA's   :540     NA's   :4        NA's   :7       NA's   :11    
##       PU3            PEU1            PEU2            PEU3      
##  Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.00   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:3.000  
##  Median :3.00   Median :5.000   Median :4.000   Median :3.000  
##  Mean   :3.45   Mean   :4.356   Mean   :4.046   Mean   :3.384  
##  3rd Qu.:4.00   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:4.000  
##  Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :5      NA's   :4       NA's   :14      NA's   :97     
##       ENJ1            ENJ2            Qu1             Qu2       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
##  Median :4.000   Median :4.000   Median :3.000   Median :3.000  
##  Mean   :3.795   Mean   :3.821   Mean   :3.195   Mean   :3.422  
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :7       NA's   :17      NA's   :7       NA's   :10     
##       Qu3             Qu4             Qu5             Vis1      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :3.000   Median :3.000   Median :3.000   Median :3.000  
##  Mean   :2.981   Mean   :3.238   Mean   :3.042   Mean   :2.945  
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :15      NA's   :22      NA's   :29      NA's   :72     
##       Vis2            Vis3            Im1             Im2       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:3.000  
##  Median :3.000   Median :2.000   Median :2.000   Median :3.000  
##  Mean   :3.069   Mean   :2.027   Mean   :2.478   Mean   :3.295  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :117     NA's   :8       NA's   :22      NA's   :20     
##       Im3             SA1             SA2            SA3       
##  Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:4.000   1st Qu.:4.00   1st Qu.:4.000  
##  Median :3.000   Median :4.000   Median :4.00   Median :5.000  
##  Mean   :2.888   Mean   :4.191   Mean   :4.13   Mean   :4.384  
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:5.00   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.00   Max.   :5.000  
##  NA's   :57      NA's   :11      NA's   :12     NA's   :11     
##       Use1            Use2            Use3            Use4      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :3.000   Median :3.000  
##  Mean   :2.116   Mean   :1.831   Mean   :2.662   Mean   :2.554  
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :14      NA's   :17      NA's   :9       NA's   :23     
##       Use5            Pf1             Pf2             Pf3       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :3.000   Median :2.000   Median :3.000   Median :2.000  
##  Mean   :3.305   Mean   :2.274   Mean   :2.861   Mean   :2.551  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :15      NA's   :11      NA's   :6       NA's   :14     
##       JR1             JR2             BI1             BI2      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.00  
##  Median :4.000   Median :3.000   Median :3.000   Median :3.00  
##  Mean   :3.699   Mean   :3.108   Mean   :2.952   Mean   :2.99  
##  3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.00  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.00  
##  NA's   :27      NA's   :53      NA's   :32      NA's   :43    
##       Inc1            Inc2            Inc3            Inc4     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.00  
##  Median :4.000   Median :4.000   Median :3.000   Median :4.00  
##  Mean   :3.746   Mean   :3.461   Mean   :3.442   Mean   :3.49  
##  3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.00  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.00  
##  NA's   :35      NA's   :35      NA's   :37      NA's   :42    
##       Exp1            Exp2            Exp3            Exp4      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:1.000  
##  Median :3.000   Median :4.000   Median :4.000   Median :1.000  
##  Mean   :3.001   Mean   :3.492   Mean   :3.651   Mean   :1.588  
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :13      NA's   :11      NA's   :13      NA's   :14     
##       Exp5      
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :2.000  
##  Mean   :2.487  
##  3rd Qu.:4.000  
##  Max.   :5.000  
##  NA's   :13

Many of the variables have missing values, and “OTHERSTATUS” has the most (540). Missing data are common in almost all research and can have a significant effect on the conclusions that can be drawn from the data, so we will need to remedy the missing values. We cannot simply omit every observation with a missing value: that would let possibly uninteresting, incomplete variables dictate which observations stay in the sample, and could lead to inaccurate conclusions.
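
To see why dropping every incomplete observation is costly, it helps to count how many rows would survive. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical; on the real data the same calls would be applied to `wiki`):

```r
# Made-up data: one sparse column (OTHERSTATUS) and two mostly complete items
toy <- data.frame(
  OTHERSTATUS = c(NA, NA, 2, NA, NA, 6),
  PU1         = c(3, 4, NA, 2, 5, 3),
  PU2         = c(3, 3, 4, 2, 4, 3)
)

# Number of missing values per variable
na_counts <- colSums(is.na(toy))
na_counts

# Rows that survive listwise deletion (no missing value in any column)
n_complete <- sum(complete.cases(toy))
n_complete

# Rows that survive if the sparse column is ignored
n_complete_without <- sum(complete.cases(toy[, c("PU1", "PU2")]))
n_complete_without
```

In this toy example a single sparse column decides almost entirely which rows are kept, which is the concern raised above.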

We can visualize the missing values and create a plot of the observations and variables.

library(visdat)
vis_dat(wiki)

vis_miss(wiki)

Missing Completely at Random (MCAR): There’s no relationship between whether a data point is missing and any values in the data set, missing or observed.

Missing at Random (MAR): The propensity for a data point to be missing is not related to the missing value itself, but it is related to some of the observed data. Conditional on the observed data, the missing values are a random subset.

Missing Not at Random (MNAR): Data that is neither MAR nor MCAR (i.e. the value of the variable that’s missing is related to the reason it’s missing).
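
The three mechanisms can be illustrated with a small simulation. This is only a sketch on synthetic data: the variables `age` and `score` and the missingness probabilities are assumptions chosen for illustration, not estimates from the survey.

```r
set.seed(1)
n <- 1000
age   <- round(runif(n, 25, 65))          # fully observed covariate
score <- sample(1:5, n, replace = TRUE)   # Likert-style response

# MCAR: every value has the same 20% chance of being missing
mcar <- ifelse(runif(n) < 0.2, NA, score)

# MAR: the chance of being missing depends only on the observed covariate (age)
p_mar <- ifelse(age > 50, 0.4, 0.1)
mar <- ifelse(runif(n) < p_mar, NA, score)

# MNAR: the chance of being missing depends on the value that would have been recorded
p_mnar <- ifelse(score <= 2, 0.4, 0.1)
mnar <- ifelse(runif(n) < p_mnar, NA, score)

# Missingness rate by age group (FALSE = 50 or younger, TRUE = older than 50)
rate_mcar <- tapply(is.na(mcar), age > 50, mean)
rate_mar  <- tapply(is.na(mar),  age > 50, mean)
rate_mcar
rate_mar

# Under MNAR, low scores go missing more often, so the observed mean is pushed upward
mean(score)
mean(mnar, na.rm = TRUE)
```

Under MAR the difference between age groups can be detected from the observed data, whereas the MNAR bias cannot be seen without knowing the values that were never recorded.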

From the above definitions, it appears we have MNAR data, or in other words systematically missing data. This is evident from the above graphs, where 59.15% of the values of the “OTHERSTATUS” variable are missing. This variable describes the “work as part-time in another university and UPF members”. The question is poorly designed: it is not applicable to most of the faculty members, so they did not respond. As a result, we will remove this variable from the data to help get rid of the systematic missing data.

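# Drop OTHERSTATUS (column 9) by name, so re-running the line cannot remove a different column
wiki = wiki[, names(wiki) != "OTHERSTATUS"]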
vis_dat(wiki)

vis_miss(wiki)
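
The plots can be complemented with a numeric check. The sketch below uses a made-up data frame in which, by construction, OTHERSTATUS is only answered when OTHER_POSITION is 1. That relationship is an assumption of the toy example and not a finding about the survey; on the real data the same calls would be applied to `wiki`.

```r
# Made-up data; in this toy example OTHERSTATUS is only answered when OTHER_POSITION is 1
toy <- data.frame(
  OTHER_POSITION = c(1, 2, 2, 1, 2, NA, 2, 1),
  OTHERSTATUS    = c(3, NA, NA, 6, NA, NA, NA, 2),
  PU1            = c(4, 3, NA, 5, 2, 3, 4, 4)
)

# Percentage of missing values in each column
pct_missing <- round(100 * colMeans(is.na(toy)), 2)
pct_missing

# Cross-tabulate missingness of OTHERSTATUS against another observed variable;
# a table concentrated in particular cells suggests the gaps are systematic
miss_table <- table(OTHER_POSITION = toy$OTHER_POSITION,
                    status_missing = is.na(toy$OTHERSTATUS),
                    useNA = "ifany")
miss_table

# Drop the column by name and compare the overall share of missing cells
toy_reduced <- toy[, names(toy) != "OTHERSTATUS"]
overall_before <- mean(is.na(toy))
overall_after  <- mean(is.na(toy_reduced))
c(before = overall_before, after = overall_after)
```

Note that a per-column percentage such as `pct_missing` measures the share of observations missing within that column, which is a different quantity from that column's share of all missing cells in the data set.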