The objective of this project is to analyze a dataset on university faculty perceptions and practices of using Wikipedia as a teaching resource. The data come from a survey sent to part-time and full-time professors at two Spanish universities in 2012-2013: Universitat Oberta de Catalunya (UOC) and Universitat Pompeu Fabra (UPF). First, using PCA, we will identify any relationships between the survey items, as well as whether the responses cluster according to any of the teachers’ attributes. Many variables have missing values, some systematic and some at random, so we will need a way to remedy them. After this, we will use classification techniques such as logistic regression, LDA, QDA, and/or kNN to predict teachers’ “use behavior” of Wikipedia from their attributes and their responses to the survey items. Each of these techniques has its own limitations: kNN has a high computational cost, the effect of an individual variable in a logistic regression can be difficult to interpret, and LDA/QDA will not perform well if their assumptions are not met.
The data set we will be using pertains to ongoing research on university faculty perceptions and practices of using Wikipedia as a teaching resource. The original study is based on a Technology Acceptance Model and analyzes the relationships between the internal and external constructs of that model. Both the perception of colleagues’ opinion about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model.
AGE: numeric
GENDER: 0=Male; 1=Female
DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics; 6=Social Science
PhD: 0=No; 1=Yes
YEARSEXP (years of university teaching experience): numeric
UNIVERSITY: 1=UOC; 2=UPF
UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
OTHER_POSITION (main job in another university for part-time members): 1=Yes; 2=No
OTHERSTATUS (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
USERWIKI (Wikipedia registered user): 0=No; 1=Yes
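The codings listed above are categories stored as numbers, so R will treat them as numeric unless told otherwise. A minimal sketch of attaching the documented labels as factor levels is shown here; the data frame `demo` and its values are made up for illustration and are not part of the survey data.

```r
# Toy data frame standing in for a few demographic columns (values are made up).
demo <- data.frame(
  GENDER = c(0, 1, 1, 0),
  DOMAIN = c(1, 6, 4, 2),
  PhD    = c(1, 0, 1, 1)
)

# Attach the documented labels so R treats the codes as categories, not numbers.
demo$GENDER <- factor(demo$GENDER, levels = 0:1, labels = c("Male", "Female"))
demo$DOMAIN <- factor(
  demo$DOMAIN,
  levels = 1:6,
  labels = c("Arts & Humanities", "Sciences", "Health Sciences",
             "Engineering & Architecture", "Law & Politics", "Social Science")
)
demo$PhD <- factor(demo$PhD, levels = 0:1, labels = c("No", "Yes"))

# For factors, summary() reports counts per level instead of means and quartiles.
summary(demo)
```

With factors in place, a mean of a variable such as DOMAIN is no longer reported, which avoids reading a numeric average into an unordered category.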
The following survey items are on a 5-point Likert scale, ranging from strongly disagree / never (1) to strongly agree / always (5):
Perceived Usefulness
PU1: The use of Wikipedia makes it easier for students to develop new skills
PU2: The use of Wikipedia improves students’ learning
PU3: Wikipedia is useful for teaching
Perceived Ease of Use
PEU1: Wikipedia is user-friendly
PEU2: It is easy to find in Wikipedia the information you seek
PEU3: It is easy to add or edit information in Wikipedia
Perceived Enjoyment
ENJ1: The use of Wikipedia stimulates curiosity
ENJ2: The use of Wikipedia is entertaining
Quality
QU1: Articles in Wikipedia are reliable
QU2: Articles in Wikipedia are updated
QU3: Articles in Wikipedia are comprehensive
QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources
QU5: I trust in the editing system of Wikipedia
Visibility
VIS1: Wikipedia improves visibility of students’ work
VIS2: It is easy to have a record of the contributions made in Wikipedia
VIS3: I cite Wikipedia in my academic papers
Social Image
IM1: The use of Wikipedia is well considered among colleagues
IM2: In academia, sharing open educational resources is appreciated
IM3: My colleagues use Wikipedia
Sharing attitude
SA1: It is important to share academic content in open platforms
SA2: It is important to publish research results in other media than academic journals or books
SA3: It is important that students become familiar with online collaborative environments
Use behaviour
USE1: I use Wikipedia to develop my teaching materials
USE2: I use Wikipedia as a platform to develop educational activities with students
USE3: I recommend my students to use Wikipedia
USE4: I recommend my colleagues to use Wikipedia
USE5: I agree my students use Wikipedia in my courses
Profile 2.0
PF1: I contribute to blogs
PF2: I actively participate in social networks
PF3: I publish academic content in open platforms
Job relevance
JR1: My university promotes the use of open collaborative environments in the Internet
JR2: My university considers the use of open collaborative environments in the Internet as a teaching merit
Behavioral intention
BI1: In the future I will recommend the use of Wikipedia to my colleagues and students
BI2: In the future I will use Wikipedia in my teaching activity
Incentives
INC1: To design educational activities using Wikipedia, it would be helpful: a best practices guide
INC2: To design educational activities using Wikipedia, it would be helpful: getting instruction from a colleague
INC3: To design educational activities using Wikipedia, it would be helpful: getting specific training
INC4: To design educational activities using Wikipedia, it would be helpful: greater institutional recognition
Experience
EXP1: I consult Wikipedia for issues related to my field of expertise
EXP2: I consult Wikipedia for other academic related issues
EXP3: I consult Wikipedia for personal issues
EXP4: I contribute to Wikipedia (editions, revisions, articles improvement…)
EXP5: I use wikis to work with my students
We will first take a look at our data and see if there are any unusual patterns.
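# Set the working directory to the folder that contains wiki4HE.csv
setwd("C:/Users/crzys/Documents")
# The file is semicolon-separated and codes missing values as "?"
wiki = read.csv("wiki4HE.csv", header=TRUE, sep=";", na.strings="?")
summary(wiki)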
## AGE GENDER DOMAIN PhD
## Min. :23.00 Min. :0.000 Min. :1.000 Min. :0.0000
## 1st Qu.:36.00 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:0.0000
## Median :42.00 Median :0.000 Median :5.000 Median :0.0000
## Mean :42.25 Mean :0.425 Mean :4.098 Mean :0.4644
## 3rd Qu.:47.00 3rd Qu.:1.000 3rd Qu.:6.000 3rd Qu.:1.0000
## Max. :69.00 Max. :1.000 Max. :6.000 Max. :1.0000
## NA's :2
## YEARSEXP UNIVERSITY UOC_POSITION OTHER_POSITION
## Min. : 0.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 5.00 1st Qu.:1.000 1st Qu.:6.000 1st Qu.:1.000
## Median :10.00 Median :1.000 Median :6.000 Median :2.000
## Mean :10.87 Mean :1.124 Mean :5.406 Mean :1.589
## 3rd Qu.:15.00 3rd Qu.:1.000 3rd Qu.:6.000 3rd Qu.:2.000
## Max. :43.00 Max. :2.000 Max. :6.000 Max. :2.000
## NA's :23 NA's :113 NA's :261
## OTHERSTATUS USERWIKI PU1 PU2
## Min. :1.000 Min. :0.0000 Min. :1.000 Min. :1.00
## 1st Qu.:2.000 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:2.00
## Median :4.000 Median :0.0000 Median :3.000 Median :3.00
## Mean :4.209 Mean :0.1375 Mean :3.138 Mean :3.15
## 3rd Qu.:7.000 3rd Qu.:0.0000 3rd Qu.:4.000 3rd Qu.:4.00
## Max. :7.000 Max. :1.0000 Max. :5.000 Max. :5.00
## NA's :540 NA's :4 NA's :7 NA's :11
## PU3 PEU1 PEU2 PEU3
## Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.00 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:3.000
## Median :3.00 Median :5.000 Median :4.000 Median :3.000
## Mean :3.45 Mean :4.356 Mean :4.046 Mean :3.384
## 3rd Qu.:4.00 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.00 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :5 NA's :4 NA's :14 NA's :97
## ENJ1 ENJ2 Qu1 Qu2
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :3.000 Median :3.000
## Mean :3.795 Mean :3.821 Mean :3.195 Mean :3.422
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :7 NA's :17 NA's :7 NA's :10
## Qu3 Qu4 Qu5 Vis1
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :2.981 Mean :3.238 Mean :3.042 Mean :2.945
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :15 NA's :22 NA's :29 NA's :72
## Vis2 Vis3 Im1 Im2
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:3.000
## Median :3.000 Median :2.000 Median :2.000 Median :3.000
## Mean :3.069 Mean :2.027 Mean :2.478 Mean :3.295
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :117 NA's :8 NA's :22 NA's :20
## Im3 SA1 SA2 SA3
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
## 1st Qu.:2.000 1st Qu.:4.000 1st Qu.:4.00 1st Qu.:4.000
## Median :3.000 Median :4.000 Median :4.00 Median :5.000
## Mean :2.888 Mean :4.191 Mean :4.13 Mean :4.384
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000
## NA's :57 NA's :11 NA's :12 NA's :11
## Use1 Use2 Use3 Use4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :1.000 Median :3.000 Median :3.000
## Mean :2.116 Mean :1.831 Mean :2.662 Mean :2.554
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :14 NA's :17 NA's :9 NA's :23
## Use5 Pf1 Pf2 Pf3
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000
## Median :3.000 Median :2.000 Median :3.000 Median :2.000
## Mean :3.305 Mean :2.274 Mean :2.861 Mean :2.551
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :15 NA's :11 NA's :6 NA's :14
## JR1 JR2 BI1 BI2
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00
## Median :4.000 Median :3.000 Median :3.000 Median :3.00
## Mean :3.699 Mean :3.108 Mean :2.952 Mean :2.99
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.00
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.00
## NA's :27 NA's :53 NA's :32 NA's :43
## Inc1 Inc2 Inc3 Inc4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.00
## Median :4.000 Median :4.000 Median :3.000 Median :4.00
## Mean :3.746 Mean :3.461 Mean :3.442 Mean :3.49
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.00
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.00
## NA's :35 NA's :35 NA's :37 NA's :42
## Exp1 Exp2 Exp3 Exp4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:1.000
## Median :3.000 Median :4.000 Median :4.000 Median :1.000
## Mean :3.001 Mean :3.492 Mean :3.651 Mean :1.588
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:2.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :13 NA's :11 NA's :13 NA's :14
## Exp5
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :2.487
## 3rd Qu.:4.000
## Max. :5.000
## NA's :13
Many of the variables have missing values, and “OTHERSTATUS” has the most (540 NA's). Missing data is common in almost all research and can have a significant effect on the conclusions that can be drawn from the data, so we will need to remedy the missing values. We cannot simply omit every observation with a missing value: doing so would let possibly uninteresting, incomplete variables dictate which respondents stay in the sample, and could lead to inaccurate conclusions.
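The cost of dropping incomplete rows can be checked directly by counting missing values per variable and the number of complete cases. The sketch below uses a small made-up data frame (`toy`) rather than the survey data; the column names are borrowed only for readability.

```r
# Toy survey data with NAs (values are made up for illustration).
toy <- data.frame(
  AGE         = c(40, 35, NA, 52, 47),
  PU1         = c(3, NA, 4, 2, 5),
  OTHERSTATUS = c(NA, NA, 2, NA, NA)
)

# Number and percentage of missing values per variable.
na_count <- colSums(is.na(toy))
na_pct   <- round(100 * colMeans(is.na(toy)), 1)
print(rbind(na_count, na_pct))

# Rows that would survive listwise deletion across all variables.
n_complete <- sum(complete.cases(toy))
n_complete

# The same count when the sparsest variable is ignored.
n_complete_reduced <- sum(complete.cases(toy[, names(toy) != "OTHERSTATUS"]))
n_complete_reduced
```

In this toy example a single sparse column removes every row under listwise deletion, while ignoring that column keeps three of the five rows.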
We can visualize the missing values and create a plot of the observations and variables.
library(visdat)
vis_dat(wiki)
vis_miss(wiki)
Missing Completely at Random (MCAR): There’s no relationship between whether a data point is missing and any values in the data set, missing or observed.
Missing at Random (MAR): The propensity for a data point to be missing is not related to the missing value itself, but it is related to some of the observed data. Among observations with the same observed values, the missing data are a random subset.
Missing Not at Random (MNAR): Data that is neither MAR nor MCAR (i.e. the value of the variable that’s missing is related to the reason it’s missing).
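The three mechanisms can be made concrete with a small simulation. This is only a sketch on synthetic data: the variables `age` and `score`, the seed, and the missingness probabilities are all assumptions chosen for illustration, not estimates from the survey.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 1000
age   <- round(rnorm(n, mean = 42, sd = 8))  # always observed
score <- sample(1:5, n, replace = TRUE)      # Likert-style item that will receive NAs

# MCAR: every value has the same 20% chance of being missing.
mcar <- ifelse(runif(n) < 0.20, NA, score)

# MAR: missingness depends only on an observed variable (age), not on score.
mar <- ifelse(runif(n) < ifelse(age > 45, 0.40, 0.10), NA, score)

# MNAR: missingness depends on the value that goes missing (low scores are hidden).
mnar <- ifelse(runif(n) < ifelse(score <= 2, 0.50, 0.05), NA, score)

# Missingness rate by age group: differs for MAR, roughly equal for MCAR.
mar_rate_by_age  <- tapply(is.na(mar),  age > 45, mean)
mcar_rate_by_age <- tapply(is.na(mcar), age > 45, mean)

# Missingness rate by true score: only computable here because the data are simulated.
mnar_rate_by_score <- tapply(is.na(mnar), score, mean)

mar_rate_by_age
mcar_rate_by_age
mnar_rate_by_score
```

The last table is the reason MNAR is hard to diagnose in practice: with real data the true value behind an NA is unknown, so the dependence cannot be tabulated this way.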
From the above definitions, it appears we have MNAR data, or in other words systematic missing data. This is evident from the graphs above, where 59.15% of the “OTHERSTATUS” values are missing (the 540 NA's reported in the summary). This variable describes “work as part-time in another university and UPF members”. It is a poorly designed question: it does not apply to most of the faculty members, so they did not respond. As a result, we will remove this variable from the data to help get rid of the systematic missing data.
wiki = wiki[,-9]
vis_dat(wiki)
vis_miss(wiki)
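One way to check whether missingness in a variable is systematic is to cross-tabulate its missingness indicator against an observed attribute. The sketch below does this on a made-up data frame (`toy`), and also shows dropping a column by name instead of by position; the values are hypothetical and only the column names mirror the data set.

```r
# Toy data (made-up values): OTHERSTATUS is only answered by some respondents.
toy <- data.frame(
  UNIVERSITY     = c(1, 1, 1, 1, 2, 2),
  OTHER_POSITION = c(2, 2, 1, 2, NA, NA),
  OTHERSTATUS    = c(NA, NA, 3, NA, 2, 6)
)

# Cross-tabulate "is OTHERSTATUS missing?" against an observed attribute.
miss_by_univ <- table(
  UNIVERSITY          = toy$UNIVERSITY,
  OTHERSTATUS_missing = is.na(toy$OTHERSTATUS)
)
miss_by_univ

# Drop the column by name rather than by position, so a reordered file
# cannot silently remove the wrong variable.
toy_reduced <- toy[, names(toy) != "OTHERSTATUS"]
names(toy_reduced)
```

If the missing values concentrate in one group of respondents, as they do in this toy table, that supports treating the gaps as structural rather than random.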