Difference between revisions of "Goherr: Fish consumption study"

From Testiwiki
(Questionnaire: Column names improved and made unique)
(Preprocess: webropol.convert and merge.questions were moved to OpasnetUtils/Drafts. Other improvements in code)
Line 85: Line 85:
 
This code is used to preprocess the original questionnaire data from the above .csv file and to store the data as a usable variable to Opasnet base.
 
This code is used to preprocess the original questionnaire data from the above .csv file and to store the data as a usable variable to Opasnet base.
  
<rcode name="preprocess" label="Preprocess (only for developers)">
+
<rcode name="preprocess2" label="Preprocess (only for developers)">
 +
# This code is Op_en7749/preprocess2 on page [[Goherr: Fish consumption study]]
 +
 
 
library(OpasnetUtils)
 
library(OpasnetUtils)
 
+
objects.latest("Op_en6007", code_name = "answer") # [[OpasnetUtils/Drafts]] webropol.convert, merge.questions
############## Generic functions and objects are defined first.
 
 
 
### webropol.convert converts a csv file from Webropol into a useful data.frame.
 
 
 
webropol.convert <- function(
 
  data, # Data.frame created from a Webropol csv file. The first row should contain headings.
 
  rowfact, # Row number where the factor levels start (in practice, last row + 3)
 
  textmark = "Other open" # The text that is shown in the heading if there is an open sub-question.
 
) {
 
  out <- dropall(data[2:(rowfact - 3) , ])
 
  subquestion <- t(data[1 , ])
 
  subquestion <- gsub("\xa0", " ", subquestion)
 
  subquestion <- gsub("\xb4", " ", subquestion)
 
  subquestion <- gsub("\n", " ", subquestion)
 
  #  subquestion <- gsub("\\(", " ", subquestion)
 
  #  subquestion <- gsub("\\)", " ", subquestion)
 
  textfield <- regexpr(textmark, subquestion) != -1
 
  subquestion <- strsplit(subquestion, ":") # Divide the heading into a main question and a subquestion.
 
  subqtest <- 0 # The previous question name.
 
  for(i in 1:ncol(out)) {
 
    #print(i)
 
    if(subquestion[[i]][1] != subqtest) { # If part of previous question, use previous fact.
 
      fact <- as.character(data[rowfact:nrow(data) , i]) # Create factor levels from the end of Webropol file.
 
      fact <- fact[fact != ""] # Remove empty rows
 
      fact <- gsub("\xa0", " ", fact)
 
      fact <- gsub("\xb4", " ", fact)
 
      fact <- gsub("\n", " ", fact)
 
      fact <- strsplit(fact, " = ") # Separate value (level) and interpretation (label)
 
    }
 
    if(length(fact) != 0 & !textfield[i]) { # Do this only if the column is not a text type column.
 
      out[[i]] <- factor(
 
        out[[i]],
 
        levels = unlist(lapply(fact, function(x) x[1])),
 
        labels = unlist(lapply(fact, function(x) x[2])),
 
        ordered = TRUE
 
      )
 
    }
 
    subqtest <- subquestion[[i]][1]
 
  }
 
  return(out)
 
}
 
 
 
# merge.questions takes a multiple checkbox question and merges that into a single factor.
 
# First levels in levs have priority over others, if several levels apply to a row.
 
 
 
merge.questions <- function(
 
  dat, # data.frame with questionnaire data
 
  cols, # list of vectors of column names or numbers to be merged into one level in the factor
 
  levs, # vector (with the same length as cols) of levels of factors into which questions are merged.
 
  name # text string for the name of the new factor column in the data.
 
) {
 
  for(i in length(cols):1) {
 
    temp <- FALSE
 
    for(j in rev(cols[[i]])) {
 
      temp <- temp | !is.na(dat[[j]])
 
    }
 
    dat[[name]][temp] <- levs[i]
 
  }
 
  dat[[name]] <- factor(dat[[name]], levels = levs, ordered = TRUE)
 
  return(dat)
 
}
 
  
 
############# Data preprocessing
 
############# Data preprocessing
Line 156: Line 97:
 
#Survey original file: N:/Ymal/Projects/Goherr/WP5/Goherr_fish_consumption.csv
 
#Survey original file: N:/Ymal/Projects/Goherr/WP5/Goherr_fish_consumption.csv
  
survey <- opasnet.csv("5/57/Goherr_fish_consumption.csv", sep = ";", fill = TRUE, quote = "\"")
+
survey <- opasnet.csv(
surcol <- t(survey[1,])
+
  "5/57/Goherr_fish_consumption.csv",
 +
  wiki = "opasnet_en", sep = ";", fill = TRUE, quote = "\""
 +
)
 +
#survey <- re#ad.csv(file = "N:/Ymal/Projects/Goherr/WP5/Goherr_fish_consumption.csv",
 +
#                  header=FALSE, sep=";", fill = TRUE, quote="\"")
  
survey <- webropol.convert(survey, 2121, textmark = ":Other open") # Data file is converted to data.frame using levels at row 1269.
+
# Data file is converted to data.frame using levels at row 2121.
 +
survey <- webropol.convert(survey, 2121, textmark = ":Other open")  
  
 
# Take the relevant columnames from the table on the page.
 
# Take the relevant columnames from the table on the page.
colframe <- opbase.data("Op_en7749", subset = "Questions in the Goherr questionnaire")
+
colnames(survey) <- gsub(" ",  ".",
colnames(survey) <- colframe$Result
+
  opbase.data("Op_en7749", subset = "Questions in the Goherr questionnaire")$Result[1:ncol(survey)]
 +
)
 +
survey$Row <- 1:nrow(survey)
 +
survey$Weighting <- as.double(gsub(",",".", survey$Weighting))
 +
survey$Ages <- factor(
 +
  ifelse(as.numeric(as.character(survey$Age)) < 46, "18-45",">45"),
 +
  levels = c("18-45", ">45"), ordered = TRUE
 +
)
  
survey$Row <- 1:nrow(survey)
+
# webropol.convert should put these in the right order but doesn't. So do it manually.
survey$Weighting <- as.double(levels(survey$Weighting))[survey$Weighting]
 
survey$Ages <- ifelse(as.numeric(as.character(survey$Age)) < 46, "18-45",">45")
 
  
 
freqlist <- c(
 
freqlist <- c(
Line 186: Line 137:
 
   "full plate (300 grams)",
 
   "full plate (300 grams)",
 
   "overly full plate (500 grams)"
 
   "overly full plate (500 grams)"
#  "Not able to estimate"
+
  #  "Not able to estimate"
 
)
 
)
  
Line 195: Line 146:
 
   "2/3 plate (200 grams)",
 
   "2/3 plate (200 grams)",
 
   "5/6 plate (250 grams)"
 
   "5/6 plate (250 grams)"
#  "Not able to estimate"
+
  #  "Not able to estimate"
 
)
 
)
 +
 +
fishamounts <- c(29,46:49,95:98)
 +
colnames(survey)[fishamounts]
 +
#[1] "How.often.fish"    "How.often.BS"      "How.much.BS"      "How.often.side.BS"
 +
#[5] "How.much.side.BS"  "How.often.BH"      "How.much.BH"      "How.often.side.BH"
 +
#[9] "How.much.side.BH"
  
 
ansl <- list(
 
ansl <- list(
   `How often eat fish` = freqlist,
+
   freqlist,
   `How often Baltic salmon` = c("Never", freqlist),
+
   c("Never", freqlist),
   `How much Baltic salmon` = amlist,
+
   amlist,
   `How often side Baltic salmon` = c("Never", freqlist),
+
   c("Never", freqlist),
   `How much side Baltic salmon` = sidel,
+
   sidel,
   `How often Baltic herring` = c("Never", freqlist),
+
   c("Never", freqlist),
   `How much Baltic herring` = amlist,
+
   amlist,
   `How often side Baltic herring` = c("Never", freqlist),
+
   c("Never", freqlist),
   `How much side Baltic herring` = sidel
+
   sidel
 
)
 
)
  
for (i in names(ansl)) {
+
for (i in 1:length(fishamounts)) {
   survey[[i]] <- factor(survey[[i]], levels = ansl[[i]])
+
   survey[[fishamounts[i]]] <- factor(survey[[fishamounts[i]]], levels = ansl[[i]], ordered = TRUE)
 
}
 
}
  
objects.store(survey, surcol)
+
oprint(head(survey))
cat("objects survey, surcol were stored.\n")
 
  
 +
objects.store(survey)
 +
cat("Data.frame survey was stored.\n")
 
</rcode>
 
</rcode>
  

Revision as of 08:36, 13 April 2017


Question

How are Baltic herring and salmon used as human food in Baltic Sea countries? Which determinants affect people's eating habits of these fish species?

Answer

Survey data will be analysed during winter 2016-2017 and results will be updated here.


Rationale

A survey of eating habits of Baltic herring and salmon in Denmark, Estonia, Finland and Sweden was conducted in September 2016 by Taloustutkimus Oy. The content of the questionnaire can be accessed in Google Drive. The actual data will be uploaded to Opasnet base in October 2016.

The R-code to analyse the survey data will be provided on this page later on.

Data

Original data file: File:Goherr fish consumption.csv

Preprocessing

This code preprocesses the original questionnaire data from the above .csv file and stores the result as a usable variable in Opasnet base.
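
As a minimal sketch of two of the preprocessing steps (not the full code on this page), the base R snippet below shows how a decimal-comma weighting column can be converted to numeric and how ages can be split into two ordered groups. The toy data frame `toy` and all its values are invented for illustration; only the column names Weighting, Age and Ages follow the survey object.

```r
# Toy data standing in for the survey; the values are invented.
toy <- data.frame(
  Weighting = c("0,85", "1,20", "1,00"),
  Age = factor(c("23", "46", "67")),
  stringsAsFactors = FALSE
)

# Replace the decimal comma with a point before converting to numeric.
toy$Weighting <- as.double(gsub(",", ".", toy$Weighting))

# Age is stored as a factor, so go through character to get the actual numbers
# instead of the internal level codes.
toy$Ages <- factor(
  ifelse(as.numeric(as.character(toy$Age)) < 46, "18-45", ">45"),
  levels = c("18-45", ">45"), ordered = TRUE
)

print(toy)
```

Going through `as.character` matters here: calling `as.numeric` directly on a factor would return level codes rather than the ages themselves.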


Analyses

Correlation matrix of all questions in the survey (answers converted to numbers).

The model must contain predictors such as country, gender and age. Maybe we should first study which determinants are important? The model must also contain determinants that would increase or decrease fish consumption, and this should be conditional on current consumption. How? Maybe we should look at a principal coordinates analysis with all questions to see how they behave.

Also look at the correlation table to see clusters.
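
The idea of converting answers to numbers and looking for clusters can be sketched as follows. This is an illustration under assumptions, not the code behind the figure: the data frame `ans`, its answer levels and the column name Dont.like.taste are invented. Ordered factors are converted to their level codes, correlations use pairwise complete observations so that NA answers do not drop whole rows, and hierarchical clustering on 1 - |r| is just one possible way to look for clusters.

```r
# Invented answers; NA marks a question that was skipped.
lev <- c("Never", "Sometimes", "Often")
ans <- data.frame(
  How.often.BS = factor(c("Never", "Often", "Sometimes", "Often", NA, "Never"),
                        levels = lev, ordered = TRUE),
  How.often.BH = factor(c("Never", "Often", "Often", "Sometimes", "Never", "Never"),
                        levels = lev, ordered = TRUE),
  Dont.like.taste = factor(c("Often", "Never", "Never", "Sometimes", "Often", "Often"),
                           levels = lev, ordered = TRUE)
)

# Convert each ordered factor to its integer level code.
num <- as.data.frame(lapply(ans, as.numeric))

# Correlation matrix; pairwise deletion keeps rows that have some NA answers.
cm <- cor(num, use = "pairwise.complete.obs")
print(round(cm, 2))

# Questions that correlate strongly (in either direction) end up close together.
clusters <- hclust(as.dist(1 - abs(cm)))
print(clusters$order)
```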

Some obvious results:

  • If a respondent reports eating no fish, many subsequent answers are NA.
  • No vitamins correlates negatively with vitamin intake.
  • Unknown salmon correlates negatively with the types of salmon eaten.
  • Different age categories correlate with each other.

However, there are also meaningful negative correlations:

  • Country vs allergy
  • Country vs Norwegian salmon and Rainbow trout
  • Country vs not traditional.
  • Country vs recommendation awareness
  • Allergy vs economic wellbeing
  • Baltic salmon use (4 questions) vs Don't like taste and Not used to
  • All questions between Easy to cook ... Traditional dish

Meaningful positive correlations:

  • All questions between Baltic salmon ... Rainbow trout
  • How often Baltic salmon/herring/side salmon/side herring
  • How much Baltic salmon/herring/side salmon/side herring
  • Better availability ... Recommendation
  • All questions between Economic wellbeing...Personal aims
  • Omega3, Vitamin D, and Other vitamins

Study plan:

  • Determinants


Bayes model

  • Model run 3.3.2017. All variables assumed independent. [1]
  • Model run 3.3.2017. p has more dimensions. [2]
  • Model run 25.3.2017. Several model versions: strange binomial+multivarnormal, binomial, fractalised multivarnormal [3]
  • Model run 27.3.2017 [4]
  • Other models except multivariate normal were archived and removed from active code 29.3.2017.
  • Model run 29.3.2017 with raw data graphs [5]
  • Model run 29.3.2017 with salmon and herring ovariables stored [6]


Calculations

This code calculates how much (g/day) Baltic herring and salmon are eaten, based on a Bayesian model built from the questionnaire data.


Assumptions

The following assumptions are used:

Assumptions for calculations
Obs  Variable    value  Explanation      Result
1    freq        6      times per year   260 - 364
2    freq        5      times per year   104 - 208
3    freq        4      times per year   52
4    freq        3      times per year   12 - 36
5    freq        2      times per year   2 - 5
6    freq        1      times per year   0.5 - 0.9
7    freq        0      times per year   0
8    amdish      0      grams / serving  20 - 50
9    amdish      1      grams / serving  70 - 100
10   amdish      2      grams / serving  120 - 150
11   amdish      3      grams / serving  170 - 200
12   amdish      4      grams / serving  220 - 250
13   amdish      5      grams / serving  270 - 300
14   amdish      6      grams / serving  450 - 500
15   ingridient         fraction         0.1 - 0.3
16   amside      0      grams / serving  20 - 50
17   amside      1      grams / serving  70 - 100
18   amside      2      grams / serving  120 - 150
19   amside      3      grams / serving  170 - 200
20   amside      4      grams / serving  220 - 250
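
To illustrate how these assumptions could translate into a daily intake, here is a minimal sketch. It assumes (this is not confirmed by the page) that each class range is sampled uniformly and that intake in g/day is times per year multiplied by grams per serving, divided by 365. The helper `intake_g_per_day` and the lookup lists are hypothetical names; the ranges are copied from the freq and amdish rows of the assumptions table.

```r
# Class ranges copied from the assumptions table (freq and amdish rows only).
freq_range <- list(`6` = c(260, 364), `5` = c(104, 208), `4` = c(52, 52),
                   `3` = c(12, 36), `2` = c(2, 5), `1` = c(0.5, 0.9), `0` = c(0, 0))
amdish_range <- list(`0` = c(20, 50), `1` = c(70, 100), `2` = c(120, 150),
                     `3` = c(170, 200), `4` = c(220, 250), `5` = c(270, 300),
                     `6` = c(450, 500))

# Hypothetical helper: sample n intakes (g/day) for one frequency/amount class pair.
intake_g_per_day <- function(freq, amdish, n = 1000) {
  f <- freq_range[[as.character(freq)]]
  a <- amdish_range[[as.character(amdish)]]
  times_per_year <- runif(n, f[1], f[2])     # assumed uniform within the range
  grams_per_serving <- runif(n, a[1], a[2])  # assumed uniform within the range
  times_per_year * grams_per_serving / 365
}

set.seed(1)
x <- intake_g_per_day(freq = 4, amdish = 3)  # 52 times per year, 170-200 g per serving
print(range(x))
```

Under these assumptions, a respondent in frequency class 4 (52 times per year) with servings of 170 - 200 g (class 3) ends up between about 24 and 28.5 g/day.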

Questionnaire


Dependencies

The survey data will be used as input in the benefit-risk assessment of Baltic herring and salmon intake, which is part of the WP5 work in the Goherr project.

Formula

See also

Keywords

References


Related files


Goherr: Fish consumption study. Opasnet. [7]. Accessed 17 May 2024.