How can you ensure data protection and security during collection, storage, and transfer?
Data collection may consist of the re-use of existing data and/or the generation of new data.
For data to be considered valid and reliable, data collection should occur consistently and systematically throughout the course of the research project. Within disciplines, there are established methodologies, procedures and techniques that help researchers ensure high quality of collected data. In general, important aspects of data collection include:
- Standardisation: codebooks & protocols
- Structure / organisation of the data
- Data quality assurance methods
- Documentation & metadata
- Storage & protection
Systematic data collection is essential for ensuring the reproducibility of research. When data is collected in a consistent and organized manner, it improves the quality and reliability of the research, making the data easier to share and reproduce by others. High-quality data also contributes to making data FAIR (Findable, Accessible, Interoperable, and Reusable), as well-organized and well-documented data is more likely to be reused effectively. The principles of making data FAIR are discussed in detail under the topic FAIR Principles.
Data Collection Tools
The tools being used in research to collect data are immensely diverse. For that reason, we will not provide an exhaustive overview here. What is important for data collection tools in relation to RDM is where such tools store the data that you collect and in which format. The storage location is particularly important when you are working with personal data. For example, the privacy legislation in the United States is very different from the European General Data Protection Regulation (GDPR). Hence, personal data collected in a Dutch research institute may not be stored on American servers. It is important to keep that in mind when you are contemplating which tool to use for your data collection.
If you are collecting personal data and you decide to use a tool for which no contract exists between VU Amsterdam and the provider of the software or tool, a service agreement and a processing agreement must be drawn up. Contact the 🔒 privacy champion of your faculty for more information and a model processing agreement.
Questionnaire tools
The Faculty of Behavioural and Movement Sciences has developed a document with tips for safe use of the questionnaire tools Qualtrics and Survalyzer. The document was made for FGB researchers specifically but can also be helpful for others. Consult this document if you need a questionnaire tool to collect your data.
Data Collection in Collaboration
Some research projects involve the participation of multiple organisations or institutes and may even include cross-border co-operation. When data is collected by several organisations, a Data Management Plan should state who is responsible for which part of the data collection and storage. It should also state how specific data collections relate to which part(s) of the research goal(s). Describing this precisely will help you to determine whether a consortium agreement or joint controller agreement is necessary. A general example of such a specification is shown in the table below:
| Data Stage | Dataset Description | Responsible Organization for Collection | Data Origin | Data Purpose |
|---|---|---|---|---|
| Raw data | Community-level surveys | VU Amsterdam | Amsterdam, The Hague, Rotterdam | Identifying perceived problems; System responsiveness |
| Raw data | Trials & Focus Group Interviews | London School of Hygiene and Tropical Medicine (LSHTM) | Germany, Switzerland | Program evaluation trials; Focus group interviews to identify barriers |
| Raw data | Pollution measurements using fish | Oceanographic Institute of Sweden | Coastal waters, Northeast Spain | Establish plastic pollution levels |
Data Collection Protocols
Regardless of the field of study or preference for defining data (quantitative, qualitative), accurate data collection is essential to maintaining the integrity (structure) of research. Both the selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood of errors.
There are two approaches for reducing and/or detecting errors in data which can help to preserve the integrity of your data and ensure scientific validity. These are:
- Quality assurance - activities that take place before data collection begins
- Quality control - activities that take place during and after data collection
Quality assurance precedes data collection and its main focus is ‘prevention’ (i.e., forestalling problems with data collection). Prevention is the most cost-effective activity to ensure the integrity of data collection. This proactive measure is best demonstrated by a standardised protocol, laid down in a comprehensive and detailed procedures manual for data collection.
While quality control activities (detection/monitoring and action) occur during and after data collection, the details should be carefully documented in the procedures manual. A clearly defined communication structure is a necessary pre-condition for monitoring and tracking down errors. Quality control also identifies the required responses, or ‘actions’, that are necessary to correct faulty data collection practices and to minimise future occurrences.
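As an illustration of quality control during data collection, the sketch below checks incoming records against a codebook. The codebook, its variable names and its allowed ranges are hypothetical and only serve as an example; a real codebook would come from your own procedures manual.

```python
# Hypothetical codebook: variable name -> rule describing valid values.
# The variables and limits below are illustrative assumptions only.
CODEBOOK = {
    "age": {"type": int, "min": 18, "max": 99},
    "gender": {"type": str, "allowed": {"f", "m", "x"}},
    "score": {"type": float, "min": 0.0, "max": 10.0},
}


def check_record(record, codebook=CODEBOOK):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, rule in codebook.items():
        # A variable that is absent or empty is reported as missing.
        if name not in record or record[name] is None:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Check the data type before comparing against ranges.
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: value {value!r} not in codebook")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} above maximum {rule['max']}")
    return problems


if __name__ == "__main__":
    print(check_record({"age": 34, "gender": "f", "score": 7.5}))
    print(check_record({"age": 17, "gender": "q"}))
```

Running a check like this at the moment of data entry, rather than at the end of the project, makes it easier to trace an error back to its source and correct the collection practice.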
Some sources for protocols:
- HANDS Handbook for Adequate Natural Data Stewardship by the Federation of Dutch University Medical Centers (UMCs)
- Protocols.io - an open access repository of protocols
- Springer Protocols - free and subscribed protocols collected by Springer.
VU Amsterdam offers several options to store your digital research data. The choice for a specific option depends on factors such as:
- Does a project involve multiple organisations or departments?
- The sensitivity of the data: does it involve personal data or copyrighted / commercial data?
- Are there any research partners with whom data need to be shared?
- Are any commercial parties involved?
- Does the research project involve multiple locations (inside or maybe even outside the EU)?
- Will there be (lab) devices producing data that need to be stored as well?
- What will be the volume of the data?
- Where will you run your analysis software?
- Are there enough funds available to store all the data?
Types of storage
There are 3 general types of digital storage:
- Local storage on computers or servers. These offer the fastest access, you can run analysis tools directly on the data.
- Networked storage. This is storage directly attached to the local network. Less performant than local storage, but still fast enough to run analysis tools directly on the data.
- Cloud storage. This storage is accessible over the internet from all over the world. Speed is limited by your and the supplier’s internet connections. In general it is not possible to run analysis tools directly on the data, you need to download data to your PC or a server first. Some cloud services offer functionality to automatically sync data, for others you have to download and upload manually or script it.
Storage services offered by VU Amsterdam
The VU offers a few storage services that can be used by every VU researcher. The Storage finder is a tool that will help you select storage platforms suitable for your project. For more individual guidance, please get in touch with the Research Data Management Support Desk for advice, particularly when you are working with commercial, personal or otherwise sensitive data, or when you have a complex IT setup.
General data-storage
VU IT offers 2 cloud services to store data:
🔒 OneDrive: personal cloud storage for all VU employees and part of the Microsoft 365 platform. OneDrive allows you to store files locally and in the Microsoft cloud, and share folders and documents with colleagues. Since this is personal storage, tied to someone’s personal VU account, we don’t usually recommend storing research data in OneDrive: if the account holder leaves VU Amsterdam, the account and all the data on it disappear.
🔒 Teams. Faculties, divisions and departments have their own Team - part of the Microsoft 365 platform - where they store shared documents and where they can interact and chat. Projects may also request a project Team, but note that Teams is not always the best location to store your research data and has several limitations, especially when it comes to working with non-Microsoft file formats, large volumes of data, interacting with data, and collaborating with partners outside of VU Amsterdam. Contact the RDM Support Desk to find out more about the suitability of Teams for your project.
Research data-specific storage options
VU Amsterdam also offers storage specifically for research data:
SciStor is networked storage hosted by IT for Research (ITvO) suitable for large volumes of (sensitive) data. With SciStor you have fast access to your data on campus, the ADA high performance compute cluster and SciCloud servers, but it can’t be used for external collaboration.
Yoda is a cloud storage platform at SURF (Dutch IT cooperative of education and research) and is suitable for storing small to very large volumes of (sensitive) data. Yoda supports external collaboration. It has an integrated archiving and publishing facility, making it a one-stop platform.
SURF Research Drive is a cloud storage platform at SURF for research projects suitable for storing small to medium (~10TB) volumes of (sensitive) data. Research Drive supports external collaboration. It offers a sync client for easier up- and downloading of data.
OSF is an online platform especially suitable for small data volumes that you want to share with the public during the research project.
Using other (cloud) storage solutions
When selecting a cloud-based service, remember to check where the data will be hosted. If the research project involves sensitive data, it may be necessary to choose cloud-based options that guarantee that the data will stay on servers based in the EEA. Also make sure access to the data is not tied to just one account, to avoid getting locked out of important data.
Keep in mind that the usual (free) commercial cloud offerings such as Google Drive or Dropbox are not suitable for sensitive data at all.
When in doubt about the suitability of a cloud storage solution please contact the RDM Support Desk.
Data transfer
The Data Transfer topic page provides tips on collecting and moving data.
What is Data Protection?
Protection from what? From whom? When, and why? Before we talk about data protection, let us consider security first. More often than not, ‘security’ is regarded as a fixed state. In reality, security is an assessment: it is the level of protection against a certain threat that you consider adequate for dealing with that threat. Whether or not security is adequate depends on the value of the data and the quality of the protective measures.
The question for you as a researcher is ‘when are the measures that you take secure enough?’. In order to answer this, please be aware that there are three entities that have an opinion about what is ‘secure enough’, namely: the law, the University, and you yourself as the data processor.
The University has a Security Baseline that sets a norm for levels of protection for every application it uses. The Baseline is based on international standards. For each of these applications, the University assesses for which purposes its security is adequate.
The legal requirements for the processing of personal data can be found in the section ‘GDPR and Privacy’ under Plan & Design. There are additional laws and regulations as well. The assumption is that you are familiar with these, especially with laws regulating medical and criminal research.
What you personally consider to be secure might be very different from what your colleagues, the Faculty or the University consider to be secure enough. The norms will also vary with the variety of data that is being processed by different researchers and Faculties of VU Amsterdam. Very generally speaking, there are three points of protection to consider:
- Protection against data loss, for which you need to make backups periodically.
- Protection against data leakage, for which you need to consider all storage places and their access points.
- Protection of data integrity, for which you need version control and synchronisation management.
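One common way to monitor data integrity is to record a checksum for every file and compare the checksums later. The sketch below is a minimal illustration using only the Python standard library; the function names are hypothetical, and it complements rather than replaces version control.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Report files that went missing, were added, or changed content."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

If you store the manifest alongside the dataset (for example as a JSON file) and rebuild it at a later date, `compare_manifests` shows whether files were altered, removed or added in the meantime.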
The security of your protection measures depends on the threat you face. We often think of threats as active, and motivated by bad intentions. But the most common forms of data loss are accidental and most leakage is caused by trusting others. In reality, devices just get lost or break down, people download malware by accident, and each one of us forgets to save a document at times or gets confused about which version was last updated.
In all cases, protection starts with oversight of where your data is stored and processed. If you forget that you temporarily stored it in a certain place, you have lost oversight of where that data is. The opposite is also true: if you know where your data is, you have insight into the level of security of the space in which you store it. As you can see, protection begins with organising your work in a reliable manner and thinking through your steps.
For example, if your data is on your laptop and synchronised with your phone, then it is stored in two places. Perhaps this is enough backup, perhaps not. If you put both your devices in the same bag and you lose your bag, you have no backup. A backup to online storage might be a good solution, but might also mean your data leaks via the internet or via a storage provider that sells the data and your behavioural data for profit. Most importantly, there is no absolute security. It is best to consider your personal behaviour and then think of scenarios that are more or less likely to happen and what would impact you most. If you frequently work in public places, you should make it a habit to lock your device each time you leave it. If you often eat and drink at your desk, it is better to work with a remote keyboard to protect your laptop from the unavoidable coffee shower. Do you save your respondents’ contact details on your personal phone? Then protect it with a PIN.
Here are some basic protection guidelines:
- Data are very difficult to erase. You have probably never done it.
- Decide how to back up data and test it before you rely on it.
- Do not give others your log-in credentials. If you have done so, or if your family members use your work device, then change them.
- Do not use the same password twice, and do not use your birthday, initials, street name or hobby.
- Encryption sounds secure, but it fails completely without good password management.
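The guideline to test a backup before relying on it can be put into practice with a small script. The sketch below is an illustration under assumptions: the function name and folder layout are hypothetical, and it assumes the backup is a plain folder copy that mirrors the structure of the source folder. It compares every source file byte by byte with its copy.

```python
import filecmp
from pathlib import Path


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy in backup_dir.

    Returns two lists of relative paths: files missing from the backup
    and files whose content differs from the source.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    missing, different = [], []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            missing.append(rel.as_posix())
        # shallow=False forces a comparison of the actual file contents.
        elif not filecmp.cmp(src, copy, shallow=False):
            different.append(rel.as_posix())
    return missing, different
```

Two empty lists mean that every source file has an identical copy in the backup. It is still worth opening a few restored files by hand, since a script like this cannot tell whether the source itself was already damaged.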
Data Protection
There can be many reasons why the data of a project needs to be kept protected:
- Sensitivity of the data collected
- Protection of the research data from competition
- Commercial reasons / Intellectual property
- Etc.

There are also many levels of security that may be implemented, depending on the needs. Sometimes it will be enough to use a password-protected cloud-based server. In extreme cases encryption may be needed, including when data is transmitted between researchers or organisations. Contact the RDM Support Desk to discuss the available options; they may connect you to legal experts where sensitive data is concerned. Check the Data Storage topic for links to find out more on campus solutions and cloud-based options.
See also the Safe Data Transfer topic for more information on how to transport and transfer data.