Alternative solutions to Google and Dropbox
The trend toward closer collaboration in academic research, often across international borders, was already underway before the pandemic, and it drove demand for digital tools that enable collaboration at a distance. Now that face-to-face meetings are often impossible, even for colleagues who once shared the same physical workspace, the trend has intensified and demand has increased.
Large-scale collaboration tools offer researchers several benefits, mirroring those of the internet as a whole. To be sure, collaborations among domain experts have always been productive and somewhat common. But whereas communication and sharing tools were formerly limited and clunky, today many options enable more convenient cooperation.
In this post, we describe the benefits and drawbacks of some of the existing tools. To address the problems they leave unsolved, we built the Lateral app, a collaborative PDF document analysis tool that enables smart search and extraction of key concepts. Students and researchers can sign up today.
Larger groups demand smarter collaboration tools
As working groups grow, coordination challenges arise. Among these is the problem we discuss in this post: how to ensure textual data is stored, shared and parsed efficiently among the group. There are three aspects to this problem: how to store data centrally but make it accessible to each group member; how to make it efficiently searchable; and how to make sure search results can be conveniently extracted and centrally saved. As we will see, there is a range of incumbent tools that address these issues, many of them based on cloud storage, but each addresses only a subset of the problems.
Drawbacks of cloud storage solutions: Dropbox, Google Drive, etc.
In the last eight years or so, better sharing tools built on cloud storage technology have entered the market, enabling researchers to share large numbers of files more conveniently and keep them in a central place. For example, Google Drive and Dropbox each let many people access the same data via shared folders in the cloud. These tools are superior to email in that data can be stored in a central location and instantly shared with new researchers, and access permissions can be set centrally and with fine-grained control.
Though useful for centrally storing files, cloud storage solutions like Dropbox and Google Drive have several problems when used as the primary tool for information management in a collaborative research project.
Hard drive problems can be cloud drive problems
Cloud storage solutions replicate some of the problems of storing data the traditional way, on your hard drive. For example, unless every member of the team adheres to strict naming and folder structuring conventions, it is easy to lose track of which files are actually stored in the cloud. How many times have you had to re-download a paper from arxiv.org or PubMed that you are almost sure is already on your hard drive or on Google Drive, simply because you can’t find it again?
Retrieving subdocument information can be painful
Even if you have complete oversight of your data, retrieving information you are sure you read in one file often means opening every file and keyword searching each in turn. There are two reasons for this. Either the storage solution does not support simultaneous keyword search across the contents of multiple files (as opposed to the file names alone), or you remember the gist of what you read but not precisely how it was phrased. The latter can yield “false negative” keyword search results, also known as the “paraphrasing problem”. For example, although “intelligence measurement” and “cognitive reflection test” are clearly related, they share no words, so a keyword search for either phrase will miss the other.
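The paraphrasing problem can be illustrated with a minimal Python sketch. The file names and contents below are invented for illustration, and the keyword_search helper is a hypothetical stand-in for a simple content search, not a description of how any particular storage product works.

```python
import re

# Hypothetical mini-library: file name -> extracted text (invented for illustration).
DOCUMENTS = {
    "paper_a.pdf": "We administered the cognitive reflection test to all participants.",
    "paper_b.pdf": "Intelligence measurement remains a contested topic in psychology.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return names of files whose contents contain every word of the query."""
    query_words = tokenize(query)
    return sorted(
        name for name, text in documents.items()
        if query_words <= tokenize(text)
    )


# Searching file contents (not just file names) finds the literal phrase...
hits_crt = keyword_search("cognitive reflection test", DOCUMENTS)
# ...while the related phrase only matches a different file.
hits_im = keyword_search("intelligence measurement", DOCUMENTS)

# The two related phrases share no words, so neither query can find the other.
shared_words = tokenize("intelligence measurement") & tokenize("cognitive reflection test")

print(hits_crt, hits_im, shared_words)
```

Because the overlap between the two phrases is empty, no amount of exact keyword matching will connect the two papers, however closely related their topics are.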
Data privacy concerns with free-to-use business models
Cloud storage solutions with free-to-use-but-we-want-your-data business models, like Google Drive or Dropbox, can expose researchers to data privacy issues. Consider a study published in the Harvard Business Review in 2018, which used customer data provided by Dropbox to analyse best practices for collaborative scientific research. Though steps were taken to anonymise the data, the study still attracted criticism from the broader research community because:
- It did not obtain explicit consent from all data owners for the data to be used in the study;
- The anonymisation techniques used were not transparently explained, hence may have been flawed.
This second point is a particular example of the general problem of creating truly robust data anonymisation techniques. Indeed, as pointed out in this critique, though names of researchers were removed from the data, it is theoretically possible that the researchers’ identities can be pieced together from non-redacted metadata, because the anonymisation process was not transparent. For example, as the author explains, removing only the names of researchers but retaining data about their position and university affiliations, together with some metadata about particular projects, may be enough to reconstruct their identities with a small amount of web searching.
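One common way to reason about this kind of risk is to count how many records share each combination of the attributes that remain after names are removed, which is the idea behind k-anonymity. The sketch below is a general illustration only: the records are invented, the field names are assumptions, and it does not reproduce the method used in the study or in the critique.

```python
from collections import Counter

# Invented records with names already removed; all values are hypothetical.
RECORDS = [
    {"position": "Professor", "university": "University A", "project": "genomics"},
    {"position": "Postdoc", "university": "University A", "project": "genomics"},
    {"position": "Postdoc", "university": "University A", "project": "genomics"},
    {"position": "Professor", "university": "University B", "project": "robotics"},
]

# Attributes that are not names but may still narrow down who a record is about.
QUASI_IDENTIFIERS = ("position", "university", "project")


def unique_combinations(records, fields):
    """Return the field-value combinations that occur in exactly one record."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return sorted(combo for combo, n in counts.items() if n == 1)


at_risk = unique_combinations(RECORDS, QUASI_IDENTIFIERS)
for combo in at_risk:
    print("Unique combination, potentially re-identifiable:", combo)
```

Any combination that appears only once points to a single person, so a short web search on position, university and project could be enough to attach a name to that record.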
Cloud storage shares data, but does not align process
While cloud storage solutions like Dropbox and Google Drive align a workgroup’s data by centralising it and making it accessible to each member, they do not align the search-extract-export process typical of literature reviews and other information gathering workflows in research activities. For example, suppose a biomedical research team has the following workflow:
- Find all research articles relevant to lab experiments on mice with a certain genetic trait;
- Store these in a central location accessible to each team member;
- For each article describing an experiment, extract the “methodology” section;
- Save excerpts from each of these “methodology” sections to a convenient location, for later export and analysis.
Cloud storage solutions are useful only for the second of these four steps, because they do not easily support searching for thematically similar content at the subdocument level (e.g. “methodology” sections). In this sense, such solutions cover only the earliest steps of a typical research workflow.
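To make the third and fourth steps of the workflow concrete, here is a minimal Python sketch of what they involve when done by hand. It assumes the article text has already been extracted from each PDF and that section headings sit on their own lines; the file names, texts, heading list and function names are all hypothetical, and real articles are far less regular.

```python
import csv
import io

# Hypothetical article texts, standing in for text already extracted from PDFs.
ARTICLES = {
    "mouse_study_1.pdf": (
        "Introduction\nBackground text.\n"
        "Methodology\nMice were housed in pairs.\n"
        "Results\nFindings text."
    ),
    "review_article.pdf": (
        "Introduction\nA survey with no experiment.\n"
        "Conclusion\nSummary text."
    ),
}

# Assumed heading names; real articles vary widely in how sections are titled.
HEADINGS = ("Introduction", "Methodology", "Results", "Conclusion")


def extract_section(text, heading, headings=HEADINGS):
    """Return the lines between `heading` and the next known heading, or None."""
    lines = text.splitlines()
    if heading not in lines:
        return None
    start = lines.index(heading) + 1
    body = []
    for line in lines[start:]:
        if line in headings:
            break
        body.append(line)
    return "\n".join(body).strip()


def export_excerpts(articles, heading="Methodology"):
    """Collect one excerpt per article (step 3) and render them as CSV text (step 4)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["file", "section", "excerpt"])
    for name, text in sorted(articles.items()):
        excerpt = extract_section(text, heading)
        if excerpt is not None:  # skip articles without the requested section
            writer.writerow([name, heading, excerpt])
    return buffer.getvalue()


csv_text = export_excerpts(ARTICLES)
print(csv_text)
```

Even this toy version needs custom code on top of the shared folder, which is the gap between storing files centrally and aligning the team's search-extract-export process.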
The Lateral app as an alternative
We built the Lateral app to fill this gap. It allows students and researchers to store PDF articles in a central, shareable location, then search across all of these documents at once using either traditional keyword search or a more topic-based smart search. Each mode has its advantages: keyword search suits precise matching, when you know exactly what you are looking for, while thematic search uses natural language processing techniques to learn the gist of ideas and concepts that appear in many documents but may be expressed in varying language. The technology improves itself by noting which results you find useful, and can infer likely matches in entirely new documents that use different terminology. All results are stored in a table that can be exported to Excel, MS Word, or plain CSV format, completing steps 3 and 4 of the workflow above.
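The difference between the two search modes can be sketched in a few lines of Python. This is a deliberately crude illustration and not how the Lateral app works internally: the hand-written concept map below stands in for relationships that a real topic-based search would learn from data, and the passages and function names are invented.

```python
# Hand-written concept map (an assumption for illustration only): a real
# topic-based search would learn such relationships rather than list them.
CONCEPTS = {
    "intelligence measurement": ["cognitive reflection test", "iq test"],
}

# Invented passages standing in for text from uploaded PDFs.
PASSAGES = [
    "Participants completed the cognitive reflection test before the task.",
    "Intelligence measurement has a long history in psychology.",
    "Mice were housed in pairs under a standard light cycle.",
]


def phrase_search(query, passages):
    """Return passages containing the exact query phrase (case-insensitive)."""
    return [p for p in passages if query.lower() in p.lower()]


def expanded_search(query, passages, concepts=CONCEPTS):
    """Return passages matching the query or any phrase mapped to it."""
    phrases = [query] + concepts.get(query.lower(), [])
    return [p for p in passages if any(ph.lower() in p.lower() for ph in phrases)]


exact = phrase_search("intelligence measurement", PASSAGES)
related = expanded_search("intelligence measurement", PASSAGES)

print(len(exact), "exact match;", len(related), "matches with related phrases")
```

The exact search returns only the passage that uses the query's wording, while the expanded search also surfaces the passage that expresses a related idea in different terms.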
If this sounds useful for you, please join us by signing up to the Lateral platform today. We look forward to making research, assignments and report writing way, way easier!