Collection of Lymphoma Somatic Mutations from 57 Published NGS Studies for Comparative Analysis

    Introduction

    LymphoDB is a data hub for the lymphoma genetics research field. It contains ~1.5 million data points from 57 independent lymphoma NGS studies. A collection of lymphoma genetic lesions this large is rare in the field, and it can advance research in two ways:

1) After downloading the entire LymphoDB dataset, users can apply customized big data mining methods, including artificial intelligence (AI) and machine learning approaches, to identify mutation patterns in lymphoma and its subtypes from different angles. We provide two examples of database-wide analysis to demonstrate the power of combining LymphoDB with AI algorithms.

2) Web search functions are provided for users who want to look up particular mutations or genes in LymphoDB. For example, if you already have a large list of mutations derived from your experiments, the comparative analysis can prioritize hotspots of interest in a few clicks. Comparing the user's input against the large collection in LymphoDB helps to answer questions including, but not limited to, the following:

     Which mutations in my list are also frequently reported in the 57 lymphoma studies?
     How are the mutations in my list distributed across different subtypes in the 57 lymphoma studies?
     Which mutations in my list occur only in certain subtypes among the 57 lymphoma studies?
     How frequently are the genes I'm interested in mutated across the 57 lymphoma studies?
     Which of the genes I'm interested in are mutated only in certain subtypes among the 57 lymphoma studies?
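
The questions above can be illustrated with a minimal Python sketch. It assumes the collection can be reduced to (study, subtype, mutation) records; the record layout, the toy values, and the function name `compare` are hypothetical and are not taken from LymphoDB or its web site.

```python
from collections import defaultdict

# Hypothetical toy records: (study_id, subtype, mutation in chromosome-oriented notation).
# These values are illustrative only and are not taken from LymphoDB.
RECORDS = [
    ("study01", "DLBCL", "chr3:g.100C>T"),
    ("study02", "DLBCL", "chr3:g.100C>T"),
    ("study03", "FL", "chr3:g.100C>T"),
    ("study03", "FL", "chr7:g.200G>A"),
    ("study04", "FL", "chr7:g.200G>A"),
    ("study05", "MCL", "chr11:g.300A>G"),
]


def compare(user_mutations, records):
    """Summarize, for each user mutation, the supporting studies and subtypes."""
    studies = defaultdict(set)
    subtypes = defaultdict(set)
    for study, subtype, mutation in records:
        studies[mutation].add(study)
        subtypes[mutation].add(subtype)

    summary = []
    for mutation in user_mutations:
        found = subtypes.get(mutation, set())
        summary.append({
            "mutation": mutation,
            "n_studies": len(studies.get(mutation, set())),
            "subtypes": sorted(found),
            # True when the mutation was seen in exactly one subtype.
            "subtype_specific": len(found) == 1,
        })
    # Most frequently supported mutations first.
    summary.sort(key=lambda row: row["n_studies"], reverse=True)
    return summary


if __name__ == "__main__":
    my_list = ["chr7:g.200G>A", "chr3:g.100C>T", "chr1:g.999T>C"]
    for row in compare(my_list, RECORDS):
        print(row)
```

Sorting by the number of supporting studies addresses the "frequently reported" question, while the `subtype_specific` flag addresses the "only in certain subtypes" question. The same grouping could be applied to gene names instead of mutations.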
   
Collection Statistics of LymphoDB:

  • ~1,500,000 data points
  • ~120,000 somatic missense mutation events
  • 2,900 lymphoma samples
  • 13 lymphoma subtypes
  • 57 independent lymphoma NGS studies
   
Two Ways to Access LymphoDB:

  • Use the search functions on the web site to look up particular mutations or genes of interest.
  • Download the entire text-based LymphoDB dataset for customized big data mining.
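
As a rough idea of what customized mining of the downloaded text file could look like, the Python sketch below counts distinct mutated samples per gene. The tab-separated layout, the column names (`study`, `sample`, `subtype`, `gene`, `mutation`), and the sample rows are all assumptions made for illustration; the real download format may differ.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the downloaded text file; the tab-separated layout and the
# column names below are assumptions, not the real LymphoDB format.
SAMPLE_TEXT = (
    "study\tsample\tsubtype\tgene\tmutation\n"
    "study01\tS1\tDLBCL\tGENE_A\tchr3:g.100C>T\n"
    "study01\tS2\tDLBCL\tGENE_A\tchr3:g.150G>A\n"
    "study02\tS3\tFL\tGENE_A\tchr3:g.100C>T\n"
    "study02\tS3\tFL\tGENE_B\tchr7:g.200G>A\n"
)


def mutated_samples_per_gene(handle):
    """Count distinct mutated samples per gene, overall and per subtype."""
    samples = defaultdict(set)
    by_subtype = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(handle, delimiter="\t"):
        # Sample IDs may repeat across studies, so key on (study, sample).
        key = (row["study"], row["sample"])
        samples[row["gene"]].add(key)
        by_subtype[row["gene"]][row["subtype"]].add(key)
    return {
        gene: {
            "n_samples": len(keys),
            "per_subtype": {s: len(k) for s, k in by_subtype[gene].items()},
        }
        for gene, keys in samples.items()
    }


if __name__ == "__main__":
    stats = mutated_samples_per_gene(io.StringIO(SAMPLE_TEXT))
    for gene, info in stats.items():
        print(gene, info)
```

For a real file, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` stand-in, after adjusting the column names to match the actual header.
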
    Some Details of Methods

                      
Collection: Medium and small lymphoma NGS studies are scattered across the internet or published as supplementary materials to papers, so downloading and extracting the datasets one by one requires a lot of manual work. Some data tables are in PDF format, which makes extraction even harder. For LymphoDB, we manually collected somatic mutation lists from 57 NGS studies. Most of them are not included in any of the major genetic databases, which makes the LymphoDB collection unusual in the lymphoma research field.

Integration: The downloaded datasets are in heterogeneous formats, as shown in the figure below. To make them comparable across studies, we had to process each study separately according to its specific mutation notation and format. The TransVar tool was used for mutation notation conversion. Our goal was to translate all of the different mutation notations into a chromosome-oriented notation, which leaves the least ambiguity during comparative analysis. This unified notation allows us to integrate the mutations from many studies into a single, well-structured source, LymphoDB.
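
To illustrate why a single chromosome-oriented notation simplifies comparison, here is a minimal Python sketch that canonicalizes a few differently written genomic substitutions into one key. It does not call TransVar and does not reproduce the conversion from gene- or protein-level notation described above; the input styles, the output form `chrN:g.POSREF>ALT`, the coordinates, and the function name `normalize` are assumptions for illustration.

```python
import re

# Accepts a few hypothetical spellings of a single-base substitution, e.g.
# "chr3:1,234,567 C>T", "3_1234567_C/T", "CHR3:g.1234567c>t".
_PATTERN = re.compile(
    r"^(?:chr)?(\d{1,2}|X|Y|MT|M)[:_](?:g\.)?([\d,]+)[\s_:]*"
    r"([ACGT])\s*[>/]\s*([ACGT])$",
    re.IGNORECASE,
)


def normalize(raw):
    """Return a canonical 'chrN:g.POSREF>ALT' key, or raise ValueError."""
    match = _PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {raw!r}")
    chrom, pos, ref, alt = match.groups()
    chrom = chrom.upper()
    if chrom == "MT":
        chrom = "M"
    elif chrom.isdigit():
        chrom = str(int(chrom))  # drop a leading zero such as "03"
    position = int(pos.replace(",", ""))  # remove thousands separators
    return f"chr{chrom}:g.{position}{ref.upper()}>{alt.upper()}"


if __name__ == "__main__":
    for text in ["chr3:1,234,567 C>T", "3_1234567_C/T", "CHR3:g.1234567c>t"]:
        print(text, "->", normalize(text))
```

Once every study's mutations are reduced to the same key, identical events reported by different studies collapse to identical strings, so cross-study comparison becomes a simple exact match.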

            

Accessibility: A dedicated dynamic website was developed to make the data accessible and searchable for users worldwide. The summary function presents results that are organized and sorted to make interpretation easier from a database-wide view. In addition, links lead to more detailed reports from third-party resources. See the example and explanation of the output in the next section.

    Explanation of the Output of Comparative Analysis

   
    Full List of 57 Independent Lymphoma Studies in LymphoDB

    View the full list here

    Poster in xxxx Institute Summer Intern Festival 2017

    Sorry! The link to the poster is temporarily blocked due to the anonymity requirement of the Siemens Competition.
   
References & Acknowledgements:


      Particular thanks go to the following public resources that are used in the LymphoDB project.


Dr. Louis Staudt's Laboratory

LymphoDB was developed by several high school students as an internship project at xxxx Research Institute.

The project is currently entered in a scientific competition, so all IDs, names, and logos of students, mentors, and the institute are anonymized throughout the web site as required.

To hide our institute's web address, this site is a clone of our official web site hosted on a third-party server for the scientific competition.

©2017