Thursday 4 September 2014

Recovering TCR sequences from ENCODE RNASeq data (or not)

You might have seen that there's recently been another wave of papers and data from the ENCODE project.

Browsing through their data, I noticed one sample was a large RNASeq experiment using peripheral blood mononuclear cells (PBMC) as the input cell type: as this will contain a lot of T cell RNA, I thought I'd have a look for some TCRs!

It was a pretty fruitless endeavour. The smaller of the two fastqs, clocking in at 27.5 million reads, yielded some 70-odd TCRs all told. The data, being 100bp reads from presumably randomly sheared/reverse-transcribed RNA, just don't catch enough V(D)J windows to mine consistently (at least not without any assembly).
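To give a feel for why so few reads qualify, here's a toy sketch of the kind of search involved. It is not Decombinator itself, just a simplified stand-in: a read only counts if it contains both a V gene tag and, downstream of it, a J gene tag, so the read has to span the whole rearrangement. The tag sequences, gene names and function names below are all made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical tags for illustration only: these are NOT real TCR gene sequences.
V_TAGS: Dict[str, str] = {"V_example": "ACGTTGCA"}
J_TAGS: Dict[str, str] = {"J_example": "GGATCCTT"}


def parse_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) from FASTQ-formatted lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def find_rearrangement(
    read: str, v_tags: Dict[str, str], j_tags: Dict[str, str]
) -> Optional[Tuple[str, str, str]]:
    """Return (V name, J name, sequence between the tags) if the read
    contains a V tag followed by a J tag, otherwise None."""
    for v_name, v_tag in v_tags.items():
        v_pos = read.find(v_tag)
        if v_pos == -1:
            continue
        v_end = v_pos + len(v_tag)
        for j_name, j_tag in j_tags.items():
            # Only look for the J tag downstream of the V tag
            j_pos = read.find(j_tag, v_end)
            if j_pos != -1:
                return v_name, j_name, read[v_end:j_pos]
    return None


def count_hits(fastq_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total reads, reads containing both a V and a J tag)."""
    total = hits = 0
    for read in parse_fastq(fastq_lines):
        total += 1
        if find_rearrangement(read, V_TAGS, J_TAGS) is not None:
            hits += 1
    return total, hits
```

With reads that start at random positions along a transcript, most TCR-derived reads will cover only one tag or neither, so they fall through as misses even though they came from a TCR message.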

The larger file was still running the next day, having only spat out another 70, so I called it quits. Rich data sets they may be, but appropriate for Decombinator they ain't.

Oh well, back to the drawing board.