R package development

Posted on 2023-02-27
Symbols count in article: 8.4k Reading time ≈ 8 mins.

This post briefly documents the workflow for developing an R package.

R-SCENIC

Posted on 2022-11-06 Edited on 2022-11-05
Symbols count in article: 18k Reading time ≈ 17 mins.

My paper has been accepted, so I finally have time to post some notes again 😃

SCENIC_Tutorial

http://htmlpreview.github.io/?https://github.com/aertslab/SCENIC/blob/master/inst/doc/SCENIC_Running.html

R-model-predict

Posted on 2022-07-12
Symbols count in article: 4.2k Reading time ≈ 4 mins.

Linear Regression
- Model
- Prediction
GLM
- Model
- Prediction
LOESS regression
- Fitting only
- Extrapolation

Modeling and prediction with R
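The outline above (fit a model, then predict) can be sketched in base R as follows. This is only a minimal illustration on made-up data, not the code from the post: the data frame `train`, the column names, and the new points in `new_data` are all assumptions. The last step shows one way to let LOESS extrapolate, since by default `predict()` on a `loess` fit returns `NA` outside the range of the training data.

```r
set.seed(1)

# Hypothetical training data: a noisy linear trend plus a count response
train <- data.frame(x = 1:20)
train$y <- 2 * train$x + 1 + rnorm(20, sd = 0.5)
train$counts <- rpois(20, lambda = exp(0.1 * train$x))

# New points to predict; x = 25 lies outside the training range
new_data <- data.frame(x = c(10.5, 25))

# Linear regression: fit, then predict
lm_fit <- lm(y ~ x, data = train)
lm_pred <- predict(lm_fit, newdata = new_data)

# GLM (Poisson with log link): type = "response" returns predictions on the count scale
glm_fit <- glm(counts ~ x, family = poisson(link = "log"), data = train)
glm_pred <- predict(glm_fit, newdata = new_data, type = "response")

# LOESS with default settings: fitting only, no extrapolation (NA for x = 25)
lo_fit <- loess(y ~ x, data = train)
lo_pred <- predict(lo_fit, newdata = new_data)

# LOESS with surface = "direct": the local fit is evaluated directly,
# which also allows prediction outside the training range
lo_direct <- loess(y ~ x, data = train,
                   control = loess.control(surface = "direct"))
lo_direct_pred <- predict(lo_direct, newdata = new_data)
```

Extrapolated LOESS values depend heavily on the span and on the points near the boundary, so they are best treated with caution.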

conda-install-R-packages

Posted on 2022-07-08
Symbols count in article: 1.4k Reading time ≈ 1 mins.

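Installing R packages with conda

I recently wanted to use conda to install R packages and manage environments. After some searching, I found that this can be managed with a YAML file.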

R-ggplot: adjusting facet widths

Posted on 2022-07-07
Symbols count in article: 1.6k Reading time ≈ 1 mins.

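Problem description
Solution

Problem description

I recently ran into a problem when drawing faceted plots with ggplot. The number of variables on the x-axis differs between facets, so the plots drawn in each facet also end up different sizes.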

R-Seurat data analysis workflow

Posted on 2022-04-29 Edited on 2022-05-03
Symbols count in article: 14k Reading time ≈ 12 mins.

Seurat standard pipeline

R-paintingr color palettes

Posted on 2022-03-14 Edited on 2022-04-29
Symbols count in article: 2.8k Reading time ≈ 3 mins.

“The greatest value of a picture is when it forces us to notice what we never expected to see.” - John Tukey

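I drew some color schemes from oil paintings and turned them into an R package, paintingr (https://github.com/thereallda/paintingr)

If you make plots in R, comments and suggestions are welcome!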

Pathway enrichment analysis

Posted on 2022-02-28
Symbols count in article: 10k Reading time ≈ 9 mins.

Following the tutorial from the 2019 Nature Protocols paper (https://www.nature.com/articles/s41596-018-0103-9)

Simulating RNA-seq data

Posted on 2022-02-10
Symbols count in article: 7.3k Reading time ≈ 7 mins.

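When evaluating the performance of different software tools, we often need simulated data. Because the ground truth in simulated data is known (for example, the number of differentially expressed genes), comparing how different tools perform on simulated data yields quantitative performance metrics such as sensitivity, specificity, and accuracy.

Following the method in the DESeq2 paper, this post documents a simple way to simulate RNA-seq gene expression data based on the negative binomial distribution.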
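As a rough illustration of negative binomial simulation, the base R sketch below draws counts for two groups with a known set of differentially expressed genes. It is not the exact procedure from the DESeq2 paper: the number of genes, the distribution of mean expression, the single shared dispersion value, and the fixed twofold change are all assumptions chosen for brevity. It relies on the parameterization in which the variance is `mu + dispersion * mu^2`, which corresponds to `size = 1 / dispersion` in `rnbinom()`.

```r
set.seed(42)

# Assumed simulation settings (hypothetical values)
n_genes     <- 1000
n_per_group <- 3
de_fraction <- 0.1   # fraction of genes that are truly differentially expressed
dispersion  <- 0.2   # one shared dispersion for all genes, for simplicity

# Baseline mean expression per gene, and the known truth about which genes are DE
base_mean <- rexp(n_genes, rate = 1 / 200) + 1
is_de     <- seq_len(n_genes) <= n_genes * de_fraction
log2fc    <- ifelse(is_de, sample(c(-1, 1), n_genes, replace = TRUE), 0)

# Group A uses the baseline mean; group B applies the fold change
mu_a <- base_mean
mu_b <- base_mean * 2^log2fc

# Draw a genes x samples count matrix from NB(mu, size = 1 / dispersion)
simulate_group <- function(mu, n_samples, dispersion) {
  counts <- rnbinom(length(mu) * n_samples,
                    mu = rep(mu, times = n_samples),
                    size = 1 / dispersion)
  matrix(counts, nrow = length(mu), ncol = n_samples)
}

counts <- cbind(simulate_group(mu_a, n_per_group, dispersion),
                simulate_group(mu_b, n_per_group, dispersion))
colnames(counts) <- c(paste0("A", seq_len(n_per_group)),
                      paste0("B", seq_len(n_per_group)))
rownames(counts) <- paste0("gene", seq_len(n_genes))
```

Because `is_de` records the truth, the calls made by any differential expression tool on `counts` can be compared against it to compute sensitivity and specificity.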

R: getting gene lengths

Posted on 2022-02-07
Symbols count in article: 2.5k Reading time ≈ 2 mins.

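When computing gene expression values such as TPM or RPKM/FPKM, we usually need the length of each gene in addition to its counts. The gene length used here is not the full length of the gene on the genome. In gene expression analysis, "gene length" usually refers to the length of the mature transcript, that is, the sequence without introns. Simply subtracting a gene's chromosomal start coordinate from its end coordinate therefore does not give the transcript length. There are currently several definitions of gene length, including:

1. The longest transcript of the gene;

2. The mean length of the gene's transcripts;

3. The sum of the lengths of non-overlapping exons;

4. The sum of the lengths of non-overlapping CDS sequences.

This post describes how to obtain gene lengths (the sum of non-overlapping exon lengths) in R from a GTF file.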
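The idea behind the third definition can be sketched in base R: merge the overlapping exon intervals of each gene, then sum the merged widths. This is a minimal illustration only, not necessarily how the post does it. The `exons` table, the gene IDs, and the helper `union_length()` are hypothetical, and the table stands in for exon records already parsed from a GTF file (GTF coordinates are 1-based and inclusive, hence the `+ 1` in each width). The final lines show how such lengths would enter a TPM calculation, using made-up counts.

```r
# Hypothetical exon records for two genes, as if parsed from a GTF file
exons <- data.frame(
  gene_id = c("geneA", "geneA", "geneA", "geneB"),
  start   = c(100, 150, 400, 10),
  end     = c(200, 300, 500, 60)
)

# Total number of bases covered by a set of intervals,
# counting overlapping regions only once
union_length <- function(start, end) {
  ord   <- order(start)
  start <- start[ord]
  end   <- end[ord]
  total     <- 0
  cur_start <- start[1]
  cur_end   <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end) {
      # Overlaps the current merged interval: extend it
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current interval and start a new one
      total     <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end   <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Gene length = sum of non-overlapping exon lengths, per gene
gene_length <- sapply(split(exons, exons$gene_id),
                      function(g) union_length(g$start, g$end))

# TPM from hypothetical counts: normalize by length, then scale to one million
gene_counts <- c(geneA = 604, geneB = 51)
rate <- gene_counts / gene_length[names(gene_counts)]
tpm  <- rate / sum(rate) * 1e6
```

For geneA, the exons 100-200 and 150-300 merge into 100-300 (201 bases), which together with 400-500 (101 bases) gives a length of 302 rather than the 353 obtained by naively summing the three exons.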