Posts by Tags

AlexNet

BERT

Paper Reading: BERT

2 minute read

Published:

Reading BERT | Pre-training of Deep Bidirectional Transformers for Language Understanding

BPE

CNN

Convolution Computation

1 minute read

Published:

Convolution Computation

CV

Convolution Computation

1 minute read

Published:

Convolution Computation

ConvNet

cs231n_cnn_3

less than 1 minute read

Published:

CS231n_cnn: 3. Transfer Learning

cs231n_cnn_2

less than 1 minute read

Published:

CS231n_cnn: 2. Visualizing what ConvNets learn

cs231n_cnn_1

7 minute read

Published:

CS231n_cnn: 1. Convolutional Neural Network

Data Preprocessing

cs231n_6

5 minute read

Published:

CS231n: 6. Setting up the data and the model

Deep Learning

MoE: Principles and Implementation

less than 1 minute read

Published:

MoE: mixture of experts

  • Stack the experts, compute each expert's output for the token, then take a weighted sum of the results.
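The excerpt above can be sketched in a few lines of NumPy (a toy illustration; the function and variable names are mine, not from the post — experts are plain linear maps and the router weights are a softmax over per-expert logits):

```python
import numpy as np

def moe_forward(x, expert_weights, router_logits):
    """Toy dense MoE layer: weight each expert's output by softmax router scores."""
    # Stack the experts' outputs for one token x: shape (n_experts, d)
    expert_outputs = np.stack([x @ W for W in expert_weights])
    gates = np.exp(router_logits - router_logits.max())
    gates /= gates.sum()                                 # softmax over experts
    return (gates[:, None] * expert_outputs).sum(axis=0)  # weighted sum

rng = np.random.default_rng(0)
d, n_experts = 4, 3
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
token = rng.normal(size=d)
out = moe_forward(token, experts, rng.normal(size=n_experts))
```

Real MoE layers route each token to only the top-k experts for sparsity; this dense version just shows the stack-then-weight idea.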

Paper Reading: BERT

2 minute read

Published:

Reading BERT | Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper Reading: CCV

less than 1 minute read

Published:

Reading CCV | Cyclic Context Verification for In-context Medical Image Segmentation

LoRA: Principles and Implementation

1 minute read

Published:

LoRA: low-rank adaptation

  • Use the product of two small matrices to represent the change in the weight matrix after full fine-tuning.
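That one-liner can be made concrete with a toy NumPy sketch (shapes and names are illustrative, not from the post): the update dW = B @ A has rank at most r, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2                 # rank r << min(d_out, d_in)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init so dW starts at 0

def lora_forward(x):
    # During training, keep W frozen and just add the low-rank path.
    return W @ x + B @ (A @ x)

B = rng.normal(size=(d_out, r))          # pretend B was updated by training
x = rng.normal(size=d_in)
merged = (W + B @ A) @ x                 # at inference, dW can be merged into W
```

Because `lora_forward(x)` equals `(W + B @ A) @ x`, the adapter adds no inference cost once merged.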

Paper Reading: GPT-2.0

1 minute read

Published:

Reading GPT-2.0 | Language Models are Unsupervised Multitask Learners

Paper Reading: DeepSeek-R1

2 minute read

Published:

Reading DeepSeek-R1 | Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper Reading: GPT-1.0

22 minute read

Published:

Reading GPT-1.0 | Improving Language Understanding by Generative Pre-Training

cs231n_1

2 minute read

Published:

CS231n: 1. Introduction & KNN & Data Split

Convolution Computation

1 minute read

Published:

Convolution Computation

GPT-1.0

Paper Reading: GPT-1.0

22 minute read

Published:

Reading GPT-1.0 | Improving Language Understanding by Generative Pre-Training

GPT-2.0

Paper Reading: GPT-2.0

1 minute read

Published:

Reading GPT-2.0 | Language Models are Unsupervised Multitask Learners

cs231n_7

6 minute read

Published:

CS231n: 7. Learning the parameters

Image Classification

cs231n_1

2 minute read

Published:

CS231n: 1. Introduction & KNN & Data Split

LLMs

MoE: Principles and Implementation

less than 1 minute read

Published:

MoE: mixture of experts

  • Stack the experts, compute each expert's output for the token, then take a weighted sum of the results.

LoRA: Principles and Implementation

1 minute read

Published:

LoRA: low-rank adaptation

  • Use the product of two small matrices to represent the change in the weight matrix after full fine-tuning.

Paper Reading: GPT-2.0

1 minute read

Published:

Reading GPT-2.0 | Language Models are Unsupervised Multitask Learners

Paper Reading: GPT-1.0

22 minute read

Published:

Reading GPT-1.0 | Improving Language Understanding by Generative Pre-Training

LLMs: Seq2seq

5 minute read

Published:

Sequence to Sequence Learning with Neural Networks

Linear Classification

cs231n_2

3 minute read

Published:

CS231n: 2. Linear Classifiers, SVM loss, Softmax

LoRA

LoRA: Principles and Implementation

1 minute read

Published:

LoRA: low-rank adaptation

  • Use the product of two small matrices to represent the change in the weight matrix after full fine-tuning.

MLP

MoE

MoE: Principles and Implementation

less than 1 minute read

Published:

MoE: mixture of experts

  • Stack the experts, compute each expert's output for the token, then take a weighted sum of the results.

Multi-Modal

NLP

Paper Reading: BERT

2 minute read

Published:

Reading BERT | Pre-training of Deep Bidirectional Transformers for Language Understanding

Neural Network

cs231n_8

3 minute read

Published:

CS231n: 8. Minimal Neural Network Case

Neural Networks

cs231n_5

3 minute read

Published:

CS231n: 5. Architecture, ReLU, overfitting

Optimization

cs231n_3

2 minute read

Published:

CS231n: 3. Optimization: Stochastic Gradient Descent

Parameter Sharing

cs231n_cnn_1

7 minute read

Published:

CS231n_cnn: 1. Convolutional Neural Network

Parameter updates

cs231n_7

6 minute read

Published:

CS231n: 7. Learning the parameters

Pooling

cs231n_cnn_1

7 minute read

Published:

CS231n_cnn: 1. Convolutional Neural Network

ReLU

cs231n_5

3 minute read

Published:

CS231n: 5. Architecture, ReLU, overfitting

Regularization

cs231n_6

5 minute read

Published:

CS231n: 6. Setting up the data and the model

ResNet

Paper Reading: ResNet

1 minute read

Published:

Reading ResNet | Deep Residual Learning for Image Recognition

SGD

cs231n_3

2 minute read

Published:

CS231n: 3. Optimization: Stochastic Gradient Descent

SVM

cs231n_2

3 minute read

Published:

CS231n: 2. Linear Classifiers, SVM loss, Softmax

Seq2seq

Softmax

cs231n_2

3 minute read

Published:

CS231n: 2. Linear Classifiers, SVM loss, Softmax

Temperature

Sampling

1 minute read

Published:

sample

  • Goal of sampling: when generating text, pick high-probability candidates from the model's predictions, avoiding the incoherent sentences caused by randomly selecting low-probability tokens, while still keeping some diversity.
  • First use temperature to adjust the smoothness of the distribution, then use top-k + top-p to limit the candidate set.
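The two-step recipe in the bullets can be sketched as follows (a toy NumPy implementation; the function name and defaults are illustrative, not from the post): temperature first reshapes the softmax, then top-k and top-p prune the candidate set before drawing:

```python
import numpy as np

def sample_next(logits, temperature=0.8, top_k=5, top_p=0.9, rng=None):
    """Temperature -> top-k -> top-p (nucleus) sampling over next-token logits."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                            # temperature-scaled softmax
    order = np.argsort(probs)[::-1]                 # tokens by descending probability
    keep = order[:top_k]                            # top-k cutoff
    cum = np.cumsum(probs[keep])
    keep = keep[: np.searchsorted(cum, top_p) + 1]  # smallest prefix with mass >= top_p
    p = probs[keep] / probs[keep].sum()             # renormalize the survivors
    return int(rng.choice(keep, p=p))

logits = np.array([3.0, 2.0, 1.0, 0.0, -1.0])
tok = sample_next(logits, rng=np.random.default_rng(0))
```

Lower temperature sharpens the distribution (more greedy); higher temperature flattens it (more diverse); top-k/top-p then cap how far down the tail the sampler may reach.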

Tokenizer

Top-k

Sampling

1 minute read

Published:

sample

  • Goal of sampling: when generating text, pick high-probability candidates from the model's predictions, avoiding the incoherent sentences caused by randomly selecting low-probability tokens, while still keeping some diversity.
  • First use temperature to adjust the smoothness of the distribution, then use top-k + top-p to limit the candidate set.

Top-p

Sampling

1 minute read

Published:

sample

  • Goal of sampling: when generating text, pick high-probability candidates from the model's predictions, avoiding the incoherent sentences caused by randomly selecting low-probability tokens, while still keeping some diversity.
  • First use temperature to adjust the smoothness of the distribution, then use top-k + top-p to limit the candidate set.

VQA

Visualization

cs231n_cnn_2

less than 1 minute read

Published:

CS231n_cnn: 2. Visualizing what ConvNets learn

Weight Initialization

cs231n_6

5 minute read

Published:

CS231n: 6. Setting up the data and the model

backpropagation

cs231n_4

2 minute read

Published:

CS231n: 4. Backpropagation

chain rule

cs231n_4

2 minute read

Published:

CS231n: 4. Backpropagation

cs231n

cs231n_1

2 minute read

Published:

CS231n: 1. Introduction & KNN & Data Split

filter

How to Read a Paper

less than 1 minute read

Published:

How to Read a Paper

fine-tuning

cs231n_cnn_3

less than 1 minute read

Published:

CS231n_cnn: 3. Transfer Learning

gradient

Paper Reading: ResNet

1 minute read

Published:

Reading ResNet | Deep Residual Learning for Image Recognition

linear classifier

cs231n_8

3 minute read

Published:

CS231n: 8. Minimal Neural Network Case

loss/acc monitor

cs231n_7

6 minute read

Published:

CS231n: 7. Learning the parameters

mini-batch

cs231n_3

2 minute read

Published:

CS231n: 3. Optimization: Stochastic Gradient Descent

multi-head

overfitting

Paper Reading: ResNet

1 minute read

Published:

Reading ResNet | Deep Residual Learning for Image Recognition

cs231n_5

3 minute read

Published:

CS231n: 5. Architecture, ReLU, overfitting

pretrain

cs231n_cnn_3

less than 1 minute read

Published:

CS231n_cnn: 3. Transfer Learning

research

How to Read a Paper

less than 1 minute read

Published:

How to Read a Paper

scaled dot-product attention

select

How to Read a Paper

less than 1 minute read

Published:

How to Read a Paper

seq2seq

LLMs: Seq2seq

5 minute read

Published:

Sequence to Sequence Learning with Neural Networks

sigmoid

cs231n_4

2 minute read

Published:

CS231n: 4. Backpropagation

spiral dataset

cs231n_8

3 minute read

Published:

CS231n: 8. Minimal Neural Network Case

t-SNE

cs231n_cnn_2

less than 1 minute read

Published:

CS231n_cnn: 2. Visualizing what ConvNets learn