@yangyueren
Please see ANSWER.md.

Contributor

@archibate archibate left a comment

Thanks for being the first to submit the assignment!

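  • Completing the basic assignment requirements: 42/50 points
  • Explaining the answers in your own words in ANSWER.md: 23/25 points
  • Consistent code formatting and cross-platform support: 4/5 points
  • Original ideas of your own: 11/20 points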

// These two are temporaries; what can be optimized here? 5 points
Matrix Rt, RtA;
// ans: Change them to static variables and preallocate their storage.
static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
Suggested change
static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
static thread_local Matrix Rt, RtA;

I think it is fine for them to start out empty. thread_local ensures nothing goes wrong if several threads call this function.
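
The suggestion above can be illustrated with a minimal sketch. `Buffer2D` and `scratch_storage` are hypothetical names, standing in for the course's `Matrix` type, whose actual interface is not shown here. The sketch assumes the buffer is resized on each call, so an initially empty object allocates once and then reuses its storage, while `thread_local` gives each thread its own instance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the course's Matrix type (an assumption).
struct Buffer2D {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;

    // Resizing to the same or a smaller size keeps the existing allocation.
    void reshape(std::size_t new_nx, std::size_t new_ny) {
        nx = new_nx;
        ny = new_ny;
        data.resize(nx * ny);
    }
};

// Returns the address of the calling thread's scratch storage.
const float *scratch_storage(std::size_t nx, std::size_t ny) {
    // One instance per thread, constructed empty on first use.
    static thread_local Buffer2D scratch;
    scratch.reshape(nx, ny);
    assert(scratch.data.size() == nx * ny);
    return scratch.data.data();
}
```

Calling `scratch_storage(64, 64)` twice on the same thread should return the same address, because the second `reshape` does not need to reallocate.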

for(int i=0; i<nx; i+=32){
for(int t=0; t<nt; t++){
for(int i_block=i; i_block<i+32; i_block++){
out(i,j) += lhs(i_block, t) * rhs(t, j);
Suggested change
out(i,j) += lhs(i_block, t) * rhs(t, j);
out(i_block,j) += lhs(i_block, t) * rhs(t, j);

Missed changing one index?
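
For context, here is a sketch of the blocked loop with the corrected index. `Mat` is a hypothetical stand-in for the course's `Matrix`, assumed to store the x index contiguously. The enclosing `j` loop and the `std::min` clamp for sizes that are not a multiple of 32 are additions for illustration and do not appear in the diff excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Matrix: the x index is contiguous in memory.
struct Mat {
    std::size_t nx, ny;
    std::vector<float> data;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
    float operator()(std::size_t x, std::size_t y) const {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
};

// out(x, j) = sum over t of lhs(x, t) * rhs(t, j), with x tiled in blocks of 32.
void matmul_blocked(Mat &out, const Mat &lhs, const Mat &rhs) {
    const std::size_t nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    assert(rhs.nx == nt && out.nx == nx && out.ny == ny);
    std::fill(out.data.begin(), out.data.end(), 0.0f);
    constexpr std::size_t kBlock = 32;
    for (std::size_t j = 0; j < ny; j++) {
        for (std::size_t i = 0; i < nx; i += kBlock) {
            const std::size_t i_end = std::min(i + kBlock, nx);
            for (std::size_t t = 0; t < nt; t++) {
                for (std::size_t i_block = i; i_block < i_end; i_block++) {
                    // Both out and lhs are indexed by i_block, not by the block start i.
                    out(i_block, j) += lhs(i_block, t) * rhs(t, j);
                }
            }
        }
    }
}
```

With `out(i, j)` on the left-hand side, every element of a block would accumulate into the block's first row, which is why the one remaining `i` matters.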

for (int y = 0; y < ny; y++) {
float val = wangsrng(x, y).next_float();
out(x, y) = val;
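// ans: The matrix is stored contiguously along the x axis, but the inner loop runs over y, so memory accesses are strided, which is bad for the cache;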
10
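
As an illustration of the answer being graded here, this sketch makes x the inner loop so that consecutive iterations touch adjacent memory. `Mat` is a hypothetical stand-in for `Matrix` with x contiguous, and `value_at` is a hypothetical deterministic value source used in place of the course's `wangsrng` helper, whose interface is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Matrix: the x index is contiguous in memory.
struct Mat {
    std::size_t nx, ny;
    std::vector<float> data;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
};

// Hypothetical value source (an assumption; the original uses a random generator).
float value_at(std::size_t x, std::size_t y) {
    return static_cast<float>(x) + 0.5f * static_cast<float>(y);
}

// y is the outer loop and x the inner loop, so writes walk memory sequentially.
void fill_contiguous(Mat &out) {
    for (std::size_t y = 0; y < out.ny; y++) {
        for (std::size_t x = 0; x < out.nx; x++) {
            out(x, y) = value_at(x, y);
        }
    }
}
```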

out(y, x) = in(x, y);
}
}
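// ans: out is accessed contiguously but in is accessed with a stride, and the data does not fit in the cache. This should be changed to a blocked transpose.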
15
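
The blocked transpose mentioned in the answer could look roughly like the sketch below. `Mat` is again a hypothetical stand-in for `Matrix` with x contiguous, and the tile size of 32 is an assumed value rather than one taken from the submission. Within a tile, both the strided reads from `in` and the contiguous writes to `out` stay within a small working set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Matrix: the x index is contiguous in memory.
struct Mat {
    std::size_t nx, ny;
    std::vector<float> data;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
    float operator()(std::size_t x, std::size_t y) const {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
};

// out(y, x) = in(x, y), processed one kBlock x kBlock tile at a time.
void transpose_blocked(Mat &out, const Mat &in) {
    const std::size_t nx = in.nx, ny = in.ny;
    assert(out.nx == ny && out.ny == nx);
    constexpr std::size_t kBlock = 32;  // assumed tile size
    for (std::size_t xb = 0; xb < nx; xb += kBlock) {
        for (std::size_t yb = 0; yb < ny; yb += kBlock) {
            const std::size_t x_end = std::min(xb + kBlock, nx);
            const std::size_t y_end = std::min(yb + kBlock, ny);
            for (std::size_t x = xb; x < x_end; x++) {
                // The inner loop over y writes out contiguously.
                for (std::size_t y = yb; y < y_end; y++) {
                    out(y, x) = in(x, y);
                }
            }
        }
    }
}
```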

out(x, y) = 0; // Is manual initialization necessary? 5 points
for (int t = 0; t < nt; t++) {
out(x, y) += lhs(x, t) * rhs(t, y);
// ans: lhs is accessed with a stride, rhs is accessed contiguously, and out stays fixed, which prevents vectorization.
9; one index was left unchanged.

TICK(matrix_RtAR);
// These two are temporaries; what can be optimized here? 5 points
Matrix Rt, RtA;
// ans: Change them to static variables and preallocate their storage.
3; thread_local should be added, and there is no need to initialize the size.

// #pragma omp parallel for collapse(2)
// for (int y = 0; y < ny; y++) {
// for (int x = 0; x < nx; x++) {
// out(x, y) = 0; // Is manual initialization necessary? 5 points
5
