Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with portalCacheFlush in memcpy example? #120

Open
psrawke opened this issue Jun 6, 2016 · 9 comments
Open

Issue with portalCacheFlush in memcpy example? #120

psrawke opened this issue Jun 6, 2016 · 9 comments

Comments

@psrawke
Copy link

psrawke commented Jun 6, 2016

Hi Jamey,

In case of memcpy, the program finishes with status 0 (successfully) unlike previously with older images. But when I print the src and dst array in software's done function, some of the locations are overwritten (or initialized as 0). If I initialize, the memory after 'portalCacheFlush' src and dst are initialized properly. Can that be happening because of 'portalCacheFlush'?

Default order for memory initialization in memcpy example:
In this case, some parts of the src and dst array are not initialzed properly.

  srcAlloc = portalAlloc(alloc_sz, 0);
  dstAlloc = portalAlloc(alloc_sz, 0);

  srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
  dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

  for (int i = 0; i < numWords; i++){
     srcBuffer[i] = i;
     dstBuffer[i] = 0x5a5abeef;
  }
  portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
  portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

  unsigned int ref_srcAlloc = dma->reference(srcAlloc);
  unsigned int ref_dstAlloc = dma->reference(dstAlloc);

Following modifications in the order solves the issue:

  srcAlloc = portalAlloc(alloc_sz, 0);
  dstAlloc = portalAlloc(alloc_sz, 0);

  srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
  dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

  portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
  portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

  unsigned int ref_srcAlloc = dma->reference(srcAlloc);
  unsigned int ref_dstAlloc = dma->reference(dstAlloc);

  for (int i = 0; i < numWords; i++){
     srcBuffer[i] = i;
     dstBuffer[i] = 0x5a5abeef;
  }
@jameyhicks
Copy link
Member

What values are being overwritten to which memory locations?

On Mon, Jun 6, 2016 at 9:26 AM Prasanna Rawke [email protected]
wrote:

Hi Jamey,

In case of memcpy, the program finishes with status 0 (successfully)
unlike previously with older images. But when I print the src and dst array
in software's done function, some of the locations are overwritten (or
initialized as 0). If I initialize, the memory after 'portalCacheFlush' src
and dst are initialized properly. Can that be happening because of '
portalCacheFlush'?

Default order for memory initialization in memcpy example:
In this case, some parts of the src and dst array are not initialzed
properly.

srcAlloc = portalAlloc(alloc_sz, 0);
dstAlloc = portalAlloc(alloc_sz, 0);

srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

for (int i = 0; i < numWords; i++){
srcBuffer[i] = i;
dstBuffer[i] = 0x5a5abeef;
}
portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

unsigned int ref_srcAlloc = dma->reference(srcAlloc);
unsigned int ref_dstAlloc = dma->reference(dstAlloc);

Following modifications in the order solves the issue:

srcAlloc = portalAlloc(alloc_sz, 0);
dstAlloc = portalAlloc(alloc_sz, 0);

srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

unsigned int ref_srcAlloc = dma->reference(srcAlloc);
unsigned int ref_dstAlloc = dma->reference(dstAlloc);

for (int i = 0; i < numWords; i++){
srcBuffer[i] = i;
dstBuffer[i] = 0x5a5abeef;
}


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#120, or mute the
thread
https://github.com/notifications/unsubscribe/ACU3syz_md6HY0WYwCjsKUMusp4zw9J7ks5qJCAZgaJpZM4Iu5Kz
.

@psrawke
Copy link
Author

psrawke commented Jun 6, 2016

The values being overwritten vary with each run of the program. But they were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of srcBuffer were always replaced while the other block replaced was around indices 80-100 and varies with each run.

@jameyhicks
Copy link
Member

Please post the code you added to print out the values and then a
transcript showing the incorrect values.

On Mon, Jun 6, 2016 at 10:52 AM Prasanna Rawke [email protected]
wrote:

The values being overwritten vary with each run of the program. But they
were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of
srcBuffer were always replaced while the other block replaced was around
indices 80-100 and varies with each run.


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#120 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ACU3sy-OHlCW6zYPpa8-upfnMBj-yEnrks5qJDQwgaJpZM4Iu5Kz
.

@psrawke
Copy link
Author

psrawke commented Jun 6, 2016

I have modified done method of MemcpyIndication class as:

virtual void done() {
    sem_post(&done_sem);
    fprintf(stderr, "done\n");
    finished = true;
    memcmp_fail = memcmp(srcBuffer, dstBuffer, numWords*sizeof(unsigned int));
    if (1) {
        int *s = (int *)srcBuffer;
        int *d = (int *)dstBuffer;
        for (int i = 0; i < numWords; i++) {

            if (s[i] != i || d[i] != i)
                fprintf(stderr, "mismatch %d %d %d\n", i, s[i], d[i]);
            // else fprintf(stderr, "   match %d %d %d\n", i, s[i], d[i]);
         }
    }
    fprintf(stderr, "memcmp=%x\n", memcmp_fail);
    sem_post(&memcmp_sem);
  }

Also, I have reduced the problem size to 256 as follows:

#ifndef SIMULATION
int numWords = 1 << 8;
#else
int numWords = 1 << 8;
#endif

These are the transcripts for the program for 3 runs:

memcpy_log1.txt
memcpy_log2.txt
memcpy_log3.txt

@jameyhicks
Copy link
Member

I am still unable to reproduce this problem, but I added code to verify src[i] == dst[i] == i

@psrawke
Copy link
Author

psrawke commented Jun 7, 2016

When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?

@jameyhicks
Copy link
Member

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence. In
the case of Zynq, DMA is using uncached reads and writes and so we have to
use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also
test it on a VC707 attached to an x86 system, but in that case the CPU
snoops DRAM so the manual cache flush and invalidation are not required.
After adding the code that checks that the value at offset i of src and dst
both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel, and
sometimes it is not flushing dirty cache lines. That would explain you you
have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke [email protected]
wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of simulation,
output is correct. So it might be a board specific issue. Which board are
you working on?


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#120 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz
.

@jameyhicks
Copy link
Member

I have reproduced the issue by doing as you suggested -- shortening the
length to 256 words.

On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks [email protected] wrote:

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence. In
the case of Zynq, DMA is using uncached reads and writes and so we have to
use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also
test it on a VC707 attached to an x86 system, but in that case the CPU
snoops DRAM so the manual cache flush and invalidation are not required.
After adding the code that checks that the value at offset i of src and dst
both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel,
and sometimes it is not flushing dirty cache lines. That would explain you
you have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke [email protected]
wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of
simulation, output is correct. So it might be a board specific issue. Which
board are you working on?


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#120 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz
.

@jameyhicks
Copy link
Member

I cannot explain this, but this version passes repeatedly:
if (1)
for (int i = 0; i < numWords; i++){
srcBuffer[i] = 0xbabafeed;
dstBuffer[i] = 0x5a5abeef;
}

if (1) {
portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);
}

for (int i = 0; i < numWords; i++){
srcBuffer[i] = i;
dstBuffer[i] = 0x5a5abeef;
}

if (0) {
portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);
fprintf(stderr, "Main::flush and invalidate complete\n");
}

On Tue, Jun 7, 2016 at 9:06 AM Jamey Hicks [email protected] wrote:

I have reproduced the issue by doing as you suggested -- shortening the
length to 256 words.

On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks [email protected] wrote:

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence.
In the case of Zynq, DMA is using uncached reads and writes and so we have
to use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also
test it on a VC707 attached to an x86 system, but in that case the CPU
snoops DRAM so the manual cache flush and invalidation are not required.
After adding the code that checks that the value at offset i of src and dst
both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel,
and sometimes it is not flushing dirty cache lines. That would explain you
you have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke [email protected]
wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of
simulation, output is correct. So it might be a board specific issue. Which
board are you working on?


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#120 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants