What version of the product are you using? On what operating system?
PFAC 1.0, on RHEL 6
Please provide any additional information below.
I measured the time taken by PFAC_matchFromHostReduce and by the equivalent
sequence of steps built around PFAC_matchFromDeviceReduce. For a 100 MB input
string, both approaches take about the same time to complete.
Timing for PFAC_matchFromHostReduce: 56 ms
Timing for equivalent steps using PFAC_matchFromDeviceReduce:
cudaMalloc: 0.3 ms
cudaMemcpy(d_input_string, h_input_string, input_size, cudaMemcpyHostToDevice): 18 ms
PFAC_matchFromDeviceReduce: 26 ms
cudaMemcpy of d_pos and d_match_result back to CPU: 0.3 ms
cudaFree of d_input_string, d_pos and d_match_result: 11 ms
Total: 57 ms
Original issue reported on code.google.com by
hja...@ymail.com on 29 Apr 2011 at 1:49