You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The discussion there resulted in this demo code to run a simple hash function FNV1a-32 from CPU rust code to on the GPU with arrayfire-rust. It presently for simplicity has a small static input to process and no additional logic for iterating through an input larger than available memory(permutation generator) to halt processing once a target result is matched.
@9prady9 has pointed out some minor changes to improve performance and requested more code regarding how performance was being measured/profiled(probably another good example in itself to include). He then also suggested that performance for this type of computation would be better suited with a custom kernel tuned for it than relying on ArrayFire API JIT kernel alone(another useful example that could derive from this one).
This could become a nice example that further examples could expand upon to cover how one approaches certain tasks from CPU to GPU, along with better covering areas of the ArrayFire API that may not be understood as well from API documentation alone(data transfer between GPU and host, control flow, custom kernels, measuring time taken with JIT in mind).
Click for example code
use arrayfire as af;fnmain(){// af::set_backend(Backend::CUDA);//Backend::OPENCL;//Backend::DEFAULT);
af::init();
af::info();/////////////////////////////////// Generate string inputs to hash/////////////////////////////////// Inputs are collapsed into a sequence/stream of bytes, single array dimensionlet test_strings = vec!["hello","howdy","hallo"];let test_bytes = test_strings.iter().flat_map(|s| s.as_bytes()).cloned().collect::<Vec<u8>>();// println!("test_strings as byte array: {:?}", test_bytes);// [104, 101, 108, 108, 111, 104, 111, 119, 100, 121, 104, 97, 108, 108, 111]///////////////////////////// Initialize values for AF///////////////////////////// Used to size AF array dimensionslet input_len = test_strings[0].len()asu64;let num_inputs = test_strings.len()asu64;// Convert to an ArrayFire 2D array (Column Major)// 5x3 col(length of bytes for input) x row(number of inputs)let inputs_dims = af::Dim4::new(&[input_len, num_inputs,1,1]);// [5, 3, 1, 1]let inputs = af::Array::new(&test_bytes, inputs_dims);// Each input now has it's bytes in it's own column// print(&inputs);// [5 3 1 1]// 104 104 104 // 101 111 97 // 108 119 108 // 108 100 108 // 111 121 111 let fnv_dims = af::Dim4::new(&[1, num_inputs,1,1]);// [1, 3, 1, 1]////////////////////////////// Compute hashes for inputs////////////////////////////let hashes = fnv1a32_gpu(&inputs, fnv_dims);// print(&hashes);// [1 3 1 1]// 1335831723 3497709110 4182196071// println!("hashes: {:x} | {:x} | {:x}", 1_335_831_723_u32, 3_497_709_110_u32, 4_182_196_071_u32);// 4f9f2cab | d07ace36 | f9473f67///////////////////////// Find matching hashes///////////////////////// Creates an AF array filled with the same target value to match for,let target_hashes: af::Array<u32> = af::constant(0x4f9f2cabasu32, fnv_dims);let matches = check_for_matches(hashes, target_hashes);// println!("matched: {:?}", match_data);// matched: [1335831723]for matched in matches {println!("Matched the hash: {:x}",&matched);};// Matched the hash: 4f9f2cab}// FNV-1a (32-bit) hashing algorithmconstFNV_OFFSET:u32 = 2_166_136_261;constFNV_PRIME:u32 = 16_777_619;pubfnfnv1a32_gpu(inputs:&af::Array<u8>,fnv_dims: af::Dim4) -> af::Array<u32>{let input_len = inputs.dims().get()[0];letmut hashes = af::constant(FNV_OFFSET, fnv_dims);let prime = af::constant(FNV_PRIME, fnv_dims);// Iterates through all inputs(columns) in parallel a byte each at a time(row)for row_index in0..input_len {
hashes = (hashes ^ af::row(&inputs, row_index))*′}
hashes
}fnfilter_matches(array:&af::Array<u32>,bools:&af::Array<bool>) -> af::Array<u32>{let indices = &af::locate(bools);letmut idxr = af::Indexer::default();
idxr.set_index(indices,0,None);
af::index_gen(array, idxr)}fncheck_for_matches(hashes: af::Array<u32>,target_hashes: af::Array<u32>) -> Vec<u32>{// If we computed the same hash value, then `eq()` will let us know which values matchedlet is_matched: af::Array<bool> = af::eq(&hashes,&target_hashes,false);// Only keep results that were matchedlet result = filter_matches(&hashes,&is_matched);// Transfer the matches to the host CPU to accesslet length = result.elements()asusize;letmut match_data:Vec<u32> = vec![0; length];
result.host::<u32>(&mut match_data);
match_data
}
The text was updated successfully, but these errors were encountered:
Description
From the original issue: #52 (comment)
The discussion there resulted in this demo code to run a simple hash function FNV1a-32 from CPU rust code to on the GPU with
arrayfire-rust
. It presently for simplicity has a small static input to process and no additional logic for iterating through an input larger than available memory(permutation generator) to halt processing once a target result is matched.@9prady9 has pointed out some minor changes to improve performance and requested more code regarding how performance was being measured/profiled(probably another good example in itself to include). He then also suggested that performance for this type of computation would be better suited with a custom kernel tuned for it than relying on ArrayFire API JIT kernel alone(another useful example that could derive from this one).
This could become a nice example that further examples could expand upon to cover how one approaches certain tasks from CPU to GPU, along with better covering areas of the ArrayFire API that may not be understood as well from API documentation alone(data transfer between GPU and host, control flow, custom kernels, measuring time taken with JIT in mind).
Click for example code
The text was updated successfully, but these errors were encountered: