00:26:20 Heather Jones: Good morning. Please see day 2 under learn tab for today's notebook. https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/
00:32:10 Qijian: I am curious what will happen if these files are kept in the folder
00:37:57 Brian Stucky: I am getting a few questions about the "Annotations" folder, so just in case anyone runs into trouble, here is an archive of the cleaned up "Annotations" folder. (Assuming I did everything correctly! :-) )
00:38:13 Brian Stucky: You can download and extract and it should be ready to go.
00:38:49 Yanbo Huang: What are the DOS commands to "delete"? I use "rm" and "rmdir" OK?
00:38:54 Surya Saha: Was the zoom video from yesterday posted somewhere?
00:39:10 Heather Jones: The video will be posted at a later time
00:39:49 Yanbo Huang: What are the commands for move (rename)?
00:40:01 Brian Stucky: In DOS, I think it is "ren".
00:41:15 Qijian: Are those *.mat files in the subfolders of the Annotations folder? I don't see those files under the Annotations folder
00:41:21 Nishan Bhattarai: may need to open DOS (CMD) as administrator?
00:41:49 Brian Stucky: Hi, all - if you are stuck, please download and extract the cleaned up "Annotations" file I posted above. It should be ready to go.
00:42:15 Huihui Zhang: Thanks!
00:45:27 Yanbo Huang: Ok I downloaded yesterday and all the names have been changed in the systems
01:06:46 Jonathan Shao: Will we go over how to make our own annotation files
01:07:09 Scott Tsukuda: Hi Brian. I am unable to run "tar -xvf Annotations-cleaned.tar.gz" I get Error . . unrecognized archive format.
01:07:50 Brian Stucky: Hi, Scott - can you join me in the breakout room?
01:12:56 Aaron Szczepanek:
    plt.plot(ann['obj_contour'][0,:]+ann['box_coord'][0,2]-1,\
    ann['obj_contour'][1,:]+ann['box_coord'][0,0]-1,'w')
    Why is there a backslash?
    lol
01:13:04 Aaron Szczepanek: oh ok
01:14:48 WGMeikle: I get an error with the In[3] code: error: Error -3 while decompressing data: invalid distance too far back
01:17:45 Alex Styer (he/him): Why are the obj_contour coordinates floats rather than ints? Do they not correspond to a specific pixel?
01:19:53 Daniel.Brabec: When looking at the printout of the coordinates, could you explain again what the numbers represent? What is the difference between 16.5 vs 14.4 vs 11.5 vs 9.3 vs 18.3?
01:27:53 Brian Stucky: Hi, all - Some folks were having issues with extracting the tar.gz archives on Windows, so I posted a ZIP version of the cleaned annotations archive. Please download that one if you are having issues with the annotations files.
02:04:12 Heather Jones: A few notes from Kerrie:
    Yesterday’s Zoom recording and chat are now posted to the learn page.
    There was a problem yesterday with the posted answer key file being corrupted. It’s now fixed.
    If anyone else is blocked from downloading the Day2 data off Google Drive, let me know and I’ll ARS FTP it to them like I did for Tavis.
02:23:46 Scott Tsukuda: Is there a password for the video recordings?
02:24:14 Heather Jones: I think it has to be accessed on a USDA computer, I am checking for an answer now.
02:25:22 Scott Tsukuda: Ok. I am with an extramural program. I would like a way to download the video if possible.
04:23:54 Jonathan Shao: how did you know the background was that particular intensity
04:24:27 Jonathan Shao: thanks
04:27:16 WGMeikle: My feature vector values differ from yours. Is that an issue?
04:28:18 Alex Styer (he/him): Hmm I also have different values; and different from William too
04:28:19 Nishan Bhattarai: mine is also different
04:28:28 Tavis Anderson: I have the same as William.
04:28:38 Aaron Szczepanek: if gray image isn't true color why doesn't that confuse the computer?
04:28:54 Jonathan Shao: I got most of the same numbers
04:29:13 WGMeikle: Thanks!
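On the backslash question at 01:12:56: a trailing backslash is Python's explicit line continuation, so the two lines of the `plt.plot` call are read as one statement. A minimal sketch with made-up numbers (the variable names here are illustrative only, not from the notebook); inside parentheses the backslash is optional, because open brackets already continue the line:

```python
# Explicit continuation: the trailing backslash joins this line to the next one.
total_backslash = 1 + 2 + \
    3 + 4

# Implicit continuation: an open parenthesis lets the expression span lines
# without any backslash.
total_parens = (1 + 2 +
                3 + 4)

print(total_backslash, total_parens)
```

Both statements evaluate the same expression; only the way the line is broken differs.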
04:29:14 Alex Styer (he/him): Does this function do a bunch of subsetting where setting seed might matter?
04:29:18 Daniel.Brabec: Ours are different values also.
04:29:31 Aaron Szczepanek: yea wouldn't they be noticeably different than color
04:30:42 Alex Styer (he/him): Will we talk about any checks we might do to make sure that we’re not passing too many features for picking and overfitting things?
04:30:51 Jonathan Shao: will we go over what to do if we don't have a .mat file
04:31:23 Sean Kearney (USDA-ARS): My values are also different - but I get the same values every time I run (speaks to your point about being non-stochastic)
04:31:25 Alex Styer (he/him): Oh you know what I wrote over im in a previous step where you renamed HSV
04:31:26 Daniel.Brabec: Are the features supposed to evaluate the emu image?
04:31:59 Nishan Bhattarai: running the same code produces same numbers, so could be computer resolution specific
04:33:37 Jonathan Shao: your unknown won't have a .mat file. Does that affect the result
04:49:15 Sean Kearney (USDA-ARS): I tried several times and get different values each time I restart the kernel.
04:49:54 Sean Kearney (USDA-ARS): haha - no problem! I'm reading around a bit. Will post if I find anything interesting
04:56:06 Sean Kearney (USDA-ARS): Just realized that the values don't change when restarting the kernel, but the order of the GLCM props does change (so the vector 'f' looks different)! Also odd, but at least the GLCM vals are consistent...
05:10:36 Jonathan Shao: The image directory uses "image_" and the annotation folder has "annotation_" are these tags specific for glob
05:12:47 Jonathan Shao: yes
05:16:19 Sean Kearney (USDA-ARS): FYI - It will fix the inconsistency issue if we change the following line in the extract_texture_features() function:
    --from--
    GLCM_feats = {'contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation', 'ASM'}
    -- to --
    GLCM_feats = ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation', 'ASM']
05:17:16 Sean Kearney (USDA-ARS): Not sure why, but it seems to iterate randomly through a list with {} instead of []
05:17:42 Daniel.Brabec: Good find. Now numbers match.
05:18:20 Tavis Anderson: same for me now - thanks Sean.
05:21:26 Yanbo Huang: I have error message NameError: name 'extract_color_features_hsv' is not defined
05:22:34 Yanbo Huang: Thanks
05:22:48 Daniel.Brabec: How did you know the time it took to process the images?
05:23:31 Daniel.Brabec: yes, all
05:27:36 Aaron Szczepanek: in Vstack, it is adding the list of attributes to a new column?
05:28:30 Aaron Szczepanek: oh now that makes sense
05:29:10 Sean Kearney (USDA-ARS): Just to wrap up that confusing issue with {} vs. []: Apparently when you use {}, it creates a 'set' rather than a 'list'. Python iterates randomly/arbitrarily over 'sets', based on a seed set when the kernel is initiated. (see here: https://stackoverflow.com/questions/3848091/set-iteration-order-varies-from-run-to-run and the screenshot)
05:33:28 Sean Kearney (USDA-ARS): will you talk about what to do with unbalanced training data later?
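The {} versus [] point Sean raised at 05:16:19 and 05:29:10 can be sketched roughly as follows. This is only an illustration using the same six property names; the variable names are made up here and this is not the notebook's extract_texture_features() code. A list is always iterated in the order it was written, while a set of strings has no guaranteed order and may be iterated differently after a kernel restart:

```python
# The same six GLCM property names, stored as a list and as a set.
glcm_feats_list = ['contrast', 'dissimilarity', 'homogeneity',
                   'energy', 'correlation', 'ASM']
glcm_feats_set = {'contrast', 'dissimilarity', 'homogeneity',
                  'energy', 'correlation', 'ASM'}

# Iterating a list always follows the written order.
list_order = [name for name in glcm_feats_list]

# Iterating a set of strings has no guaranteed order; it can change from one
# interpreter session (kernel restart) to the next.
set_order = [name for name in glcm_feats_set]

# The members are identical either way; only the order can differ.
same_members = sorted(set_order) == sorted(glcm_feats_list)

print(list_order)
print(set_order)
print(same_members)
```

If a set is needed for some other reason, iterating over `sorted(glcm_feats_set)` is one way to get a repeatable order, so the entries of the feature vector line up between runs.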
05:40:54 Laura Boucheron:
    Xn_train,mx,mn = normalize_feature_columns(X_train)
    print(mx[0::10])
    print(mn[0::10])
    print(Xn_train[0,:])
05:42:19 Laura Boucheron: Xn_train.min()
05:42:24 Laura Boucheron: Xn_train.max()
05:42:46 Laura Boucheron: Xn_test = normalize_feature_columns(X_test,mx,mn)
05:43:08 Laura Boucheron: Xn_test.min()
05:43:12 Laura Boucheron: Xn_test.max()
05:48:01 WGMeikle: Could you send the code for: Explore the dimensionalities and values of X_train, X_test, y_train, and y_test.
05:48:16 Laura Boucheron:
    print('X_train is shape '+str(X_train.shape))
    print('X_test is shape '+str(X_test.shape))
    print('y_train is length '+str(len(y_train)))
    print('y_test is length '+str(len(y_test)))
06:21:43 Tavis Anderson: I dropped the training data down to 20% and it turns out a more interesting confusion matrix
06:32:58 Alex Styer (he/him): Getting 71% accuracy for all the arthropods (crabs, lobsters, ticks, scorpions, dragonfly, butterfly, ant)
06:33:10 Alex Styer (he/him): And unsurprisingly tends to confuse crabs and ticks haha
06:34:40 Sean Kearney (USDA-ARS): Accuracy of strawberries went down a lot when I added in beaver, brain and soccer_ball. Overall accuracy = 79% but strawberry accuracy down to 20%
06:34:48 WGMeikle: did really well with 'Motorbikes, airplanes, car_sides'
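For anyone revisiting the normalization calls Laura posted at 05:40:54 to 05:43:12, here is a rough sketch of the general idea, assuming that normalize_feature_columns does per-column min-max scaling. That is a guess from the mx/mn names and the .min()/.max() checks, not the notebook's actual code. The functions fit_minmax and apply_minmax and the demo arrays are hypothetical names made up for this illustration:

```python
import numpy as np

def fit_minmax(X):
    """Return the per-column max and min of X (hypothetical helper)."""
    return X.max(axis=0), X.min(axis=0)

def apply_minmax(X, mx, mn):
    """Scale each column of X using a previously computed max and min."""
    span = mx - mn
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(span == 0, 1.0, span)
    return (X - mn) / span

# Made-up demo data standing in for X_train and X_test.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(20, 4))
X_test_demo = rng.normal(size=(5, 4))

# The max and min come from the training data only, then get reused on the test data.
mx, mn = fit_minmax(X_train_demo)
Xn_train_demo = apply_minmax(X_train_demo, mx, mn)
Xn_test_demo = apply_minmax(X_test_demo, mx, mn)

print(Xn_train_demo.min(), Xn_train_demo.max())
print(Xn_test_demo.min(), Xn_test_demo.max())
```

Under this assumption the training features span exactly 0 to 1, while the test features can fall slightly outside that range because they are scaled with the training max and min.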