This is a fully convolutional network (FCN-8s, i.e. 8-pixel output stride) implementation on the ADE20k dataset, using TensorFlow. The implementation is largely based on the paper arXiv: Fully Convolutional Networks for Semantic Segmentation and on two implementations from other GitHub repositories: FCN.tensorflow and semantic-segmentation-pytorch.
net | dataset | competition | framework | arXiv paper |
---|---|---|---|---|
FCN8s | ADE20k | MIT Scene Parsing Benchmark (SceneParse150) | TensorFlow 1.4, Python 3.6 | arXiv: Fully Convolutional Networks for Semantic Segmentation |
- Download and extract the dataset (a scripted version of this step follows after this list).
  - Download the .zip dataset from http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
  - Put the .zip file under ./Data_zoo/MIT_SceneParsing/ and extract it there.
- Start training: simply run `FCN_train.py`.
- To test the model, simply run `FCN_test.py`. It tests the first 100 validation images by default. The validation set contains 2000 images, so if you want to test more images, simply modify the variable `TEST_NUM`, e.g. to 1000.
- To use the model to infer images:
  - Put the .jpg images you want to infer in the folder ./infer.
  - Make sure there is a folder ./output (to store the results).
  - Run `FCN_infer.py`; it will process all .jpg images under ./infer and put the predicted annotations under ./output.
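If you prefer to script the dataset step above, here is a minimal Python sketch using only the standard library (a convenience sketch, not part of the repo; the manual steps work just as well):

```python
import os
import urllib.request
import zipfile

URL = "http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip"
DEST = "./Data_zoo/MIT_SceneParsing/"

os.makedirs(DEST, exist_ok=True)
zip_path = os.path.join(DEST, "ADEChallengeData2016.zip")
urllib.request.urlretrieve(URL, zip_path)  # large download, may take a while
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(DEST)  # extracts next to the .zip, as the scripts expect
```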
- Each batch contains 2 images. If the sizes of the 2 images/annotations differ, the smaller image/annotation is enlarged so that both have the same size. The enlarged size is rounded up so that it is divisible by 32, because the feature maps are downsampled by a factor of 32 at most inside the FCN network (see the sketch after this list).
- I use the function `scipy.misc.imresize` to resize. Its parameter `interp` (interpolation) is `"bilinear"` for resizing images and `"nearest"` for resizing annotations, so that annotation pixels remain valid class indices.
- The optimizer is Adam with a learning rate of 1e-5.
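A minimal sketch of the batch-resizing logic from the points above (the function names are my own, not the repo's; it assumes an older SciPy, since `scipy.misc.imresize` was removed in SciPy 1.3):

```python
import numpy as np
import scipy.misc  # imresize needs SciPy < 1.3

def round_up_to_32(x):
    # The network downsamples by a factor of 32 at most, so the input
    # size must be divisible by 32.
    return int(np.ceil(x / 32.0)) * 32

def unify_batch(images, annotations):
    # Enlarge both members of a 2-image batch to one common size.
    h = round_up_to_32(max(im.shape[0] for im in images))
    w = round_up_to_32(max(im.shape[1] for im in images))
    images = [scipy.misc.imresize(im, (h, w), interp="bilinear")
              for im in images]
    # Nearest-neighbour keeps annotation pixels as valid class indices.
    annotations = [scipy.misc.imresize(an, (h, w), interp="nearest")
                   for an in annotations]
    return np.stack(images), np.stack(annotations)
```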
When you run `FCN_train.py`, you will see:
- Training takes about 7~8 hours on an Nvidia GeForce GTX 1080 11GB.
- The training loss is hard to improve further once it is around 0.8 ~ 1.5. This may be a limitation of FCN.
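For reference, the training setup described above boils down to something like this TF 1.x snippet (`logits` and `labels` stand for the network output and ground-truth tensors built elsewhere in the graph; these names are assumptions, not the repo's):

```python
import tensorflow as tf

# logits: network output of shape [batch, h, w, 151] (assumed name)
# labels: ground-truth annotation of shape [batch, h, w] (assumed name)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                   labels=labels))
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)
```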
When you run `FCN_test.py`, it will process all the validation images and then print the metrics.
You can uncomment lines 130~143 of `FCN_test.py` to see some well-processed results.
Just put the .jpg images in ./infer, run `FCN_infer.py`, and the predicted annotations will be written to ./output.
The metrics from testing the first 100 validation images are:
pixel_accuracy | mean_accuracy | mean IU (mean IoU) | frequency weighted IU |
---|---|---|---|
0.6739 | 0.4332 | 0.3644 | 0.5024 |
(There are 151 classes, where class index 0 is "others". You only need to care about the accuracy on the remaining 150 classes.)
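For reference, the four metrics above are typically computed from a confusion matrix, following the definitions in the FCN paper; here is a numpy sketch (the function name and structure are mine, not necessarily the repo's code):

```python
import numpy as np

def segmentation_metrics(hist):
    # hist[i, j] = number of pixels with ground-truth class i that were
    # predicted as class j (a 151 x 151 confusion matrix here).
    tp = np.diag(hist).astype(np.float64)       # per-class true positives
    gt = hist.sum(axis=1).astype(np.float64)    # per-class ground-truth pixels
    pred = hist.sum(axis=0).astype(np.float64)  # per-class predicted pixels
    with np.errstate(divide="ignore", invalid="ignore"):
        iu = tp / (gt + pred - tp)              # per-class intersection over union
        pixel_accuracy = tp.sum() / hist.sum()
        mean_accuracy = np.nanmean(tp / gt)     # classes absent from GT are skipped
        mean_iu = np.nanmean(iu)
        frequency_weighted_iu = np.nansum((gt / hist.sum()) * iu)
    return pixel_accuracy, mean_accuracy, mean_iu, frequency_weighted_iu
```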
Here are some examples:
- A simple Chinese readme is here: 简书
- Another blog post, written earlier in Chinese, is here: task7 FCN分析. Its content about the code may not be useful any more, because I have made a lot of changes to the code since then.
The two links above are study reports on FCN that I wrote. The second report was written earlier, so read its code-related parts selectively, since much of the code has since been heavily revised. The first report was written based on this code and is the more useful reference.