Abstract—Image sharing on social networks has grown exponentially in recent years. This instant sharing capability also emphasises the need for better privacy systems. Unfortunately, privacy protection has remained a challenging task for common users, owing to the burden of legacy privacy settings on image sharing sites. This necessitates automatic privacy assignment for images. Using deep features extracted from images through object detection, privacy settings can be assigned to images and custom privacy recommendations can be made before they are shared on social networks.
Keywords—CNN, privacy, object detection.
I. INTRODUCTION

Sharing images has become an integral part of daily life. The advent of smartphones and digital cameras has resulted in an exponential increase in the number of images captured, and constantly connected devices have made sharing them nearly instantaneous. Unfortunately, with such large volumes of instant sharing, privacy is often threatened while sharing these special moments. Once something such as a photo is posted online, it becomes a permanent record that may be used for purposes we never expect, leading to unwanted disclosure and privacy violations. Malicious attackers can take advantage of these unnecessary leaks to launch context-aware or even impersonation attacks, as seen lately in the large number of cases of privacy abuse and unwarranted access.


Cloud photo services are also not the best when it comes to privacy: hidden policies often give large organizations access to highly privacy-sensitive images, and a security breach can further compromise users' privacy. Current social networking sites and cloud services do not assist users in making privacy decisions for the images they share online.

In particular, the privacy of online images is inherently subjective, depending on the image owner's privacy attitude and awareness and on the overall context in which the image is to be posted. This creates a unique challenge, as many parameters must be factored in to determine what drives a particular privacy setting. With these issues in focus, this work attempts to identify the specific parameters that characterise privacy well, to extract them effectively from images, and to provide some mathematical grounding for the privacy predictions being made.

II. METHODS AND DISCUSSIONS

Ashwini et al. [1] investigate various visual and textual features for image privacy classification and determine how each fares at the task. From the identified features, they combine the smallest sets of features and measure how well they predict privacy. The features range from basic to sophisticated and include the Scale Invariant Feature Transform (SIFT), user tags, edge direction coherence, sentiment analysis, and RGB maps.

SIFT is a visual feature transform whose descriptors are used to build a bag of visual words. The set of visual words forms a visual vocabulary, and the bag-of-visual-words vector records how many of the original SIFT feature vectors are mapped onto each corresponding visual word.
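A rough sketch of such a bag-of-visual-words pipeline, assuming OpenCV for SIFT and scikit-learn's KMeans for the vocabulary; the vocabulary size and helper names are arbitrary choices, not taken from the surveyed work:

```python
# Sketch only: build a visual vocabulary from SIFT descriptors and map each
# image onto a normalised bag-of-visual-words histogram.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(image_paths):
    """Extract SIFT descriptors from a list of grayscale images."""
    sift = cv2.SIFT_create()
    all_desc = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc)
    return all_desc

def build_vocabulary(descriptor_list, vocab_size=500):
    """Cluster all descriptors; the cluster centres are the visual words."""
    stacked = np.vstack(descriptor_list)
    return KMeans(n_clusters=vocab_size, random_state=0).fit(stacked)

def bow_histogram(descriptors, kmeans):
    """Map each SIFT descriptor to its nearest visual word and count them."""
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist.astype(float) / max(hist.sum(), 1)   # normalised histogram
```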

Sentiment analysis relies on a visual sentiment ontology used to detect sentiments. It is built on psychological theories and web mining, and carries concepts called Adjective Noun Pairs (ANPs), such as Bad Food, Cute Dog, or Beautiful Sky. For instance, from the subset of 1200 defined ANP concepts present in a particular image, privacy can be predicted.
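As an illustration of how detected ANP concepts could feed a privacy classifier (not the surveyed implementation), the sketch below encodes an image as a binary vector over the 1200 concepts and fits a simple logistic-regression classifier on toy labels:

```python
# Illustrative only: the concept indices, detector output, and labels are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

NUM_ANP_CONCEPTS = 1200            # e.g. "cute dog", "beautiful sky", ...

def anp_feature(detected_concept_ids):
    """Binary presence vector over the ANP vocabulary."""
    x = np.zeros(NUM_ANP_CONCEPTS)
    x[list(detected_concept_ids)] = 1.0
    return x

# toy training set: label 1 = private, 0 = public
X = np.stack([anp_feature({3, 57}), anp_feature({990}), anp_feature({3, 640})])
y = np.array([1, 0, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict(anp_feature({3, 120}).reshape(1, -1)))
```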

Faces are likely to be considered private, so identifying faces helps predict privacy; this is an effective feature, and face detection typically uses convolutional neural networks. Edge direction coherence is used to detect views, landscapes, sky, sceneries, etc., which are considered more public. It uses edge direction histograms to describe the texture in an image: the feature stores the number of edge pixels that are coherent and non-coherent, with the division established by a threshold on the size of the connected edge components. This eventually helps in the effective detection of sceneries.
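A rough sketch of such an edge-direction coherence feature, assuming OpenCV; the bin count and the component-size threshold are illustrative choices:

```python
import cv2
import numpy as np

def edge_direction_coherence(gray, bins=72, min_component=20):
    """Histogram of edge directions, split into coherent / non-coherent pixels."""
    edges = cv2.Canny(gray, 100, 200)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    direction = (np.degrees(np.arctan2(gy, gx)) + 360.0) % 360.0

    # edge pixels are "coherent" if they belong to a large connected component
    n_labels, labels = cv2.connectedComponents((edges > 0).astype(np.uint8))
    sizes = np.bincount(labels.ravel(), minlength=n_labels)
    coherent = (edges > 0) & (sizes[labels] >= min_component)
    non_coherent = (edges > 0) & ~coherent

    def hist(mask):
        return np.histogram(direction[mask], bins=bins, range=(0.0, 360.0))[0]

    return np.concatenate([hist(coherent), hist(non_coherent)]).astype(float)

# usage: feature = edge_direction_coherence(cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE))
```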

On online image sharing sites, annotating images with tags is common. These tags are defined by the users themselves, which makes them a very valuable asset for privacy prediction. The difficulty lies in obtaining them, as they are not guaranteed to be available.


The reported comparisons show that user tags outperform the other individual features, and that, taken as a pair, SIFT and tags stand out, especially as the dataset grows.

Deep convolutional networks rose to prominence following the work of Krizhevsky et al. [1], who trained a deep convolutional neural network to classify 1.2 million images into 1000 distinct classes. Convolutional neural networks mimic the biological process by which neurons learn image features, finding patterns in images using the building blocks of convolutional layers, pooling layers, and fully connected layers.

In a convolutional layer the input is an array of pixels of fixed resolution. Kernels, or filters, of a defined size are used to identify the features present in images: each filter performs elementwise multiplication with a patch of the input and slides, or convolves, across the image until it is fully covered. The 'stride' parameter defines the distance between the centres of neighbouring kernel positions, which may overlap.
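A minimal PyTorch sketch of such a convolutional layer; the kernel size, stride, and filter count below are illustrative values rather than the exact settings of the surveyed network:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)           # one RGB image, 224x224 pixels
conv = nn.Conv2d(in_channels=3,            # RGB input
                 out_channels=64,          # 64 learned filters (kernels)
                 kernel_size=11,           # each filter covers an 11x11 patch
                 stride=4,                 # distance between neighbouring filter centres
                 padding=2)
feature_map = conv(x)
print(feature_map.shape)                   # torch.Size([1, 64, 55, 55])
```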

This is followed by a nonlinearity; here a ReLU function is used. Pooling, for example max pooling, then reduces the size of the layer's output. Convolution and pooling are repeated through the layers until the fully connected layers are reached. Each fully connected layer has 4096 neurons and, as the name suggests, every neuron is connected to all activations of the previous layer. The final layer turns these activations into a score per class, which is used to classify the image into the required classes.
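The following condensed PyTorch sketch shows this conv-ReLU-pool pattern ending in fully connected layers and a softmax over classes; the layer sizes are illustrative and not the full surveyed architecture:

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # reduces spatial size
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(192 * 13 * 13, 4096),          # fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),            # one score per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = TinyConvNet()(torch.randn(1, 3, 224, 224))
probs = torch.softmax(logits, dim=1)                 # probability per class
```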

The network also used dropout and data augmentation to prevent overfitting. This work was seminal, leading to pathbreaking contributions in computer vision, and the architecture became the backbone of future convolutional neural networks.

Object detection goes further than image classification: it requires localising objects within an image. Ross et al. [3] discuss region proposals and how objects can be effectively detected in an image. Their detector consists of three modules:

a category-independent region proposal stage,

a large CNN to extract features, and

SVMs for classification.

The architecture takes input images at a predefined resolution and extracts around 2000 bottom-up region proposals. Each proposal, as a mean-subtracted RGB image, is forward propagated through five convolutional and two fully connected layers to extract features, achieving much higher detection accuracy than earlier ImageNet-based models.
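The following rough PyTorch sketch mirrors this pipeline; selective search is replaced by a random-box stand-in, only a few proposals are generated, and the SVM scoring stage is only indicated in a comment, so it illustrates the flow rather than reproducing the original system:

```python
import torch
import torch.nn.functional as F
from torchvision.models import alexnet

def propose_regions(image, n=8):
    """Stand-in for a real proposal method such as selective search."""
    _, h, w = image.shape
    x1 = torch.randint(0, w // 2, (n,))
    y1 = torch.randint(0, h // 2, (n,))
    x2 = (x1 + torch.randint(32, w // 2, (n,))).clamp(max=w - 1)
    y2 = (y1 + torch.randint(32, h // 2, (n,))).clamp(max=h - 1)
    return torch.stack([x1, y1, x2, y2], dim=1)

cnn = alexnet()                      # pretrained ImageNet weights would normally be loaded
cnn.classifier = cnn.classifier[:-1] # keep the 4096-d penultimate features
cnn.eval()

image = torch.rand(3, 500, 375)      # a dummy RGB image (C, H, W)
features = []
with torch.no_grad():
    for x1, y1, x2, y2 in propose_regions(image):
        region = image[:, y1:y2, x1:x2]
        # warp every proposal to the CNN's fixed input resolution
        warped = F.interpolate(region.unsqueeze(0), size=(224, 224),
                               mode="bilinear", align_corners=False)
        features.append(cnn(warped).squeeze(0))
# Each 4096-d feature vector is then scored by one SVM per object class, and
# high-scoring, non-overlapping boxes are kept as the final detections.
```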

The detection work uses concepts such as the ground-truth box and Intersection over Union (IoU). IoU is the evaluation metric used to measure the accuracy of an object detector on a particular dataset: each bounding box predicted during detection is compared against a ground-truth bounding box, the hand-labelled box from the test set that specifies where in the image the object is present. Accuracy is measured as the area of overlap between the ground-truth and predicted boxes divided by the total area of their union.
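A small, self-contained sketch of the IoU computation between a predicted and a ground-truth box, with boxes given in corner-coordinate (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # overlap rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))   # ~0.143
```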

The object detection achieved by [3] was, however, considerably slow. What slowed R-CNN down was its fixed set of roughly 2000 region proposals, selected for every single image, each of which had to be passed through the CNN and then classified. Fast R-CNN solved this issue: the whole image goes through a single forward pass, and the resulting convolutional feature map is reused to detect objects within the proposed regions.

The architecture takes an entire image, together with a set of proposed object regions, as input. The whole image is processed by several convolutional and max pooling layers to produce a feature map. For each proposed region, a Region of Interest (RoI) pooling layer then extracts a fixed-length feature vector from this feature map. These vectors are fed into fully connected layers that output the class scores and bounding boxes for the detected object classes.
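A minimal sketch of the RoI pooling step using torchvision's roi_pool; the feature map shape, the proposal boxes, and the spatial scale below are illustrative values:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 38)        # whole-image conv features
# proposals in image coordinates: (batch_index, x1, y1, x2, y2)
proposals = torch.tensor([[0, 10.0, 20.0, 200.0, 300.0],
                          [0, 50.0, 60.0, 120.0, 180.0]])
# spatial_scale maps image coordinates onto the downsampled feature map
pooled = roi_pool(feature_map, proposals, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]) -> fixed-length per region
```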

Fast R-CNN cut down both training and testing time compared to R-CNN, roughly halving them, while also achieving better detection accuracy and offering many fine-tuning options that can improve accuracy across a wide range of datasets.

Transfer learning, described by Dai et al. [6], is a technique in which the experience of one learning system is used to train another. In the case of neural networks, a network is first trained on some data; the knowledge it gains is compiled into the weights of the network. These weights can be extracted and transferred to another neural network, so that instead of training it from scratch, the learned features are transferred.
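A minimal PyTorch sketch of this weight transfer, assuming both networks share the same ResNet-18 architecture and that the downstream task has the three privacy classes discussed later; the model choice is illustrative:

```python
import torch.nn as nn
from torchvision.models import resnet18

source = resnet18()                               # would normally hold pretrained ImageNet weights
target = resnet18()
target.load_state_dict(source.state_dict())       # transfer the learned weights
target.fc = nn.Linear(target.fc.in_features, 3)   # new head: public / protected / private
```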

Fine tuning is then applied for the user's own training task. Pre-trained networks are usually trained on standard datasets and may not perform well on every problem, so the model has to be trained further. This additional training updates the learned weights, and how much they change depends on the learning rate.

A more advanced approach is to freeze certain layers while retraining others: the weights of the initial layers of the model are frozen and only the higher layers are retrained. The initial layers primarily detect the general shape of objects, while the higher layers facilitate discrimination between classes.
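A short sketch of this freezing strategy in PyTorch; which layers are kept trainable is an assumption made here for illustration:

```python
import torch
from torchvision.models import resnet18

model = resnet18()   # pretrained weights would normally be loaded first
for name, param in model.named_parameters():
    # keep only the highest layers trainable; freeze everything else
    param.requires_grad = name.startswith(("layer4", "fc"))

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```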

Jun et al. [5] deal with privacy alignment of the objects detected in an image. The algorithm discussed there assigns image-level privacy precisely to the object classes in the images. It relies on a semantic similarity measure defined over an initial bag of object classes, which are grouped into clusters used for privacy alignment.

The similarity between two object classes $X_i$ and $X_j$ over the set of images is

$$\varphi(X_i, X_j) = \sum_{l=1}^{1000} \varphi(X_{il}, X_{jl})$$

where $X_{il}$ and $X_{jl}$ denote the object classes in the $l$-th image, and the per-image object-class similarity is defined as

$$\varphi(X_{il}, X_{jl}) = 1 \;\text{ if } X_{il} = X_{jl}, \text{ and } 0 \text{ otherwise.}$$

Based on this similarity, clusters of similar object classes are generated and a common privacy setting is applied to each cluster. The privacy setting can be public, protected, or private. An additional relevance score is defined, measuring the relevance between the privacy setting of an object class and the privacy setting labelled on the image.
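The sketch below shows one illustrative reading of the similarity step on synthetic data: object classes are represented by their presence indicators across images, the similarity above is computed for every pair, and the classes are grouped by hierarchical clustering (the clustering method and cluster count are assumptions of this sketch):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

num_images, num_classes = 1000, 20
rng = np.random.default_rng(0)
# presence[i, l] = 1 if object class i is detected in image l (synthetic data)
presence = (rng.random((num_classes, num_images)) > 0.8).astype(int)

def similarity(i, j):
    """phi(X_i, X_j) = number of images on which the two indicators agree."""
    return int(np.sum(presence[i] == presence[j]))

sim = np.array([[similarity(i, j) for j in range(num_classes)]
                for i in range(num_classes)], dtype=float)
dist = num_images - sim                           # convert similarity to distance
condensed = dist[np.triu_indices(num_classes, k=1)]
clusters = fcluster(linkage(condensed, method="average"),
                    t=5, criterion="maxclust")    # e.g. 5 clusters of object classes
print(clusters)
```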

$$\rho(C_i, t) = \frac{|\Theta(C_i, t)|}{|\Theta(C_i, P)|}$$

Here $P$ denotes the cluster's integrated privacy setting, while $\Theta(C_i, t)$ denotes the subset of images in the cluster that contain object class $C_i$ and carry the privacy setting $t$. The privacy setting with the largest object-privacy relevance score is finally selected for the given object class $C_i$.
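A toy sketch of computing this relevance score and picking the winning setting for an object class; the image records, object names, and integrated setting below are invented for illustration:

```python
def count_with(cluster_images, object_class, setting):
    """|Theta(Ci, t)|: images in the cluster containing Ci with privacy setting t."""
    return sum(1 for objects, s in cluster_images
               if object_class in objects and s == setting)

cluster_images = [({"face", "id_card"}, "private"),
                  ({"face"}, "private"),
                  ({"face", "beach"}, "protected"),
                  ({"beach"}, "public")]
P = "private"                      # the cluster's integrated privacy setting

def relevance(object_class, setting):
    denom = count_with(cluster_images, object_class, P)
    return count_with(cluster_images, object_class, setting) / denom if denom else 0.0

settings = ("public", "protected", "private")
print(max(settings, key=lambda t: relevance("face", t)))   # -> "private"
```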
III. SOLUTION

Manually assigning privacy settings to every image is cumbersome. To avoid a possible loss of users' privacy, it has become critical to develop automated approaches that can accurately predict the privacy settings of shared images.

The idea is to automatically detect the privacy-sensitive
objects from the images being shared, recognise their classes,
and identify their privacy settings. Based on the detection
results, our system would be able to warn the image owners
what objects in the images need to be protected. It can also
provide recommended privacy settings instantly.

IV. CONCLUSION

The idea of automating privacy assignment was surveyed. Deep image features and deep tags are found to be the best parameters for deducing the privacy of an image. Deep feature extraction is widely performed using convolutional neural networks, and various CNN architectures, from the ImageNet-winning network to Fast R-CNN, were considered. Fast R-CNN exhibits state-of-the-art detection results while also being much faster than the other object detection methods surveyed. Coupling it with the privacy alignment algorithm surveyed can yield an automated privacy assignment system. The remaining challenge is to reduce the computational overhead of using CNNs and to improve prediction accuracy.
