Bezpieczeństwo stron www – blokowanie robotów – lista 1800 botów


W dzisiejszych czasach roboty internetowe codziennie odwiedzają strony. Ponad połowa ruchu na serwerach www pochodzi od robotów, a nie od prawdziwych użytkowników. Przeglądając logi serwera można napotkać bardzo wiele różnorodnych robotów, które wykonują czynności które mogą nam się podobać lub nie: skanują strony w poszukiwaniu adresów e-mail, szukają luk w zabezpieczeniach, zbierają dane o naszej stronie i wysyłają do wyszukiwarek internetowych (takim robotem jest na przykład Googlebot). Niektóre roboty kopiują całe strony i umieszczają w Internecie pod innym adresem, co nie jest korzystne dla naszej strony. Pojawia się zatem zapotrzebowanie na blokowanie robotów…

Blokowanie robotów

Bardzo wiele z tych robotów można zablokować. Daje to różnorodne korzyści: mniej spamu, bezpieczniejsza strona internetowa, mniej ukradzionych treści. Spada także ilość danych przesyłanych na hostingu, ponieważ automaty nie zużywają transferu bez żadnych korzyści dla nas.

Istnieje kilka sposobów na blokowanie robotów. Można blokować roboty w pliku robots.txt. Agresywne roboty omijają ten plik i dlatego lepsza jest inna metoda: blokowanie robotów po nazwie agenta na poziomie serwera www. W ten sposób robot, jeśli używa jakiegoś user agenta po prostu zostanie zablokowany i dostanie kod http 403 – dostęp zabroniony. Oczywiście niektóre roboty podszywają się pod wyszukiwarki lub przeglądarki i tu nic zrobić się nie da, jednak wiele robotów używa standardowych nazw użytkownika z Perla czy Javy i łatwo jest je zablokować. Nasz sposób blokowania jest w miarę uniwersalny i działa na większości hostingów udostępniających plik .htaccess. W rzeczywistości są to tylko 3 długie linie w htaccess, które wykonują blokowanie robotów, co nie obciąża mocno serwera.

W jaki sposób przygotowaliśmy listę robotów do zablokowania?

Z naszych doświadczeń wynika, że sensowna metodą blokowania robotów po user agent jest lista niechcianych robotów. W związku z tym procesowaliśmy regexem archiwum logów różnorodnych stron internatowych z okresu ostatnich 10 lat. Następnie napisaliśmy specjalny program, który dobrał części wspólne nazw robotów tak, żeby plik był jak najkrótszy. Efektem tego jest lista ponad 1800 robotów których nie chcemy. Lista ta jest w ciągłym użyciu na różnych serwerach internatowych i na bieżąco poprawiana. Z listy zostały usunięte wyszukiwarki internetowe takie jak Google, Bing, Yandex, Yahoo a także serwisy społecznościowe takie jak Twitter i Facebook, które uznajemy za pożyteczne. Wszystkie pozostałe roboty, przychodzące w ciągu ostatnich 10 lat są na liście. Ponieważ serwer www nie pozwala na bardzo długie linie w .htaccess, plik został podzielony na fragmenty zawierające po około 500 nazw robotów. Do tej pory zainstalowaliśmy ten plik na wielu hostingach, nie mieliśmy problemów z tym plikiem na żadnym z nich.

Czy blokowanie robotów ma wpływ na pozycjonowanie?

Blokowanie robotów w tej formie nie ma wpływu na pozycjonowanie. Mamy strony na pozycji numer 1, które używają takiego blokowania robotów. Google używa swojego robota a także czasem podszywa się pod popularne przeglądarki, takie jak Chrome czy Firefox, aby wykryć cloaking.  Wszystkie te roboty nie są blokowane przez nasz plik. Używanie przez Google innych nazw robotów naszym zdaniem nie ma miejsca.

Jak zablokować roboty na stronie www?

Na końcu tego artykułu znajduje się tekst, który należy wkleić na początku pliku .htaccess naszej strony internetowej. Nie należy niczego zmieniać w tym tekście, każda spacja i każdy enter jest ważny. Nie można też dodawać komentarzy. Zaznaczamy cały tekst poniżej i wklejamy go na początek naszego pliku .htaccess, za komendą „Rewrite Engine On” (jeżeli mamy taką komendę). Dodatkowa uwaga: w pliku musi być spacja na początku linii przed [NC,OR] oraz [NC]. Czasami może zdarzyć się, że podczas wklejania ta spacja zniknie, bez tej spacji blokowanie robotów nie biędzie działać, chociaż serwer nie zgłosi żadnego błędu.

Jak sprawdzić, czy zainstalowaliśmy listę robotów do zablokowania poprawnie?

Na liście do zablokowania jest program Xenu Link Sleuth. Po wgraniu pliku .htaccess na serwer Xenu nie powinien móc wczytać naszej strony, pojawia się „forbidden request”. Jeśli tak jest, to blokowanie robotów działa prawidłowo.

Prawa materialne do listy blokowanie robotów

Poniższą listę i sposób na blokowanie robotów udostępniamy całkowicie za darmo, mając nadzieję, że dzięki niej w Internecie będzie mniej spamu. Jeżeli lista działa,  korzystasz z listy i jesteś zadowolony, polub lub udostępnij naszą stronę na Facebooku lub Twitterze 🙂

Reklamacje 🙂

Jeżeli blokowanie robotów nie działa poprawnie, napisz nam o tym w komentarzach. Dziękujemy za wszelkie sugestie związane z listą. Okresowo będzie pojawiać się tutaj nowa, aktualna wersja. Będzie to uzależnione od ilości wypuszczanych spambotów.  Informacje o zmianach na liście będą umieszczane na Twitterze, warto nas obserwować.

Aktualizacja grudzień 2017

Nowa wersja – odblokowane niektóre wersje przeglądarki Opera. One mają „Edition” w user agent!

#bad bots start
#programmed by tab-studio.com public  version 2017.12
#1 new rule every 500 entries
RewriteCond %{HTTP_USER_AGENT} \
12soso|\
192\.comagent|\
1noonbot|\
1on1searchbot|\
3de\_search2|\
3d\_search|\
3g\ bot|\
3gse|\
50\.nu|\
a1\ sitemap\ generator|\
a1\ website\ download|\
a6\-indexer|\
aasp|\
abachobot|\
abonti|\
abotemailsearch|\
aboundex|\
aboutusbot|\
accmonitor\ compliance\ server|\
accoon|\
achulkov\.net\ page\ walker|\
acme\.spider|\
acoonbot|\
acquia\-crawler|\
activetouristbot|\
ad\ muncher|\
adamm\ bot|\
adbeat\_bot|\
adminshop\.com|\
advanced\ email\ extractor|\
aesop\_com\_spiderman|\
aespider|\
af\ knowledge\ now\ verity\ spider|\
aggregator:vocus|\
ah\-ha\.com\ crawler|\
ahrefs|\
aibot|\
aidu|\
aihitbot|\
aipbot|\
aisiid|\
aitcsrobot/1\.1|\
ajsitemap|\
akamai\-sitesnapshot|\
alexawebsearchplatform|\
alexfdownload|\
alexibot|\
alkalinebot|\
all\ acronyms\ bot|\
alpha\ search\ agent|\
amerla\ search\ bot|\
amfibibot|\
ampmppc\.com|\
amznkassocbot|\
anemone|\
anonymous|\
anotherbot|\
answerbot|\
answerbus|\
answerchase\ prove|\
antbot|\
antibot|\
antisantyworm|\
antro\.net|\
aonde\-spider|\
aport|\
appengine\-google|\
appid\:\ s\~stremor\-crawler\-|\
aqua\_products|\
arabot|\
arachmo|\
arachnophilia|\
archive\.org\_bot|\
aria\ equalizer|\
arianna\.libero\.it|\
arikus\_spider|\
art\-online\.com|\
artavisbot|\
artera|\
asaha\ search\ engine\ turkey|\
ask|\
aspider|\
aspseek|\
asterias|\
astrofind|\
athenusbot|\
atlocalbot|\
atomic\_email\_hunter|\
attach|\
attrakt|\
attributor|\
augurfind|\
auresys|\
autobaron\ crawler|\
autoemailspider|\
autowebdir|\
avsearch\-|\
axfeedsbot|\
axonize\-bot|\
ayna|\
b2w|\
backdoorbot|\
backrub|\
backstreet\ browser|\
backweb|\
baidu|\
bandit|\
batchftp|\
baypup|\
bdfetch|\
becomebot|\
becomejpbot|\
beetlebot|\
bender|\
besserscheitern\-crawl|\
betabot|\
big\ brother|\
big\ data|\
bigado\.com|\
bigcliquebot|\
bigfoot|\
biglotron|\
bilbo|\
bilgibetabot|\
bilgibot|\
bintellibot|\
bitlybot|\
bitvouseragent|\
bizbot003|\
bizbot04|\
bizworks\ retriever|\
black\ hole|\
black\.hole|\
blackbird|\
blackmask\.net\ search\ engine|\
blackwidow|\
bladder\ fusion|\
blaiz\-bee|\
blexbot|\
blinkx|\
blitzbot|\
blog\ conversation\ project|\
blogmyway|\
blogpulselive|\
blogrefsbot|\
blogscope|\
blogslive|\
bloobybot|\
blowfish|\
blt|\
bnf\.fr\_bot|\
boaconstrictor|\
boardreader|\
boia\-scan\-agent|\
boia\.org|\
boitho|\
boi\_crawl\_00|\
bookmark\ buddy\ bookmark\ checker|\
bookmark\ search\ tool|\
bosug|\
bot\ apoena|\
botalot|\
botrighthere|\
botswana|\
bottybot|\
bpbot|\
braintime\_search|\
brokenlinkcheck\.com|\
browseremulator|\
browsermob|\
bruinbot|\
bsearchr&d|\
bspider|\
btbot|\
btsearch|\
bubing|\
buddy|\
buibui|\
buildcms\ crawler|\
builtbottough|\
bullseye|\
bumblebee|\
bunnyslippers|\
buscadorclarin|\
buscaplus\ robi|\
butterfly|\
buyhawaiibot|\
buzzbot|\
byindia|\
byspider|\
byteserver|\
bzbot|\
c\ r\ a\ w\ l\ 3\ r|\
cacheblaster|\
caddbot|\
cafi|\
camcrawler|\
camelstampede|\
canon\-webrecord|\
careerbot|\
cataguru|\
catchbot|\
cazoodle|\
ccbot|\
ccgcrawl|\
ccubee|\
cd\-preload|\
ce\-preload|\
cegbfeieh|\
cerberian\ drtrs|\
cert\ figleafbot|\
cfetch|\
cfnetwork|\
chameleon|\
charlotte|\
check&get|\
checkbot|\
checklinks|\
cheesebot|\
chemiede\-nodebot|\
cherrypicker|\
chilkat|\
chinaclaw|\
cipinetbot|\
cis455crawler|\
citeseerxbot|\
cizilla|\
clariabot|\
climate\ ark|\
climateark\ spider|\
clshttp|\
clushbot|\
coast\ scan\ engine|\
coast\ webmaster\ pro|\
coccoc|\
collapsarweb|\
collector|\
colocrossing|\
combine|\
connectsearch|\
conpilot|\
contentsmartz|\
contextad\ bot|\
contype|\
cookienet|\
coolbot|\
coolcheck|\
copernic|\
copier|\
copyrightcheck|\
core\-project|\
cosmos|\
covario\-ids|\
cowbot\-|\
cowdog\ bot|\
crabbybot|\
craftbot\@yahoo\.com|\
crawler\.kpricorn\.org|\
crawler43\.ejupiter\.com|\
crawler4j|\
crawler@|\
crawler\_for\_infomine|\
crawly|\
crawl\_application|\
creativecommons|\
crescent|\
cs\-crawler|\
cse\ html\ validator|\
cshttpclient|\
cuasarbot|\
culsearch|\
curl|\
custo|\
cvaulev|\
cyberdog|\
cybernavi\_webget|\
cyberpatrol\ sitecat\ webbot|\
cyberspyder|\
cydralspider|\
d1garabicengine|\
datacha0s|\
datafountains|\
dataparksearch|\
dataprovider\.com|\
datascape\ robot|\
dataspearspiderbot|\
dataspider|\
dattatec\.com|\
daumoa|\
dblbot|\
dcpbot|\
declumbot|\
deepindex|\
deepnet\ crawler|\
deeptrawl|\
dejan|\
del\.icio\.us\-thumbnails|\
deltascan|\
delvubot|\
der\ gro§e\ bildersauger|\
der\ große\ bildersauger|\
deusu|\
dfs\-fetch|\
diagem|\
diamond|\
dibot|\
didaxusbot|\
digext|\
digger|\
digi\-rssbot|\
digitalarchivesbot|\
digout4u|\
diibot|\
dillo|\
dir\_snatch\.exe|\
disco|\
distilled\-reputation\-monitor|\
djangotraineebot|\
dkimrepbot|\
dmoz\ downloader|\
docomo|\
dof\-verify|\
domaincrawler|\
domainscan|\
domainwatcher\ bot|\
dotbot|\
dotspotsbot|\
dow\ jones\ searchbot|\
download|\
doy|\
dragonfly|\
drip|\
drone|\
dtaagent|\
dtsearchspider|\
dumbot|\
dwaar|\
dxseeker|\
e\-societyrobot|\
eah|\
earth\ platform\ indexer|\
earth\ science\ educator\ \ robot|\
easydl|\
ebingbong|\
ec2linkfinder|\
ecairn\-grabber|\
ecatch|\
echoosebot|\
edisterbot|\
edugovsearch|\
egothor|\
eidetica\.com|\
eirgrabber|\
elblindo\ the\ blind\ bot|\
elisabot|\
ellerdalebot|\
email\ exractor|\
emailcollector|\
emailleach|\
emailsiphon|\
emailwolf|\
emeraldshield|\
empas\_robot|\
enabot|\
endeca|\
enigmabot|\
enswer\ neuro\ bot|\
enter\ user\-agent|\
entitycubebot|\
erocrawler|\
estylesearch|\
esyndicat\ bot|\
eurosoft\-bot|\
evaal|\
eventware|\
everest\-vulcan\ inc\.|\
exabot|\
exactsearch|\
exactseek|\
exooba|\
exploder|\
express\ webpictures|\
extractor|\
eyenetie|\
ez\-robot|\
ezooms|\
f\-bot\ test\ pilot|\
factbot|\
fairad\ client|\
falcon|\
fast\ data\ search\ document\ retriever|\
fast\ esp|\
fast\-search\-engine|\
fastbot\ crawler|\
fastbot\.de\ crawler|\
fatbot|\
favcollector|\
faviconizer|\
favorites\ sweeper|\
fdm|\
fdse\ robot|\
fedcontractorbot|\
fembot|\
fetch\ api\ request|\
fetch\_ici|\
fgcrawler|\
filangy|\
filehound|\
findanisp\.com\_isp\_finder|\
findlinks|\
findweb|\
firebat|\
firstgov\.gov\ search|\
flaming\ attackbot|\
flamingo\_searchengine|\
flashcapture|\
flashget|\
flickysearchbot|\
fluffy\ the\ spider|\
flunky|\
focused\_crawler|\
followsite|\
foobot|\
fooooo\_web\_video\_crawl|\
fopper|\
formulafinderbot|\
forschungsportal|\
francis|\
freewebmonitoring\ sitechecker|\
freshcrawler|\
freshdownload|\
freshlinks\.exe|\
friendfeedbot|\
frodo\.at|\
froggle|\
frontpage|\
froola\ bot|\
fr\_crawler|\
fu\-nbi|\
full\_breadth\_crawler|\
funnelback|\
furlbot|\
g10\-bot|\
gaisbot|\
galaxybot|\
gazz|\
gbplugin|\
generate\_infomine\_category\_classifiers|\
genevabot|\
geniebot|\
genieo|\
geomaxenginebot|\
geometabot|\
geonabot|\
geovisu|\
germcrawler\ |\
gethtmlcontents|\
getleft|\
getright|\
getsmart|\
geturl\.rexx|\
getweb!|\
giant|\
gigablastopensource|\
gigabot|\
girafabot|\
gleamebot|\
gnome\-vfs|\
go!zilla|\
go\-ahead\-got\-it|\
go\-http\-client|\
goforit\.com|\
goforitbot|\
gold\ crawler|\
goldfire\ server|\
golem|\
goodjelly|\
gordon\-college\-google\-mini|\
goroam|\
goseebot|\
gotit|\
govbot|\
gpu\ p2p\ crawler|\
grabber|\
grabnet|\
grafula|\
grapefx|\
grapeshot|\
grbot|\
greenyogi|\
gromit|\
grub|\
gsa|\
gslfbot|\
gulliver|\
gulperbot|\
gurujibot|\
gvc\ business\ crawler|\
gvc\ crawler|\
gvc\ search\ bot|\
gvc\ web\ crawler|\
gvc\ weblink\ crawler|\
gvc\ world\ links|\
gvcbot\.com|\
happyfunbot|\
harvest|\
hatena\ antenna|\
hawler|\
hcat|\
hclsreport\-crawler|\
hd\ nutch\ agent|\
header\_test\_client|\
healia\
 [NC,OR]
#500 new rule
RewriteCond %{HTTP_USER_AGENT} \
helix|\
here\ will\ be\ link\ to\ crawler\ site|\
heritrix|\
hiscan|\
hisoftware\ accmonitor\ server|\
hisoftware\ accverify|\
hitcrawler|\
hivabot|\
hloader|\
hmsebot|\
hmview|\
hoge|\
holmes|\
homepagesearch|\
hooblybot\-image|\
hoowwwer|\
hostcrawler|\
hsft\ \\-\ link\ scanner|\
hsft\ \\-\ lvu\ scanner|\
hslide|\
ht://check|\
htdig|\
html\ link\ validator|\
htmlparser|\
httplib|\
httrack|\
huaweisymantecspider|\
hul\-wax|\
humanlinks|\
hyperestraier|\
hyperix|\
iaarchiver\-|\
ia\_archiver|\
ibuena|\
icab|\
icds\-ingestion|\
ichiro|\
icopyright\ conductor|\
ieautodiscovery|\
iecheck|\
ihwebchecker|\
iiitbot|\
iim\_405|\
ilsebot|\
iltrovatore|\
image\ stripper|\
image\ sucker|\
image\-fetcher|\
imagebot|\
imagefortress|\
imageshereimagesthereimageseverywhere|\
imagevisu|\
imds\_monitor|\
imo\-google\-robot\-intelink|\
inagist\.com\ url\ crawler|\
indexer|\
industry\ cortex\ webcrawler|\
indy\ library|\
indylabs\_marius|\
inelabot|\
inet32\ ctrl|\
inetbot|\
info\ seeker|\
infolink|\
infomine|\
infonavirobot|\
informant|\
infoseek\ sidewinder|\
infotekies|\
infousabot|\
ingrid|\
inktomi|\
insightscollector|\
insightsworksbot|\
inspirebot|\
insumascout|\
intelix|\
intelliseek|\
interget|\
internet\ ninja|\
internet\ radio\ crawler|\
internetlinkagent|\
interseek|\
ioi|\
ip\-web\-crawler\.com|\
ipadd\ bot|\
ipselonbot|\
ips\-agent|\
iria|\
irlbot|\
iron33|\
isara|\
isearch|\
isilox|\
istellabot|\
its\-learning\ crawler|\
iu\_csci\_b659\_class\_crawler|\
ivia|\
jadynave|\
java|\
jbot|\
jemmathetourist|\
jennybot|\
jetbot|\
jetbrains\ omea\ pro|\
jetcar|\
jim|\
jobo|\
jobspider\_ba|\
joc|\
joedog|\
joyscapebot|\
jspyda|\
junut\ bot|\
justview|\
jyxobot|\
k\.s\.bot|\
kakclebot|\
kalooga|\
katatudo\-spider|\
kbeta1|\
keepni\ web\ site\ monitor|\
kenjin\.spider|\
keybot\ translation\-search\-machine|\
keywenbot|\
keyword\ density|\
keyword\.density|\
kinjabot|\
kitenga\-crawler\-bot|\
kiwistatus|\
kmbot\-|\
kmccrew\ bot\ search|\
knight|\
knowitall|\
knowledge\ engine|\
knowledge\.com|\
koepabot|\
koninklijke|\
korniki|\
krowler|\
ksbot|\
kuloko\-bot|\
kulturarw3|\
kummhttp|\
kurzor|\
kyluka\ crawl|\
l\.webis|\
labhoo|\
labourunions411|\
lachesis|\
lament|\
lamerexterminator|\
lapozzbot|\
larbin|\
lbot|\
leaptag|\
leechftp|\
leechget|\
letscrawl\.com|\
lexibot|\
lexxebot|\
lftp|\
libcrawl|\
libiviacore|\
libw|\
likse|\
linguee\ bot|\
link\ checker|\
link\ validator|\
linkalarm|\
linkbot|\
linkcheck\ by\ siteimprove\.com|\
linkcheck\ scanner|\
linkchecker|\
linkdex\.com|\
linkextractorpro|\
linklint|\
linklooker|\
linkman|\
links\ sql|\
linkscan|\
linksmanager\.com\_bot|\
linksweeper|\
linkwalker|\
link\_checker|\
litefinder|\
litlrbot|\
little\ grabber\ at\ skanktale\.com|\
livelapbot|\
lm\ harvester|\
lmqueuebot|\
lnspiderguy|\
loadtimebot|\
localcombot|\
locust|\
lolongbot|\
lookbot|\
lsearch|\
lssbot|\
lt\ scotland\ checklink|\
ltx71.com|\
lwp|\
lycos\_spider|\
lydia\ entity\ spider|\
lynnbot|\
lytranslate|\
mag\-net|\
magnet|\
magpie\-crawler|\
magus\ bot|\
mail\.ru|\
mainseek\_bot|\
mammoth|\
map\ robot|\
markwatch|\
masagool|\
masidani\_bot\_|\
mass\ downloader|\
mata\ hari|\
mata\.hari|\
matentzn\ at\ cs\ dot\ man\ dot\ ac\ dot\ uk|\
maxamine\.com\-\-robot|\
maxamine\.com\-robot|\
maxomobot|\
mcbot|\
medrabbit|\
megite|\
memacbot|\
memo|\
mendeleybot|\
mercator\-|\
mercuryboard\_user\_agent\_sql\_injection\.nasl|\
metacarta|\
metaeuro\ web\ search|\
metager2|\
metagloss|\
metal\ crawler|\
metaquerier|\
metaspider|\
metaspinner|\
metauri|\
mfcrawler|\
mfhttpscan|\
midown\ tool|\
miixpc|\
mini\-robot|\
minibot|\
minirank|\
mirror|\
missigua\ locator|\
mister\ pix|\
mister\.pix|\
miva|\
mj12bot|\
mnogosearch|\
moduna\.com|\
mod\_accessibility|\
moget|\
mojeekbot|\
monkeycrawl|\
moses|\
mowserbot|\
mqbot|\
mse360|\
msindianwebcrawl|\
msmobot|\
msnptc|\
msrbot|\
mt\-soft|\
multitext|\
my\-heritrix\-crawler|\
myapp|\
mycompanybot|\
mycrawler|\
myengines\-us\-bot|\
myfamilybot|\
myra|\
my\_little\_searchengine\_project|\
nabot|\
najdi\.si|\
nambu|\
nameprotect|\
nasa\ search|\
natchcvs|\
natweb\-bad\-link\-mailer|\
naver|\
navroad|\
nearsite|\
nec\-meshexplorer|\
neosciocrawler|\
nerdbynature\.bot|\
nerdybot|\
nerima\-crawl-|\
nessus|\
nestreader|\
net\ vampire|\
net::trackback|\
netants|\
netcarta\ cyberpilot\ pro|\
netcraft|\
netexperts|\
netid\.com\ bot|\
netmechanic|\
netprospector|\
netresearchserver|\
netseer|\
netshift=|\
netsongbot|\
netsparker|\
netspider|\
netsrcherp|\
netzip|\
newmedhunt|\
news\ bot|\
newsgatherer|\
newsgroupreporter|\
newstrovebot|\
news\_search\_app|\
nextgensearchbot|\
nextthing\.org|\
nicebot|\
nicerspro|\
niki\-bot|\
nimblecrawler|\
nimbus\-1|\
ninetowns|\
ninja|\
njuicebot|\
nlese|\
nogate|\
norbert\ the\ spider|\
noteworthybot|\
npbot|\
nrcan\ intranet\ crawler|\
nsdl\_search\_bot|\
nuggetize\.com\ bot|\
nusearch\ spider|\
nutch|\
nu\_tch|\
nwspider|\
nymesis|\
nys\-crawler|\
objectssearch|\
obot|\
obvius\ external\ linkcheck|\
ocelli|\
octopus|\
odp\ entries\ t\_st|\
oegp|\
offline\ navigator|\
offline\.explorer|\
ogspider|\
omiexplorer\_bot|\
omniexplorer|\
omnifind|\
omniweb|\
onetszukaj|\
online\ link\ validator|\
oozbot|\
openbot|\
openfind|\
openintelligencedata|\
openisearch|\
openlink\ virtuoso\ rdf\ crawler|\
opensearchserver\_bot|\
opidig|\
optidiscover|\
oracle\ secure\ enterprise\ search|\
oracle\ ultra\ search|\
orangebot|\
orisbot|\
ornl\_crawler|\
ornl\_mercury|\
osis\-project\.jp|\
oso|\
outfoxbot|\
outfoxmelonbot|\
owler\-bot|\
owsbot|\
ozelot|\
p3p\ client|\
pagebiteshyperbot|\
pagebull|\
pagedown|\
pagefetcher|\
pagegrabber|\
pagepeeker|\
pagerank\ monitor|\
page\_verifier|\
pamsnbot\.htm|\
panopy\ bot|\
panscient\.com|\
pansophica|\
papa\ foto|\
paperlibot|\
parasite|\
parsijoo|\
pathtraq|\
pattern|\
patwebbot|\
pavuk|\
paxleframework|\
pbbot|\
pcbrowser|\
pcore\-http|\
pd\-crawler|\
penthesila|\
perform\_crawl|\
perman|\
personal\ ultimate\ crawler|\
php\ version\ tracker|\
phpcrawl|\
phpdig|\
picosearch|\
pieno\ robot|\
pipbot|\
pipeliner|\
pita|\
pixfinder|\
piyushbot|\
planetwork\ bot\ search|\
plucker|\
plukkie|\
plumtree|\
pockey|\
pocohttp|\
pogodak\.ba|\
pogodak\.co\.yu|\
poirot|\
polybot|\
pompos|\
poodle\ predictor|\
popscreenbot|\
postpost|\
privacyfinder|\
projectwf\-java\-test\-crawler|\
propowerbot|\
prowebwalker|\
proxem\ websearch|\
proximic|\
proxy\ crawler|\
psbot|\
pss\-bot|\
psycheclone|\
pub\-crawler|\
pucl|\
pulsebot|\
pump|\
pwebot|\
python|\
qeavis\ agent|\
qfkbot|\
qualidade|\
qualidator\.com\ bot|\
quepasacreep|\
queryn\ metasearch|\
queryn\.metasearch|\
quest\.durato|\
quintura\-crw|\
qunarbot|\
qwantify|\
qweerybot|\
qweery\_robot\.txt\_checkbot|\
r2ibot|\
r6\_commentreader|\
r6\_feedfetcher|\
r6\_votereader|\
rabot|\
radian6|\
radiation\ retriever|\
rampybot|\
rankivabot|\
rankur|\
rational\ sitecheck|\
rcstartbot|\
realdownload|\
reaper|\
rebi\-shoveler|\
recorder|\
redbot|\
redcarpet|\
reget|\
repomonkey|\
research\ robot|\
riddler|\
riight|\
risenetbot|\
riverglassscanner\
 [NC,OR]

#1000 new rule
RewriteCond %{HTTP_USER_AGENT} \
robopal|\
robosourcer|\
robotek|\
robozilla|\
roger|\
rome\ client|\
rondello|\
rotondo|\
roverbot|\
rpt\-httpclient|\
rtgibot|\
rufusbot|\
runnk\ online\ rss\ reader|\
runnk\ rss\ aggregator|\
s2bot|\
safaribookmarkchecker|\
safednsbot|\
safetynet\ robot|\
saladspoon|\
sapienti|\
sapphireweb|\
sbider|\
sbl\-bot|\
scfcrawler|\
scich|\
scientificcommons\.org|\
scollspider|\
scooperbot|\
scooter|\
scoutjet|\
scrapebox|\
scrapy|\
scrawltest|\
screaming\ frog|\
scrubby|\
scspider|\
scumbot|\
search\ publisher|\
search\ x\-bot|\
search\-channel|\
search\-engine\-studio|\
search\.kumkie\.com|\
search\.updated\.com|\
search\.usgs\.gov|\
searcharoo\.net|\
searchblox|\
searchbot|\
searchengine|\
searchhippo\.com|\
searchit\-bot|\
searchmarking|\
searchmarks|\
searchmee!|\
searchmee\_v|\
searchmining|\
searchnowbot|\
searchpreview|\
searchspider\.com|\
searqubot|\
seb\ spider|\
seekbot|\
seeker\.lookseek\.com|\
seeqbot|\
seeqpod\-vertical\-crawler|\
selflinkchecker|\
semager|\
semanticdiscovery|\
semantifire|\
semisearch|\
semrushbot|\
seoengworldbot|\
seokicks|\
seznambot|\
shablastbot|\
shadowwebanalyzer|\
shareaza|\
shelob|\
sherlock|\
shim\-crawler|\
shopsalad|\
shopwiki|\
showlinks|\
showyoubot|\
siclab|\
silk|\
simplepie|\
siphon|\
sitebot|\
sitecheck|\
sitefinder|\
siteguardbot|\
siteorbiter|\
sitesnagger|\
sitesucker|\
sitesweeper|\
sitexpert|\
skimbot|\
skimwordsbot|\
skreemrbot|\
skywalker|\
sleipnir|\
slow\-crawler|\
slysearch|\
smart\-crawler|\
smartdownload|\
smarte\ bot|\
smartwit\.com|\
snake|\
snap\.com\ beta\ crawler|\
snapbot|\
snappreviewbot|\
snappy|\
snookit|\
snooper|\
snoopy|\
societyrobot|\
socscibot|\
soft411\ directory|\
sogou|\
sohu\ agent|\
sohu\-search|\
sokitomi\ crawl|\
solbot|\
sondeur|\
sootle|\
sosospider|\
space\ bison|\
space\ fung|\
spacebison|\
spankbot|\
spanner|\
spatineo\ monitor\ controller|\
spatineo\ serval\ controller|\
spatineo\ serval\ getmapbot|\
special\_archiver|\
speedy|\
sphere\ scout|\
sphider|\
spider\.terranautic\.net|\
spiderengine|\
spiderku|\
spiderman|\
spinn3r|\
spinne|\
sportcrew\-bot|\
sproose|\
spyder3\.microsys\.com|\
sq\ webscanner|\
sqlmap|\
squid\-prefetch|\
squidclamav\_redirector|\
sqworm|\
srevbot|\
sslbot|\
ssm\ agent|\
stackrambler|\
stardownloader|\
statbot|\
statcrawler|\
statedept\-crawler|\
steeler|\
stegmann\-bot|\
stero|\
stripper|\
stumbler|\
suchclip|\
sucker|\
sumeetbot|\
sumitbot|\
summizebot|\
summizefeedreader|\
sunrise\ xp|\
superbot|\
superhttp|\
superlumin\ downloader|\
superpagesbot|\
supremesearch\.net|\
supybot|\
surdotlybot|\
surf|\
surveybot|\
suzuran|\
swebot|\
swish\-e|\
sygolbot|\
synapticwalker|\
syntryx\ ant\ scout\ chassis\ pheromone|\
systemsearch\-robot|\
szukacz|\
s\~stremor\-crawler|\
t\-h\-u\-n\-d\-e\-r\-s\-t\-o\-n\-e|\
tailrank|\
takeout|\
talkro\ web\-shot|\
tamu\_crawler|\
tapuzbot|\
tarantula|\
targetblaster\.com|\
targetyournews\.com\ bot|\
tausdatabot|\
taxinomiabot|\
teamsoft\ wininet\ component|\
tecomi\ bot|\
teezirbot|\
teleport|\
telesoft|\
teradex\ mapper|\
teragram\_crawler|\
terrawizbot|\
testbot|\
testing\ of\ bot|\
textbot|\
thatrobotsite\.com|\
the\ dyslexalizer|\
the\ intraformant|\
the\.intraformant|\
thenomad|\
theophrastus|\
theusefulbot|\
thumbbot|\
thumbnail\.cz\ robot|\
thumbshots\-de\-bot|\
tigerbot|\
tighttwatbot|\
tineye|\
titan|\
to\-dress\_ru\_bot\_|\
to\-night\-bot|\
tocrawl|\
topicalizer|\
topicblogs|\
toplistbot|\
topserver\ php|\
topyx\-crawler|\
touche|\
tourlentascanner|\
tpsystem|\
traazi|\
transgenikbot|\
travel\-search|\
travelbot|\
travellazerbot|\
treezy|\
trendiction|\
trex|\
tridentspider|\
trovator|\
true\_robot|\
tscholarsbot|\
tsm\ translation\-search\-machine|\
tswebbot|\
tulipchain|\
turingos|\
turnitinbot|\
tutorgigbot|\
tweetedtimes\ bot|\
tweetmemebot|\
twengabot|\
twice|\
twikle|\
twinuffbot|\
twisted\ pagegetter|\
twitturls|\
twitturly|\
tygobot|\
tygoprowler|\
typhoeus|\
u\.s\.\ government\ printing\ office|\
uberbot|\
ucb\-nutch|\
udmsearch|\
ufam\-crawler\-|\
ultraseek|\
unchaos|\
unisterbot|\
unidentified|\
unitek\ uniengine|\
universalsearch|\
unwindfetchor|\
uoftdb\_experiment|\
updated|\
url\ control|\
url\-checker|\
urlappendbot|\
urlblaze|\
urlchecker|\
urlck|\
urldispatcher|\
urlspiderpro|\
urly\ warning|\
urly\.warning|\
url\_gather|\
usaf\ afkn\ k2spider|\
usasearch|\
uss\-cosmix|\
usyd\-nlp\-spider|\
vacobot|\
vacuum|\
vadixbot|\
vagabondo|\
validator|\
valkyrie|\
vbseo|\
vci\ webviewer\ vci\ webviewer\ win32|\
verbstarbot|\
vericitecrawler|\
verifactrola|\
verity\-url\-gateway|\
vermut|\
versus\ crawler|\
versus\.integis\.ch|\
viasarchivinginformation\.html|\
vipr|\
virus\-detector|\
virus\_detector|\
visbot|\
vishal\ for\ clia|\
visweb|\
vital\ search'n\ urchin|\
vlad|\
vlsearch|\
voilabot|\
vmbot|\
vocusbot|\
voideye|\
voil|\
vortex|\
voyager|\
vspider|\
w3c\-webcon|\
w3c\_unicorn|\
w3search|\
wacbot|\
wanadoo|\
wastrix|\
water\ conserve\ portal|\
water\ conserve\ spider|\
watzbot|\
wauuu|\
wavefire|\
waypath|\
wazzup|\
wbdbot|\
web\ ceo\ online\ robot|\
web\ crawler|\
web\ downloader|\
web\ image\ collector|\
web\ link\ validator|\
web\ magnet|\
web\ site\ downloader|\
web\ sucker|\
web\-agent|\
web\-sniffer|\
web\.image\.collector|\
webaltbot|\
webauto|\
webbot|\
webbul\-bot|\
webcapture|\
webcheck|\
webclipping\.com|\
webcollage|\
webcopier|\
webcopy|\
webcorp|\
webcrawl\.net|\
webcrawler|\
webdatacentrebot|\
webdownloader\ for\ x|\
webdup|\
webemailextrac|\
webenhancer|\
webfetch|\
webgather|\
webgo\ is|\
webgobbler|\
webimages|\
webinator\-search2|\
webinator\-wbi|\
webindex|\
weblayers|\
webleacher|\
weblexbot|\
weblinker|\
weblyzard|\
webmastercoffee|\
webmasterworld\ extractor|\
webmasterworldforumbot|\
webminer|\
webmoose|\
webot|\
webpix|\
webreaper|\
webripper|\
websauger|\
webscan|\
websearchbench|\
website|\
webspear|\
websphinx|\
webspider|\
webster|\
webstripper|\
webtrafficexpress|\
webtrends\ link\ analyzer|\
webvac|\
webwalk|\
webwasher|\
webwatch|\
webwhacker|\
webxm|\
webzip|\
weddings\.info|\
wenbin|\
wep\ search|\
wepa|\
werelatebot|\
wget|\
whacker|\
whirlpool\ web\ engine|\
whowhere\ robot|\
widow|\
wikiabot|\
wikio|\
wikiwix\-bot\-|\
winhttp|\
wire|\
wisebot|\
wisenutbot|\
wish\-la|\
wish\-project|\
wisponbot|\
wmcai\-robot|\
wminer|\
wmsbot|\
woriobot|\
worldshop|\
worqmada|\
wotbox|\
wume\_crawler|\
www\ collector|\
www\-collector\-e|\
www\-mechanize|\
wwwoffle|\
wwwrobot|\
wwwster|\
wwwwanderer|\
wwwxref|\
wysigot|\
x\-clawler|\
x\-crawler|\
xaldon|\
xenu|\
xerka\ metabot|\
xerka\ webbot|\
xget|\
xirq|\
xmarksfetch|\
xqrobot|\
y!j|\
yacy\.net|\
yacybot|\
yanga\ worldsearch\ bot|\
yarienavoir\.net|\
yasaklibot|\
yats\ crawler|\
ybot|\
yebolbot|\
yellowjacket|\
yeti|\
yolinkbot|\
yooglifetchagent|\
yoono|\
yottacars\_bot|\
yourls|\
z\-add\ link\ checker|\
zagrebin|\
zao|\
zedzo\.validate|\
zermelo|\
zeus|\
zibber\-v|\
zimeno|\
zing-bottabot|\
zipppbot|\
zongbot|\
zoomspider|\
zotag\ search|\
zsebot|\
zuibot|\
zyborg|\
zyte\
 [NC]
RewriteRule .* - [F]
#bad bots end

Przypominamy o polubieniu nas na Facebooku lub Tweeterze!