Google Search Crawl Result

I ran the crawl today and the results are in. Out of 1,080,000 reported results, I crawled 560 and, after filtering out duplicates, was left with 416 sites. Google does not return every result, after all; 560 results in total is the limit, and the sites with high PageRank are the ones that come out first. I noticed that some sites I know are not among the 416. For example, http://www.myatmon.com/, http://blog.calmhill.info/ and http://mrdba.info/ did not show up in the results. If you want to know why they are missing, or why their PageRank is low, read Beyond PageRank and Third Generation IR. My own blog is included, and so are sites such as www.htootayzar.com. Also, sub-paths are not included: because duplicates were removed, entries like group.ps/aaa no longer appear.
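
The post does not show how the duplicates were filtered, so the following is only a minimal sketch of one way to collapse crawled URLs to one entry per site, assuming a "site" means the hostname and that sub-paths are discarded. The names `site_key` and `unique_sites` are hypothetical, and `group.ps/bbb` is a made-up second path used purely for illustration.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Return the lowercase hostname of a URL (assumed definition of a 'site')."""
    # Results such as "group.ps/aaa" have no scheme; add one so the
    # hostname is parsed as a host rather than as part of the path.
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""


def unique_sites(urls):
    """Keep the first occurrence of each site, dropping sub-path duplicates."""
    seen = set()
    sites = []
    for url in urls:
        key = site_key(url)
        if key and key not in seen:
            seen.add(key)
            sites.append(key)
    return sites


sample = ["group.ps/aaa", "http://group.ps/bbb", "www.htootayzar.com"]
print(unique_sites(sample))
```

In this sketch `www.example.com` and `example.com` would still count as two sites; whether to merge them is a separate choice that the post does not address.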

Right. In that case, let's work the numbers back like this…

560 results = 416 unique sites
1,080,000 results = (1,080,000 * 416) / 560 = 802,285.71 unique sites (approx.)

Let's calculate once more:

560 results = 144 duplicates

802,285 sites = (802,285 * 144) / 560 = 206,301.86 duplicates (approx.)

Approximately:

802,285 - 206,301 = 595,984

So this suggests there are only a little over 500,000 sites that use the Burmese language. I assumed that some of the scaled-up sites are duplicates as well, and simply took the figure as a little over 500,000. This is only a rough estimate. What is certain is that there are not yet 1,000,000 Burmese-language sites. It is the amount of content that is fairly large.
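
The back-of-envelope calculation above can be reproduced in a few lines of Python. This is only a sketch of the post's own arithmetic: the helper name `scale` is hypothetical, and, as in the post, the duplicate ratio from the 560-result sample is applied a second time to the already scaled figure, with fractions truncated before subtracting.

```python
def scale(sample_part, sample_total, population):
    """Scale a ratio observed in the sample up to a larger population."""
    return population * sample_part / sample_total


reported_results = 1_080_000   # result count shown by Google
crawled = 560                  # results that could actually be fetched
unique_sites = 416             # sites left after removing duplicates
duplicates = crawled - unique_sites  # 144

# Step 1: scale the unique-site ratio up to all reported results.
est_sites = scale(unique_sites, crawled, reported_results)

# Step 2: apply the sample's duplicate ratio to that (truncated) figure.
est_duplicates = scale(duplicates, crawled, int(est_sites))

# Step 3: subtract, truncating both figures as the post does.
estimate = int(est_sites) - int(est_duplicates)

print(round(est_sites, 2), round(est_duplicates, 2), estimate)
```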

5 responses to “Google Search Crawl Result”

NLS

February 9, 2010 at 5:12 pm

If we are counting sites, I don't think 1,000,000 is possible by any means. If we are counting pages, it might be.

1. saturngod
  
  February 9, 2010 at 5:15 pm
  
  Right… there aren't 1,000,000. There are that many pages, though. 1,000,000 sites is completely impossible; at most it is around 500,000. And that includes sites that were set up and then left untouched. Even sites I no longer use are still showing up in the results… :D
  
Lionslayer

March 4, 2010 at 10:21 am

It’s not true that there are 100K Myanmar sites out there. I just crawled it myself and some of the results are Chinese sites or Chinese Google Books pages. Some of the Chinese fonts from before Unicode used Myanmar code points.

The 100K number is the total number of keyword occurrences stored by Google. I.e., if you use the keyword “english” 5000 times on your site, Google will count all of them but will only show one link, or a few sub-links, of your site in the results.

So there are approximately 600 active Myanmar sites, and the total number of Myanmar sites will not exceed 20K.

The 1,000,000 is not the number of sites. It shows how many occurrences of the keyword character "က" were found in total. The sites that are really active online probably number only around 600.

Bamarlay

May 3, 2010 at 11:06 pm

I don't think 500,000 sites is possible. 500,000 pages might be. For blogs, though, the individual post pages also show up in the results, and the home page, search pages, labels, tags and categories are all counted. On top of that, the www and non-www versions are both counted, so the number goes up many times over. I think the sites that are actually being written regularly number only in the thousands.

1. saturngod
  
  May 4, 2010 at 12:29 am
  
  Yes, it’s content. There are around 100 Myanmar sites.
  
