I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.

  • 0 Posts
  • 13 Comments
Joined 2 years ago
Cake day: June 29th, 2023

  • My mbin instance is behind Cloudflare, so I filter the AS numbers there. They don’t even reach my server.

    On the sites that aren’t behind Cloudflare, yep, it’s done at the nginx level. I did consider the firewall level, maybe with a specific chain just for it, but since I was already blocking at the nginx level I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics, for example.

    You need to block the whole ASN too. The ones using Chrome/Firefox UAs switch to a random new IP from their huuuuuge pools every 5 minutes.
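Rotating IPs are the reason blocking by prefix works better than blocking individual addresses. Below is a minimal Python sketch of the idea using the standard `ipaddress` module. The prefixes are documentation ranges standing in for whatever an ASN actually announces. `is_blocked` is a hypothetical helper name and does not come from any tool mentioned here.

```python
import ipaddress

# Hypothetical stand-in for the prefixes one ASN announces. These are
# documentation ranges (RFC 5737 / RFC 3849), not any real operator's space.
ASN_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the listed prefixes."""
    addr = ipaddress.ip_address(client_ip)
    # `addr in net` is simply False when the IP versions differ, so mixed
    # IPv4/IPv6 lists need no special handling.
    return any(addr in net for net in ASN_PREFIXES)


# A bot hopping to a new address every few minutes still lands inside the
# same prefixes, so one prefix rule covers every address it rotates through.
rotating_ips = ["192.0.2.17", "198.51.100.203", "192.0.2.250", "2001:db8::42"]
print([is_blocked(ip) for ip in rotating_ips])  # [True, True, True, True]
print(is_blocked("203.0.113.5"))  # False: outside the listed prefixes
```

A firewall set or an nginx `deny` list applies the same check, just at a different layer.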


  • Yeah, I probably should look to see if there are any good plugins that do this on some community-submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike web crawlers, which generally check a URL here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx, I am using the following:

    ```nginx
    # Place inside a server { } or location { } block; ~* is a case-insensitive regex match.
    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|MojeekBot|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }
    ```

    That will block the ones that actually use recognisable user agents. I add any new ones I find as I go, and it catches a lot!

    I also have a huuuuuge IP-based block list, generated by adding all the ranges returned when looking up the following AS numbers:

    AS45102 (Alibaba Cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    That’s because these guys run, or have run, bots that impersonate real browser user agents.

    There are various tools online that return the prefix/IP list for an autonomous system number.

    I put both into a single file and include it in my website config files.

    EDIT: Just to add, keeping on top of this is a full-time job!
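The "single include file" approach can be sketched in Python with the standard `ipaddress` and `re` modules. The function takes the prefixes an ASN lookup returns, merges overlapping ranges, and writes one file containing both the user-agent check and nginx `deny` lines. A second helper roughly checks which user agents the regex would catch. The prefix data is made up from documentation ranges, the regex is a shortened version of the list above, and both function names are hypothetical. Python's `re` is also assumed to behave like nginx's PCRE for plain alternations like these.

```python
import ipaddress
import re

# Hypothetical input: prefixes pasted from an ASN lookup tool, one per line.
# Documentation ranges (RFC 5737 / RFC 3849) stand in for real data here.
RAW_PREFIXES = """
192.0.2.0/25
192.0.2.128/25
198.51.100.0/24
198.51.100.64/26
2001:db8::/32
"""

# Shortened version of the user-agent alternation from the config above.
UA_REGEX = "SemrushBot|AhrefsBot|python|Java|ClaudeBot|Bytespider"


def build_include(raw: str, ua_regex: str) -> str:
    """Return nginx directives for one include file: the UA check plus deny lines."""
    nets = [ipaddress.ip_network(line.strip(), strict=False)
            for line in raw.splitlines() if line.strip()]
    # collapse_addresses merges adjacent and overlapping ranges, but it only
    # accepts one IP version per call, so IPv4 and IPv6 are collapsed separately.
    v4 = ipaddress.collapse_addresses(n for n in nets if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in nets if n.version == 6)
    lines = [f'if ($http_user_agent ~* "{ua_regex}") {{ return 403; }}']
    lines += [f"deny {net};" for net in (*v4, *v6)]
    return "\n".join(lines) + "\n"


def ua_would_match(ua_regex: str, user_agent: str) -> bool:
    """Approximate nginx's ~*: an unanchored, case-insensitive regex search."""
    return re.search(ua_regex, user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(build_include(RAW_PREFIXES, UA_REGEX), end="")
    for ua in ("python-requests/2.31.0",
               "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"):
        print(ua_would_match(UA_REGEX, ua), ua)
```

You could redirect the output to a file (say `blocklist.conf`, a hypothetical name) and `include` it inside each `server { }` block. With no matching `allow`, nginx's access module lets unlisted clients through, so only the listed ranges get a 403. The `python-requests` sample shows that short entries like `python` or `Java` match anywhere in the string, whatever the case.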



  • Not sure how it is in the US, but here in the UK there are two ways a business can export.

    1: They pre-clear the customs duty and include it in the sales total (so it’s like paying sales tax at the checkout, except it’s the pre-cleared duty). The parcel then has a nice duty-paid stamp and goes straight through customs (unless, I guess, customs are suspicious and check it).

    2: They just charge you the item price with no tax applied, in which case you need to pay the applicable local taxes and duties once the product arrives. Here it’s handled a bit differently: they hold it at the local depot, and you can either go there to pay and collect, or pay online and have it rescheduled for delivery once you’ve paid.

    As others have said, it’s not a scam. There’s no requirement for a business to do option 1, and it’s likely only viable for large businesses, which can register and have someone (or software) that knows the duties required by each country.

    For example, I’ve ordered from Newegg and B&M in the past, and in both cases the items were pre-cleared and arrived promptly without any hassle.

    Maybe there’s something similar for imports into the US too?


  • Now see, I kinda had the idea for a syndicated delivery service decades ago (not for online orders, but the internet would have been used to create the order data that assigns drivers). I did some part-time work delivering food back in the late 90s/early 2000s, and I always thought it was so inefficient. The place I was at was very busy and had a very large delivery area, but even so, there were times the owner was paying people to sit outside in their cars talking shit to each other.

    I thought it would make sense to have a larger pool of drivers servicing multiple restaurants/takeaways. That brings economies of scale to the problem, making sure people are actually being utilised and lowering the cost to each place using the service, while of course also paying something to the person running the business that brings it all together.

    I don’t think I ever considered paying less than this guy did (which wasn’t a lot, but would likely translate to $5 or so an hour in the 90s/2000s).

    One thing I find really interesting about Uber Eats/DoorDash (US)/Deliveroo (UK/EU): when you add up their fees, they take a delivery fee from the user, a service fee from the user and an even bigger service fee from the restaurant, and they pay drivers the lowest possible fee that will keep them interested. Yet I always hear the services are losing money too. How is that even possible?

    Take Deliveroo in the UK. Looking now (I don’t live in a city, so most places are some distance away), I can see a place 4.5 miles away charging £4.29 for delivery. Let’s make up an imaginary order:

    Order total: £20 (including sales tax/VAT)
    User’s service fee: £2.39 (it seems to be 11% including VAT, capped at a maximum I’m not sure of)
    User’s delivery fee: £4.29 (including VAT, since VAT has to be charged on a service)
    Restaurant service fee: £6 (30% of the VAT-inclusive total; I’m really unsure how this works in terms of tax, though…)
    Total for user: £26.68


    Total Deliveroo service revenue:
    Net: £10.57
    VAT: £2.11
    Total: £12.68

    Reading between the lines, from what I can see, delivery riders are paid between £3 and £6 per delivery. In the cities this is probably great, but I do wonder how it works in towns and villages. When I look at the list of places available to me, most are 3 miles or more away, with some up to 6 miles away. I do wonder how £6 compensates someone for what is sometimes a 10+ mile round trip.

    But OK, the price they pay drivers doesn’t include any tax, so it comes out of the net total. This means that on an order like this they make £4.50 or more per delivery, even at the top rider rate.

    Yes, they need to pay support staff, but those are in low-cost geographies. Yes, they need to keep development staff and the usual management overhead. And yes, they need servers/cloud time to host this stuff.

    Looking this up (not sure how good the source is), their revenue in 2023 was £2.7 billion, which I believe. However, they lost £38 million. Where all the costs come from, I am not sure.

    I wonder how these numbers compare to US-based operators?
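The fee arithmetic in the Deliveroo comment can be rechecked with a short Python sketch using `decimal` for money. It takes the figures exactly as stated in that comment. Like the comment, it assumes all three fees are VAT-inclusive at the UK standard rate of 20%. How VAT really applies to the restaurant fee is the part the comment flags as uncertain. The £3 and £6 rider payments are the range the comment quotes, not published rates.

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")
VAT_RATE = Decimal("0.20")  # UK standard rate; assumed here to apply to every fee

# Figures from the imaginary order in the comment (all VAT-inclusive).
order_total = Decimal("20.00")
user_service_fee = Decimal("2.39")
user_delivery_fee = Decimal("4.29")
restaurant_fee = (order_total * Decimal("0.30")).quantize(PENNY)  # 30% of the order

user_pays = order_total + user_service_fee + user_delivery_fee
platform_gross = user_service_fee + user_delivery_fee + restaurant_fee

# Strip VAT out of the gross: net = gross / 1.2, rounded to the penny.
platform_net = (platform_gross / (1 + VAT_RATE)).quantize(PENNY, rounding=ROUND_HALF_UP)
platform_vat = platform_gross - platform_net

print(f"User pays: £{user_pays}")  # £26.68
print(f"Platform gross: £{platform_gross} (net £{platform_net} + VAT £{platform_vat})")

# Rider pay comes out of the net figure, since it carries no VAT.
for rider_pay in (Decimal("3.00"), Decimal("6.00")):
    print(f"Rider paid £{rider_pay}: platform keeps £{platform_net - rider_pay}")
```

On these assumptions the £6 case leaves £4.57 and the £3 case leaves £7.57.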