Android: getting images from rich text and scraping Tencent News page data with Jsoup

Date: 2022-10-31 07:41:26

Here is the result first:

  • Getting the images in rich text

  • Scraping the images from Tencent News

  • First, add the libraries this demo uses
compile 'jp.wasabeef:glide-transformations:2.0.2'
compile 'org.jsoup:jsoup:1.9.2'

1. Loading rich text with images that adapt to the screen

  • Key code:
package tsou.cn.webviewtext;

import android.os.Build;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.webkit.WebChromeClient;
import android.webkit.WebView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import tsou.cn.webviewtext.util.StringBrowserUtils;
import tsou.cn.webviewtext.util.data.Data;
import tsou.cn.webviewtext.webview.MJavascriptInterface;
import tsou.cn.webviewtext.webview.MyWebViewClient;

public class WebViewDataOneActivity extends AppCompatActivity {
private WebView webview;
private String[] imageUrls;

@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_web_view_data_one);
webview = (WebView) findViewById(R.id.webview);

imageUrls = StringBrowserUtils.returnImageUrlsFromHtml(Data.getData());
// Make the loaded page adapt to the phone screen.
// First, enable the wide viewport.
webview.getSettings().setUseWideViewPort(true);
// Second, load the page in overview mode (zoomed out to fit the screen width).
webview.getSettings().setLoadWithOverviewMode(true);
//webview.getSettings().setDefaultFontSize(20);
webview.getSettings().setTextZoom(260);
webview.getSettings().setJavaScriptEnabled(true);
webview.getSettings().setSupportZoom(false);
webview.getSettings().setBuiltInZoomControls(false);
webview.getSettings().setDisplayZoomControls(false);
webview.setScrollBarStyle(View.SCROLLBARS_INSIDE_OVERLAY); // Remove the white edge next to the scroll bar
webview.setWebChromeClient(new WebChromeClient());
webview.addJavascriptInterface(new MJavascriptInterface(this, imageUrls), "imagelistener");
webview.setWebViewClient(new MyWebViewClient());
webview.getSettings().setDefaultTextEncodingName("UTF-8");
webview.getSettings().setBlockNetworkImage(false);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
// Android 5.0+ blocks mixed (HTTP inside HTTPS) content by default, so allow it explicitly
webview.getSettings().setMixedContentMode(
android.webkit.WebSettings.MIXED_CONTENT_ALWAYS_ALLOW);
}
webview.loadDataWithBaseURL(null, Data.getData(), "text/html", "UTF-8", null);
}

}

The MJavascriptInterface and MyWebViewClient classes are provided below.

2. Loading rich text with images fitted to the screen width

  • Key code:
package tsou.cn.webviewtext;

import android.os.Build;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.webkit.WebChromeClient;
import android.webkit.WebView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import tsou.cn.webviewtext.util.StringBrowserUtils;
import tsou.cn.webviewtext.util.data.Data;
import tsou.cn.webviewtext.webview.MJavascriptInterface;
import tsou.cn.webviewtext.webview.MyWebViewClient;

public class WebViewDatatwoActivity extends AppCompatActivity {
private WebView webview;
private String[] imageUrls;

@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_web_view_data_one);
webview = (WebView) findViewById(R.id.webview);

imageUrls = StringBrowserUtils.returnImageUrlsFromHtml(Data.getData());
webview.getSettings().setJavaScriptEnabled(true);
webview.getSettings().setUseWideViewPort(true);
webview.getSettings().setLoadWithOverviewMode(true);
webview.getSettings().setDefaultFontSize(40);
webview.getSettings().setBuiltInZoomControls(true);
webview.getSettings().setDisplayZoomControls(false);
webview.setScrollBarStyle(View.SCROLLBARS_INSIDE_OVERLAY); // Remove the white edge next to the scroll bar
webview.setWebChromeClient(new WebChromeClient());
webview.addJavascriptInterface(new MJavascriptInterface(this, imageUrls), "imagelistener");
webview.setWebViewClient(new MyWebViewClient());
webview.getSettings().setDefaultTextEncodingName("UTF-8");
webview.getSettings().setBlockNetworkImage(false);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
// Android 5.0+ blocks mixed (HTTP inside HTTPS) content by default, so allow it explicitly
webview.getSettings().setMixedContentMode(
android.webkit.WebSettings.MIXED_CONTENT_ALWAYS_ALLOW);
}
webview.loadDataWithBaseURL(null, getNewContent(Data.getData()), "text/html", "UTF-8", null);
}

private String getNewContent(String htmltext) {

Document doc = Jsoup.parse(htmltext);
Elements elements = doc.getElementsByTag("img");
for (Element element : elements) {
element.attr("width", "100%").attr("height", "auto");
}

return doc.toString();
}
}

Jsoup is used here to make every image fill the full width of the screen.
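For comparison, a similar effect can probably be had without parsing the HTML at all, by prepending a CSS rule to the rich text before handing it to loadDataWithBaseURL. This is a different technique from the Jsoup attribute rewrite above, not the author's method: max-width only shrinks images that are wider than the screen and does not stretch small ones. The class and method names below are hypothetical.

```java
// Hypothetical helper, not part of the original project.
final class RichTextCss {
    // CSS rule that caps every image at the container width and keeps its aspect ratio.
    static final String IMG_RULE = "<style>img{max-width:100%;height:auto;}</style>";

    // Prepends the rule to an HTML fragment so it applies to every <img> in that fragment.
    static String withFluidImages(String html) {
        if (html == null) {
            return IMG_RULE;
        }
        return IMG_RULE + html;
    }

    public static void main(String[] args) {
        String html = "<p>text</p><img src=\"https://example.com/a.jpg\" width=\"1200\">";
        System.out.println(withFluidImages(html));
    }
}
```

The returned string would be passed where getNewContent(Data.getData()) is passed in the activity. An inline style on an individual img tag would still take precedence over this rule.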

3. Loading an HTML page and scraping its data with Jsoup

  • Key code:
package tsou.cn.webviewtext;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.util.Log;
import android.webkit.WebView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

import tsou.cn.webviewtext.util.data.Data;
import tsou.cn.webviewtext.webview.MJavascriptInterface;
import tsou.cn.webviewtext.webview.MyWebViewClient;

public class WebViewUrlTwoActivity extends AppCompatActivity {
private WebView webview;
private List<String> imageSrcList;

@Override
protected void onDestroy() {
webview.clearHistory();
super.onDestroy();
System.gc();
}

@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_web_view_data_one);
webview = (WebView) findViewById(R.id.webview);
webview.loadUrl(Data.getUrl());
new Thread(new Runnable() {
@Override
public void run() {
getHtmlData();
}
}).start();
}

private void getHtmlData() {
try {
// Load a Document object from a URL.
Document doc = Jsoup.connect(Data.getUrl())
.userAgent("Mozilla")
.timeout(50000)
.get();
Elements elements = doc.select("div.main").select("img");
imageSrcList = new ArrayList<>();
for (int i = 0; i < elements.size(); i++) {
// "abs:src" resolves the value against the page URL, so it matches the absolute this.src reported by the click handler
String src = elements.get(i).attr("abs:src");
Log.e("huangxiaoguo", "elements" + src);
imageSrcList.add(src);
}

runOnUiThread(new Runnable() {
@Override
public void run() {
setWebView(imageSrcList.toArray(new String[imageSrcList.size()]));
}
});
} catch (Exception e) {
e.printStackTrace();
}
}

private void setWebView(String[] imageUrls) {
webview.addJavascriptInterface(new MJavascriptInterface(this, imageUrls), "imagelistener");
webview.setWebViewClient(new MyWebViewClient());

}


}
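One detail worth keeping in mind with the scraped list: the click handler in MyWebViewClient passes this.src, which the browser reports as an absolute URL, while a src attribute in the page markup may be relative or protocol-relative. openImage compares the two with equals, so they only match if both are absolute. Below is a minimal sketch of the resolution step using only java.net.URI. The class name and the example.com URLs are assumptions for illustration. Jsoup offers the same thing through attr("abs:src").

```java
import java.net.URI;

// Hypothetical helper, not part of the original project.
final class ImageUrlResolver {
    // Resolves a src attribute value against the URL of the page it came from.
    static String toAbsolute(String pageUrl, String src) {
        if (src == null || src.trim().isEmpty()) {
            return "";
        }
        try {
            return URI.create(pageUrl).resolve(src.trim()).toString();
        } catch (IllegalArgumentException e) {
            // Malformed value (for example, one containing spaces): keep it unchanged.
            return src;
        }
    }

    public static void main(String[] args) {
        String page = "https://example.com/news/page.html";
        System.out.println(toAbsolute(page, "img/a.png"));
        System.out.println(toAbsolute(page, "//cdn.example.com/b.jpg"));
        System.out.println(toAbsolute(page, "http://other.example/c.gif"));
    }
}
```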

4. Other supporting code

  • The Data class
public class Data {
public static String getUrl() {
return "https://view.inews.qq.com/a/20171101A07WQO00";
}
public static String getData() {
return "rich text"; // placeholder for the rich text HTML
}
}

The rich text is too long to include here; see https://gitee.com/huangxiaoguo/WebViewText/blob/master/app/src/main/java/tsou/cn/webviewtext/util/data/Data.java.

  • The MJavascriptInterface class
package tsou.cn.webviewtext.webview;

import android.content.Context;
import android.content.Intent;
import android.util.Log;

import tsou.cn.webviewtext.LookBigPhotoActivity;

/**
* Created by Administrator on 2017/2/10.
*/


public class MJavascriptInterface {
private Context context;
private String[] imageUrls;

public MJavascriptInterface(Context context, String[] imageUrls) {
this.context = context;
this.imageUrls = imageUrls;
}

@android.webkit.JavascriptInterface
public void openImage(String img) {
Intent intent = new Intent();
intent.putExtra("imageUrls", imageUrls);
for (int i = 0; i < imageUrls.length; i++) {
if (imageUrls[i].equals(img))
intent.putExtra("position", i);
}
intent.setClass(context, LookBigPhotoActivity.class);
context.startActivity(intent);

}
}
  • The MyWebViewClient class
package tsou.cn.webviewtext.webview;

import android.content.Intent;
import android.graphics.Bitmap;
import android.net.Uri;
import android.util.Log;
import android.webkit.WebView;
import android.webkit.WebViewClient;

public class MyWebViewClient extends WebViewClient {
@Override
public void onPageFinished(WebView view, String url) {
view.getSettings().setJavaScriptEnabled(true);
super.onPageFinished(view, url);
// Once the page has finished loading, attach the image click listeners
addImageClickListener(view);
}

@Override
public boolean shouldOverrideUrlLoading(WebView view, String url) {
// Hand the URL to the system default browser
view.stopLoading();
view.getContext().startActivity(new Intent(Intent.ACTION_VIEW, Uri.parse(url)));
return true;
}

@Override
public void onPageStarted(WebView view, String url, Bitmap favicon) {
view.getSettings().setJavaScriptEnabled(true);
super.onPageStarted(view, url, favicon);
}

private void addImageClickListener(WebView webView) {
webView.loadUrl("javascript:(function(){" +
"var objs = document.getElementsByTagName(\"img\"); " +
"for(var i=0;i<objs.length;i++) " +
"{"
+ " objs[i].onclick=function() " +
" { "
// The JS finds every img tag and wires its click handler to the native openImage method
+ " window.imagelistener.openImage(this.src); " +
" } " +
"}" +
"})()");
}
}
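The injected script is easier to read once the string concatenation is pulled apart. The sketch below builds the same kind of javascript: URL from the interface name, so that the name given to addJavascriptInterface and the name used in the script cannot drift apart. The class and method names are hypothetical, and the script is a lightly tidied equivalent of the one above, not a copy of it.

```java
// Hypothetical helper, not part of the original project.
final class ImageClickScript {
    // Builds a javascript: URL that makes every <img> call
    // window.<interfaceName>.openImage(this.src) when it is clicked.
    static String build(String interfaceName) {
        return "javascript:(function(){"
                + "var imgs=document.getElementsByTagName('img');"
                + "for(var i=0;i<imgs.length;i++){"
                + "imgs[i].onclick=function(){"
                + "window." + interfaceName + ".openImage(this.src);"
                + "};"
                + "}"
                + "})()";
    }

    public static void main(String[] args) {
        // "imagelistener" is the name the activities above pass to addJavascriptInterface.
        System.out.println(build("imagelistener"));
    }
}
```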
  • The ImageLoadUtil class
package tsou.cn.webviewtext.util;

import android.content.Context;
import android.support.annotation.DrawableRes;
import android.widget.ImageView;

import com.bumptech.glide.Glide;
import com.bumptech.glide.load.engine.DiskCacheStrategy;

import java.io.File;

import jp.wasabeef.glide.transformations.CropCircleTransformation;
import jp.wasabeef.glide.transformations.CropTransformation;
import jp.wasabeef.glide.transformations.RoundedCornersTransformation;
import tsou.cn.webviewtext.R;


/**
* Wrapper around image loading
*
* @author RS
*/

public class ImageLoadUtil {

public static ImageView display(Context context, ImageView img, String url) {
Glide.with(context)
.load(url)
.placeholder(R.drawable.app_loading_pic) // shown while loading
.error(R.drawable.app_loading_pic) // shown when loading fails
.into(img);
return img;
}

public static ImageView display(Context context, ImageView img, File file) {
Glide.with(context)
.load(file)
.placeholder(R.drawable.app_loading_pic) // shown while loading
.error(R.drawable.app_loading_pic) // shown when loading fails
.into(img);
return img;
}

public static ImageView displayCircle(Context context, ImageView img, String url) {
Glide.with(context)
.load(url)
.placeholder(R.drawable.app_loading_pic_round) // shown while loading
.error(R.drawable.app_loading_pic_round) // shown when loading fails
.override(150, 150)
.bitmapTransform(new CropCircleTransformation(context))
.into(img);
return img;
}

public static ImageView displayCircle(Context context, ImageView img, File file) {
Glide.with(context)
.load(file)
.override(150, 150)
.placeholder(R.drawable.app_loading_pic_round) // shown while loading
.error(R.drawable.app_loading_pic_round) // shown when loading fails
.bitmapTransform(new CropCircleTransformation(context))
.into(img);
return img;
}

public static ImageView displayCircle(Context context, ImageView img, String url, @DrawableRes int defaultPic) {
Glide.with(context)
.load(url)
.override(150, 150)
.placeholder(defaultPic) // shown while loading
.error(defaultPic) // shown when loading fails
.bitmapTransform(new CropCircleTransformation(context))
.into(img);
return img;
}

public static ImageView displayRound(Context context, ImageView img, String url, int round, int width, int height) {
Glide.with(context)
.load(url)
.placeholder(R.drawable.app_loading_pic) // shown while loading
.error(R.drawable.app_loading_pic) // shown when loading fails
.bitmapTransform(new CropTransformation(context, width, height),
new RoundedCornersTransformation(context, UIUtils.dip2px(context, round), 0))
.into(img);
return img;
}

public static ImageView displaySquareRound(Context context, ImageView img, String url, int round, int length) {
displayRound(context, img, url, round, length, length);
return img;
}

public static ImageView displaySquareRound(Context context, ImageView img, String url, int length) {
displaySquareRound(context, img, url, 4, length);
return img;
}

/****************************/
public static void display(Context context, String imgPath, ImageView imageView) {
display(context, imgPath, imageView, R.drawable.app_loading_pic);
}

public static void display(Context context, String imgPath, ImageView imageView, @DrawableRes int resId) {
Glide.with(context).load(imgPath).error(resId)
.placeholder(resId) // shown while loading
.diskCacheStrategy(DiskCacheStrategy.ALL)
.into(imageView);
}

public static void display(Context context, @DrawableRes int imgPath, ImageView imageView) {
display(context, imgPath, imageView, R.drawable.app_loading_pic);
}

public static void display(Context context, @DrawableRes int imgPath, ImageView imageView, @DrawableRes int resId) {
Glide.with(context).load(imgPath).error(resId)
.placeholder(resId) // shown while loading
.diskCacheStrategy(DiskCacheStrategy.ALL)
.into(imageView);
}

public static void display(Context context, String imgPath, ImageView imageView, int width, int height) {
display(context, imgPath, imageView, R.drawable.app_loading_pic, width, height);
}

public static void display(Context context, String imgPath, ImageView imageView, @DrawableRes int resId, int width,
int height) {
Glide.with(context).load(imgPath).error(resId)
.diskCacheStrategy(DiskCacheStrategy.ALL)
.override(width, height)
.placeholder(resId) // shown while loading
.into(imageView);
}

public static void clearMemory(final Context mContext) {
new Thread(new Runnable() {
@Override
public void run() {
Glide.get(mContext).clearDiskCache(); // the disk cache must be cleared on a background thread
}
}).start();
Glide.get(mContext).clearMemory(); // the memory cache can be cleared on the main (UI) thread
}

}
  • The StringBrowserUtils class
package tsou.cn.webviewtext.util;

import android.util.Log;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
* Created by Administrator on 2017/2/9.
*/


public class StringBrowserUtils {

public static String[] returnImageUrlsFromHtml(String content) {
List<String> imageSrcList = new ArrayList<String>();
String htmlCode = content;
Pattern p = Pattern.compile("<img\\b[^>]*\\bsrc\\b\\s*=\\s*('|\")?([^'\"\n\r\f>]+(\\.jpg|\\.bmp|\\.eps|\\.gif|\\.mif|\\.miff|\\.png|\\.tif|\\.tiff|\\.svg|\\.wmf|\\.jpe|\\.jpeg|\\.dib|\\.ico|\\.tga|\\.cut|\\.pic|\\b)\\b)[^>]*>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(htmlCode);
String quote = null;
String src = null;
while (m.find()) {
quote = m.group(1);
src = (quote == null || quote.trim().length() == 0) ? m.group(2).split("\\s+")[0] : m.group(2);
imageSrcList.add(src);
}
if (imageSrcList.isEmpty()) {
Log.e("huangxiaoguo", "No image links matched in the article");
// Return an empty array rather than null so callers can read .length safely
return new String[0];
}

return imageSrcList.toArray(new String[imageSrcList.size()]);
}


}
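The long pattern above is hard to follow at a glance, so here is a simplified, plain-Java sketch of the same idea: find each img tag and capture its src value, whether it is double-quoted, single-quoted, or unquoted. It is not a drop-in copy of the author's regex. It drops the file-extension filter and requires whitespace before src so that attributes such as data-src are skipped. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original project.
final class ImgSrcExtractor {
    // Groups: 1 = double-quoted value, 2 = single-quoted value, 3 = unquoted value.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    // Returns every img src found in the HTML, in document order; never returns null.
    static String[] extract(String html) {
        List<String> urls = new ArrayList<>();
        if (html != null) {
            Matcher m = IMG_SRC.matcher(html);
            while (m.find()) {
                String src = m.group(1) != null ? m.group(1)
                        : m.group(2) != null ? m.group(2) : m.group(3);
                urls.add(src);
            }
        }
        return urls.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String html = "<p>x</p><img src=\"https://example.com/a.jpg\"><img src='b.png'>";
        for (String url : extract(html)) {
            System.out.println(url);
        }
    }
}
```

Regular expressions remain fragile on real-world HTML. Since Jsoup is already a dependency of this project, doc.select("img") is likely the more robust route for anything beyond simple rich text.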
  • The UIUtils class
package tsou.cn.webviewtext.util;

import android.content.Context;


public class UIUtils {


/**
* Convert dip to px
*/

public static int dip2px(Context context, int dip) {
final float scale = context.getResources().getDisplayMetrics().density;
return (int) (dip * scale + 0.5f);
}


}
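The arithmetic in dip2px can be checked without an Android device. The sketch below takes the density as a plain parameter instead of reading it from a Context (that parameter and the class name are assumptions for illustration). Adding 0.5f before the int cast rounds to the nearest pixel for non-negative values instead of always rounding down.

```java
// Hypothetical stand-alone version of the dip2px arithmetic.
final class DipMath {
    // density is the value of DisplayMetrics.density, e.g. 1.5 on hdpi or 2.0 on xhdpi screens.
    static int dipToPx(float density, int dip) {
        return (int) (dip * density + 0.5f);
    }

    public static void main(String[] args) {
        // 3 dip at density 1.5 is 4.5 px; the + 0.5f makes the cast yield 5 rather than 4.
        System.out.println(dipToPx(1.5f, 3));
        System.out.println(dipToPx(2.0f, 4));
    }
}
```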

  • The LookBigPhotoActivity class
package tsou.cn.webviewtext;

import android.os.Bundle;
import android.support.annotation.Nullable;
import android.support.v4.view.PagerAdapter;
import android.support.v4.view.ViewPager;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.view.ViewGroup;
import android.widget.ImageView;
import android.widget.TextView;

import tsou.cn.webviewtext.util.ImageLoadUtil;


public class LookBigPhotoActivity extends AppCompatActivity {

ViewPager viewpager;

TextView mCircleindicator;

private int position;
private String[] imageUrlses;

@Override
protected void onDestroy() {
ImageLoadUtil.clearMemory(this);
super.onDestroy();
}

@Override
protected void onCreate(@Nullable Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_look_big_photo);
initView();
initData();
initListener();
}

protected void initView() {
viewpager = (ViewPager) findViewById(R.id.viewpager);
mCircleindicator= (TextView) findViewById(R.id.circleindicator);
}

protected void initData() {
position = getIntent().getIntExtra("position", 0);
imageUrlses = getIntent().getStringArrayExtra("imageUrls");
}

protected void initListener() {
viewpager.setAdapter(new PagerAdapter() {
@Override
public int getCount() {
return imageUrlses.length;
}

@Override
public Object instantiateItem(ViewGroup container, int position) {
ImageView imageView = new ImageView(LookBigPhotoActivity.this);
ImageLoadUtil.display(LookBigPhotoActivity.this, imageView, imageUrlses[position]);
container.addView(imageView);
imageView.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
finish();
}
});
return imageView;
}

@Override
public void destroyItem(ViewGroup container, int position, Object object) {
container.removeView((View) object);
}

@Override
public boolean isViewFromObject(View view, Object object) {
return view == object;
}
});
viewpager.setCurrentItem(position);
mCircleindicator.setText(position+1+"/"+imageUrlses.length);

viewpager.addOnPageChangeListener(onMyPageChangeListener);
}

private ViewPager.OnPageChangeListener onMyPageChangeListener = new ViewPager.OnPageChangeListener() {
@Override
public void onPageScrolled(int position, float positionOffset, int positionOffsetPixels) {

}

@Override
public void onPageSelected(int position) {
mCircleindicator.setText(position+1+"/"+imageUrlses.length);
}

@Override
public void onPageScrollStateChanged(int state) {

}
};

}
  • The LookBigPhotoActivity layout
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:background="@color/black">



<TextView
android:id="@+id/circleindicator"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_alignParentBottom="true"
android:textSize="20sp"
android:textColor="@color/white"
android:text="0/0"
android:layout_marginBottom="50px"
android:layout_marginLeft="30dp" />


<android.support.v4.view.ViewPager
android:id="@+id/viewpager"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:layout_above="@id/circleindicator"
android:layout_marginBottom="150px"
android:layout_marginTop="150px" />

</RelativeLayout>

Demo: https://gitee.com/huangxiaoguo/WebViewText

One open question remains: how can Jsoup get at tags that the page generates dynamically with JavaScript?
If you know, please share. Thanks!